Natural Language Processing (NLP) is an interdisciplinary field that draws on computational linguistics and Artificial Intelligence to study how human languages interact with technology.
Applying NLP to real-world scenarios requires a working understanding of several core concepts. These include Part-of-Speech (POS) Tagging, Statistical Language Modeling, Syntactic, Semantic and Sentiment Analysis, Normalization, Tokenization, Dependency Parsing, and Constituency Parsing, among others.
In this article, we will look at the fundamentals of Dependency Parsing to gain perspective on how it is implemented in NLP.
Dependency Parsing (DP) refers to examining the dependencies between the words of a sentence to analyze its grammatical structure. Based on this, a sentence is broken into several components. The mechanism is based on the concept that there is a direct link between every linguistic unit of a sentence. These links are termed dependencies.
Let’s take for example the sentence “I prefer the morning flight through Denver.”
The diagram below explains the dependency structure of the sentence:
The relations between the linguistic units, or words, of the sentence are indicated by directed, labelled arcs in a typed dependency structure. As labelled in the diagram, “prefer” is the root of the tree and forms the head of the above sentence.
The relationship between any two words is marked by a dependency tag. For instance, the noun “Denver” modifies the meaning of the word “flight.” Therefore, you can notice a dependency flight -> Denver, where “flight” is the head and “Denver” is the child or dependent. It is labelled nmod, which stands for nominal modifier.
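As a minimal sketch, the head/dependent structure of the example sentence can be written down as a list of (head, relation, dependent) arcs. Only the root “prefer” and the nmod arc from “flight” to “Denver” come from the text above; the remaining arcs and labels are an assumed, typical analysis added for illustration, and the helper names are hypothetical.

```python
# Each arc is (head, relation, dependent).
# Stated in the article: root = "prefer", nmod(flight -> Denver).
# All other arcs and labels are assumptions made for this sketch.
ARCS = [
    ("ROOT", "root", "prefer"),
    ("prefer", "nsubj", "I"),
    ("prefer", "obj", "flight"),
    ("flight", "det", "the"),
    ("flight", "compound", "morning"),
    ("flight", "nmod", "Denver"),
    ("Denver", "case", "through"),
]


def head_of(word, arcs=ARCS):
    """Return (head, relation) for a word; every word has exactly one head."""
    for head, relation, dependent in arcs:
        if dependent == word:
            return head, relation
    raise KeyError(word)


def dependents_of(word, arcs=ARCS):
    """Return the (relation, dependent) pairs governed by a word."""
    return [(rel, dep) for head, rel, dep in arcs if head == word]


def path_to_root(word, arcs=ARCS):
    """Follow head links from a word up to the artificial ROOT node."""
    path = [word]
    while path[-1] != "ROOT":
        head, _ = head_of(path[-1], arcs)
        path.append(head)
    return path


print(head_of("Denver"))
print(dependents_of("flight"))
print(path_to_root("through"))
```

Because every word has exactly one head, following head links from any word always ends at the root, which is what makes the structure a tree.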
The same holds for every pair of linked words: one acts as the head and the other as the dependent. Currently, the Universal Dependencies (UD) v2 taxonomy defines 37 universal syntactic relations, which can be refined by subtypes written after a colon (for example, acl:relcl). The table below lists a selection of these relations and subtypes:
|acl||clausal modifier of a noun (adnominal clause)|
|acl:relcl||relative clause modifier|
|advcl||adverbial clause modifier|
|advmod:emph||emphasizing word, intensifier|
|advmod:lmod||locative adverbial modifier|
|compound:lvc||light verb construction|
|compound:prt||phrasal verb particle|
|compound:svc||serial verb compounds|
|csubj:pass||clausal passive subject|
|det:numgov||pronominal quantifier governing the case of the noun|
|det:nummod||pronominal quantifier agreeing in case with the noun|
|expl:pass||reflexive pronoun used in reflexive passive|
|expl:pv||reflexive clitic with an inherently reflexive verb|
|fixed||fixed multiword expression|
|flat||flat multiword expression|
|nmod:poss||possessive nominal modifier|
|nsubj:pass||passive nominal subject|
|nummod:gov||numeric modifier governing the case of the noun|
|xcomp||open clausal complement|
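Several labels in the table contain a colon, which separates a universal relation from its subtype. The following sketch shows one way to split such labels and fall back to the base relation when a subtype is unknown. The dictionary holds only a few rows copied from the table, and the function names are hypothetical.

```python
# A few rows copied from the table above; not the full taxonomy.
RELATIONS = {
    "acl": "clausal modifier of a noun (adnominal clause)",
    "acl:relcl": "relative clause modifier",
    "advcl": "adverbial clause modifier",
    "nsubj:pass": "passive nominal subject",
    "xcomp": "open clausal complement",
}


def split_relation(label):
    """Split 'acl:relcl' into ('acl', 'relcl'); subtype is None if absent."""
    base, _, subtype = label.partition(":")
    return base, subtype or None


def describe(label, table=RELATIONS):
    """Look up a label, falling back to its base relation if needed."""
    if label in table:
        return table[label]
    base, _ = split_relation(label)
    return table.get(base, "unknown relation")


print(split_relation("acl:relcl"))
print(describe("acl:relcl"))
print(describe("acl:other"))  # hypothetical subtype, falls back to "acl"
```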
Dependency Parsing using NLTK
Dependency Parsing can be carried out using the Natural Language Toolkit (NLTK), a collection of Python libraries and programs for statistical Natural Language Processing (NLP).
We can use NLTK to achieve dependency parsing through one of the following methods:
- Probabilistic, projective dependency parser: This parser uses knowledge of human language gleaned from hand-parsed sentences to predict the structure of new sentences. It is known to make mistakes and works with a restricted set of training data.
- Stanford parser: This is a natural language parser implemented in Java. You need the Stanford CoreNLP parser to perform dependency parsing. It supports several languages, including English, Chinese, German, and Arabic.
Here’s how you can use the parser:
from nltk.parse.stanford import StanfordDependencyParser

# Paths to the Stanford parser JARs (adjust to your local installation)
path_jar = 'path_to/stanford-parser-full-2014-08-27/stanford-parser.jar'
path_models_jar = 'path_to/stanford-parser-full-2014-08-27/stanford-parser-3.4.1-models.jar'

dep_parser = StanfordDependencyParser(
    path_to_jar=path_jar, path_to_models_jar=path_models_jar
)
result = dep_parser.raw_parse('I shot an elephant in my sleep')
dependency = next(result)  # raw_parse returns an iterator of dependency graphs
print(list(dependency.triples()))  # (head, relation, dependent) triples
The output of the above program is as follows:
[(('shot', 'VBD'), 'nsubj', ('I', 'PRP')),
 (('shot', 'VBD'), 'dobj', ('elephant', 'NN')),
 (('elephant', 'NN'), 'det', ('an', 'DT')),
 (('shot', 'VBD'), 'prep', ('in', 'IN')),
 (('in', 'IN'), 'pobj', ('sleep', 'NN')),
 (('sleep', 'NN'), 'poss', ('my', 'PRP$'))]
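Each triple in this output has the form ((head, tag), relation, (dependent, tag)). As a small sketch, the triples can be post-processed in plain Python to find the root and to group dependents under their heads. The triples are copied by hand from the output above so the snippet runs without the Stanford parser, and the helper names are hypothetical.

```python
from collections import defaultdict

# Copied from the parser output above:
# ((head, head_tag), relation, (dependent, dependent_tag))
triples = [
    (("shot", "VBD"), "nsubj", ("I", "PRP")),
    (("shot", "VBD"), "dobj", ("elephant", "NN")),
    (("elephant", "NN"), "det", ("an", "DT")),
    (("shot", "VBD"), "prep", ("in", "IN")),
    (("in", "IN"), "pobj", ("sleep", "NN")),
    (("sleep", "NN"), "poss", ("my", "PRP$")),
]


def group_by_head(triples):
    """Map each head word to its list of (relation, dependent) pairs.

    Words are used as keys, so this sketch assumes no word is repeated.
    """
    children = defaultdict(list)
    for (head, _), relation, (dependent, _) in triples:
        children[head].append((relation, dependent))
    return dict(children)


def find_root(triples):
    """The root is the only head that never appears as a dependent."""
    heads = {head for (head, _), _, _ in triples}
    dependents = {dep for _, _, (dep, _) in triples}
    (root,) = heads - dependents  # raises ValueError unless exactly one root
    return root


print(find_root(triples))
print(group_by_head(triples)["shot"])
```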
Constituency Parsing
Constituency Parsing is based on context-free grammars. Here, the parse tree breaks a sentence into sub-phrases, each belonging to a grammar category. Every linguistic unit or word in a sentence acts as a terminal node, which has a parent node and a part-of-speech tag.
For example, the phrases “a cat” and “a box under the bed” are noun phrases, whereas “write a letter” and “drive a car” are verb phrases.
Let’s consider an example sentence “I shot an elephant in my pyjamas.” Here is a graphical representation of the constituency parse tree:
In the parse tree on the left, the elephant is the one wearing the pyjamas; in the parse tree on the right, the subject shoots the elephant while wearing the pyjamas.
The entire sentence is broken into sub-phrases until only terminal words remain. VP denotes a verb phrase and NP denotes a noun phrase.
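To make the ambiguity concrete, the sketch below counts the constituency parses of the example sentence under a tiny context-free grammar. Both the grammar and the CKY-style chart count are assumptions chosen for illustration; this is not how the Stanford parser or NLTK works internally, and the names are hypothetical.

```python
from collections import defaultdict

# Toy grammar in binary form (an assumption for this sketch): parent -> left right
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("PP", "P", "NP"),
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # noun phrase modified by a prepositional phrase
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # verb phrase modified by a prepositional phrase
]
LEXICON = {
    "I": ["NP"], "shot": ["V"], "an": ["Det"], "my": ["Det"],
    "elephant": ["N"], "pyjamas": ["N"], "in": ["P"],
}


def count_parses(words, start_symbol="S"):
    """Count distinct parse trees with a CKY-style chart."""
    n = len(words)
    chart = defaultdict(int)  # (start, end, label) -> number of subtrees
    for i, word in enumerate(words):
        for label in LEXICON.get(word, []):
            chart[i, i + 1, label] += 1
    for span in range(2, n + 1):
        for start in range(n - span + 1):
            end = start + span
            for mid in range(start + 1, end):
                for parent, left, right in BINARY_RULES:
                    chart[start, end, parent] += (
                        chart[start, mid, left] * chart[mid, end, right]
                    )
    return chart[0, n, start_symbol]


print(count_parses("I shot an elephant in my pyjamas".split()))
```

Under this grammar the sentence has two parses: one attaches “in my pyjamas” to the noun phrase “an elephant”, the other attaches it to the verb phrase.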
Dependency Parsing vs Constituency Parsing
Constituency parsing can also be implemented using the Stanford parser. To produce dependencies, the Stanford parser first parses the sentence with its constituency parser and then converts the resulting constituency parse tree into a dependency tree.
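One common way to turn a constituency tree into a dependency tree is to pick a head child for every phrase and attach the heads of the other children to it. The sketch below illustrates that idea with simplified, hypothetical head rules on a hand-written tree; it produces unlabelled arcs and is not the Stanford parser's actual conversion procedure.

```python
# Simplified head rules (assumed for illustration): for each phrase label,
# the child labels to prefer as head, in order of priority.
HEAD_RULES = {"S": ["VP"], "VP": ["VP", "V"], "NP": ["NP", "N"], "PP": ["P"]}

# Constituency tree as nested tuples: (label, child, ...) or (label, word)
TREE = ("S",
        ("NP", "I"),
        ("VP", ("V", "shot"),
               ("NP", ("Det", "an"), ("N", "elephant"))))


def to_dependencies(tree, arcs):
    """Return the head word of `tree`, appending (head, dependent) arcs."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return children[0]  # a node directly above a word: the word is the head
    child_heads = [to_dependencies(child, arcs) for child in children]
    child_labels = [child[0] for child in children]
    head_index = 0  # default to the first child if no rule matches
    for preferred in HEAD_RULES.get(label, []):
        if preferred in child_labels:
            head_index = child_labels.index(preferred)
            break
    for i, word in enumerate(child_heads):
        if i != head_index:
            arcs.append((child_heads[head_index], word))
    return child_heads[head_index]


arcs = []
root = to_dependencies(TREE, arcs)
print(root)
print(arcs)
```

The head word returned for the whole sentence becomes the root of the dependency tree, and every other word ends up with exactly one head.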
If your goal is to break a sentence into sub-phrases, you should implement constituency parsing. However, if you want to explore the dependencies between the words in a sentence, you should use dependency parsing.