Dependency Parsing in NLP [Explained with Examples]

Natural Language Processing is an interdisciplinary field that draws on the fundamentals of computational linguistics and Artificial Intelligence to understand how computers can process and interact with human language.

NLP requires an in-depth understanding of various terminologies and concepts to apply them tangibly to real-world scenarios. Some of these basic concepts include Part-of-Speech (POS) Tagging, Statistical Language Modeling, Syntactic, Semantic and Sentiment Analysis, Normalization, Tokenization, Dependency Parsing, and Constituency Parsing, among others.

In this article, we will look at the fundamentals of Dependency Parsing to gain perspective on how it is implemented in NLP. 

Dependency Parsing

Dependency Parsing (DP) refers to examining the dependencies between the words of a sentence to analyze its grammatical structure. Based on this, a sentence is broken into several components. The mechanism is based on the concept that there is a direct link between every linguistic unit of a sentence. These links are termed dependencies. 

Let’s take, for example, the sentence “I prefer the morning flight through Denver.”

The diagram below explains the dependency structure of the sentence: 


The relations between the linguistic units, or words, of the sentence are indicated using directed arcs in a typed dependency structure. As labelled in the diagram, the root of the tree, “prefer”, forms the head of the sentence.

The relationship between any two words is marked by a dependency tag. For instance, the word “Denver” modifies the meaning of the noun “flight.” You can therefore see a dependency arc flight -> Denver, where “flight” is the head and “Denver” is the child or dependent. The arc is labelled nmod, which stands for a nominal modifier.
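Such head–dependent arcs can be represented as simple (head, relation, dependent) triples. Below is a small sketch in plain Python; the triples are hand-written to mirror the dependency structure of the example sentence, not produced by a parser:

```python
# Hand-written (head, relation, dependent) triples mirroring the typed
# dependency structure of "I prefer the morning flight through Denver"
triples = [
    ('prefer', 'nsubj', 'I'),
    ('prefer', 'obj', 'flight'),
    ('flight', 'det', 'the'),
    ('flight', 'compound', 'morning'),
    ('flight', 'nmod', 'Denver'),
    ('Denver', 'case', 'through'),
]

# Every word except the root appears exactly once as a dependent,
# so the arcs form a tree; the root is the one head that is never a dependent.
heads = {head for head, _, _ in triples}
dependents = {dep for _, _, dep in triples}
root = (heads - dependents).pop()
print(root)  # prefer
```

This also makes the head/dependent asymmetry concrete: “flight” appears as the head of the nmod arc to “Denver”, while itself being a dependent of the root “prefer”.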

This head–dependent relationship holds between every pair of linked words: one word acts as the head and the other as its dependent. Currently, the Universal Dependencies v2 taxonomy defines 37 universal syntactic relations; the table below lists them along with some commonly used subtypes:

Dependency Tag Description
acl clausal modifier of a noun (adnominal clause)
acl:relcl relative clause modifier
advcl adverbial clause modifier
advmod adverbial modifier
advmod:emph emphasizing word, intensifier
advmod:lmod locative adverbial modifier
amod adjectival modifier
appos appositional modifier
aux auxiliary
aux:pass passive auxiliary
case case-marking
cc coordinating conjunction
cc:preconj preconjunct
ccomp clausal complement
clf classifier
compound compound
compound:lvc light verb construction
compound:prt phrasal verb particle
compound:redup reduplicated compounds
compound:svc serial verb compounds
conj conjunct
cop copula
csubj clausal subject
csubj:pass clausal passive subject
dep unspecified dependency
det determiner
det:numgov pronominal quantifier governing the case of the noun
det:nummod pronominal quantifier agreeing in case with the noun
det:poss possessive determiner
discourse discourse element
dislocated dislocated elements
expl expletive
expl:impers impersonal expletive
expl:pass reflexive pronoun used in reflexive passive
expl:pv reflexive clitic with an inherently reflexive verb
fixed fixed multiword expression
flat flat multiword expression
flat:foreign foreign words
flat:name names
goeswith goes with
iobj indirect object
list list
mark marker
nmod nominal modifier
nmod:poss possessive nominal modifier
nmod:tmod temporal modifier
nsubj nominal subject
nsubj:pass passive nominal subject
nummod numeric modifier
nummod:gov numeric modifier governing the case of the noun
obj object
obl oblique nominal
obl:agent agent modifier
obl:arg oblique argument
obl:lmod locative modifier
obl:tmod temporal modifier
orphan orphan
parataxis parataxis
punct punctuation
reparandum overridden disfluency
root root
vocative vocative
xcomp open clausal complement

Dependency Parsing using NLTK

Dependency Parsing can be carried out using the Natural Language Toolkit (NLTK) package, a collection of libraries and programs for the statistical Natural Language Processing (NLP) of human language.

We can use NLTK to achieve dependency parsing through one of the following methods: 

  1. Probabilistic, projective dependency parser: These parsers use knowledge of human language gleaned from hand-parsed sentences to predict the structure of new sentences. They can make mistakes and are limited by the coverage of their training data. 
  2. Stanford parser: This is a natural language parser implemented in Java. You need the Stanford CoreNLP parser to perform dependency parsing. It supports several languages, including English, Chinese, German, and Arabic. 
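For the first approach, NLTK also ships a grammar-driven projective dependency parser. Here is a minimal sketch using a toy, hand-written dependency grammar that covers a single sentence; a probabilistic parser would instead induce such head -> dependent rules from a treebank of hand-parsed sentences:

```python
import nltk

# A toy dependency grammar, hand-written for illustration only:
# each rule lists a head word and the dependents it may govern
grammar = nltk.DependencyGrammar.fromstring("""
'shot' -> 'I' | 'elephant' | 'in'
'elephant' -> 'an'
'in' -> 'pyjamas'
'pyjamas' -> 'my'
""")

# Parse the sentence and print every dependency tree the grammar licenses
parser = nltk.ProjectiveDependencyParser(grammar)
for tree in parser.parse('I shot an elephant in my pyjamas'.split()):
    print(tree)
```

Each printed tree is rooted at the head word “shot”, with its dependents nested beneath it.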

Here’s how you can use the Stanford parser: 

from nltk.parse.stanford import StanfordDependencyParser

path_jar = 'path_to/stanford-parser-full-2014-08-27/stanford-parser.jar'
path_models_jar = 'path_to/stanford-parser-full-2014-08-27/stanford-parser-3.4.1-models.jar'

dep_parser = StanfordDependencyParser(
    path_to_jar=path_jar, path_to_models_jar=path_models_jar
)

result = dep_parser.raw_parse('I shot an elephant in my sleep')
dependency = next(result)
list(dependency.triples())

The output of the above program is as follows: 

[(('shot', 'VBD'), 'nsubj', ('I', 'PRP')),
 (('shot', 'VBD'), 'dobj', ('elephant', 'NN')),
 (('elephant', 'NN'), 'det', ('an', 'DT')),
 (('shot', 'VBD'), 'prep', ('in', 'IN')),
 (('in', 'IN'), 'pobj', ('sleep', 'NN')),
 (('sleep', 'NN'), 'poss', ('my', 'PRP$'))]
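These triples can then be processed like any Python data structure. As a quick sketch using the triples shown above, you can group each dependent under its head word:

```python
from collections import defaultdict

# The (head, relation, dependent) triples from the parser output above
triples = [
    (('shot', 'VBD'), 'nsubj', ('I', 'PRP')),
    (('shot', 'VBD'), 'dobj', ('elephant', 'NN')),
    (('elephant', 'NN'), 'det', ('an', 'DT')),
    (('shot', 'VBD'), 'prep', ('in', 'IN')),
    (('in', 'IN'), 'pobj', ('sleep', 'NN')),
    (('sleep', 'NN'), 'poss', ('my', 'PRP$')),
]

# Group each dependent (with its relation) under its head word
deps = defaultdict(list)
for (head, _), rel, (dep, _) in triples:
    deps[head].append((rel, dep))

print(deps['shot'])  # [('nsubj', 'I'), ('dobj', 'elephant'), ('prep', 'in')]
```

From this index you can read off, for instance, that the verb “shot” governs the subject “I”, the object “elephant”, and the preposition “in”.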

Constituency Parsing

Constituency Parsing is based on context-free grammars. Here, the parse tree breaks a sentence into sub-phrases, each belonging to a grammatical category. Every linguistic unit or word in the sentence acts as a terminal node, which has a parent node and a part-of-speech tag.

For example, the phrases “a cat” and “a box under the bed” are noun phrases, whereas “write a letter” and “drive a car” are verb phrases.

Let’s consider the example sentence “I shot an elephant in my pyjamas.” Here is a graphical representation of the constituency parse tree:


The sentence is ambiguous: the parse tree on the left attaches the prepositional phrase to the elephant (an elephant wearing pyjamas), while the parse tree on the right attaches it to the verb phrase (the subject shooting the elephant while in his pyjamas). 

The entire sentence is broken into sub-phrases until only terminal words remain. VP denotes a verb phrase and NP denotes a noun phrase. 
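This PP-attachment ambiguity can be reproduced with NLTK using a toy context-free grammar, hand-written here to cover just this one sentence:

```python
import nltk

# A small, illustrative context-free grammar that licenses both
# attachments of the prepositional phrase "in my pyjamas"
grammar = nltk.CFG.fromstring("""
S -> NP VP
PP -> P NP
NP -> Det N | Det N PP | 'I'
VP -> V NP | VP PP
Det -> 'an' | 'my'
N -> 'elephant' | 'pyjamas'
V -> 'shot'
P -> 'in'
""")

# A chart parser enumerates every parse tree the grammar allows
parser = nltk.ChartParser(grammar)
trees = list(parser.parse('I shot an elephant in my pyjamas'.split()))
for tree in trees:
    print(tree)
```

The parser returns two trees: one where the PP is part of the noun phrase “an elephant in my pyjamas”, and one where it modifies the verb phrase “shot an elephant”.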

Dependency Parsing vs Constituency Parsing

Constituency parsing can also be implemented using the Stanford parser, which first parses a given sentence into a constituency parse tree and can then convert that tree into a dependency tree. 

If your goal is to break a sentence into sub-phrases, you should implement constituency parsing. However, if you want to explore the dependencies between the words in a sentence, you should use dependency parsing. 


Conclusion

If you found this article helpful, you should check out upGrad’s 6-month PG Certification in Machine Learning and NLP that offers personalised mentorship from industry experts of Flipkart, Gramener, and Zee5. 

The program is designed for engineers, software/IT, data, and other professionals looking to get a foothold in Data Science and Machine Learning. This Post Graduate Certification from IIIT Bangalore boasts a 58% average salary hike and is all you need to land advanced positions such as Data Analyst, Data Scientist, ML Engineer, and NLP Engineer at top companies. Block your seat today at just Rs. 3,034 per month!
