Morphology is the study of the internal structure of words and how they are formed and modified. Morphology focuses on how the components within a word (stems, root words, prefixes, suffixes, etc.) are arranged or modified to create different meanings. The Natural Language API uses morphological analysis to infer grammatical information about the words provided to it.
Note that morphology is distinct from syntax (though they may influence each other). For example, the future tense is expressed in English by adding the word "will" before a verb as in the sentence "I will get my umbrella." However, morphologically speaking, neither "will" nor "get" by themselves indicate a future tense, as the words themselves have not changed; instead, the future tense is denoted through syntax rules. Other languages, however, may (and many do) modify those words directly to create future tense verbs.
Within the Natural Language API, grammatical context can affect the morphological analysis of a word or token, but only when the morphological feature itself is confined to a single token. (Proper nouns, however, can be recognized across word boundaries.)
Morphology information is returned in the syntactic analysis response's partOfSpeech field. The syntactic relationships between words are also returned in the syntactic analysis response, as described in the discussion of dependency trees below.
Parts of Speech
For a syntactic analysis request, part-of-speech and morphological information are returned within each token's partOfSpeech field. This field contains a set of sub-fields with part-of-speech (POS) information as well as more explicit morphological information. These sub-fields are listed below.
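Because each token's partOfSpeech object maps sub-field names to enum values, it can be handled as a plain mapping. The sketch below assumes the REST JSON shape of a syntax analysis response and assumes that unset attributes carry a value ending in "_UNKNOWN" (for example, NUMBER_UNKNOWN). The helper names morphology_summary and tokens_with are hypothetical, and the sample tags were assigned by hand for illustration; they are not actual API output.

```python
def morphology_summary(token):
    """Return the partOfSpeech sub-fields that carry a known value.

    Assumes unset attributes end in "_UNKNOWN" (e.g. "NUMBER_UNKNOWN").
    """
    pos = token.get("partOfSpeech", {})
    return {field: value for field, value in pos.items()
            if not value.endswith("_UNKNOWN")}


def tokens_with(tokens, field, value):
    """Return (text, beginOffset) for tokens whose partOfSpeech[field] == value."""
    return [(t["text"]["content"], t["text"]["beginOffset"])
            for t in tokens if t["partOfSpeech"].get(field) == value]


# Hand-tagged tokens for "They train near the train station." (illustrative only).
sample = [("They", 0, "PRON", "PLURAL"), ("train", 5, "VERB", "NUMBER_UNKNOWN"),
          ("near", 11, "ADP", "NUMBER_UNKNOWN"), ("the", 16, "DET", "NUMBER_UNKNOWN"),
          ("train", 20, "NOUN", "SINGULAR"), ("station", 26, "NOUN", "SINGULAR"),
          (".", 33, "PUNCT", "NUMBER_UNKNOWN")]
tokens = [{"text": {"content": w, "beginOffset": off},
           "partOfSpeech": {"tag": tag, "number": num}}
          for w, off, tag, num in sample]

print(tokens_with(tokens, "tag", "VERB"))  # [('train', 5)]
print(tokens_with(tokens, "tag", "NOUN"))  # [('train', 20), ('station', 26)]
print(morphology_summary(tokens[4]))       # {'tag': 'NOUN', 'number': 'SINGULAR'}
print(morphology_summary(tokens[1]))       # {'tag': 'VERB'}
```

Filtering on the coarse tag separates the two readings of "train", while the summary drops attributes that do not apply to a given token.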
Morphology varies greatly between languages. Languages such as Spanish, which frequently change word endings to alter meaning, exhibit more morphological features; languages such as English, which rely more on word order and syntax, exhibit fewer. For example, English nouns have lost most distinct morphological cases: most nouns do not change form to indicate case (the exceptions are the nominative, genitive, and accusative forms of personal pronouns). As a result, morphological analysis depends heavily on the source language and on which morphological features that language supports.
tag denotes the part of speech using a coarse-grained POS tag (NOUN, VERB, etc.) and provides top-level surface syntax information. POS tags are helpful if you want to create patterns or reduce ambiguity for subsequent language analysis (for example, “train” tagged as a NOUN versus a VERB).
number denotes a word's grammatical number, which indicates its count distinction. In English, for example, the suffix "s" usually distinguishes plural nouns from singular ones. Some languages, such as Arabic, also have a dual number. This field may contain the following values:
SINGULAR denotes one quantity.
PLURAL denotes more than one quantity.
DUAL denotes precisely two quantities.
person denotes a word's grammatical person, which indicates a speaker's relationship to an event. In English, person is most often used on pronouns to distinguish between the speaker (first person), those spoken to (second person), and others (third person). This field may contain the following values:
FIRST denotes the first person (the speaker).
SECOND denotes the second person (the one spoken to).
THIRD denotes an "other" person outside of the conversation.
REFLEXIVE_PERSON denotes use of a reflexive pronoun.
gender denotes a noun's grammatical gender. This field may contain the following values:
case denotes a word's grammatical case, which indicates its relationship to its containing sentence. Note that English exhibits few explicit morphological cases, as the information normally conveyed through cases is typically indicated by word order. This field may contain the following values:
ACCUSATIVE case indicates the direct object of a transitive verb.
ADVERBIAL case indicates an adverbial form of an adjective. Note that English uses separate words to distinguish adverbs ("well") and adjectives ("good") rather than an explicit adverbial case.
COMPLEMENTIVE case (Chinese) indicates a word necessary to complete the meaning of a potential, descriptive, or resultative expression using a conjunctive particle.
DATIVE case indicates an indirect object, such as the recipient of something being given. In English, this role is instead expressed with the preposition "to", as in the phrase "He gave the ball to Bobby."
GENITIVE case indicates possession. Note that English denotes possession with the "'s" clitic rather than a strict genitive case.
INSTRUMENTAL case indicates that a noun is the instrument with which a subject completes an action. In English, this role is instead expressed with the preposition "with", as in the phrase "He hit him with a baseball bat."
LOCATIVE case indicates a location. In English, this role is instead expressed with prepositions such as "in" and "on", as in the phrase "The cow is in the barn."
NOMINATIVE case indicates the subject of a verb. In English, the subject of a verb is instead indicated through word order.
OBLIQUE case indicates a word's use as an object of either a verb or a preposition.
PARTITIVE case indicates a word's "partialness" or lack of specific identity.
PREPOSITIONAL case indicates the object of a preposition.
REFLEXIVE_CASE indicates that the object of a verb is identical to its subject. Most languages do not use a reflexive case; instead, this usage is indicated through special reflexive pronouns (such as "himself" and "myself").
RELATIVE_CASE (Chinese) indicates the complementizer of a relative clause connecting a noun with a verb or adjective. Examples: 工作 [的] 地方 (work  place :: "place [where I] work"). 便宜 的 餐馆 (inexpensive  restaurants :: restaurants [that are] inexpensive).
VOCATIVE case indicates a noun being used to address someone or something directly.
tense denotes a verb's grammatical tense, which indicates the verb's reference to a position in time. Note that tense is distinct from aspect, which also deals with a verb's relationship to time but focuses on the characteristics of that time flow rather than its position. In many languages, some of the values below, such as PLUPERFECT, more accurately refer to specific combinations of tense and aspect. This field may contain the following values:
CONDITIONAL_TENSE is an alternate term for what is more commonly called the "conditional mood" (see CONDITIONAL_MOOD under mood below).
FUTURE denotes an action taking place in the future. Note that in English, the future tense is most often expressed by adding the word "will" to a verb phrase.
PAST denotes an action taking place in the past.
PRESENT denotes an action taking place in the present.
IMPERFECT denotes an action taking place in the past that was not completed at that tense's frame of reference. Note that in English, the imperfect tense is most often expressed by combining the past tense of "to be" with the "-ing" form of a verb, as in "I was walking."
PLUPERFECT denotes an action that took place in the past and was already completed at that tense's frame of reference. For example, in "I had walked," the walking happened in the past and was already complete at the past point of reference.
aspect denotes a verb's grammatical aspect, its expression of time flow. Unlike tense, which focuses on a verb's position within time, aspect focuses on the characteristics of that time flow where it occurs. This field may contain the following values:
PERFECTIVE aspect denotes an event that is "completed," either because it has completely happened in the past or because it will completely happen in the future.
IMPERFECTIVE aspect denotes an event that is incomplete, either because it is continuous or because it is repeated.
PROGRESSIVE aspect denotes an event that is continuous. A progressive aspect is generally treated as a special case of the more general imperfective aspect (which also covers repetition).
mood denotes a verb's grammatical mood, which indicates an attitude about the underlying action. This field may contain the following values:
CONDITIONAL_MOOD indicates an action that is contingent. Note that English verb forms are not conditional; instead, conditional behavior is expressed by combining the word "would" with the verb's infinitive.
IMPERATIVE indicates a command or request in the second person.
INDICATIVE indicates a statement of fact, more generally known as a "realis" mood.
INTERROGATIVE indicates a question.
JUSSIVE indicates a command or request in either the first or third person. English does not have a jussive mood, though exhortations that begin with a real or implied "Let us" convey a jussive meaning.
SUBJUNCTIVE indicates a quality of uncertainty related to an action, also known as an "irrealis" mood (contrasted with the "realis" indicative mood). English does not have a specific subjunctive mood; instead, words such as "want", "wish", and "hope" convey the import of the subjunctive mood.
voice denotes a verb's grammatical voice, the relationship between an action and its subject and/or object. This field may contain the following values:
ACTIVE voice indicates an action whose subject is performing the action.
CAUSATIVE voice indicates that the subject causes someone or something else to perform the action. In English, no direct causative voice exists; instead, causation is indicated through use of the verb "make", as in "Mom made me go to school."
PASSIVE voice indicates an action whose effect is being performed on the subject. In many cases, the passive "agent" is unspoken or unknown.
reciprocity denotes a word's (typically a pronoun's) reciprocity, indicating whether the pronoun refers to a noun phrase elsewhere within the sentence. This field may contain the following values:
RECIPROCAL indicates the pronoun is reciprocal.
NON_RECIPROCAL indicates the pronoun is not reciprocal.
proper denotes whether a noun is part of a proper name. Note that many proper names consist of several words; if such a phrase is detected as a proper name, each of its tokens is marked as proper as well. (For example, both "Wrigley" and "Field" in the proper name "Wrigley Field" will have their proper attribute set to PROPER.) This field may contain the following values:
PROPER denotes that the token is part of a proper name.
NOT_PROPER denotes that the token is not part of a proper name.
form denotes additional morphological forms that don't fit neatly into the common categories described above (such as tense and person). Most of these forms are specific to particular languages. This field may contain the following values:
ADNOMIAL (Korean/Japanese) indicates a word ending (Korean) or verb (Japanese) that modifies a noun phrase. Examples: 밥을 먹는 사람 [someone who eats rice] and 書く人 [someone who writes].
AUXILIARY (Korean) indicates a word ending that connects two adjacent main and auxiliary predicates: 밥을 먹게 하다 [make (someone) eat]
COMPLEMENTIZER (Korean) indicates a word ending that connects two or more different clauses: 밥을 먹고 물을 마신다 [(I) eat rice and drink water]
FINAL_ENDING (Korean/Japanese) indicates a word ending that comes at the end of a clause or sentence and finalizes it. Examples: 밥을 먹는다 [(I) eat rice] and 手紙を書く [write a letter].
GERUND (Korean/Japanese) indicates a word ending that nominalizes verbs or adjectives, as in (Korean) 밥 먹기 [eating rice], or that connects verbs with various auxiliary verbs, as in (Japanese) 書きたい [want to write].
REALIS (Japanese) indicates conditional and subjunctive forms with the conjunctive particle “ば”: 書けば [if (I) write].
IRREALIS (Japanese) indicates verbs connected to negative, passive, or causative auxiliary verbs: 書かない [do not write], 書かれる [to be written], 書かせる [make (someone) write].
ORDER (Japanese) indicates a command verb, similar to the imperative: 書け! [write!]
SPECIFIC (Japanese) indicates special forms that cannot be covered by the six Japanese categories above. The most common use of this form is to derive a noun from an adjective by adding a suffix: かわいさ [cuteness]
SHORT (Russian) indicates a short-form adjective or participle.
LONG (Russian) indicates a long-form adjective or participle, as distinct from the SHORT form above.
Note that the Natural Language API provides morphological information on a per-token basis (not per phrase). Morphological constructs that cross word boundaries may not be supported.
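Because morphology is reported per token, a multiword proper name such as "Wrigley Field" arrives as several tokens that are each marked PROPER. The following is a minimal sketch for stitching such names back together. It assumes that any run of adjacent PROPER tokens forms one name and that the original words were separated by single spaces; two distinct names written back to back would be merged, and names containing punctuation may be rejoined incorrectly. The helper name proper_names and the hand-written sample tokens are hypothetical.

```python
def proper_names(tokens):
    """Join runs of adjacent PROPER tokens into space-separated names."""
    names, current = [], []
    for t in tokens:
        if t["partOfSpeech"].get("proper") == "PROPER":
            current.append(t["text"]["content"])
        elif current:
            # A non-proper token ends the current run.
            names.append(" ".join(current))
            current = []
    if current:
        names.append(" ".join(current))
    return names


# Hand-written tokens for "We visited Wrigley Field in Chicago." (illustrative only).
sample = [("We", False), ("visited", False), ("Wrigley", True),
          ("Field", True), ("in", False), ("Chicago", True), (".", False)]
tokens = [{"text": {"content": w},
           "partOfSpeech": {"proper": "PROPER" if p else "NOT_PROPER"}}
          for w, p in sample]
print(proper_names(tokens))  # ['Wrigley Field', 'Chicago']
```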
For each sentence within the text provided to the Natural Language API for syntactic analysis, the API constructs a dependency tree, describing the syntactic structure of that sentence. Generally, when analyzing this dependency graph, you will want to iterate over each sentence's constituent tokens.
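The example below uses a single sentence from John F. Kennedy's inaugural address: "Ask not what your country can do for you, ask what you can do for your country." A diagram of its dependency tree appears below.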
Note that the dependency tree includes a ROOT element, which corresponds to the main verb in the sentence.
Each token's label field explains the syntactic relationship of that token to the token referenced by its headTokenIndex.
In the above example, headTokenIndex = 0 for the first clause's "do" and for the second clause's "ask", indicating that these words modify the first clause's ROOT word ("Ask"). The tokens' label values specify the type of relationship. For example, "country" has an NSUBJ (noun subject) relationship to "do" in the first clause, while "you" has that same relationship to "do" in the second clause.
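The head and label values can be turned into an explicit tree by grouping each token under its head. The sketch below assumes the REST JSON shape, in which each token carries headTokenIndex and label inside a dependencyEdge object. The helper names children_by_head and render are hypothetical, and the edges for "They train daily." were built by hand for illustration; they are not actual API output.

```python
from collections import defaultdict


def children_by_head(tokens):
    """Map each head token index to the indices of the tokens that depend on it."""
    children = defaultdict(list)
    for i, t in enumerate(tokens):
        head = t["dependencyEdge"]["headTokenIndex"]
        if head != i:  # the ROOT token points at itself, so it is nobody's child
            children[head].append(i)
    return children


def render(tokens, index, children, depth=0):
    """Return indented 'text (LABEL)' lines for the subtree rooted at index."""
    text = tokens[index]["text"]["content"]
    label = tokens[index]["dependencyEdge"]["label"]
    lines = ["  " * depth + f"{text} ({label})"]
    for child in children.get(index, []):
        lines.extend(render(tokens, child, children, depth + 1))
    return lines


# Hand-built edges for "They train daily." (illustrative, not API output).
sample = [("They", 1, "NSUBJ"), ("train", 1, "ROOT"),
          ("daily", 1, "ADVMOD"), (".", 1, "P")]
tokens = [{"text": {"content": w},
           "dependencyEdge": {"headTokenIndex": h, "label": lab}}
          for w, h, lab in sample]

print("\n".join(render(tokens, 1, children_by_head(tokens))))
# train (ROOT)
#   They (NSUBJ)
#   daily (ADVMOD)
#   . (P)
```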
Note that although parse trees do not cross sentence boundaries, headTokenIndex is an index into the token list of the entire document, not just the current sentence. For the ROOT word "Ask", the headTokenIndex is its own index.
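Since a ROOT token's headTokenIndex points at itself and all indices are document-wide, the root of every sentence can be found in one pass over the token list. This is a minimal sketch that assumes each sentence has exactly one such self-referencing token. The helper name sentence_roots and the hand-built two-sentence sample are hypothetical.

```python
def sentence_roots(tokens):
    """Return (index, text) for tokens whose headTokenIndex is their own index."""
    return [(i, t["text"]["content"]) for i, t in enumerate(tokens)
            if t["dependencyEdge"]["headTokenIndex"] == i]


# Hand-built edges for the two sentences "Stop. They train daily."
# Head indices are document-wide, so the second sentence's tokens point at 3.
sample = [("Stop", 0), (".", 0), ("They", 3), ("train", 3), ("daily", 3), (".", 3)]
tokens = [{"text": {"content": w}, "dependencyEdge": {"headTokenIndex": h}}
          for w, h in sample]
print(sentence_roots(tokens))  # [(0, 'Stop'), (3, 'train')]
```

The second sentence's root is reported as index 3 rather than 1, because head indices count from the start of the document rather than from the start of the sentence.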
Sentences and tokens within the Natural Language API are indexed using zero-based offset values within the text as a whole. The following pseudo-code provides a common pattern to use when performing iterative operations on the syntactic analysis response:
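```python
def tokens_by_sentence(sentences, tokens):
    """Yield (sentence, its tokens); offsets assumed in code points (e.g. UTF32)."""
    index = 0
    for sentence in sentences:
        content = sentence['text']['content']
        sentence_begin = sentence['text']['beginOffset']
        sentence_end = sentence_begin + len(content) - 1
        sentence_tokens = []
        while (index < len(tokens)
               and tokens[index]['text']['beginOffset'] <= sentence_end):
            # This token is in this sentence
            sentence_tokens.append(tokens[index])
            index += 1
        yield sentence, sentence_tokens
```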
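If individual tokens need to be mapped to their sentences rather than processed in one sequential pass, a binary search over the sentences' start offsets is an alternative to the scanning pattern above. The sketch below substitutes a different technique using Python's standard bisect module, which is not part of the API. It assumes sentence and token offsets are measured in the same units and that no token begins before the first sentence. The helper name sentence_index_for_tokens and the sample offsets are hypothetical.

```python
import bisect


def sentence_index_for_tokens(sentences, tokens):
    """Return, for each token, the index of the sentence that contains it.

    A token belongs to the last sentence whose beginOffset is <= the token's
    beginOffset; bisect_right finds that sentence in O(log n) per token.
    """
    starts = [s["text"]["beginOffset"] for s in sentences]
    return [bisect.bisect_right(starts, t["text"]["beginOffset"]) - 1
            for t in tokens]


# "Stop. They train daily." with offsets counted in characters.
sentences = [{"text": {"content": "Stop.", "beginOffset": 0}},
             {"text": {"content": "They train daily.", "beginOffset": 6}}]
tokens = [{"text": {"content": w, "beginOffset": off}}
          for w, off in [("Stop", 0), (".", 4), ("They", 6),
                         ("train", 11), ("daily", 17), (".", 22)]]
print(sentence_index_for_tokens(sentences, tokens))  # [0, 0, 1, 1, 1, 1]
```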
For more information about dependency trees, consult the Universal Dependency Treebank project. In addition, Universal Dependency Annotation for Multilingual Processing contains background information on the methodology used to interpret such a dependency tree.