Natural Language Processing (5) Zhao Hai 赵海 Department of Computer Science and Engineering Shanghai Jiao Tong University 2010-2011 zhaohai@cs.sjtu.edu.cn webpage: http://bcmi.sjtu.edu.cn/~zhaohai/lessons/nlp2011/index.html 1
Outline q Syntactic Processing Ø Basic English Syntax Ø Grammar and Parsing (1) Ø Grammar and Parsing (2) 2
Syntactic Processing (1) Basic English Syntax (1) Word Forms q The study of morphology concerns the construction of words from more basic components corresponding roughly to meaning units. q As noted previously, there are two basic ways that new words are formed, traditionally classified as inflectional forms and derivational forms. 3
Syntactic Processing (2) Basic English Syntax (2) Word Classes (Part-of-Speech) q Open Class Words: new words in these classes are regularly introduced into the language as it evolves, e.g. nouns, verbs, adjectives, and adverbs. q Closed Class Words: these classes are fixed in the sense that new words are rarely introduced into them, such as articles, pronouns, prepositions, particles, quantifiers, conjunctions, and so on. 4
Syntactic Processing (3) Basic English Syntax (3) Head and Complement of Phrases A word in any of the four open classes may be used to form the basis of a phrase. This word is called the head of the phrase and indicates the type of thing, activity, or quality that the phrase describes. The phrase or set of phrases needed to complete the meaning of such a head is called the complement of the head.
Noun Phrases: The president of the company; His desire to succeed; Several challenges from the opposing team
Verb Phrases: looked up the chimney; believed that the world was flat; ate the pizza
Adjective Phrases: easy to assemble; happy that he'd won the prize; angry as a hippo
Adverbial Phrases: rapidly like a bat; intermittently throughout the day; inside the house 5
Syntactic Processing (4) Basic English Syntax (4) Elements of Simple Noun Phrases (1) q count nouns – nouns that describe specific objects or sets of objects. q mass nouns – nouns that describe composites or substances. q specifiers – they indicate how many such objects are being described, as well as how the objects being described relate to the speaker and hearer. q qualifiers – they describe the general class of objects identified by the head. 6
Syntactic Processing (5) Basic English Syntax (5) Elements of Simple Noun Phrases (2) q adjectives – words that attribute qualities to objects yet do not refer to the qualities themselves (for example, angry is an adjective that attributes the quality of anger to something). q noun modifiers – mass or count nouns used to modify another noun. 7
Syntactic Processing (6) Basic English Syntax (6) Specifiers q ordinals – such as the words first and second. q cardinals – such as the words one and two. q determiners – they can be divided into the following general classes: articles (the, a, and an); demonstratives (this, that, these, and those); possessives (John’s and the fat man’s; her, my, and whose); wh-determiners (which and what); quantifying determiners (some, every, most, no, any, both, and half). 8
Syntactic Processing (7) Basic English Syntax (7) Verb Phrases and Simple Sentences (1)
Form                Examples                        Example Uses
base                hit, cry, go, be                Hit the ball! I want to go.
simple present      hit, cries, go, am              The dog cries every day. I am thirsty.
simple past         hit, cried, went, was           I was thirsty. I went to the store.
present participle  hitting, crying, going, being   I'm going to the store. Being the last in line aggravates me.
past participle     hit, cried, gone, been          I've been there before. The cake was gone. 9
Syntactic Processing (8) Basic English Syntax (8) Verb Phrases and Simple Sentences (2) A sentence is used to assert, query, or command. The way a sentence is used is called its mood.
Mood                       Example
declarative (or assertion) The cat is sleeping.
yes/no question            Is the cat sleeping?
wh-question                What is sleeping? or Which cat is sleeping?
imperative (or command)    Shoot the cat! 10
Syntactic Processing (9) Basic English Syntax (9) Verb Subcategories Verbs can be divided into several different classes: q auxiliary verbs – such as be, do, and have; q modal verbs – such as will, can, and could; q main verbs – such as eat, ran, and believe. The auxiliary and modal verbs usually take a verb phrase as a complement, which produces a sequence of verbs, each the head of its own verb phrase. These sequences are used to form sentences with different tenses. 11
Syntactic Processing (10) Basic English Syntax (10) Basic Tenses
Tense                       The Verb Sequence                             Example
simple present                                                            He walks to the store.
simple past                                                               He walked to the store.
simple future               will + infinitive                             He will walk to the store.
present perfect             have in present + past participle             He has walked to the store.
future perfect              will + have in infinitive + past participle   I will have walked to the store.
past perfect (or pluperfect) have in past + past participle               I had walked to the store. 12
Syntactic Processing (11) Basic English Syntax (11) Transitivity q Depending on the verb, a wide variety of complement structures are allowed. For example, certain verbs may stand alone with no complement. These are called intransitive verbs, such as laugh (e.g., Jack laughed) and run (e.g., He will have been running). q Another common complement form requires a noun phrase to follow the verb. These are called transitive verbs, such as find (e.g., Jack found a key). Notice that find cannot be intransitive, whereas laugh cannot be transitive (e.g., *Jack laughed a key is not a reasonable sentence). 13
Syntactic Processing (12) Basic English Syntax (12) Passives q Transitive verbs allow another form of verb group called the passive form, which is constructed using a be auxiliary followed by the past participle, as in The clue will be found by me. q Some verbs allow two noun phrases to follow them in a sentence. In such sentences the second NP corresponds to the object NP and is called the direct object. The other NP is called the indirect object. Generally, such sentences have an equivalent sentence where the indirect object appears with a preposition, as in Jack gave a book to Sue or Jack found a key for me. 14
Syntactic Processing (13) Basic English Syntax (13) Particles q Some verb forms are constructed from a verb and an additional word called a particle. q Particles generally overlap with the class of prepositions. Some examples are up, out, over, and in. With verbs such as look, take, or put, you can construct many different verbs by combining the verb with a particle (e. g. , look up, look out, look over, and so on). q In some sentences the difference between a particle and a preposition results in two different readings for the same sentence. 15
Syntactic Processing (14) Basic English Syntax (14) Clausal Complements (1) q Many verbs allow clauses as complements. Clauses share most of the same properties of sentences and may have a subject, indicate tense, and occur in passivized forms. q One common clause form consists of a sentence form preceded by the complementizer that, as in that Jack ate the pizza. This clause will be identified by the expression S[that], indicating a special subclass of S structures. The passive is possible, as in Sam knows that the pizza was eaten by Jack. 16
Syntactic Processing (15) Basic English Syntax (15) Clausal Complements (2) q Another clause type involves the infinitive form of the verb. q The VP[inf] clause is simply a VP starting in the infinitive form, as in the complement of the verb wish in Jack wishes to eat the pizza. q An infinitive sentence S[inf] form is also possible where the subject is indicated by a for phrase, as in Jack wishes for Sam to eat the pizza. 17
Syntactic Processing (16) Basic English Syntax (16) Clausal Complements (3) q Another important class of clauses are sentences with complementizers that are wh-words, such as who, what, where, why, whether, and how many. q These question clauses, S[wh], can be used as a complement of verbs such as know, as in Sam knows whether we went to the party and The police know who committed the crime. 18
Syntactic Processing (17) Basic English Syntax (17) Prepositional Phrase Complements q Many verbs require complements that involve a specific prepositional phrase (PP). For example, the verb give takes a complement consisting of an NP and a PP with the preposition to, as in Jack gave the book to the library. Its complement structure is NP + PP[to]. q Other structures are listed below: NP + PP[about]; NP + PP[on]; NP + Location; PP[with] + PP[about] etc. 19
Syntactic Processing (18) Basic English Syntax (18) More Complex Forms of Noun Phrases (1) q Noun phrases can be built from sentences, verb phrases, prepositional phrases, etc., such as:
PRON + PP[of]: their love of France
PRON + PP[on]: his reliance on handouts
ART + N + PP[with]: a familiarity with computers
PRON + VP[inf]: his desire to release the guinea pig
PRON + S[inf]: my hope for John to open the case again 20
Syntactic Processing (19) Basic English Syntax (19) More Complex Forms of Noun Phrases (2) S[that]: That George had the ring … VP[inf]: To own a car … S[inf]: For us to complete a project on time … VP[ing]: Giving up the game … S[ing]: John’s giving up the game … ART + N + S[rel]: The man who gave Bill the money … 21
Syntactic Processing (20) Basic English Syntax (20) Adjective Phrases (1) q Some simple adjective phrases (ADJPs) consist of a single adjective. q More complex adjective phrases are also possible, as adjectives may take many of the same complement forms that occur with verbs. 22
Syntactic Processing (21) Basic English Syntax (21) Adjective Phrases (2) ADJ + PP[with]: pleased with the prize ADJ + PP[at]: angry at the committee ADJ + S[that]: angry that he was left behind ADJ + VP[inf]: willing to lead the chorus 23
Syntactic Processing (22) Basic English Syntax (22) Adverbs q indicators of degree (for example, very, rather, too) q location phrases (for example, here, everywhere) q manner in which something is done (for example, slowly, hesitantly) q time of something (for example, now, yesterday) q frequency of something (for example, frequently, rarely, never) 24
Syntactic Processing (23) Basic English Syntax (23) Adverb Phrases q manner (e. g. , this way) q temporal (e. g. , before the fight started) q duration (e. g. , for three hours) q location (e. g. , in the box) q degree (e. g. , too late) q frequency (e. g. every time that John comes for a visit) 25
Syntactic Processing (24) Basic English Syntax (24) References C. L. Baker. 1989. English Syntax. Cambridge, MA: MIT Press. 26
Syntactic Processing (25) Grammars and Parsing (1) Grammar A grammar of a language indicates the syntactic rules for combining words into well-formed phrases and clauses. One example is the class of context-free grammars (CFGs), which consist entirely of rules with a single symbol on the left-hand side. CFGs are a very important class of grammars for two reasons: q The formalism is powerful enough to describe most of the structure in natural languages; q It is restricted enough that efficient parsers can be built to analyze sentences. 27
Syntactic Processing (26) Grammars and Parsing (3) A Simple Grammar (CFG) (1)
Grammar:
1. S → NP VP
2. NP → NAME
3. NP → ART N
4. VP → V NP
Lexicon: John: NAME; ate: V; the: ART; cat: N
Input: John ate the cat. 28
Syntactic Processing (27) Grammars and Parsing (3) A Simple Grammar (CFGs) (2) q Terminal symbols - Symbols that cannot be further decomposed in a grammar, namely the words in the preceding example; q Nonterminal symbols - The other symbols, such as NP, VP, and S; q Lexical symbols - The grammatical symbols such as N and V that describe word categories (part-of-speech). 29
Syntactic Processing (28) Grammars and Parsing (4) Parsing (1) q Parsing is the "de-linearization" of linguistic input; that is, the use of grammatical rules and other knowledge sources to determine the functions of words in an input sentence. q Usually a parser produces a data structure such as a derivation tree to represent the structural meaning of a sentence. There are two basic methods of searching during parsing: 30
Syntactic Processing (29) Grammars and Parsing (5) Parsing (2) Ø A top-down strategy starts with the S symbol and then searches through different ways to rewrite the symbols until the input sentence is generated, or until all possibilities have been explored. Ø A bottom-up strategy starts with the words in a sentence and uses the rewrite rules backward to reduce the sequence of symbols until it consists solely of S. 31
Syntactic Processing (30) Grammars and Parsing (6) Top-Down Parsing
S
NP VP (rewriting S)
NAME VP (rewriting NP)
John VP (rewriting NAME)
John V NP (rewriting VP)
John ate NP (rewriting V)
John ate ART N (rewriting NP)
John ate the N (rewriting ART)
John ate the cat (rewriting N) 32
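The derivation above can be replayed mechanically: each step replaces the leftmost occurrence of a symbol with a rule's right-hand side. A minimal sketch, assuming the grammar and rule order shown in the example:

```python
# Replay the top-down derivation of "John ate the cat".
# Grammar assumed from the example: S -> NP VP, NP -> NAME, NP -> ART N,
# VP -> V NP, plus lexical rewrites for each word.
steps = [
    ("S",    ["NP", "VP"]),    # rewriting S
    ("NP",   ["NAME"]),        # rewriting NP
    ("NAME", ["John"]),        # rewriting NAME
    ("VP",   ["V", "NP"]),     # rewriting VP
    ("V",    ["ate"]),         # rewriting V
    ("NP",   ["ART", "N"]),    # rewriting NP
    ("ART",  ["the"]),         # rewriting ART
    ("N",    ["cat"]),         # rewriting N
]

symbols = ["S"]
for lhs, rhs in steps:
    i = symbols.index(lhs)     # leftmost occurrence of the symbol to rewrite
    symbols[i:i + 1] = rhs     # replace it with the rule's right-hand side
    print(" ".join(symbols))

assert symbols == ["John", "ate", "the", "cat"]
```

Each printed line corresponds to one row of the derivation on the slide.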
Syntactic Processing (31) Grammars and Parsing (7) Bottom-Up Parsing
NAME ate the cat (rewriting John)
NAME V the cat (rewriting ate)
NAME V ART cat (rewriting the)
NAME V ART N (rewriting cat)
NP V ART N (rewriting NAME)
NP V NP (rewriting ART N)
NP VP (rewriting V NP)
S (rewriting NP VP) 33
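The bottom-up direction can be sketched as repeated reduction: scan for a substring matching some rule's right-hand side and replace it with the left-hand side, until only S remains. This naive greedy reducer happens to work for this unambiguous example (a real parser must search among alternative reductions); the grammar and lexicon are the ones assumed above:

```python
# Bottom-up parsing as repeated reduction: replace any substring that matches
# a rule's right-hand side with its left-hand side, until only S remains.
rules = [
    (("John",), "NAME"), (("ate",), "V"),
    (("the",), "ART"), (("cat",), "N"),
    (("NAME",), "NP"), (("ART", "N"), "NP"),
    (("V", "NP"), "VP"), (("NP", "VP"), "S"),
]

symbols = ["John", "ate", "the", "cat"]
changed = True
while changed:
    changed = False
    for rhs, lhs in rules:
        n = len(rhs)
        for i in range(len(symbols) - n + 1):
            if tuple(symbols[i:i + n]) == rhs:
                symbols[i:i + n] = [lhs]   # one reduction step
                changed = True
                break
        if changed:
            break

assert symbols == ["S"]
```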
Syntactic Processing (32) Grammars and Parsing (8) Tree Structure for Parsing 34
Syntactic Processing (33) Grammars and Parsing (9) References A. V. Aho and J. D. Ullman. 1972. Theory of Parsing, Translation and Compiling. Englewood Cliffs, NJ: Prentice-Hall. N. Chomsky. 1965. Aspects of the Theory of Syntax. Cambridge, MA: MIT Press. 35
Outline q Syntactic Processing Ø Basic English Syntax Ø Grammar and Parsing (1) Ø Grammar and Parsing (2) 36
Syntactic Processing (37) Grammars and Parsing (10) What Makes a Good Grammar? There are three criteria for a good grammar: q Generality: the range of sentences the grammar analyzes correctly. q Selectivity: the range of non-sentences it identifies as problematic. q Understandability: the simplicity of the grammar itself. 37
Syntactic Processing (38) Grammars and Parsing (11) Generality (1) Chomsky delineated four types of grammars, numbered 0 to 3: q Type 0: Most general, with no restrictions on the form that rewrite rules can take. Languages generated by type 0 grammars are exactly those recognized by Turing machines. q Type 1: Context-sensitive. The right-hand side of the rewrite rules must contain at least as many symbols as the left-hand side. 38
Syntactic Processing (39) Grammars and Parsing (12) Generality (2) q Type 2: Context-free. The left-hand side of a rewrite rule must contain exactly one symbol, a nonterminal symbol. q Type 3: Regular grammars. All rewrite rules are of the form X → aY or X → a. These generate exactly the languages recognized by finite state automata. 39
Syntactic Processing (40) Grammars and Parsing (13) Generality (3)
1) S → NP VP
2) NP → ART N
3) NP → ART ADJ N
4) VP → V
5) VP → V NP
q This grammar is a CFG that describes a simple sentence structure. The words fall into four categories (N, V, ART, and ADJ), and the ART must precede the ADJ and the N, etc. 40
Syntactic Processing (41) Grammars and Parsing (14) Selectivity (1) Ex. 1: If you decide that a group of words forms a particular constituent, try to construct a new sentence that involves that group of words in a conjunction with another group of words. This is a good test because for the most part only constituents of the same type can be conjoined. The following sentences, for example, are acceptable: 41
Syntactic Processing (42) Grammars and Parsing (15) Selectivity (2) a) I ate a hamburger and a hot dog. (NP-NP) b) I ate a cold and well burned hot dog. (ADJP-ADJP) c) I ate the hot dog slowly and very carefully. (ADVP-ADVP) 42
Syntactic Processing (43) Grammars and Parsing (16) Selectivity (3) Ex. 2: However, the following sentences cannot be accepted: a) *I ate a hamburger and on the stove. b) *I ate a cold hot dog and well burned. c) *I ate the hot dog slowly and a hamburger. q To summarize, if the proposed constituent doesn't conjoin in some sentences with a constituent of the same class, it is probably incorrect. 43
Syntactic Processing (44) Grammars and Parsing (17) Understandability q In small grammars, one structural analysis of a sentence may appear as understandable as another, and little can be said as to why one is superior to the other. q As you attempt to extend a grammar to cover a wide range of sentences, the analysis that retains its simplicity and generality is more desirable. 44
Syntactic Processing (45) Grammars and Parsing (18) Grammar Developing q As you develop a grammar, each constituent is used in more and more different ways. As a result, you have a growing number of tests that can be performed to see if a new analysis is reasonable or not. q Sometimes the analysis of a new form might force you to back up and modify the existing grammar. When a new rule is proposed for a grammar, you must carefully consider its interaction with existing rules. 45
Syntactic Processing (46) Grammars and Parsing (19) A Top-Down Parser q A top-down parser starts with the S symbol and attempts to rewrite it into a sequence of terminal symbols that matches the classes of the words in an input sentence. q The state of the parse at any given time can be represented as a list of symbols that are the result of operations applied so far, called the symbol list. 46
Syntactic Processing (47) Grammars and Parsing (20) Symbol List For example, the parser starts in the state (S) and after applying the rule S → NP VP the symbol list will be (NP VP). If it then applies the rule NP → ART N, the symbol list will be (ART N VP), and so on. 47
Syntactic Processing (48) Grammars and Parsing (21) Backtracking q A parsing algorithm that is guaranteed to find a parse if one exists must systematically explore every possible new state; backtracking is the standard way to do this. q Using this approach, you generate all possible new states. One of these is picked to be the next state and the rest are saved as backup states. q If you ever reach a situation where the current state cannot lead to a solution, you simply pick a new current state from the list of backup states. 48
Syntactic Processing (49) Grammars and Parsing (22) Possibilities List (1) q The algorithm manipulates a list of possible states, called the possibilities list. q The first element of this list is the current state, which consists of a symbol list and a word position in the sentence. q The remaining elements of the list are the backup states, each indicating an alternate symbol-list / word-position pair. 49
Syntactic Processing (50) Grammars and Parsing (23) Possibilities List (2) For example, the possibilities list (((N) 2) ((NAME) 1) ((ADJ N) 1)) indicates that the current state consists of the symbol list (N) at position 2, and that there are two possible backup states: one consisting of the symbol list (NAME) at position 1 and the other consisting of the symbol list (ADJ N) at position 1. 50
Syntactic Processing (51) Grammars and Parsing (24) A Simple Top-Down Parsing Algorithm (1) 1. Select the current state: Take the first state off the possibilities list and call it C. If the possibilities list is empty, then the algorithm fails (that is, no successful parse is possible). 2. If C consists of an empty symbol list and the word position is at the end of the sentence, then the algorithm succeeds. 3. Otherwise, generate the next possible states: 51
Syntactic Processing (52) Grammars and Parsing (25) A Simple Top-Down Parsing Algorithm (2) 3.1. If the first symbol on the symbol list of C is a lexical symbol, and the next word in the sentence can be in that class, then create a new state by removing the first symbol from the symbol list and updating the word position, and add it to the possibilities list. 3.2. Otherwise, if the first symbol on the symbol list of C is a nonterminal, generate a new state for each rule in the grammar that can rewrite that nonterminal symbol and add them all to the possibilities list. 52
Syntactic Processing (53) Grammars and Parsing (26) Sample 1 (Top-Down Parsing Algorithm) (1)
Grammar:
1. S → NP VP
2. NP → ART N
3. NP → ART ADJ N
4. VP → V
5. VP → V NP
Lexicon: cried: V; dogs: N, V; the: ART
Input: 1 The 2 dogs 3 cried 4 53
Syntactic Processing (54) Grammars and Parsing (27) Sample 1 (Top-Down Parsing Algorithm) (2)
Step  Current State        Comment                        Backup States
1.    ((S) 1)              initial position
2.    ((NP VP) 1)          rewriting S by rule 1
3.    ((ART N VP) 1)       rewriting NP by rules 2 & 3    ((ART ADJ N VP) 1)
4.    ((N VP) 2)           matching ART with the          ((ART ADJ N VP) 1)
5.    ((VP) 3)             matching N with dogs           ((ART ADJ N VP) 1)
6.    ((V) 3)              rewriting VP by rules 4 & 5    ((V NP) 3) ((ART ADJ N VP) 1)
7.    the parse succeeds as V is matched to cried, leaving an empty grammatical symbol list with an empty sentence 54
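The algorithm above is short enough to implement directly. A minimal sketch, using the grammar and lexicon of Sample 1 (positions are 0-based here, where the slides count from 1):

```python
# The simple top-down parsing algorithm: states are (symbol list, position)
# pairs, and the possibilities list holds the current state plus backups.
grammar = {
    "S": [["NP", "VP"]],
    "NP": [["ART", "N"], ["ART", "ADJ", "N"]],
    "VP": [["V"], ["V", "NP"]],
}
lexicon = {"the": {"ART"}, "dogs": {"N", "V"}, "cried": {"V"}}

def parse(words):
    possibilities = [(("S",), 0)]              # initial state: symbol list (S)
    while possibilities:
        symbols, pos = possibilities.pop(0)    # step 1: take the current state
        if not symbols:
            if pos == len(words):              # step 2: empty list at end of input
                return True
            continue
        first, rest = symbols[0], symbols[1:]
        if first in grammar:                   # step 3.2: rewrite a nonterminal
            possibilities = [(tuple(r) + rest, pos) for r in grammar[first]] + possibilities
        elif pos < len(words) and first in lexicon.get(words[pos], set()):
            possibilities.insert(0, (rest, pos + 1))   # step 3.1: match a word
    return False                               # possibilities list exhausted

assert parse(["the", "dogs", "cried"])
assert not parse(["dogs", "the", "cried"])
```

Putting new states at the front of the list makes this a depth-first search, matching the trace on the slide.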
Syntactic Processing (55) Grammars and Parsing (28) Sample 2 (Top-Down Parsing Algorithm) (1) Consider the same algorithm and grammar operating on the sentence 1 The 2 old 3 man 4 cried 5 In this case assume that the word old is ambiguous between an ADJ and a N and that the word man is ambiguous between a N and a V (as in the sentence The sailors man the boats). Specifically, the lexicon is the: ART; old: ADJ, N; man: N, V; cried: V. 55
Syntactic Processing (56) Grammars and Parsing (29) Sample 2 for Top-Down Parsing Algorithm (2) 56
Syntactic Processing (57) Grammars and Parsing (30) Parsing as a Search Procedure The possibilities list is initially set to the start state of the parse. The search procedure then repeats the following steps until success or failure: 1. Select the first state from the possibilities list (and remove it from the list). 2. Generate the new states by trying every possible option from the selected state (there may be none if we are on a bad path). 3. Add the states generated in step 2 to the possibilities list. 57
Syntactic Processing (58) Grammars and Parsing (31) LIFO and FIFO
depth-first strategy: stack (last-in first-out)
breadth-first strategy: queue (first-in first-out) 58
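The pairing above is the whole difference between the two strategies: the same search loop becomes depth-first or breadth-first depending only on whether new states go on a stack or a queue. A small sketch over a hypothetical three-level state tree:

```python
from collections import deque

# One search loop, two disciplines: pop from the back (stack, LIFO) gives
# depth-first order; pop from the front (queue, FIFO) gives breadth-first.
tree = {"S": ["A", "B"], "A": ["A1", "A2"], "B": ["B1", "B2"]}

def search(order):
    frontier, visited = deque(["S"]), []
    while frontier:
        node = frontier.popleft() if order == "fifo" else frontier.pop()
        visited.append(node)
        children = tree.get(node, [])
        # for LIFO, push children in reverse so the leftmost is expanded first
        frontier.extend(children if order == "fifo" else reversed(children))
    return visited

assert search("lifo") == ["S", "A", "A1", "A2", "B", "B1", "B2"]  # depth-first
assert search("fifo") == ["S", "A", "B", "A1", "A2", "B1", "B2"]  # breadth-first
```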
Syntactic Processing (59) Grammars and Parsing (32) Search Tree 59
Syntactic Processing (60) Grammars and Parsing (33) Difference between Two Strategies q The main difference between depth-first and breadth-first searches is the order in which the two possible interpretations of the first NP are examined. q With the depth-first strategy, one interpretation is considered and expanded until it fails; only then is the second one considered. q With the breadth-first strategy, both interpretations are considered alternately, each being expanded one step at a time. 60
Syntactic Processing (61) Grammars and Parsing (34) Left-Recursive Rule q In certain cases these simple search strategies can be sent into an infinite loop. For example, consider a left-recursive rule that could be a first account of the possessive in English: NP → NP ‘s N q With a naive depth-first strategy, a state starting with the nonterminal NP would be rewritten to a new state beginning with NP ‘s N, whose first symbol is again NP. Unless an explicit check were incorporated into the parser, it would rewrite NPs forever! 61
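The failure mode is easy to demonstrate. A sketch with an artificial step cap (added only so the loop terminates; without it the rewriting never stops):

```python
# With the left-recursive rule NP -> NP 's N, naively rewriting the leftmost
# symbol depth-first never consumes a word: NP keeps producing a leading NP.
rules = {"NP": [["NP", "'s", "N"], ["ART", "N"]]}   # left-recursive rule tried first

symbols, steps, LIMIT = ["NP"], 0, 10
while symbols[0] in rules and steps < LIMIT:
    symbols = rules[symbols[0]][0] + symbols[1:]    # depth-first: always take rule 1
    steps += 1

assert steps == LIMIT            # the cap was hit: rewriting would go on forever
assert symbols[0] == "NP"        # the leftmost symbol is still NP, no word matched
```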
Syntactic Processing (62) Grammars and Parsing (35) A Bottom-Up Chart Parser (1) q The main difference between top-down and bottom-up parsers is the way the grammar rules are used. Ex. NP → ART ADJ N
Ø a top-down parser uses the rule to find an NP in a sentence;
Ø a bottom-up parser uses it to find the sequence ART ADJ N in a sentence. 62
Syntactic Processing (63) Grammars and Parsing (36) A Bottom-UP Chart Parser (2) q A bottom-up parser could be built by formulating this matching process as a search process. q The state would simply consist of a symbol list, starting with the words in the sentence. q Successor states could be generated by exploring all possible ways to Ø rewrite a word by its possible lexical categories; Ø replace a sequence of symbols that matches the right-hand side of a grammar rule by its left-hand side symbol. 63
Syntactic Processing (64) Grammars and Parsing (37) Chart q Definition: The chart is a data structure that allows the parser to store the partial results of the matching it has done so far. Why do we need a chart? q Note: Without one, a bottom-up parser can be prohibitively expensive, as it tends to try the same matches again and again, duplicating much of its work unnecessarily. The chart avoids this problem. 64
Syntactic Processing (65) Grammars and Parsing (38) Key q Definition: The key is a constituent in a grammar rule. q To find rules that match a string involving the key, look for rules that start with the key, or for rules that have already been started by earlier keys and require the present key either to complete the rule or to extend the rule. 65
Syntactic Processing (66) Grammars and Parsing (39) A Sample of Keys (1) Grammar: 1. S → NP VP 2. NP → ART ADJ N 3. NP → ART N 4. NP → ADJ N 5. VP → AUX VP 6. VP → V NP 66
Syntactic Processing (67) Grammars and Parsing (40) A Sample of Keys (2) q Assume a sentence is being parsed and it starts with an ART. q With this ART as the key, rules 2 and 3 are matched because they start with ART. To record this for analyzing the next key, you need to record that rules 2 and 3 could be continued at the point after the ART. q You denote this fact by writing the rule with a dot (◦), indicating what has been matched so far. Thus you record 2'. NP → ART ◦ ADJ N 3'. NP → ART ◦ N 67
Syntactic Processing (68) Grammars and Parsing (41) A Sample of Keys (3) q If the next input key is an ADJ, then rule 4 may be started, and the modified rule 2' may be extended to give 2''. NP → ART ADJ ◦ N 68
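Dotted rules can be represented directly as tuples (lhs, rhs, dot). A sketch of the two operations the slides describe, starting a rule on a key and extending an existing dot, using rules 2-4 from the sample grammar:

```python
# Active arcs as dotted rules: (lhs, rhs, dot). Seeing a key either starts a
# rule whose right-hand side begins with the key, or advances an existing dot.
rules = [("NP", ("ART", "ADJ", "N")),   # rule 2
         ("NP", ("ART", "N")),          # rule 3
         ("NP", ("ADJ", "N"))]          # rule 4

def start(key):
    return [(lhs, rhs, 1) for lhs, rhs in rules if rhs[0] == key]

def extend(arcs, key):
    return [(lhs, rhs, dot + 1) for lhs, rhs, dot in arcs
            if dot < len(rhs) and rhs[dot] == key]

arcs = start("ART")                 # 2'. NP -> ART . ADJ N   3'. NP -> ART . N
arcs = extend(arcs, "ADJ") + start("ADJ")
# 2''. NP -> ART ADJ . N, plus a newly started NP -> ADJ . N (rule 4)
assert ("NP", ("ART", "ADJ", "N"), 2) in arcs
assert ("NP", ("ADJ", "N"), 1) in arcs
```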
Syntactic Processing (69) Grammars and Parsing (42) Active Arcs (1) q Definition: The list of active arcs maintains the record of rules that have matched partially but are not yet complete. q Comparison: The chart maintains the record of all the constituents derived from the sentence so far in the parse. For example, after seeing an initial ART followed by an ADJ in the preceding example, you would have the chart shown in the following figure: 69
Syntactic Processing (70) Grammars and Parsing (43) Active Arcs (2) 70
Syntactic Processing (71) Grammars and Parsing (44) Agenda q The basic operation of a chart-based parser is to combine an active arc with a completed constituent. q The result is either a new completed constituent or a new active arc that extends the original active arc. q Definition: New completed constituents are maintained on a list called the agenda until they themselves are added to the chart. 71
Syntactic Processing (72) Grammars and Parsing (45) The Arc Extension Algorithm 72
Syntactic Processing (73) Grammars and Parsing (46) A Bottom-Up Chart Parsing Algorithm The algorithm: Do until there is no input left: 1. If the agenda is empty, look up the interpretations of the next word in the input and add them to the agenda. 2. Select a constituent from the agenda (call it constituent C, from position p1 to p2). 3. For each rule in the grammar of the form X → C X1 ... Xn, add an active arc of the form X → ◦ C X1 ... Xn from position p1 to p2. 4. Add C to the chart using the arc extension algorithm above. 73
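The whole algorithm, agenda, active arcs, and arc extension, fits in a short function. A sketch using the sample grammar and lexicon from the following slides; constituents are (category, start, end) triples and arcs carry a dot position:

```python
# A compact bottom-up chart parser. Constituents are (cat, p1, p2); active
# arcs are (lhs, rhs, dot, start, end) with the dot marking progress.
grammar = [
    ("S", ("NP", "VP")), ("NP", ("ART", "ADJ", "N")), ("NP", ("ART", "N")),
    ("NP", ("ADJ", "N")), ("VP", ("AUX", "VP")), ("VP", ("V", "NP")),
]
lexicon = {"the": ["ART"], "large": ["ADJ"], "can": ["N", "AUX", "V"],
           "hold": ["N", "V"], "water": ["N", "V"]}

def chart_parse(words):
    chart, arcs = set(), []
    agenda, pos = [], 0
    while agenda or pos < len(words):
        if not agenda:                               # step 1: lexical lookup
            agenda = [(cat, pos, pos + 1) for cat in lexicon[words[pos]]]
            pos += 1
        cat, p1, p2 = agenda.pop(0)                  # step 2: select constituent C
        if (cat, p1, p2) in chart:
            continue                                 # never build the same thing twice
        chart.add((cat, p1, p2))                     # step 4: add C to the chart
        for lhs, rhs in grammar:                     # step 3: start new active arcs
            if rhs[0] == cat:
                if len(rhs) == 1:
                    agenda.append((lhs, p1, p2))     # one-symbol rule completes at once
                else:
                    arcs.append((lhs, rhs, 1, p1, p2))
        # arc extension: advance every active arc that ends where C begins
        extended = []
        for lhs, rhs, dot, a1, a2 in arcs:
            if a2 == p1 and rhs[dot] == cat:
                if dot + 1 == len(rhs):
                    agenda.append((lhs, a1, p2))     # arc completed: new constituent
                else:
                    extended.append((lhs, rhs, dot + 1, a1, p2))
        arcs.extend(extended)
    return chart

chart = chart_parse("the large can hold the water".split())
assert ("S", 0, 6) in chart    # an S covering the whole sentence was found
```

The duplicate check before adding to the chart is what gives the parser its efficiency: each constituent is built at most once per span.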
Syntactic Processing (74) Grammars and Parsing (47) A Sample (1) Lexicon: the: ART; large: ADJ; can: N, AUX, V; hold: N, V; water: N, V Input: The large can hold the water. 74
Syntactic Processing (75) Grammars and Parsing (48) A Sample (2) 75
Syntactic Processing (76) Grammars and Parsing (49) A Sample (3) 76
Syntactic Processing (77) Grammars and Parsing (50) A Sample (4) 77
Syntactic Processing (78) Grammars and Parsing (51) A Sample (5) 78
Syntactic Processing (82) Grammars and Parsing (52) Efficiency q Chart-based parsers can be considerably more efficient than parsers that rely on pure search because the same constituent is never constructed more than once. q A chart-based parser in the worst case builds every possible constituent between every possible pair of positions, giving a worst-case complexity of K·n^3, where n is the length of the sentence and K is a constant depending on the algorithm. q A pure top-down or bottom-up search strategy, in contrast, could require up to C^n time, exponential in the sentence length, where C is a constant that depends on the specific algorithm. 79
Syntactic Processing (83) Grammars and Parsing (53) Transition Network and Its Grammars (1) q It is based on the notion of a transition network consisting of nodes and labeled arcs. One of the nodes is specified as the initial state, or start state. q Starting at the initial state, you can traverse an arc if the current word in the sentence is in the category on the arc. If the arc is followed, the current word is updated to the next word. q A phrase is legal if there is a path from the starting node to a pop arc that accounts for every word in the phrase. 80
Syntactic Processing (84) Grammars and Parsing (54) Transition Network and Its Grammars (2) This network recognizes the same set of sentences as the following CFG:
NP → ART NP1
NP1 → ADJ NP1
NP1 → N 81
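The CFG above describes noun phrases of the form ART ADJ* N, which is exactly what a two-state transition network with an ADJ self-loop accepts. A sketch of that network as a finite state machine over word categories:

```python
# The NP network: NP --ART--> NP1, an ADJ self-loop on NP1, and an N arc
# leading to a pop. Input is a list of word categories.
def np_network(cats):
    state, i = "NP", 0
    while i < len(cats):
        if state == "NP" and cats[i] == "ART":
            state, i = "NP1", i + 1      # traverse the ART arc
        elif state == "NP1" and cats[i] == "ADJ":
            i += 1                       # self-loop: any number of adjectives
        elif state == "NP1" and cats[i] == "N":
            state, i = "POP", i + 1      # N arc leads to the pop arc
        else:
            return False                 # no arc can be followed
    return state == "POP"                # legal only if we reached a pop

assert np_network(["ART", "N"])
assert np_network(["ART", "ADJ", "ADJ", "N"])
assert not np_network(["ADJ", "N"])      # the network requires a leading ART
```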
Syntactic Processing (85) Grammars and Parsing (55) Recursive Transition Network (1) q Simple transition networks are often called finite state machines (FSMs). FSMs are equivalent in expressive power to regular grammars (type 3), and thus are not powerful enough to describe all languages that can be described by a CFG. q To get the descriptive power of CFGs, you need a notion of recursion in the network grammar. A recursive transition network (RTN) is like a simple transition network, except that it allows arc labels to refer to other networks as well as word categories. 82
Syntactic Processing (86) Grammars and Parsing (56) Recursive Transition Network (2) The network for simple English sentences can be expressed as follows. Uppercase labels refer to networks. Although not shown in this example, RTNs allow true recursion—that is, a network might have an arc labeled with its own name. The purple cow ate the grass is accepted as a legal sentence. 83
Syntactic Processing (87) Grammars and Parsing (57) The Arc Labels for RTNs
Arc Type  Example  How Used
CAT       noun     succeeds only if current word is of the named category
WRD       of       succeeds only if current word is identical to the label
PUSH      NP       succeeds only if named network can be successfully traversed
JUMP      jump     always succeeds
POP       pop      succeeds and signals the successful end of the network
Ex. NP: PUSH arc; art, adj, noun, verb: CAT arcs; pop: POP arc 84
Syntactic Processing (88) Grammars and Parsing (58) Top-Down Parsing with Recursive Transition Networks An algorithm for parsing with RTNs can be developed along the same lines as the algorithms for parsing CFGs. The state of the parse at any moment can be represented by the following: q current position - a pointer to the next word to be parsed. q current node - the node at which you are located in the network. q return points - a stack of nodes in other networks where you will continue. 85
Syntactic Processing (89) Grammars and Parsing (59) An Algorithm for Searching an RTN (1) Case 1: If the arc names a word category and the next word in the sentence is in that category, Then (1) update current position to start at the next word; (2) update current node to the destination of the arc. Case 2: If the arc is a push arc to a network N, Then (1) add the destination of the arc onto return points; (2) update current node to the starting node in network N. 86
Syntactic Processing (90) Grammars and Parsing (60) An Algorithm for Searching an RTN (2) Case 3: If arc is a pop arc and return points list is not empty, Then (1) remove first return point and make it current node. Case 4: If arc is a pop arc, return points list is empty and there are no words left, Then (1) parse completes successfully. 87
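The four cases combine naturally with the backtracking machinery from earlier: each parse state is (current node, position, return points), and dead ends are abandoned in favor of backup states. A sketch with chronological backtracking; the S and NP networks here are an assumed simplification for illustration, not the exact networks from the lecture's figure:

```python
# RTN search with backtracking. Each network maps node -> list of arcs;
# arcs are ("CAT", category, dest), ("PUSH", network, dest), or ("POP",).
networks = {
    "S":  {"S":  [("PUSH", "NP", "S1")], "S1": [("CAT", "V", "S2")],
           "S2": [("PUSH", "NP", "S3"), ("POP",)], "S3": [("POP",)]},
    "NP": {"NP": [("CAT", "ART", "NP1"), ("CAT", "NAME", "NP2")],
           "NP1": [("CAT", "ADJ", "NP1"), ("CAT", "N", "NP2")],
           "NP2": [("POP",)]},
}
lexicon = {"one": ["NAME", "N"], "saw": ["V", "N"], "the": ["ART"], "man": ["N", "V"]}

def rtn_parse(words):
    # a state is ((network, node), position, return-point stack)
    stack = [(("S", "S"), 0, ())]
    while stack:
        (net, node), pos, returns = stack.pop()
        for arc in reversed(networks[net][node]):   # try arcs in listed order
            if arc[0] == "CAT" and pos < len(words) and arc[1] in lexicon.get(words[pos], []):
                stack.append(((net, arc[2]), pos + 1, returns))            # case 1
            elif arc[0] == "PUSH":
                stack.append(((arc[1], arc[1]), pos, ((net, arc[2]),) + returns))  # case 2
            elif arc[0] == "POP":
                if returns:                                                # case 3
                    stack.append((returns[0], pos, returns[1:]))
                elif pos == len(words):                                    # case 4
                    return True
        # if no arc applied, this state was a dead end; the stack backtracks
    return False

assert rtn_parse(["one", "saw", "the", "man"])
assert not rtn_parse(["the", "saw"])
```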
Syntactic Processing (91) Grammars and Parsing (61) An RTN Grammar Note: The numbers on the arcs simply indicate the order in which arcs will be tried when more than one arc leaves a node. 88
Syntactic Processing (92) Grammars and Parsing (62) A Trace of a Top-Down Parse Input: 1 The 2 old 3 man 4 cried 5 89
Syntactic Processing (93) Grammars and Parsing (63) A Top-Down RTN Parse with Backtracking (1) Consider this technique in operation on the following sentence: 1 One 2 saw 3 the 4 man 5 The parser initially attempts to parse the sentence as beginning with the NP one saw, but after failing to find a verb, it backtracks and finds a successful parse starting with the NP one. At each stage the current parse state is shown in the form of a triple (current node, current position, return points), together with possible states for backtracking. The figure also shows the arcs used to generate the new state and backup states. 90
Syntactic Processing (94) Grammars and Parsing (64) A Top-Down RTN Parse with Backtracking (2)
Step  Current State     Arc to be Followed          Backup States
1.    (S, 1, NIL)       S/1                         NIL
2.    (NP, 1, (S1))     NP/2 (& NP/3 for backup)    NIL
3.    (NP1, 2, (S1))    NP1/2                       (NP2, 2, (S1))
4.    (NP2, 3, (S1))    NP2/1                       (NP2, 2, (S1))
5.    (S1, 3, NIL)      no arc can be followed      (NP2, 2, (S1))
6.    (NP2, 2, (S1))    NP2/1                       NIL
7.    (S1, 2, NIL)      S1/1                        NIL
8.    (S2, 3, NIL)      S2/2                        NIL
9.    (NP, 3, (S2))     NP/1                        NIL
10.   (NP1, 4, (S2))    NP1/2                       NIL
11.   (NP2, 5, (S2))    NP2/1                       NIL
12.   (S2, 5, NIL)      S2/1                        NIL
13.   parse succeeds                                NIL
91
Syntactic Processing (95) Grammars and Parsing (65) Well-Formed Substring Table (1) q An RTN parser can be constructed to use a chart-like structure to gain the advantages of chart parsing. In RTN systems, the chart is often called the well-formed substring table (WFST). q Each time a pop is followed, the constituent is placed on the WFST, and every time a push is found, the WFST is checked before the subnetwork is invoked. 92
Syntactic Processing (96) Grammars and Parsing (66) Well-Formed Substring Table (2) q If the chart contains constituent(s) of the type being pushed for, these constituents are used and the subnetwork is not reinvoked. q An RTN using a WFST has the same complexity as the chart parser described previously: K*n^3, where n is the length of the sentence. 93
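The WFST idea above can be sketched by memoizing PUSH arcs: before traversing a subnetwork, the recognizer consults a table mapping (network, start position) to the set of end positions already found. This is a sketch under assumed networks and lexicon (the S network here optionally takes an object NP; none of the names come from the original slides):

```python
LEXICON = {"the": {"art"}, "old": {"adj", "noun"},
           "man": {"noun", "verb"}, "saw": {"noun", "verb"}}

NODES = {
    "S":   [("PUSH", "NP", "S1")],
    "S1":  [("CAT", "verb", "S2")],
    "S2":  [("PUSH", "NP", "S3"), ("JUMP", "S3")],
    "S3":  [("POP",)],
    "NP":  [("CAT", "art", "NP1"), ("CAT", "noun", "NP2")],
    "NP1": [("CAT", "adj", "NP1"), ("CAT", "noun", "NP2")],
    "NP2": [("POP",)],
}

def traverse(words, node, pos, wfst):
    """Return every position at which the current network can pop."""
    ends = set()
    for arc in NODES[node]:
        kind = arc[0]
        if kind == "CAT":
            _, cat, dest = arc
            if pos < len(words) and cat in LEXICON.get(words[pos], ()):
                ends |= traverse(words, dest, pos + 1, wfst)
        elif kind == "PUSH":
            _, net, dest = arc
            key = (net, pos)
            if key not in wfst:                 # first push at this position:
                wfst[key] = traverse(words, net, pos, wfst)  # traverse once
            for end in wfst[key]:               # later pushes reuse the table
                ends |= traverse(words, dest, end, wfst)
        elif kind == "JUMP":
            ends |= traverse(words, arc[1], pos, wfst)
        elif kind == "POP":
            ends.add(pos)                       # record a completed span
    return ends

def recognize(words):
    return len(words) in traverse(words, "S", 0, {})
```

The table plays the same role as the chart: a subnetwork pushed for at the same position twice is only traversed once, which is what brings the RTN parser down to chart-parser complexity.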
Syntactic Processing (79) Assignments (10) 1. A very useful test for determining the syntactic role of words and phrases is the conjunction test. Conjunctions, such as and and or, tend to join together two phrases of the same type. For instance, you can conjoin nouns, as in the man and woman; noun phrases, as in the angry men and the large dogs; and adjective phrases, as in the cow, angry and confused, broke the gate. In each of the following sentences, identify the type of phrase being conjoined, and underline each phrase. 94
Syntactic Processing (80) Assignments (10) He was tired and hungrier than a herd of elephants. We have never walked home or to the store from here. The dog returned quickly and dropped the stick. 2. Wh-questions are questions that use a class of words that includes what, where, who, when, whose, which, and how. For each of these words, give the syntactic categories (for example, verb, noun group, adjective, quantifier, 95
Syntactic Processing (81) Assignments (10) prepositional phrase, and so on) in which the words can be used. Justify each classification with some examples that demonstrate it. Use both positive and negative arguments as necessary (such as "it is one of these because ...," or "it can't be one of these even though it looks like it might, because ..."). 96
Syntactic Processing (82) Assignments (10) 3. Given the CFG grammar and a lexicon in the following, please show a trace in the format of the figure mentioned previously adopting a top-down CFG parser for the sentence The man walked the old dog.
Grammar:
1. S → NP VP
2. NP → ART N
3. NP → ART ADJ N
4. VP → V
5. VP → V NP
Lexicon:
the: ART
man: N, V
walked: V
old: ADJ
dog: N, V
97
Syntactic Processing (83) Assignments (10) 4. Consider the following CFG:
S → NP V
S → NP AUX V
NP → ART N
Trace one of the chart parsers in processing the sentence 1 The 2 man 3 is 4 laughing 5 with the lexicon entries: 98
Syntactic Processing (84) Assignments (10)
the: ART
man: N
is: AUX
laughing: V
Show every step of the parse, giving the parse stack, and drawing the chart each time a nonterminal constituent is added to the chart. 99