- Number of slides: 43
The Chomsky Hierarchy
Sentences: the sentence as a string of words. E.g. I saw the lady with the binoculars; string = a b c d e b f
The relations of parts of a string to each other may be different. "I saw the lady with the binoculars" is structurally ambiguous. Who has the binoculars?
[ I ] saw the lady [ with the binoculars ] = [a] b c d [e b f] I saw [ the lady with the binoculars] = a b [c d e b f]
How can we represent the difference? By assigning them different structures. We can represent structures with 'trees'. I read the book
a. I saw [the lady with the binoculars]: [S [NP I] [VP [V saw] [NP [NP the lady] [PP with the binoculars]]]]
b. I [saw the lady] with the binoculars: [S [NP I] [VP [VP saw the lady] [PP with the binoculars]]]
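The two structures above can be checked mechanically by counting parses. The following is a minimal sketch, assuming a small toy grammar in Chomsky normal form; the rule set, the lexicon and the name `count_parses` are illustrative assumptions and do not come from the slides.

```python
from collections import defaultdict

# Hypothetical toy grammar (binary rules only); not taken from the slides.
BINARY = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # PP attaches to the verb phrase (reading b)
    ("NP", "NP", "PP"),   # PP attaches to the noun phrase (reading a)
    ("NP", "Det", "N"),
    ("PP", "P", "NP"),
]
LEXICON = {
    "I": ["NP"], "saw": ["V"], "the": ["Det"],
    "lady": ["N"], "binoculars": ["N"], "with": ["P"],
}

def count_parses(words, start="S"):
    """Count the distinct parse trees for `words` with a CYK-style chart."""
    n = len(words)
    # chart[i][j][X] = number of ways category X derives words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for cat in LEXICON.get(word, []):
            chart[i][i + 1][cat] += 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):          # split point
                for parent, left, right in BINARY:
                    chart[i][j][parent] += chart[i][k][left] * chart[k][j][right]
    return chart[0][n][start]

print(count_parses("I saw the lady with the binoculars".split()))
```

Under these assumed rules the ambiguous sentence receives two parses, one per attachment of the PP, while "I saw the lady" receives one.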
Graphs and trees: birds fly. Tree: [S [NP [N birds]] [VP [V fly]]]. Syntactic rules: S → NP VP; NP → N; VP → V
Graphs and trees: [S [NP birds = a] [VP fly = b]]; string = ab
Graphs and trees: [S [A a] [B b]]; string = ab. Rules: S → A B; A → a; B → b
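As a small illustration of how rewrite rules yield a string, here is a sketch that expands S top-down with the rules S → A B, A → a, B → b. The dictionary layout and the function name `derive` are assumptions made for this example, and the approach only works because each auxiliary node has exactly one rule.

```python
# Rules from the slide; each auxiliary node has exactly one expansion.
RULES = {"S": ["A", "B"], "A": ["a"], "B": ["b"]}

def derive(symbol):
    """Expand a symbol top-down and return the terminal string it yields."""
    if symbol not in RULES:      # terminal node: nothing left to rewrite
        return symbol
    return "".join(derive(child) for child in RULES[symbol])

print(derive("S"))
```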
Rules. Assumption: natural language grammars are rule-based systems. What kind of grammars describe natural language phenomena? What are the formal properties of grammatical rules?
Chomsky, N. (1957) Syntactic Structures. The Hague: Mouton. Chomsky, N. and G. A. Miller (1958) Finite state languages. Information and Control 1, 91-112. Chomsky, N. (1959) On certain formal properties of grammars. Information and Control 2, 137-167.
Rules in Linguistics 1. PHONOLOGY: /s/ → [θ] / V ___ V. Rewrite /s/ as [θ] when /s/ occurs in the context V ___ V. With: V = auxiliary node; s, θ = terminal nodes
Rules in Linguistics 2. SYNTAX: S → NP VP; VP → V; NP → N. Rewrite S as NP VP in any context. With: S, NP, VP = auxiliary nodes; V, N = terminal nodes
PHONOLOGY (sound system): Maltese – Word-final devoicing
Orthography (spelling)    Pronunciation (sound)
Sabet                     [sa-bet]
sab                       [sap]
Ħobża                     [hob-za]
ħobż                      [hops]
Vjaġġi                    [vjağ-ği]
vjaġġ                     [vjačč]
voiced [+vd]: [b, z, ğ]; voiceless [-vd]: [p, s, č]
Rule: [+vd] → [-vd] / ____ # (for # = end of word)
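The devoicing rule can be sketched as a string rewrite. This is only an illustration under two assumptions: the input is a simplified transcription using the symbols on the slide, and the rule applies to the whole word-final run of voiced obstruents (which is what the forms [hops] and [vjačč] suggest). The names `DEVOICE` and `devoice_final` are hypothetical.

```python
# Voiced -> voiceless pairs listed on the slide.
DEVOICE = {"b": "p", "z": "s", "ğ": "č"}

def devoice_final(word):
    """Apply [+vd] -> [-vd] / ___ # to a simplified transcription.

    Assumption: every voiced obstruent in the word-final cluster devoices.
    """
    chars = list(word)
    i = len(chars) - 1
    while i >= 0 and chars[i] in DEVOICE:
        chars[i] = DEVOICE[chars[i]]
        i -= 1
    return "".join(chars)

for form in ["sab", "hobz", "vjağğ", "sabet"]:
    print(form, "->", devoice_final(form))
```

Forms that end in a vowel or a voiceless consonant, such as "sabet", pass through unchanged.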
MORPHOLOGY (word formation): Maltese – Progressive assimilation in 3fsg imperfective (present). Marker for verb in 3rd person feminine singular imperfective: t- (3fsg impf = she), e.g. she breaks = t-kisser, I break = n-kisser. t-kisser (3fsg-break, she breaks); t-ressaq (3fsg-move, she moves); s-sakkar (3fsg-lock, she locks); d-dur (3fsg-turn, she turns); *t-sakkar, *t-dur. Rule: t → s, d, etc. / ____ [s, d, etc.] [+cor], μ [3fsg] (with μ = morpheme, C = consonant, cor = coronal)
SYNTAX (phrase/sentence formation) SENTENCE: The boy kissed the girl SUBJECT PREDICATE NOUN PHRASE VERB PHRASE ART + NOUN VERB + NOUN PHRASE S → NP VP VP → V NP NP → ART N
SEMANTICS (meaning) The lion attacks the hunter ATTACK (a, b) a λy [ATTACK (y, b)] λz λy [ATTACK (y, z)] b (with a = the lion, b = the hunter)
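The lambda terms above can be mimicked with curried functions. This is a minimal sketch in which the predicate is represented as a plain tuple; the representation and the name `attack` are assumptions for illustration only.

```python
def attack(z):
    """λz λy [ATTACK(y, z)]: take the object first, then the subject."""
    def with_object(y):
        return ("ATTACK", y, z)
    return with_object

a, b = "the lion", "the hunter"
vp = attack(b)        # corresponds to λy [ATTACK(y, b)]
sentence = vp(a)      # corresponds to ATTACK(a, b)
print(sentence)
```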
Chomsky Hierarchy
0. Type 0 (recursively enumerable) languages - only restriction on rules: left-hand side cannot be the empty string (* Ø ……. )
1. Context-Sensitive languages - Context-Sensitive (CS) rules
2. Context-Free languages - Context-Free (CF) rules
3. Regular languages - Regular (right-linear or left-linear) rules
0 ⊃ 1 ⊃ 2 ⊃ 3, where a ⊃ b means a properly includes b (a is a proper superset of b), i.e. b is a proper subset of a
Generative power: 0. Type 0 (recursively enumerable) languages - only restriction on rules: left-hand side cannot be the empty string (* Ø ……. ) - is the most powerful system. 3. Type 3 (regular languages) - is the least powerful
Superset/subset relation: S1 = {a, b}; S2 = {a, b, c, d, f, g}. S1 is a subset of S2; S2 is a superset of S1
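The relation can be checked directly with Python sets. This short sketch assumes the two sets contain the members drawn in the diagram, i.e. a and b for the smaller set and a, b, c, d, f, g for the larger one.

```python
S1 = {"a", "b"}
S2 = {"a", "b", "c", "d", "f", "g"}

print(S1 <= S2)   # S1 is a subset of S2
print(S1 < S2)    # S1 is a proper subset of S2
print(S2 > S1)    # S2 properly includes S1
print(S2 <= S1)   # the reverse does not hold
```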
Rule Type – 3. Name: Regular. Example: Finite State Automata (Markov-process Grammar). Rule type: a) right-linear: A → xB or A → x, with A, B = auxiliary nodes and x = terminal node; b) or left-linear: A → Bx or A → x. Generates: a^m b^n with m, n ≥ 1. Cannot guarantee that there are as many a's as b's; no embedding
A regular grammar for natural language sentences: S → the A; A → cat B; A → mouse B; A → duck B; B → bites C; B → sees C; B → eats C; C → the D; D → boy; D → girl; D → monkey. Sentences: the cat bites the boy; the mouse eats the monkey; the duck sees the girl
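Because every rule above is right-linear, the grammar can be read as a finite-state automaton whose states are the auxiliary nodes. The sketch below encodes the rules from the slide as a transition table; the table layout and the helper names `generate` and `accepts` are assumptions for illustration.

```python
# Right-linear rules from the slide: node -> [(terminal, next node or None)].
# None marks a rule of the form A -> x, i.e. the end of the sentence.
GRAMMAR = {
    "S": [("the", "A")],
    "A": [("cat", "B"), ("mouse", "B"), ("duck", "B")],
    "B": [("bites", "C"), ("sees", "C"), ("eats", "C")],
    "C": [("the", "D")],
    "D": [("boy", None), ("girl", None), ("monkey", None)],
}

def generate(symbol="S"):
    """Yield every sentence derivable from `symbol`."""
    for terminal, nxt in GRAMMAR[symbol]:
        if nxt is None:
            yield terminal
        else:
            for rest in generate(nxt):
                yield f"{terminal} {rest}"

def accepts(sentence, start="S"):
    """Run the grammar as a finite-state automaton over the words."""
    states = {start}
    for word in sentence.split():
        states = {nxt for st in states if st is not None
                  for terminal, nxt in GRAMMAR[st] if terminal == word}
        if not states:
            return False
    return None in states

print(len(list(generate())))
print(accepts("the duck sees the girl"))
```

The language is finite here (3 × 3 × 3 sentences), which is a property of this particular grammar and not of regular grammars in general.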
Regular grammars. Grammar 1: A → a B; B → b A. Grammar 2: A → a; A → B a; B → A b. Grammar 3: A → a B; B → b A. Grammar 4: A → a; A → B a; B → b; B → A b. Grammar 5: S → a A; S → b B; A → a S; B → b b S; S → . Grammar 6: A → A a; A → B a; B → b; B → A b; A → a
Grammars: non-regular. Grammar 6: S → A B; S → b B; A → a S; B → b b S; S → . Grammar 7: A → a; A → B a; B → b A
Finite-State Automaton: states NP, NP1, NP2; transitions: NP --article--> NP1; NP1 --adjective--> NP1; NP1 --noun--> NP2
The same automaton as rules: NP → article NP1; NP1 → adjective NP1; NP1 → noun NP2
A parse tree S NP N root node VP V nonterminal nodes NP DET terminal nodes N
Rule Type – 2. Name: Context Free. Example: Phrase Structure Grammars / Push-Down Automata. Rule type: A → γ, with A = auxiliary node and γ = any number of terminal or auxiliary nodes. Recursiveness (centre embedding) allowed: A → α A β
CF Grammar. A Context Free grammar consists of: a) a finite terminal vocabulary VT; b) a finite auxiliary vocabulary VA; c) an axiom S ∈ VA; d) a finite number of context free rules of form A → γ, where A ∈ VA and γ ∈ (VA ∪ VT)*. In natural language syntax S is interpreted as the start symbol for sentence, as in S → NP VP
CF Grammars. The following languages cannot be generated by a regular grammar. Language 1: a^n b^n (e.g. ab, aabb). Language 2: mirror image (e.g. abaaba, abba). Context-Free rules: A → a b; A → b A b
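Both languages can be recognised by following self-embedding rules recursively. The sketch below assumes the rule sets S → a S b | a b for a^n b^n and S → a S a | b S b | a a | b b for the mirror-image language; these full rule sets and the function names are assumptions for illustration, since the slide shows only some of its rules.

```python
def is_anbn(s):
    """Recognise a^n b^n (n >= 1) via the assumed rules S -> a S b | a b."""
    if s == "ab":
        return True
    return len(s) > 2 and s[0] == "a" and s[-1] == "b" and is_anbn(s[1:-1])

def is_mirror(s):
    """Recognise mirror-image strings over {a, b} via the assumed rules
    S -> a S a | b S b | a a | b b."""
    if s in ("aa", "bb"):
        return True
    return (len(s) > 2 and s[0] == s[-1] and s[0] in "ab"
            and is_mirror(s[1:-1]))

for string in ["ab", "aabb", "aab", "abba", "abaaba", "abab"]:
    print(string, is_anbn(string), is_mirror(string))
```

Each recursive call strips one matched outer pair, which is the unbounded bookkeeping that a finite-state automaton lacks.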
Natural language: Is English regular or CF? If centre embedding is required, then it cannot be regular.
Centre Embedding:
1. [The cat] [likes tuna fish] = a b = ab
2. The cat the dog chased likes tuna fish = a a b b = aabb
3. The cat the dog the rat bit chased likes tuna fish = a a a b b b = aaabbb
4. The cat the dog the rat the elephant admired bit chased likes tuna fish = a a a a b b b b = aaaabbbb
Centre embedding: [S [NP the cat = a] [VP likes tuna = b]] = ab
[S [NP [NP the cat = a] [S [NP the dog = a] [VP chased = b]]] [VP likes tuna = b]] = aabb
[S [NP [NP the cat = a] [S [NP [NP the dog = a] [S [NP the rat = a] [VP bit = b]]] [VP chased = b]]] [VP likes tuna = b]] = aaabbb
Centre Embedding 1. [The cat][likes tuna fish] a b = ab 2. [The cat] [the dog] [chased] [likes tuna fish] a a b b = aabb
3. [The cat] [the dog] [the rat] [bit] [chased] [likes . . . ] a a a b b b = aaabbb 4. [The cat] [the dog] [the rat] [the elephant] [admired] [bit] [chased] [likes . . . ] a a a a b b b b = aaaabbbb
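The centre-embedded examples follow one pattern: n noun phrases followed by their n verb phrases in reverse order. A minimal sketch of that pattern, using the words from the examples, is given here; the list layout and the function name `centre_embed` are assumptions for illustration.

```python
# VERBS[i] is the verb phrase whose subject is NOUNS[i].
NOUNS = ["the cat", "the dog", "the rat", "the elephant"]
VERBS = ["likes tuna fish", "chased", "bit", "admired"]

def centre_embed(depth):
    """Return (sentence, a/b pattern) with `depth` nested subjects (1 to 4)."""
    noun_phrases = NOUNS[:depth]
    verb_phrases = VERBS[:depth][::-1]   # innermost clause closes first
    return " ".join(noun_phrases + verb_phrases), "a" * depth + "b" * depth

for depth in range(1, 5):
    print(centre_embed(depth))
```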
Natural language 2. More Centre Embedding: 1. If S1, then S2 (a a). 2. Either S3, or S4 (b b). 3. The man who said S5 is arriving today. 4. The man who said S6 is arriving the day after. Sentence with embedding: If either the man who said S5 is arriving today or the man who said S5 is arriving tomorrow, then the man who said S6 is arriving the day after = a b b a = abba
Natural language 2. More Centre Embedding: 1. If S1, then S2 (a a). 2. Either S3, or S4 (b b). Sentence with embedding: If either the man is arriving today or the woman is arriving tomorrow, then the child is arriving the day after.
a = [if
b = [either the man is arriving today]
b = [or the woman is arriving tomorrow]]
a = [then the child is arriving the day after]
= abba
CS languages. The following languages cannot be generated by a CF grammar (by pumping lemma): a^n b^m c^n d^m. Swiss German: a string of dative nouns (e.g. aa), followed by a string of accusative nouns (e.g. bbb), followed by a string of dative-taking verbs (cc), followed by a string of accusative-taking verbs (ddd) = aabbbccddd = a^n b^m c^n d^m
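Membership in a^n b^m c^n d^m is easy to test procedurally even though no CF grammar generates the language. The sketch below assumes n, m ≥ 1; it uses a regular expression only to split the string into blocks and then compares the block lengths. The function name `is_cross_serial` is hypothetical.

```python
import re

def is_cross_serial(s):
    """Check whether s has the form a^n b^m c^n d^m (assuming n, m >= 1)."""
    match = re.fullmatch(r"(a+)(b+)(c+)(d+)", s)
    if match is None:
        return False
    a_run, b_run, c_run, d_run = match.groups()
    # The a's pair with the c's and the b's pair with the d's (crossing).
    return len(a_run) == len(c_run) and len(b_run) == len(d_run)

print(is_cross_serial("aabbbccddd"))
print(is_cross_serial("aabbbcddd"))
```

The length comparison after the match is the step a regular expression alone cannot perform, and the crossing pairing is what puts the pattern beyond CF rules.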
Swiss German: Jan sait das (Jan says that) …
mer   em Hans    es Huus         hälfed   aastriiche
we    Hans/DAT   the house/ACC   helped   paint
"we helped Hans paint the house"
NPdat = a, NPacc = b, Vdat = c, Vacc = d; string = abcd


