
- Number of slides: 23
Phrase structures. Dr. K. Umaraj, Assistant Professor, Dept. of Linguistics, Madurai Kamaraj University, Madurai – 21
COMPUTER AND THE BRAIN • Variable / Constant / Values of a variable • Structure of the brain (right hemisphere and left hemisphere) • Aphasia • Lexicon and computational system: with a finite number of rules we can produce an infinite number of sentences (Chomsky) • Language Acquisition Device
COMPUTATIONAL LINGUISTICS • The goal of computational linguistics / Natural Language Processing (NLP) is to build computer software that will analyze, understand, and generate natural language like a human being. • There are two ways in which we can model a language: • Rule-based modeling of the natural
Rule Based • A rule is a statement phrased in a positive form that describes possible structures: structures that are claimed to be acceptable to a native speaker. Rules are a useful way to express generalizations about the data. • S = NP + VP • NP = Adjective + Noun • VP = V + NP • PP = P + NP
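The four rules above can be tried out with a small top-down recognizer. This is only a minimal sketch: the toy English lexicon, the category labels `ADJ`/`N`/`V`/`P`, and the function names `parse` and `is_sentence` are assumptions made for illustration and are not part of the slides.

```python
# Phrase-structure rules from the slide; each right-hand side is a sequence of symbols.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["ADJ", "N"]],
    "VP": [["V", "NP"]],
    "PP": [["P", "NP"]],
}

# Hypothetical toy lexicon (not from the slides): word -> part of speech.
LEXICON = {
    "big": "ADJ", "small": "ADJ",
    "dogs": "N", "cats": "N",
    "chase": "V", "near": "P",
}


def parse(symbol, tags, start):
    """Return every end position reachable by expanding `symbol` from `start`."""
    if symbol not in RULES:
        # Terminal category: it must match the tag found at `start`.
        if start < len(tags) and tags[start] == symbol:
            return {start + 1}
        return set()
    ends = set()
    for rhs in RULES[symbol]:
        positions = {start}
        for child in rhs:
            # Advance every partial analysis through the next child symbol.
            positions = {e for p in positions for e in parse(child, tags, p)}
        ends |= positions
    return ends


def is_sentence(words):
    """True if every word is in the lexicon and the tag sequence is a complete S."""
    if any(w not in LEXICON for w in words):
        return False
    tags = [LEXICON[w] for w in words]
    return len(tags) in parse("S", tags, 0)
```

With this lexicon, `is_sentence("big dogs chase small cats".split())` is accepted, while `is_sentence("dogs chase cats".split())` is rejected, because the slide's NP rule requires an adjective before the noun.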
Constraints • A constraint is a formal statement of structures that are impossible: structures that are claimed to be unacceptable to native speakers. • An adjective will not go with a verb • *(ku.NTaana) o.Tinaan • *(ku.NTaana paiyan oru) o.Tinaan (N + adj) is not possible • In English second case will not come in
SEMANTICS • I arrived the house • *I (arrived (the package)) • I killed the man • *I killed the water
CORPUS BASED • In corpus-based analysis, the system itself derives the rules. Advantages of corpus-based tagging: • consistent • more economical (fewer rules) • time saving (no need for a trained linguist) • more suitable for handling highly agglutinative languages
CORPUS BASED • Sandhi problems are tackled by providing stem and suffix alternants • A repeated procedure tackles the problems of conjoined words • Language independent • Can be used for grammatical tagging • Can be used for spell checkers • Can be used to check the grammaticality of
CORPUS BASED Methodology • 1) Look up the dictionary and POS for each word • E.g. nii nan.Raaka pa.Ti • Pronoun + Adverb + root verb/Noun • avan pa.Ti ee.Rukiraan • Pronoun + noun/root verb + verb • 2) Pick the most likely tag for this word.
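The two steps above (dictionary lookup, then picking the most likely tag) can be sketched as a most-frequent-tag baseline. The three-sentence tagged corpus, the tag labels, and the function names below are hypothetical and serve only as an illustration. The words follow the slide's transliteration.

```python
from collections import Counter, defaultdict

# Hypothetical hand-tagged toy corpus (assumption, not data from the slides).
TAGGED_CORPUS = [
    [("nii", "PRON"), ("nan.Raaka", "ADV"), ("pa.Ti", "V")],
    [("avan", "PRON"), ("pa.Ti", "N"), ("ee.Rukiraan", "V")],
    [("nii", "PRON"), ("pa.Ti", "V")],
]


def build_dictionary(corpus):
    """Step 1: collect, for each word, how often it occurs with each tag."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for word, tag in sentence:
            counts[word][tag] += 1
    return counts


def tag_most_likely(words, dictionary, unknown_tag="UNK"):
    """Step 2: give each word its most frequent tag, ignoring context."""
    tagged = []
    for word in words:
        if word in dictionary:
            tagged.append((word, dictionary[word].most_common(1)[0][0]))
        else:
            tagged.append((word, unknown_tag))
    return tagged


dictionary = build_dictionary(TAGGED_CORPUS)
result = tag_most_likely(["avan", "pa.Ti", "ee.Rukiraan"], dictionary)
```

In this toy data pa.Ti is tagged as a verb twice and as a noun once, so the baseline tags it as a verb even in the second example sentence, where it is a noun. That limitation is what context-sensitive taggers try to address.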
Types of Taggers • Bigram tagger • Makes predictions based on the preceding tag • The basic unit is the preceding tag and the current tag • Trigram tagger • We would expect more accurate predictions if more context is taken into account • N-gram tagger
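A bigram tagger of the kind described above can be sketched in a few lines. The version below is a simplified greedy variant that conditions on the preceding tag together with the current word and backs off to the word's most frequent tag. The toy corpus, the tag labels, the `<s>` start marker, and the function names are assumptions for illustration only.

```python
from collections import Counter, defaultdict

# Hypothetical hand-tagged toy corpus (assumption, not data from the slides).
TAGGED_CORPUS = [
    [("nii", "PRON"), ("nan.Raaka", "ADV"), ("pa.Ti", "V")],
    [("avan", "PRON"), ("pa.Ti", "N"), ("ee.Rukiraan", "V")],
]


def train_bigram(corpus):
    """Count tags per (preceding tag, word) context and per word (for backoff)."""
    context_counts = defaultdict(Counter)
    word_counts = defaultdict(Counter)
    for sentence in corpus:
        prev = "<s>"  # sentence-start marker
        for word, tag in sentence:
            context_counts[(prev, word)][tag] += 1
            word_counts[word][tag] += 1
            prev = tag
    return context_counts, word_counts


def tag_bigram(words, context_counts, word_counts, unknown_tag="UNK"):
    """Tag left to right, using the tag just assigned as context for the next word."""
    tagged, prev = [], "<s>"
    for word in words:
        if (prev, word) in context_counts:
            tag = context_counts[(prev, word)].most_common(1)[0][0]
        elif word in word_counts:
            tag = word_counts[word].most_common(1)[0][0]  # back off to the word alone
        else:
            tag = unknown_tag
        tagged.append((word, tag))
        prev = tag
    return tagged


context_counts, word_counts = train_bigram(TAGGED_CORPUS)
after_adverb = tag_bigram(["nii", "nan.Raaka", "pa.Ti"], context_counts, word_counts)
after_pronoun = tag_bigram(["avan", "pa.Ti", "ee.Rukiraan"], context_counts, word_counts)
```

Here pa.Ti is tagged as a verb after an adverb and as a noun after a pronoun, because the preceding tag is part of the lookup key. A trigram tagger would extend the key to the two preceding tags.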
Combination of Rule and Corpus based • Transformational (Brill taggers) • Combination of rule-based and corpus-based – Like rule-based because rules are used to specify tags in a certain environment – Like the corpus approach because machine learning is used, with a tagged corpus as input • Input: – tagged corpus – dictionary (with most frequent tags) • Usually constructed from the tagged corpus
Brill tagger • Basic idea: – Set the most probable tag for each word as a start value – Change tags according to rules of the type “if word-1 is a determiner and word is a verb then change the tag to noun”, applied in a specific order • Training is done on a tagged corpus: – 1. Write a set of rule templates – 2. Among the set of rules, find the one with the highest score – 3. Continue from step 2 until the lowest score threshold is passed
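The training loop described above can be sketched as follows. This is a simplified illustration and not Brill's full algorithm: it uses a single rule template (preceding tag, current tag, new tag), takes the score to be the net number of errors a rule removes, and works on one hypothetical four-word example. All names (`Rule`, `apply_rule`, `score`, `learn`) and the example tags are assumptions.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    prev_tag: str  # tag of word-1
    from_tag: str  # current tag of the word
    to_tag: str    # tag to change it to


def apply_rule(tags, rule):
    """Return a new tag list with `rule` applied from left to right."""
    out = list(tags)
    for i in range(1, len(out)):
        if out[i - 1] == rule.prev_tag and out[i] == rule.from_tag:
            out[i] = rule.to_tag
    return out


def score(rule, current, gold):
    """Net number of errors removed: errors before minus errors after the rule."""
    new = apply_rule(current, rule)
    before = sum(c != g for c, g in zip(current, gold))
    after = sum(n != g for n, g in zip(new, gold))
    return before - after


def learn(current, gold, candidates, threshold=1):
    """Repeatedly pick the best-scoring rule until its score drops below `threshold`."""
    learned, current = [], list(current)
    while candidates:
        best = max(candidates, key=lambda r: score(r, current, gold))
        if score(best, current, gold) < threshold:
            break
        current = apply_rule(current, best)
        learned.append(best)
    return learned, current


# Hypothetical example: "the run was long", where "run" starts with its most probable tag V.
initial = ["DET", "V", "V", "ADJ"]
gold = ["DET", "N", "V", "ADJ"]
candidates = [Rule("DET", "V", "N"), Rule("V", "V", "N")]
learned, final = learn(initial, gold, candidates)
```

In this example the slide's rule (a verb after a determiner becomes a noun) is the only one learned, and applying it reproduces the gold tags. The second candidate is never chosen because it introduces more errors than it fixes.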
Uses of POS tags • It helps to develop a Historical Grammar and Historical Dictionary for Tamil that presents the corpus from Sangam literature to Modern Tamil. It is a supplement for updating the current Tamil electronic corpus. It helps to develop a thesaurus for the Tamil language. • The entire lexical database (e_text and e_dictionary) can be used as develop further
• To test linguistic hypotheses. To develop the following NLP tools: • Text-to-Speech synthesizer (TTS) • Automatic Speech Recognizer (ASR) • Machine translation system • Text summarization • OCR, spell checkers, grammar checkers and other NLP tools
Procedure for doing POS tagging • In general, POS tagging may be carried out on a piece of text in three separate stages: (a) Stage 1: manual or automatic pre-editing of the text, (b) Stage 2: manual or automatic tag assignment to words, and (c) Stage 3: manual post-editing of the tagged text.
Tagset Different types of tagsets are available: 1) Flat tagset 2) Hierarchical tagset 3) BIS tagset. The following tagsets are used for the Tamil language: Ganesan’s POS tagset, Amrita POS tagset,
Tagset Rajendran’s ILMT tagset, Vasu Ranganathan’s Tamil tagset, LDC-IL-Tamil hierarchical tagset CIIL, IIIT Hyderabad tagset, Microsoft Research Labs Bangalore tagset
Tagset The 14 tags developed by CIIL Mysore are Pronoun (P), Demonstrative (D), Noun (N), Nominal Modifier (J), Verb (V), Adverb (A), Participle (L), Postposition (PP), Particle (C), Numeral (NUM), Reduplication (RDP), Residual (RD), Punctuation (PU), Unknown (UNK).
Tagset Pronoun (P), Demonstrative (D), Noun (N), Nominal Modifier (J), Verb (V), Adverb (A), Participle (L), Postposition (PP), Particle (C), Numeral (NUM), Reduplication (RDP), Residual (RD), Unknown (UNK), Punctuation (PU)
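A flat tagset such as the one listed above is easy to hold as a lookup table, which also makes it possible to check tagged text for labels outside the tagset. The sketch below is only an illustration: the names `CIIL_TAGS` and `invalid_tags` and the sample tagged words are assumptions.

```python
# The 14 tags listed on the slide, keyed by their abbreviations.
CIIL_TAGS = {
    "P": "Pronoun",
    "D": "Demonstrative",
    "N": "Noun",
    "J": "Nominal Modifier",
    "V": "Verb",
    "A": "Adverb",
    "L": "Participle",
    "PP": "Postposition",
    "C": "Particle",
    "NUM": "Numeral",
    "RDP": "Reduplication",
    "RD": "Residual",
    "UNK": "Unknown",
    "PU": "Punctuation",
}


def invalid_tags(tagged_words):
    """Return the (word, tag) pairs whose tag is not part of the tagset."""
    return [(word, tag) for word, tag in tagged_words if tag not in CIIL_TAGS]


# Hypothetical tagged input: "ADV" is not a tag in this tagset (the adverb tag is "A").
sample = [("avan", "P"), ("vantaan", "V"), ("nan.Raaka", "ADV")]
problems = invalid_tags(sample)
```

A check like this fits the manual post-editing stage, where stray or misspelled tags need to be found before the tagged text is used.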
Issues 1) en.Ru • In Modern Tamil, the word ‘en.Ru’ occurs in the following ways, so POS tags have to be assigned based on the syntax of the sentence. • 1. wh-adjective - Example: en.Ru nii vantaai ‘On which day did you come?’ • 2. Adverb - Example: tollai en.Ru tiirum? ‘When will this problem be solved?’ • 3. Conjunction
En.Ru • a. Quotative • Example: avan naa.Lai varuvan en.Ru connaan ‘He said that he will come tomorrow’. • b. Called • Example: Raman en.Ru oru aa.L iruki.Raara? ‘Is a person called Raman here?’ • c. Purposive
En.Ru • The subordinate verb ends with a future tense marker. Example: naan pa.Nattai unga.Lukku ko.Tukkaalam en.Ru ninaitteen ‘I thought I could give the money to you’ • d. Onomatopoeic Example: avan kutu ena oo.Tinaan ‘He ran quickly’ • e. Giving focus naan unakku en.Ru vaangiya peena?
Aka • Tamil has an adverbial suffix that can turn any word into an adverb, for example cuttamaaka (cleanly), cukamaaka (happily), nicaimaaka (surely), amaitiyaaka (silently). But there are other cases in Tamil where this adverbial marker ‘aga’ can come with a proper noun: avan raamanaka vantaan (he came as Lord Rama). If we tag it as adverb the important information like