lublin.ppt
- Number of slides: 36
Iryna Biskub Computational Linguistics: Past Imperfect – Future Indefinite?
Computational Linguistics:
- Computational Linguistics
- Computational Lexicography
- Computer Corpus Linguistics
Computational Linguistics
- Cognitive Science Approach
- Engineering Approach
Branches of Computational Linguistics
- Machine Translation
- Linguistic Information Retrieval Systems
- Man-Machine Interfaces
Information Retrieval Systems: Computational lexicons
- Associative Lexicon (Language Competence + Language Performance)
- Lexical Database (Word Knowledge)
- Lexical Knowledge Bank (Word Knowledge + World Knowledge)
The objectives of Computational Linguistics
1. Automatic tagging: automatically disambiguating part-of-speech labels in text.
Two main approaches to automatic tagging:
- Probabilistic taggers are trained on disambiguated text and vary as to how much training text is needed and how much human effort is required in the training process.
- Rule-based taggers rely strictly on linguistic categorization and the hierarchy of language levels.
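The probabilistic approach can be illustrated with a minimal sketch. It assumes a tiny hand-disambiguated training text (invented here for illustration) and the simplest possible model: each word receives the tag it carried most often in training, and unseen words fall back to a default tag. Real probabilistic taggers also use the tags of neighbouring words; the names `train`, `tag` and `TRAINING` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical hand-disambiguated training text: (word, tag) pairs.
TRAINING = [
    ("the", "DT"), ("sound", "NN"),
    ("a", "DT"), ("sound", "JJ"), ("plan", "NN"),
    ("the", "DT"), ("sound", "NN"),
    ("set", "VBD"), ("set", "VB"), ("set", "VBD"),
]

def train(tagged_words):
    """Count how often each word form was seen with each tag."""
    counts = defaultdict(Counter)
    for word, tag_label in tagged_words:
        counts[word.lower()][tag_label] += 1
    return counts

def tag(words, counts, default="NN"):
    """Give each word its most frequent training tag, or the default if unseen."""
    tagged = []
    for word in words:
        seen = counts.get(word.lower())
        best = seen.most_common(1)[0][0] if seen else default
        tagged.append((word, best))
    return tagged

MODEL = train(TRAINING)
TAGGED = tag(["the", "sound", "set", "peer"], MODEL)
```

With this training text, `sound` is tagged NN because it was seen twice as a noun and once as an adjective. More training text shifts such decisions, which is why the amount of disambiguated text matters for this family of taggers.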
Part-of-Speech Tags (COBUILD Corpus)
NOUN  macro tag: stands for any noun tag. walk/NOUN
VERB  macro tag: stands for any verb tag. dog@/VERB
NN    common noun. peer/NN
NNS   noun plural. needs/NNS will not show the word as a 3rd person singular verb.
JJ    adjective. sound/JJ will not show the word as a verb or noun.
DT    definite and indefinite article. This is used in word strings, as we shall see in Session 6. It gives a, an and the.
IN    preposition. This is used in word strings, when you want a word plus preposition.
RB    adverb. Is there an adverb derived from prohibit? prohibit*/RB Or from ration*/RB?
VB    base-form verb. trigger/VB or impact/VB
VBN   past participle verb. read/VBN: useful if studying passive or perfect aspect. And you can separate out adjectival uses.
VBG   -ing form verb. read/VBG: useful if studying continuous aspect. And you can separate out adjectival uses.
VBD   past tense verb. set can be present and past. set/VBD only shows concordances where it is a past tense verb.
CC    coordinating conjunction, e.g. and, but
CS    subordinating conjunction, e.g. while, because
PPS   personal pronoun, subject case, e.g. she, I
PPO   personal pronoun, object case, e.g. her, me
PPP   possessive pronoun, e.g. hers, mine
DTG   determiner-pronoun, e.g. many, all, both, some
POS Tags: BNC
APP$   possessive pronoun, pre-nominal (my, your, our)
AT     article (the, no)
AT1    singular article (a, an, every)
ATA    after-article (other, only)
BCS    before-conjunction (in order, even; preceding that, if, etc.)
BTO    before-infinitive marker (in order, so as; preceding to)
CC     coordinating conjunction (and, or)
CCB    coordinating conjunction (but)
CF     semi-coordinating conjunction (so, then, yet)
CS     subordinating conjunction (if, because, unless)
CSA    as as conjunction
CSH    the introducing comparative clauses
CSN    than as conjunction
CST    that as conjunction
CSW    whether as conjunction
DA     after-determiner (capable of pronominal function) (such, former, same)
DA1    singular after-determiner (little, much)
DA2    plural after-determiner (few, several, many)
DA2R   comparative plural after-determiner (fewer)
DA2T   superlative plural after-determiner (fewest)
DAR    comparative after-determiner (more, less)
DAT    superlative after-determiner (most, least)
DB     before-determiner (capable of pronominal function) (all, half)
DB2    plural before-determiner (capable of pronominal function) (both)
DD     determiner (capable of pronominal function) (any, some)
DD1    singular determiner (this, that, another)
DD2    plural determiner (these, those)
DDQ    wh-determiner (which, what)
DDQ$   wh-determiner, genitive (whose)
DDQV   wh-ever determiner (whichever, whatever)
JJ     general adjective
JJR    general comparative adjective (older, better, stronger)
JJT    general superlative adjective (oldest, best, strongest)
JK     adjective catenative (able in be able to, willing in be willing to)
LE     leading coordinator (both in both ... and, either in either ... or)
MC     cardinal number, neutral for number (two, three ...)
MC$    genitive cardinal number, neutral for number (10's, 100's)
MC-MC  hyphenated number (40-50, 1770-1827)
MC1    singular cardinal number (one)
MC2    plural cardinal number (tens, hundreds)
MD     ordinal number (first, second, next, last)
MF     fraction, neutral for number (quarters, two-thirds)
NC2    plural cited word (ifs in two ifs and a but)
ND1    singular noun of direction (north, southeast)
NN     common noun, neutral for number (sheep, cod, headquarters)
NN1    singular common noun (book, girl)
NN1$   genitive singular common noun (domini)
NN2    plural common noun (books, girls)
NNJ    organization noun, neutral for number (co., group)
NNJ1   organization noun, singular (no known examples)
NNJ2   organization noun, plural (groups, councils, unions)
NNL    locative noun, neutral for number (is.)
NNL1   singular locative noun (island, street)
NNL2   plural locative noun (islands, streets)
NNO    numeral noun, neutral for number (dozen, hundred)
NNO1   numeral noun, singular (no known examples)
NNO2   numeral noun, plural (hundreds, thousands)
NNS    noun of style or title, neutral for number (no known examples)
NNS1   noun of style or title, singular (mrs, president, rev)
NNS2   noun of style or title, plural (messrs, presidents)
NNSA1  following noun of style or title, abbreviatory (m.a.)
NNSA2  following plural noun of style or title, abbreviatory
The objectives of Computational Linguistics
2. Parsers
Computational linguistics uses parsers for the automatic analysis of language. The term 'parser' is derived from the Latin word pars meaning ‘part’, as in part of speech. Parsing in its most basic form consists in:
- the automatic decomposition of a complex sign into its elementary components;
- the automatic classification of the components via lexical lookup;
- the automatic composition of the classified components via syntactic rules in order to arrive at an overall grammatical analysis of the complex sign.
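The three steps of basic parsing can be sketched in a few lines. This is a toy illustration under stated assumptions, not a description of any particular parser: the lexicon, the three grammar rules (S -> NP VP, NP -> DT NN, VP -> VBD [NP]) and all function names are invented for the example.

```python
# Hypothetical toy lexicon: word form -> part-of-speech tag.
LEXICON = {"the": "DT", "a": "DT", "dog": "NN", "cat": "NN",
           "chased": "VBD", "slept": "VBD"}

def decompose(sentence):
    """Step 1: split the complex sign into its elementary components."""
    return sentence.lower().rstrip(".").split()

def classify(tokens):
    """Step 2: classify each component via lexical lookup."""
    return [(tok, LEXICON.get(tok, "UNK")) for tok in tokens]

def parse_np(tagged, i):
    """Rule NP -> DT NN. Returns (tree, next position) or (None, i)."""
    if i + 1 < len(tagged) and tagged[i][1] == "DT" and tagged[i + 1][1] == "NN":
        return ("NP", tagged[i], tagged[i + 1]), i + 2
    return None, i

def parse_vp(tagged, i):
    """Rule VP -> VBD NP | VBD. Returns (tree, next position) or (None, i)."""
    if i < len(tagged) and tagged[i][1] == "VBD":
        obj, j = parse_np(tagged, i + 1)
        if obj is not None:
            return ("VP", tagged[i], obj), j
        return ("VP", tagged[i]), i + 1
    return None, i

def parse(sentence):
    """Step 3: compose the classified components via rule S -> NP VP."""
    tagged = classify(decompose(sentence))
    np, i = parse_np(tagged, 0)
    if np is None:
        return None
    vp, j = parse_vp(tagged, i)
    if vp is None or j != len(tagged):
        return None  # no overall analysis covers the whole input
    return ("S", np, vp)

TREE = parse("The dog chased a cat.")
```

`parse` returns a nested tuple for sentences the toy grammar covers and `None` otherwise, so a failed analysis is distinguishable from a successful one.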
[Figure: Automatic Sentence Parsing, steps 1–3, LKB]
Grammatical Parsing: BNC
Fa, Fa&, Fa+   Adverbial clause; first conjunct of an adverbial clause; second conjunct of an adverbial clause
Fc, Fc&, Fc+   Comparative clause; first conjunct of a comparative clause; second conjunct of a comparative clause
Fn, Fn&, Fn+   Noun clause; first conjunct of a noun clause; second conjunct of a noun clause
Fr, Fr&, Fr+   Relative clause; first conjunct of a relative clause; second conjunct of a relative clause
G, G&, G+      Genitive; first conjunct of a genitive; second conjunct of a genitive
J, J&, J+      Adjective phrase; first conjunct of an adjectival phrase; second conjunct of an adjectival phrase
N, N&, N+      Noun phrase; first conjunct of a noun phrase; second conjunct of a noun phrase
Nn, Nn&, Nn+   Metalinguistic constituent; first conjunct of a metalinguistic constituent; second conjunct of a metalinguistic constituent
Nr, Nr&, Nr+   Temporal adverbial noun phrase; first conjunct of a temporal adverbial noun phrase; second conjunct of a temporal adverbial noun phrase
Nv, Nv&, Nv+   Non-temporal adverbial noun phrase; first conjunct of a non-temporal adverbial noun phrase; second conjunct of a non-temporal adverbial noun phrase
P, P&, P+      Prepositional phrase; first conjunct of a prepositional phrase; second conjunct of a prepositional phrase
S              Direct speech
S&, S+         Main conjuncts of compound sentence
Si, Si&, Si+   Interpolated or appended sentence; first conjunct of an interpolated or appended sentence; second conjunct of an interpolated or appended sentence
Tg, Tg&, Tg+   -ing clause; first conjunct of an -ing clause; second conjunct of an -ing clause
Ti, Ti&, Ti+   to + infinitive clause; first conjunct of a to + infinitive clause; second conjunct of a to + infinitive clause
Tn, Tn&, Tn+   Past participle clause; first conjunct of a past participle clause; second conjunct of a past participle clause
V, V&, V+      Verb phrase; first conjunct of a verb phrase; second conjunct of a verb phrase
Parsing: Example
0000001 --------------------------
0000001 010 To          03 [II] TO RP@
0000001 020 DANIEL      03 NP1
0000001 030 FINCH       03 [NP1] NN1
0000001 031 ,           03 ,
0000001 040 EARL        03 [NN1] NP1
0000001 050 OF          03 IO
0000001 060 NOTTINGHAM  06 [NN1] NP1 NNJ
0000001 061 .           03 .
0000002 001 --------------------------
0000002 010 9th         14 MD
0000002 020 January     03 NPM1
0000002 030 1702/3      13 MF
0000004 010 My          03 APP$
0000004 020 Lord        03 [NN1@] NP1 VV0@
0000005 010 I           03 [PPIS1] MC1 NP1@ ZZ1@
0000005 020 am          03 [VBM] RA@
0000005 030 Exceding    06 [VVG] NP1 NN1@ JJ@
0000005 040 Senceible   06 [JJ] NP1
0000005 050 That        03 [CST] DD1
0000005 060 I           03 [PPIS1] MC1 NP1@ ZZ1@
0000005 070 have        03 VH0
0000005 080 Given       03 [VVN] JJ@
0000005 090 her         03 [APP$] PPHO1
0000005 100 Majtie      06 [NN1@] NP1
0000005 110 and         03 CC
0000006 010 The         03 AT
0000006 020 Govornment  06 [NP1] NN1
0000006 030 Offence     06 [NP1] NN1@
0000006 031 ,           03 ,
0000006 040 and         03 CC
0000006 050 Severall    06 [NP1] NN1@ VV0@
0000006 060 Poor        03 JJ
0000006 070 and         03 CC
0000006 080 Some        03 DD
0000006 090 Inno        35 [JJ] NN1
0000007 010 cent        03 NNU1
0000007 020 People      03 NN
0000007 030 being       03 [VBG] NN1%
0000007 040 in          03 [II] RP@
0000007 050 Trouble     03 [NN1] VV0@
0000007 060 on          03 [II] RP@
0000007 070 my          03 APP$
0000007 080 Account     03 [NN1] VV0
Computational Lexical Frame
composition(pop): p1 + p2 + p3   - pop consists of three phonemes,
model(p1): /p/                   - the first is an instance of /p/,
model(p2): /o/                   - the second is an instance of /o/,
model(p3): /p/                   - the third is another instance of /p/.
model(pop): verb                 - pop is an instance of a verb
companion(verb): c               - verbs (e.g. pop) take a companion which precedes them.
referent(pop): pop*              - pop refers to pop*
model(pop*): event               - pop* is an instance of an event, and involves a ‘changer’, namely the referent of the companion c
[Figure: Probabilistic Modeling in Expert Parsing; states IS, S1, S2, S3, S4]
The objectives of Computational Linguistics
3. Word-sense disambiguation
Automatic word-sense disambiguation depends on the linguistic context encountered during processing. Statistical methods exploit the distributional characteristics of words in large texts and require training, which can come from several sources, including human intervention.
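A minimal sketch of the statistical idea follows. It assumes a handful of human-labelled example contexts for the ambiguous word "bank" (an example chosen here, not taken from the slides) and picks the sense whose training contexts share the most words with the new context. The sense labels, the stopword list and the function names are all hypothetical.

```python
from collections import Counter

# Hypothetical sense-labelled training contexts for the word "bank".
TRAINING = [
    ("finance", "deposit money in the bank account"),
    ("finance", "the bank approved the loan"),
    ("river", "fishing on the river bank"),
    ("river", "the bank of the stream was muddy"),
]
# Words ignored when comparing contexts (including the target word itself).
STOPWORDS = {"the", "in", "on", "of", "was", "a", "bank"}

def content_words(text):
    return [w for w in text.lower().split() if w not in STOPWORDS]

def train(examples):
    """Build one bag-of-words profile per sense from the labelled contexts."""
    profiles = {}
    for sense, text in examples:
        profiles.setdefault(sense, Counter()).update(content_words(text))
    return profiles

def disambiguate(context, profiles):
    """Choose the sense whose profile overlaps most with the context words."""
    words = content_words(context)
    scores = {sense: sum(profile[w] for w in words)
              for sense, profile in profiles.items()}
    # On a tie, max() returns the first sense encountered.
    return max(scores, key=scores.get)

PROFILES = train(TRAINING)
```

For instance, `disambiguate("we walked along the muddy bank of the river", PROFILES)` selects the `river` sense because "muddy" and "river" occur only in that sense's training contexts.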
The objectives of Computational Linguistics
4. Formal Semantics
- Formal semantics is rooted in the philosophy of language and has as its goal a complete and rigorous description of the meaning of sentences in natural language. It concentrates on the structural aspects of meaning.
- Lexical semantics has recently become increasingly important in natural language processing. This approach to semantics is concerned with psychological facts associated with the meanings of words. A very interesting application of lexical semantics is WordNet (G. Miller 1990), which is a lexical database that attempts to model cognitive processes.
WORDNET Lexical Database (G. Miller, Princeton University)
2 senses of task
Sense 1
undertaking, project, task, labor - (any piece of work that is undertaken or attempted; "he prepared for great undertakings")
=> work - (activity directed toward making or doing something; "she checked several points needing further work")
Sense 2
job, task, chore - (a specific piece of work required to be done as a duty or for a specific fee; "estimates of the city's loss on that job ranged as high as a million dollars"; "the job of repairing the engine took several hours"; "the endless task of classifying the samples"; "the farmer's morning chores")
=> duty - (work that you are obliged to perform for moral or legal reasons; "the duties of the job")
Practical Applications of Computational Linguistics
1. Indexing and retrieval in textual databases
- Textual databases electronically store texts such as publications of daily newspapers, medical journals, and court decisions.
- The user of such a database should be able to find, with comfort and speed, exactly those documents and passages which are relevant for the specific task in question.
- The World Wide Web (WWW) may also be viewed as a large, unstructured textual database, which daily demonstrates to a growing number of users the difficulties of successfully finding the information desired.
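Indexing and retrieval can be sketched with an inverted index, a common data structure for this task (the slides do not name a specific technique, so this is an assumption). The sketch maps each word to the set of documents containing it and answers a query by intersecting those sets. The document identifiers and texts are invented.

```python
from collections import defaultdict

# Hypothetical miniature textual database: document id -> text.
DOCUMENTS = {
    "news-1": "Court decisions on tax law published today",
    "med-7": "Medical journals report new heart treatment",
    "law-3": "The court reviewed earlier tax decisions",
}

def build_index(documents):
    """Map every lower-cased word to the set of documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return the documents that contain all words of the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

INDEX = build_index(DOCUMENTS)
```

Here `search(INDEX, "court tax")` returns the two legal documents. Matching is purely on word forms, so "decision" would not find "decisions"; that gap is one reason linguistic processing such as word form recognition is relevant to retrieval.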
Practical Applications of Computational Linguistics
2. Machine translation
Especially in the European Union, currently with 25 different countries, the potential utility of automatic or even semi-automatic translation systems is tremendous.
Practical Applications of Computational Linguistics
3. Automatic text production
Large companies which continually bring out new products such as engines, video recorders, farming equipment, etc., must constantly modify the associated product descriptions and manuals. A similar situation holds for lawyers, tax accountants, personnel officers, etc., who must deal with large amounts of correspondence in which most of the letters differ only in a few, well-defined places. Here techniques of automatic text production can help, ranging from simple templates to highly flexible and interactive systems using sophisticated linguistic knowledge.
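The simple-template end of that range can be sketched with Python's standard `string.Template`. The letter text, field names and sample record are invented for illustration; the point is only that letters differing in a few well-defined places can be generated from one template and a list of records.

```python
from string import Template

# Hypothetical letter template; $name, $product and $date are the
# few well-defined places in which the letters differ.
LETTER = Template(
    "Dear $name,\n"
    "Your order of $product will ship on $date.\n"
)

def produce(records):
    """Fill the template once per record; a missing field raises KeyError."""
    return [LETTER.substitute(record) for record in records]

LETTERS = produce([
    {"name": "Ms. Lee", "product": "a video recorder", "date": "3 May"},
])
```

A template of this kind has no linguistic knowledge: it cannot adjust articles, agreement or word order, which is where the more flexible systems mentioned above come in.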
Practical Applications of Computational Linguistics
4. Automatic text checking
Applications in this area range from simple spelling checkers (based on word form lists) via word form recognition (based on a morphological parser) to syntax checkers based on syntactic parsers which can find errors in word order, agreement, etc.
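The simplest case, a spelling checker based on a word form list, might look like the sketch below. The word list is a tiny invented sample, and the tokenization by regular expression is an assumption made for the example.

```python
import re

# Hypothetical word form list; every inflected form must be listed explicitly.
WORD_FORMS = {"the", "engine", "engines", "run", "runs", "well"}

def check_spelling(text, word_forms=WORD_FORMS):
    """Return the tokens of the text that are not in the word form list."""
    tokens = re.findall(r"[A-Za-z']+", text)
    return [tok for tok in tokens if tok.lower() not in word_forms]

ERRORS = check_spelling("The engin runs wel.")
```

Because the list has to contain every form separately ("engine" and "engines", "run" and "runs"), it grows quickly; word form recognition with a morphological parser addresses exactly that limitation.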
Practical Applications of Computational Linguistics
5. Automatic content analysis
The printed information on this planet is said to double every 10 years. Even in specialized fields such as natural science, law, or economics, the constant stream of relevant new literature is so large that researchers and professionals do not have nearly enough time to read it all. A reliable automatic content analysis in the form of brief summaries would be very useful. Automatic content analysis is also a precondition for concept-based indexing, needed for accurate retrieval from textual databases, as well as for adequate machine translation.
Practical Applications of Computational Linguistics
6. Automatic tutoring
There are numerous areas of teaching in which much time is spent on drill exercises such as the more or less mechanical practicing of regular and irregular paradigms in foreign languages. These may be done just as well on the computer, providing the students with more fun (if they are presented as a game, for example) and the teacher with additional time for other, more sophisticated activities such as conversation. Furthermore, these systems may produce automatic protocols detailing the most frequent errors and the amount of time needed for various phases of the exercise. This constitutes a valuable heuristic for improving the ergonomics of the automatic tutoring system.
Practical Applications of Computational Linguistics
7. Automatic dialog and information systems
These applications range from automatic information services for train schedules via queries and storage in medical databases to automatic tax consulting.
Natural Language Processing
Areas of Research:
1. Knowledge acquisition from natural language (NL) texts of various kinds, from interactions with human beings, and from other sources. Language processing requires lexical, grammatical, semantic, and pragmatic knowledge.
2. Interaction with multiple underlying systems to give NL systems the utility and flexibility demanded by people using them. Single application systems are limited in both usefulness and the language that is necessary to communicate with them.
3. Partial understanding gleaned from multi-sentence language, or from fragments of language. Approaches to language understanding that require perfect input or that try to produce perfect output seem doomed to failure because novel language, incomplete language, and errorful language are the norm, not the exception.
Natural Language Processing
The limitations of today's practical language processing technology may be summarized as follows:
1. Domains must be narrow enough so that the constraints on the relevant semantic concepts and relations can be expressed using current knowledge representation techniques, i.e., primarily in terms of types and sorts. Processing may be viewed abstractly as the application of recursive tree rewriting rules, including filtering out trees not matching a certain pattern.
2. Handcrafting is necessary, particularly in the grammatical components of systems (the component technology that exhibits least dependence on the application domain). Lexicons and axiomatizations of critical facts must be developed for each domain, and these remain time-consuming tasks.
Natural Language Processing
3. The user must still adapt to the machine, but, as the products testify, the user can do so effectively.
4. Current systems have limited discourse capabilities that are almost exclusively handcrafted. Thus current systems are limited to viewing interaction, translation, and writing and reading text as processing a sequence of either isolated sentences or loosely related paragraphs. Consequently, the user must adapt to such limited discourse.
Knowledge acquisition for language processing
Types of knowledge:
- Domain model. The major classes of entities in the domain and the relations among them must be specified. In a Navy command and control domain, example concepts are Naval unit, vessel, surface vessel, submarine, carrier, combat readiness ratings, and equipment classes. Class-subclass relationships must be specified, e.g., every carrier is a surface vessel, and every surface vessel is a vessel and a Naval unit. Other important relationships among concepts must be specified. For instance, each vessel has a single overall combat readiness rating, and each Navy unit has an equipment loadout (a list of equipment classes).
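A domain model of this kind can be sketched as two small tables, one for class-subclass links and one for the other relations. The sketch encodes only the facts stated above; the dictionary layout and the function names are assumptions made for illustration, not a description of any actual system.

```python
# Class-subclass relationships stated in the text.
IS_A = {
    "carrier": {"surface vessel"},
    "surface vessel": {"vessel", "naval unit"},
}

# Other relationships: which concept carries which property.
HAS = {
    "vessel": {"combat readiness rating"},
    "naval unit": {"equipment loadout"},
}

def ancestors(concept):
    """All superclasses reachable through the class-subclass links."""
    found = set()
    stack = list(IS_A.get(concept, ()))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(IS_A.get(current, ()))
    return found

def is_a(concept, other):
    """True if concept is the same as, or a subclass of, other."""
    return concept == other or other in ancestors(concept)

def relations_of(concept):
    """Properties of a concept, including those inherited from superclasses."""
    result = set()
    for c in {concept} | ancestors(concept):
        result |= HAS.get(c, set())
    return result
```

With these tables, `relations_of("carrier")` yields both the combat readiness rating and the equipment loadout, because a carrier inherits from vessel and from Naval unit through surface vessel.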
Knowledge acquisition for language processing
- Lexical syntax. Syntactic information about each word of the domain includes its part of speech (e.g., noun, verb, adjective, adverb, proper noun), its related forms (e.g., the plural of ship is the regular ships, but the plurals of sheep and child are the irregular sheep and children), and its grammatical properties (e.g., the verb sleep is intransitive).
- Lexical semantics. For each word, its semantics must be specified as a concept in the domain model, a relation in the domain model, or some formula made up of concepts and relations of the domain model.
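A lexicon entry carrying both kinds of information could be sketched as follows. The field names are hypothetical, the regular plural rule is deliberately simplified to adding "s", and the link from "ship" to the concept "vessel" is an assumption added for illustration rather than something the text states.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class LexEntry:
    lemma: str
    pos: str                                # part of speech
    irregular_plural: Optional[str] = None  # related form, only if irregular
    transitive: Optional[bool] = None       # grammatical property of verbs
    concept: Optional[str] = None           # lexical semantics: domain concept

LEXICON = {
    # Assumption for illustration: "ship" denotes the domain concept "vessel".
    "ship": LexEntry("ship", "noun", concept="vessel"),
    "sheep": LexEntry("sheep", "noun", irregular_plural="sheep"),
    "child": LexEntry("child", "noun", irregular_plural="children"),
    "sleep": LexEntry("sleep", "verb", transitive=False),
}

def plural(lemma):
    """Return the stored irregular plural, else apply the simplified rule."""
    entry = LEXICON[lemma]
    if entry.pos != "noun":
        raise ValueError(f"{lemma!r} is not a noun")
    return entry.irregular_plural or entry.lemma + "s"
```

Storing only the irregular forms and deriving the regular ones keeps the lexicon small, which matters when a lexicon has to be developed anew for each domain.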
Knowledge acquisition for language processing
- Mappings to the target application. Transformations specify how to map each concept or relation of the domain model into an appropriate piece of code for the underlying application system.
Linguistic Means of Knowledge Interpretation
- Grammar rules. Most rules of English grammar are domain independent, but almost every domain encountered in practice either turns up instances of general rules that had not been encountered in previous domains, or requires that some domain-specific additions be made to the grammar.
- General semantic interpretation rules. Some semantic rules may be considered to be domain independent, such as the general entity/property relationship that is often expressed with the general verb "have" or the general preposition "of." To the extent that such general rules can be found and embedded in a system, they do not have to be redone for every new domain.
The Rule of LDA
- The success of all current NLP systems depends on the so-called Limited Domain Assumption, which may be stated as follows: one does not have to acquire domain-dependent information about words that do not denote some concept or relation in the domain.
- Another way of looking at this assumption is that it says understanding can be confined to a limited domain.
The Rule of LDA
The Limited Domain Assumption simplifies the problem of NLP in three ways:
(1) formal modelling of the concepts and relationships of the domain is feasible;
(2) enumeration of critical non-linguistic knowledge is possible;
(3) both lexical and semantic ambiguity are limited.
Reducing lexical ambiguity reduces the search space and improves the effectiveness of most NL systems.
The Bank of English by COBUILD KWIC software
British National Corpus: SARA software
A6B 670   Such cries traditionally used to promote fertility occur ironically in The Waste Land, punctuating its infertile or sexually perverted world where previous rituals and beliefs seem to be lapsing into futility.
A6S 1036  Secondly, even though group weddings do occur, for example among people like the Samburu of East Africa, where traditionally all the young men of the same age group married on the same day a group of girls, this does not mean that the marriages are any less individual affairs for having been celebrated all at the same time.
A8W 415   Although they were not explored during the post-summit press conference, many additional items were discussed by Mr Bush and Mr Gorbachev, including the traditionally vexed question of Star Wars, the Strategic Defence Initiative.
A9E 200   The flight from the countryside has compounded the misery of the urban poor, traditionally the bedrock of Sandinista support.
ABB 1821  COOK'S NOTE: This clam, or vongole, sauce is traditionally served with spaghetti.
ABC 1302  Swordfish is a great delicacy in Europe, and has been traditionally caught on longlines, although catch rates have been declining in recent years.
AK6 1211  Lamb, representing the innocence of Christ, is traditionally eaten at Easter.
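Concordance lines like those for "traditionally" are usually displayed in keyword-in-context (KWIC) form, with the search word aligned in a centre column. The sketch below is a generic illustration of that layout and makes no claim about how the COBUILD or SARA software works internally; the function name and the `width` parameter are hypothetical.

```python
def kwic(text, keyword, width=30):
    """Return one line per occurrence of keyword, with the keyword aligned
    in a centre column and `width` characters of context on each side."""
    tokens = text.split()
    lines = []
    for i, token in enumerate(tokens):
        # Compare without surrounding punctuation and without case.
        if token.strip(".,;:!?\"'").lower() == keyword.lower():
            left = " ".join(tokens[:i])[-width:]
            right = " ".join(tokens[i + 1:])[:width]
            lines.append(f"{left:>{width}}  {token}  {right:<{width}}")
    return lines

SAMPLE = ("Lamb is traditionally eaten at Easter. "
          "This sauce is traditionally served with spaghetti.")
LINES = kwic(SAMPLE, "traditionally", width=12)
```

Printing `LINES` one per row shows the keyword in the same column on every line, which makes recurring left and right collocates easy to spot by eye.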
Web Concordancer, Hong Kong (Author concordancing), “.txt” format