Computational Lexicography. 4th year, Semester 8
Lecture 1. Computational Lexicography as a Branch of General Lexicological Science
Ever since the well-known dictum that "in the beginning was the word", the word has always been basic to human understanding and communication. Facts about the word are recorded in a dictionary, whose making has been undertaken for centuries by the lexicographer.
Modern Dictionary Making
Nowadays dictionary-making requires a number of new means, especially the computer, which offers its large resources for the storage, analysis, dissemination and exchange of data. It also permits increased sophistication in database design and linguistic research. Thus the procedure of dictionary-making now prominently includes the work of linguists, engineers and computer scientists.
Sciences: Interrelation
- Linguistic research on the theory and nature of the word is concerned with the nature of the lexicon.
- Lexicology in general aims at the analysis of the lexicon.
- Lexicography, popularly known as dictionary-making, is concerned with the description of the lexicon.
If these two related disciplines combine their efforts with the computer for both lexicon-building and dictionary-making, we will have a new science, computer lexicography, which represents a convergence of interest from the viewpoints of Computational Linguistics, Computational Lexicography and Computer Corpus Linguistics.
Sciences: Tasks
- The task of Computational Linguistics is to specify lexicons which are formal (i.e. explicit enough for computers) and rich enough for building natural language processing systems. Such lexicons, however, may not necessarily be suitable for human consumption.
- The term Computational Lexicography refers either to using the computer to achieve fully automatic lexicographical tasks, or to converting existing machine-readable versions of linguistic dictionaries into a format explicit enough for computational linguistic systems.
- Computer Corpus Linguistics focuses on the principles and practice of compiling texts of actual language in use.
CL: Tasks
Thus we can distinguish the following main tasks of Computational Lexicography:
- lexicon extraction and building;
- lexicon-based language modelling;
- computational storage of the lexicon (see the sketch below);
- the employment of richer lexicons for natural language processing systems;
- defining standards for lexical exchange and reusability, so that individual efforts can be maximised.
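To make the storage task concrete, here is a minimal sketch in Python of how a single lexical entry might be held as a structured record rather than as a line of print. The field names (lemma, pos, pronunciation, senses, features) are illustrative assumptions, not an established exchange standard.

```python
# A minimal sketch of "computational storage of the lexicon":
# one entry held as a structured record. Field names are invented.
from dataclasses import dataclass, field

@dataclass
class LexicalEntry:
    lemma: str                                    # citation form of the word
    pos: str                                      # part of speech
    pronunciation: str                            # e.g. an IPA transcription
    senses: list = field(default_factory=list)    # one gloss per sense
    features: dict = field(default_factory=dict)  # grammatical features

entry = LexicalEntry(
    lemma="lexicon",
    pos="noun",
    pronunciation="ˈlɛksɪkən",
    senses=["the vocabulary of a language"],
    features={"number": "singular"},
)
print(entry.lemma, "->", entry.senses[0])
```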
The Lexicon in Computational Lexicography
Contemporary linguistic theories now place an ever-greater reliance on the lexicon, because the lexicon may be viewed as the central repository of linguistic knowledge. For the computational linguist the lexicon is the "bottleneck" of natural language processing systems. Work in this area includes manipulating machine-readable versions of printed dictionaries and transforming them into computational lexicons. Storing a dictionary in a lexical data/knowledge base in this way allows searching for lexical information beyond the perspective of the mere printed dictionary, as well as creating various lexicons when needed.
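As a hedged illustration of that transformation, the sketch below parses one printed-style entry line into a structured record. The entry format here is invented; real machine-readable dictionaries (e.g. publishers' typesetting tapes) are far less regular.

```python
import re

# One invented printed-style entry line: headword /pron/ pos. senses
RAW = "bank /bæŋk/ n. 1. land alongside a river. 2. a financial institution."

def parse_entry(line):
    """Split an invented 'headword /pron/ pos. senses' line into a record."""
    head, pron, pos, body = re.match(
        r"(\w+) /([^/]+)/ (\w+)\. (.+)", line).groups()
    # Numbered sense markers ("1.", "2.", ...) delimit the glosses.
    senses = [s.strip() for s in re.split(r"\s*\d+\.\s*", body) if s.strip()]
    return {"lemma": head, "pron": pron, "pos": pos, "senses": senses}

print(parse_entry(RAW))
# {'lemma': 'bank', 'pron': 'bæŋk', 'pos': 'n',
#  'senses': ['land alongside a river.', 'a financial institution.']}
```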
Developing Notions of the Lexicon
- In its early days the lexicon was defined merely as "a dictionary, a book teaching the signification of words".
- Nowadays the lexicon is generally understood as "the vocabulary of a language, especially in dictionary form, offering various types of linguistic information". D. Crystal also called it lexis.
- The word "lexicon" can be treated differently. A useful distinction may be made between the lexicon as 'an object defined by linguistic theory' and the dictionary, which presents certain information drawn from the lexicon in a stylized way.
Lexicon: Definitions
- George Grimes describes the lexicon as 'simply the totality of all the information about words and word-like objects in a natural language; it registers items and their properties, in contrast to the grammar, which registers combinations of items and their properties' (1988).
- Paul Bennet makes a distinction between a grammar (i.e. a set of rules for the formation of meaningful and well-formed sentences) and a lexicon (i.e. a set of words and expressions whose use is governed by those rules) (1986).
Grimes' definition of the lexicon is interesting since it raises the question of whether a theory-neutral lexicon is possible to create. His definition also concerns the problem of what the lexicon should contain, since individual lexicons will have their own specifications, depending on the purpose for which they were built.
Lexicon: Definitions
A more recent definition was suggested by I. Mel'cuk (1992). He views the lexicon as 'a specific list of lexical units of a language, arranged in a specific way and supplied with specific information, the whole being designed for a specific purpose'.
Conclusion: the lexicon has to be discussed through its relation to the grammar, since what precisely constitutes grammatical and lexical facts respectively continues to be a matter of debate.
Bloomfield's Definition
Within the framework of American structural linguistics the lexicon was treated as a peripheral component in relation to grammar, as illustrated by L. Bloomfield's statement that 'the lexicon is really an appendix of the grammar, a list of basic irregularities', whereas grammar was treated as 'the meaningful arrangement of forms of a language' (1933).
Chomsky's Definition
The lexicon was conceptualized as an independent component in linguistic theory by Noam Chomsky, one of the most influential linguists of the century. However, in his theory lexical facts were not only said to be of a different type from general facts, but the lexicon was also still viewed as a 'wastebin' into which irregular items went, whereas regular variations are not matters for the lexicon, which should contain only idiosyncratic items (Chomsky, 1968).
Chomsky's Definition
Chomsky suggests the differentiation between 'Internalized Language' (I-language), i.e. mental knowledge of the language, assuming that this occurs in a homogeneous speaker-hearer community (also called language competence), and 'Externalized Language' (E-language), i.e. everyday speech and writing (newspapers, televised speeches, dialogues, etc.), also called language performance.
Associative Lexicon
- The relation between the lexicons of E-language and I-language may be formulated in terms of the Associative Lexicon. The term was suggested by A. Makkai in 1980.
- An Associative Lexicon is an information retrieval system that represents in visual and audible form the knowledge native speakers possess about the lexicon of their language.
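As an illustration of this idea (a minimal sketch, not Makkai's actual system), the core of such a retrieval system can be modelled as a weighted graph of associations between lexemes. The words and association strengths below are invented.

```python
# A minimal sketch of an associative lexicon as a weighted association
# graph. Example words and weights are invented for illustration.
from collections import defaultdict

class AssociativeLexicon:
    def __init__(self):
        # word -> {associated word -> association strength}
        self.links = defaultdict(dict)

    def associate(self, word, other, strength):
        # Store each association symmetrically.
        self.links[word][other] = strength
        self.links[other][word] = strength

    def associations(self, word):
        # Return a word's associated lexemes, strongest first.
        return sorted(self.links[word].items(),
                      key=lambda pair: pair[1], reverse=True)

al = AssociativeLexicon()
al.associate("bread", "butter", 0.9)
al.associate("bread", "bakery", 0.6)
al.associate("bread", "knife", 0.3)
print(al.associations("bread"))
# [('butter', 0.9), ('bakery', 0.6), ('knife', 0.3)]
```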
Associative Lexicon and the Human Brain
The human brain is, naturally, the primary 'information retrieval system' activating our ability to associate lexemes with one another. Any artificial system we may build must, therefore, try to do justice to what there is in human socio-psychological reality. The natural Associative Lexicons we carry in our heads are dialectally and sociolinguistically limited. They are subject to growth and shrinkage due to learning and forgetting.
Associative Lexicon: Features
The Associative Lexicon represents the cumulative knowledge of geographic and sociolinguistic dialects. It indicates that members of various speech communities have the ability to learn from one another, either by memorization or by immigration. The AL is not to be linked to the ideal hearer-speaker in a homogeneous society, because such people do not exist.
Associative Lexicon vs. Printed Dictionary
The differences between a printed dictionary and the AL are that:
- conventional dictionaries tend to form natural semantic nets around concretely observable and abstract entities, while the AL aims at building associative groups of lexemes;
- conventional dictionaries traditionally rely on alphabetization, by which they try to present the totality of the available lexis in the form of a list, while the AL represents the set of lexemes according to their frequency of usage, exact range of dialectal habitat, the speaker's sociological status, etc.
Associative Lexicon: Advantages
- From its creation the AL was intended not to be printed but computerized, because a computerized lexicon offers various non-alphabetic paths of access to the word according to linguistic features of classification (e.g. phonetic, grammatical and semantic). Access can also be based on the word's associative or semantic interconnections with other words (see the sketch below).
- Hence, the course of Computational Lexicography emphasizes the importance of storing the lexicon in a computer format. Storing the lexicon in this format allows for flexibility in its retrieval and reduces the number of problems associated with its organization.
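As a hedged illustration of such non-alphabetic access paths, the sketch below builds separate indexes over a toy lexicon by a phonetic, a grammatical and a semantic feature. The entries and the feature keys (rhyme, pos, field) are invented for the example; a real system would use proper phonetic transcriptions and sense inventories.

```python
# A minimal sketch of non-alphabetic access paths into a lexicon.
entries = [
    {"lemma": "cat",  "pos": "noun", "rhyme": "-at",  "field": "animals"},
    {"lemma": "hat",  "pos": "noun", "rhyme": "-at",  "field": "clothing"},
    {"lemma": "run",  "pos": "verb", "rhyme": "-un",  "field": "motion"},
    {"lemma": "walk", "pos": "verb", "rhyme": "-alk", "field": "motion"},
]

def build_index(entries, feature):
    """Group lemmas by an arbitrary classification feature."""
    index = {}
    for e in entries:
        index.setdefault(e[feature], []).append(e["lemma"])
    return index

by_rhyme = build_index(entries, "rhyme")  # a phonetic path of access
by_pos = build_index(entries, "pos")      # a grammatical path
by_field = build_index(entries, "field")  # a semantic path

print(by_rhyme["-at"])     # ['cat', 'hat']
print(by_field["motion"])  # ['run', 'walk']
```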
The Trend Towards Lexicalism
Lexicalism, i.e. the tendency to shift linguistic explanation from facts about constructions to facts about words, may be said to have started with N. Chomsky. It emphasizes that the transformational rules within the grammar are unsuitable for explaining the relations between partially analogous structures, e.g.: They destroyed Pompeii. / Their destruction of Pompeii.
Analysis
The lexical information for destroy should include subcategorization features which allow for an object noun phrase (NP) such as Pompeii. The task of the lexicon is then to specify either the nominal form destruction (when the word heads an NP) or the verbal form destroy (when it heads a VP). The relations between destroy and destruction can thus be explained not by means of the transformational component but rather in terms of the lexicon.
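A minimal sketch of this lexicalist treatment: the lexicon below records destroy and destruction as related entries, each with its own subcategorization frame, so the relation is stated in the lexicon rather than derived by a transformation. The frame labels (NP_object, PP_of) are an invented simplification of real subcategorization notation.

```python
# A sketch of the destroy/destruction relation as lexical entries
# with subcategorization frames; the notation is invented.
lexicon = {
    "destroy": {
        "category": "V",
        # The verb subcategorizes for an object NP:
        # "They destroyed Pompeii."
        "subcat": ["NP_object"],
        "nominalization": "destruction",
    },
    "destruction": {
        "category": "N",
        # The noun takes its argument as an of-PP:
        # "Their destruction of Pompeii."
        "subcat": ["PP_of"],
        "verb_base": "destroy",
    },
}

# Both entries point at each other, so the partial analogy between the
# two structures is captured lexically, without a transformational rule.
print(lexicon["destroy"]["nominalization"])   # destruction
print(lexicon["destruction"]["verb_base"])    # destroy
```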
Lexicalism: Trends
Nowadays lexicalism has moved towards a more thoroughgoing shift from the grammar to the lexicon. It is, however, only one of several parallel trends noticeable in linguistics since 1980. They are:
- Wholism: the tendency to minimize the distinction between the lexicon and the rest of the grammar.
- Trans-constructionism: the tendency to reduce the number of rules that are specific to just one construction.
- Poly-constructionism: the tendency to increase the number of particular constructions that are recognized in grammar.
Lexicalism: Trends
- Relationism: the tendency to refer explicitly to grammatical relations, and even to treat them as primary in relation to constituent structure.
- Mono-stratalism: the tendency to reject the transformational idea that a sentence has several layers of syntactic structure, holding instead that it can be shown in a single structural representation.
- Cognitivism: the tendency to emphasize the similarities and continuities between linguistic and non-linguistic knowledge.
- Implementationism: the tendency to implement grammars in terms of computer programs.
Lexicalism and Word Grammar
- Lexicalism at its beginning failed to define precisely what the lexicon is and how it differs from the grammar. In 1990 there appeared Word Grammar, introduced by R. Hudson, which reinterprets lexicalism as an approach to grammar in which words are basic and the boundary around the lexicon plays no part.
- In this view the word assumes central importance, because it is universally recognized as the unit the grammar describes and as the boundary between morphology (inside the word) and syntax (relations between words). For Hudson the word is where internal structure is most arbitrary relative to meaning, so it is the unit in whose recognition memory plays the biggest part.


