corpus linguistics. lecture 01. INTRODUCTION.ppt
- Number of slides: 32
INTRODUCING CORPUS LINGUISTICS Spring semester 2012
How many linguistics?
- Structural linguistics
- Functional linguistics
- Cultural linguistics, (inter)linguoculturology
- Generative linguistics, transformational linguistics
- Cognitive linguistics
- Pragmalinguistics (pragmasemantics)
- Text/discourse linguistics
- Quantitative/quantificational linguistics
- Computational linguistics
- Applied linguistics
- Psycholinguistics
Criteria of differentiation
- Object of analysis:
  - Language: linguistic elements, linguistic structures
  - Speech: speech samples
  - Communication: speech production situated in a context
- Method:
  - empirical research
  - computational analysis
- Purposes:
  - find out the regularities and principles of the object's structure and functioning
  - come up with a generalization, a speculative scheme or a conception
Structure of science: theoretical, experimental, applied
Evidence-based research in medicine
Evidence-based linguistics (доказательная лингвистика). R. G. Piotrowski (1922-2009)
Which linguistics?
Armchair linguistics vs. empirical linguistics
- Armchair linguists: drinking strong coffee and making interesting observations about language from the comfort of one's own armchair; preoccupied with speakers' hazy "intuitions" about language structure.
- Empirical linguists: experiments; new findings and insights about the nature of language emerging from investigations of real-life speech and writing, often, though not always, using computers and electronic language samples ("corpora").
Sampson G. Empirical Linguistics. London & New York: Continuum International, 2001.
Methods
- Experiment: linguistic, psycholinguistic
- Validity evaluation: how valid are your intuitive judgements?
- Corpus linguistics
What is a CORPUS (plural: CORPORA)?
- = an electronic database of register-specific text/speech samples produced by language users in actual situations of communication (oral/written samples)
- = a computer database
Why is it good?
- Before: compiling and searching substantial bodies of data to confirm or challenge one's own intuitions as a language user was a long and arduous process
- After: vast bodies of data are available at the touch of a keyboard and the click of a mouse; corpus data guarantee relevance and authenticity
Comparing intuitions and corpus data
- BIG collocates with nouns meaning…
- LARGE collocates with nouns meaning…
- What can be BIG? What can be LARGE?
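The BIG vs. LARGE comparison above can be tried on any text sample. Below is a minimal sketch in Python, assuming a tiny invented sample instead of a real corpus; the function name `right_collocates` and the sample sentences are hypothetical and serve only as an illustration.

```python
import re
from collections import Counter

# Toy sample standing in for a real corpus; the sentences are invented.
SAMPLE = (
    "It was a big mistake and a big surprise. "
    "A large number of students took a large amount of time. "
    "She made a big decision about a large proportion of the budget."
)

def right_collocates(text, node):
    """Count the word immediately to the right of each occurrence of `node`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens[:-1]):
        if tok == node:
            counts[tokens[i + 1]] += 1
    return counts

big = right_collocates(SAMPLE, "big")
large = right_collocates(SAMPLE, "large")
print("big   ->", big.most_common())
print("large ->", large.most_common())
```

On a real corpus the same count would be run over millions of tokens and restricted to nouns by part-of-speech tags; this sketch skips tagging.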
Definition?
Linguistic corpus: a set of 'texts' (samples of oral and written speech) that is
- electronic
- unified
- structured
- annotated
- philologically competent
More criteria
- A set of texts compiled into a corpus according to certain principles, annotated according to a certain standard, and provided with a search tool (corpus manager)
Corpora and corpora
- 'First-order corpus': a random set of texts, or a collection of texts belonging to a genre, an author, a period, etc.
  - PURPOSE: reference
- 'Second-order corpus': linguistically annotated, representative, principled, compiled for a specific purpose
  - PURPOSE: research
Electronic libraries (random list)
- Perseus corpus of Latin texts
- Corpus of F. M. Dostoevsky's texts
- Brockhaus and Efron electronic encyclopedia
- Fundamental Electronic Library
- Russian Virtual Library
- M. Moshkov's Library
- Electronic library of the MSU Faculty of Chemistry
- … etc.
Corpus vs. electronic library
Linguistic corpus | Electronic library
Samples of texts | Full texts
Linguistic annotation (tagging) | Bibliographic + historical/cultural data
Linguostatistics | No statistical data
Relative representativeness | Full texts
Selection of linguistic material on the basis of certain criteria (representativeness, linguistic relevance, etc.) | Text selection is up to the library compilers
Corpus linguistics
A branch of computational linguistics dealing with the development and use of linguistic corpora on the basis of computer technologies
Corpus research is based on…
- Concepts of empirical linguistics
- Postulates of functional linguistics
- Contextual approach to meaning
- The dynamics of linguistic norm
- Studies of variation
Norm vs. usage
- How many norms?
- Language is NOT a rigid system
- Asymmetric dualism of the linguistic sign
- Dynamic/functional categorization of linguistic units
Dynamics of usage: variation and tendencies
- Morphological forms; syntactic combinability patterns (N's N vs. N N; help smb do sth vs. help smb to do sth)
- Derivational forms (classic example vs. classical example)
- Combinability (have never been to vs. have never been in; utter joy vs. sheer joy; mental illness vs. mental disease)
- Syntactic function (e.g. predicative vs. attributive use of adjectives)
Dialectics of linguistic norm “One should know canonized rules to be able to violate them in speech” Prof. N. A. Kobrina
Linguistic corpora
- Brown Corpus
- Lancaster-Oslo/Bergen Corpus (LOB)
- British National Corpus, http://www.natcorp.ox.ac.uk
- International Corpus of English
- Bank of English
- Scottish corpus (SCOTS)
- COBUILD Corpus
- Mannheim Corpus of German
- Czech National Corpus
- Uppsala Corpus of Russian
- Russian National Corpus, www.ruscorpora.ru
- Corpora of Chinese, Turkish, Estonian, Albanian and many other languages
End-users
- Applied linguists
- Lexicographers
- Theoretical linguists
- Language teachers
- Computational linguists
- Other language-related experts
- Experts in social sciences
- Language learners
- Novice students
Corpora as a basis for speech production analysis, etc.
Multipurpose research based on corpora
A corpus provides information on the following:
- Frequency data
- Frequency fluctuations
- Fluctuations of contextualized usage
- Differences in register usage
- Differences in individual usage ('author style')
- Combinability/collocability peculiarities
- Comparison of collocation patterns in different languages (for translation purposes: parallel corpora)
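Frequency data and register differences are usually compared as normalized rates, since subcorpora differ in size. Below is a minimal sketch in Python, assuming invented hit counts and subcorpus sizes; the numbers and the helper name `per_million` are hypothetical and do not come from any real corpus.

```python
def per_million(raw_count, corpus_size):
    """Normalize a raw frequency to occurrences per million words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return raw_count * 1_000_000 / corpus_size

# Hypothetical counts for one word in two register subcorpora (invented numbers).
registers = {
    "spoken": {"hits": 450, "size": 10_000_000},
    "academic": {"hits": 1200, "size": 15_000_000},
}

normalized = {name: per_million(r["hits"], r["size"]) for name, r in registers.items()}
for name, value in normalized.items():
    print(f"{name}: {value:.1f} per million words")
```

Comparing raw hit counts instead would overstate usage in whichever subcorpus happens to be larger.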
LINGUISTIC DATA featuring
- systemic features of the language
- dynamic aspects of systemic parameters
- discourse peculiarities (context of usage / register specificity)
- pragmatic functions of linguistic units
Requirements for the credit (1)
- Understand the principles of empirical, evidence-based linguistics.
- Know the history of corpus linguistics.
- Understand the tasks, possibilities and limitations of corpus analysis.
- Be familiar with types of corpora and corpus design principles.
- Be familiar (on an operational level) with the terminology (the list of the most commonly used terms and abbreviations is attached).
Requirements for the credit (2)
- Operational skills in corpus analysis:
  - be competent in using corpora: British National Corpus, Corpus of Contemporary American English, Corpus of Historical American English, Scottish corpus (SCOTS), Russian National Corpus on different websites (http://corpus.byu.edu; http://bncweb.lancs.ac.uk, www.ruscorpora.ru, etc.)
  - be competent in creating a search string / filling in search boxes:
    - lexical combinability (left/right)
    - KWIC search
    - grammatical (part of speech) combinability (left/right)
    - combinability of morphological forms
    - register limitations on combinability
    - frequency data
    - lemma
    - synonymy
    - combinability of synonyms
    - register frequency of synonyms
    - diachronic characteristics
  - be able to analyze the data
- PROJECT: choose a problematic issue (lexical, morphological, syntactic aspect) and analyse it with the help of the corpora.
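A KWIC (key word in context) search lists every hit of a search word with a few words of left and right context. Below is a minimal sketch in Python of what such a search does, assuming a short invented sample; the function name `kwic`, the `width` parameter and the sample text are hypothetical, and real corpus interfaces offer far richer query options.

```python
import re

def kwic(text, node, width=3):
    """Return (left context, node, right context) for each hit of `node`."""
    tokens = re.findall(r"\w+", text.lower())
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

# Invented sample text, used only to show the output layout.
SAMPLE = "She felt sheer joy at the news. It was sheer luck that he won."
for left, node, right in kwic(SAMPLE, "sheer"):
    print(f"{left:>20} [{node}] {right}")
```

Aligning the search word in one column makes the right-hand collocates (here the nouns after "sheer") easy to scan.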
Course plan (18 hrs)
1. OVERVIEW:
  - historical background and reasons for switching to corpus analysis
  - corpora presentation
2. CORPUS: theoretical foundations, design, types
3. CORPUS DESIGN: TECHNICAL DEFINITIONS
4. HOW CAN CORPORA HELP IN LEXIS RESEARCH
  - case studies
  - practice
5. HOW CAN CORPORA HELP IN STUDYING GRAMMAR
  - aspects of analysis
  - practice
6. HOW CAN CORPORA HELP IN STUDYING MORPHOLOGY
  - aspects of analysis
  - practice
7. HOW CAN CORPORA HELP IN STUDYING SYNTAX
  - case studies
  - practice
8. HOW CAN CORPORA HELP IN STUDYING REGISTER/GENRE
  - case studies
  - practice
9. PROJECT PRESENTATIONS
Thank you! TKlepikova@gmail.com


