
BNC.pptx
- Number of slides: 17
THE BNC: INTRODUCTION. The British National Corpus
What is the BNC? The British National Corpus (BNC) is a 100-million-word collection of samples of written and spoken language from a wide range of sources, designed to represent a wide cross-section of current British English, both spoken and written.
What are the main stages of the BNC’s history? Two main stages:
- Planning (design stage), 1980s, Oxford University Press: the BNC’s structure was shaped.
- Execution (creation stage), 1991-93: permissions clearance, collection of texts, encoding of texts, linguistic annotation of texts, storage and documentation of texts.
What components does the BNC include? A written component (90 million words) and a spoken component (10 million words).
On what basis are written texts chosen and organized? Texts are chosen according to selection criteria and subdivided according to classification features.
What are the selection criteria for the BNC’s written texts?
- Domain (the kind of writing a text contains): informative writing (75%) covers applied sciences, arts, belief & thought, commerce & finance, leisure, natural & pure science, social science, and world affairs.
- Medium (the kind of publication in which a text occurs): 60% comes from books; 25% from periodicals; 5-10% from miscellaneous published material (brochures, advertising leaflets, etc.); 5-10% from unpublished written material such as personal letters and diaries, essays, and memoranda; less than 5% from material written to be spoken (for example, political speeches, play texts, broadcast scripts).
- Time (the date of publication of a text): no text should date back further than 1975. Exception: imaginative works only, a few of which date back to 1964 because of their continued popularity and consequent effect on the language.
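To make the medium percentages concrete, here is a minimal sketch that converts them into approximate word counts. It assumes the shares apply to the whole 90-million-word written component, and it treats "less than 5%" as a 0-5% range; the names `MEDIUM_SHARES` and `word_range` are hypothetical and used only for illustration.

```python
WRITTEN_WORDS = 90_000_000  # size of the written component, as stated on the slides

# (low %, high %) share for each medium, taken from the slide.
# "Less than 5%" is modeled as the range 0-5 (an assumption).
MEDIUM_SHARES = {
    "books": (60, 60),
    "periodicals": (25, 25),
    "miscellaneous published": (5, 10),
    "unpublished": (5, 10),
    "written to be spoken": (0, 5),
}


def word_range(share, total=WRITTEN_WORDS):
    """Convert a (low %, high %) share into a (low, high) word count."""
    low, high = share
    return total * low // 100, total * high // 100


for medium, share in MEDIUM_SHARES.items():
    low, high = word_range(share)
    if low == high:
        print(f"{medium}: about {low:,} words")
    else:
        print(f"{medium}: between {low:,} and {high:,} words")
```

These figures are design targets only; the actual composition of the corpus may differ from them.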
What are the classification features of the BNC’s written texts?
- Sample size (number of words) and extent (start and end points)
- Topic or subject of the text
- Author's name, age, gender, region of origin, and domicile
- Target age group and gender
- "Level" of writing (a subjective measure of reading difficulty): the more literary or technical a text, the "higher" its level.
NB: No fixed proportions were specified for these features, although the intention was to ensure an appropriate level of variation within each criterion.
How is the BNC’s spoken component organized? Two parts:
- Demographic part: transcriptions of spontaneous natural conversations made by members of the public. A total of 124 volunteers from different social groups were recruited by the British Market Research Bureau. Recruits used a personal stereo to record all their conversations unobtrusively over two or three days and logged details of each conversation in a special notebook.
- Context-governed part: transcriptions of recordings made at specific types of meeting and event.
NB: Information about the participants, such as age and sex, was recorded when available.
What are the categories of social context used to create the context-governed part (spoken component)?
- Educational and informative events, such as lectures, news broadcasts, classroom discussion, tutorials.
- Business events, such as sales demonstrations, trades union meetings, consultations, interviews.
- Institutional and public events, such as sermons, political speeches, council meetings, parliamentary proceedings.
- Leisure events, such as sports commentaries, after-dinner speeches, club meetings, radio phone-ins.
NB: No fixed proportions were specified for the categories of social context, although the intention was to collect roughly equal quantities of speech.
What were the main steps of the BNC’s creation?
Permission was obtained for a text to be included
→ the text was converted to machine-readable form by one of the commercial partners (OUP, Longman or Chambers)
→ the resulting text was converted to the standard project encoding format at OUCS (where its accuracy and internal consistency were also validated)
→ the text was passed to UCREL (where word-class tagging was automatically added)
→ the text was returned to OUCS for documentation and accession into the corpus.
NB: Each stage of corpus processing was recorded in a database maintained at OUCS.
Abbreviations:
- OUP: Oxford University Press
- OUCS: Oxford University Computing Services
- UCREL: University Centre for Computer Corpus Research on Language (a research centre of Lancaster University; in summer 2011 the first summer school in Corpus Linguistics was held at Lancaster University).
What is the purpose of a language corpus? “The purpose of a language corpus is to provide language workers with evidence of how language is really used, evidence that can then be used to inform and substantiate individual theories about what words might or should mean. Traditional grammars and dictionaries tell us what a word ought to mean, but only experience can tell us what a word is used to mean.” http://www.natcorp.ox.ac.uk/using/index.xml
How can the BNC be used in practice?
- Studying linguistic competence (the features of words can be analyzed by means of tagging and concordancing programs)
- Statistical analysis
- Hypothesis testing
- Checking occurrences or validating linguistic rules on a specific universe of texts
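As an illustration of what a concordancing program does, the following minimal sketch prints keyword-in-context (KWIC) lines from a plain string. It is not the BNC's own software; the function name `concordance`, the `width` parameter, and the sample sentence are assumptions made for the example.

```python
import re


def concordance(text, keyword, width=30):
    """Return one keyword-in-context line per whole-word match of keyword."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        # Take up to `width` characters of context on each side of the match.
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}}[{match.group()}]{right}")
    return lines


sample = "A corpus gives evidence of use. Only a large corpus shows rare words."
for line in concordance(sample, "corpus"):
    print(line)
```

Counting the returned lines gives a simple frequency figure, which is the starting point for the statistical uses listed above.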
How can one access the BNC? Web access: free and restricted. Main free sources:
- BNC Simple Search: a free search tool on the BNC website. Useful for quick queries where frequency information is useful and where 50 hits is enough to explore.
- BNC at Brigham Young University, by Mark Davies (USA): a free interface to the BNC. Search for words or phrases, restrict by text category and word class, and choose different display options for search results. Quick and powerful.
- Just the Word: a free, simple application that shows combinations with the words or phrases entered and offers alternatives to them.
- Phrases in English, by W. H. Fletcher in consultation with M. Stubbs: a free tool that identifies phrases. Search for n-grams consisting of up to eight words or part-of-speech tags.
Links are provided at the BNC web site: http://www.natcorp.ox.ac.uk/Wkshops/Materials/specialising.xml?ID=online
What are the main tips for practical use of the BNC? Corpus linguistics is a specific area of linguistics that requires some theoretical and practical preparation. To become a better user, one may consult the special manuals provided free of charge (e.g. Reference Guide for the British National Corpus (World Edition), edited by Lou Burnard, October 2000, 331 pp.). One difficulty is the large number of abbreviations, even just for the types of text source; here the manuals are very helpful.
At BNC at Brigham Young University we can search as follows: “As with other interfaces to the BNC, you can search by words (mysterious), phrases (fairly certain or white + noun), lemmas (all forms of words, like sing or tall), wildcards (un*ly or r?n*), and more complex searches such as un-X-ed adjectives or verb + any word + a form of ground. Notice that from the "frequency results" window you can click on the word or phrase to see it in context in this lower window”. http://corpus.byu.edu/bnc/
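The wildcard patterns in the quotation (un*ly, r?n*) can be illustrated with a small sketch. It assumes that `*` stands for any run of word characters and `?` for exactly one character; this is a common convention, not a description of how the Brigham Young interface is implemented. The helper names and the word list are hypothetical.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a wildcard pattern into a compiled regular expression.

    Assumed convention: '*' = zero or more word characters,
    '?' = exactly one word character.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(r"\w*")
        elif ch == "?":
            parts.append(r"\w")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def search(words, pattern):
    """Return the words that match the wildcard pattern in full."""
    regex = wildcard_to_regex(pattern)
    return [w for w in words if regex.fullmatch(w)]


words = ["unlikely", "unduly", "run", "ran", "rain", "running", "happily"]
print(search(words, "un*ly"))
print(search(words, "r?n*"))
```

Under this convention "rain" does not match r?n*, because its third letter is not "n"; a real corpus interface would also run the pattern against tagged tokens rather than a bare word list.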
Main advantages and disadvantages of the BNC
- “+” or “-”, depending on the user’s goals: British English only, 1975-1994.
- “-”: It is not vast; it is comparable only to COCA, whereas COCA and COHA are comparable to Google.
- “+”: A wide range of sources, from academic texts to leaflets; the volunteers belong to different social groups.
Web pages (where not mentioned on the slide):
- http://www.natcorp.ox.ac.uk/corpus/index.xml?ID=creation#brief
- http://www.natcorp.ox.ac.uk/corpus/creating.xml
- http://ucrel.lancs.ac.uk/
- http://1yeespin.wordpress.com/2011/04/01/the-british-national-corpus/
- http://corpus.byu.edu/bnc/