Encoding Information for Interchange: An introduction to the TEI
Lou Burnard, Humanities Computing Unit, Oxford University
St Malo, 1998

The problem
• SGML/XML markup is powerful, flexible, and can be customised to meet most (all?) needs
• But to use it, you need a formal specification (aka document type definition, or DTD)
• Where do you get one from?
• How do you choose?

Some answers
• Roll your own
  – from scratch
  – within an existing framework
• Take what’s on offer
• Use the TEI architecture

The Text Encoding Initiative
• Origins and Goals
• Modular Architecture
• Customization

Where did the TEI come from?
• From the humanities research community
  – librarians and cybernauts
  – linguists, historians, lexicographers...
• Sponsors
  – ACH: Association for Computers and the Humanities
  – ACL: Association for Computational Linguistics
  – ALLC: Association for Literary and Linguistic Computing
• Funders
  – U.S. National Endowment for the Humanities
  – Mellon Foundation
  – Commission of European Communities DG XIII
  – Social Sciences and Humanities Research Council of Canada

… and where is it going?
• Continued work in new application areas
  – manuscript description
  – physical description
  – non-SGML data
  – XML conformance
• Continued take-up
• Need for new infrastructure
• Corrected reprint of P3 due summer 1998

Goals of the TEI
• better interchange and integration of data
• support for all texts, in all languages, from all periods
• guidance for the perplexed: what to encode
• assistance for the specialist: how to encode any information of interest
• a user-driven codification of existing best practice

TEI Deliverables
• coherent set of recommendations for text encoding
  – comprising several distinct SGML tagsets
  – based on existing practice
  – documented in a reference manual
• tutorials for general and specialised audiences
• ... but no software

The TEI modus operandi...
• identify significant particularities independent of notation or realisation
• avoid controversy, over-delicacy, inadequacy
• seek generalizable solutions, acceptable to a consensus

... and some consequences
• focus on content, not presentation
• descriptive, not prescriptive
• Occam's razor
• modular, extensible DTD
• highly general in application; needs customization for particular areas

Who uses TEI?
• see http://www-tei.uic/orgs/tei/app/
• digital librarians and archivists
  – LC, HTI, UVA, CETH, OTA...
• Language Engineering projects
  – EAGLES, BNC, MULTEX, Parole, Silfide
• academic researchers
  – Women Writers Project, Project Orlando, Model Editions Partnership, Canterbury Tales Project, Bodleian Library, and many more...

Designing your DTD
• How can a single mark-up scheme handle a large variety of requirements?
  – all texts are alike
  – every text is different
• Learn from the database designers
  – one construct, many views
  – each view a selection from the whole

How many DTDs might you need?
• one (the Corporate or WKWBFY approach)
• none (the Anarchic or NWEUMP approach)
• as many as it takes (the Mixed Economy or WNSA approach)
... or is there a better way?

The TEI solution: modularization
• a (very) large number of element and attribute definitions
• organised as tagsets (core, base, additional, or auxiliary)
• grouped into classes
a single main DTD with many faces (a British DTD)

Combining Tag Sets
• And how does one combine tagsets? The how-many-DTDs problem is back.
  – all tag sets, all the time (the table d'hôte model)
  – a few pre-selected combinations (the combination plate model)
  – in completely unconstrained abandon (the smorgasbord model)
  – one from column A, two from column B (the Chinese menu model)

The Chicago Pizza Model

To build a view of the TEI DTD, take...
• the core tagsets
• the base of your choice
• the toppings of your choice

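The recipe above can be sketched as a small Python helper that writes out a DOCTYPE declaration. This is an illustration only, not the original slide's example: the system identifier `tei2.dtd` and the `TEI.xxx` entity names are assumptions modelled on TEI P3 naming conventions, and the function `build_doctype` is hypothetical. The core tagsets need no switch because they are always present.

```python
# Hypothetical helper: the entity names and the file name "tei2.dtd" are
# assumptions based on TEI P3 conventions, not taken from the slides.
BASES = {"prose", "verse", "drama", "spoken", "dictionaries", "terminology"}


def build_doctype(base, toppings):
    """Return a DOCTYPE declaration that switches on one base and any toppings."""
    if base not in BASES:
        raise ValueError(f"unknown base tagset: {base}")
    lines = ['<!DOCTYPE TEI.2 SYSTEM "tei2.dtd" [']
    # One parameter entity per selected tagset, each set to INCLUDE.
    for name in [base, *toppings]:
        lines.append(f"  <!ENTITY % TEI.{name} 'INCLUDE'>")
    lines.append("]>")
    return "\n".join(lines)


if __name__ == "__main__":
    # A prose base with two toppings (additional tagsets).
    print(build_doctype("prose", ["linking", "figures"]))
```

Only the tagsets named in the subset are switched on; everything else in the main DTD stays dormant.
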
… trim to fit...
• user extension files
• rename elements
• undefine elements to be redefined* or removed
* see later

… and cook thoroughly
• ‘compile’ the DTD to remove all parameterization
  – easier to use for some software
  – better project management
• see http://firth.natcorp.ox.ac.uk/~tei/pizza.html
• don’t forget the documentation!

TEI base tagsets
• one only must be selected
• defines basic structural components
• currently defined:
  – prose, verse, drama
  – transcribed speech
  – dictionaries
  – terminological databases
• mixtures of bases require special treatment

TEI additional tagsets
• sets of elements for specialised application areas
• can be mixed and matched ad lib
• currently provided:
  – linking and alignment; analysis; feature structures; certainty; physical transcription; textual criticism; names and dates; graphs and trees; figures and tables; language corpora...

How does this work?
• Main DTD consists of marked sections, each (potentially) containing one tagset
• By default, all tagsets are IGNOREd

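The switch relies on a general SGML/XML rule: when an entity is declared more than once, the first declaration is binding, and the document's own DTD subset is read before the main DTD. A minimal Python model of that rule follows; the function names, the `DEFAULTS` list, and the `TEI.xxx` entity names are illustrative assumptions, not part of the TEI DTD itself.

```python
# Illustrative model only: names below are assumptions, not the TEI sources.
DEFAULTS = [
    ("TEI.prose", "IGNORE"),
    ("TEI.verse", "IGNORE"),
    ("TEI.linking", "IGNORE"),
]


def declare(entities, name, value):
    """Record a parameter entity; a later redeclaration has no effect."""
    entities.setdefault(name, value)


def active_tagsets(user_subset, defaults=DEFAULTS):
    """Return the tagsets whose marked sections end up as INCLUDE."""
    entities = {}
    for name, value in user_subset:   # the document's DTD subset is read first
        declare(entities, name, value)
    for name, value in defaults:      # the main DTD's defaults come second
        declare(entities, name, value)
    return [name for name, value in entities.items() if value == "INCLUDE"]


if __name__ == "__main__":
    chosen = [("TEI.prose", "INCLUDE"), ("TEI.linking", "INCLUDE")]
    print(active_tagsets(chosen))
```

With an empty subset nothing is switched on, which is why a document must always select at least a base.
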
How does this work? (contd)
• Tagsets contain element and attlist declarations, each also enclosed by a marked section
• By default all elements are INCLUDEd

How does this work? (contd)
• Element names (GIs) are always referred to indirectly, so that they may be renamed

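The two element-level mechanisms (a per-element switch that defaults to INCLUDE, and a per-element name entity) can be modelled together. The sketch below is a hypothetical Python illustration, not the DTD's real machinery: `resolve_elements` and its parameters are invented for the example.

```python
def resolve_elements(declared, user_names=None, user_switches=None):
    """Map each canonical element name to the name the document will use.

    declared      -- canonical names of the elements in the selected tagsets
    user_names    -- renamings supplied by the user, e.g. {"p": "para"}
    user_switches -- per-element overrides, e.g. {"biblStruct": "IGNORE"}
    """
    names = dict(user_names or {})
    switches = dict(user_switches or {})
    result = {}
    for gi in declared:
        # Elements are INCLUDEd unless the user has switched them off.
        if switches.get(gi, "INCLUDE") == "INCLUDE":
            # The name is looked up indirectly, so a renaming takes effect here.
            result[gi] = names.get(gi, gi)
    return result


if __name__ == "__main__":
    view = resolve_elements(
        ["p", "note", "biblStruct"],
        user_names={"p": "para"},
        user_switches={"biblStruct": "IGNORE"},
    )
    print(view)
```

Because every content model refers to the indirect name, a renamed element is picked up everywhere it is used without editing the content models themselves.
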
Element Classes
• Model classes
  – elements which share syntactic properties (i.e. occur in same position)
• Attribute classes
  – elements which share attributes
• Class membership can be inherited
• Another way of doing architectural forms

Some TEI model classes
• divn: structural elements like divisions ...

Some TEI semantic classes
• data: phrases likely to be normalised or processed non-textually

Some TEI attribute classes
• global: attributes which are available to every element
  – n, lang, id, TEIform
• linking: attributes for elements which have linking semantics
  – targType, targOrder, evaluate

The class system in action
• Simplifying documentation and understanding of the DTD
• Parameterizing content models
  – different for different bases
• Simplifies customization
  – class membership is unaffected
  – adding new elements to an existing class

Parameterized content models
• “Components”, for example:
  – a dictionary is composed of entries
  – a play is composed of speeches
  – a novel is composed of paragraphs
• in each case, the basic “text soup” (and the structural divisions) remain the same, but they are organized differently

How does this work? (contd)
• the component class has different members in different bases

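A simplified Python model of this idea: the content model of a division is written once in terms of the component class, and the selected base decides what that class contains. The member lists here are deliberately tiny and are assumptions made for illustration (the real classes have many more members); `COMMON`, `BASE_COMPONENTS` and `div_content_model` are hypothetical names.

```python
# Assumed, much-reduced membership lists; not the real TEI class definitions.
COMMON = ["p"]                       # components available in every base
BASE_COMPONENTS = {
    "prose": [],                     # a novel is composed of paragraphs
    "drama": ["sp"],                 # a play is composed of speeches
    "dictionaries": ["entry"],       # a dictionary is composed of entries
}


def component_class(base):
    """Return the members of the component class for the chosen base."""
    return COMMON + BASE_COMPONENTS[base]


def div_content_model(base):
    """Build the (unchanging) division model from the base-specific class."""
    return "(" + " | ".join(component_class(base)) + ")+"


if __name__ == "__main__":
    for base in BASE_COMPONENTS:
        print(base, div_content_model(base))
```

The division declaration never changes; only the expansion of the class does.
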
Customization...
• Removing an element involves
  – undeclaring it
  – (NB: ISO 8879 permits references to undefined elements, though not all vendors know this)
• Adding a new element involves
  – determining its class
  – defining it
  – adding it to that class

Customization (contd)
• Modification of an element implies removal followed by addition
• Class membership should be unaffected

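The three customization operations can be sketched on a toy in-memory model of a DTD. This is a hypothetical Python illustration: the dictionary layout, the sample declarations, and all function names are assumptions made for the example.

```python
def new_dtd():
    """A toy DTD: element declarations plus class membership lists."""
    return {
        "elements": {"p": "(#PCDATA | hi)*", "hi": "(#PCDATA)"},
        "classes": {"component": ["p"], "phrase": ["hi"]},
    }


def remove_element(dtd, name):
    """Undeclare an element; class lists may still mention the name."""
    dtd["elements"].pop(name, None)


def add_element(dtd, name, content, cls):
    """Define an element and make it a member of the given class."""
    dtd["elements"][name] = content
    if name not in dtd["classes"][cls]:
        dtd["classes"][cls].append(name)


def modify_element(dtd, name, content):
    """Modification is removal followed by addition to the same class."""
    cls = next(c for c, members in dtd["classes"].items() if name in members)
    remove_element(dtd, name)
    add_element(dtd, name, content, cls)


if __name__ == "__main__":
    dtd = new_dtd()
    modify_element(dtd, "p", "(#PCDATA)")
    add_element(dtd, "pamphlet", "(p)+", "component")
    print(dtd["classes"]["component"])
```

Note that `remove_element` leaves the class lists alone, mirroring the point that a content model may legally refer to an element that is no longer defined.
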
How does this work? (contd)
• Each model class is defined as a parameter entity
• Reference to class members is always indirect
• Membership extensible (by a kludge)

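One way to picture the kludge, as a sketch under stated assumptions: each class entity begins with a reference to a second, normally empty, extension entity, so a user who declares the extension entity first gets new members spliced in. The Python model below assumes that arrangement; `model_class` and the sample member names are hypothetical, and the need for the extension text to carry its own trailing `|` is what makes it a kludge.

```python
def model_class(members, extension=""):
    """Expand a class entity, prepending the (normally empty) extension hook.

    The extension text is spliced in verbatim, so it must end with its own
    "|" connector if it names any elements.
    """
    return f"{extension} {' | '.join(members)}".strip()


if __name__ == "__main__":
    data = ["date", "num", "time"]       # assumed sample members
    print(model_class(data))             # default: extension hook is empty
    print(model_class(data, "amount |")) # user adds a new member
```
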
An example: the Lampeter corpus
• Requirements
  – light presentational tagging
  – structural markup for access
  – demographic information about text production
  – small number of tags to ease data capture and validation
• Implementation
  – tagsets: prose base, and tags from four additional sets
  – some extensions, many exclusions

The Lampeter corpus DTD subset

The Lampeter corpus extensions.ent
  <!ENTITY % biblStruct 'IGNORE' >
  <!-- desunt multa -->
  <!ENTITY % supplied 'IGNORE' >

The Lampeter corpus extensions.dtd
NB: This is a provisional version only! (no attlists, no documentation…)

Summary
• Designing a successful DTD involves careful, conscious, controlled theft
• Modularize the task
• A class system helps identify
  – what is true of all documents
  – what is true of some documents
• Modifiability can be compatible with standardization