
Encoding Information for Interchange
An introduction to the TEI
Lou Burnard
Humanities Computing Unit, Oxford University
Encoding Information for Interchange: St Malo, 1998

The problem
• SGML/XML markup is powerful, flexible, and can be customised to meet most (all?) needs
• But to use it, you need a formal specification (aka document type definition, or DTD)
• Where do you get one from?
• How do you choose?

Some answers
• Roll your own
  – from scratch
  – within an existing framework
• Take what's on offer
• Use the TEI architecture

The Text Encoding Initiative
Origins and Goals
Modular Architecture
Customization

Where did the TEI come from?
• From the humanities research community
  – librarians and cybernauts
  – linguists, historians, lexicographers...
• Sponsors
  – ACH: Association for Computers and the Humanities
  – ACL: Association for Computational Linguistics
  – ALLC: Association for Literary and Linguistic Computing
• Funders
  – U.S. National Endowment for the Humanities
  – Mellon Foundation
  – Commission of the European Communities, DG XIII
  – Social Sciences and Humanities Research Council of Canada

… and where is it going?
• Continued work in new application areas
  – manuscript description
  – physical description
  – non-SGML data
  – XML conformance
• Continued take-up
• Need for new infrastructure
• Corrected reprint of P3 due summer 1998

Goals of the TEI
• better interchange and integration of data
• support for all texts, in all languages, from all periods
• guidance for the perplexed: what to encode
• assistance for the specialist: how to encode any information of interest
• a user-driven codification of existing best practice

TEI Deliverables
• coherent set of recommendations for text encoding
• comprising several distinct SGML tagsets
• based on existing practice
• documented in a reference manual
• tutorials for general and specialised audiences
... but no software

The TEI modus operandi...
• identify significant particularities independent of notation or realisation
• avoid controversy, over-delicacy, inadequacy
• seek generalizable solutions, acceptable to a consensus

... and some consequences
• focus on content, not presentation
• descriptive, not prescriptive
• Occam's razor
• modular, extensible DTD
• highly general in application, needs customization for particular areas

Who uses TEI?
• see http://www-tei.uic/orgs/tei/app/
• digital librarians and archivists
  – LC, HTI, UVA, CETH, OTA...
• Language Engineering projects
  – EAGLES, BNC, MULTEXT, Parole, Silfide
• academic researchers
  – Women Writers Project, Project Orlando, Model Editions Partnership, Canterbury Tales Project, Bodleian Library, and many more...

Designing your DTD
• How can a single mark-up scheme handle a large variety of requirements?
  – all texts are alike
  – every text is different
• Learn from the database designers
  – one construct, many views
  – each view a selection from the whole

How many DTDs might you need?
• one (the Corporate or WKWBFY approach)
• none (the Anarchic or NWEUMP approach)
• as many as it takes (the Mixed Economy or WNSA approach)
or is there a better way?

The TEI solution: modularization
• a (very) large number of element and attribute definitions
• organised as tagsets (core, base, additional, or auxiliary)
• grouped into classes
a single main DTD with many faces (a British DTD)

Combining Tag Sets
• And how does one combine tagsets? The how-many-DTDs problem is back.
  – all tag sets, all the time (the table d'hôte model)
  – a few pre-selected combinations (the combination plate model)
  – in completely unconstrained abandon (the smorgasbord model)
  – one from column A, two from column B (the Chinese menu model)

The Chicago Pizza Model
<!ENTITY % base "(deepDish|thinCrust|stuffed)" >
<!ENTITY % topping …
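The declarations on this slide were cut short in transcription. As a sketch only, the pattern might be completed as follows. The topping names and the pizza element are hypothetical, chosen to show one base combined with any number of toppings.

```sgml
<!-- Hypothetical completion: topping names and the pizza element are assumptions -->
<!ENTITY % base    "(deepDish | thinCrust | stuffed)" >
<!ENTITY % topping "(sausage | mushroom | pepper)" >
<!-- exactly one base, then zero or more toppings -->
<!ELEMENT pizza - - (%base;, (%topping;)*) >
```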

To build a view of the TEI DTD, take...
• the core tagsets
• the base of your choice
• the toppings of your choice
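A minimal sketch of such a view, written in a document's DOCTYPE subset. It assumes the P3 parameter entity names TEI.prose, TEI.linking and TEI.analysis, and a local copy of the main DTD called tei2.dtd; check both against your own installation.

```sgml
<!DOCTYPE TEI.2 SYSTEM "tei2.dtd" [
  <!-- the base of your choice: exactly one -->
  <!ENTITY % TEI.prose    'INCLUDE' >
  <!-- the toppings of your choice: any number -->
  <!ENTITY % TEI.linking  'INCLUDE' >
  <!ENTITY % TEI.analysis 'INCLUDE' >
]>
```

The core tagsets need no switch of their own: they are always part of the view.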

… trim to fit...
• user extension files
• rename elements
• undefine elements to be redefined* or removed
* see later
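A sketch of an extension file, assuming the P3 conventions that the entity TEI.extensions.ent points at a file of entity overrides, that n.X holds the name of element X, and that an entity named after an element switches its declaration on or off. The file name project.ent and the new name para are hypothetical.

```sgml
<!-- in the document's DOCTYPE subset -->
<!ENTITY % TEI.extensions.ent SYSTEM 'project.ent' >

<!-- in project.ent (hypothetical file) -->
<!ENTITY % n.p       'para' >    <!-- rename <p> to <para> -->
<!ENTITY % mentioned 'IGNORE' >  <!-- undefine <mentioned> -->
```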

… and cook thoroughly
• 'compile' the DTD to remove all parameterization
  – easier to use for some software
  – better project management
  – see http://firth.natcorp.ox.ac.uk/~tei/pizza.html
• don't forget the documentation!

TEI base tagsets
• one only must be selected
• defines basic structural components
• currently defined:
  – prose, verse, drama
  – transcribed speech
  – dictionaries
  – terminological databases
• mixtures of bases require special treatment

TEI additional tagsets
• sets of elements for specialised application areas
• can be mixed and matched ad lib
• currently provided:
  – linking and alignment; analysis; feature structures; certainty; physical transcription; textual criticism; names and dates; graphs and trees; figures and tables; language corpora...

How does this work?
• Main DTD consists of marked sections, each (potentially) containing one tagset
• By default, all tagsets are IGNOREd
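A sketch of one such marked section inside the main DTD, assuming the P3 entity name TEI.linking. The tagset file name teilink2.dtd is an assumption.

```sgml
<!-- default: the tagset is switched off -->
<!ENTITY % TEI.linking 'IGNORE' >

<![ %TEI.linking; [
  <!-- read only if the document has already declared TEI.linking as INCLUDE -->
  <!ENTITY % TEI.linking.dtd SYSTEM 'teilink2.dtd' >
  %TEI.linking.dtd;
]]>
```

The first declaration of an entity is the one that counts, and the DOCTYPE subset is read before the external DTD, so a document's INCLUDE overrides the default IGNORE.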

How does this work? (contd)
• Tagsets contain element and attlist declarations, each also enclosed by a marked section
• By default all elements are INCLUDEd
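A sketch of one element's declarations inside a tagset, assuming the P3 convention that the switch is a parameter entity named after the element. The content model and attribute list are simplified for illustration.

```sgml
<!-- default: the element is switched on -->
<!ENTITY % p 'INCLUDE' >

<![ %p; [
  <!-- content model and attributes simplified for illustration -->
  <!ELEMENT p - O (#PCDATA) >
  <!ATTLIST p id ID #IMPLIED >
]]>
```

An earlier declaration of the entity p as 'IGNORE' removes both declarations from the view.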

How does this work? (contd)
• Element names (GIs) are always referred to indirectly, so that they may be renamed
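A sketch of the indirection, assuming the P3 naming pattern n.X for the entity that holds the name of element X. The content models are simplified for illustration.

```sgml
<!-- default names; an earlier declaration elsewhere takes precedence -->
<!ENTITY % n.p  'p' >
<!ENTITY % n.hi 'hi' >

<!-- every mention of an element goes through its name entity -->
<!ELEMENT %n.p;  - O (#PCDATA | %n.hi;)* >
<!ELEMENT %n.hi; - - (#PCDATA) >
```

Declaring n.p as, say, 'para' before these lines are read renames the element wherever it is declared or used in a content model.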

Element Classes
• Model classes
  – elements which share syntactic properties (i.e. occur in same position)
• Attribute classes
  – elements which share attributes
• Class membership can be inherited
• Another way of doing architectural forms
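A sketch of how both kinds of class can be expressed with parameter entities, assuming the P3 naming pattern (m. for model classes, a. for attribute classes, x. for extension hooks). The members listed and the element para are illustrative only.

```sgml
<!-- model class: one entity lists every member -->
<!ENTITY % x.data '' >   <!-- hook for user additions, empty by default -->
<!ENTITY % m.data '%x.data; date | name | num' >

<!-- attribute class: one entity holds the shared attribute definitions -->
<!ENTITY % a.global 'id ID    #IMPLIED
                     n  CDATA #IMPLIED' >

<!-- an element that admits the model class and joins the attribute class -->
<!ELEMENT para - O (#PCDATA | %m.data;)* >
<!ATTLIST para %a.global; >
```

Adding a member to the class then means changing one entity, not every content model that mentions the class.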

Some TEI model classes
• divn: structural elements like divisions <div>, <div1>, ...
• divtop: elements which can appear at the start of a divn element
• chunk: paragraph-like elements
• phrase: elements which appear within chunks

Some TEI semantic classes
• data: phrases likely to be normalised or processed non-textually