
07701a1f66b66f50db0a3e5538f36143.ppt
- Number of slides: 37
TEI Projects and Small Libraries Examining TEI Markup Decisions and Procedures Richard Wisneski, Head, Bibliographic and Metadata Services Virginia Dressler, Digital Librarian Stephanie Pasadyn, Technical Services Librarian Kelvin Smith Library November 2009
Introduction
The Project: • Digitizing and encoding Kelvin Smith Library’s (KSL) Books on Cleveland, Ohio and the Western Reserve Digital Text Collection o Currently, there are 120 texts in Digital Case, in PDF format o Goal is to text-encode approx. 110 of these, and add more in the future o Texts date from the late 19th to early 20th centuries • Using “Book Viewer” in KSL’s “Digital Case” (institutional digital repository) to display texts’ PDF, page images, and TEI • Have applied for an NEH Humanities Collections and Reference Resources Grant to fund the project • Will collaborate with neighboring institutions to incorporate into our collection their texts on the history of Cleveland and the Western Reserve
Why Do This Project? • Availability of texts is limited: see spreadsheet • No other institution in Northeast Ohio has a project akin to this • Interest in Cleveland and Western Reserve history among historians and scholars. Cleveland… – was the northern terminus of the Ohio and Erie Canal – was a leading U.S. manufacturing center, second only to Detroit in the automotive industry, and a leader in steel production, ship building, and other industrial sectors – had the highest concentrations of some Eastern European heritage populations outside of their home countries during the 19th and early 20th centuries
Why TEI? • To allow researchers to have access to an electronic text that does not require special-purpose software or hardware • To analyze information – provide a standard text-encoding scheme and metadata language which accommodates searching, retrieval, etc. • To share information – have a standard format for data interchange in humanities research • Texts are being encoded in Level 3 (structural) • To create stand-alone electronic text with hierarchy identified • Emphasis on divisions within text, tables, lists, notes, front and back matter
Current Project Practices
Workflow
Project Log Currently kept as a shared MS Excel file on Google Docs:
DIGITIZATION PROCESS
• Step 1: Review and assess digital images – Transfer content from CDs – Assess quality of images (dpi, original canvas size, overall quality) – Rescan titles if needed – Organize/sort for OCR and image conversion
Review digital content
Organize and assess
Image assessment
Key points in assessment • Complete, uncorrupted files • Assess image quality against current practices and standards • Check for legibility of text for OCR process • Compare illustrations and photos with original source if needed • Rescan if needed
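The "complete, uncorrupted files" check can be partly automated. The following is a minimal sketch, assuming the scans are TIFF files gathered in a single folder; the function names are hypothetical and not part of the project's tooling. It only verifies that each file is non-empty and begins with a TIFF signature, so it will not catch a file that is truncated further in, and it does not replace visual review.

```python
from pathlib import Path

# The two classic TIFF signatures: little-endian ("II") and big-endian ("MM"),
# each followed by the number 42 in the matching byte order.
TIFF_MAGIC = (b"II*\x00", b"MM\x00*")


def looks_like_tiff(path: Path) -> bool:
    """Return True if the file starts with a TIFF signature (empty files fail)."""
    with path.open("rb") as handle:
        return handle.read(4) in TIFF_MAGIC


def find_suspect_scans(folder: Path) -> list[Path]:
    """List the .tif/.tiff files in `folder` that fail the signature check."""
    scans = sorted(
        p for p in folder.iterdir() if p.suffix.lower() in {".tif", ".tiff"}
    )
    return [p for p in scans if not looks_like_tiff(p)]
```

Any file this reports is a candidate for re-copying from CD or rescanning; files it passes still need the legibility and quality checks listed above.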
Optical Character Recognition • Step 2: – Using OCR software (ABBYY FineReader) to create a text file from the image files – Time-saving options in software • removing hyphens between page breaks • retaining page breaks
Sidekick 1400 u
Image conversion • Processing TIFF files for the book viewer – Bitmap to grayscale – Aware batch conversion to JPEG 2000 format – Batch renaming tool – Ingestion into Fedora
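The batch renaming step could look roughly like the sketch below. It assumes a naming pattern of text identifier, hyphen, five-digit sequence number, and ".jp2", which is inferred from a single page-image reference in the encoded sample text (clecle00-00003.jp2) and may not match the project's actual rules. The function names are hypothetical, and the code only plans the renames without touching any files.

```python
from pathlib import Path


def page_image_name(text_id: str, sequence: int, extension: str = "jp2") -> str:
    """Build a page-image file name such as 'clecle00-00003.jp2' (assumed pattern)."""
    return f"{text_id}-{sequence:05d}.{extension}"


def plan_renames(folder: Path, text_id: str) -> list[tuple[Path, Path]]:
    """Pair each .jp2 file, in sorted order, with its proposed sequential name.

    Nothing is renamed here; the caller can review the plan first.
    """
    sources = sorted(p for p in folder.iterdir() if p.suffix.lower() == ".jp2")
    return [
        (source, folder / page_image_name(text_id, number))
        for number, source in enumerate(sources, start=1)
    ]
```

Reviewing the plan before applying it (for example with `source.rename(target)` on each pair) keeps a mistaken sort order from scrambling page sequences.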
Book viewer demo
Text Clean-Up Student workers and volunteers do this work in OpenOffice and oXygen • Spell-check • Insert page breaks and numbers • Replace images with notation • Remove hyphenations erroneously inserted by OCR • Example
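Part of the hyphenation clean-up can be scripted before the manual pass. The sketch below is an illustration only, not the project's procedure: it rejoins a word that ends a line with a hyphen when the next line begins with a lowercase letter. The function name is hypothetical.

```python
import re

# A word fragment, a hyphen at the very end of a line, optional indentation,
# then a lowercase letter that continues the word on the next line.
_LINE_END_HYPHEN = re.compile(r"(\w+)-\n\s*([a-z])")


def join_hyphenated_lines(text: str) -> str:
    """Rejoin words that were split across a line break with a hyphen."""
    return _LINE_END_HYPHEN.sub(r"\1\2", text)


sample = "three families had estab-\nlished themselves"
print(join_hyphenated_lines(sample))
```

A rule this simple also merges genuine compounds that happen to break at a line end ("fifty-" / "seven" becomes "fiftyseven"), so the spell-check and human review steps listed above are still needed afterwards.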
TEI Headers • Professional catalogers create TEI headers:
<?xml version="1.0" encoding="UTF-8"?>
<?oxygen RNGSchema="http://digitalcase.edu:9000/fedora/get/ksl:p5schema/tei_all.rng" type="xml"?>
<TEI xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title type="main">Report on the preliminary surveys for the Cleveland, Painesville and Ashtabula Rail Road Company</title>
        <title type="sub">An electronic version</title>
        <author>
          <persName>Harbach, Frederick, 1817-1851</persName>
        </author>
        <respStmt>
          <name xml:id="ksl">Kelvin Smith Library, Case Western Reserve University</name>
          <resp>Publisher of TEI-conformant electronic version.</resp>
        </respStmt>
        <respStmt>
          <name xml:id="mxb">Mary Burns</name>
          <resp>TEI Header creator</resp>
        </respStmt>
        <respStmt>
          <name xml:id="rlw">Richard Wisneski</name>
          <resp>encoder</resp>
        </respStmt>
      </titleStmt>
      <extent>1.448 MB</extent>
      <publicationStmt>
        <publisher>Digital Case, Kelvin Smith Library, Case Western Reserve University</publisher>
        <pubPlace>Cleveland, Ohio</pubPlace>
        <distributor n="collection">KSL Digital Book Collection</distributor>
        <availability>
          <p>This work is in the public domain and may be freely downloaded for personal or academic use.</p>
        </availability>
        <idno>http://hdl.handle.net/2186/ksl:harrep00</idno>
        <date when-iso="2009-09-01" />
      </publicationStmt>
ETC.
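Once headers exist, their key fields can be pulled out programmatically, for example to cross-check them against the project log. The sketch below is an assumption-laden illustration using Python's standard xml.etree.ElementTree module. The SAMPLE document is a trimmed stand-in for a real header (it omits elements a valid TEI file requires), and summarize_header is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map for the TEI namespace used in the header.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Trimmed sample for illustration only; not a complete, valid TEI file.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title type="main">Report on the preliminary surveys</title>
        <respStmt>
          <name xml:id="rlw">Richard Wisneski</name>
          <resp>encoder</resp>
        </respStmt>
      </titleStmt>
    </fileDesc>
  </teiHeader>
</TEI>"""


def summarize_header(xml_text: str) -> dict:
    """Return the main title and a name-to-responsibility map from a TEI header."""
    root = ET.fromstring(xml_text)
    title = root.find(".//tei:titleStmt/tei:title[@type='main']", TEI_NS)
    roles = {}
    for statement in root.findall(".//tei:titleStmt/tei:respStmt", TEI_NS):
        name = statement.findtext("tei:name", default="", namespaces=TEI_NS)
        resp = statement.findtext("tei:resp", default="", namespaces=TEI_NS)
        # Collapse line breaks and repeated spaces left over from pretty-printing.
        roles[" ".join(name.split())] = " ".join(resp.split())
    title_text = " ".join((title.text or "").split()) if title is not None else ""
    return {"title": title_text, "roles": roles}


print(summarize_header(SAMPLE))
```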
TEI Structural Mark-up • Text encoders mark text following TEI P5, Level 3
<body>
  <div type="section" xml:id="section1" n="1">
    <pb n="5" facs="clecle00-00003.jp2" />
    <head>HISTORY</head>
    <p>The first settlers of Cleveland were from Connecticut; and, according to tradition, as soon as three families had established themselves — it was about the beginning of the present century — they set up a school for their <hi rend="ital">five children.</hi> The population had increased to <hi rend="ital">fifty-seven</hi> in 1810, and the oldest inhabitants think there was a school taught in that year. It is certain, however, that it could not have been very large. The earliest school mentioned in any record was kept by a Mr. Capman in 1814. But it was not till 1836, the year of organization under the City Charter, that any system of <hi rend="ital">public instruction</hi> was adopted. Previous to this year, the schools, of whatever grade or character, were supported mainly by private enterprise.</p>
CONTINUED >>
TEI Structural Mark-up (continued)
<table rend="boxed" cols="3" rows="4" xml:id="Table2">
  <head>TABLE OF CURVATURE.</head>
  <row>
    <cell> </cell>
    <cell role="label">SOUTH ROUTE.</cell>
    <cell role="label">NORTH ROUTE.</cell>
  </row>
  <row>
    <cell>Deflections to Right</cell>
    <cell>323° 20</cell>
    <cell>236°</cell>
  </row>
  <row>
    <cell>Deflections to Left</cell>
    <cell>402</cell>
    <cell>213° 45'</cell>
  </row>
…AND SO ON
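Hand-encoded tables are an easy place for counting mistakes, since the rows and cols attributes must agree with the rows and cells actually encoded. The sketch below shows one possible consistency check using Python's standard library. It assumes a stand-alone table fragment with no namespace declaration and no cells spanning several columns; table_problems is a hypothetical helper, not part of any TEI tool.

```python
import xml.etree.ElementTree as ET


def table_problems(table_xml: str) -> list[str]:
    """Compare a table's declared rows/cols attributes with its actual content."""
    table = ET.fromstring(table_xml)
    rows = table.findall("row")
    problems = []

    declared_rows = table.get("rows")
    if declared_rows is not None and int(declared_rows) != len(rows):
        problems.append(f"rows={declared_rows} but {len(rows)} row(s) encoded")

    declared_cols = table.get("cols")
    if declared_cols is not None:
        for number, row in enumerate(rows, start=1):
            cells = len(row.findall("cell"))
            if cells != int(declared_cols):
                problems.append(
                    f"row {number}: cols={declared_cols} but {cells} cell(s) encoded"
                )
    return problems


consistent = (
    '<table cols="2" rows="2">'
    "<row><cell>a</cell><cell>b</cell></row>"
    "<row><cell>c</cell><cell>d</cell></row>"
    "</table>"
)
print(table_problems(consistent))
```

For tables inside a full TEI document, the element lookups would need the TEI namespace, and spanned cells would need their own handling.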
Text Encoding
Learning TEI Learning Text Encoding as Secondary Job Responsibility • Practical Application • Internal Documentation • CaseLearns
Learning TEI Coding a New Text • One-on-one overview • Creating master outline • Coding page by page • Referring to and updating documentation
Learning TEI Challenges • Human Error • Evolution of Institutional Practice • Minimal Time Allotment • Limited Opportunity for Continuing Education
Issues
To Be Done • Re-scan some of the books • Continue to encode • Hold half-day or full-day workshops on text encoding for full-time staff • Create MODS, MARC-XML, and METS records • Re-examine “Book Viewer”
Discussion Questions • Ways to expedite text encoding • Ways to scan texts – outsourcing? • Funding challenges (outsourcing, scanning, equipment) • Book viewer – effective? Ineffective? • Text-Encoding Level – change? • Learning TEI – in-house classes and documentation, TEI-C documentation. Webinars? Online tutorials? Certificate program?
Contact Richard Wisneski: rlw54@case.edu Virginia Dressler: vad17@case.edu Stephanie Pasadyn: sap68@case.edu
Links and references • Digital Case homepage • Digital Case Book Viewer collection
• Women Writers Online, Brown University: http://textbase.wwp.brown.edu/WWO/index.html • Poetess Archive, Miami University (Ohio): http://unixgen.muohio.edu/~poetess/collections/index.php • Victorian Women Writers Project, Indiana University: http://www.indiana.edu/~letrs/vwwp/index.html • Swinburne Project, Indiana University: http://swinburnearchive.indiana.edu/swinburne/www/swinburne/