- Number of slides: 57
OCLC: some development and research directions in the areas of metadata management and knowledge organization. Presented to the Library of Congress cataloging managers retreat. Lorcan Dempsey (with contributions from colleagues), VP Research and Chief Strategist. Library of Congress, 15 June 2004
Topics
• Framework for WorldCat directions
• Metadata management and knowledge organization
• Open WorldCat
• Working with web services
• Some research, some production
• Making data work harder
Framework for WorldCat directions
Collections grid (figure; axes: stewardship, low to high; uniqueness, low to high)
• High stewardship, low uniqueness: books, journals, newspapers, government documents, CD, DVD, maps, scores
• High stewardship, high uniqueness: special collections and archives (rare books, local history materials, archives & manuscripts, theses & dissertations)
• Low stewardship, low uniqueness: freely-accessible web resources
• Low stewardship, high uniqueness: research and learning materials (ePrints/tech reports, learning objects, courseware, e-portfolios, research data)
• Also shown: untransferred records
WorldCat – the what?
• WorldCat: grow, version, improve
  – Easier to use (FRBR)
• The open web: both surface and acquire WorldCat content
  – Microcontent
  – Evaluative content
• Add special collections and institutional content to WorldCat: dissertations, cultural heritage collections, ePrints, learning objects
WorldCat – the how? Research in these areas
Some issues
• Metadata variety
  – Encoding, element sets, values/content
  – Provenance
• Metadata manipulation
  – Validation, identification
  – Enhancement, augmentation
  – Relation, FRBR, deduplication
  – Transformation
• Schematization and web services
  – Make data available in forms that allow machine services to be flexibly built on top of them
  – Everything is a service
Open WorldCat
Open WorldCat
• Facilitate the rendezvous of users and library services on the web
• Surface the library where the users are
• Help release the value of library services in the working and learning lives of their users
Open WorldCat architecture (diagram)
• Diagram layers: Content Owner; Organization and Presentation; Access; Distribution, Search, Display
• Diagram components: WorldCat; Metadata; Schemas and Vocabularies; Profiles and Relationships; Google, Yahoo and book vendors; aggregators; portals
• WorldCat: additional collections can be added to the worldcatlibraries domain.
• OCLC will use tools such as xISBN and FRBR models to organize WorldCat public views suitable for low-precision access.
• OCLC developed geo-locator services to match users to extensive FirstSearch WorldCat institution and user profiles.
• OCLC uses a host of authentication and authorization tools to progressively match content to rights.
• OCLC organizes WorldCat content in a model suitable for harvesting, anticipating the unique aspects of various portals.
Current partners
• Book vendors and bibliographies
  – ABE Books
  – ABAA
  – Alibris
  – HCBIB
  – BookPage
• Search engines (pilot with 2M records exposed as web pages for harvesting)
  – Google
  – Yahoo!
Try a search for: A history of caricature and grotesque in literature and art
Google and Yahoo! timeline
• 8/14/03: Google contract signed
• 9/19/03: Google given go-ahead to harvest records
• 10/22/03: Google harvests 150,000 records
• Dec. '03: Records begin to appear in Google; 800 inbound links logged (search-site-originating [SSO])
• Jan. '04: 32,000 inbound links logged (SSO)
• Mar. '04: 109,000 inbound links logged (SSO)
• 5/21/04: Yahoo contract signed
• 5/28/04: Yahoo harvests records
• May '04: 725,000 inbound links logged (SSO)
• 6/6/04: Yahoo completes indexing of 2 million WorldCat records
Traffic: full record displays (chart; June figure projected)
Metadata management and knowledge organization
Research activities
• Structures
  – FRBR
  – VIAF
    § BT
  – FAST
  – Vocabulary encoding and mappings
• Services
  – xISBN
  – Metadata transformation services
  – Terminology services
  – Authority services
  – Automatic classification and cataloging
    § ePrints UK
    § Web harvesting
FRBR
• OR work-set algorithm
• Work-based view to be incorporated into WorldCat in FirstSearch in late 2004
• FictionFinder
  – 2.6+ million fiction records from WorldCat, clustered by OCLC's FRBR algorithm
  – Makes greater use of data (genres, settings, imaginary characters, etc.)
• Participate in ongoing FRBR refinement
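The idea behind work-set clustering can be illustrated with a deliberately simplified sketch: normalize author and title, build a key from them, and group records that share a key. This is not OCLC's work-set algorithm, which is not reproduced here; the key recipe, the function names and the sample records are hypothetical and chosen only to show how variant manifestations collapse into one work.

```python
import re
import unicodedata
from collections import defaultdict

def normalize(text: str) -> str:
    """Lowercase, strip diacritics and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^\w\s]", " ", no_marks.lower())
    return " ".join(cleaned.split())

def work_key(record: dict) -> str:
    """Hypothetical work key: normalized author + normalized title, minus a leading article."""
    title = re.sub(r"^(the|a|an) ", "", normalize(record["title"]))
    return normalize(record["author"]) + "/" + title

def cluster_work_sets(records: list) -> dict:
    """Group record ids that share a work key (one group per candidate work)."""
    work_sets = defaultdict(list)
    for record in records:
        work_sets[work_key(record)].append(record["id"])
    return dict(work_sets)

# Illustrative records: two manifestations of each of two works.
records = [
    {"id": "r1", "author": "Austen, Jane", "title": "Pride and Prejudice"},
    {"id": "r2", "author": "Austen, Jane.", "title": "Pride and prejudice /"},
    {"id": "r3", "author": "Brontë, Charlotte", "title": "Jane Eyre"},
    {"id": "r4", "author": "Bronte, Charlotte", "title": "Jane Eyre"},
]
clusters = cluster_work_sets(records)
```

A production algorithm would also need authority-controlled name forms, uniform titles and rules for translations and adaptations; plain string keys like these over-split and occasionally over-merge.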
FAST
Vocabulary mappings
Services
• Web services
  – Computer-to-computer applications over the web
• Unplug and play
  – Unbundling monolithic applications and making functionality available in more modular ways
• Reuse and sharing
  – Of services!
• Release the value, in a web environment, of the historical library investment in vocabularies and structures
xISBN
• An experimental web service
  – Leverages FRBRization work
  – Give it an ISBN, and it returns all related ISBNs
  – Based on WorldCat
  – Designed for machine-to-machine data exchange
• Examples:
  – Check user ILL requests against all editions/versions in the OPAC
  – Find the library's editions when a user finds any edition/version of an item on Amazon
  – Check the OPAC for all editions during selection/acquisitions/gift book processing
  – …
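The "give it an ISBN, it returns all related ISBNs" contract can be sketched locally as an index from ISBN to work cluster. This is a minimal in-memory stand-in, assuming the work clusters have already been computed; `RelatedIsbnIndex` and its methods are hypothetical and are not the xISBN service's actual API. ISBN-10s are converted to ISBN-13 (prefix 978, recomputed check digit) so both forms hit the same entry.

```python
from collections import defaultdict

def to_isbn13(isbn: str) -> str:
    """Normalize an ISBN-10 or ISBN-13 string (hyphens/spaces allowed) to ISBN-13."""
    digits = isbn.replace("-", "").replace(" ", "").upper()
    if len(digits) == 13:
        return digits
    if len(digits) != 10:
        raise ValueError(f"not an ISBN: {isbn!r}")
    core = "978" + digits[:9]  # drop the ISBN-10 check digit
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
    return core + str((10 - total % 10) % 10)

class RelatedIsbnIndex:
    """Hypothetical in-memory stand-in for an xISBN-like lookup."""

    def __init__(self):
        self._work_of = {}                 # ISBN-13 -> work id
        self._isbns_of = defaultdict(set)  # work id -> set of ISBN-13s

    def add(self, work_id: str, isbn: str) -> None:
        norm = to_isbn13(isbn)
        self._work_of[norm] = work_id
        self._isbns_of[work_id].add(norm)

    def related(self, isbn: str) -> list:
        """All ISBNs in the same work cluster, or [] if the ISBN is unknown."""
        work_id = self._work_of.get(to_isbn13(isbn))
        return sorted(self._isbns_of[work_id]) if work_id else []

# Made-up cluster: two editions of one work.
index = RelatedIsbnIndex()
index.add("w1", "0-306-40615-2")
index.add("w1", "978-1-111-11111-1")

# ILL-style check: is any edition of the requested item already in the OPAC?
opac_holdings = {"9781111111111"}
held = [i for i in index.related("0306406152") if i in opac_holdings]
```

The same `related` call serves the other examples on the slide (Amazon lookups, gift-book checks); only the set being intersected changes.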
xISBN
Install the FRBR bookmarklets in your browser to see xISBN working. See the Bookmarklets page at www.oclc.org/researchworks/.
Screenshots: "Click cover to search Seattle Public Library"; "Click cover to search amazon.co.uk"
Metadata schema transformations
• Metadata Schema Transformation Services
  – Evaluate approaches to crosswalking metadata
  – Prototype transformation environments
• The XSLT "short path"
  – Supports lightweight XML processing
  – Designed for public access
  – Deliverables:
    § OAI repository of METS-captured crosswalks [NEW]
• The "long path" option
  – Designed for high-fidelity translations
  – May be public or proprietary
  – Deliverables: toolkit; expertise in non-MARC formats
Crosswalk pipeline (diagram): file of records in format X
1. STRUCTURAL TRANSFORM: transform to intermediate form
2. SEMANTIC TRANSLATION: translate input semantics to CORE
3. CORE SEMANTIC TRANSLATION
4. SEMANTIC TRANSLATION: translate CORE to output semantics
5. STRUCTURAL TRANSFORM: transform to output format Y
Result: file of records in format Y
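The five-stage pipeline can be sketched in a few functions to show why a shared core pays off: each new format needs only a mapping to and from the core, not a crosswalk to every other format. This sketch assumes MARCXML input and Dublin Core output; the three-field core vocabulary (`title`, `name`, `topic`), the mapping tables and the sample record are hypothetical, and it uses Python's ElementTree rather than the XSLT tooling the slides describe.

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}
DC = "http://purl.org/dc/elements/1.1/"

# Stage 2 table: input semantics -> hypothetical core semantics (illustrative subset).
MARC_TO_CORE = {("245", "a"): "title", ("100", "a"): "name", ("650", "a"): "topic"}
# Stage 4 table: core semantics -> output (Dublin Core) semantics.
CORE_TO_DC = {"title": "title", "name": "creator", "topic": "subject"}

def to_intermediate(marcxml: str) -> list:
    """Stage 1: structural transform into flat (tag, code, value) triples."""
    root = ET.fromstring(marcxml)
    return [
        (df.get("tag"), sf.get("code"), (sf.text or "").strip())
        for df in root.iterfind("marc:datafield", MARC_NS)
        for sf in df.iterfind("marc:subfield", MARC_NS)
    ]

def to_core(triples: list) -> list:
    """Stage 2: translate input semantics to core; unmapped fields are dropped."""
    return [(MARC_TO_CORE[(t, c)], v) for t, c, v in triples if (t, c) in MARC_TO_CORE]

def to_output_semantics(core: list) -> list:
    """Stage 4: translate core semantics to output (DC) semantics."""
    return [(CORE_TO_DC[k], v) for k, v in core]

def to_format_y(pairs: list) -> str:
    """Stage 5: structural transform into an XML serialization of DC elements."""
    ET.register_namespace("dc", DC)
    root = ET.Element("record")
    for name, value in pairs:
        ET.SubElement(root, f"{{{DC}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")

def crosswalk(marcxml: str) -> str:
    # Stage 3: the core representation is the hub shared by both halves.
    return to_format_y(to_output_semantics(to_core(to_intermediate(marcxml))))

SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="100" ind1="1" ind2=" "><subfield code="a">Wright, Thomas</subfield></datafield>
  <datafield tag="245" ind1="1" ind2="2"><subfield code="a">A history of caricature and grotesque in literature and art</subfield></datafield>
</record>"""
result = crosswalk(SAMPLE)
```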
A crosswalk as a METS record
• Describe the crosswalk object in the METS header.
• Assemble and identify six objects in the METS structural map:
  – The source metadata schema
  – The target metadata schema
  – The crosswalk
  – Human-readable and executable versions of each
• Associate metadata for each file in the METS Descriptive Metadata Section.
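The packaging described on this slide can be sketched with Python's standard `xml.etree.ElementTree`. The element names (`mets`, `metsHdr`, `agent`, `dmdSec`, `mdWrap`, `xmlData`, `fileSec`, `fileGrp`, `file`, `FLocat`, `structMap`, `div`, `fptr`) come from the METS schema; the `<description>` payload, the labels and the example.org URLs are placeholders, and the output has not been validated against the METS XSD.

```python
import xml.etree.ElementTree as ET
from itertools import product

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS)
ET.register_namespace("xlink", XLINK)

def m(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS}}}{tag}"

def crosswalk_mets(label: str, creator: str, components: list) -> ET.Element:
    """components: (role, version, href) tuples, one per file in the crosswalk package."""
    mets = ET.Element(m("mets"), {"LABEL": label})
    # Header: who created the crosswalk object.
    hdr = ET.SubElement(mets, m("metsHdr"))
    agent = ET.SubElement(hdr, m("agent"), {"ROLE": "CREATOR", "TYPE": "ORGANIZATION"})
    ET.SubElement(agent, m("name")).text = creator
    # One descriptive metadata section per file (placeholder <description> payload).
    for i, (role, version, _href) in enumerate(components, start=1):
        dmd = ET.SubElement(mets, m("dmdSec"), {"ID": f"DMD{i}"})
        wrap = ET.SubElement(dmd, m("mdWrap"), {"MDTYPE": "OTHER"})
        ET.SubElement(ET.SubElement(wrap, m("xmlData")), "description").text = f"{role} ({version})"
    # File inventory, then a structural map that points at every file.
    file_grp = ET.SubElement(ET.SubElement(mets, m("fileSec")), m("fileGrp"))
    div = ET.SubElement(ET.SubElement(mets, m("structMap")), m("div"), {"LABEL": label})
    for i, (_role, _version, href) in enumerate(components, start=1):
        f = ET.SubElement(file_grp, m("file"), {"ID": f"FILE{i}", "DMDID": f"DMD{i}"})
        ET.SubElement(f, m("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK}}}href": href})
        ET.SubElement(div, m("fptr"), {"FILEID": f"FILE{i}"})
    return mets

# Six objects: source schema, target schema, crosswalk; each human-readable and executable.
roles = ["source schema", "target schema", "crosswalk"]
versions = ["human-readable", "executable"]
components = [(r, v, f"http://example.org/xwalk/{r.replace(' ', '-')}/{v}")
              for r, v in product(roles, versions)]
record = crosswalk_mets("MARC 21 to Dublin Core", "Example Org", components)
```

Keeping one `dmdSec` per file and linking it through `DMDID` is what lets a harvester describe each schema or stylesheet independently of the package.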
Crosswalk METS record in OAI repository
What the METS encoding solves
• The semantic and syntactic information required for interpreting and executing a crosswalk is collected into a single object.
• The repository is searchable by humans and automated processes.
• Services can be built on top of it.
• It encourages the development and standardization of crosswalks.
These outcomes are possible because every component in the system is a standard.
Terminology services
• Terminology services are web services for knowledge organization schemes (KOS)
  – e.g., authority files, subject heading systems, thesauri, taxonomies, and classification schemes
• A web service that provides mappings from a term in one vocabulary to one or more terms in another vocabulary is an example of a terminology service
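The mapping service in the last bullet reduces to a lookup from (source vocabulary, term) to target terms. The sketch below is a hypothetical in-memory version; `TermMapper`, its methods and the vocabulary codes are illustrative assumptions, and the single LCSH-to-GSAFD pair is an example entry, not data from an actual mapping file. A deployed service would put the same lookup behind a protocol such as SRU.

```python
from collections import defaultdict

class TermMapper:
    """Hypothetical in-memory terminology service: term in vocabulary A -> terms in vocabulary B."""

    def __init__(self):
        # (source vocabulary, case-folded term) -> {(target vocabulary, target term)}
        self._map = defaultdict(set)

    def add_mapping(self, src_vocab: str, src_term: str, dst_vocab: str, dst_term: str) -> None:
        self._map[(src_vocab, src_term.casefold())].add((dst_vocab, dst_term))

    def map_term(self, src_vocab: str, term: str, dst_vocab: str) -> list:
        """Return every mapped term in dst_vocab (zero, one or many), sorted."""
        hits = self._map.get((src_vocab, term.casefold()), set())
        return sorted(t for vocab, t in hits if vocab == dst_vocab)

mapper = TermMapper()
# Illustrative entry only.
mapper.add_mapping("lcsh", "Detective and mystery stories", "gsafd", "Detective and mystery fiction")
```

Returning a list rather than a single term reflects the slide's "one or more terms": vocabularies rarely line up one-to-one.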
Current situation
• A plethora of vocabularies
• Many encoding formats
• Few inter-vocabulary connections
• Identifiers inadequate
  – Unavailable
  – Temporary
  – Inconsistent
Terminology services system framework
• Schema transformation:
  – MARC XML
  – SKOS
  – Zthes
• Record enhancement:
  – Inter-vocabulary mappings
  – Persistent identifiers (info URI)
• Access:
  – Human-readable:
    § Browse interface (ERRoLs)
    § Search/retrieve records (SRU/W)
    § Switch between schema-specific views (XSLT)
  – Machine-to-machine (m2m):
    § Publishing (OAI)
    § Search/retrieve records (SRU/W)
    § info URI resolution (OpenURL)
• Open standards:
  – MARC 21 XML/XSLT/XPath, SKOS, Zthes, SRU/SRW, OAI, info URI, OpenURL
• Open-source software:
  – OCLC OAICat
  – OCLC SRU/SRW server
  – OCLC ERRoL J2EE webapp
• Open content:
  – GSAFD, others…
• Open access
• Web-services-oriented
Schema transformation
• MARC XML
  – Authority format & classification format
• SKOS
  – Simple Knowledge Organization Systems
• Zthes
  – Z39.50 Profile for Thesaurus Navigation .5
  – Based on Z39.19 (NISO thesaurus standard)
Vocabulary processing (diagram)
• Schema transformation
  – Conversion from most formats: Z39.19, word lists in PDF, etc.
  – Vocabulary X: initial conversion to MARC XML (authorities format or classification format), then output as Zthes or SKOS
• Data enhancement
  – Add provenance (MARC organization codes)
  – Add persistent identifiers (info:kos) for concepts and terms
  – Optionally, add inter-vocabulary mappings (to Vocabulary Y)
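The SKOS output step, with provenance and mappings attached, can be sketched as a small RDF/XML serializer. Property names (`skos:Concept`, `prefLabel`, `altLabel`, `broader`, `exactMatch`) follow the W3C SKOS Reference (2009), which postdates this talk; recording provenance in `dcterms:source` as a MARC organization code (here `DLC`, the Library of Congress) is an assumption, and the example.org concept URIs are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
DCT = "http://purl.org/dc/terms/"
for prefix, uri in (("rdf", RDF), ("skos", SKOS), ("dcterms", DCT)):
    ET.register_namespace(prefix, uri)

def concept_to_skos(concept_uri, pref_label, alt_labels=(), broader=(),
                    exact_matches=(), source=None) -> str:
    """Serialize one enhanced concept record as SKOS in RDF/XML."""
    rdf = ET.Element(f"{{{RDF}}}RDF")
    c = ET.SubElement(rdf, f"{{{SKOS}}}Concept", {f"{{{RDF}}}about": concept_uri})
    ET.SubElement(c, f"{{{SKOS}}}prefLabel").text = pref_label
    for label in alt_labels:
        ET.SubElement(c, f"{{{SKOS}}}altLabel").text = label
    for uri in broader:          # hierarchy within the same vocabulary
        ET.SubElement(c, f"{{{SKOS}}}broader", {f"{{{RDF}}}resource": uri})
    for uri in exact_matches:    # inter-vocabulary mappings (optional enhancement)
        ET.SubElement(c, f"{{{SKOS}}}exactMatch", {f"{{{RDF}}}resource": uri})
    if source:                   # provenance, e.g. a MARC organization code
        ET.SubElement(c, f"{{{DCT}}}source").text = source
    return ET.tostring(rdf, encoding="unicode")

xml_out = concept_to_skos(
    "http://example.org/vocabX/c001",
    "Detective and mystery fiction",
    alt_labels=["Mystery fiction"],
    exact_matches=["http://example.org/vocabY/c042"],
    source="DLC",
)
```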
info:kos
• info URI
  – Provides a mechanism for the registration of public namespaces that are used for the identification of information assets
• The kos identifier
  – Provides a mechanism for identifying knowledge organization schemes and the concepts used in those schemes. It has two elements:
    § scheme
    § concept
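The two-element structure (scheme, concept) can be made concrete with a small compose/parse pair. The slide does not give the registered syntax, so the layout `info:kos/<scheme>/<concept>` used here is a hypothetical assumption, as is the `ddc` scheme code in the example. Each element is percent-encoded so a `/` inside a concept value cannot be confused with the separator.

```python
from dataclasses import dataclass
from urllib.parse import quote, unquote

PREFIX = "info:kos/"  # hypothetical layout; only the two elements come from the slide

@dataclass(frozen=True)
class KosId:
    scheme: str
    concept: str

    def to_uri(self) -> str:
        # Percent-encode each element so "/" inside a value is not read as the separator.
        return PREFIX + quote(self.scheme, safe="") + "/" + quote(self.concept, safe="")

    @classmethod
    def parse(cls, uri: str) -> "KosId":
        if not uri.startswith(PREFIX):
            raise ValueError(f"not a kos identifier: {uri!r}")
        scheme, sep, concept = uri[len(PREFIX):].partition("/")
        if not sep or not scheme or not concept:
            raise ValueError(f"expected scheme/concept: {uri!r}")
        return cls(unquote(scheme), unquote(concept))

ddc_id = KosId("ddc", "641.5")
```

Round-tripping (`KosId.parse(x.to_uri()) == x`) is the property that matters for persistence: any service that stores the URI can recover both elements exactly.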
New services environment (diagram)
• SRW server (http://alcme.oclc.org/srw/) returns Zthes, SKOS or DC records in response to SRW requests
• ERRoLs server applies stylesheets to the SRW/SRU responses and exposes each vocabulary through:
  – http://errol.oclc.org/xyz.srw2oai [OAI gateway]
  – http://errol.oclc.org/xyz.sru [SRU gateway]
  – http://errol.oclc.org/xyz.search [SRU-to-HTML gateway]
  – http://errol.oclc.org/xyz.html [HTML interface]
  – http://errol.oclc.org/xyz.rss [RSS feed]
  – http://errol.oclc.org [OpenURL base URL] (info URI resolver)
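A client of the SRU gateway only needs to build a searchRetrieve URL. The sketch below uses SRU 1.1 request parameters (`operation`, `version`, `query`, `startRecord`, `maximumRecords`, `recordSchema`); the base URL keeps the slide's `xyz` placeholder and the 2004 endpoints are probably no longer live, while the CQL index `dc.title` and the `zthes` schema name are assumptions about what such a server would accept.

```python
from urllib.parse import urlencode

def sru_search_url(base_url: str, cql_query: str, schema: str = None,
                   max_records: int = 10, start: int = 1) -> str:
    """Build an SRU 1.1 searchRetrieve URL; urlencode handles CQL quoting."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "startRecord": start,
        "maximumRecords": max_records,
    }
    if schema:
        params["recordSchema"] = schema
    return f"{base_url}?{urlencode(params)}"

url = sru_search_url("http://errol.oclc.org/xyz.sru", 'dc.title = "fiction"',
                     schema="zthes", max_records=5)
```

Because every gateway in the list shares one vocabulary name and varies only the suffix, a client can switch from SRU to RSS or HTML views by changing the base URL alone.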
Name authority lookup
• Interactive (example search: Lorcan Dempsey)
• As a web service
• An example: authority control service invoked from within DSpace
Working with web services
Making data work harder
Data mining
• Research
• Production
  – Collection analysis service in development phase
  – Leverages WorldCat data in interactive mode
    § Compare my collection to my peers
    § Compare my collection to my neighbors
    § Profile my collection by subject, by age, etc.
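The first and last comparisons above reduce to set arithmetic over record identifiers plus a tally by class. This is a minimal sketch, assuming each library's holdings are available as a set of record numbers and that a record-to-class lookup exists; the function names, the Jaccard overlap measure and the tiny sample data are illustrative choices, not the design of OCLC's collection analysis service.

```python
from collections import Counter

def compare_collections(mine: set, peer: set) -> dict:
    """Overlap and uniqueness between two holdings sets of record numbers."""
    overlap = mine & peer
    union = mine | peer
    return {
        "overlap": len(overlap),
        "unique_to_me": len(mine - peer),
        "unique_to_peer": len(peer - mine),
        "jaccard": len(overlap) / len(union) if union else 0.0,
    }

def profile_by_subject(holdings: set, class_of: dict) -> Counter:
    """Count holdings per subject class; records without a class are tallied separately."""
    return Counter(class_of.get(rec, "unclassified") for rec in holdings)

# Made-up record numbers and classes.
mine = {"1", "2", "3", "4"}
peer = {"3", "4", "5"}
stats = compare_collections(mine, peer)
profile = profile_by_subject(mine, {"1": "P", "2": "P", "3": "Q", "4": "Q"})
```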
Collection
• Change creates demand for better data.
• Growing interest in knowing more about:
  – Characteristics
  – Gaps and overlaps
  – Use
• Tuning collections based on data.
• Focus collection spending where it creates the most value.
Some projects
• Characteristics of collections
  – WorldCat
  – CIC
• Compare ILL, circulation and holdings data.
• Last copy: what is irreplaceable?
• ARL Global Resources
  – Exploring coverage of overseas titles in ARL libraries.
• Depends on consistency, coverage, currency
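The last-copy question can be sketched as counting how many libraries hold each title and keeping a library's titles whose count is one. The input shape (library mapped to a set of title identifiers) and the sample data are assumptions. The sketch also shows why the answer "depends on consistency, coverage, currency": if two libraries describe the same title under different identifiers, both copies are wrongly reported as last copies.

```python
from collections import Counter

def last_copies(holdings: dict, library: str) -> list:
    """Titles held by `library` and by no other library in the holdings data."""
    counts = Counter(t for titles in holdings.values() for t in titles)
    return sorted(t for t in holdings.get(library, set()) if counts[t] == 1)

# Made-up holdings: library -> set of title identifiers.
holdings = {"A": {"t1", "t2"}, "B": {"t2", "t3"}, "C": {"t3"}}
```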
Comparing CIC Collection Profiles
Audience level (chart): Forge, Letters
Profiles of 'Letters' & 'Forge' (example): 0.81, 0.65
Topics
• Framework for WorldCat directions
• Metadata management and knowledge organization
• Open WorldCat
• Working with web services
• Some research, some production
• Making data work harder
Thoughts
• Machines will do more work
  – Consistency becomes more important
• Variety
• Low precision
  – Make data work
The pattern is new …
The knowledge imposes a pattern, and falsifies,
For the pattern is new in every moment
Further information
Thanks to colleagues in OCLC Research for contributions to this presentation. Further information about OCLC Research projects can be found at http://www.oclc.org/research/
Thanks to colleagues in OCLC Collection Management Services for contributions to this presentation. Further information about Open WorldCat is available at http://www.oclc.org/worldcat/pilot/


