  • Number of slides: 57

OCLC: some development and research directions in the areas of metadata management and knowledge organization
Presented to Library of Congress cataloging managers retreat.
Lorcan Dempsey (with contributions from colleagues), VP Research and Chief Strategist
Library of Congress, 15 June 2004

Topics
• Framework for WorldCat directions
• Metadata management and knowledge organization
• Open WorldCat
• Working with web services
• Some research, some production
• Making data work harder

Framework for WorldCat directions

Collections grid
[Diagram: grid with two axes, stewardship (high/low) and uniqueness (low/high)]
• Books, Journals
– Newspapers
– Gov. docs
– CD, DVD
– Maps
– Scores
• Special collections, Archives
– Rare books
– Local history materials
– Archives & Manuscripts
– Theses & dissertations
• Freely-accessible web resources
• Research and learning materials
– ePrints/tech reports
– Learning objects
– Courseware
– E-portfolios
– Research data
• Untransferred records

WorldCat – the what?
• WorldCat:
– Grow
– Version
– Improve
• Easier to use (FRBR)
• The Open Web: both surface and acquire WorldCat content
– Microcontent
– Evaluative content
• Add special collections & institutional content to WorldCat: dissertations, cultural heritage collections, ePrints, learning objects

WorldCat – the how?
Research in these areas

Some issues
• Metadata variety
– Encoding, element sets, values/content
– Provenance
• Metadata manipulation
– Validation, identification
– Enhancement, augmentation
– Relation, FRBR, deduplication
– Transformation
• Schematization and web services
– Make data available in forms that allow machine services to be flexibly built on top of them
– Everything is a service

Open WorldCat

Open WorldCat
• Facilitate the rendezvous of users and library services on the web
• Surface the library where the users are
• Help release the value of library services in the working and learning lives of their users.

Open WorldCat Architecture
[Diagram labels: WorldCat; Metadata; Schemas and Vocabularies; Profiles and Relationships; Content Owner; Access; Distribution, Search, Display; Organization and Presentation; Aggregators; Portals; Google, Yahoo and Book Vendors]
• Additional collections can be added to the Worldcatlibraries domain
• OCLC will use tools such as xISBN and FRBR models to organize WorldCat public views suitable for low precision access
• OCLC developed geo-locator services to match users to extensive FirstSearch WorldCat institution and user profiles
• OCLC uses a host of authentication and authorization tools to progressively match content to rights
• OCLC organizes WorldCat content in a model suitable for harvesting, anticipating unique aspects of various portals

Current partners
(Click in presentation mode to go through to examples)
• Book vendors and bibliographies
§ ABE Books
§ ABAA
§ Alibris
§ HCBIB
§ BookPage
• Search engines (pilot with 2M records exposed as web pages for harvesting)
§ Google
§ Yahoo!
Try a search for: A history of caricature and grotesque in literature and art

Google and Yahoo! timeline
• 8/14/03: Google contract signed
• 9/19/03: Google given go-ahead to harvest records
• 10/22/03: Google harvests 150,000 records
• Dec. ’03: Records begin to appear in Google; 800 inbound links logged (search-site-originating [SSO])
• Jan. ’04: 32,000 inbound links logged (SSO)
• Mar. ’04: 109,000 inbound links logged (SSO)
• 5/21/04: Yahoo contract signed
• 5/28/04: Yahoo harvests records
• May ’04: 725,000 inbound links logged (SSO)
• 6/6/04: Yahoo completes indexing of 2 million WC records

Traffic
[Chart: full record displays. Projected for June.]

Metadata management and knowledge organization

Research activities
• Structures
– FRBR
– VIAF
§ BT
– FAST
– Vocabulary encoding and mappings
• Services
– xISBN
– Metadata transformation services
– Terminology services
– Authority services
– Automatic classification and cataloging
§ Eprints UK
§ Web harvesting

FRBR
• OR Work-set algorithm
• Work-based view incorporated into WorldCat in FirstSearch in late 2004
• FictionFinder (click in presentation mode to go through to FictionFinder)
– 2.6+ million fiction records from WorldCat, clustered by OCLC’s FRBR algorithm
– Make greater use of data (genres, settings, imaginary characters, etc.)
• Participate in ongoing FRBR refinement
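
The idea of clustering records into work sets can be illustrated with a small Python sketch. This is not OCLC's work-set algorithm, whose details the slide does not give; it is a simplified stand-in that groups records by a normalized author/title key. The record layout, the function names `work_key` and `cluster_by_work`, and the sample records are all assumptions made for illustration.

```python
import re
import unicodedata
from collections import defaultdict


def work_key(author: str, title: str) -> str:
    """Build a crude normalized author/title key (illustrative only)."""
    def norm(text: str) -> str:
        # Strip accents, lowercase, and replace punctuation with spaces.
        text = unicodedata.normalize("NFKD", text)
        text = "".join(ch for ch in text if not unicodedata.combining(ch))
        text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
        return " ".join(text.split())

    return f"{norm(author)}/{norm(title)}"


def cluster_by_work(records):
    """Group record ids whose author/title keys match exactly."""
    clusters = defaultdict(list)
    for rec in records:
        clusters[work_key(rec["author"], rec["title"])].append(rec["id"])
    return dict(clusters)


# Hypothetical sample records, not taken from WorldCat.
records = [
    {"id": "r1", "author": "Austen, Jane", "title": "Pride and Prejudice"},
    {"id": "r2", "author": "AUSTEN, JANE", "title": "Pride and prejudice."},
    {"id": "r3", "author": "Austen, Jane", "title": "Emma"},
]
clusters = cluster_by_work(records)
```

Exact key matching is the simplest possible choice; a production clustering process would also need to handle variant names, translations, and missing authors, which this sketch ignores.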

FAST

Vocabulary mappings

Services
• Web services
– Computer to computer applications over the web
• Unplug and play
– Unbundling monolithic applications and making functionality available in more modular ways
• Reuse and sharing
– Of services!
• Release the value in a web environment of the historical library investment in vocabularies and structures

xISBN
• An experimental web service
– Leverages FRBRization work
– Give it an ISBN, it returns all related ISBNs
– Based on WorldCat
– Designed for machine-to-machine data exchange
• Examples:
– Check user ILL requests against all editions/versions in OPAC
– Find library’s editions when user finds any edition/version of item on Amazon
– Check OPAC for all editions during selection/acquisitions/gift book processing
– …
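
The behaviour described above (give it an ISBN, get back all related ISBNs) can be sketched locally in Python. The sketch does not call the xISBN service and does not reproduce its request or response format, which the slides do not show; it assumes the work sets are already available as plain lists. The function names and the ISBN strings are placeholders, not real identifiers.

```python
def build_index(work_sets):
    """Map every ISBN to the full set of ISBNs in its work set."""
    index = {}
    for isbns in work_sets:
        group = frozenset(isbns)
        for isbn in group:
            index[isbn] = group
    return index


def related_isbns(index, isbn):
    """Return all ISBNs in the same work set, including the input.

    An ISBN that is not in the index is returned on its own.
    """
    return sorted(index.get(isbn, {isbn}))


def held_editions(index, requested_isbn, opac_holdings):
    """Check a request against every edition of the work held locally."""
    return sorted(set(related_isbns(index, requested_isbn)) & set(opac_holdings))


# Placeholder work sets: each inner list stands for one clustered work.
index = build_index([
    ["0000000001", "0000000002", "0000000003"],
    ["0000000010", "0000000011"],
])
editions = related_isbns(index, "0000000002")
held = held_editions(index, "0000000002", ["0000000003", "0000000010"])
```

The `held_editions` helper corresponds to the first example on the slide: an ILL request for one edition is matched against any edition of the same work in the local catalogue.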

xISBN
Install FRBR Bookmarklets in your browser to see xISBN working. See Bookmarklets page at www.oclc.org/researchworks/
(Click cover to search Seattle Public Library. Click cover to search amazon.co.uk.)

Metadata schema transformations
• Metadata Schema Transformation Services
– Evaluate approaches to crosswalking metadata
– Prototype transformation environments
• The XSLT “short path”
– Supports lightweight XML processing
– Designed for public access
– Deliverables:
§ OAI repository of METS-captured xwalks [NEW]
• The “long path” option
– Designed for high-fidelity translations
– May be public or proprietary
– Deliverables: Toolkit; expertise in non-MARC formats

[Diagram: translation pipeline with five numbered stages]
File of records in format X
→ STRUCTURAL TRANSFORM: transform to intermediate form
→ SEMANTIC TRANSLATION: translate input semantics to CORE
→ CORE
→ SEMANTIC TRANSLATION: translate CORE to output semantics
→ STRUCTURAL TRANSFORM: transform to output format Y
→ File of records in format Y
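
A minimal Python sketch of this kind of pipeline follows, assuming records are simple tag/value pairs. The input format ("tag=value" lines), the field names, the two mapping tables, and the output layout are all invented for illustration; they do not correspond to MARC, Dublin Core, or any OCLC toolkit.

```python
# Hypothetical mapping tables: format X tags -> CORE names -> format Y labels.
X_TO_CORE = {"ti": "title", "au": "creator", "yr": "date"}
CORE_TO_Y = {"title": "Title", "creator": "Author", "date": "Year"}


def structural_in(raw: str) -> dict:
    """Structural transform: parse 'tag=value' lines into a dict."""
    fields = {}
    for line in raw.strip().splitlines():
        tag, _, value = line.partition("=")
        fields[tag.strip()] = value.strip()
    return fields


def translate(fields: dict, mapping: dict) -> dict:
    """Semantic translation: rename fields; unmapped fields are dropped."""
    return {mapping[name]: value for name, value in fields.items() if name in mapping}


def structural_out(fields: dict) -> str:
    """Structural transform: serialize fields as 'Label: value' pairs."""
    return "; ".join(f"{name}: {value}" for name, value in fields.items())


def crosswalk(raw: str) -> str:
    intermediate = structural_in(raw)            # format X -> intermediate form
    core = translate(intermediate, X_TO_CORE)    # input semantics -> CORE
    y_fields = translate(core, CORE_TO_Y)        # CORE -> output semantics
    return structural_out(y_fields)              # -> format Y


result = crosswalk("ti=Emma\nau=Austen, Jane\nyr=1815")
```

Routing every translation through a shared core means each new format needs only one mapping to and one mapping from the core, rather than a separate crosswalk for every pair of formats.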

A crosswalk as a METS record
• Describe the crosswalk object in the METS header.
• Assemble and identify six objects in the METS structural map:
– The source metadata schema
– The target metadata schema
– The crosswalk
– Human-readable and executable versions of each
• Associate metadata for each file in the METS Descriptive Metadata Section.
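
The structure described above can be sketched as a bare METS skeleton built with Python's standard `xml.etree.ElementTree`. The element names (`metsHdr`, `dmdSec`, `fileSec`, `structMap`, `div`, `fptr`) come from the METS schema, but the identifiers, labels, and overall layout are assumptions for illustration. The sketch has not been validated against the METS schema and is not OCLC's actual crosswalk record.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_crosswalk_mets(label: str) -> ET.Element:
    mets = ET.Element(q("mets"), {"LABEL": label})
    ET.SubElement(mets, q("metsHdr"))  # header describing the crosswalk object

    # Six objects: three parts, each in two versions (hypothetical ids).
    object_ids = [
        f"{part}-{version}"
        for part in ("source-schema", "target-schema", "crosswalk")
        for version in ("human-readable", "executable")
    ]

    # One descriptive metadata section per file.
    for oid in object_ids:
        ET.SubElement(mets, q("dmdSec"), {"ID": f"dmd-{oid}"})

    # File inventory, each file pointing at its descriptive metadata.
    file_grp = ET.SubElement(ET.SubElement(mets, q("fileSec")), q("fileGrp"))
    for oid in object_ids:
        ET.SubElement(file_grp, q("file"), {"ID": oid, "DMDID": f"dmd-{oid}"})

    # Structural map identifying the six objects.
    root_div = ET.SubElement(ET.SubElement(mets, q("structMap")), q("div"))
    for oid in object_ids:
        div = ET.SubElement(root_div, q("div"), {"LABEL": oid})
        ET.SubElement(div, q("fptr"), {"FILEID": oid})
    return mets


mets = build_crosswalk_mets("Example crosswalk (hypothetical)")
xml_text = ET.tostring(mets, encoding="unicode")
```

A real record would also carry file locations and populated metadata sections; they are omitted here to keep the shape of the six-object structure visible.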

Crosswalk METS record in OAI repository

What the METS encoding solves
• The semantic and syntactic information required for interpreting and executing a crosswalk is collected into a single object.
• The repository is searchable by humans and automated processes.
• Services can be built on top of it.
• It encourages the development and standardization of crosswalks.
These outcomes are possible because every component in the system is a standard.

Terminology Services
• Terminology services are web services for knowledge organization schemes (KOS)
– e.g., authority files, subject heading systems, thesauri, taxonomies, and classification schemes
• A web service that provides mappings from a term in one vocabulary to one or more terms in another vocabulary is an example of a terminology service
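
The mapping example in the last bullet can be sketched as a lookup function in Python. The sketch models only the core operation (one source term in, zero or more target terms out), not a web service interface. The vocabulary names `vocab-x` and `vocab-y` and the term pairs are hypothetical and are not drawn from any real mapping.

```python
from typing import Dict, List, Tuple

# (source vocabulary, source term) -> {target vocabulary: [target terms]}
# All entries are made up for illustration.
MAPPINGS: Dict[Tuple[str, str], Dict[str, List[str]]] = {
    ("vocab-x", "Cookery"): {"vocab-y": ["Cooking"]},
    ("vocab-x", "Motion pictures"): {"vocab-y": ["Films", "Cinema"]},
}


def map_term(source_vocab: str, term: str, target_vocab: str) -> List[str]:
    """Return the target-vocabulary terms mapped from a source term.

    An unknown term or vocabulary pair yields an empty list.
    """
    targets = MAPPINGS.get((source_vocab, term), {})
    return list(targets.get(target_vocab, []))


one_to_one = map_term("vocab-x", "Cookery", "vocab-y")
one_to_many = map_term("vocab-x", "Motion pictures", "vocab-y")
unmapped = map_term("vocab-x", "Unknown term", "vocab-y")
```

Returning a list in every case keeps one-to-one, one-to-many, and unmapped terms uniform for a calling program.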

Current Situation
• A plethora of vocabularies
• Many encoding formats
• Few inter-vocabulary connections
• Identifiers inadequate
– Unavailable
– Temporary
– Inconsistent

Terminology services system framework
• Schema transformation:
– MARC XML
– SKOS
– Zthes
• Record enhancement:
– Inter-vocabulary mappings
– Persistent identifiers (info:uri)
• Access:
– Human-readable:
§ Browse interface (ERRoLs)
§ Search/retrieve records (SRU/W)
§ Switch between schema-specific views (XSLT)
– m2m:
§ Publishing (OAI)
§ Search/retrieve records (SRU/W)
§ info:uri resolution (OpenURL)
• Open standards:
– MARC 21
– XML/XSLT/XPath
– SKOS
– Zthes
– SRU/SRW
– OAI
– info:uri
– OpenURL
• Open source software:
– OCLC OAICat
– OCLC SRU/SRW server
– OCLC ERRoL J2EE webapp
• Open content:
– GSAFD, others…
• Open access
• Web services-oriented

Schema Transformation
• MARC XML
– Authority Format & Classification Format
• SKOS
– Simple Knowledge Organization Systems
• Zthes
– Z39.50 Profile for Thesaurus Navigation .5
– Based on Z39.19 (NISO Thesaurus Standard)

Vocabulary Processing
[Diagram: Vocabulary X and Vocabulary Y pass through schema transformation and data enhancement; outputs in Zthes and SKOS]
• Schema transformation
– Conversion from most formats:
§ Z39.19
§ wordlists in PDF, etc.
– Initial conversion to MARC XML
§ Authorities format, or
§ Classification format
• Data enhancement
– Add:
§ provenance (MARC Org. Codes)
§ persistent identifiers (info:kos)
– Optionally, add:
§ inter-vocabulary mappings
§ Concepts & terms
§ persistent identifiers (info:kos)

Info:kos
• Info:uri
– provides a mechanism for the registration of public namespaces that are used for the identification of information assets
• The kos identifier
– provides a mechanism for identifying knowledge organization schemes and the concepts used in those schemes. It has two elements:
§ scheme
§ concept

New services environment
[Diagram: SRW server and ERRoLs server (info:uri resolver); ERRoLs server stylesheets applied; record schemas Zthes, SKOS, DC; SRW/SRU response]
• http://alcme.oclc.org/srw/ [SRW request]
• http://errol.oclc.org/xyz.srw2oai [OAI gateway]
• http://errol.oclc.org/xyz.sru [SRU gateway]
• http://errol.oclc.org/xyz.search [SRU-to-HTML gateway]
• http://errol.oclc.org/xyz.html [HTML interface]
• http://errol.oclc.org/xyz.rss [RSS feed]
• http://errol.oclc.org [OpenURL base URL]

Name authority lookup
• Interactive (Lorcan Dempsey)
• As a web service
• An example: authority control service invoked from within DSpace
(Click in presentation mode.)

Working with web services

Making data work harder

Data mining
• Research
• Production
– Collection analysis service in development phase
– Leverages WorldCat data in interactive mode
§ Compare my collection to my peers
§ Compare my collection to my neighbors
§ Profile my collection by subject, by age, … etc.
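
The "compare my collection to my peers" idea reduces, at its simplest, to set comparisons over holdings. The Python sketch below assumes each library's holdings are available as a list of title identifiers. The function name, the report keys, and the sample identifiers are hypothetical, and this is not the design of OCLC's collection analysis service.

```python
def compare_collections(mine, peers):
    """Compare one library's holdings against each peer's holdings.

    `mine` is an iterable of title identifiers; `peers` maps a peer name
    to an iterable of identifiers. All identifiers are placeholders.
    """
    mine = set(mine)
    peer_sets = {name: set(held) for name, held in peers.items()}
    held_by_any_peer = set().union(*peer_sets.values())
    return {
        # Titles no peer holds.
        "unique_to_me": sorted(mine - held_by_any_peer),
        # Titles at least one peer holds that I do not.
        "gaps": sorted(held_by_any_peer - mine),
        # Share of my collection also held by each peer.
        "overlap_by_peer": {
            name: len(mine & held) / len(mine) if mine else 0.0
            for name, held in peer_sets.items()
        },
    }


report = compare_collections(
    mine=["t1", "t2", "t3", "t4"],
    peers={"peer_a": ["t1", "t2", "t9"], "peer_b": ["t2"]},
)
```

Profiling by subject or age would extend the same approach by grouping identifiers on a record attribute before comparing.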

Collection
• Change creates demand for better data.
• Growing interest in knowing more about:
– Characteristics
– Gaps and overlaps
– Use
• Tuning collections based on data.
• Focus collection spending where it creates most value.

Some projects
• Characteristics of collections
– WorldCat
– CIC
• Compare ILL, circulation and holdings data.
• Last copy: what is irreplaceable?
• ARL Global Resources.
– Exploring coverage of overseas titles in ARL libraries.
• Depends on consistency, coverage, currency

Comparing CIC Collection Profiles

Audience level
[Chart labels: Forge, Letters]

Profiles of ‘Letters’ & ‘Forge’
[Chart: example values 0.81 and 0.65]

Topics
• Framework for WorldCat directions
• Metadata management and knowledge organization
• Open WorldCat
• Working with web services
• Some research, some production
• Making data work harder

Thoughts
• Machines will do more work
– Consistency becomes more important
• Variety
• Low precision
– Make data work

The pattern is new …
The knowledge imposes a pattern and falsifies
For the pattern is new in every moment

Further information
Thanks to colleagues in OCLC Research for contributions to this presentation. Further information about OCLC Research projects can be found at http://www.oclc.org/research/
Thanks to colleagues in OCLC Collection Management Services for contributions to this presentation. Further information about Open WorldCat at http://www.oclc.org/worldcat/pilot/