- Number of slides: 21
OAIster: What’s with the Weird Name?
Kat Hagedorn
UM Library Information Technology
November 28, 2005
What is OAIster?
- Is/was a means for UM to test the OAI protocol… (hence the name)
- A method for sharing metadata among institutions and groups of people
- A means of developing a search service for end-users worldwide
Basics of OAI
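As a rough illustration of the protocol, an OAI harvester asks a data provider for records with a plain HTTP request. The sketch below only builds such a request URL. `ListRecords` and the `oai_dc` metadata prefix come from the OAI-PMH specification; the function name and the base URL are hypothetical and are not taken from the slides.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for a data provider."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        # Optional: restrict the harvest to one set offered by the provider.
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Hypothetical data provider base URL, for illustration only.
print(list_records_url("http://repository.example.org/oai"))
```

The response to such a request is an XML document containing the provider's records, which the harvester then stores.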
What does OAIster collect?
- Harvests all metadata from all OAI data providers (within reason)
- Only keeps metadata that points to digital objects, e.g., articles, photographs, datasets, etc. in digitized form
- All available via search service…
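One way to picture the "only keeps metadata that points to digital objects" rule is a filter over harvested Dublin Core records. The sketch below assumes a record counts as pointing to a digital object when one of its `dc:identifier` values is a URL. That test is an assumption made for illustration; the slides do not say how OAIster actually decides. The sample records are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the simple Dublin Core elements used in oai_dc records.
DC = "{http://purl.org/dc/elements/1.1/}"

# Hypothetical records, for illustration only.
RECORD_WITH_URL = """<oai_dc:dc
    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>A digitized photograph</dc:title>
  <dc:identifier>http://repository.example.org/photo/1</dc:identifier>
</oai_dc:dc>"""

RECORD_WITHOUT_URL = """<oai_dc:dc
    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>A print-only catalog entry</dc:title>
  <dc:identifier>Shelf mark 123</dc:identifier>
</oai_dc:dc>"""


def points_to_digital_object(record_xml):
    """Return True if any dc:identifier in the record looks like a URL."""
    root = ET.fromstring(record_xml)
    for identifier in root.iter(DC + "identifier"):
        text = (identifier.text or "").strip().lower()
        if text.startswith(("http://", "https://", "ftp://")):
            return True
    return False


kept = [r for r in (RECORD_WITH_URL, RECORD_WITHOUT_URL) if points_to_digital_object(r)]
print(len(kept), "of 2 records kept")
```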
Searching OAIster
- Time to show off OAIster…
- http://www.oaister.org/
A little history
- Service is now 3.5 years old
- Started with 66 data providers and a little over 200K records
- Now have 572 data providers and “a little” over 6 million records
- 37% US, 63% international
Visibility of OAI
- Surprising who hasn’t made their metadata shareable through OAI
  - Harvard, Yale, Stanford… the big ones
- Initially perplexing, but now clearer:
  - OAI sharing is always done at the end of a project
  - only recently thought of at the initiation of projects
  - truthfully, many institutions are not collaborative…
Examples of data providers
- Many data providers are huge, e.g.,
  - arXiv: physics preprint and postprint articles
  - pubmed: medical articles, although restricted
  - pictureaustralia: images from govt and academic institutions in Australia
  - lcoa: Library of Congress digital archives
  - usc: U Southern California census data
Examples of data providers
- Most are small, though
- Many around 100 records
- Value of making their records available
  - increased visibility
  - inclusion in bigger search service than theirs
  - incorporation in Yahoo! Search
Yahoo! Search
- Two years ago, collaborated with team at Yahoo! Search to send our metadata to them for indexing
  - e.g., “gardens at albury” in Yahoo! Search
  - know it’s not static HTML roboting
System design
[Diagram. Labeled components: OAI-enabled DC records; Non-OAI-enabled DC records; UM harvester; Record storage; XSL stylesheets (per source type); XSLT transformation tool; BibClass indexes; Search interface (XPAT)]
Transformation of metadata
- Most metadata needs to be brushed off
  - adding an http:// to the front of URLs
- Or raked
  - removing instances of
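The "brushing off" step of adding an http:// to the front of URLs can be sketched as a small helper. This is a minimal illustration, not OAIster's own transformation code (the system design slide names XSL stylesheets for that). The function name and the choice to leave any value that already contains `://` untouched are assumptions.

```python
def add_missing_scheme(url, default_scheme="http"):
    """Prefix a URL with a scheme when the data provider left it out."""
    cleaned = url.strip()
    if not cleaned:
        # Nothing to repair in an empty value.
        return cleaned
    if "://" in cleaned:
        # Already has a scheme (http, https, ftp, ...); leave it alone.
        return cleaned
    return f"{default_scheme}://{cleaned}"


print(add_missing_scheme("www.oaister.org"))
print(add_missing_scheme("http://www.oaister.org/"))
```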
Why normalize?
- Sample date values
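The sample date values from this slide did not survive extraction, so the sketch below uses hypothetical values to show why normalization helps: free-text dates in many shapes can be reduced to one comparable form, here a four-digit year. The pattern and the decision to keep only the first year found are assumptions for illustration, not OAIster's actual rules.

```python
import re

# A four-digit year from 1000 to 2099 that is not part of a longer number.
YEAR_PATTERN = re.compile(r"(?<!\d)(1\d{3}|20\d{2})(?!\d)")


def normalize_year(value):
    """Return the first plausible year in a free-text date, or None."""
    match = YEAR_PATTERN.search(value)
    return int(match.group(1)) if match else None


# Hypothetical date values in the styles data providers tend to use.
for raw in ["2005-11-28", "November 28, 2005", "c. 1920", "[1887?]", "undated"]:
    print(repr(raw), "->", normalize_year(raw))
```

With every record reduced to the same form, a search interface can sort or limit by date without knowing each provider's conventions.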
Why use a CV?
- Sample subject values
Best practices
- Fixing more than half of the data providers is cumbersome
- Individuals at OAI-enabled institutions started a “Best Practices” group to inform data providers what they ought to do
- http://oai-best.comm.nsdl.org/cgi-bin/wiki.pl?TableOfContents
2nd phase
- OAI “Best Practices” group sponsored by the Digital Library Federation, which also…
- Sponsors our latest grant
  - Better and more easily calculated statistics
  - Search interface improvements
  - Clustering / classification techniques
  - Using richer metadata
Clustering / classification
- Using automated means to take a selection of metadata and determine “what it’s about”
- Working with Emory University (one of our grant partners) to test their tool
- Results will be integrated into search so users can search within a smaller group of OAIster records
Using richer metadata
- Data providers must use simple Dublin Core
- Very sparse schema for describing objects
  - dc:title must contain main title, sorted title and alternative titles
  - dc:subject doesn’t distinguish between geographical, hierarchical, temporal…
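To make the sparseness concrete, the sketch below reads a hypothetical simple Dublin Core record. Every title arrives as a plain `dc:title` and every subject as a plain `dc:subject`, so nothing in the record says which title is the main one or which subject is geographical or temporal. The record content and the helper name are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of the simple Dublin Core elements used in oai_dc records.
DC = "{http://purl.org/dc/elements/1.1/}"

# Hypothetical record, for illustration only.
SAMPLE_RECORD = """<oai_dc:dc
    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>The main title</dc:title>
  <dc:title>Main title, the</dc:title>
  <dc:title>An alternative title</dc:title>
  <dc:subject>Australia</dc:subject>
  <dc:subject>1920-1930</dc:subject>
  <dc:subject>Gardens</dc:subject>
</oai_dc:dc>"""


def dc_values(record_xml, element):
    """Return the text of every occurrence of one simple DC element."""
    root = ET.fromstring(record_xml)
    return [(node.text or "").strip() for node in root.iter(DC + element)]


titles = dc_values(SAMPLE_RECORD, "title")
subjects = dc_values(SAMPLE_RECORD, "subject")
print(titles)    # three strings, no marker for main vs. sorted vs. alternative
print(subjects)  # three strings, no marker for place vs. period vs. topic
```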
Using richer metadata
- Encouraging use of richer metadata, especially MODS (Metadata Object Description Schema) from LOC
- Developed testbed for grant deliverables
  - currently only shows MODS work…
  - http://www.hti.umich.edu/m/mods/
Other stuff
- Well, make it smaller somehow…
- Clean up Boolean interface
  - squinch fields together
  - include more normalization
- Make it available through federated search
- Proselytize sharing metadata
- Test, test
Contact me
- Kat Hagedorn
- UM Library Information Technology
- khage@umich.edu
- www.oaister.org


