OAIster: What’s with the Weird Name?
Kat Hagedorn
UM Library Information Technology
November 28, 2005

What is OAIster?
• Is/was a means for UM to test the OAI protocol… (hence the name)
• A method for sharing metadata among institutions and groups of people
• A means of developing a search service for end-users worldwide

Basics of OAI

What does OAIster collect?
• Harvests all metadata from all OAI data providers (within reason)
• Only keeps metadata that points to digital objects, e.g., articles, photographs, datasets, etc. in digitized form
• All available via search service…
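The harvest-then-filter idea on this slide can be sketched in a few lines of Python. This is an illustration only, not OAIster's actual harvester: the base URL `http://example.org/oai`, the sample response, and the function names are hypothetical, and "points to a digital object" is approximated here as "has a `dc:identifier` that looks like a URL". The `verb=ListRecords` and `metadataPrefix=oai_dc` request parameters are standard OAI-PMH.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical, hand-written response standing in for a real data provider.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Digitized photograph</dc:title>
          <dc:identifier>http://example.org/photos/1.jpg</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:example.org:2</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Catalog entry for a print-only book</dc:title>
          <dc:identifier>call number QA76</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def build_list_records_url(base_url):
    """Build an OAI-PMH ListRecords request for simple Dublin Core."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": "oai_dc"})
    return f"{base_url}?{query}"


def records_with_digital_objects(xml_text):
    """Keep only records whose dc:identifier looks like a URL (an assumption)."""
    root = ET.fromstring(xml_text.strip())
    kept = []
    for record in root.iter(f"{{{OAI_NS}}}record"):
        header_id = record.findtext(f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier")
        urls = [
            el.text.strip()
            for el in record.iter(f"{{{DC_NS}}}identifier")
            if el.text and el.text.strip().lower().startswith(("http://", "https://"))
        ]
        if urls:
            kept.append((header_id, urls))
    return kept


if __name__ == "__main__":
    print(build_list_records_url("http://example.org/oai"))
    print(records_with_digital_objects(SAMPLE_RESPONSE))
```

In this sketch the second record is dropped because its only identifier is a call number rather than a link to something digitized.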

Searching OAIster
• Time to show off OAIster…
• http://www.oaister.org/

A little history
• Service is now 3.5 years old
• Started with 66 data providers and a little over 200K records
• Now have 572 data providers and “a little” over 6 million records
• 37% US, 63% international

Visibility of OAI
• Surprising who hasn’t made their metadata shareable through OAI
  - Harvard, Yale, Stanford… the big ones
• Initially perplexing, but now clearer:
  - always done at the end
  - only recently thought of at initiation of projects
  - truthfully, many institutions not collaborative…

Examples of data providers
• Many data providers are huge, e.g.,
  - arXiv: physics preprint and postprint articles
  - pubmed: medical articles, although restricted
  - pictureaustralia: images from govt and academic institutions in Australia
  - lcoa: Library of Congress digital archives
  - usc: U Southern California census data

Examples of data providers
• Most are small, though
• Many around 100 records
• Value of making their records available
  - increased visibility
  - inclusion in bigger search service than theirs
  - incorporation in Yahoo! Search

Yahoo! Search
• Two years ago, collaborated with team at Yahoo! Search to send our metadata to them for indexing
  - e.g., “gardens at albury” in Yahoo! Search
  - know it’s not static html roboting
  - IsPartOf Victorian Railways collection. Many, many more hits
• Also send metadata to Google

System design
[Diagram labels: UM harvester; OAI-enabled DC records; non-OAI-enabled DC records; record storage; XSL stylesheets (per source type); XSLT transformation tool; BibClass indexes; search interface (XPAT)]

Transformation of metadata
• Most metadata needs to be brushed off
  - adding an http:// to the front of URLs
• Or raked
  - removing instances of
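The "brushing off" step for URLs might look something like the following in Python. The slide names XSL stylesheets as the real transformation mechanism, so this is a stand-in for illustration; the function name `brush_off_url` and the rule "anything without a scheme gets http://" are assumptions.

```python
def brush_off_url(value: str) -> str:
    """Prefix http:// when a harvested URL value has no scheme (assumed rule)."""
    cleaned = value.strip()
    if not cleaned:
        return cleaned  # nothing to fix in an empty value
    if "://" in cleaned:
        return cleaned  # already has a scheme such as http:// or ftp://
    return "http://" + cleaned


if __name__ == "__main__":
    for raw in ["www.oaister.org", " http://www.oaister.org/ ", ""]:
        print(repr(raw), "->", repr(brush_off_url(raw)))
```

Values that already carry a scheme pass through with only surrounding whitespace trimmed, so the function is safe to run over every record.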

Why normalize?
• Sample date values
  <date>2-12-01</date>
  <date>2002-01-01</date>
  <date>0000-00-00</date>
  <date>1822</date>
  <date>between 1827 and 1833</date>
  <date>18--?</date>
  <date>November 13, 1947</date>
  <date>SEP 1958</date>
  <date>235 bce</date>
  <date>Summer, 1948</date>
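To make the problem concrete, here is a deliberately naive Python normalizer run against the sample values on this slide. It is a sketch under stated assumptions, not OAIster's normalization: the function name `normalize_date` is hypothetical, it only recognizes four-digit years beginning with 1 or 2, and it returns a (start year, end year) pair or `None` when no such year is found.

```python
import re
from typing import Optional, Tuple

# Four-digit years starting with 1 or 2; everything else is ignored (assumption).
YEAR_RE = re.compile(r"\b([12]\d{3})\b")


def normalize_date(raw: str) -> Optional[Tuple[int, int]]:
    """Reduce a free-text date to a (start_year, end_year) range, or None."""
    years = [int(y) for y in YEAR_RE.findall(raw)]
    if not years:
        return None
    return (min(years), max(years))


if __name__ == "__main__":
    samples = [
        "2-12-01", "2002-01-01", "0000-00-00", "1822",
        "between 1827 and 1833", "18--?", "November 13, 1947",
        "SEP 1958", "235 bce", "Summer, 1948",
    ]
    for value in samples:
        print(f"{value!r:28} -> {normalize_date(value)}")
```

Even this simple rule leaves four of the ten samples unresolved ("2-12-01", "0000-00-00", "18--?", "235 bce"), which is the point of the slide: free-text dates need normalization before they can be searched or sorted reliably.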

Why use a CV?
• Sample subject values
  <subject>30, 51, 52</subject>
  <subject>1852, Apr. 22. E[veritt] Judson, letter to Philuta [Judson].</subject>
  <subject>Slavery--United States--Controversial literature</subject>
  <subject>view of interior with John Henry sculpture</subject>
  <subject>Particles (Nuclear physics) -Research.</subject>

Best practices
• Fixing more than half of the data providers is cumbersome
• Individuals at OAI-enabled institutions started a “Best Practices” group to inform data providers what they ought to do
• http://oai-best.comm.nsdl.org/cgi-bin/wiki.pl?TableOfContents

2nd phase
• OAI “Best Practices” group sponsored by the Digital Library Federation, which also…
• Sponsors our latest grant
  - Better and more easily calculated statistics
  - Search interface improvements
  - Clustering / classification techniques
  - Using richer metadata

Clustering / classification
• Using automated means to take a selection of metadata and determine “what it’s about”
• Working with Emory University (one of our grant partners) to test their tool
• Results will be integrated into search so can search in smaller group of OAIster records

Using richer metadata
• Data providers must use simple Dublin Core
• Very sparse schema for describing objects
  - dc:title must contain main title, sorted title and alternative titles
  - dc:subject doesn’t distinguish between geographical, hierarchical, temporal…

Using richer metadata
• Encouraging use of richer metadata, especially MODS (Metadata Object Description Schema) from LOC
• Developed testbed for grant deliverables
  - currently only shows MODS work…
  - http://www.hti.umich.edu/m/mods/

Other stuff
• Well, make it smaller somehow…
• Clean up Boolean interface
  - squinch fields together
  - include more normalization
• Make it available through federated search
• Proselytize sharing metadata
• Test, test

Contact me
• Kat Hagedorn
• UM Library Information Technology
• khage@umich.edu
• www.oaister.org