• Number of slides: 66

Metasearch
NISO Metasearch Initiative Overview
Local Uses of Metasearch
Andrew K. Pace, Head, Systems, NCSU Libraries; Co-chair, NISO-MI; andrew_pace@ncsu.edu
John Little, Senior Analyst, IT, Duke Libraries; Member, NISO-MI TG 3; John_R_Little@notes.duke.edu
Tim Shearer, Library Systems, UNC Libraries; Member, NISO-MI TG 1; sheat@ils.unc.edu

Rumsfeld’s Law of Metasearch
You metasearch with the standard you have, not the standard you wish you had.

Credits & Thanks
• NISO Metasearch Initiative Team
  – Jenny Walker, VP Marketing, Ex Libris; Co-chair of the initiative
  – Mike Teets, OCLC, Task Group Chair
  – Juha Hakala, Nat’l Lib Finland, Task Group Chair
  – Sara Randall, Endeavor, and Katherine Kott, DLF, Task Group Chairs
  – All the active participants of the 3 task groups
• TRLN (for co-hosting 2 critical meetings)

Why I’m Here
• What is metasearch?
• Talk about the history, work, and present status of the NISO Metasearch Initiative Committee
• Convey the complexity of improving the standing of metasearch
• Talk about the work left to be done

I wish I had time to do more of…
• Convincing even the unconvinced that metasearch is a worthwhile endeavor (I will try to do this anyway)
• Talking more about Google (I will do this anyway)
• But I do want to leave plenty of time for discussion

What’s in a name?
• Federated search
• Channel (RSS) search
• Metasearch

[Diagram: query form → ? → diverse information resources]

Federated search
[Diagram: query form → ? → diverse information resources; just-in-case]

Federated search examples
• ENCompass for Journals OnSite (EJOS)
• SCIRUS
• Google Scholar

Channel (RSS) Search
[Diagram: query form → ? → diverse information resources; just-in-time, on request; turned on or off]

Example of Channel Search

Metasearch
[Diagram: query form → ? → diverse information resources; just-in-time]

Metasearch
[Diagram: query form → ? → diverse information resources]
Integrated searching = metasearching = cross-database searching = parallel searching = broadcast searching = …

MetaSearch Technology
[Diagram: query form → metasearch agent → translators/connectors → diverse information resources]
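The agent-and-connectors architecture can be sketched in a few lines. This is a minimal Python illustration, not the design of any particular product: the `metasearch` function and the connector signature (a function that takes a query string and returns a list of record dictionaries) are hypothetical. The query is broadcast to every connector in parallel, and each record is tagged with the resource it came from.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical connector type: translates a query for one target and
# returns that target's records as plain dictionaries.
Connector = Callable[[str], List[dict]]


def metasearch(query: str, connectors: Dict[str, Connector]) -> List[dict]:
    """Broadcast one query to every connector in parallel and gather results."""
    results: List[dict] = []
    with ThreadPoolExecutor(max_workers=max(1, len(connectors))) as pool:
        # Submit every search first, so the targets are queried concurrently.
        futures = {name: pool.submit(fn, query) for name, fn in connectors.items()}
        for name, future in futures.items():
            try:
                records = future.result()
            except Exception:
                # A target that raises an error is skipped, so one failing
                # connector does not sink the whole search.
                continue
            # Tag each record with its source so provenance survives merging.
            results.extend({**record, "source": name} for record in records)
    return results


if __name__ == "__main__":
    def catalog(q: str) -> List[dict]:
        return [{"title": f"Catalog record for {q}"}]

    def journals(q: str) -> List[dict]:
        return [{"title": f"Article about {q}"}]

    for hit in metasearch("metasearch", {"catalog": catalog, "journals": journals}):
        print(hit["source"], "-", hit["title"])
```

A production agent would also need per-target timeouts, translation of the query into each target's native syntax, and the authentication steps that TG 1 examines.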

Metasearch… Why bother?
• Because most patrons do not care where information is or who packaged it
• Present systems require users to know
  – How to select / access databases
  – How to get to them
  – How to use unique search options
• Because Google cannot do it all
• The challenge is creating a system that helps users find what they need while minimizing what they need to know

Tennant’s Tenets
• Only librarians like to search, everyone else likes to find
• All things being equal, one place to search is better than two or more
• “Good enough” is often just that
• Users are not lazy, they’re human
• Our ability to create effective one-stop searching is dependent on our ability to appropriately target user needs
• The size of the result set doesn’t matter as much as how the results are presented (“the Google lesson”)
• Services should be placed as close to the user as possible
http://www.cdlib.org/inside/projects/metasearch/nsdl/

NISO-MI History
• ALA (Philadelphia) Midwinter 2003
• NISO-MI Planning (Denver), Spring 2003
• NISO-MI Proselytizing (Washington, D.C.), Fall 2003
• Task Groups, 2004 to present

The NISO Metasearch Initiative
• Any standards identified must help all the stakeholders:
  – libraries to deliver services that distinguish their offerings from other free web services
  – metasearch service providers to offer more effective and responsive services
  – content providers to deliver enhanced content and protect their intellectual property
• Win – Win – Win

NISO-MI History
• ALA Midwinter 2003
  – Meeting called by 3 providers: EBSCO, Gale, ProQuest
  – Concerned about impact on services
  – NISO offered to take a leadership role and formed a planning committee
  – Identified key issues:
     • Access Management (a.k.a. authentication/authorization)
     • Resource Identification
     • Metasearch Identification
     • The Search Itself
     • Results Management
     • Statistics
  – Planned another meeting

NISO-MI History
• Denver, Spring 2003
  – Access Management
     • Understand metasearch needs; find best solutions available; develop best practices
  – Resource Identification
     • Work with Dublin Core RSLP and ISO Directories group; exchange format for collection and service descriptions
  – Search, Retrieve, Results Management
     • Current environment analysis (Z39.50, SRW/SRU, proprietary APIs, XML gateways); develop best practice for APIs; continue Z39.50 profiling
  – Metasearch Identification
     • Solution: register a practice that metasearch engines can use to identify themselves
  – Statistics
     • Work with Z39.7 and COUNTER; explain metasearch environment; adapt existing standards; publicize importance of statistics

NISO-MI History
• D.C., Fall 2003
  – Combined with OpenURL for a 2-day workshop; briefed a larger audience on the broad issues discussed in Denver; agreed that a focused initiative was needed
  – Approved recommendations
  – Appointed leadership

NISO-MI Leadership
• Overall Co-chairs
  – Jenny Walker, Ex Libris
  – Andrew Pace, NCSU
• Access Management (TG 1 / NISO BA)
  – Mike Teets, OCLC
• Collection Description (TG 2 / NISO BB)
  – Juha Hakala, National Library of Finland
     » Pete Johnston, UKOLN, Collection Description
     » Larry Dixson, LC, Service Description
• Search and Retrieve (TG 3 / NISO BC)
  – Sara Randall, Endeavor
  – Matt Goldner, OCLC (formerly of Fretwell-Downing)
  – Katherine Kott, Digital Library Federation

TG 1: Access Management

Active Participants
• Katie Anstock – Talis Information Ltd.
• Susan Campbell – CCLA
• Frank Cervone – Northwestern University
• Paul Cope – Auto-Graphics, Inc.
• David Fiander – University of Western Ontario
• Ted Koppel – The Library Corporation
• Mark Needleman – SIRSI Corporation
• Ed Riding – Dynix
• RL Scott – US DOE, OSTI
• Tim Shearer – University of North Carolina
• Mike Teets – OCLC, Inc. (Chair)

TG 1 – Access Management
• Authentication
  – The process where a network user establishes a right to an identity, in essence, the right to use a name (Lynch 1998)
  – Are you who you say you are?
• Authorization
  – The process whereby a network user, based on their attributes, receives entitlements or authority to use a resource
  – So, can you use this?
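To make the distinction concrete, here is a minimal Python sketch that keeps the two steps apart: `authenticate` answers "are you who you say you are?" and `authorize` answers "can you use this?". The user store, attribute names, salt, and entitlement policy are all hypothetical in-memory stand-ins for whatever directory or identity provider a library actually runs.

```python
import hashlib
import hmac
from typing import Dict, Optional, Set

# Hypothetical fixed salt for the demo; real systems store a random salt per user.
_SALT = b"demo-salt"


def _hash(password: str) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode(), _SALT, 100_000)


# Hypothetical in-memory stand-ins for a directory or identity provider.
_PASSWORD_HASHES: Dict[str, bytes] = {"alice": _hash("correct horse")}
_ATTRIBUTES: Dict[str, Set[str]] = {"alice": {"affiliation:student"}}
# Hypothetical policy: the attribute that entitles a user to each resource.
_ENTITLEMENTS: Dict[str, str] = {"licensed-journals": "affiliation:student"}


def authenticate(username: str, password: str) -> Optional[str]:
    """Are you who you say you are? Returns the established identity or None."""
    expected = _PASSWORD_HASHES.get(username)
    if expected is not None and hmac.compare_digest(expected, _hash(password)):
        return username
    return None


def authorize(identity: str, resource: str) -> bool:
    """So, can you use this? Compares the identity's attributes with policy."""
    required = _ENTITLEMENTS.get(resource)
    return required is not None and required in _ATTRIBUTES.get(identity, set())
```

Because `authorize` needs only an identity and its attributes, the two steps could run in different systems; that separation is a design choice of this sketch, not a TG 1 recommendation.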

Access Management Charter
• Gather requirements for metasearch authentication and access needs, inventory existing processes now in place, and develop a series of formal use cases describing the needs.
• Deliver
  – Definitions document of Access Management and Metasearch terms
  – Inventory of methods and techniques in use today
  – Use cases describing authentication and access needs

TG 1’s Plan of Attack
• Inventorying current approaches and technologies
• Breaking apart the problem
• Identifying (defining) all the actors
• Enumerating functions
• Developing use cases
• Analyzing use cases
• Ranking appropriateness of solutions to use cases
• Recommending a standard or best practice

Situations Can Be Complex
[Diagram: citizen and student users; library authentication, state authentication, campus authentication; library menu; metasearch; databases]

Current authentication technologies. Potential solutions?
• Proprietary APIs?
• NCIP? SIP2?
• LDAP?
• Shibboleth?
• Kerberos?
• Athens (UK)?
• PAPI?
• Tequila?
• Non-authenticated identification?
• IP recognition?
• Proxy servers?
• Referring URL?
• Embedded data in URL?
• Vendor-provided JavaScript?
• Cookies?
• Shouting?

Status
• Completed survey of authentication methods in use.
• Developed comprehensive use cases, then simplified them to three metasearch-specific cases.
• Ranked authentication methods in use by their ability to deliver on use case needs.
• Introduced an environmental ranking to cover factors such as ease of use, adoption, complexity, cost, etc.
• Developed a charting model to identify best solutions.

Access Management Process
[Diagram: objects (credentials, attributes) and processes (authentication, authorization), with entitlements, certification, and certificate]
The AMP: A Mike Teets Invention

Access Management
Instances of authentication that take place in a simple metasearch transaction
[Diagram: user, MetaSearch, and two resources, with authentication points numbered 1, 2, and 3]
AMPS = Access Management Process Symbol

Relative Rankings of Authentication Methods

Decisions to be Reached
• Are any current approaches universally applicable?
• Can/should we develop our own authentication standard that addresses all situations?
• Is authentication conducive to a standard at all?
Possible result: a series of “best practices”?

TG 1 Recommendations
• Now
  – IP authentication
  – Username / password
• Potential for the future
  – Shibboleth
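IP authentication, the first of the current recommendations, usually amounts to checking a client address against a licensed set of network ranges. A minimal sketch using Python's standard `ipaddress` module follows; the function name and the ranges are hypothetical, the latter taken from the reserved documentation networks.

```python
import ipaddress
from typing import Iterable

# Hypothetical licensed ranges, drawn from the reserved documentation networks.
LICENSED_RANGES = ["192.0.2.0/24", "198.51.100.0/25"]


def ip_authorized(client_ip: str, ranges: Iterable[str] = LICENSED_RANGES) -> bool:
    """Return True if client_ip falls inside any licensed network range."""
    address = ipaddress.ip_address(client_ip)
    # Membership is simply False (not an error) when IPv4 and IPv6 are mixed.
    return any(address in ipaddress.ip_network(net) for net in ranges)


if __name__ == "__main__":
    print(ip_authorized("192.0.2.17"))   # True: inside 192.0.2.0/24
    print(ip_authorized("203.0.113.5"))  # False: outside both ranges
```

One caveat specific to metasearch: when a server-side metasearch engine queries a resource, the content provider sees the engine's address rather than the patron's, so an IP check at that hop identifies the metasearch engine, not the user.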

What’s next…
RANKINGS AND RECOMMENDATIONS
• Text document with comprehensive analysis of methods in use
• Recommend best practices where available
• Recommend development necessary for models with the most promise for metasearch
• Liaison with the Shibboleth community started

TG 2: Collection Description

The Meta-Problem (from a Discovery Standpoint)
• Many database (content) providers, each with their own web presence and means of interaction
• User wants to use data from many providers at the same time

User Needs
• Find/discover collections that match a certain list of criteria
• Obtain enough descriptive information to be able to identify a desired collection
• Discover the services that provide access to the collection(s)
• Interpret items retrieved from the collection in the context of the collection

TG 2 Mission
• Understand how portals use collection and/or service descriptions
• Analyze options; recommend schemas and syntax for implementation of collection (S1) and service (S2) descriptions

TG 2 Work Plan
• Create data models for collections and services
• Design metadata semantics for models
• Design syntax for representation and data exchange
• Build on existing work where possible
• Ensure linkages between collections (S1) and services (S2)
• Don’t build a whole new service
• Don’t specify the architecture for a given service
• Don’t specify protocols for exchange of collection and service metadata

Goals (Solutions)
• Create two element sets to be used by metasearch (and other) applications
  – Collection descriptions: human-readable text to describe the contents of a database
     • Building on significant previous work, notably
        – Research Support Libraries Programme, UK, 1999-2002
        – Dublin Core Collection Description Working Group, 2003+
  – Service descriptions: to be used by applications to access remote database services

Relations between collections and services
• A collection may have a parent, and may have multiple sub-collections (children)
• Each collection description has 0-to-many service descriptions
• A service may make multiple resources available
• Each service description has 1 (only) collection description
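These cardinalities map directly onto a small data model. The sketch below uses Python dataclasses; the class and field names are illustrative, not TG 2's element names. A collection has an optional parent, any number of children, and zero or more services, while each service description points at exactly one collection.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class CollectionDescription:
    title: str
    # Back-reference to the parent is excluded from repr to avoid cycles.
    parent: Optional["CollectionDescription"] = field(default=None, repr=False)
    children: List["CollectionDescription"] = field(default_factory=list)
    services: List["ServiceDescription"] = field(default_factory=list)  # 0..many

    def add_child(self, child: "CollectionDescription") -> None:
        child.parent = self
        self.children.append(child)


@dataclass(eq=False)
class ServiceDescription:
    protocol: str
    access_point: str
    collection: CollectionDescription = field(repr=False)  # exactly one

    def __post_init__(self) -> None:
        # Registering on creation keeps both sides of the link consistent.
        self.collection.services.append(self)
```

`eq=False` makes instances compare by identity, which avoids recursive comparisons through the parent and service back-references.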

DC Collection Description Application Profile (DC CD AP)
• A "core" set of collection description properties
  – For simple collection-level descriptions
  – Suitable for a broad range of collections
  – Primarily to support discovery of collections
• Includes:
  – Collection title
  – Description
  – Size
  – Subject(s)
  – Language
  – Type
  – Intellectual rights
  – Access rights
  – Date range
  – Collection method
  – Logo
  – Collection history
  – Etc.

TG 2-S1 progress to date
• Working with/around DC CD AP issues (some joint membership) with data model
• Metasearch Initiative introduced some library-specific requirements out of scope for DC CD AP
• TG 2-S1 ends up with a superset of DC CD AP

Service Description Goals
• Ultimately, a mechanism to describe (and access) informational services that, in turn, provide access to collections
• How?
  – Indicate protocol used
  – Provide access point(s) for service
  – Provide authentication/authorization guidelines
  – List operations/queries supported
• TG 2-S2 using ZeeRex as vehicle

ZeeRex: A Starting Point
• Originally a Z39.50-based specification
• Based on Z39.50 “Explain” service, which was never fully or particularly well implemented
• Flexible enough to deliver collection descriptions, relatively easy to implement
• “Z39.50 Explain, Explained and Re-Engineered in XML”
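As an illustration of how a service description might be consumed, the sketch below reads the protocol, an access point, and the available indexes from a small ZeeRex-style record using Python's `xml.etree.ElementTree`. The namespace and element names (`serverInfo`, `host`, `port`, `database`, `indexInfo`) follow ZeeRex as it appears in SRU Explain responses, but the sample record, the host name, the `read_service` function, and the `http://host:port/database` access-point convention are assumptions; check the ZeeRex schema before relying on them.

```python
import xml.etree.ElementTree as ET

# Namespace used by ZeeRex 2.0 records.
NS = {"zr": "http://explain.z3950.org/dtd/2.0/"}

# Hypothetical sample record describing one SRU service.
SAMPLE = """<explain xmlns="http://explain.z3950.org/dtd/2.0/">
  <serverInfo protocol="SRU" version="1.1">
    <host>sru.example.org</host>
    <port>80</port>
    <database>catalog</database>
  </serverInfo>
  <indexInfo>
    <index><title>Title</title><map><name set="dc">title</name></map></index>
    <index><title>Author</title><map><name set="dc">creator</name></map></index>
  </indexInfo>
</explain>"""


def read_service(xml_text: str) -> dict:
    """Extract protocol, access point, and searchable indexes from a record."""
    root = ET.fromstring(xml_text)
    server = root.find("zr:serverInfo", NS)
    # Index names are reported as "set.name", e.g. "dc.title".
    indexes = [
        f'{name.get("set")}.{name.text}'
        for name in root.findall("zr:indexInfo/zr:index/zr:map/zr:name", NS)
    ]
    return {
        "protocol": server.get("protocol"),
        "version": server.get("version"),
        # Assumption: an HTTP access point assembled from host, port, database.
        "access_point": "http://{}:{}/{}".format(
            server.findtext("zr:host", namespaces=NS),
            server.findtext("zr:port", namespaces=NS),
            server.findtext("zr:database", namespaces=NS),
        ),
        "indexes": indexes,
    }


if __name__ == "__main__":
    print(read_service(SAMPLE))
```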

Under discussion:
• Maintaining and exchanging collection description and service access information
  – Auto-generate descriptions?
  – Harvest descriptions?
• Collection identifiers
  – Metasearch needs globally unique and persistent identifiers for collections (and services)
  – Also needed by ONIX community, e-resource management systems and more

Future
• Publish/promote standardized Collection and Service Description schemas
• Write guidelines and best practices for implementation
• Promote creation of, and facilitate sharing of, collection and service descriptions among metasearch providers
• Ensure interoperability (or at least consistency) with TG 1 (Authentication) and TG 3 (Search and Retrieve)

TG 3: Search/Retrieve

Goals
– Describe current practice in metasearch search and retrieval
– Define a standard vocabulary and terms
– Define a template for exchange of search and retrieval functionality
– Inventory proprietary XML interfaces and best practices for metasearch and retrieval
– Recommend the data elements to describe a result set and a record within a result set
– Review SRW/SRU and recommend modifications for use as the basis of a metasearch and retrieval standard

Initial steps
§ Four main areas of activity in 2004-2005
  – Current practices
  – Metadata returned about result sets
  – Citation-level data elements
  – Search / retrieval standard investigation

Survey of Current Practices
• What protocols are commonly used?
• Capture other common information
• Sent in June to over 100 organizations, with a (disappointing) 25% response rate
• Responses analyzed (Stanford) and will be posted to the NISO-MI Wiki

Result Set Metadata
• Result set metadata: information that is valid only in the context of the current result set
As opposed to…
• Record metadata
  – Administrative/control metadata
  – Descriptive
• Intended to inform a possible standard protocol or to make sure proprietary protocols have sufficient information
• Reviewing and tightening the elements

Results Set Management
• How to allow for extension to core metadata so that information providers can transmit “extra value” information
• If cross-database searching is allowed in a single search, how will variations in the result set metadata be handled at the database level?
• Can result set metadata be overridden at the single-record level?
• Tension between the need for a simple-to-implement protocol and the need for rich metadata to provide advanced features?

Citation level data elements
• How to map them to commonly supported metadata formats
  – DC, MARC, MODS
• The goal?
  – Provide recommendations to improve citation information for reuse in standards like OpenURL or document delivery

Citation Level Data Elements
• Need to be able to parse volume, issue, … information to reuse for other actions (OpenURL, document delivery)
• Reviewing the work of several other groups
  – MARBI 773 work
  – IMS Resource List standard
  – OpenURL 1.0 (the winner)
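Once volume, issue, and the other citation elements are parsed into separate fields, building an OpenURL 1.0 link is mechanical. The sketch below produces a key/encoded-value (KEV) OpenURL for a journal article; `url_ver`, `ctx_ver`, `rft_val_fmt`, and the `rft.*` keys come from the OpenURL 1.0 (Z39.88-2004) journal format, while the `journal_openurl` function, the resolver address, and the input dictionary layout are hypothetical.

```python
from typing import Dict
from urllib.parse import urlencode

# Citation fields copied into rft.* keys when present (journal format).
_JOURNAL_KEYS = ("atitle", "jtitle", "issn", "volume", "issue", "spage", "date")


def journal_openurl(resolver_base: str, citation: Dict[str, str]) -> str:
    """Build an OpenURL 1.0 KEV link for a journal article citation."""
    params = {
        "url_ver": "Z39.88-2004",
        "ctx_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
    }
    for key in _JOURNAL_KEYS:
        # Empty or missing elements are simply left out of the link.
        if citation.get(key):
            params["rft." + key] = citation[key]
    # urlencode percent-encodes values, e.g. ':' becomes %3A.
    return resolver_base + "?" + urlencode(params)


if __name__ == "__main__":
    print(journal_openurl(
        "http://resolver.example.edu/openurl",  # hypothetical resolver
        {"jtitle": "Journal of Example Studies", "volume": "11", "issue": "4"},
    ))
```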

Search and retrieve
• Review current practices and make recommendations for best practice and further standards work.
• Specifically, review SRW/SRU and recommend modifications for use as the basis of a metasearch and retrieval standard.

Search and retrieve
• Structured search and retrieve
  – Z39.50
  – SRW/SRU (http://www.loc.gov/srw/)
  – XML gateways (proprietary)
• Unstructured
  – HTML/HTTP parsing (“screen-scraping”)
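For the structured SRU option, a search is an HTTP GET carrying a CQL query. The sketch below builds an SRU 1.1 searchRetrieve URL; the parameter names are SRU's own, but the `sru_search_url` function and base URL are hypothetical, and the default record schema (`dc`) is an assumption that a given server may not support.

```python
from urllib.parse import urlencode


def sru_search_url(base_url: str, cql_query: str, start: int = 1,
                   maximum: int = 10, schema: str = "dc") -> str:
    """Build an SRU 1.1 searchRetrieve request URL (HTTP GET)."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,          # CQL, e.g. dc.title = "metasearch"
        "startRecord": start,        # 1-based position of the first record
        "maximumRecords": maximum,
        "recordSchema": schema,      # assumption: server offers a "dc" schema
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(sru_search_url("http://sru.example.org/catalog", 'dc.title = "metasearch"'))
```

By contrast, a screen-scraping connector has no such contract: it must parse whatever HTML a site happens to return, and breaks whenever the page layout changes.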

March 2005 meeting adopted a new approach:
• What is the lowest barrier of entry to encourage adoption of a standard search and retrieve protocol that would enable consistent processing of search results (display, sort, merge, de-dupe) and the ability to generate OpenURLs for onward linking?
→ NISO Metasearch XML Gateway (MXG)
Yes, a new standard!!
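To illustrate the kind of consistent processing MXG is meant to enable, the sketch below merges result sets from several sources, folds duplicates together, and sorts by year. The `merge_results` function, the record layout, and the match key (normalized title plus year) are hypothetical, deliberately naive choices; real de-duplication usually draws on identifiers such as ISSN or the parsed citation elements.

```python
import re
from typing import Dict, List, Tuple


def _match_key(record: dict) -> Tuple[str, object]:
    """Hypothetical duplicate key: normalized title plus publication year."""
    title = re.sub(r"[^a-z0-9 ]", "", record.get("title", "").lower())
    return " ".join(title.split()), record.get("year")


def merge_results(*result_sets: List[dict]) -> List[dict]:
    """Merge result sets, fold duplicates together, and sort newest first."""
    merged: Dict[Tuple[str, object], dict] = {}
    for results in result_sets:
        for record in results:
            key = _match_key(record)
            if key in merged:
                # Duplicate: keep the first copy, remember the extra source.
                merged[key]["sources"].append(record["source"])
            else:
                merged[key] = {**record, "sources": [record["source"]]}
    # Records without a year sort last; ties keep their original order.
    return sorted(merged.values(), key=lambda r: r.get("year") or 0, reverse=True)
```

Normalizing before comparing is what lets "Metasearch: An Overview" from one source and "metasearch - an overview" from another collapse into a single displayed record.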

To be specific…
Relationship of this standard with NISO SRW/U
The NISO Metasearch XML Gateway is a nonconformant subset of the NISO SRW/U standard (http://www.loc.gov/z3950/agency/zing/srw/). The features missing from MXG that are necessary for SRU conformance are support for an Explain record and rich CQL support. MXG has been designed to provide a low implementation barrier to content providers that want to make their databases available to metasearch engines. Interoperability across content providers was explicitly not a goal of MXG. The features of SRU that are missing from MXG are necessary for interoperability.

NISO MI XML Gateway (MXG)
• SRU/SRW as possible starting point
• Amazon A9 OpenSearch as low barrier
• Recommended schemas for result set metadata
• Recommended schemas for citation data based on a subset of OpenURL 1.0 data elements
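Part of OpenSearch's appeal as a low-barrier model is its URL template: a client fills placeholders such as `{searchTerms}` and may leave optional ones, marked with a trailing `?`, empty. A minimal sketch of template filling based on the OpenSearch 1.1 template syntax follows; the `fill_template` function and the template URL are hypothetical.

```python
import re
from typing import Dict
from urllib.parse import quote

# Matches {name} and {name?}; names may carry a namespace prefix such as geo:box.
_PARAM = re.compile(r"\{([^}?]+)(\?)?\}")


def fill_template(template: str, values: Dict[str, object]) -> str:
    """Fill an OpenSearch-style URL template with percent-encoded values."""
    def substitute(match: "re.Match[str]") -> str:
        name, optional = match.group(1), match.group(2)
        if name in values:
            return quote(str(values[name]), safe="")
        if optional:
            return ""  # optional parameter without a value: leave it empty
        raise KeyError(f"required template parameter {name!r} has no value")

    return _PARAM.sub(substitute, template)


if __name__ == "__main__":
    template = "http://search.example.com/?q={searchTerms}&start={startIndex?}&n={count?}"
    print(fill_template(template, {"searchTerms": "niso metasearch", "count": 20}))
```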

Challenges and Opportunities
• WebFeat metasearch patent (??)
• Googlezon-mania
• Building the business case
• Finishing
Sept. 19-21 NISO Metasearch and OpenURL Workshop, Washington, DC

Loughborough University
How to make your e-resources earn their keep, Ruth Stubbings, presentation at The radical library: taking up the challenge seminar, November 2003

Metasearch here to stay!
• Key initiative for NISO
• Needs support of all the stakeholders
  – Metasearch vendors
  – Information providers
  – Libraries
“If your not fer us, yer agin us” -John Little

Thank You.
http://www.lib.ncsu.edu/niso-mi
http://www.lib.ncsu.edu/staff/pace
Andrew K. Pace
Head, Systems
NCSU Libraries
andrew_pace@ncsu.edu