- Number of slides: 31
Joint workshop on electronic publishing Beyond OAI-Services: Bielefeld Academic Search Engine (BASE) Dirk Pieper, Friedrich Summann Bielefeld University Library
Overview
Part 1:
• Bielefeld UL: from meta search to search engines
• BASE: objectives, content, services
• Outlook and further information
Part 2:
• Backend, Frontend
• OAI dataflow, BASE dataflow
• OAI harvesting problems
• Further developments of BASE
Where we come from …
Where we come from …
• One central on-site library divided into groups of subject libraries
• 2 million books and other media items, the majority on open shelves
• Active registered users in 2004: 28,000
• 2,675 reader workplaces
• Budget for acquisitions in 2004: EUR 3,200,000 incl. special funds
• Journals: about 5,700 subscriptions
• Host of the International Bielefeld Conference series, held every two years as a major strategic discussion forum for library managers from all over Europe and beyond
From meta search to search engines (1)
• Integrating heterogeneous information resources for users has always been a primary objective of UL Bielefeld
• Milestones:
  • 1993: Introduction of the document delivery system JASON
  • 1995: Development of IBIS, the first German library project for a cooperative electronic information supply
  • 1998: Introduction of JASON-Subito. Online access to journals available in full-text versions (among others, through consortium agreements with publishers)
  • 1998-2001: Main coordinator of the Digital Library NRW (funded by a major grant from the NRW State Ministry)
  • 2000: Combination of Digital Library NRW services and the library's local website in order to offer integrated services in a corporate design
  • 2002: Development of a net-based integrated learning and teaching environment (online learning) based on Blackboard, and of a university publications server (BieSOn) based on OPUS
  • 2004: Launch of the Bielefeld Academic Search Engine (BASE) on the basis of FAST Data Search software
From meta search to search engines (2)
• Integration at the level of the library's local system:
  • OPAC:
    • Local holdings
    • Institutional repository servers (OAI, with a focus on full-text dissertations)
    • Journal Article Database (JADE, about 39 million articles) in combination with document delivery (JASON, Elsevier-ppv, Subito)
      • Inside Serials
      • Elsevier
      • Springer
      • JSTOR
      • …
• Meta search for several subject portals (Digital Library)
BASE: objectives, content, services (1)
• First starting point: the reality of academic online information
[Diagram: each resource type has its own search. Labels: web pages; subject databases; publishers' ejournals; library catalogues; institutional repository servers; search engine; digital libraries; commercial providers; portals; search]
BASE: objectives, content, services (2)
• Second starting point: experience with meta search (Digital Library) and user studies:
  • Users want a search engine look and feel
  • Meta search environments are too slow compared to search engines like Google
  • Little integration of full-text resources
  • Little integration of the “visible web”
• Main objectives of BASE:
  • to overcome the fragmentation of academic information resources
  • to use search & retrieval standards provided by search engine technology
  • to provide comfortable search interfaces and flexible result presentation
  • to handle both highly structured and unstructured data
  • to create large shared indices for a new kind of “meta” search
BASE: objectives, content, services (3)
[Diagram: web pages; subject databases; publishers' ejournals; library catalogues; institutional repository servers; all feeding one search engine for academic online information]
BASE: objectives, content, services (4)
[Diagram of indexed sources; labels as extracted:] Projekt Gutenberg-DE Internet Library of Early Journals Oxford Various Institutional Repositories Springer Link Metadata Cornell Hist. Math Fulltext Crawl University Michigan Historical Math Biomed Central Project Euclid Zentralblatt Mathematik Bielefeld Univ: Math. Preprints OAI Verlag Krause und Pachernegg OPAC UL Bielefeld Univ: Documenta Mathematica Perseus Digital Library Zeitschriften der Aufklärung (Bielefeld UL) TIB Hannover MATH Collection
BASE: objectives, content, services (5)
• Services provided by UL Bielefeld within BASE:
  • Identification and selection of high-quality content repositories
  • Contact and negotiations with content providers (universities, libraries, commercial content providers)
  • Data aggregation, data pre-processing and data processing of internationally distributed and highly heterogeneous resources
  • Data production (e.g. German Enlightenment, JADE, ...)
  • Delivery of indexes in standardised formats (XML) for platform-independent re-use by other search engine providers
  • Integration of BASE within meta search environments (e.g. SISIS-Elektra)
  • Providing access to additional content in local OPAC environments
BASE: Outlook and further information (1)
• The next steps:
  • Leave the “demonstrator” status
  • Increase the number of indexed OAI servers
  • Integrate local library resources (OPAC and other databases)
  • Integrate more commercial subject databases
  • Increase full-text indexing
  • Make more use of FAST features
BASE: Outlook and further information (2)
• DLF Spring Forum New Orleans 2004: http://www.diglib.org/forums/Spring2004/
• Norbert Lossau: Search Engine Technology and Digital Libraries: Libraries Need to Discover the Academic Internet, in: D-Lib Magazine, June 2004 (Volume 10, Number 6)
• Friedrich Summann, Norbert Lossau: Search engine technology and digital libraries: moving from theory to practice, in: D-Lib Magazine, September 2004 (Volume 10, Number 9)
• http://base.ub.uni-bielefeld.de
FAST based architecture and intelligent modifications
[Architecture diagram. Content sources: General Web Content and Full Text; Database Content (Bibl. Data); OAI-Sources (Metadata+Docs); Full Text Collections. Components: FILE TRAVERSER; WEB CRAWLER; CONNECTORS; DOCUMENT PROCESSING (Pipeline); INDEX FILES; FILTER; SEARCH; QUERY & RESULT PROCESSING (Pipeline); Search API; TUNING, ADMINISTRATION and DEBUGGING]
Added functionalities: Connectors
[Diagram: General Web Content and Full Text; OAI-Sources (Metadata+Docs); Database Content (Bibl. Data); Full Text Collections; all attached through CONNECTORS]
OAI dataflow
[Diagram: OAI-Data Harvesting feeds three targets]
• Dissertations, monographs (fulltext): into the OPAC
• Articles (fulltext) from PubMed, Euclid, arXiv, CiteSeer, Citebase, DOAJ: into the Article Database
• All resources (texts, images, video, references, ...): into the BASE Internal Index (FAST)
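The harvesting step in this dataflow can be sketched in Python. This is a minimal illustration of the OAI-PMH `ListRecords` request and response handling, not the BASE harvester itself: the function names, the `SAMPLE` response, and the `example.org` endpoint used with it are assumptions made for the example. Only the protocol elements (`verb`, `metadataPrefix`, `resumptionToken`, and the OAI and Dublin Core namespaces) come from the OAI-PMH specification.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument: it replaces
        # metadataPrefix when continuing a partial list.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, resumption_token) from one ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter(f"{{{OAI_NS}}}record"):
        fields = {}
        for elem in record.iter():
            # Collect only Dublin Core elements; each field may repeat.
            if elem.tag.startswith(f"{{{DC_NS}}}") and elem.text:
                name = elem.tag.split("}", 1)[1]
                fields.setdefault(name, []).append(elem.text.strip())
        records.append(fields)
    token_elem = root.find(f".//{{{OAI_NS}}}resumptionToken")
    has_token = token_elem is not None and token_elem.text
    return records, (token_elem.text.strip() if has_token else None)


# Invented sample response, used instead of a live repository.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample thesis</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>token-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_list_records(SAMPLE)
next_url = list_records_url("http://example.org/oai", resumption_token=token)
```

A real harvester would fetch `next_url` and repeat until no resumption token is returned; the fetching, retry, and storage parts are left out here.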
BASE dataflow
[Diagram: OAI-Data Harvesting, Database Records and Web Pages pass through Pre-Processing into the Internal Index (FAST), which is served by the User interface (PHP)]
OAI university repositories in BASE
[Map of repository counts per country. Labelled counts: USA 34, Canada 7, Australia 8. Counts whose country labels were lost in extraction: 3, 9, 22, 1, 9, 3 and 1, 2, 4, 11, 27, 6, 1, 6, 3]
OAI harvesting problems
• Non-responding repositories
• Only references without fulltext
• Restricted access
• Invalid character set (XML not well-formed)
• Varying field content
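Some of these problems can be detected automatically when a response arrives. The sketch below is one possible check, assuming a harvested response is available as raw bytes. The problem labels, the `FULLTEXT_SUFFIXES` list, and the test for the word "restricted" in `dc:rights` are illustrative assumptions, not rules taken from BASE.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
FULLTEXT_SUFFIXES = (".pdf", ".ps")  # assumed markers of a full-text link


def check_response(raw_bytes):
    """Return a list of problem labels for one harvested response."""
    if not raw_bytes:
        # Stands in for a repository that did not answer at all.
        return ["no response"]
    try:
        # The parser rejects broken markup and bytes that are invalid
        # in the declared (or default UTF-8) encoding.
        root = ET.fromstring(raw_bytes)
    except ET.ParseError:
        return ["not well-formed"]

    identifiers = [e.text.strip()
                   for e in root.iter(f"{{{DC_NS}}}identifier") if e.text]
    rights = [e.text.strip().lower()
              for e in root.iter(f"{{{DC_NS}}}rights") if e.text]

    problems = []
    if not any(i.lower().endswith(FULLTEXT_SUFFIXES) for i in identifiers):
        problems.append("reference without full text")
    if any("restricted" in r for r in rights):
        problems.append("restricted access")
    return problems
```

Varying field content is harder to catch with a single rule and usually needs per-repository heuristics.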
OAI Harvesting: Problems in Practice (Examples 1)
<source>http://elib.suub.uni-bremen.de/publications/ELibD905_diplom_allnoch.pdf</source>
<dc:creator>Barry Wellman, Jeffrey Boase, Kakuko Miyata</dc:creator>
<dc:subject>Barry Wellman, Jeffrey Boase, Kakuko Miyata The Mobile-izing..</dc:subject>
<dc:title>Talk P. Bruzzone</dc:title>
<dc:creator>Bruzzone </dc:creator>
<dc:creator>Pierluigi</dc:creator>
<dc:date>2004-07-05</dc:date>
<dc:type>Review </dc:type>
<dc:identifier>http://www.rbej.com/content/2/1/52</dc:identifier>
Reproductive Biology and Endocrinology 2004, 2:52 doi:10.1186/1477-7827-2-52
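Field problems like the ones on this slide can be flagged with simple heuristics before indexing. The following sketch assumes each record is a dictionary mapping a Dublin Core field name to a list of values. The function name, the thresholds, and the warning texts are hypothetical, and the rules will produce false positives on real data.

```python
def flag_record(fields):
    """Return warnings for suspicious creator/subject content in one record."""
    creators = [c.strip() for c in fields.get("creator", [])]
    subjects = [s.strip() for s in fields.get("subject", [])]
    warnings = []

    # "Surname, Forename" has one comma; two or more commas (or a
    # semicolon) suggest several people packed into one field.
    if any(c.count(",") >= 2 or ";" in c for c in creators):
        warnings.append("several names in one creator field")

    # A subject that starts with the creator string is probably a
    # copy of other page text rather than a subject heading.
    if any(s.startswith(c) for s in subjects for c in creators if c):
        warnings.append("creator text repeated in subject")

    # Several single-word creators may be one name split into
    # surname and forename fields.
    if len(creators) >= 2 and all(
        c and " " not in c and "," not in c for c in creators
    ):
        warnings.append("creator possibly split across fields")

    return warnings
```

Flagged records could then be routed to repository-specific clean-up rules rather than corrected blindly.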
OAI Harvesting: Problems in Practice (Examples 2)
<dc:identifier>http://www.forex.uni-bremen.de/cgibin/forex2/user/publish?search=sqn&sqn=00005223</dc:identifier>
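An identifier like this one points at a database query rather than at a document, which matters when the harvester wants to fetch the full text. A minimal sketch of how such identifiers might be told apart, using only `urllib.parse` from the standard library; the category names and the `DOCUMENT_SUFFIXES` list are assumptions for illustration:

```python
from urllib.parse import urlparse

DOCUMENT_SUFFIXES = (".pdf", ".ps")  # assumed full-text file types


def classify_identifier(identifier):
    """Roughly classify a dc:identifier value by what it points at."""
    parts = urlparse(identifier.strip())
    if parts.scheme not in ("http", "https", "ftp"):
        # For example a DOI or a local call number.
        return "not a URL"
    if parts.query:
        # A query string usually means a script or search result page.
        return "query link"
    if parts.path.lower().endswith(DOCUMENT_SUFFIXES):
        return "document link"
    return "landing page"


kind = classify_identifier(
    "http://www.forex.uni-bremen.de/cgibin/forex2/user/publish"
    "?search=sqn&sqn=00005223"
)
```

Only links classified as document links are safe candidates for direct full-text fetching under these assumptions; the other categories need a crawl step or manual rules.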
BASE homepage
Advanced Search form
Result Presentation
Further Development (1): Frontend
• Combining metadata record and corresponding fulltext in result display [Done]
• Search history [Done]
• Truncation [Done]
• Flexible templating (customised views)
• Improvement of the search interface (based on the search API)
• Refinement by data provider
Local view on BASE
Subject index browsing
Author index browsing
Further Development (2): Backend
• Automation of harvesting and content preprocessing
• Federated search, linking with external indexes
• Search result improvement (ranking, boosting, linguistics)
• Performance optimisation
• Support of standard protocols (Z39.50, OAI, SOAP) as a target system
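The idea behind boosting can be shown with a toy example: a match in an important field counts more than a match in the body text. The sketch below is not how FAST computes relevance; the field names, the weights in `FIELD_BOOSTS`, and the simple term-count scoring are all assumptions chosen to keep the illustration short.

```python
FIELD_BOOSTS = {"title": 3.0, "creator": 2.0, "fulltext": 1.0}  # assumed weights


def score(document, query_terms, boosts=FIELD_BOOSTS):
    """Sum boosted term counts over the fields of one document."""
    total = 0.0
    for field, boost in boosts.items():
        tokens = document.get(field, "").lower().split()
        for term in query_terms:
            total += boost * tokens.count(term.lower())
    return total


def rank(documents, query_terms):
    """Order documents by descending boosted score."""
    return sorted(documents, key=lambda d: score(d, query_terms), reverse=True)


docs = [
    {"id": "a", "title": "Digital libraries", "fulltext": "search"},
    {"id": "b", "title": "Search engines", "fulltext": "search search"},
]
ranked_ids = [d["id"] for d in rank(docs, ["search"])]
```

Document "b" ranks first because its title match is weighted three times as heavily as a body match. A production system would add normalisation and linguistic processing on top.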
Further Visions
• Integrating XML queries
• Link topology analysis
• Citation analysis
• Automatic linguistic analysis of anchor texts
• Push services
• Personalized ranking
• Cross-language information retrieval