- Number of slides: 21
Search Services for Digital Libraries
Javed Mostafa
SLIS & SI, SIFTER Research Team, Indiana University
Digital Libraries Forum 2002
Project SIFTER © Indiana University
Search
• Information searching is a basic necessity …
  – Critical to the usefulness of a digital library
• Information available through a digital library may actually come from many different sources (both historical and recent)
• Users may need access to multiple digital libraries
  – Distributed across the globe
Search as a service
• Search should be viewed as a “service”, not merely as a “function” that the user performs
Search as a “utility” service
[Diagram labels: USER, LAN, MAN, WAN, DL]
• High quality – standards
• Persistent – always on
• Robust – scalable
• Smart – “demand aware”
What does it take to be a search service?
• Organization
• Aggregation
• Representation
• Classification
• Matching
• Delivery media & devices (customization)
• Users’ interests (query & profile personalization)
• Presentation & interaction
• Prune, cluster, rank, format, visualize
Modeling Search
• F1: D -> C and F2: C -> R
[Diagram labels: DAM, PAM, F1, F2, Representation, Thesaurus Management, Classification, Classifier Management, Profile & Query Management, Shift Detection]
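The two mappings on this slide can be read as a composition. The sketch below is illustrative only and assumes that D denotes documents, C classes, and R relevance values (the slide does not spell these out). The names `make_search`, `toy_f1`, `toy_f2`, and the `profile` dictionary are hypothetical and are not part of SIFTER.

```python
from typing import Callable, Dict

Document = str      # assumed meaning of D
ClassLabel = str    # assumed meaning of C
Relevance = float   # assumed meaning of R


def make_search(
    f1: Callable[[Document], ClassLabel],
    f2: Callable[[ClassLabel], Relevance],
) -> Callable[[Document], Relevance]:
    """Compose F1: D -> C with F2: C -> R into a single D -> R scorer."""
    def score(doc: Document) -> Relevance:
        return f2(f1(doc))
    return score


# Hypothetical stand-ins for the two stages.
def toy_f1(doc: Document) -> ClassLabel:
    # Toy classifier: one keyword decides the class.
    return "health" if "health" in doc.lower() else "other"


profile: Dict[ClassLabel, Relevance] = {"health": 0.9, "other": 0.1}


def toy_f2(label: ClassLabel) -> Relevance:
    # Toy profile lookup: relevance of a class for one user.
    return profile.get(label, 0.0)


score = make_search(toy_f1, toy_f2)
```

Keeping the two stages separate means a classifier can be retrained or a user profile updated without touching the other stage.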
Key Challenges
• Data
  – Diverse sources
  – Numerous formats
  – Heterogeneous content
• Dynamic environment
  – Content drift
  – Quality change
• User needs
  – Users’ demands are context-sensitive
  – Users’ interests vary and may change over time
Rising to the Challenge
• Developing algorithms and systems that utilize both information retrieval (IR) and artificial intelligence (AI) approaches
User Needs
• Need to capture interest for continuous service
• Detect different types of interests and shifts
Capturing User’s Interest
• Sources: explicitly stated topics, content ratings, and user behavior
Detecting Different Types of Interests and Shifts
[Figure legend: Solid: discriminatory; Dotted: moderately discriminatory; Dashed: non-discriminatory]
• Lam, W., & Mostafa, J. "Modeling User Interest Shift Using a Bayesian Approach." Journal of the American Society for Information Science & Technology, 52(5), 416-429, 2001
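As a rough illustration of what Bayesian shift detection can look like, the sketch below tracks the posterior probability that a user's interest in one class has shifted, given a stream of relevance feedback. It is a simple two-hypothesis update chosen for brevity, not the method of the paper cited above. The function name and all parameter values are assumptions.

```python
from typing import Iterable


def shift_posterior(
    feedback: Iterable[bool],
    p_stable: float = 0.8,     # assumed P(relevant | interest unchanged)
    p_shifted: float = 0.2,    # assumed P(relevant | interest has shifted)
    prior_shift: float = 0.1,  # assumed prior probability of a shift
) -> float:
    """Return P(shift | feedback) after a sequential Bayes update.

    Each feedback item is True when the user judged a document from the
    class relevant, and False otherwise.
    """
    post = prior_shift
    for relevant in feedback:
        like_shift = p_shifted if relevant else 1.0 - p_shifted
        like_stable = p_stable if relevant else 1.0 - p_stable
        numerator = like_shift * post
        post = numerator / (numerator + like_stable * (1.0 - post))
    return post
```

With these assumed parameters, a run of rejections drives the posterior toward 1, while a run of acceptances drives it toward 0. A service could flag a shift once the posterior crosses a chosen threshold.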
Representation & Classification
• Representation:
  – Use of thesauri
  – Algorithms to convert data stream to efficiently computable structures
• Classification:
  – Algorithms to cluster or classify to higher level representations
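The two steps above can be sketched as follows: a thesaurus turns raw text into a fixed-length term-count vector, and a classifier assigns that vector to the closest class. This is a minimal sketch under stated assumptions. The thesaurus terms, class centroids, and the choice of cosine similarity are hypothetical and are not taken from SIFTER.

```python
import math
from collections import Counter
from typing import Dict, List

# Hypothetical thesaurus: the fixed vocabulary used to represent documents.
THESAURUS: List[str] = ["heart", "cancer", "drug", "music"]


def represent(text: str, thesaurus: List[str] = THESAURUS) -> List[int]:
    """Representation: count how often each thesaurus term occurs in the text."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in thesaurus]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(vector: List[float], centroids: Dict[str, List[float]]) -> str:
    """Classification: pick the class whose centroid is most similar."""
    return max(centroids, key=lambda label: cosine(vector, centroids[label]))


# Hypothetical class centroids over the same thesaurus.
CENTROIDS: Dict[str, List[float]] = {
    "cardiology": [1, 0, 1, 0],
    "oncology": [0, 1, 1, 0],
}
```

For example, `classify(represent("Heart drug heart"), CENTROIDS)` selects the "cardiology" centroid, because that document shares two of its three term occurrences with it.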
Automated Approaches
• Mostafa, J., & Lam, W. "Automatic Classification Using Supervised Learning in a Medical Document Filtering Application." Information Processing & Management, 36(3), 415-444, 2000
  – Learned from existing classification results
  – Used PUBMED for training
• To deal with the dynamic nature of content, also developed algorithms for vocabulary and association discovery
Interactive Term and Association Discovery
Diverse Sources: Distributed Services
• ~300K TREC data set & distributed thesauri
• Raje, R., Qiao, M., Mukhopadhyay, S., Palakal, M., & Mostafa, J. "Homogeneous Agent-based Distributed Information Filtering." Cluster Computing (in press)
Diverse formats
• Developing systems for health news (text), scholarly research publications (text), music (audio), and cultural information (all major formats)
  – MedSIFTER
  – TuneSIFTER
  – ResearchSIFTER (OneStart)
  – ViewFinder (CLIOH)
Systems for Different Data Formats
Systems for Different Data Formats: ViewFinder
Beyond Current Challenges
• Distributed DLs with both
  – Data
  – Services
• Need to be integrated -> Web Services meets Multi-agent Searching
More Challenges
• Cross-format, cross-language, and cross-domain information synthesis in real time
  – Imagine a diplomat, as a user of a DL, wishing to become familiar with the historical and current situational contexts of the Palestine-Israel conflict
Acknowledgment
• SLIS-IUB, C&IS-IUPUI, & SI-Indiana University
• UITS, Indiana University
• Eli Lilly, Indianapolis
• National Science Foundation (DLI-II and ITR I)