
eee7598eaf8103abf82325601ba01eb6.ppt
- Number of slides: 21
Trends in Web Search and its relevance to Digital Libraries
Min-Yen Kan
Web IR NLP Group (WING)
National University of Singapore
Min-Yen Kan, WING@NUS
26 Sep 2008 World Scientific Talk

Tips on Web Searching
• Visualize results, then come up with multiple queries
• Use multiple search engines
• Advanced Search
  – inurl: , site:
  – “Phrasal search”
But that’s just general search…
• Federated resources / Niche search engines
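
The tip to combine multiple queries with the `site:`, `inurl:` and phrasal operators can be sketched in a few lines. This is only an illustration: the helper names `build_query` and `query_variants`, and the example topic and site, are assumptions made here and do not come from the talk.

```python
from urllib.parse import quote_plus


def build_query(terms, phrase=None, site=None, inurl=None):
    """Compose one query string from plain terms plus optional operators."""
    parts = list(terms)
    if phrase:
        parts.append(f'"{phrase}"')     # phrasal search: exact word sequence
    if site:
        parts.append(f"site:{site}")    # restrict results to one site
    if inurl:
        parts.append(f"inurl:{inurl}")  # require a word in the result URL
    return " ".join(parts)


def query_variants(topic, phrase, site):
    """Several formulations of the same information need, broad to narrow."""
    return [
        build_query([topic]),
        build_query([topic], phrase=phrase),
        build_query([topic], phrase=phrase, site=site),
    ]


# Hypothetical example values, chosen only for illustration.
variants = query_variants("metadata", "digital library", "nus.edu.sg")
encoded = [quote_plus(q) for q in variants]  # URL-safe form of each query
```

Each string in `variants` can be pasted into more than one search engine, which covers the "use multiple search engines" tip as well.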

Site- and Task-specific resources
• Site Prestige: know what others think and do
  – Google PageRank (Link structure), Alexa (Traffic)
  – Google Trends / Insight (Queries)
• Social Searching (Web 2.0): the voice of the reader / critic
  – (Bookmarks / Tags) Del.icio.us, Citeulike.org, Bibsonomy.org
  – (News) Digg / Slashdot
  – (Blogs) Google Blog, Technorati
• People Search: finding public information on a person
  – Spock (web), Zabasearch (US only)
  – LinkedIn, Facebook
  – Must validate your sources
http://labs.digg.com/arc/

Expert Search
Find people who will advocate on your behalf
• What do they want? → Impact
• Scholar:
  – Active? → Check their recent articles
  – Names common? → Define area of interest
  – Compare against peers
  – Download vs. citation counts
• Patent search:
  – Referenced by: (citation count; different from Scholar)
• Identifying webfaced advocates:
  – Blog search, PageRank
http://flickr.com/photos/phauly/
How do machines do it?
• Expert search task as benchmark test
• Download web pages to analyze
• Needed to deal with spam pages
• Used PageRank to assess prestige
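
The slide names PageRank as the prestige signal but does not show how it is computed. Below is a minimal sketch of the textbook power-iteration form of PageRank on a toy link graph. The damping factor of 0.85, the iteration count, and the three page names are assumptions chosen for illustration, not details from the talk or from any production system.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Every page receives the "random jump" share first.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly across its out-links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical toy web: two pages link to the researcher's page.
toy_web = {
    "researcher": ["lab"],
    "lab": ["researcher"],
    "blog": ["researcher"],
}
ranks = pagerank(toy_web)
most_prestigious = max(ranks, key=ranks.get)
```

In this toy graph the page with the most in-links ends up with the highest score, which is the sense in which link structure stands in for prestige.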

Problem or opportunity?
• Revenue from print continually declining
• Students and researchers rely on internet
• Researchers want archiving rights
  – freedom of academic information
The game has fundamentally changed
Characteristics:
• Not zero-sum content
• Distribution is now largely the role of search engines
→ Necessitates new role of publisher and new revenue model
  – Will classic models work? Advertising, Subscription, Transactional & Bundling
  – Variants? Versioning (Varian), Moving window (JSTOR)
http://flickr.com/photos/danielbroche/

Forecasting
Minuses (–)
• Content is becoming free
  – MIT / Stanford opening up textbooks
  – Open access archiving
  → long term: content will not be primary revenue source
• eBook revenue hasn’t held up its promise yet…
  – Device gap: iPhone and nextGen devices
  → Revenue may be further down the pipe
Pluses (+)
• Academic publishers
  – Connect to libraries and federations at institution level
  – Individual customers are secondary
• Trusted source
  – Expertise in copyediting, typesetting, project management, distribution, social networking
  – Many individual web publishers rediscovering same problems
  → Consultancy model
  → Win-win partnerships with individual authors

Web Trends
• Social Content
• Wisdom of masses: Crowdsourcing
• Rich Media
• Open Source / Access
Paradigmatic change
  – Classifieds → Craigslist
  – POTS → Skype
  – CD store → iTunes
  – Publishers → ??
http://www.informationarchitects.jp/slash/iA_WebTrends_2007_2_1024_768.gif

Where is research going?
Server centric
• Search API usage
• Browser as computer
• Web page structure, mining text data
User centric
• Modeling web users at tasks: Exploring / Fact-finding
• Personalization, recommending
• Social networks
• Understanding opinion
• Query and log analysis
http://flickr.com/photos/alisdair/

Webfaced pop quiz – which is which?
• American Statistical Society
• World Scientific
• Springer
courtesy: http://pagerank.si/

Forecast: Know your strengths
Get advocates
• Make it easy for individuals to insist that their institution buy your materials
• Know who is accessing (not necessarily buying) your content
Content revenue will continue to decline
• Find an economic model that works for you
• Work as partners in content creation
Be savvy on trends
• Be visible: do “white hat” Search Engine Optimization (SEO)
• Make your abstracts indexable by others
Pluses (+)
• Academic publishers
  – Connect to libraries and federations at institution level
  – Individual customers are secondary
• Trusted source
  – Expertise in copyediting, typesetting, project management, distribution, social networking
  – Many individual web publishers rediscovering same problems
  → Consultancy model
  → Win-win partnerships with individual authors

Trends in Digital Libraries >> WING @ NUS
• Expanding types of information in search
• Automated tools for DLs
• Usability in E-books and online media
• User modeling
• Personalization, annotation and relation to other user tasks
http://flickr.com/photos/pathfinderlinden

Scholarly Digital Libraries
• ForeCite: our scholarly DL
• Data Cleaning
• Slide and Document Alignment
• Searching in the OPAC
• Math Information Retrieval

ForeCite: Beyond the document as an item
[Diagram labels: Server, Client]
A user-centric DL framework
• Put author / reader functionality together
• Tagging, correction, annotation and viewing
• Automatic tools: keyphrases and sentence classification
• For use on and offline, organizes local PDF files for you
• Only need your web browser

Data Cleaning
• Addresses
  – Dongwon Lee, 110 E. Foster Ave. #410, State College, PA, 16802
  – LEE Dong, 110 East Foster Avenue Apartment 410, Univ. Park, PA 16802-2343
• Products
  – Honda Fit vs. Honda Jazz
  – Apple iPod Nano 4 GB vs. 4 GB iPod nano 4 GB
Search results:
  “Jeffrey D. Ullman”            384,000 pages
  “Jeffrey D. Ullman” + “aho”    174,000 pages   45%
  “J. Ullman”                    124,000 pages
  “J. Ullman” + “aho”             41,000 pages   33%
  “Shimon Ullman”                 27,300 pages
  “Shimon Ullman” + “aho”             66 pages    0%
• Idea: use web as additional context for disambiguation and clustering
• Placed 3rd in Web People Search Task (WEPS 2007)
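
The percentages on this slide appear to be the share of a name's pages that also mention the context term “aho”. The sketch below reproduces that arithmetic, reading the slide's counts as (name alone, name plus “aho”) pairs, which is the pairing that yields the slide's 45%, 33% and 0%. The function names and the 10% cut-off are assumptions for illustration and are not the group's actual disambiguation method.

```python
def cooccurrence_ratio(name_hits, name_with_context_hits):
    """Share of pages for a name that also mention the context term."""
    if name_hits <= 0:
        return 0.0
    return name_with_context_hits / name_hits


# Page counts as read from the slide: (name alone, name + "aho").
hit_counts = {
    "Jeffrey D. Ullman": (384_000, 174_000),
    "J. Ullman": (124_000, 41_000),
    "Shimon Ullman": (27_300, 66),
}

ratios = {
    name: cooccurrence_ratio(alone, with_context)
    for name, (alone, with_context) in hit_counts.items()
}


def likely_same_entity(ratios, threshold=0.10):
    """Name variants whose context share clears an assumed cut-off."""
    return [name for name, ratio in ratios.items() if ratio >= threshold]


percentages = {name: round(ratio * 100) for name, ratio in ratios.items()}
candidates = likely_same_entity(ratios)
```

Under these assumptions “Jeffrey D. Ullman” and “J. Ullman” group together through their shared co-author context, while “Shimon Ullman” falls well below the cut-off.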

Slides and their relationship to documents
[Screenshots: Document in focus; Slides in focus]

Searching in Libraries
http://linc.comp.nus.edu.sg

Symbolic Information Search
How do users want to search materials?
Not quite right…
Our answer: Text-to-Expression Linking
  – Resolve text keywords to expressions
  – e.g., “Pythagorean Theorem” → “a² + b² = c²” or “x² + y² = z²”
Reduce the need for expression input
Solves the notational variation problem
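
One way to picture the notational variation problem is to rename variables in order of first appearance, so that the a, b, c form and the x, y, z form of the Pythagorean Theorem reduce to the same string. The sketch below does only that. The lookup table, the function names, and the `^` notation for exponents are all assumptions for illustration, and this is not the group's actual linking method.

```python
import re


def canonical_form(expression):
    """Rename variables by order of first appearance (a, b, c -> v1, v2, v3)."""
    mapping = {}

    def rename(match):
        name = match.group(0)
        if name not in mapping:
            mapping[name] = f"v{len(mapping) + 1}"
        return mapping[name]

    compact = expression.replace(" ", "")  # ignore spacing differences
    return re.sub(r"[A-Za-z]+", rename, compact)


# Hypothetical keyword-to-expression table; a real system would learn or curate it.
KEYWORD_TO_EXPRESSIONS = {
    "pythagorean theorem": ["a^2+b^2=c^2"],
}


def expressions_for(keyword):
    """Resolve a text keyword to the canonical forms of its known expressions."""
    known = KEYWORD_TO_EXPRESSIONS.get(keyword.lower(), [])
    return {canonical_form(e) for e in known}


def matches(keyword, candidate_expression):
    """True if the candidate is a notational variant of the keyword's expression."""
    return canonical_form(candidate_expression) in expressions_for(keyword)


same = matches("Pythagorean Theorem", "x^2 + y^2 = z^2")
different = matches("Pythagorean Theorem", "x^2 + y^2 = x^2")
```

This toy renaming treats every run of letters as a variable, so function names such as sin would also be renamed; handling those, and algebraically equivalent rearrangements, is outside the sketch.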

Conclusions
• Consider us your research WING!
• Trade data and problems for solutions and interns
Meanwhile:
• Use better search strategies
• Practice white hat SEO
• Identify webfaced advocates

References
• Kahin and Varian (2000) Internet Publishing and Beyond
• Towle et al. (2007) Electronic Books in the 2003-2005 Period, Pub Res Q 23:95-104
Photo Credits
• Flickr Creative Commons Search
Thanks to all of you for listening & my fellow WING group members

Abstract
• I will present trends in current academic research on web search and digital libraries, and discuss their relevance to publishers and their economic model. With respect to the web, I will cover how search engines are starting to specialize and use click-through and ad data to improve relevance ranking. With respect to digital library research, I will discuss my group's research at NUS on advancing the state of the art in scholarly digital libraries. I will cover advances in how we deal with data cleaning issues, and with slide and equation retrieval and alignment.