
- Number of slides: 34
FACT: A Learning Based Web Query Processing System
Hongjun Lu, Yanlei Diao (Hong Kong U. of Science & Technology)
Songting Chen, Zengping Tian (Fudan University)
April 12, 2000
Outline
- Introduction (current section)
- Learning Based Web Query Processing
- FACT: A Prototype System
- Preliminary System Evaluation
- Conclusions
How Do We Query the Web?
- Use a search engine
- Form query keywords
- An example: find room rates of hotels in Hong Kong
  - Search engine used: www.yahoo.com
  - Keywords: Hong Kong+hotel
Demonstration, SIGMOD 2000
[Screenshot: browser window listing Hotel 1 and Hotel 2. Callout: Look at the number!]
Query the Web -- Current Situation
- Search engines return a long list of URLs. The user is required to browse the web pages to find the information.
- The information required is often not on the returned page -- navigation through hyperlinks is often required (and those links may or may not be obvious).
- The target information comes in different forms (paragraphs, lists, tables, ...)
- A lot of web pages have to be browsed
Are we happy with this?
Efforts to Improve the Situation
- Search engines
  - better indexes, improved precision/recall, metasearch engines, better presentation of results, ...
- IR techniques applied to the Web
  - document clustering/indexing, better models, similarity functions, document ranking, ...
- Intelligent agents
  - user profiling, hyperlink recommendation, ...
- Database approach
  - wrappers, query languages, ...
Our Dream
- Querying the Web as easily as querying a relational database
  - An SQL query returns a table of hotel prices:
    SELECT room rates FROM web.hotel WHERE city = "hong kong"
- May remain a dream for a while :-(
A Practical Goal
- Use keywords to express query requirements
  (+) simple, no need to know the schema of the data
  (-) inaccurate
- Relieve users from tedious browsing as much as possible
  - Not URLs, not Web sites, not even Web pages
- Present query results to users as accurately and concisely as possible
  - Tables, lists, paragraphs, ... containing the information the user requires
Query Results -- Queried Segments
Return query results as accurately and concisely as possible.
- Basic idea:
  - Break a Web page into segments: a row in a table, an item in a list, a paragraph, ...
  - Return only queried segments to users
- Queried segments: segments that contain the information the user is interested in.
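
The segment idea can be illustrated with a minimal Python sketch. It is not FACT's implementation: the tag set (`tr`, `li`, `p`), the helper names, and the plain keyword filter are all assumptions made for illustration, and the keyword filter stands in for the learned knowledge FACT uses to pick queried segments.

```python
from html.parser import HTMLParser

# Tags treated as segment boundaries (an assumption for this sketch):
# a table row, a list item, a paragraph.
SEGMENT_TAGS = {"tr", "li", "p"}


class SegmentExtractor(HTMLParser):
    """Collects the text of each <tr>, <li>, and <p> as one segment."""

    def __init__(self):
        super().__init__()
        self.segments = []  # finished (tag, text) pairs
        self._tag = None    # tag of the segment currently open
        self._parts = []    # text pieces of the open segment

    def handle_starttag(self, tag, attrs):
        if tag in SEGMENT_TAGS:
            self._flush()   # close any unterminated segment first
            self._tag = tag

    def handle_endtag(self, tag):
        if tag == self._tag:
            self._flush()

    def handle_data(self, data):
        if self._tag is not None and data.strip():
            self._parts.append(data.strip())

    def _flush(self):
        if self._tag is not None and self._parts:
            self.segments.append((self._tag, " ".join(self._parts)))
        self._tag = None
        self._parts = []


def split_into_segments(html):
    """Break a page into (tag, text) segments."""
    parser = SegmentExtractor()
    parser.feed(html)
    parser.close()
    parser._flush()  # keep a trailing segment that was never closed
    return parser.segments


def queried_segments(html, keywords):
    """Hypothetical filter: keep segments that mention every keyword."""
    wanted = [k.lower() for k in keywords]
    return [(tag, text) for tag, text in split_into_segments(html)
            if all(k in text.lower() for k in wanted)]
```

For a page with a welcome paragraph, a two-row rate table, and a one-item list, `split_into_segments` yields four segments, and `queried_segments(page, ["room"])` keeps only the table row that mentions a room.
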
Outline
- Introduction
- Learning Based Web Query Processing (current section)
- FACT: A Prototype System
- Preliminary System Evaluation
- Conclusions
Learning Based Query Processing
- The fundamental difficulties in Web query processing:
  - The Web is a huge, ever-growing, heterogeneous, semi-structured data source
  - Most Web users are naive users issuing ad hoc queries
- Learn the knowledge for query processing from the user!
A Learning Based Technique
- Learn from the user as he or she browses the first few URLs:
  - how to navigate through the web pages
  - how to identify the required information in a web page
- Process the remaining URLs automatically and retrieve queried segments
[Screenshot: browser window listing Hotel 1 and Hotel 2. Callout: User browses it!]
[Screenshot. Callout: User clicks here!]
[Screenshot: room information. Callout: User marks it!]
[Screenshot. Callout: FACT starts here!]
[Screenshot: "roomrates". Callout: FACT chooses it!]
[Screenshot. Callout: FACT finds it!]
Outline
- Introduction
- Learning Based Web Query Processing
- FACT: A Prototype System (current section)
- Preliminary System Evaluation
- Conclusions
A Query Processing System
A learning based query processing system:
- User Interface: accepts user queries, presents query results; includes a browser capable of capturing user actions
- Query Analyzer: analyzes and transforms user queries
- Session Controller: coordinates learning and locating
- Learner: generates knowledge from captured user actions
- Locator: applies knowledge and locates query results
- Crawler & Parser: retrieves pages and parses them into trees
- Knowledge Base: stores learned knowledge
Reference Architecture
[Diagram. Components: User Interface, Learner, Query Analyzer, Session Controller, Knowledge Base, Locator, Crawler & Parser, Search Engine, Web]
A Query Session
[Diagram spanning a Learning Process and a Locating Process. Labels: Scripts, Browser, User Actions, URLs, Session Controller, Result Buffer, Query results, Learner, Training Segment Strategy Graph, Knowledge Base, Checking, Query Result Presenter, Locator]
Training Strategies
- Sequential
  - First n sites: user browses and system learns
  - Next N-n sites: system processes
- Random
  - Randomly choose n sites: user browses and system learns
  - The system processes the rest
- Interleaved
  - First n_0 sites: user browses and system learns
  - Next n - n_0 sites: system makes decisions; for incorrect ones, user browses and system re-learns
  - Next N-n sites: system processes
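
The three strategies differ only in which sites end up as training sites. A minimal sketch of that selection, assuming the sites are held in a list of N entries: the function names and the `system_is_correct` callback (standing in for the user checking the system's decision) are hypothetical and not part of FACT.

```python
import random


def sequential_split(sites, n):
    """First n sites are used for training; the remaining N-n are processed."""
    return sites[:n], sites[n:]


def random_split(sites, n, seed=None):
    """n randomly chosen sites are used for training; the rest are processed."""
    rng = random.Random(seed)
    chosen = set(rng.sample(range(len(sites)), n))
    train = [s for i, s in enumerate(sites) if i in chosen]
    rest = [s for i, s in enumerate(sites) if i not in chosen]
    return train, rest


def interleaved_split(sites, n0, n, system_is_correct):
    """First n0 sites train; among the next n - n0, only sites where the
    system decided incorrectly trigger re-learning; the last N-n are processed.

    system_is_correct(site) is a hypothetical callback that returns True
    when the system's decision for that site was right.
    """
    train = list(sites[:n0])
    for site in sites[n0:n]:
        if not system_is_correct(site):
            train.append(site)  # user browses, system re-learns
    return train, sites[n:]
```

With six sites, `interleaved_split(sites, 1, 4, check)` always trains on the first site, adds any of sites 2 to 4 that `check` rejects, and leaves the last two for automatic processing.
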
Outline
- Introduction
- Learning Based Web Query Processing
- FACT: A Prototype System
- Preliminary System Evaluation (current section)
- Conclusions
System Evaluation
- Functionality
- Performance
  - precision, recall, correctness
  - efficiency: within a site, how many pages the system visits to find a result
  - training efficiency: how many training samples are needed
- User interface
System Evaluation - Effectiveness
- Given a set of keywords, the system makes N decisions:
  $N = N_1 + N_2 + N_3 + N_4$
  $\text{Precision} = N_1 / (N_1 + N_3)$
  $\text{Recall} = N_1 / \#\,\text{relevant sites}$
  $\text{Correctness} = (N_1 + N_2) / N$
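
The three measures can be computed directly from the four counts. The sketch below only transcribes the formulas; the surviving slide text does not define what each of N1 to N4 counts, so the class, field names, and sample numbers are assumptions for illustration, not reported results.

```python
from dataclasses import dataclass


@dataclass
class DecisionCounts:
    """The four decision counts N1..N4; their meanings are not defined
    in the surviving slide text, so they are kept as opaque counts."""
    n1: int
    n2: int
    n3: int
    n4: int

    @property
    def total(self):
        # N = N1 + N2 + N3 + N4
        return self.n1 + self.n2 + self.n3 + self.n4


def precision(c):
    # Precision = N1 / (N1 + N3)
    return c.n1 / (c.n1 + c.n3)


def recall(c, relevant_sites):
    # Recall = N1 / # relevant sites (passed in separately)
    return c.n1 / relevant_sites


def correctness(c):
    # Correctness = (N1 + N2) / N
    return (c.n1 + c.n2) / c.total


# Made-up counts, purely illustrative.
counts = DecisionCounts(n1=6, n2=2, n3=2, n4=0)
print(precision(counts), recall(counts, relevant_sites=8), correctness(counts))
```
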
System Evaluation - Efficiency
- How efficiently does the system find a queried segment in a site?
  $\text{Level of a queried segment} = \text{length of the shortest path to find it}$
  $\text{Absolute path length} = \#\,\text{crawled pages}$
  $\text{Relative path length} = \#\,\text{crawled pages} / \text{level of the queried segment}$
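
A small sketch of these measures, assuming a site is modelled as a dictionary mapping each page to the pages it links to. The slide does not say whether the level counts links or pages; the sketch counts pages (the entry page has level 1) so that the ratio is always defined, and that choice, like the function names and the sample site, is an assumption.

```python
from collections import deque


def level_of_segment(links, start, target):
    """Number of pages on the shortest path from start to target,
    counting both ends; None if target is unreachable.

    links: dict mapping a page to the list of pages it links to.
    """
    depth = {start: 1}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if page == target:
            return depth[page]
        for nxt in links.get(page, []):
            if nxt not in depth:  # breadth-first: first visit is shortest
                depth[nxt] = depth[page] + 1
                queue.append(nxt)
    return None


def relative_path_length(crawled_pages, level):
    """Relative path length = # crawled pages / level of the queried segment."""
    return crawled_pages / level


# Hypothetical site: the rates page sits three pages deep.
site = {"home": ["about", "rooms"], "rooms": ["rates"], "about": []}
level = level_of_segment(site, "home", "rates")
print(level, relative_path_length(4, level))
```

A relative path length of 1 would mean the system crawled no page beyond the shortest path; here, crawling 4 pages to reach a level-3 segment gives a ratio above 1.
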
Basic Performance
- Q11: Hong Hotel Room Rate
- Q12: Hong Kong Hotel
- Sequential training
Effects of Training Strategies
- Query Q12
Improved Performance
- Interleaved training
Outline
- Introduction
- Learning Based Web Query Processing
- FACT: A Prototype System
- Preliminary System Evaluation
- Conclusions (current section)
Conclusions
- Proposed and implemented learning based Web query processing with the following features:
  - Returns succinct results: segments of pages
  - Needs no a priori knowledge or preprocessing; suited to ad hoc queries
  - Exploits page formatting and linkage information simultaneously
- The preliminary results are promising
Future Work
- Better knowledge
  - key factor that affects system performance
- Dynamic web pages?
  - Integrating results from another project
- System evaluation
- Prototype, product, dot company, $$$ ???