
875b47f71a328ad39dc35655ba611774.ppt
- Number of slides: 25
Meaningful Labeling of Integrated Query Interfaces
Eduard C. Dragut (speaker), University of Illinois at Chicago
Clement Yu, University of Illinois at Chicago
Weiyi Meng, SUNY at Binghamton
VLDB 2006, Seoul, Korea

A Motivating Scenario
- Looking for a ticket
  - Chicago to Seoul, September 10th to September 17th
  - [Screenshots: delta.com, orbitz.com, expedia.com]
- A user looking for the "best" price for a ticket:
  - Has to explore multiple sources
  - It is tedious, frustrating and time-consuming
E. Dragut et al Meaningful Labeling of Integrated Query Interfaces Page 2

The goal
- Provide a unified way to query multiple sources in the same domain
[Figure: the user formulates the query on a unified query interface (Airfare.com), which sits in front of Web sources such as priceline.com, united.com, delta.com and nwa.com]

Overview: Integrating Query Interfaces
[Figure: pipeline from the (Deep) Web to an integrated interface]
- Extract query interfaces [He 05, Zhang 04]
- Cluster query interfaces, e.g. Auto, Car Rental, Books, Airfare [Peng 04]
- Match query interfaces [B. He 03, Dhamankar 04, Doan 02, Madhavan 05, Wu 04]
- Integration of interfaces [H. He 03, Dragut 06]
- Various formats, e.g. ASCII files

Overview: Integrating Query Interfaces
- Integration steps:
  - Structural merging of query interfaces [He et al. 03, Dragut et al. 06]
    - Grouping constraints
    - Ancestor-descendant relationships
  - Determining the domain of each global field in the integrated interface [He et al. 03]
  - Meaningful labeling of the integrated interface
    - The topic of this presentation

Motivation of Naming
- A query interface needs to be easily understood by any user, irrespective of his/her background
- The study of query interfaces in the seven domains used in our experiment revealed that the designers of query interfaces follow some "hidden" norms:
  - there are certain relationships between the labels of the fields in the same group
    - E.g., all plurals
  - the labels of the (super) groups semantically characterize the set of fields underneath them
- The semantic ambiguity problem
  - Synonyms and homonyms are the two sources of naming conflicts [Batini et al. 86, Bright et al. 94]

The objectives
- The main goal is to provide a systematic way to label fields in the integrated query interface so that the concepts on the integrated query interface are easily understood by ordinary users.
  - Validated through a survey
- Provide a set of desirable properties that the labels of the attributes within an integrated interface must satisfy to be consistent, so that users have no difficulty in understanding it.
  - Not covered in detail

Naming Algorithm
- The input
  - A set of query interfaces in the same domain
    - E.g. Airline domain: Delta, AA, NWA, Orbitz, Travelocity
    - Each query interface is represented hierarchically [Wu 04]
  - The mapping between the fields of the query interfaces
    - Organized in clusters (e.g. [Wu et al. 04, B. He et al. 03])
  - The set of groups of fields given by the merge algorithm [Dragut et al. 06]
  - The integrated query interface given by the merge algorithm as a schema tree [Dragut et al. 06]
[Figure: query interface of vacations.net]

An Example of Input
- Three fragments of query interfaces represented hierarchically
- The mapping between them, i.e. the set of clusters

Each entry (source, n) of a cluster is shown below as the field identifier n in the row of its source; null means the source has no field in that cluster.

Source      c_DepCity  c_DestCity  c_DepMonth  c_DepDay  c_DepTime  c_DepYear
Travel      3          4           7           6         8          null
PriceLine   2          3           5           6         null       7
British     2          3           9           8         null

Source      c_Adults   c_Infants   c_Children  c_Seniors c_Airlines c_Class
Travel      14         null        15          16        12         null
PriceLine   12         14          13          null
British     5          null        6           null      13

Naming Algorithm - Sketch
- Step 1: Consistent labeling of the fields
  - Fields in the same group: use the intersect-and-union strategy
  - Isolated fields: no consistency required
  - Root fields: treated as a group
  - Output: each group of fields (or field) has a set of candidate labels, possibly empty
- Step 2: Consistent labeling of the internal nodes
  - For each internal node, starting from the lowest level up to the root, apply a set of inference rules on labels
  - Output: each internal node has a set of candidate labels, possibly empty
- Step 3: Enforce consistency within the entire integrated interface
  - Not covered

Preliminaries
- Normalization [e.g., He et al. 03, Madhavan et al. 01, Rahm et al. 01]
  - E.g. Adults (18-64) becomes adult
- Semantic relationships among complex labels need to be established
  - E.g., synonymy, hypernymy/hyponymy
  - Main issues
    - Thesauruses provide semantic relationships only for individual content words (e.g., WordNet [Fellbaum 98])
    - How to show that Area of Study is a synonym of Field of Work in the Job domain?
    - How to show that Class is a hypernym of Class of Tickets in the Airline domain?

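The normalization step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the stop-word list and the naive plural stripping are assumptions made here, and a real system would rely on a proper lemmatizer (irregular plurals such as Children are left untouched).

```python
import re

# Assumed, deliberately tiny stop-word list (not taken from the talk).
STOP_WORDS = {"of", "the", "a", "an", "in", "for", "to", "and", "or"}

def singularize(word):
    """Very naive plural stripping; irregular plurals are not handled."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word

def normalize(label):
    """Turn a raw label into a set of normalized content words."""
    label = re.sub(r"\([^)]*\)", " ", label)      # drop remarks such as "(18-64)"
    words = re.findall(r"[a-z]+", label.lower())  # keep alphabetic tokens only
    return {singularize(w) for w in words if w not in STOP_WORDS}
```

With this sketch, normalize("Adults (18-64)") yields the single content word adult, matching the example on the slide.
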
Preliminaries
- Manipulation of labels
  - A label is seen as a set of normalized content words
    - E.g., {area, study} corresponds to Area of Study
    - E.g., {field, work} corresponds to Field of Work
    - Area of Study is a synonym of Field of Work
      - Area is a synonym of Field (by WordNet)
      - Study is a synonym of Work (by WordNet)
- Most descriptive vs. most general labels
  - E.g. Category, Job Category, Area of Work, Function
    - Category and Function: too general
    - Job Category and Area of Work: descriptive, avoid confusion

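One possible reading of the set-based comparison is sketched below. The word-level synonym table is a hypothetical stand-in for WordNet lookups, and the rule used (equal size plus a one-to-one pairing of equal or synonymous words) is an assumption inferred from the Area of Study / Field of Work example, not a statement of the authors' exact definition.

```python
from itertools import permutations

# Hypothetical word-level synonym pairs standing in for WordNet lookups.
WORD_SYNONYMS = {frozenset({"area", "field"}), frozenset({"study", "work"})}

def words_match(a, b):
    """Two content words match if they are identical or listed as synonyms."""
    return a == b or frozenset({a, b}) in WORD_SYNONYMS

def labels_synonymous(label_a, label_b):
    """label_a, label_b: sets of normalized content words."""
    if len(label_a) != len(label_b):
        return False
    words_a = sorted(label_a)
    # Try every one-to-one pairing; labels are short, so this stays cheap.
    return any(
        all(words_match(a, b) for a, b in zip(words_a, perm))
        for perm in permutations(sorted(label_b))
    )
```

Under these assumptions {area, study} and {field, work} are reported as synonymous, while {category} and {job, category} are not, which is in line with treating Category as more general than Job Category.
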
Consistent Labeling of Groups of Fields
- Assumption:
  - The labels given by a query interface for the fields in the same group are consistent
- Organize the labels of a group in a relation-like form, called group relation
- General idea to build a consistent solution:
  - Combine multiple rows of consistent labels until a label is assigned to each field in the group

Cluster/schema   c_Adult   c_Child    c_Seniors   c_Infants
aa               Adults    Children
airfareplanet    Adult     Child
Airtravel        Adult     Child
British          Adults    Children
Economytravel

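The idea of combining rows of a group relation can be sketched as a greedy procedure. This is only an illustration under stated assumptions: rows are merged when they agree exactly on at least one shared cluster, the function name `combine_rows` is invented here, and the Seniors and Infants labels in the sample data are hypothetical values added to make the example complete.

```python
def combine_rows(group_relation, clusters):
    """Greedy sketch: grow a solution from rows that agree with it.

    group_relation: {schema_name: {cluster: label}}; a schema without a
    field in some cluster simply has no entry for it.
    Returns {cluster: label}, possibly partial.
    """
    rows = sorted(group_relation.values(), key=len, reverse=True)
    if not rows:
        return {}
    solution = dict(rows[0])  # start from the row that labels most clusters
    progress = True
    while progress and len(solution) < len(clusters):
        progress = False
        for row in rows:
            shared = set(row) & set(solution)
            agrees = any(row[c] == solution[c] for c in shared)
            adds = set(row) - set(solution)
            if agrees and adds:
                for c in adds:  # take labels only for still-unlabeled clusters
                    solution[c] = row[c]
                progress = True
    return solution

clusters = ["c_Adult", "c_Child", "c_Seniors", "c_Infants"]
relation = {
    "aa": {"c_Adult": "Adults", "c_Child": "Children"},
    "airfareplanet": {"c_Adult": "Adult", "c_Child": "Child"},
    # The two rows below use hypothetical labels for illustration.
    "british": {"c_Adult": "Adults", "c_Child": "Children", "c_Infants": "Infants"},
    "economytravel": {"c_Adult": "Adults", "c_Seniors": "Seniors"},
}
solution = combine_rows(relation, clusters)
```

Because only rows that agree with the current solution contribute labels, the result stays within one naming style (here the plural forms), which is the kind of consistency the group relation is meant to preserve.
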
Consistent Labeling of Groups of Fields
- Levels of consistency
  - String level
    - Two distinct tuples belong to this level of consistency if they have the same label for a cluster in the group relation
  - Equality level
    - Two distinct tuples belong to this level of consistency if they have equal labels for a cluster in the group relation
  - Synonymy level
    - Two distinct tuples belong to this level of consistency if they have synonym labels for a cluster in the group relation

[Table: group relation with clusters c_NumConnections, c_TicketClass and c_Airline for the schemas aa, airfare, alldest, cheap and msn. Labels appearing in it: NonStop, Number of Connections, Max Number of Stops; Class of Ticket, Class; Choose an Airline, Airline Preference, Preferred Airline, Preference, Airline]

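The three levels can be illustrated with a small check on two rows of a group relation. The interpretation used here is an assumption: string level is taken to mean identical raw strings, equality level means equal sets of normalized content words, and synonymy level relies on a hypothetical table of synonymous labels. The helper names are invented for this sketch.

```python
STRING, EQUALITY, SYNONYMY = 1, 2, 3  # lower number = stronger consistency

def content_words(label):
    """Crude normalization: lowercase, drop stop words and a trailing plural 's'."""
    stop = {"of", "the", "a", "an"}
    words = [w for w in label.lower().split() if w not in stop]
    return frozenset(
        w[:-1] if w.endswith("s") and not w.endswith("ss") else w for w in words
    )

# Hypothetical pairs of labels regarded as synonyms in the domain.
SYNONYM_LABELS = {
    frozenset({content_words("Preferred Airline"), content_words("Airline Preference")}),
}

def strongest_level(row_a, row_b):
    """Strongest level at which two rows agree on some shared cluster, or None."""
    best = None
    for cluster in set(row_a) & set(row_b):
        a, b = row_a[cluster], row_b[cluster]
        if a == b:
            level = STRING
        elif content_words(a) == content_words(b):
            level = EQUALITY
        elif frozenset({content_words(a), content_words(b)}) in SYNONYM_LABELS:
            level = SYNONYMY
        else:
            continue
        best = level if best is None else min(best, level)
    return best
```

For instance, the rows (Adults, Children) and (Adult, Child) agree only at the equality level under this sketch, through the c_Adult cluster, since the crude normalization does not relate Children to Child.
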
Consistent Labeling of Internal Nodes
- The problem
  - Given an internal node in the integrated interface, determine a label that is semantically suitable for it, i.e. one whose meaning is rich enough to cover the semantics of all its descendant leaf nodes
- An example
  - A fragment of the integrated interface of the Real Estate domain [figure]

Consistent Labeling of Internal Nodes
- In assigning labels to internal nodes we mainly exploit two types of knowledge:
  - The semantic relationships among the labels of the internal nodes in the individual schema trees
  - The relationships between internal nodes of source schema trees with overlapping sets of descendant leaves
- The two types of knowledge are employed to derive a set of logical inference rules among the textual labels
  - Some of them will be exemplified next

Consistent Labeling of Internal Nodes
- First logical inference
  - Informally, consider two internal nodes v1 and v2 of two distinct source schema trees with the property that:
    - v1's set of descendant leaf nodes is a subset of v2's set of descendant leaf nodes, and
    - v1's label is a hypernym of v2's label
  - Then the labels of the two nodes are semantically equivalent within the given domain of discourse
- An example: [figure]

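The first inference rule translates almost directly into a predicate. The sketch below assumes that each internal node carries its label and the set of clusters of its descendant leaves; the `Node` class, the function names and the hypernym table are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    label: str
    leaves: frozenset  # clusters of the descendant leaf fields

# Hypothetical hypernym facts: (more general label, more specific label).
HYPERNYMS = {("location", "property location")}

def is_hypernym(general, specific):
    return (general, specific) in HYPERNYMS

def equivalent_by_rule_1(v1, v2):
    """Rule 1: leaves(v1) is a subset of leaves(v2) and label(v1) is a hypernym of label(v2)."""
    return v1.leaves <= v2.leaves and is_hypernym(v1.label, v2.label)

v1 = Node("location", frozenset({"c_City", "c_State"}))
v2 = Node("property location", frozenset({"c_City", "c_State", "c_Zip"}))
```

Here v1 has the more general label but the smaller set of leaves, so within the domain the two labels can be treated as equivalent; the rule is not symmetric, so swapping the arguments does not satisfy it.
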
Consistent Labeling of Internal Nodes
- Second logical inference (the idea):
  - The same label is assigned to internal nodes in multiple source query interfaces and the descendant leaves of each such internal node are among those of the internal node in the integrated interface for which a label is sought.
- An example:
  - [Figure: fragment of the integrated query interface]
  - [Figure: fragments within the source query interfaces]

Consistent Labeling of Internal Nodes
- Third logical inference (hypernymy scenario)
  - Informally, consider two internal nodes v1 and v2 of two distinct source schema trees with the property that:
    - v1's label is a hypernym of v2's label
  - Then v1's label semantically covers the union of the descendant nodes of the two nodes.
- An example:
  - [Figure: fragment of the integrated query interface]
  - [Figure: fragments within the source query interfaces]

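A sketch of how the third rule might produce candidate labels for a node of the integrated interface follows. The rule itself (the hypernym label covers the union of the two leaf sets) is from the slide; the requirement that this union contain all leaves of the target node is an assumption about how the rule would be applied, and the function name, labels and clusters are hypothetical.

```python
def candidates_by_rule_3(target_leaves, source_nodes, is_hypernym):
    """Collect labels that cover the target node according to rule 3.

    source_nodes: list of (source_name, label, leaves) triples, where
    leaves is the set of clusters under that internal node.
    """
    found = set()
    for s1, label1, leaves1 in source_nodes:
        for s2, label2, leaves2 in source_nodes:
            if s1 != s2 and is_hypernym(label1, label2):
                # label1 covers leaves1 | leaves2; keep it if that is enough.
                if target_leaves <= (leaves1 | leaves2):
                    found.add(label1)
    return found

# Hypothetical hypernym fact and source nodes for illustration.
HYPERNYM_PAIRS = {("Passengers", "Airline Passengers")}
nodes = [
    ("source_a", "Passengers", {"c_Adult", "c_Child"}),
    ("source_b", "Airline Passengers", {"c_Adult", "c_Seniors", "c_Infants"}),
]
target = {"c_Adult", "c_Child", "c_Seniors"}
labels = candidates_by_rule_3(target, nodes, lambda g, s: (g, s) in HYPERNYM_PAIRS)
```

Neither source node covers the target alone, but the more general label does once the two leaf sets are united, which is what makes this rule useful for internal nodes created by merging.
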
Where can the instances help?
- Discard labels as values
  - The problem is known as schema element name as value [Xu 03, Dhamankar 04]
  - Example: in the Book domain, labels like Hardcover or Paperback are data instances of fields with labels like Format or Binding
- Reconcile most general vs. most descriptive
  - The idea is to bound the meaning of the most general label to a more descriptive one

Experiment
- Setup
  - Seven real-world domains:

Domain        # interfaces   Avg. # fields    Avg. # internal nodes   Avg. depth
                             per interface    per interface           of interfaces
Airfare       20             10.7             5.1                     3.6
Automobile    20             5.1              1.7                     2.4
Book          20             5.4              1.3                     2.3
Job           20             4.6              1.1                     2.1
Real Estate   20             6.5              2.4                     2.7
Car Rentals   20             10.4             2.5
Hotels        30             7.6              2.4                     2.3

- Also used in Wu et al. 04, Madhavan et al. 05, Dragut et al. 06

Experiment
- Human Acceptance
  - Questions asked:
    - Do you have any difficulty in filling in an entry for each field?
    - If you do, identify the fields you had difficulty filling in.
    - Are the fields understandable on the source interfaces?
  - 11 survey respondents reported the following:

Domain        Labeling Quality   Human Acceptance   Ignoring Inherited Ambiguity
Airfare       53.0%              96.6%              98.3%
Automobile    79.7%              100.0%
Book          83.3%              98.9%              100.0%
Job           80.0%              100.0%
Real Estate   79.1%              97.8%
Car Rentals   52.5%              97.9%              98.2%
Hotels        70.1%              95.3%              96.1%

Example Integrated Interfaces
- Airfare domain integrated interface
  - [Figure: the integrated interface next to the source query interface]
  - Four people found the group confusing

Example Integrated Interfaces
- Auto domain integrated interface [figure]
- No surveyed person identified any problem with this integrated query interface

End
- Please visit the project web site
  - http://www.cs.uic.edu/~edragut/QIProject.html
Thank you for your time and patience!