Mining Test Oracles for Search Engines
Wujie Zheng
wjzheng@cse.cuhk.edu.hk

Outline
  • Search Engines Evaluation/Testing
  • Our Approach
  • Data Collection
  • Examples

Search Engines Evaluation/Testing

Search Engine Evaluation
  • Prepare a set of queries and the ground truth, then evaluate the results of different search engines using well-defined measurements
      ◦ How to prepare queries, i.e., test inputs?
      ◦ How to get the ground truth, i.e., test oracles?
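
The slide does not name a specific measurement. As a minimal sketch, assuming precision at k as the measurement and a hand-labeled set of relevant domains as the ground truth (the function name and the example values are hypothetical), the evaluation step might look like this:

```python
def precision_at_k(ranked_results, relevant, k=10):
    """Fraction of the top-k ranked results that appear in the ground-truth set."""
    top_k = ranked_results[:k]
    if not top_k:
        return 0.0
    return sum(1 for r in top_k if r in relevant) / len(top_k)


# Hypothetical query with a hand-labeled ground truth (illustrative values only).
ground_truth = {"wine.com", "wines.com"}
engine_a = ["wine.com", "example.org", "wines.com", "example.net"]
engine_b = ["example.org", "example.net", "wine.com", "example.com"]

score_a = precision_at_k(engine_a, ground_truth, k=4)
score_b = precision_at_k(engine_b, ground_truth, k=4)
print(score_a, score_b)
```

The ground-truth set is exactly the test oracle that the rest of the talk tries to obtain without manual labeling.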

Test Oracles
  • Previous approaches
      ◦ Manual labeling: too costly, hardly reusable
      ◦ Clickthrough data: cannot find relevant pages that are not in the search results
      ◦ Automatic labeling based on the search results of multiple search engines at the same time: biased toward systems with similar characteristics
      ◦ Using previous search results as test oracles: the desired search results may change

Mining Test Oracles from Search Results

Basic Idea
  • Mine implicit rules between inputs/outputs, e.g.,
      ◦ tvguide.com => imdb.com
      ◦ basketball-reference.com => nba.com
      ◦ ericsson, sony => sonyericsson.com

Build the Dataset
  • Terms (features) of inputs
      ◦ Query words
      ◦ Query types
  • Terms (features) of outputs
      ◦ Domains of top 10 search results
  • Terms (features) of multiple search engines
      ◦ Search engine + domains of top 10 search results
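
The feature list above can be sketched as a function that turns one query and its results into a set of terms (one "transaction"). This is an illustration under assumptions, not the author's code: `build_transaction` and `domain_of` are hypothetical names, and reducing a URL to its host minus a leading `www.` is a simplification of whatever domain extraction was actually used.

```python
from urllib.parse import urlparse


def domain_of(url):
    """Return the host of a URL without a leading 'www.' (a simplification)."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def build_transaction(query, query_type, results_by_engine, top_k=10, tag_engine=False):
    """Collect query words, the query type, and result domains into one item set.

    With tag_engine=True each domain is prefixed with its search engine,
    as in the multiple-search-engine dataset ("google:wine.com").
    """
    items = set(query.lower().split())
    items.add(query_type)
    for engine, urls in results_by_engine.items():
        for url in urls[:top_k]:
            domain = domain_of(url)
            items.add(f"{engine}:{domain}" if tag_engine else domain)
    return items


# Hypothetical result URLs for one query.
results = {"google": ["https://www.wine.com/list", "http://foodandwine.com/"]}
single = build_transaction("buy wine online", "Food.csv", results)
multi = build_transaction("buy wine online", "Food.csv", results, tag_engine=True)
print(sorted(single))
print(sorted(multi))
```

Representing each transaction as a set keeps the later support counts simple, at the cost of discarding result rank and duplicate domains.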

Example Dataset
  • pine, furniture, Home.csv, barnfurnituremart.com, americancountryhomestore.com, overstock.com, prairiecountryfurniture.com, etsy.com, unfinishedfurnituregiant.com, cozylogfurniture.com, directfrommexico.com, oakplus.com, sawdustcityllc.com
  • buy, wine, online, Food.csv, wine.com, foodandwine.com, marketviewliquor.com, winechateau.com, wines.com, thewinebuyer.com, wineweb.com, alloutwine.com, cellaraiders.com, french-wineonline.com
  • piercing, labret, Beauty.csv, wikipedia.org, youtube.com, about.com, ygoy.com, ehow.com, bodyjewelleryshop.com, google.com, bmezine.com, piercingdot.com

Example Dataset
  • interest, rates, today, Finance.csv, real.csv, google:wellsfargo.com, google:bankrate.com, google:marketwatch.com, google:interest.com, google:mortgagenewsdaily.com, google:usbank.com, google:mortgage101.com, google:yahoo.com, google:mortgageloan.com, bing:wellsfargo.com, bing:bankrate.com, bing:marketwatch.com, bing:wsj.com, bing:interest.com, bing:bankrate.com, bing:usbank.com, bing:yahoo.com, bing:usatoday.com, yahoo:bankrate.com, yahoo:wellsfargo.com, yahoo:bankrate.com, yahoo:interest.com, yahoo:msn.com, yahoo:moneyrates.com, yahoo:cnn.com, yahoo:yahoo.com, yahoo:fxstreet.com, yahoo:marketwatch.com

Association Rule Mining
  • A, B, C => D
  • confidence(A => D) = support(A, D) / support(A)
      ◦ bing:mlb.com => google:mlb.com
      ◦ support(bing:mlb.com, google:mlb.com) = 26, support(bing:mlb.com) = 27, confidence(bing:mlb.com => google:mlb.com) = 26/27
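
The support and confidence definitions above translate directly into code. The following is a small sketch; the transactions are synthetic and were constructed only to reproduce the 26/27 figure from the slide.

```python
def support(transactions, items):
    """Number of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t)


def confidence(transactions, antecedent, consequent):
    """support(antecedent + consequent) / support(antecedent)."""
    denominator = support(transactions, antecedent)
    if denominator == 0:
        return 0.0
    return support(transactions, set(antecedent) | {consequent}) / denominator


# Synthetic data: 26 queries where both engines return mlb.com,
# 1 where only Bing does, and 3 unrelated queries.
transactions = (
    [{"bing:mlb.com", "google:mlb.com"}] * 26
    + [{"bing:mlb.com"}]
    + [{"google:espn.com"}] * 3
)

print(support(transactions, ["bing:mlb.com", "google:mlb.com"]))  # 26
print(support(transactions, ["bing:mlb.com"]))                    # 27
print(confidence(transactions, ["bing:mlb.com"], "google:mlb.com"))
```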

Association Rule Mining
  • Mine all frequent itemsets
  • We are most interested in single-postfix rules, i.e., rules A => B whose right-hand side B contains exactly one item
  • Algorithm
      ◦ For each frequent itemset S
          ◦ For each item u in S
              ◦ Check the rule S - u => u
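
The loop above can be sketched as follows. The slides do not say which frequent-itemset miner was used, so this example assumes a plain level-wise (Apriori-style) search; the function names, the thresholds, and the toy transactions are all hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=3):
    """Level-wise (Apriori-style) search; returns {frozenset: support count}."""
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)
    size = 1
    while current and size < max_size:
        size += 1
        items = sorted(set().union(*current))
        # Keep a candidate only if all of its (size-1)-subsets are frequent.
        candidates = [
            frozenset(c)
            for c in combinations(items, size)
            if all(frozenset(sub) in current for sub in combinations(c, size - 1))
        ]
        current = {}
        for cand in candidates:
            n = sum(1 for t in transactions if cand <= t)
            if n >= min_support:
                current[cand] = n
        frequent.update(current)
    return frequent


def single_consequent_rules(frequent, min_confidence):
    """For each frequent itemset S and each u in S, check the rule S - u => u."""
    rules = []
    for itemset, count in frequent.items():
        if len(itemset) < 2:
            continue
        for u in itemset:
            antecedent = itemset - {u}
            if count / frequent[antecedent] >= min_confidence:
                rules.append((antecedent, u, count, frequent[antecedent]))
    return rules


transactions = [
    {"ericsson", "sony", "sonyericsson.com"},
    {"ericsson", "sony", "sonyericsson.com"},
    {"sony", "sony.com"},
    {"ericsson", "ericsson.com"},
]
rules = single_consequent_rules(frequent_itemsets(transactions, min_support=2), 1.0)
for antecedent, u, hits, total in rules:
    print(sorted(antecedent), "=>", u, f"{hits}/{total}")
```

Every subset of a frequent itemset is itself frequent, so the lookup `frequent[antecedent]` always succeeds. On real query logs a dedicated miner would likely be needed, since this version rescans all transactions for every candidate.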

Data Collection

Search Engines
  • Google
  • Bing
  • Yahoo
  • Baidu
  • Sogou
  • Soso

Queries
  • Google Trends (hot queries), 1000 queries
  • Queries in KDDCUP 2005, 800 queries
  • Google AdWords, 15,000 queries, 22 types
  • Baidu Tops

Examples

  • dpreview.com, kenrockwell.com => amazon.com: 29/29 = 1.0, violations: none; test: 37/40, violations: 3881, 4691, 4783
  • amazon.com, kenrockwell.com => dpreview.com: 29/29 = 1.0, violations: none; test: 37/39, violations: 2089, 8921
  • canon, amazon.com => canon.com: 22/22 = 1.0, violations: none; test: 34/38, violations: 4090, 4870, 5384, 7400
  • canon.com, amazon.com => canon: 22/22 = 1.0, violations: none; test: 34/38, violations: 3560, 5409, 8983, 8988
  • canon.com, Hobbies.csv => canon: 31/31 = 1.0, violations: none; test: 31/34, violations: 3560, 5409, 8988
  • canon.com, dpreview.com => canon: 22/22 = 1.0, violations: none; test: 24/26, violations: 5409, 8983
  • gsmarena.com, samsung.com => samsung: 26/26 = 1.0, violations: none; test: 32/35, violations: 852, 1195, 1714
  • phonenumber.com => whitepages.com: 25/25 = 1.0, violations: none; test: 11/12, violations: 1077
  • Hobbies.csv, nikon => nikon.com: 28/28 = 1.0, violations: none; test: 26/28, violations: 896, 8319
  • canon.com => canon: 37/37 = 1.0, violations: none; test: 37/41, violations: 3560, 5409, 8983, 8988
  • amazon.com, nikon => nikon.com: 25/25 = 1.0, violations: none; test: 25/27, violations: 896, 8319
  • reversephonedirectory.com, Computer.csv => whitepages.com: 22/22 = 1.0, violations: none; test: 26/30, violations: 1804, 4424, 5453, 8720

  • Internet.csv, ericsson => sonyericsson.com: 24/24 = 1.0, violations: none; test: 23/24, violations: 8776
  • reversephonedirectory.com => whitepages.com: 22/22 = 1.0, violations: none; test: 28/32, violations: 1804, 4424, 5453, 8720
  • simplyrecipes.com, about.com => allrecipes.com: 25/25 = 1.0, violations: none; test: 38/39, violations: 5596
  • Finance.csv, oanda.com => xe.com: 20/20 = 1.0, violations: none; test: 27/30, violations: 3410, 5566, 5781
  • oanda.com => xe.com: 20/20 = 1.0, violations: none; test: 28/31, violations: 3410, 5566, 5781
  • food.com, foodnetwork.com => allrecipes.com: 30/30 = 1.0, violations: none; test: 32/34, violations: 7642, 8519
  • foodnetwork.com, simplyrecipes.com => allrecipes.com: 39/39 = 1.0, violations: none; test: 40/43, violations: 566, 5596, 7642
  • ericsson, sony => sonyericsson.com: 24/24 = 1.0, violations: none; test: 23/24, violations: 8776
  • myrecipes.com, foodnetwork.com => allrecipes.com: 24/24 = 1.0, violations: none; test: 28/30, violations: 2748, 5252
  • myrecipes.com, allrecipes.com => foodnetwork.com: 24/24 = 1.0, violations: none; test: 28/35, violations: 377, 1236, 1335, 1645, 3752, 6655, 6920
  • phonenumber.com, phone => whitepages.com: 20/20 = 1.0, violations: none; test: 8/9, violations: 1077
  • Food.csv, joyofbaking.com => allrecipes.com: 27/27 = 1.0, violations: none; test: 35/36, violations: 566
  • nikonusa.com, nikon => nikon.com: 28/28 = 1.0, violations: none; test: 26/28, violations: 896, 8319
  • joyofbaking.com => allrecipes.com: 27/27 = 1.0, violations: none; test: 35/36, violations: 566

  • mortgageloan.com => bankrate.com: 20/21 = 0.9523809523, violations: 7719; test: 24/28, violations: 545, 1603, 5073, 7711
  • Finance.csv, mortgageloan.com => bankrate.com: 20/21 = 0.9523809523, violations: 7719; test: 24/28, violations: 545, 1603, 5073, 7711
  • recipes, myrecipes.com => foodnetwork.com: 20/21 = 0.9523809523, violations: 7778; test: 20/25, violations: 1236, 1335, 6655, 6920, 7770
  • recipes, myrecipes.com => allrecipes.com: 20/21 = 0.9523809523, violations: 7778; test: 24/25, violations: 7770
  • phonearena.com, samsung => gsmarena.com: 21/22 = 0.9545454546, violations: 3806; test: 33/34, violations: 3802
  • samsung.com, samsungmobile.com => samsung: 21/22 = 0.9545454546, violations: 8585; test: 8/10, violations: 1195, 4778
  • food.com, about.com => allrecipes.com: 21/22 = 0.9545454546, violations: 2406; test: 43/46, violations: 5740, 7359, 8893
  • Dining.csv, mcdonalds => mcdonalds.com: 21/22 = 0.9545454546, violations: 5326; test: 20/22, violations: 3470, 3569
  • amazon.com, nikon.com => nikon: 25/26 = 0.9615384616, violations: 7295; test: 25/30, violations: 1256, 5102, 6165, 6744, 7287
  • nikon.com, Hobbies.csv => nikon: 28/29 = 0.9655172413793104, violations: 7295; test: 26/31, violations: 1256, 5102, 6165, 6744, 7287
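
Each rule above is reported with a ratio and a list of violations, once for the mining data and once for a test set. A minimal sketch of how such figures could be computed follows. It assumes that the listed numbers are IDs of queries that match the rule's left-hand side but lack its right-hand side; `check_rule` and the sample IDs are hypothetical.

```python
def check_rule(antecedent, consequent, transactions_by_id):
    """Return (satisfied, applicable, violating_ids) for one rule.

    A transaction is applicable when it contains every antecedent item,
    and it violates the rule when the consequent is then missing.
    """
    antecedent = set(antecedent)
    applicable = 0
    violations = []
    for query_id, items in sorted(transactions_by_id.items()):
        if antecedent <= items:
            applicable += 1
            if consequent not in items:
                violations.append(query_id)
    return applicable - len(violations), applicable, violations


# Hypothetical test-set transactions keyed by query ID.
test_set = {
    101: {"ericsson", "sony", "sonyericsson.com"},
    102: {"ericsson", "sony", "gsmarena.com"},
    103: {"sony", "sony.com"},
    104: {"ericsson", "sony", "sonyericsson.com", "wikipedia.org"},
}
satisfied, applicable, violations = check_rule(
    {"ericsson", "sony"}, "sonyericsson.com", test_set
)
print(f"test: {satisfied}/{applicable}, violations: {violations}")
```

Used as a test oracle, each violating query ID is a candidate failure for manual inspection rather than a confirmed defect.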

Examples of Multiple Search Engines

  • bing:medicinenet.com, google:emedicinehealth.com => google:medicinenet.com: 107/107 = 1.0, violations: none
  • symptoms, bing:medicinenet.com => google:webmd.com: 55/55 = 1.0, violations: none
  • Hobbies.csv, yahoo:allrecipes.com => google:allrecipes.com: 53/53 = 1.0, violations: none
  • bing:medicinenet.com, yahoo:nih.gov => google:medicinenet.com: 100/100 = 1.0, violations: none
  • google:amazon.com, bing:gsmarena.com => google:gsmarena.com: 52/52 = 1.0, violations: none
  • bing:gsmarena.com, google:youtube.com => google:gsmarena.com: 73/73 = 1.0, violations: none
  • google, google:google.com => bing:google.com: 56/56 = 1.0, violations: none
  • google:allrecipes.com, recipe => bing:allrecipes.com: 55/55 = 1.0, violations: none
  • bing:medicinenet.com, yahoo:mayoclinic.com => google:medicinenet.com: 90/90 = 1.0, violations: none
  • bing:dpreview.com, bing:amazon.com => google:dpreview.com: 56/56 = 1.0, violations: none

  • bing:medicinenet.com, yahoo:mayoclinic.com => google:mayoclinic.com: 89/90 = 0.988888889, violations: 7124
  • Home.csv, bing:amazon.com => google:amazon.com: 90/91 = 0.989010989, violations: 2124
  • bing:medicinenet.com, yahoo:wrongdiagnosis.com => google:medicinenet.com: 90/91 = 0.989010989, violations: 8556
  • bing:webmd.com, yahoo:wrongdiagnosis.com => google:webmd.com: 95/96 = 0.98958333334, violations: 6305
  • recipes, yahoo:allrecipes.com => google:allrecipes.com: 95/96 = 0.98958333334, violations: 6041
  • bing:mayoclinic.com, bing:nih.gov => google:mayoclinic.com: 102/103 = 0.9902912621359223, violations: 583
  • bing:mayoclinic.com, bing:medicinenet.com => google:medicinenet.com: 124/125 = 0.992, violations: 645
  • bing:medicinenet.com, bing:webmd.com => google:medicinenet.com: 136/137 = 0.9927007299270073, violations: 8556
  • yahoo:nextag.com, bing:amazon.com => google:amazon.com: 172/173 = 0.9942196531791907, violations: 4773
  • bing:medicinenet.com, google:mayoclinic.com => google:medicinenet.com: 174/175 = 0.9942857143, violations: 645
  • google:walmart.com, bing:amazon.com => google:amazon.com: 177/178 = 0.9943820224719101, violations: 4773

  • bing:mayoclinic.com, google:nih.gov => google:mayoclinic.com: 143/145 = 0.9862068965517241, violations: 1255, 583
  • bing:amazon.com, yahoo:thefind.com => google:amazon.com: 72/73 = 0.98630136, violations: 4773
  • symptoms, bing:webmd.com => google:webmd.com: 77/78 = 0.9871794872, violations: 6451
  • yahoo:medicinenet.com, yahoo:wrongdiagnosis.com => google:medicinenet.com: 77/78 = 0.9871794872, violations: 8556
  • yahoo:medicinenet.com, yahoo:mayoclinic.com => google:mayoclinic.com: 78/79 = 0.9873417721518988, violations: 7124
  • bing:allrecipes.com, yahoo:allrecipes.com => google:allrecipes.com: 160/162 = 0.9876543209876543, violations: 566, 5601
  • yahoo:bankrate.com, bing:bankrate.com => google:bankrate.com: 82/83 = 0.9879518072289156, violations: 6266
  • Internet.csv, bing:gsmarena.com => google:gsmarena.com: 83/84 = 0.9880952381, violations: 7617
  • bing:gsmarena.com => google:gsmarena.com: 86/87 = 0.9885057471264368, violations: 7617
  • bing:nextag.com, bing:amazon.com => google:amazon.com: 176/178 = 0.9887640449438202, violations: 4773, 7343
  • bing:mayoclinic.com, bing:answers.com => google:mayoclinic.com: 89/90 = 0.988888889, violations: 6328

Thank you!