Searching through the Internet
Dr. Eslam Al Maghayreh
Computer Science Department
Yarmouk University

Outline
- Introduction
- Information Retrieval
- Indexing
- Smarter Internet Searching
- Examples

Introduction
- The Internet holds an enormous quantity of information:
  - billions of web pages
  - thousands of newsgroups
- Two questions face any information seeker:
  - (1) How can I find what I want?
  - (2) How can I know that what I find is any good?

Information Retrieval
- Goal = find documents relevant to an information need from a large document set
[Diagram labels: Info. need, Query, Document collection, Retrieval, IR system, Answer list]

Example
[Image: Google Web search]

Search Engine
- Consists of:
  - the interface you use to type in a query
  - an index of Web sites that the query is matched against
  - a software program (called a spider or bot) that goes out on the Web and gets new sites for the index

IR problem
- First applications: in libraries (1950s)
    ISBN: 0-201-12227-8
    Author: Salton, Gerard
    Title: Automatic text processing: the transformation, analysis, and retrieval of information by computer
    Publisher: Addison-Wesley
    Date: 1989
    Content:
- External attributes and internal attribute (content)
- Search by external attributes = search in a DB
- IR: search by content

Possible approaches
1. String matching (linear search in documents)
   - Slow
2. Indexing
   - Fast
   - Flexible for further improvement
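The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration under assumptions: the three-document collection and the function names are made up for the example, and a real engine would normalize words far more carefully.

```python
from collections import defaultdict

# Hypothetical toy collection; the document ids and texts are made up.
docs = {
    "doc1": "computer architecture and computer design",
    "doc2": "computer network protocols",
    "doc3": "flower arrangement for beginners",
}

def linear_search(term, docs):
    """String matching: scan every document again for every query."""
    return sorted(doc_id for doc_id, text in docs.items() if term in text.split())

def build_index(docs):
    """Indexing: one pass over the collection builds term -> set of doc ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[word].add(doc_id)
    return index

def index_search(term, index):
    """With the index built, a query is a single dictionary lookup."""
    return sorted(index.get(term, set()))

index = build_index(docs)
print(linear_search("computer", docs))
print(index_search("computer", index))
```

Both calls return the same documents; the index simply moves the scanning cost to a one-time build step, which is why it scales better as the collection grows.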

[Diagram labels: Query, Documents, Indexing, Query Representation, Comparison Function, Document Representation, Index, Results]

Main problems in IR
- Query evaluation (or retrieval process)
  - To what extent does a document correspond to a query?
- System evaluation
  - How good is a system?
  - Are the retrieved documents relevant? (precision)
  - Are all the relevant documents retrieved? (recall)
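Precision and recall can be computed directly from the set of retrieved documents and the set of relevant ones. A minimal sketch follows; the document ids are invented for the example, and the relevance judgments are assumed to be given.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant retrieved documents / all retrieved documents
    recall    = relevant retrieved documents / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical query: 4 documents returned, 3 documents are truly relevant.
p, r = precision_recall(
    retrieved=["doc1", "doc2", "doc3", "doc4"],
    relevant=["doc1", "doc3", "doc7"],
)
print(f"precision={p:.2f} recall={r:.2f}")
```

Here two of the four retrieved documents are relevant (precision 0.50), and two of the three relevant documents were found (recall about 0.67).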

Document indexing
- Goal = find the important meanings and create an internal representation
- Factors to consider:
  - Accuracy in representing meanings (semantics)
  - Exhaustiveness (cover all the contents)
  - Ease of manipulation by computer
- What is the best representation of contents?
  - Word: good coverage, not precise
  - Phrase: poor coverage, more precise
  - Concept: poor coverage, precise
[Chart: Coverage (Recall) against Accuracy (Precision) for Word, Phrase, Concept]

Keyword selection and weighting
- How to select important keywords?
  - Simple method: use middle-frequency words
  - Search engines usually disregard minor words such as "the", "and", "to", etc.
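The simple method can be sketched as follows. The stop-word list, the frequency thresholds, and the sample sentence are all assumptions chosen for the illustration; the slides do not specify them.

```python
from collections import Counter

# Assumed stop-word list; real engines use much longer ones.
STOP_WORDS = {"the", "and", "to", "a", "of", "in", "is"}

def select_keywords(text, min_count=2, max_count=3):
    """Keep words that are neither stop words nor outside the frequency band."""
    words = [w.strip('.,;:!?"').lower() for w in text.split()]
    counts = Counter(w for w in words if w and w not in STOP_WORDS)
    return {w: c for w, c in counts.items() if min_count <= c <= max_count}

text = ("The index maps terms to documents. The spider adds documents "
        "to the index, and the index grows.")
print(select_keywords(text))
```

Words that occur only once ("maps", "spider", "grows") fall below the band, stop words are dropped outright, and "index" and "documents" remain as keywords.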

Result of indexing
- Each document is represented by a set of weighted keywords (terms):
    D1 → {(t1, w1), (t2, w2), …}
  e.g.
    D1 → {(comput, 0.2), (architect, 0.3), …}
    D2 → {(comput, 0.1), (network, 0.5), …}
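One way to arrive at such a representation is to weight each term by its relative frequency in the document. This weighting scheme is an assumption made for the sketch (the slides do not say how the weights are obtained), and the input is assumed to be already stemmed and filtered.

```python
from collections import Counter

def weighted_terms(keywords):
    """Map each term to its share of the document's keyword occurrences."""
    counts = Counter(keywords)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}

# Hypothetical stemmed keyword list for one document.
d1 = weighted_terms(["comput", "architect", "comput", "network"])
print(d1)
```

The result is a dictionary of (term, weight) pairs in the same spirit as the D1 and D2 examples above.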

Retrieval
- The problems underlying retrieval
  - Retrieval model
    - How is a document represented with the selected keywords?
    - How are document and query representations compared to calculate a score?

Vector space model
- Vector space = all the keywords encountered: <t1, t2, t3, …, tn>
- Document D = <a1, a2, a3, …, an>, where ai = weight of ti in D
- Query Q = <b1, b2, b3, …, bn>, where bi = weight of ti in Q
- R(D, Q) = Sim(D, Q)

Matrix representation

                   Term vector space
                   t1    t2    t3    …    tn
Document    D1     a11   a12   a13   …    a1n
space       D2     a21   a22   a23   …    a2n
            D3     a31   a32   a33   …    a3n
            …
            Dm     am1   am2   am3   …    amn
            Q      b1    b2    b3    …    bn

Some formulas for Sim
- Dot product
- Cosine
- Dice
- Jaccard
[Figure: vectors D and Q drawn against the term axes t1 and t2]
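The formulas themselves did not survive in this transcript, so the sketch below uses the usual textbook definitions of the four measures for weighted vectors; treat them as an assumption about what the slide showed. D and Q are lists of weights a_i and b_i over the same terms.

```python
import math

def dot(d, q):
    """Dot product: sum of a_i * b_i."""
    return sum(a * b for a, b in zip(d, q))

def cosine(d, q):
    """Dot product divided by the product of the two vector lengths."""
    norm = math.sqrt(dot(d, d)) * math.sqrt(dot(q, q))
    return dot(d, q) / norm if norm else 0.0

def dice(d, q):
    """2 * dot product / (sum of a_i^2 + sum of b_i^2)."""
    denom = dot(d, d) + dot(q, q)
    return 2 * dot(d, q) / denom if denom else 0.0

def jaccard(d, q):
    """Dot product / (sum of a_i^2 + sum of b_i^2 - dot product)."""
    denom = dot(d, d) + dot(q, q) - dot(d, q)
    return dot(d, q) / denom if denom else 0.0

# Hypothetical weights over three terms.
D = [1.0, 2.0, 0.0]
Q = [1.0, 0.0, 0.0]
for sim in (dot, cosine, dice, jaccard):
    print(sim.__name__, round(sim(D, Q), 3))
```

Unlike the raw dot product, the other three measures normalize by vector size, so a long document does not score higher merely because it contains more terms.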

(Classic) Presentation of results
- Query evaluation result is a list of documents, sorted by their similarity to the query.
- E.g.
    doc1  0.67
    doc2  0.65
    doc3  0.54
    …
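Producing such a list amounts to scoring every document against the query and sorting by score. The sketch below reuses the D1 and D2 weights from the "Result of indexing" slide; the query weights and the choice of the dot product as Sim are assumptions for the example.

```python
def score(doc, query):
    """Dot-product similarity over sparse {term: weight} dictionaries."""
    return sum(weight * query.get(term, 0.0) for term, weight in doc.items())

def rank(docs, query):
    """Return (doc_id, score) pairs, highest score first."""
    scored = [(doc_id, score(doc, query)) for doc_id, doc in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

docs = {
    "D1": {"comput": 0.2, "architect": 0.3},
    "D2": {"comput": 0.1, "network": 0.5},
}
query = {"comput": 1.0, "network": 1.0}  # hypothetical query weights

for doc_id, s in rank(docs, query):
    print(doc_id, round(s, 2))
```

D2 is listed first because it matches both query terms, while D1 matches only "comput".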

IR on the Web
- No stable document collection (spider, crawler)
- Duplication
- Huge number of documents
- Multimedia documents
- Multilingual problem
- …

Tips for smarter Internet searching
- Use unique, specific terms
- Use the minus operator (-) to narrow the search
  - yarmouk -university
- Use quotation marks to match the consecutive words of a phrase, such as "flower arrangement".
- Enter a short question, such as "what time is it in amman?", "3.55*4.5-11 =", "who is the king of england?", "what is the distance between the sun and earth"

Smarter Internet Searching
- inurl:test results
  - Only test must be found in the web address (URL).
- allinurl:test results
  - Both test AND results must be found in the web address.
- define:
  - Provides definitions of the words, gathered from various online sources.
  - define:search engine

Smarter Internet Searching
- allintext:
  - Sometimes you get pages that do not have your search term/phrase in them.
  - Why? Because Google also matches a page on the text of links that point to it, not only on the page's own text.
  - Use allintext: to get only those pages that have your search terms in them.

Smarter Internet Searching
- allinanchor:
  - Returns only pages whose incoming links contain your search terms in the link text, even if the terms do not appear in the pages themselves.
  - This is the opposite of allintext:.
- site:
  - Limits your search to a specific web site.
  - Example: students site:yu.edu.jo filetype:pdf

Smarter Internet Searching
- Don't use common words and punctuation
  - Common words and punctuation marks should be used only when searching for a specific phrase inside quotes
- Most search engines do not distinguish between uppercase and lowercase
- Make the most of AutoComplete

Smarter Internet Searching
- The wildcard operator (*): Google calls it the fill-in-the-blank operator. For example, amusement * will return pages with amusement and any other term(s) the Google search engine deems relevant.
- Using a wildcard (*) for a character does not work in Google: cat* returns the same results as cat.

Smarter Internet Searching
- Related sites: for example, related:www.yu.edu.jo can be used to find sites similar to the Yarmouk University site.
- Specific file type: for example, Information retrieval filetype:ppt
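The operators from the preceding slides can be combined in a single query string. The helper below is a hypothetical convenience written for this illustration, not part of any Google API; it only assembles the text you would type into the search box and URL-encodes it with Python's standard library.

```python
from urllib.parse import urlencode

def build_query(terms="", phrase=None, exclude=(), site=None, filetype=None):
    """Assemble a query string from plain terms and search operators."""
    parts = [terms] if terms else []
    if phrase:
        parts.append(f'"{phrase}"')                 # exact phrase
    parts.extend(f"-{word}" for word in exclude)    # minus operator
    if site:
        parts.append(f"site:{site}")                # restrict to one site
    if filetype:
        parts.append(f"filetype:{filetype}")        # restrict to a file type
    return " ".join(parts)

def search_url(query):
    """Encode the query as the q parameter of a Google search URL."""
    return "https://www.google.com/search?" + urlencode({"q": query})

q = build_query("students", site="yu.edu.jo", filetype="pdf")
print(q)
print(search_url(q))
print(build_query("yarmouk", exclude=["university"]))
```

The first query reproduces the site: example from the slides, and the last one reproduces the minus-operator example.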

Examples
- Searching for papers
  - YU library
  - Google Scholar
- Searching for instructor resources
  - Morgan Kaufmann
  - Pearson

Examples
- Searching for books to buy
  - Amazon.com
  - Ebay.com
- Searching for items to buy
  - Electronics: bestbuy.com
- Searching for hotels
  - Expedia.com
  - Priceline.com
  - Booking.com

Examples
- Regional search
  - Google jo
- Searching for images
  - Google images
- Searching for a job
  - Jobsinacademia.net
  - Academickeys.com

The End.