

  • Number of slides: 25

Don’t accept the limits of Google!
Presentation for the Energy Institute, April 2009
Terry Kendrick, Information Now Limited
terry. [email protected] com
01603 628818

Google enough?
• Biggest?
• Comprehensive? And enough for any searcher?
• Best? - ease of use? - sources?
• 90% plus market share for search

Google the biggest? (sometimes but not always…)
“Terry Kendrick” (hits)
• Yahoo.com: 3230
• Altavista.com: 3240
• Live.com: 2, 900000
• Google.com: 3040
• Ask.com: 554
Source: Search 27 April 2009 19.00
Hmmm… but how many hits can you really see anyway?

Google the biggest? (sometimes but not always…)
“Terry Kendrick” (hits)
• Yahoo.com: 5,620
• Altavista.com: 5,470
• Live.com: 2,320
• Google.com: 2,690
• Ask.com: 428
• Cuil: 3,126
Source: Search 12 October 2008 20.50
Hmmm… but how many hits can you really see anyway?

Google best?
• Google is great for coverage and accessibility. Academic library resources are better quality: Brophy, J., & Bawden, D. (2005). Is Google enough? Comparison of an internet search engine with academic library resources. Aslib Proceedings, 57(6), 498 -5

Comprehensive and all you need?
• “There is nothing in this study to explain why web users seem to greatly prefer the Google search engine, since overall the performance of Google and Yahoo is more or less equivalent, and ahead of their competitors. We therefore suppose that the reasons go beyond the criteria of relevance of results” – Jean Veronis, University of Provence, “Comparative Study of Six Search Engines”, 2006

Limits of Google
• Doesn’t have everything on the web in its cache
• Doesn’t show you everything it has got in its cache
• Other search engines may have some different material
• Even “breaking” Google will only give you up to around 1000 hits per search
• Advanced Search is better done directly into the search line rather than through the mask
• (But it’s still an excellent search engine!)
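
The last point, typing advanced searches straight into the search line, can be sketched as a small query builder. This is an illustration only: the operators `site:` and `filetype:` are widely documented Google operators but are not named on the slide, and the function name, the example phrase, and the domain are hypothetical.

```python
from urllib.parse import urlencode

def build_query(phrase, site=None, filetype=None):
    """Compose a search-line query from a quoted phrase and optional operators."""
    parts = [f'"{phrase}"']  # quotes request an exact-phrase match
    if site:
        parts.append(f"site:{site}")  # restrict to one domain (assumed operator)
    if filetype:
        parts.append(f"filetype:{filetype}")  # restrict to one file type (assumed operator)
    return " ".join(parts)

# Hypothetical example values, chosen only for illustration.
query = build_query("energy policy", site="gov.uk", filetype="pdf")
url = "https://www.google.com/search?" + urlencode({"q": query})
print(query)
print(url)
```

Typing the printed query into the search box does the same job as filling in the Advanced Search form, and the string can be saved and reused.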

First page results – Google, Microsoft, Yahoo, Ask
• Among 12,570 random user-defined queries just over 1 percent of first page search results were the same across the engines
– The percent of total results unique to one search engine was 88.3 percent.
– The percent of total results shared by any two search engines was 8.9 percent.
– The percent of total results shared by three search engines was 2.2 percent.
– The percent of total results shared by the top four search engines was 0.6 percent.
Source: Dogpile, April 2007. Research by: Queensland University of Technology and Pennsylvania State University
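
A breakdown of this kind can be approximated with a short script. The sketch below is not the study's method: it assumes the denominator is the set of distinct URLs across all engines, and the engine names and URLs are made-up placeholders.

```python
from collections import Counter

def overlap_breakdown(results_by_engine):
    """Map k -> share of distinct URLs returned by exactly k engines."""
    engines_per_url = Counter()
    for urls in results_by_engine.values():
        for url in set(urls):  # count each engine once per URL
            engines_per_url[url] += 1
    total = len(engines_per_url)
    by_k = Counter(engines_per_url.values())
    n_engines = len(results_by_engine)
    if total == 0:
        return {k: 0.0 for k in range(1, n_engines + 1)}
    return {k: by_k[k] / total for k in range(1, n_engines + 1)}

# Hypothetical first-page results for one query (placeholder URLs).
sample = {
    "engine_a": ["u1", "u2", "u3", "u4"],
    "engine_b": ["u1", "u5", "u6", "u7"],
    "engine_c": ["u1", "u2", "u8", "u9"],
    "engine_d": ["u1", "u10", "u11", "u12"],
}
breakdown = overlap_breakdown(sample)
for k, share in breakdown.items():
    print(f"returned by exactly {k} engine(s): {share:.1%}")
```

Running the same query through several engines and feeding the first-page URLs into a function like this is a quick way to check the overlap claim for your own subject area.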

Despite Dogpile’s self-supporting research, there’s a high overlap in the first ten pages or so though, right? Intuitive… But is it really the case? See: http://ranking.thumbshots.com/

“Must See” Search engines (all .com unless noted otherwise)
• Yahoo, Altavista, Alltheweb, Google, Live, Ask, BBC, Searchme, Cuil
• Trovando.it, Exalead, Quintura, A9 …
• Ixquick, Vivisimo / Clusty, Mamma, Dogpile, ez2www, Surfwax, Webcrawler, Fazzle, Killerinfo, Icerocket
• Zuula, Mahalo, Toolbe, Baidu (China) / Yandex (Russia)
• Altsearchengines (top 100): http://altsr.us/
• www.thesearchrace.com/

… don’t forget specialist search engines
Examples:
• www.zoominfo.com – people / company summary
• www.base-search.net – academic search engine
• www.searchmil.com/ – military search engine … but good for tools and techniques
• www.truveo.com – video search engine
• www.questia.com – ”world’s largest online library”
• www.archive.org – includes “wayback machine”
• www.seeqpod.com / www.songza.com – playable audio files
• www.bandsintown.com – gigs
• www.masterseek.com – business directory

Human web: blogs, newsgroups and mailing lists
• www.boardreader.com
• www.twazzup.com
• www.bloogz.com
• www.blogpulse.com
• www.feedster.com
• www.technorati.com
Searching them
• http://groups.google.com/groups
• http://google.com/blogsearch
• …also Dark Net (see www.darknet.com) such as Bittorrents

Google’s view on the size of the web
• “Recently, even our search engineers stopped in awe about just how big the web is these days -- when our systems that process links on the web to find new content hit a milestone: 1 trillion (as in 1,000,000,000,000) unique URLs on the web at once!
• the number of individual web pages out there is growing by several billion pages per day.
• So how many unique pages does the web really contain? We don't know; we don't have time to look at them all! :-) Strictly speaking, the number of pages out there is infinite -- for example, web calendars may have a "next day" link, and we could follow that link forever, each time finding a "new" page. We're not doing that, obviously, since there would be little benefit to you. But this example shows that the size of the web really depends on your definition of what's a useful page, and there is no exact answer.
• We don't index every one of those trillion pages -- many of them are similar to each other, or represent auto-generated content similar to the calendar example that isn't very useful to searchers. But we're proud to have the most comprehensive index of any search engine, and our goal always has been to index all the world's data.”
Google Blog http://googleblogspot.com/2008/07/we-knew-web-was-big.html

How big is the deep web?
“The Deep Web covers somewhere in the vicinity of 900 billion pages of information located through the World Wide Web in various files and formats that the current search engines on the Internet either cannot find or have difficulty accessing. The current search engines find about 8 billion pages at the time of this writing.”
Source: Deep Web Research 2006 by Marcus P. Zillman, published January 15, 2006
Fall 2007 data:
• Google.com indexes 12.5 billion public web pages.
• 71 billion static web pages are publicly available. These pages can easily be found by Google and other search engines.
• 6.5 billion static pages are hidden from the public. As private intranet content, these are the corporate pages that are only open to employees of specific companies.
• 220+ billion database-driven pages are completely invisible to Google
therefore = 6% of the internet?
http://netforbeginners.about.com/cs/secondaryweb1/a/secondaryweb.htm
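
The slide does not say which figures produce the “6%”. A quick check of the Fall 2007 numbers, treating “220+” as exactly 220 (an assumption), suggests it is Google's index measured against the database-driven pages alone; measured against all three page counts the share comes out lower.

```python
# Fall 2007 figures from the slide, in billions of pages.
google_indexed = 12.5
static_public = 71.0
static_private = 6.5
database_driven = 220.0  # slide says "220+", so this is a lower bound

# Share of Google's index relative to database-driven pages only.
share_vs_database = google_indexed / database_driven

# Share relative to all three categories added together (an assumed reading).
share_vs_total = google_indexed / (static_public + static_private + database_driven)

print(f"vs database-driven pages: {share_vs_database:.1%}")
print(f"vs all listed pages:      {share_vs_total:.1%}")
```

Either way the point of the slide stands: on these figures the indexed share is in the low single digits.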

Invisible Web includes key information resources…
• Databases – e.g. Companies House
• Library catalogues
• Picture collections
• “Mash-ups”
• Password protected / subscription sites – e.g. newspaper archives

Example databases (many invisible web)
• www.oscars.org
• http://vads.ahds.ac.uk/collections/ST.html
• www.a2a.org.uk
• www.ipo.gov.uk
• www.ncjrs.gov/abstractdb/Search.asp
• http://businesscreditusa.com/index.asp
• http://plants.ifas.ufl.edu/search80/NetAns2/
• www.allmusic.com/
• http://aad.archives.gov/aad/
• www.eric.ed.gov
• www.istl.org/01-winter/internet.html

Mashups and podcasts
Google maps:
• www.folkestonegerald.com/map/
• www.chicagocrime.org/map
• www.housingmaps.com
• www.yourhistoryhere.com
• www.ufomaps.com
• www.gypsymaps.com
• www.programmableweb.com/matrix
Podcasting:
• www.ipodder.org; http://britcaster.com/
• www.podcast.net; www.podcastcentral.com
• Subject specific example: www.jodcast.net/amp/index.html

Video streaming
• www.researchchannel.org
• www.britishpathe.com
• http://mitworld.mit.edu/index.php
• http://web.sls.csail.mit.edu/lectures/
• http://videolectures.net
• www.monkeysee.com/
Academic
• www.loc.gov/film/arch.html
• www.mediachannel.com
• http://showbiz.quickfound.net/video_search_and_news.html
Community
• www.youtube.com
• www.veoh.com

Open access repositories
• www.doaj.org/
• http://oaister.umdl.umich.edu/o/oaister/viewcolls.html
• www.freefulltext.com
• www.arl.org/sparc/repos/ir.html
• http://archives.eprints.org/
• www.sherpa.ac.uk
• http://re.cs.uct.ac.za//
• www.hw.ac.uk/libwww/irn142/irn142.html – large list
• http://www.interdok.com/dopp/search.cfm – conference proceedings, not free access

What if?
• The bot visits the site but goes away before doing the whole site (e.g. parts of pages, number of pages)?
• Page author used a “No robots” command?
• The material was put up last week or is real time?
• The content is dynamically generated (CGI, ASP and others)?
• Material is graphic or embedded deep (e.g. ppt notes pages)?
• Spelling is wrong! (e.g. Mary J Bilge)
• Other reasons!
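
One common form of the “No robots” command is a robots.txt file at the root of a site. As a rough sketch, Python's standard `urllib.robotparser` can show how a well-behaved crawler would read such a file. The rules, the bot name, and the example.org URLs below are invented for illustration, and a robots meta tag inside a page is a separate mechanism not covered here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: every crawler is asked to skip /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse rules without fetching anything

# "ExampleBot" and the URLs are placeholders, not real crawler or site names.
allowed_public = parser.can_fetch("ExampleBot", "https://example.org/index.html")
allowed_private = parser.can_fetch("ExampleBot", "https://example.org/private/report.html")

print("public page crawlable: ", allowed_public)
print("private page crawlable:", allowed_private)
```

Pages excluded this way can be perfectly public and still never appear in a search engine's results.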

How invisible is the invisible web?
• http://oedb.org/library/college-basics/research-beyond-google “Research Beyond Google: 119 Authoritative, Invisible, and Comprehensive Resources”
• www.completeplanet.com/ (and Brightplanet – little out of date)
• http://virtualchase.com/search_engines/databases.html
• www.freepint.com/gary/direct.htm (very out of date)
• www.deepwebresearch.info (up to date – incredibly detailed, often techy)
• www.turbo10.com (Hmm…)
• www.incywincy.com
• www.deepdyve.com
• http://www.osti.gov/media/deepWebWM_256.html
• www.enth.com
• www.iage.com/invisible.html
• www.weblens.org/invisible.html
• www.deepweb.us
• www.llrx.com/features/deepweb2009.htm
• http://library.laguardia.edu/invisibleweb/webography
• Federated search – Deep Web Technologies
• Long shot … “Search our database” [subject term] – Database [subject term]
How do I find these “invisible” resources?

Virtual libraries / Gateways / Portals
Examples:
• www.hw.ac.uk/libWWW/irn/pinakes.html
• www.intute.ac.uk
• www.loc.gov/rr/askalib/virtualref.html
• www.loc.gov/rr/international/portals.html
• www.lii.org

Different types of subject gateways
• www.tasi.ac.uk/advice/using/finding.html
• http://www.kidsclick.org/
• http://yahooligans.yahoo.com/
• www.ala.org/gwstemplate.cfm?section=greatwebsites&template=/cfapps/gws/default.cfm
• www.anthus.com/CyberDewey.html
• http://library.bendigo.latrobe.edu.au/irs/webcat/ddcindex.htm
• http://listverse.com

Google on the future
Coming up with elegant, fitting and relevant solutions to meet the challenges of mobility, modes, media, personalization, location, socialization, and language will take decades. Search is a science that will develop and advance over hundreds of years. Think of it like biology and physics in the 1500s or 1600s: it’s a new science where we make big and exciting breakthroughs all the time. … Just like biology and physics several hundred years ago, the biggest advances are yet to come. That’s what makes the field of Internet search so exciting.
http://googleblogspot.com/2008/09/future-ofsearch.html