- Number of slides: 71
Search and the ‘Net in 2013
Michael Hunter, Reference Librarian, Hobart and William Smith Colleges
For Rochester Regional Library Council Member Libraries’ Staff
Sponsored by the Rochester Regional Library Council
Supported by Regional Bibliographic Databases and Resources Sharing (RBDB) funds granted by the New York State Library, 2013
For today...
- The Searchscape
- Entity-based Search
- New Services and Tools
- The Social Web
- Bing, Blekko, DuckDuckGo
- News from Google
- A Privacy Primer
- Trends and Future Directions
- Linklist: http://people.hws.edu/hunter/searchnet13links.htm
America at the Digital Turning Point
Center for the Digital Future – USC Annenberg School for Communication
www.digitalcenter.org/pdf/CDF_10_year_digital_turning_point.pdf
- Longitudinal study over 10 years
- Over 2,000 US households surveyed each year
- “…online behavior changes relentlessly.”
- “…constant social connection, unlimited access to information, and unprecedented abilities to purchase.”
- “…online technology creates extraordinary demands on our time, major concerns about privacy, and fundamental questions about the proliferation of the digital realm…”
America at the Digital Turning Point: Selected highlights
- Americans view the Internet as an important information source, yet many Internet users do not trust much of the information (there)
- Our privacy is lost.
- Most printed daily newspapers will be gone in about five years.
- The sheer overwhelming nature of technology may be reaching a critical point.
- Because of online technology, work is increasingly a 24/7 experience.
America at the Digital Turning Point: Time spent face-to-face with family in the household since the Internet
The Web Worldwide
Data from the International Telecommunication Union, 2011
- Total population: ca. 7 b.
- Connected to the Web: ca. 2 b.
- Mobile subscriptions: ca. 6 b. (figures in thousands)
  - Global: 5,981,000
  - Developed nations: 1,461,000
  - Developing nations: 4,520,000
- Mobile subscriptions forecast for 2017: 9 b., with 5 b. mobile broadband connections
New Top Level Domains (ICANN 1/11/12)
- .com domains almost exhausted for new website names
  - “Someone got there first”
  - New businesses must pay domain brokers for an address or register a new one with unnatural, insignificant words
- Now possible to purchase a unique TLD (.mycompany or .ourtrademark or .ourbrand)
- Fee: $185,000, with a waiting period of 2 years
Domain Registration
- Currently unrestricted: .com, .info, .net, .org
- Currently require proof of eligibility: .edu, .coop, .mil, .gov, .int, .museum, .xxx, .aero, .asia
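The two registration groups above can be sketched as a simple lookup. This is only an illustration: the function name `registration_policy` and the two sets are hypothetical, and the sets contain just the TLDs named on this slide as of 2013, not a complete registry.

```python
# Hypothetical lookup built only from the TLDs listed on the slide (2013 snapshot).
UNRESTRICTED = {"com", "info", "net", "org"}
ELIGIBILITY_REQUIRED = {"edu", "coop", "mil", "gov", "int", "museum", "xxx", "aero", "asia"}


def registration_policy(domain: str) -> str:
    """Return the registration group for a domain name, based on its last label."""
    # Normalize case, drop whitespace and any trailing dot, then take the TLD.
    tld = domain.strip().lower().rstrip(".").rsplit(".", 1)[-1]
    if tld in UNRESTRICTED:
        return "unrestricted"
    if tld in ELIGIBILITY_REQUIRED:
        return "proof of eligibility required"
    # Anything else (for example a newly purchased brand TLD) is not covered by the slide.
    return "unknown"
```

For example, `registration_policy("hws.edu")` falls in the eligibility group, while a brand TLD such as `.mycompany` is reported as unknown because the slide does not classify it.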
Search engines and satisfaction
mdgadvertising.com (data from Pew Research)
How often do you actually find the information you’re looking for with search engines?
Entity-based Search: Google’s Knowledge Graph, Bing’s Satori
Entity-based search: The back end
- How search engines worked until now
  - Matched query terms to terms in their crawler-created database
  - Results refined by
    - Linkage patterns
    - Popularity
    - Personalization
    - Other (???)
- Ambiguous terms abound: “kings”, “jaguar”, “Apollo”
Can a system know??
- “Charles Dickens”
  - This searcher wants information about and books by him
- “Frank Lloyd Wright”
  - This searcher wants information about and pictures of buildings designed by him
The basics…
- Entity database seeded with a large “bag of nouns” and supplemented with nouns from web crawls identified through natural language processing (NLP)
- These nouns are mapped to another database of information related and/or relevant to those nouns through NLP beyond simple text matches
- Results can be customized based on click responses from previous anonymous searches for that query
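The three steps above can be sketched in a few lines. This is a toy illustration, not how any real engine is built: the `Entity` class, the `ENTITIES` and `CLICKS` tables, and all sample values are hypothetical, and a plain dictionary lookup stands in for the NLP step.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    kind: str
    related: list = field(default_factory=list)


# Hypothetical entity database: normalized noun -> candidate entities with related information.
ENTITIES = {
    "jaguar": [
        Entity("Jaguar", "animal", ["habitat", "big cats"]),
        Entity("Jaguar", "car maker", ["models", "dealers"]),
    ],
    "charles dickens": [
        Entity("Charles Dickens", "author", ["biography", "books by him"]),
    ],
}

# Hypothetical aggregated, anonymous click counts per (query, entity kind).
CLICKS = {("jaguar", "animal"): 40, ("jaguar", "car maker"): 60}


def lookup(query: str) -> list:
    """Map a query to candidate entities, ordered by past anonymous clicks."""
    key = " ".join(query.lower().split())  # normalize case and whitespace
    candidates = ENTITIES.get(key, [])
    return sorted(candidates, key=lambda e: CLICKS.get((key, e.kind), 0), reverse=True)
```

With this sample data, the ambiguous query "jaguar" returns both entities, with the more frequently clicked one first, which is the kind of ordering a contextual box could use.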
Yahoo Research paper, 2009
http://research.yahoo.com/files/pods09-woc.pdf
- Extract structured data (addresses, prices, item #, etc.) from web documents and associate it with an entity
- Link relationships between entities
  - An actor to his films and other actors he has worked with
- Discover categorizing information in the document’s content
  - Subject headings
  - Reviews ( : or ) :
  - Type of food served
The front end
- Google’s Knowledge Graph:
  - Focused on questions and answers
  - Contextual box for ambiguous terms with short descriptions
- Bing’s Satori:
  - Focused on potential “actions” associated with the entity
  - Searchers for a rock band usually want to buy a recording, find lyrics or get tickets
  - “Snapshot” panel: entity-based results from the social web (yours and others)
Benefits of entity-based search
- Greater predictability of searcher satisfaction
- Discovers related information that does not contain the search term(s)
- Disambiguates many terms
- Colocates related information from across the Web in a variety of file types
The Long Tail
http://searchengineland.com/search-illustrated-b2b-long-tail-seo-13237
Future challenges: the “long tail”
- Entities are now limited to the most popular topics
- Currently no way to map complex queries to an entity or entity group
  - “volcanic eruptions in the 18th century”
  - “Lady Gaga concerts in a warm location”
- Currently limited to English only
- Including more entities in English and other languages will greatly increase processing and impact response time
New Services and Tools: Vertical, Realtime, Metas
iSeek Education
http://education.iseek.com
- Targeted discovery engine for students, teachers and administrators
- Sources limited to “university, government and established noncommercial providers”
- Limited to Safe Search
- Lesson plans
- Results clusters include: Subject areas, Specific topics, Places, State standards, Grade levels, People
iSeek Web
http://iseek.com
- Small database
  - iSeek crawler
  - Google
  - Public-contributed “favorites”
- Results clusters include: Specific topics, Organizations, People, Date & time, Places, Source
- MySeek (personal account service) problematic (7/15/13)
Topsy – www.topsy.com
Real-time search of the social web
- Results from Twitter and Google+
- Ranking factors include
  - How often the page is cited in tweets
  - “Influence” algorithm: “Who is listening to you?”
    - Dynamic process assigns influence score based on
      - Number of followers
      - Their influence
      - How often your tweets are re-tweeted
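To make the three influence inputs concrete, here is a toy scoring function. It is not Topsy's algorithm, which the slide does not describe. The formula, the function name `influence`, and the assumption that follower influence is already known on a 0 to 1 scale are all invented for illustration.

```python
import math


def influence(follower_scores: list, tweets: int, retweets: int) -> float:
    """Toy influence score combining the three inputs named on the slide.

    follower_scores: assumed influence (0.0 to 1.0) of each follower.
    tweets / retweets: how many tweets were sent and how many times they were re-tweeted.
    """
    if tweets <= 0:
        return 0.0
    # Number of followers, dampened so that huge audiences do not dominate.
    audience = math.log1p(len(follower_scores))
    # "Their influence": average influence of the followers.
    quality = sum(follower_scores) / len(follower_scores) if follower_scores else 0.0
    # How often tweets are re-tweeted, per tweet sent.
    retweet_rate = retweets / tweets
    return audience * (1 + quality) * (1 + retweet_rate)
```

Under this sketch, an account with no followers scores zero, and the score rises with more followers, more influential followers, or a higher re-tweet rate. A real "dynamic process" would also have to recompute follower scores repeatedly, since each follower's influence depends on their own followers.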
Topsy – www.topsy.com
Real-time search of the social web
- Unlike other real-time search engines, ranking is based on a deep archive of social media
- Trending metrics used in ranking
  - What’s viral right now?
- Experts Search locates authoritative Twitter users on topics of your choice
- Advanced search filters
  - Site/domain
  - Language (10)
  - Twitter user
  - Date, time posted
Lexipedia (in beta)
- Lexigraphic visualization tool based on NLP
- Maps parts of speech related to specific terms
  - Nouns, verbs, adjectives
- Gives synonyms, antonyms and “fuzzynyms”, e.g. happy – well, fortuitous, volitional
- Hover for definition and usage examples
- Currently available in English, Spanish, German, French, Dutch and Italian (all meanings and usage given in English)
- Powered by iSeek
Zapmeta
- Searches Web, Images and Video engines
- Web search includes Yahoo, Bing, Gigablast, AV, Entireweb
- Results grouped into “concept clusters”
- Advanced search offers
  - Full Boolean
  - Limit by country of page’s origin
  - Limit by domain type
  - Highlighting search term(s)
Polymeta
- Web search includes Google, Bing, Ask, Yahoo, Exalead
- Source selection available for each search type: Web, News, Images, Videos, Twitter, Blogs
  - Twitter search is limited to top 50 containing your search terms
- Faceted and graphed results available
- Related results from other search types appear to the right
Searchteam.com
- Search engine with wiki-like, real-time collaborative work spaces
- “Collective knowledge from your trusted social network circles”
- Web sites, Videos (YouTube), Images, Reference (Wikipedia), Educational, Books and Articles (Amazon)
- Faceted results and suggested searches
  - Related main topics
  - Subtopics
  - Related searches (suggested)
Searchteam.com
- SearchSpaces
  - Organize and share links
  - Online forum for collaborative searching with friends
  - Must search while in a SearchSpace to add to it
- Educational tab not inclusive of all .edu domains
- Results counts unreliable
The Shape of Today’s Social Web
Why search the social web???
- Public responses/attitudes/primary sources
- Breaking news
- Trending topics and people
- Latest product reviews
- Companies and competition
- Security, technology topics (latest virus, etc.)
- Locate individuals and their networks
  - Who they follow, who follows them
  - People interested in a topic/hobby
- Monitor collaborations
Social Networks in the Egyptian Revolution, 1/25/11 - 2/11/11
Enabling protesters to become citizen journalists
Mining Today’s Social Web: The trust factors
- People you don’t know
  - Wikipedia
  - Human-created databases, directories: “I need a few good sites on solar energy”
    - Mahalo, ipl2.org
  - Q&A services: “How do I repair my garage door opener?”
    - Yahoo Answers, Answers.com, Mahalo Answers
Mining Today’s Social Web: The trust factors
- People you follow
  - Twitter: human-created tweets, “What’s the buzz on Beyonce?”
- People you know
  - Post a question to friends and family: “What type of Mac should I buy?”
  - Facebook, LinkedIn, Google+, Bing (login via Facebook)
http://marketingland.com/new-social-discovery-engine-bottlenose-aims-to-take-over-real-time-exploration-17024
Twittermining
Some tweets are more “authoritative” than others…
- Access to unfiltered, real-time perspective on what people are thinking and doing
- Authority (and usefulness) of a tweet depends on
  - Who sent it
  - The number and “authority” of their followers
  - When it was sent
  - Documents/sites it refers to
Twittermining Tools
- Twitter.com
  - Requires a (free) account
  - Only the latest 2 weeks available
  - Searchable by hashtag (#): author-designated keyword or significant term or phrase
    - #rochester
    - #jobs
    - #marketing
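Hashtag searching, as described above, amounts to matching author-designated keywords inside tweet text. A minimal sketch on locally held tweet text follows. It does not call Twitter, and the function names `extract_hashtags` and `tweets_tagged` are hypothetical.

```python
import re

# A hashtag is approximated here as "#" followed by letters, digits or underscores.
HASHTAG = re.compile(r"#(\w+)")


def extract_hashtags(text: str) -> list:
    """Return the hashtags in a tweet, lowercased, in order of appearance."""
    return [tag.lower() for tag in HASHTAG.findall(text)]


def tweets_tagged(tweets: list, hashtag: str) -> list:
    """Keep only the tweets that carry the given hashtag (case-insensitive)."""
    wanted = hashtag.lstrip("#").lower()
    return [t for t in tweets if wanted in extract_hashtags(t)]
```

Matching is done on whole tags, so a search for `#job` does not return tweets tagged only `#jobs`.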
Twittermining Tools
- Discover Tab (access via your account)
  - Launched 5/12
  - Offers
    - Personalized content based on your Twitter activity
    - Favorites, follows, retweets, and more by people you follow
    - Who to follow: Twitter accounts suggested for you based on who you follow
    - Browse categories (<25) and people/organizations heavily associated with the categories
Twittermining Tools
- https://twitter.com/search-advanced
  - No account required
  - Only the latest 2 weeks available
  - Advanced search features
    - Booleans
    - Hashtag
    - Language limit
    - Author search (tweets from or to)
    - “Near this place”
    - Attitude: positive, negative, question
Twittermining Tools
- Storify.com
  - Users build social stories, bringing together media scattered across the Web into a coherent narrative
  - Access material shared with and by you and public posts
  - Postings, status updates, photos, videos, podcasts from Twitter, Facebook, YouTube, Flickr, Instagram and more
  - Discover others with similar interests
  - Requires (free) account, via Facebook or Twitter
Social Networks and Results: Users Respond
A distraction and concerns about privacy
Established Services: Bing, Blekko, DuckDuckGo
The Fallacy of the Superior Search Engine
Conrad Saam*
- Is there a difference in the quality of search results from Google and Bing?
- Data set of 100 difficult queries
  - “clean crayon off an led t.v. screen”
  - “Who was Kim Jong Un’s mother?”
  - “wii new release rumors”
*http://searchengineland.com/google-fails-to-trounce-bing-again-the-fallacy-of-the-superior-search-engine-revisited-107238
The Fallacy of the Superior Search Engine
- Evaluative factors
  - Timeliness
  - One-click access to information
  - Volume of content
  - Lack of spam
  - Authoritative sites appear in first 3 results
- The winner??? Google 296, Bing 274
- “Bing needs to be a much better search engine than Google to make it worth the switch”
Source: www.comscore.com
Microsoft’s Bing
- Redesigned 6/8/12
- Social search results now located in the new Social Sidebar (Facebook-based)
- When logged in through Facebook
  - Ask friends
  - Friends who might know
  - People who know
  - Feed of questions you’ve asked your FB friends through Bing
- Without a FB login
  - Sidebar results come from public posts
What Bing is NOW
- Travel – Price Predictor
- Video – Hover and get a preview
- Music: Artists – All content related to the artist (entity-based search)
- Events – FanSnap (meta for ticket purchasing)
- Shopping – Hottest deals on the web right now
- Maps – Malls and Airports added
- Everywhere – Xbox, Mobile, iPad
Curating the web with Blekko
http://blekko.com
- Human/crawler service
- Blekko (human) editors create “topic” and “built-in” slashtags used to label content in the Blekko crawler database.
- Registered users can create their own tags for any site in the Blekko database for a personal, searchable web
- Slashtags help refine results and eliminate spam
- Small but well-curated database
Blekko this year
- Slashtags now automatically added to searches based on aggregated anonymous search behavior.
- Adding /monte gives you results from 3 engines; sources revealed only after you select the most relevant results set
- Received substantial investment from major Russian search engine Yandex
DuckDuckGo – http://ddg.gg
- Home and search results pages redesigned
- Related “Search Suggestions” on results pages
- “Goodies”: user-supplied questions with answers in 20 broad categories, including Entertainment, Food & Drink, Travel, Programming, Sysadmin, Web Design
Google – The Highlights
Google+
plus.google.com
- Google’s social network (requires a Google account)
- Launched 9/19/11 (access to Twitter ended 7/2/11)
- Currently over 400 m. users, 100 m. active on a monthly basis
  - Facebook currently over 1.01 b. active users
- Offers “hangouts”: video chat rooms within the social network
- Businesses and organizations allowed
Google+
- “Google+1” allows a Google+ member to give a site a vote of approval
- Web search results include +1 votes, sometimes location-based
- Best access to content is through Google:
  - site:plus.google.com search term(s)
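The `site:` restriction above can be turned into a search URL programmatically. The sketch below assumes the standard `https://www.google.com/search?q=` query form. The helper name `google_plus_search_url` is hypothetical, and the function only builds the link without fetching any results.

```python
from urllib.parse import urlencode


def google_plus_search_url(terms: str) -> str:
    """Build a Google search URL restricted to plus.google.com."""
    query = f"site:plus.google.com {terms.strip()}"
    # urlencode percent-encodes the colon and turns spaces into "+".
    return "https://www.google.com/search?" + urlencode({"q": query})
```

For example, `google_plus_search_url("library outreach")` gives `https://www.google.com/search?q=site%3Aplus.google.com+library+outreach`, which can be pasted into a browser.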
Google’s Now: An Intelligent Digital Assistant
- Turns spoken natural language queries into a search, returning customized answers
- Uses search and other data from your mobile devices, Gmail and other Google services
- The more personal, contextualizing data accessible, the more customized the answers
- Currently available for mobile devices only
Search Lesson Plans and Common Core Standards
- Part of Google’s search education initiative
- 5 main topics with beginner, intermediate and advanced levels
  - Picking the right search terms
  - Understanding search results
  - Narrowing a search to get the best results
  - Searching for evidence for research tasks
  - Evaluating credibility of sources
- google.com/insidesearch/searcheducation/lessons.html
Personalization and Social Networks: Search Plus Your World
- Boosts in results ranking
  - Based on IP search behavior (Opt-out)
  - Based on personal search behavior (Opt-in)
  - Based on your social networks (Opt-in)
  - Based on Google+ public posts (Default; multiple steps needed to opt out)
  - Based on your private Google+ network posts (Opt-in)
IP-based personalization
- To permanently opt out, go to Search Settings
- To opt out on a per-search basis, use the toggle (top right)
- Personalization based on your personal search behavior is still opt-in
AAP Lawsuit settled
- 2005: Association of American Publishers and McGraw-Hill, Pearson, Penguin, John Wiley, Simon & Schuster allege copyright violation in the Library scanning project
- 2012: Google settles with publishers, who may now remove their books or journals from the Library project
- Authors Guild suit remains unsettled (back in court 5/8/13, Second Circuit Court of Appeals)
A Privacy Primer
Sharing user information has become the industry norm
Search engines and privacy
NSA and Personal Data
- Corporations historically restricted to reporting only total number of government information requests
- 6/5/13: Snowden leaks NSA documents on PRISM surveillance program; implicates 9 major Internet corporations
- 6/15/13: Yahoo, MS, Apple, FB successfully petition to disclose certain data on requests; not allowed to specify number of classified FISA requests
- 6/18/13: Google petitions FISA court for permission to reveal requesting agency, time frame and other details of the requests
Google’s policy for its account-based services
- New unified privacy policy in effect 3/1/12
- User profiles and individual search behavior will be shared among all Google services that require a login
- Account holders cannot opt out of this sharing
- Separate privacy policies still in effect for Google Books and Chrome
Google’s policy for services not requiring an account
- Covers Search, YouTube
- IP-based personalization in effect since 2009
- “We will not combine DoubleClick cookie information with personally identifiable information unless we have your opt-in consent”
Bing’s privacy policy
- For MS services that require a Windows Live ID
- “…information collected through one MS service may be combined with information obtained through other Microsoft services.”
- Signing into one service may automatically sign you into other Microsoft services
- To opt out
  - Use separate browsers for each MS service you access
  - Sign in and out of your accounts throughout the day to de-couple specific activities
DuckDuckGo
- Does not collect or share personal information
- No browser cookies stored
- No personally identifiable or IP-based search histories stored
- No IP addresses stored
- Very comprehensive with high-quality search results
Current Trends and Future Directions
Search Engine Trends in 2012
- Reversal in transparency at the major services
- Increasing personalization as the norm
- Explosion of social network influence
- Stronger anti-competitive allegations
- Modest Bing market share gains
“The nature of the Internet is undergoing a paradigm shift” – Matthew Berk (Zyxt Labs)
http://zyxt.com/post/26851542949/study-of-1-3-billion-urls-22-of-web-pages-reference
- 2012 study of 1.3 billion URLs
- 22% of web pages contain Facebook URLs
- Among 500 m. hardcoded links to Facebook only 3.5 m. are unique
- URLs from Common Crawl (open repository of web crawl data that can be accessed and analyzed by everyone)
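The kind of measurement reported above (share of pages that reference Facebook, and how many of those links are unique) can be sketched on a small in-memory sample. This is not the study's method or code. The input format (a dict of page URL to outbound links) and both function names are assumptions made for illustration.

```python
from urllib.parse import urlparse


def is_facebook_link(link: str) -> bool:
    """True if the link's host is facebook.com or one of its subdomains."""
    host = (urlparse(link).hostname or "").lower()
    return host == "facebook.com" or host.endswith(".facebook.com")


def facebook_stats(pages: dict) -> tuple:
    """Return (share of pages with a Facebook link, total Facebook links, unique Facebook links)."""
    fb_links = [link for links in pages.values() for link in links if is_facebook_link(link)]
    pages_with_fb = sum(any(is_facebook_link(l) for l in links) for links in pages.values())
    share = pages_with_fb / len(pages) if pages else 0.0
    return share, len(fb_links), len(set(fb_links))
```

Checking the parsed host rather than searching the raw string keeps look-alike domains such as `notfacebook.com` from being counted.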
“The Internet is shifting…” – M. Berk
- from unstructured to structured content
  - Structured content can be parsed and formatted into any other type of content
  - Unstructured content: static HTML
- from websites to entities
  - Nodes in social and other networks that contain or link to websites and other content
- from links to connection
  - Growth of business and personal presence on the social web
In the future...
- Mobile search will continue to grow rapidly
- Entity-based search will continue to develop
- Personalization will grow but more slowly as users better understand the consequences
- Social networks will continue as powerful tools for grassroots political movements
- Web access and web search will attract more government scrutiny worldwide
Thank You and Enjoy Your Searching!
Michael Hunter, Reference Librarian
Hobart and William Smith Colleges, Geneva, NY 14456
(315) 781-3014
hunter@hws.edu