You Are What You Link
Lada Adamic, Eytan Adar
WWW10, May 2001

Outline
Graph structures of social networks
  • How person-to-person links on the web create observable social networks
Understanding and predicting links
  • Additional online info (text, links, email subscriptions) gives context to social links
  • Predict social links even where there is no explicit hyperlink
Understanding communities through links

[Figure: two personal homepages that link to each other]
Becky's page: "Hey, I’m Becky. I study... I live in... My favorite books are... Here are some photos... My favorite links: ..."
Julie's page: "Hi, I’m Julie! I’m studying... I like... My friends are..."

Becky and Julie aren’t the only ones to link to each other

Stanford Social Web

Graph Structure of Social Networks

Differences in cohesiveness of communities
[Figure panels: Stanford, MIT]

Links among personal homepages at MIT and Stanford

                                          MIT     Stanford
Users with non-empty WWW directories      2302    7473
Percent with links in either direction    69%     29%
Percent with links in both directions     22%     7%

The number of links/person is uneven
Interesting social networks analysis

Largest connected component
MIT: 86%
Stanford: 58%
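A minimal sketch of how a largest-connected-component share like the one above could be computed. The slides do not say whether links were treated as undirected or which users form the denominator, so both choices here are assumptions: links are treated as undirected, and the denominator is whatever node list is passed in. The names `largest_component_fraction`, `people`, and `links` are illustrative, and the data is a toy example.

```python
from collections import deque


def largest_component_fraction(nodes, links):
    """Share of nodes in the largest connected component.

    ASSUMPTION: links are treated as undirected; the slides do not specify.
    """
    neighbors = {n: set() for n in nodes}
    for a, b in links:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen = set()
    largest = 0
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search over one component, counting its members.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        largest = max(largest, size)
    return largest / len(nodes) if nodes else 0.0


# Toy data (hypothetical): one component of 3 people, one of 2.
people = ["becky", "julie", "ann", "bob", "carl"]
links = [("becky", "julie"), ("julie", "ann"), ("bob", "carl")]
fraction = largest_component_fraction(people, links)
print(f"{fraction:.0%} of people are in the largest component")
```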

Shortest path from one person to another
MIT: 6.4 hops
Stanford: 9.2 hops
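One way to obtain a hop count of this kind is to run a breadth-first search from every person and average the distances. The sketch below assumes the reported figure is an average over all reachable ordered pairs in an undirected graph; the slides do not state this, and `average_shortest_path` and the toy `chain` graph are illustrative names only.

```python
from collections import deque


def average_shortest_path(neighbors):
    """Mean hop count over all ordered pairs that can reach each other.

    ASSUMPTION: undirected links, average over reachable pairs only.
    """
    total_hops = 0
    pair_count = 0
    for source in neighbors:
        # BFS gives the shortest hop distance from source to every reachable node.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in neighbors[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total_hops += sum(dist.values())
        pair_count += len(dist) - 1  # exclude the source itself
    return total_hops / pair_count if pair_count else 0.0


# Toy data (hypothetical): a chain a - b - c - d.
chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
avg_hops = average_shortest_path(chain)
print(f"average shortest path: {avg_hops:.2f} hops")
```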

Clustering Coefficient

$$C = \frac{\text{\# of links among neighbors}}{\text{max \# of links among neighbors}}$$

Example: $C = \frac{3}{4 \cdot 3 / 2} = \frac{1}{2}$

MIT / Stanford: 0.21, 70x that of a random graph!
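The definition above can be checked on a small graph. This sketch reproduces the slide's worked case (a person with 4 neighbors and 3 links among them); the function name `clustering_coefficient` and the toy `graph` are illustrative, and links are assumed undirected.

```python
from itertools import combinations


def clustering_coefficient(node, neighbors):
    """Links among a node's neighbors divided by the maximum possible number."""
    nbrs = neighbors[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # fewer than two neighbors: no neighbor pairs to link
    links_among = sum(1 for a, b in combinations(nbrs, 2) if b in neighbors[a])
    return links_among / (k * (k - 1) / 2)


# Toy data (hypothetical): "me" has 4 neighbors; a, b, c are all linked
# to each other (3 links), d is linked only to "me".
graph = {
    "me": {"a", "b", "c", "d"},
    "a": {"me", "b", "c"},
    "b": {"me", "a", "c"},
    "c": {"me", "a", "b"},
    "d": {"me"},
}
c_me = clustering_coefficient("me", graph)
print(c_me)  # 3 / (4 * 3 / 2)
```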

Understanding and Predicting Links

Information available online
[Figure labels: email list, common text, outlink, inlink]

How information was collected
• Users’ web directories were crawled
• Outlinks were extracted
• Text was passed through ThingFinder to extract things like people, places, companies
• Mailing list subscriptions were obtained from the mailing list servers (95% public for Stanford, internal to MIT)
• Inlinks were obtained by querying search engines:
    • Google for Stanford
    • AltaVista for MIT (equivalent URLs)

Comparison with traditional means of gathering information on social networks
Advantages
• Easily and automatically gathered (no phone, live, or mail surveys)
• Data sets are orders of magnitude larger
• Information is already public
Disadvantages
• Data sets are incomplete, i.e. you don’t get to ask the questions, just take down the answers

Friends have more in common
[Figure: example homepage snippets]
• "I love Prince!" / "Prince is the coolest!"
• "I live in Terra House" / "Find me in Terra."
• "I play volleyball" / "Wanna play volleyball?"
• "I play basketball"
• "I live in Kimball"
• "I play a lot of computer games"

http://negotiation.parc.xerox.com/web10/

So can we guess who’s friends with whom from the information gathered online?
• Choose person A
• Rank everybody else according to their likeness to that person
• See how “friends” (people who are linked to A) were ranked
• Evaluate for text, outlinks, inlinks, mailing lists separately
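The ranking procedure above can be sketched as follows. The slides do not give the likeness formula, so the score used here is an assumption chosen for illustration: each shared item contributes 1 / log(number of people who have it), so rarer shared items count for more. All names (`likeness`, `rank_others`, `friend_ranks`) and the toy data are hypothetical.

```python
import math
from collections import Counter


def likeness(items_a, items_b, item_counts):
    """ASSUMED score: sum of 1 / log(frequency) over shared items."""
    score = 0.0
    for item in items_a & items_b:
        count = item_counts[item]
        if count > 1:  # guard against log(1) == 0
            score += 1.0 / math.log(count)
    return score


def rank_others(person, items):
    """Everybody except `person`, ordered from most to least alike."""
    item_counts = Counter(item for owned in items.values() for item in owned)
    scores = {
        other: likeness(items[person], items[other], item_counts)
        for other in items
        if other != person
    }
    # Sort by descending score; break ties by name for a stable order.
    return sorted(scores, key=lambda other: (-scores[other], other))


def friend_ranks(person, items, friends):
    """1-based rank of each linked friend in the likeness ordering."""
    ranking = rank_others(person, items)
    return {f: ranking.index(f) + 1 for f in friends if f in ranking}


# Toy data (hypothetical): items could be text phrases, outlinks,
# inlinks, or mailing lists, evaluated one type at a time.
items = {
    "A": {"volleyball", "terra", "stanford"},
    "B": {"volleyball", "terra", "stanford"},
    "C": {"basketball", "stanford"},
    "D": {"stanford", "volleyball"},
}
ranking = rank_others("A", items)
ranks = friend_ranks("A", items, friends={"B"})
print(ranking, ranks)
```

Running the same functions once per information type (text, outlinks, inlinks, mailing lists) gives the separate evaluations the last bullet asks for.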

Example, top matches for a particular user
annaken: Clifford Hsiang Chao

Linked (friends)   Likeness score   Person
NO                 8.25             Eric Liao
YES                3.96             John Vestal
NO                 3.27             Desiree Ong
YES                2.82             Stanley Lin
NO                 2.66             Daniel Chai
NO                 2.55             Wei Hsu
YES                2.42             David Lee
NO                 2.41             Byung Lee

Coverage in ability to predict user-user links
i.e. friends had at least one item in common
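Read this way, coverage is the share of linked pairs who share at least one item of a given type. A minimal sketch under that reading; the function name `coverage`, the list names, and the pairs are all made up for illustration.

```python
def coverage(friend_pairs, items):
    """Share of linked pairs with at least one item in common."""
    if not friend_pairs:
        return 0.0
    covered = sum(
        1
        for a, b in friend_pairs
        if items.get(a, set()) & items.get(b, set())
    )
    return covered / len(friend_pairs)


# Toy data (hypothetical): mailing-list subscriptions per person.
subscriptions = {
    "becky": {"list-a", "list-b"},
    "julie": {"list-a"},
    "ann": {"list-c"},
}
linked_pairs = [("becky", "julie"), ("becky", "ann")]
cov = coverage(linked_pairs, subscriptions)
print(f"coverage: {cov:.0%}")
```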

Performance of friend matching algorithm

method          Stanford average rank   MIT average rank
inlinks         6.0                     9.3
outlinks        14.2                    18.0
mailing lists   11.1                    22.0
text            23.6                    31.6

The most common ranking for a friend is #1

Stanford: we don’t have that much in common with our friends’ friends

Understanding Communities Through Links

What are good and bad link predictors?
• What you would expect…
• Very unique things are only relevant to individuals
• Very general things (“MIT”, “Stanford”) are relevant to everyone
• Some top 10 lists…

Text Based Predictors

MIT Top Things
• Union Chicana (student group)
• Phi Beta Epsilon (fraternity)
• Bhangra (traditional dance, practiced within a club at MIT)
• neurosci (appears to be the journal Neuroscience)
• Phi Sigma Kappa (fraternity)
• PBE (fraternity)
• Chi Phi (fraternity)
• Alpha Chi Omega (sorority)
• Stuyvesant High School
• Russian House (living group)

Stanford Top Things
• NTUA (National Technical University of Athens)
• Project Aiyme (mentoring Asian American 8th graders)
• pearl tea (popular drink among members of a sorority)
• clarpic (section of marching band)
• KDPhi (sorority)
• technology systems (computer networking services)
• UCAA (Undergraduate Asian American Association)
• infectious diseases (research interest)
• viruses (research interest)
• home church (religious phrase)

• Bad phrases: general organizations, cities (Oakland, Cambridge, etc.), departments (CS)

Out-link Based Predictors

MIT Top Out-links / Stanford Top Out-links (entries in slide order)
• MIT Campus Crusade for Christ*
• alpha Kappa Delta Phi (sorority)*
• The Church of Latter Day Saints
• National Technical University Athens
• The Review of Particle Physics
• Ackerly Lab (biology)*
• New House 4 (dorm floor, home page)*
• Hellenic Association*
• MIT Pagan Student Group*
• Iranian Cultural Association*
• Web Communication Services*
• Mendicants (a cappella group)*
• Tzalmir (role playing game)*
• Phi_Kappa_Psi (fraternity)*
• Russian house (living group)
• comedy team*
• Magnetic Resonance Systems Research Lab*
• Sigma Chi (fraternity)*
• Applications assistance group*
• La Unión Chicana por Aztlán
• ITSS instructional programs*

• Worst ranked sites are search engines and portals (AltaVista, Lycos, Yahoo, etc.), and top level homepages such as www.mit.edu and www.stanford.edu.

In-link Based Predictors
• The top predictors are almost exclusively individual home pages pointing to lists of friends
• Poor predictors: long lists (all homepages, department listings)

Mailing List Based Predictors

MIT Top Mailing Lists
• Summer social events for residents of specific dorm floor
• Religious group
• Religious group
• Religious group
• Intramural sports team from a specific dorm
• Summer social events for residents of specific dorm floor
• Religious a cappella group
• Intramural sports team from a specific dorm
• “…discussion of MIT life and administration.”
• Religious group

Stanford Top Mailing Lists
• Kairos97 (dorm)
• mendicant-members (a cappella group)
• Cedro96 (dorm summer mailing list)
• first-years (first year economics doctoral students)
• local-mendicant-alumni (local a cappella group alumni)
• john-15v13 (Fellowship of Christ class of 1999)
• stanford-hungarians (Hungarian students)
• serra95-96 (dorm)
• metricom-users (network services employees who use metricom)
• science-bus (science education program organized by engineering students)

• Bad lists: general announcement lists at MIT, non-housing based activities (theater), job lists

Future Work
• Use other pieces of available information
    • demographic information (where people live, department, year, etc.)
    • combine information
• Label structures (Flake et al. 2000)
    • Given structures determined by graph algorithms
    • Label them using extracted information

Summary
• Homepage graph structure varies depending on community
• Possible to predict (to some degree) where links will exist
• Good predictors seem unique to communities