d61a2f74feab7fef6389a4ce73cdf90e.ppt
- Number of slides: 38
Modelling Users’ Profiles and Interests based on Cross-Folksonomy Analysis Martin Szomszor University of Southampton TAGora: Semiotic Dynamics of Online Social Communities EU-IST-2006-034721
Outline • Introduction and Motivation – Why is your folksonomy interaction useful? – How could it be exploited? • Making Sense of Folksonomies – Distributed Contact Networks – Tag Filtering / Tag Senses • Profiles of Interests • Future Work – Disambiguation – Building Better Profiles of Interests
Introduction http://news.bbc.co.uk/ delicious.com http://slashdot.org/ Dream Theater Metallica Rush
Increasing number of online identities • A recent Ofcom study found that UK adults have on average 1.6 profiles; 39% of those with a profile have at least two – [Ofcom 2008] Social Networking: A quantitative and qualitative research report into attitudes, behaviours, and use. • In the future, people will maintain an increasing number of online identities to meet different information sharing tasks and to connect with different communities
Tag Clouds delicious.com
The Big Picture delicious.com Profile of Interests
Personalisation • Profiles could be exported to other sites to improve recommendation quality • Profiles could be used to support personalised searching • Better user experience (diagram: delicious.com, Profile of Interests)
Consolidation and Integration cuba hotels holiday travel 2008 currency http://dbpedia.org/resource/Cuba http://dbpedia.org/resource/Travel http://dbpedia.org/resource/Holiday http://dbpedia.org/resource/Category:Tourism
Tagging Variation Filtered Tags Raw Tags [1] Szomszor, M., Cantador, I. and Alani, H. (2008). Correlating User Profiles from Multiple Folksonomies. In: ACM Conference on Hypertext and Hypermedia, 2008, Pittsburgh, Pennsylvania.
Disconnected Identities fan of friend #me contact friend
Making Sense of Folksonomies Tagging Semantics FOAF DBpedia + Wordnet Identity Integration Tag Integration Delicious Last.fm … Flickr Facebook
1. Contact Integration Tagging Semantics FOAF DBpedia + Wordnet Identity Integration Tag Integration Delicious Last.fm … Flickr Facebook
SNS Contact Integration
Consolidated Contact View #me • Recommend new connections
FOAF Representation of SNS Accounts
http://tagora.ecs.soton.ac.uk/LiveSocialSemantics/ht2009/foaf/4 <owl#sameAs>
  http://tagora.ecs.soton.ac.uk/facebook/613077109
  http://tagora.ecs.soton.ac.uk/delicious/martinszomszor
  http://tagora.ecs.soton.ac.uk/flickr/7214044@N08
  http://tagora.ecs.soton.ac.uk/lastfm/mszor
<http://tagora.ecs.soton.ac.uk/facebook/613077109> <http://tagora.ecs.soton.ac.uk/schemas/facebook#hasFriend> <http://tagora.ecs.soton.ac.uk/facebook/1006466985>, <http://tagora.ecs.soton.ac.uk/facebook/684541156>, … <http://tagora.ecs.soton.ac.uk/facebook/1043367866>;
2. Tag Integration Tagging Semantics FOAF DBpedia + Wordnet Identity Integration Tag Integration Delicious Last.fm … Flickr Facebook
Folksonomy Integration: Tag Heterogeneity Web2.0 != Web_2.0
Folksonomy Integration: Tag Heterogeneity Web2.0 isFilteredTo Web_2.0
Tag Filtering • Find canonical form for each tag: – Use DBpedia entry labels as reference • Compound terms separated by _ – second-life, second+life, second.life -> second_life • Concatenated / camel case terms are expanded – secondlife, SecondLife -> second_life • International characters normalised: – Caf%C3%A9 -> Cafe • Recommend spelling corrections – resaerch -> didYouMean research • Follow unambiguous redirections: – Humor, Funny -> Humour
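The filtering steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the TAGora implementation: the function names, the tiny `LABELS` and `REDIRECTS` sets, and the use of `difflib` for spelling suggestions are all hypothetical. The idea is to collapse both tags and reference labels to a separator-free key, so that `second-life`, `second+life` and `SecondLife` all hit the same reference entry.

```python
import difflib
import re
import unicodedata
from urllib.parse import unquote


def strip_accents(text):
    """Drop combining marks so that 'Café' becomes 'Cafe'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def match_key(text):
    """Collapse a tag or label to a lower-case key without separators."""
    text = strip_accents(unquote(text)).lower()
    return re.sub(r"[^a-z0-9]", "", text)


def build_index(reference_labels):
    """Map each separator-free key to a lower-case canonical label."""
    index = {}
    for label in reference_labels:
        canonical = strip_accents(unquote(label)).lower().replace(" ", "_")
        index.setdefault(match_key(label), canonical)
    return index


def filter_tag(tag, index, redirects=None):
    """Return the canonical form of a tag, or None if no label matches."""
    canonical = index.get(match_key(tag))
    if canonical is None:
        return None
    # Follow a redirect if one is known (assumed to be unambiguous).
    return (redirects or {}).get(canonical, canonical)


def did_you_mean(tag, index, cutoff=0.8):
    """Suggest the closest known label for a possibly misspelt tag."""
    matches = difflib.get_close_matches(match_key(tag), list(index), n=1, cutoff=cutoff)
    return index[matches[0]] if matches else None


# Hypothetical reference data standing in for DBpedia labels and redirects.
LABELS = ["Second_Life", "Web_2.0", "Caf%C3%A9", "Research", "Humor", "Humour"]
REDIRECTS = {"humor": "humour"}
INDEX = build_index(LABELS)
```

With this index, `filter_tag("second+life", INDEX)` and `filter_tag("SecondLife", INDEX)` resolve to the same canonical label. A variant such as `web2` would still need a redirect entry, since its key differs from that of `Web_2.0`.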
[Diagram: tagging ontology]
Namespaces: http://tagora.ecs.soton.ac.uk/schemas/tagging#, http://www.w3.org/2001/XMLSchema#
Classes: Tag, UserTag, DomainTag, GlobalTag, FinalTagSegment, CooccurrenceInfo, Resource, Tagger
Properties: cooccurringTag, isFilteredTo, hasCooccurrenceInfo, hasCooccurrenceFrequency, rdfs:label, hasDomainTag, tagUsed (f), hasNextSegment (f), hasGlobalTagSegment, hasTagSequence (f), hasUserFrequency, hasDomainFrequency, hasGlobalFrequency, usesTag, hasPost, taggedResource, taggedOn
Datatypes: xsd:string, xsd:integer, xsd:datetime
Legend: arrows denote property or subclass; (f) = functional property
Linked Data View
Finding Syntactic Variations
sparql$ select ?x where { ?x <http://tagora.ecs.soton.ac.uk/schemas/tagging#isFilteredTo> <http://tagora.ecs.soton.ac.uk/tag/web_2.0> }
  ?x
  ------------------------------------------------
  <http://tagora.ecs.soton.ac.uk/tag/web2.0>
  <http://tagora.ecs.soton.ac.uk/tag/web2>
  <http://tagora.ecs.soton.ac.uk/tag/web_2.0>
  <http://tagora.ecs.soton.ac.uk/tag/web_20>
  <http://tagora.ecs.soton.ac.uk/tag/web20>
sparql$ select * where { ?x <http://tagora.ecs.soton.ac.uk/schemas/tagging#isFilteredTo> <http://tagora.ecs.soton.ac.uk/tag/second_life> }
  ?x
  ------------------------------------------------
  <http://tagora.ecs.soton.ac.uk/tag/second_Life>
  <http://tagora.ecs.soton.ac.uk/tag/second.life>
  <http://tagora.ecs.soton.ac.uk/tag/SecondLife>
  <http://tagora.ecs.soton.ac.uk/tag/Second_Life>
  <http://tagora.ecs.soton.ac.uk/tag/second%20life>
  <http://tagora.ecs.soton.ac.uk/tag/SECOND_LIFE>
  <http://tagora.ecs.soton.ac.uk/tag/second_life>
  <http://tagora.ecs.soton.ac.uk/tag/secondlife>
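For readers unfamiliar with SPARQL, the queries on this slide amount to a single triple-pattern match with the subject left open. The sketch below mimics that lookup in plain Python over a small in-memory list of triples. The `subjects` helper and the `TRIPLES` list are illustrative assumptions; the tag URIs are copied from the result tables, and the real data sits in a triple store rather than a Python list.

```python
TAGGING = "http://tagora.ecs.soton.ac.uk/schemas/tagging#"
TAG = "http://tagora.ecs.soton.ac.uk/tag/"

# Toy triples (subject, predicate, object) built from the slide's results.
TRIPLES = [
    (TAG + variant, TAGGING + "isFilteredTo", TAG + "web_2.0")
    for variant in ("web2.0", "web2", "web_2.0", "web_20", "web20")
]
TRIPLES += [
    (TAG + variant, TAGGING + "isFilteredTo", TAG + "second_life")
    for variant in ("SecondLife", "second.life", "second_life", "secondlife")
]


def subjects(triples, predicate, obj):
    """Return every subject ?x matching the pattern (?x, predicate, obj)."""
    return [s for s, p, o in triples if p == predicate and o == obj]


# Equivalent of: select ?x where { ?x tagging:isFilteredTo tag:web_2.0 }
web20_variants = subjects(TRIPLES, TAGGING + "isFilteredTo", TAG + "web_2.0")
```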
Tag Senses • What are the possible meanings for a tag? • We use two reference sets: – DBpedia • Concepts – WordNet • Synsets
Disambiguation Ontology
[Diagram: disambiguation ontology]
Namespaces: http://www.w3.org/2006/03/wn/wn20/schema/, http://tagora.ecs.soton.ac.uk/schemas/disambiguation#, http://tagora.ecs.soton.ac.uk/schemas/dbpedia#, http://tagora.ecs.soton.ac.uk/schemas/tagging#, http://www.w3.org/2001/XMLSchema#
Classes: Tag, WordSense, DbpediaSenseInfo, Resource
Properties: didYouMean, hasWordnetSense, hasDbpediaSenseInfo, dbpediaSense, senseWeight (xsd:float)
Legend: arrows denote property or subclass; (f) = functional property
DBpedia Extraction • Extract triples from XML dump – Calculate normalised title string • Caf%C3%A9 -> cafe – Calculate concatenated title string • Second_life -> secondlife – Extract disambiguation term from title • Orange_(fruit) – Identify compound labels • Second_Life -> Second, Life
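The four title-derived strings listed above could be computed along these lines. This is a hedged sketch only: `describe_title` and its dictionary keys are hypothetical names, and the actual extraction pipeline may treat edge cases (nested parentheses, non-Latin scripts) differently.

```python
import re
import unicodedata
from urllib.parse import unquote


def strip_accents(text):
    """Drop combining marks so that 'Café' becomes 'Cafe'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def describe_title(title):
    """Derive the label strings for one page title (assumed URL-encoded)."""
    decoded = unquote(title)
    # A trailing "_(term)" is treated as the disambiguation term.
    match = re.fullmatch(r"(.+)_\((.+)\)", decoded)
    base, disambiguation = (match.group(1), match.group(2)) if match else (decoded, None)
    normalised = strip_accents(base).lower()
    return {
        "normalised": normalised,
        "concatenated": normalised.replace("_", ""),
        "disambiguation_term": disambiguation,
        # Compound labels are the underscore-separated parts, if any.
        "compound_labels": base.split("_") if "_" in base else [],
    }


examples = [describe_title(t) for t in ("Caf%C3%A9", "Second_Life", "Orange_(fruit)")]
```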
DBpedia Extraction • Number of incoming links • Extract page redirects • Extract disambiguation links – Find primary disambiguation (e.g. Apple)
DBpedia Extraction • Parse wiki text and extract terms: – Terms filtered using stop words (with some wiki specific additions) – Store term frequencies – Store number of distinct terms in page – Store total term frequency • Can associate a vector of terms and weights to each possible sense
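A rough sketch of the term extraction described above, assuming a simple lower-case alphabetic tokenizer and a small illustrative stop-word list. The wiki-specific stop words used in the project are not given in the slides, so `STOP_WORDS` and the `term_profile` name are placeholders.

```python
import re
from collections import Counter

# Illustrative stop words; "ref", "cite" and "http" stand in for the
# wiki-specific additions, which the slides do not enumerate.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "ref", "cite", "http"}


def term_profile(wiki_text):
    """Count terms in a page and return the three stored statistics."""
    tokens = re.findall(r"[a-z]+", wiki_text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS and len(t) > 1)
    return {
        "term_frequencies": dict(counts),             # term -> frequency
        "distinct_terms": len(counts),                # number of distinct terms
        "total_term_frequency": sum(counts.values()),
    }


profile = term_profile("The apple is a fruit. The apple tree is grown in orchards.")
```

The `term_frequencies` mapping is the vector of terms and weights that can then be attached to a sense.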
[Diagram: DBpedia extraction schema]
Classes: Resource, CompoundLabelSequence, FinalCompoundLabelSequence, TermFrequencyPair
Properties: hasLabel, hasNormalisedLabel, hasConcatenatedLabel, hasDisambiguationTerm, hasDisambiguation, hasPrimaryDisambiguation, isa, hasCompoundLabelSequence (f), hasNextLabelSequence (f), hasCompoundLabel (f), hasTotalTerms, hasTotalTermFrequency, hasTermFrequencyPair, hasTermFrequency
Datatypes: xsd:string, xsd:integer
Legend: (f) = functional property
Profiles of Interests [2] Szomszor, M., Alani, H., Cantador, I., O'Hara, K. and Shadbolt, N. (2008) Semantic Modelling of User Interests based on Cross-Folksonomy Analysis. In: 7th International Semantic Web Conference (ISWC), October 26th - 30th, Karlsruhe, Germany.
Global Category View
• What are the differences in the interests that are learnt from each domain?

Delicious                          Flickr
Wikipedia Category   Total Freq    Wikipedia Category   Total Freq
Design               69,215        Travel               51,674
Blogs                68,319        Australia            51,617
Music                45,063        London               46,623
Photography          41,356        Festivals            42,504
Tools                35,795        Music                40,943
Video                34,318        Cats                 38,230
Arts                 29,966        Holidays             37,610
Software             28,746        Family               37,100
Maps                 26,912        Japan                36,513
Teaching             22,120        Concerts             35,374
Games                21,549        Surnames             34,947
How-to               19,533        Washington           33,924
Technology           18,032        Given Names          32,843
News                 17,737        Dogs                 32,206
Humor                15,816        Birthdays            22,290
Future Work • Given a set of possible senses, how can we choose the best match? • Folksonomy data can provide contextual information: – User tag-cloud – Cooccurrence Network – User Cooccurrence Network • Can abstract this information as a vector of terms and weights (context)
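One candidate way to choose among senses, given that both the tag context and each sense can be expressed as term-weight vectors, is cosine similarity. The slides leave the matching method open, so this is an assumption for illustration and not a reported result. The `cosine` and `best_sense` names and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse term -> weight vectors."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_sense(context, senses):
    """Pick the sense whose term vector is closest to the context vector."""
    return max(senses, key=lambda name: cosine(context, senses[name]))


# Toy sense vectors (made-up weights) and a context built from co-occurring tags.
SENSES = {
    "Apple_Inc.": {"mac": 35, "computer": 20},
    "Apple": {"fruit": 41, "tree": 12},
}
chosen = best_sense({"mac": 3, "osx": 2}, SENSES)
```

The context vector could come from any of the sources listed above (user tag cloud, co-occurrence network); which source works best is exactly the open question the slide raises.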
Disambiguating Flickr Images
Building Better Profiles • What tags correspond to interests? – Locations and topics are useful, but other terms are not • TF / IDF Approach – It’s not that useful to find out we are all interested in HTML • Making use of the Category hierarchy – If I’m interested in Facebook, Flickr, Last.fm, Delicious, etc, I can extrapolate the interest Online_Social_Networks
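The TF/IDF idea above can be illustrated with a small sketch: a tag that every user applies (HTML in the slide's example) gets zero weight, while tags specific to one user stand out. The weighting used here (relative frequency times log of inverse user frequency) is one common variant chosen for illustration; the `tf_idf` function and the toy profiles are hypothetical.

```python
import math
from collections import Counter


def tf_idf(profiles):
    """Weight each user's tags by how distinctive they are across users.

    profiles maps user -> {tag: usage count}.
    """
    n_users = len(profiles)
    user_frequency = Counter()
    for tags in profiles.values():
        for tag in tags:
            user_frequency[tag] += 1  # number of users who used this tag

    scores = {}
    for user, tags in profiles.items():
        total = sum(tags.values())
        scores[user] = {
            tag: (count / total) * math.log(n_users / user_frequency[tag])
            for tag, count in tags.items()
        }
    return scores


# Toy data: everyone uses "html", so it carries no information about interests.
PROFILES = {
    "alice": {"html": 10, "cuba": 5},
    "bob": {"html": 8, "metallica": 4},
    "carol": {"html": 12, "flickr": 3},
}
SCORES = tf_idf(PROFILES)
```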
[Diagram: DBpedia sense information for the tag "apple"]
dbpedia:hasDbpediaSenseInfo http://tagora.ecs.soton.ac.uk/tag/apple/sense-info/0
  dbpedia:senseWeight 0.30628910807
  owl:sameAs http://tagora.ecs.soton.ac.uk/dbpedia/resource/Apple_Inc.
  dbpedia:hasTermFrequencyPair _:b9510f0000a5
    dbpedia:hasTerm “mac”
    dbpedia:hasTermFrequency 35
dbpedia:hasDbpediaSenseInfo http://tagora.ecs.soton.ac.uk/tag/apple/sense-info/1
  dbpedia:senseWeight 0.248912928
  owl:sameAs http://tagora.ecs.soton.ac.uk/dbpedia/resource/Apple
  dbpedia:hasTermFrequencyPair _:b9510f0000a5
    dbpedia:hasTerm “fruit”
    dbpedia:hasTermFrequency 41