- Number of slides: 16
Ranking Tweets Considering Trust and Relevance
Srijith Ravikumar, Raju Balakrishnan, and Subbarao Kambhampati
Arizona State University
Twitter
• One of the most prominent micro-blogging services.
• Twitter has over 140 million active users, generates over 340 million tweets daily, and handles over 1.6 billion search queries per day.
• Users access tweets by following other users and by using the search function.
Twitter Search Results for the Query: “Britney Spears”
• Sorted in reverse chronological order.
• Selects the single most-retweeted tweet as the top tweet.
• Does not apply any relevance metrics.
• Contains spam and untrustworthy tweets.
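TweetRank
[Diagram labels: Query, Top K Results, Query, TweetRank, Top N Results]
• TweetRank acts as a mediator between the user and Twitter: it passes the user's query to Twitter, receives the top K results, and returns the top N results to the user.
• K is much larger than N, which lets TweetRank eliminate untrustworthy results.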
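The mediator flow on this slide can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: `mediate`, `fetch_top_k`, and `score_all` are hypothetical names, and the defaults K = 200 and N = 10 are placeholder values chosen only to illustrate K being much larger than N.

```python
from typing import Callable, List, Sequence


def mediate(
    query: str,
    fetch_top_k: Callable[[str, int], List[str]],        # hypothetical: wraps the Twitter search call
    score_all: Callable[[Sequence[str]], List[float]],   # hypothetical: one score per candidate tweet
    k: int = 200,   # assumed placeholder value
    n: int = 10,    # assumed placeholder value
) -> List[str]:
    """Fetch K candidate tweets, score them jointly, and return the best N."""
    if n > k:
        raise ValueError("n must not exceed k")
    candidates = fetch_top_k(query, k)
    scores = score_all(candidates)
    # Sort candidate indices by descending score, then keep the first N.
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return [candidates[i] for i in order[:n]]
```

The scoring function receives all candidates at once because agreement, as defined on the later slides, depends on the other tweets in the result set.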
Need for Relevance and Trust
• The spread of false facts on Twitter has become an everyday event.
• Retweets and users can be bought.
• Relying on them as a signal of trustworthiness therefore does not work.
Getting Relevant & Trustworthy Results
• Manual curation is out of the question (unless you are the Government of China :-) ).
  - How many people would it take to clean up a micro-blog with 140 million active users?
• Automated analysis?
  - PageRank uses the explicit links between web pages to evaluate trust and relevance. But what are the links between tweets?
Links in Twitter Space
• Retweet: explicit links between tweets.
• Agreement: implicit links between tweets that contain the same fact.
Agreement
• Agreement between two tweets is defined as the amount of similarity in their content. Retweets are not counted towards agreement because they are unverified endorsements.
• How does agreement capture relevance and trust?
  - A tweet that many other tweets agree with is likely to be popular, and popular tweets are more likely to be relevant.
  - Since agreement excludes retweets, the most agreed-upon tweet has the largest number of independent users stating the same fact, and is therefore more trustworthy.
Agreement Computation
• Computing agreement properly requires understanding the meaning of each tweet, which calls for natural language processing.
• As a preliminary idea, we compute agreement using Soft TF-IDF with Jaro-Winkler similarity.
• Soft TF-IDF is similar to TF-IDF, except that when comparing two document vectors it also counts tokens that are similar, in addition to tokens that match exactly.
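The idea of Soft TF-IDF with Jaro-Winkler similarity can be illustrated with a small, self-contained Python sketch. It follows one common formulation of Soft TF-IDF and is not necessarily the variant used in this work. The assumptions are: tokens are pre-split lowercase words, the IDF is smoothed as log(1 + N/df), the closeness threshold `theta` is 0.9, and all function names are hypothetical. The Jaro-Winkler prefix scale of 0.1 and the prefix cap of 4 characters are the standard values.

```python
import math
from collections import Counter


def jaro(s, t):
    """Jaro similarity in [0, 1]."""
    if s == t:
        return 1.0
    if not s or not t:
        return 0.0
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_hit, t_hit = [False] * len(s), [False] * len(t)
    for i, ch in enumerate(s):
        # Look for an unused matching character within the sliding window.
        for j in range(max(0, i - window), min(len(t), i + window + 1)):
            if not t_hit[j] and t[j] == ch:
                s_hit[i] = t_hit[j] = True
                break
    m = sum(s_hit)
    if m == 0:
        return 0.0
    s_seq = [c for c, hit in zip(s, s_hit) if hit]
    t_seq = [c for c, hit in zip(t, t_hit) if hit]
    transpositions = sum(a != b for a, b in zip(s_seq, t_seq)) // 2
    return (m / len(s) + m / len(t) + (m - transpositions) / m) / 3


def jaro_winkler(s, t, prefix_scale=0.1):
    """Jaro similarity boosted by the length of the common prefix (max 4)."""
    base = jaro(s, t)
    prefix = 0
    for a, b in zip(s[:4], t[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)


def build_idf(corpus):
    """Smoothed IDF per token (assumption: log(1 + N / df))."""
    df = Counter(tok for doc in corpus for tok in set(doc))
    return {tok: math.log(1 + len(corpus) / d) for tok, d in df.items()}


def tfidf_vector(tokens, idf):
    """L2-normalised weights log(tf + 1) * idf for one tokenised tweet."""
    raw = {w: math.log(c + 1) * idf.get(w, 0.0) for w, c in Counter(tokens).items()}
    norm = math.sqrt(sum(v * v for v in raw.values()))
    return {w: v / norm for w, v in raw.items()} if norm else {}


def soft_tfidf(a, b, idf, theta=0.9):
    """Sum weight products over tokens of `a` whose closest token in `b` is similar enough."""
    va, vb = tfidf_vector(a, idf), tfidf_vector(b, idf)
    score = 0.0
    for w, weight in va.items():
        best, best_sim = max(
            ((v, jaro_winkler(w, v)) for v in vb),
            key=lambda pair: pair[1],
            default=(None, 0.0),
        )
        if best is not None and best_sim >= theta:
            score += weight * vb[best] * best_sim
    return score
```

With this sketch, the tokens "britney" and "britny" still contribute to the score because their Jaro-Winkler similarity exceeds the threshold, whereas plain TF-IDF would treat them as unrelated. Note that this formulation is not symmetric in general: `soft_tfidf(a, b, idf)` may differ from `soft_tfidf(b, a, idf)`.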
Computing Ranked Results
• A simple voting technique is used to compute the ranked results.
• The agreement score of a tweet is the sum of its agreement with all other tweets.
• The tweets are sorted by their agreement votes, and the top N results are sent to the user.
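The voting step described on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: `rank_by_agreement` is a hypothetical name, and the Jaccard token overlap used here is only a stand-in for the Soft TF-IDF agreement measure so that the example stays self-contained.

```python
from typing import Callable, List, Tuple


def jaccard(a: str, b: str) -> float:
    """Stand-in agreement measure (assumption): token-set overlap of two tweets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def rank_by_agreement(
    tweets: List[str],
    agreement: Callable[[str, str], float] = jaccard,
    n: int = 10,
) -> List[Tuple[str, float]]:
    """Score each tweet by summed agreement with all other tweets; return the top n."""
    votes = [
        sum(agreement(t, other) for j, other in enumerate(tweets) if j != i)
        for i, t in enumerate(tweets)
    ]
    # Sort indices by descending vote total; ties keep their original order.
    order = sorted(range(len(tweets)), key=lambda i: votes[i], reverse=True)
    return [(tweets[i], votes[i]) for i in order[:n]]
```

In this sketch a spam tweet that shares no tokens with the rest of the result set collects no votes and falls to the bottom of the ranking.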
Results: Britney Spears
Twitter Results
• (Oops?!) Britney Spears is Engaged... Again! - its britney: http://t.co/1E9LsaH7
• RT @GMA: Britney Spears Engaged Again http://t.co/5Ly0lga4
• Britney Spears engaged: http://t.co/gpQQ2S6I"
TweetRank Results
• In entertainment: Britney Spears engaged to marry her longtime boyfriend and former agent Jason Trawick.
• #Britney #Spears #engaged to #boyfriend: #report: LOS ANGELES (Reuters) - Pop star Britney... http://t.co/PiVU
• Congratulations to Britney Spears and her beau Jason Trawick for getting engaged via a 3.5 carat ring! We are certainly happy for her!
Evaluation - Relevance
• The top N results were manually labelled as follows:
  0    Not related to the topic, or spam
  1/3  Remotely relevant to the topic
  2/3  Tweets that have some information on the topic
  1    Tweets that have a good amount of information
Evaluation - Trust
• The top N results were manually labelled as follows:
  -1   Untrustworthy tweets, such as spam or wrong facts
  0    Tweets that are opinions
  1    Tweets that contain correct facts
Ranking Cost
• The ranking time increases quadratically with the number of tweets.
• Since agreement is computed pairwise, it can be easily parallelized using MapReduce.
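The quadratic cost follows from the number of tweet pairs, n(n - 1)/2 for n tweets when agreement is symmetric. The sketch below shows how the pairwise work could be split into a map phase and a reduce phase. It is an illustration in plain Python rather than an actual MapReduce job; `pair_count`, `map_phase`, and `reduce_phase` are hypothetical names, and it assumes a symmetric agreement function so that each pair is scored once and credited to both tweets.

```python
from itertools import combinations
from typing import Callable, Iterable, Iterator, List, Tuple


def pair_count(n: int) -> int:
    """Number of unordered tweet pairs, which grows quadratically with n."""
    return n * (n - 1) // 2


def map_phase(
    tweets: List[str], agreement: Callable[[str, str], float]
) -> Iterator[Tuple[int, float]]:
    """Emit (tweet index, score) records; every pair is independent of the others."""
    for i, j in combinations(range(len(tweets)), 2):
        score = agreement(tweets[i], tweets[j])
        yield i, score
        yield j, score


def reduce_phase(records: Iterable[Tuple[int, float]], n_tweets: int) -> List[float]:
    """Sum the emitted scores per tweet index to obtain the agreement votes."""
    totals = [0.0] * n_tweets
    for index, score in records:
        totals[index] += score
    return totals
```

Because no pair depends on any other, the records produced by `map_phase` could be computed on separate workers and summed afterwards, which is what makes the pairwise computation a natural fit for MapReduce.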
Twitter Eco-System
[Diagram; edge labels: Tweeted URL, Tweeted By, Followers, Hyperlinks]
Summary
• Micro-blog spamming is becoming increasingly lucrative and problematic.
• We are working on a ranking that is sensitive to the trustworthiness and relevance of micro-blogs.
• We model the tweet space as a tri-layer graph containing a tweet layer, a user layer, and a web-page layer.
• The ranking is derived from users, tweets, and the prestige of the referenced web pages.
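One way to hold the tri-layer graph in memory is sketched below. The slides name the layers and the edge types (Tweeted By, Tweeted URL, Followers, Hyperlinks, and agreement between tweets) but do not specify a data structure or a ranking formula, so the class `TriLayerGraph`, its fields, and its methods are all hypothetical, and no ranking computation is attempted.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set, Tuple


@dataclass
class TriLayerGraph:
    """Hypothetical container for the tweet, user, and web-page layers."""

    tweeted_by: Dict[str, str] = field(default_factory=dict)             # tweet -> user
    tweeted_urls: Dict[str, Set[str]] = field(default_factory=dict)      # tweet -> web pages
    followers: Dict[str, Set[str]] = field(default_factory=dict)         # user -> followers
    hyperlinks: Dict[str, Set[str]] = field(default_factory=dict)        # page -> linked pages
    agreement: Dict[Tuple[str, str], float] = field(default_factory=dict)  # tweet pair -> score

    def add_tweet(self, tweet_id: str, user_id: str, urls: Iterable[str] = ()) -> None:
        url_set = set(urls)
        self.tweeted_by[tweet_id] = user_id
        self.tweeted_urls[tweet_id] = url_set
        self.followers.setdefault(user_id, set())
        for url in url_set:
            self.hyperlinks.setdefault(url, set())

    def add_follower(self, user_id: str, follower_id: str) -> None:
        self.followers.setdefault(user_id, set()).add(follower_id)
        self.followers.setdefault(follower_id, set())

    def add_hyperlink(self, source_url: str, target_url: str) -> None:
        self.hyperlinks.setdefault(source_url, set()).add(target_url)
        self.hyperlinks.setdefault(target_url, set())

    def set_agreement(self, tweet_a: str, tweet_b: str, score: float) -> None:
        # Store each unordered tweet pair under a single sorted key.
        self.agreement[tuple(sorted((tweet_a, tweet_b)))] = score
```

A ranking over this structure would combine evidence from all three layers, as the summary states; how the layers are weighted is not described on the slides.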


