- Number of slides: 23
Discovery, Analysis and Monitoring of Hidden Social Networks and their Evolution
Malik Magdon-Ismail, Rensselaer Polytechnic Institute
Our Group: M. Goldberg, M-I, B. Szymanski, A. Wallace
Students: Mykola Hayvanovich, Apirak Hoonlor, Stephen Kelley, Konstantin Mertsalov
Motivation: Communications supporting IED planning have patterns and are correlated. Analysis of the patterns can reveal the groups as well as their internal group structure.
Communications
Time: January 12, 2005, 09:35
From: joe@xyz.com
To: sue@abc.com
Subject: Hello
Message: Where have you been?
16:06:31]
Streaming Example
Time   From     To               Message
10:00  Alice    Charlie          Golf tomorrow? Tell everyone.
10:05  Charlie  Felix            Alice mentioned golf tomorrow.
10:06  Alice    Bob              Hey, golf tomorrow. Spread the word.
10:12  Alice    Bob              Tee off: 8 am at Pinehurst.
10:13  Felix    Grace, Harry     Hey guys, golf tomorrow.
10:15  Alice    Charlie          Pinehurst Tee time: 8 am.
10:20  Bob      Elizabeth, Dave  We’re playing golf tomorrow.
10:22  Charlie  Felix            Tee time 8 am at Pinehurst
10:25  Bob      Elizabeth, Dave  We tee off 8 am at Pinehurst.
10:31  Felix    Grace, Harry     Tee time 8 am, Pinehurst.
Streaming Example
Time   From     To
10:00  Alice    Charlie
10:05  Charlie  Felix
10:06  Alice    Bob
10:12  Alice    Bob
10:13  Felix    Grace, Harry
10:15  Alice    Charlie
10:20  Bob      Elizabeth, Dave
10:22  Charlie  Felix
10:25  Bob      Elizabeth, Dave
10:31  Felix    Grace, Harry
Overview: SIGHTS & RDM
[Figure, RDM pattern hierarchy: Level 0 “buy, trade ... buy” (Pattern id = 2, Pattern = “buy, ”); Level 1 “2 trade ... 2 trade” (Pattern id = 3, Pattern = “2 trade”); Level 2 “3, sell ... 3, hell”]
[Figure, group hierarchy: Higher ranked leaders; Group leader; Subgroup leaders; Members]
Communications
- Email, Telephone, Newsgroup, Weblog, Chatrooms, …
Time: January 12, 2005, 09:35
From: joe@xyz.com
To: sue@abc.com
Subject: Hello
Message: Where have you been lately?
Communication Graph
[Figure: nodes joe@xyz.com and sue@abc.com joined by an edge labeled January 12, 2005, 09:35]
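As a rough illustration of the slide above, a communication log can be reduced to a graph in which each sender-recipient pair carries the times at which they communicated, with message content ignored. The record layout and the function name `build_communication_graph` are assumptions made for this sketch; the slides do not describe how SIGHTS stores its graph.

```python
from collections import defaultdict
from datetime import datetime


def build_communication_graph(records):
    """Map each (sender, recipient) pair to the sorted times they communicated.

    `records` is an iterable of (timestamp, sender, recipients) tuples.
    This layout is hypothetical and chosen only for illustration.
    """
    graph = defaultdict(list)
    for timestamp, sender, recipients in records:
        # One message may have several recipients; each gets its own edge.
        for recipient in recipients:
            graph[(sender, recipient)].append(timestamp)
    for times in graph.values():
        times.sort()
    return dict(graph)


# The single message shown on the slide.
records = [
    (datetime(2005, 1, 12, 9, 35), "joe@xyz.com", ["sue@abc.com"]),
]
graph = build_communication_graph(records)
for (sender, recipient), times in graph.items():
    print(sender, "->", recipient, [t.isoformat() for t in times])
```

Keeping the timestamps on the edges, rather than only a count, leaves room for the time-window analyses mentioned later in the talk.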
Communication Graph: What are the social groups/coalitions?
Social Groups are Clusters
- Clusters may overlap.
- A cluster is a locally defined object.
- Group members are more introverted than extroverted. [Figure labels: YES, NO]
- Social groups (clusters) persist.
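One simple way to read "more introverted than extroverted" is that a group has more communication edges inside it than edges crossing its boundary. The sketch below checks exactly that and nothing more; the actual density criterion used by SIGHTS is not given in the slides, so the test `internal > external` and the function names are assumptions.

```python
def edge_counts(edges, group):
    """Return (internal, external) edge counts for a candidate group.

    An edge is internal if both endpoints are members, and external if
    exactly one endpoint is a member.
    """
    members = set(group)
    internal = external = 0
    for u, v in edges:
        u_in, v_in = u in members, v in members
        if u_in and v_in:
            internal += 1
        elif u_in or v_in:
            external += 1
    return internal, external


def is_introverted(edges, group):
    """Assumed criterion: strictly more internal than external edges."""
    internal, external = edge_counts(edges, group)
    return internal > external


edges = [
    ("Alice", "Bob"),
    ("Alice", "Charlie"),
    ("Bob", "Charlie"),
    ("Charlie", "Felix"),
]
print(is_introverted(edges, {"Alice", "Bob", "Charlie"}))  # True: 3 internal, 1 external
print(is_introverted(edges, {"Charlie", "Felix"}))         # False: 1 internal, 2 external
```

Because the check looks only at the edges touching the group, it is a locally defined property, and nothing prevents two overlapping groups from both passing it.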
SIGHTS: Statistical Identification of Groups Hidden in Time and Space
- System for statistical analysis of social coalitions in communication networks
Data Sources: Blogs; Emails (Enron); Chatroom; Synthetic data
Coalition Discovery: Overlapping Clustering; Streaming groups; Persistent groups
Coalition Analysis: Leaders; Opposing groups; Topic matching
Visualizations: Size-Density plots; Static coalitions; Dynamic coalitions
[Screenshot callouts: Different analyses on dataset; Group members; Visualization options; Size vs. Density Plot; Leader index; Choose time window; Groups matching analyst topic in red]
Examples
- ENRON
- Citeseer: Two clusters: Electric circuit design; Optimization of Neural Networks. Intersection: “Sensitivity analysis in degenerate quadratic programming”
- Ali Baba Data Set (DoD)
GROUND TRUTH
- Group A: Dog, Vulture, Camel, Yassir Hussein, Bird, (6 others)
- Group B: Ahmet, Saleh Sarwuk, Shaid, Pavlammed Pavlah, Osan Domenik
SIGHTS
- Group A: Dog, Vulture, Camel, Gopher
- Group B: Ahmet, Saleh Sarwuk, Shaid, Dajik
Recursive Data Mining (RDM)
Build a classifier to identify the relationship between sender and receiver of a message.
EXAMPLE:
“Do you have time to meet some time this week?”
“Lets meet 2 pm today, ok?”
Which is advisor, which is student?
Pattern Definition
Hierarchical Pattern Construction (recursive definition)
Captures patterns; patterns of patterns… (can even capture long-range patterns)
[Figure, larger patterns at higher levels:
Level 0: “buy, trade ... buy” (Pattern id = 2, Pattern = “buy, ”)
Level 1: “2 trade ... 2 trade” (Pattern id = 3, Pattern = “2 trade”)
Level 2: “3, sell ... 3, hell” (Pattern id = 4, Pattern = “3, _ell”)]
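The recursive idea on this slide (find a pattern, replace it with an id, then look for patterns in the rewritten sequence) can be sketched as follows. This sketch picks the most frequent adjacent token pair at each level, much like byte-pair encoding; that selection rule is a stand-in chosen for brevity and is not claimed to be how RDM selects patterns. The ids `P1`, `P2`, … and all function names are hypothetical.

```python
from collections import Counter


def most_frequent_pair(tokens):
    """Return the most frequent adjacent pair, or None if no pair repeats.

    Ties go to the pair that occurs first, because Counter keeps insertion
    order and max() returns the first maximal item.
    """
    pairs = Counter(zip(tokens, tokens[1:]))
    if not pairs:
        return None
    pair, count = max(pairs.items(), key=lambda item: item[1])
    return pair if count > 1 else None


def replace_pair(tokens, pair, pattern_id):
    """Rewrite the sequence, replacing each occurrence of `pair` with its id."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(pattern_id)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def build_hierarchy(tokens, max_levels=3):
    """Return (levels, patterns): the sequence at each level and the id table."""
    levels = [list(tokens)]
    patterns = {}
    for level in range(1, max_levels + 1):
        pair = most_frequent_pair(levels[-1])
        if pair is None:
            break
        pattern_id = f"P{level}"  # hypothetical id scheme
        patterns[pattern_id] = pair
        levels.append(replace_pair(levels[-1], pair, pattern_id))
    return levels, patterns


tokens = "buy , trade , sell . buy , trade , hell".split()
levels, patterns = build_hierarchy(tokens)
for depth, sequence in enumerate(levels):
    print(depth, " ".join(sequence))
# The last level printed is: P3 sell . P3 hell
```

On this toy input the final level reduces both phrases to `P3` followed by `sell` or `hell`, which is the point at which a wildcard pattern such as “3, _ell” from the slide would apply.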
A Classifier: Joining the Pieces
- Ensemble of classifiers
- Classifier for each level in the hierarchical approach
- Features gathered from the training messages
- Global features include average length and number of sentences
- Approximate matching allows treatment of noise
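Two of the points above can be illustrated in a few lines: approximate matching, shown here as a single-character wildcard in the spirit of the “_ell” pattern, and combining one prediction per level into a single label. Both are assumptions for illustration; the slides do not say how RDM implements approximate matching, and majority voting is just one common way to combine an ensemble.

```python
from collections import Counter


def matches(pattern, text, wildcard="_"):
    """True if `text` fits `pattern`; each wildcard matches any one character."""
    return len(pattern) == len(text) and all(
        p == wildcard or p == c for p, c in zip(pattern, text)
    )


def majority_vote(predictions):
    """Combine per-level predictions by taking the most common label."""
    if not predictions:
        raise ValueError("need at least one prediction")
    return Counter(predictions).most_common(1)[0][0]


# The wildcard lets one pattern cover noisy variants of a word.
print(matches("_ell", "sell"), matches("_ell", "hell"), matches("_ell", "sold"))

# Hypothetical outputs of three per-level classifiers for one message.
print(majority_vote(["student", "advisor", "student"]))
```

A weighted vote, or a second-stage classifier trained on the per-level outputs, would be natural alternatives if some levels prove more reliable than others.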
Results on Enron
- Binary classification: for a given message m, is m sent by a person with role r? r ∈ {CEO, Manager, Trader, Vice-President}
- Multi-class classification: for a given message m, which role r is the most likely for the sender? r ∈ {CEO, Manager, Trader, Vice-President}
The bars show the classification error. RDM_SVM outperforms the other classifiers in every case shown.
Summing Up
- SIGHTS:
  - Structural; non-semantic; language independent
  - Finds groups, their dynamics and structure; visual analytic capabilities
- RDM:
  - Uses statistical semantics; language independent
  - Identifies roles within the group
Thank You
http://www.cs.rpi.edu/~magdon


