Structural Link Analysis from User Profiles and Friends Networks: A Feature Construction Approach
William H. Hsu, Joseph Lancaster, Martin S. R. Paradesi, Tim Weninger
Monday, 26 March 2007
Laboratory for Knowledge Discovery in Databases, Kansas State University
http://www.kddresearch.org/KSU/CIS/ICWSM-20070326.ppt
First International Conference on Weblogs And Social Media (ICWSM-2007), Boulder, Colorado
Computing & Information Sciences, Kansas State University
Link Analysis in Social Networks: The K-State Corpus
Outline
- Background, Related Work and Rationale
- Technical Objective: Link Mining in Social Networks
- Methodology: Graph Feature Extraction
- Experimental Results: K-State LJMiner Corpus
- Continuing Work: Statistical Relational Models
Problem Statement: Link Mining in Social Networks
- Problem Definition
  - Given: records of users of a weblog or social network service
  - Discover
    - Features of entities: users, communities
    - Relationships: friendship, membership, moderatorship
    - Explanations and predictions for relationships
- Goals
  - Boost precision and recall of link existence prediction
  - Find relevant features
- Significance: Recommendations (Friendship, Membership)
Related Work: Link Mining
- Getoor and Diehl (2005): graphical model representations of link structure
- Ketkar et al. (2005): data mining techniques vs. graph-based representation
- Sarkar & Moore (2005): change in link structure across discrete time steps
- Popescul & Ungar (2003): ER model to predict links
- Hill (2003), Bhattacharya & Getoor (2004): statistical relational learning to resolve identity uncertainty
- Resig et al. (2004): predicting IM online times using friends graph degree
- McCallum et al. (2005): inferring roles and topic categories based on link analysis
Rationale
- Limitations of Current State of the Art
  - Do not take graph features into account
  - Limited ability to select, extract features
- Novel Contribution: Link Mining System
  - Extracts, computes features of network model
  - Towards dependent types for relational link mining
- Rationale
  - Desired functionality: infer new links from old
  - Evaluation: precision, recall for link existence
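The evaluation criterion named above, precision and recall for link existence, can be illustrated with a minimal sketch. It assumes links are directed (source, target) pairs; the function name `precision_recall` and the user names are hypothetical and do not come from the LJMiner system.

```python
def precision_recall(predicted, actual):
    """Precision and recall of predicted directed links against known links."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    # Precision: fraction of predicted links that really exist.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of existing links that were predicted.
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical toy data: direction matters, so ("u4", "u1") is not ("u1", "u4").
actual_links = {("u1", "u2"), ("u2", "u3"), ("u3", "u1"), ("u1", "u4")}
predicted_links = {("u1", "u2"), ("u2", "u3"), ("u4", "u1")}

precision, recall = precision_recall(predicted_links, actual_links)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Two of the three predicted links exist and two of the four existing links are recovered, so the two measures differ even on this small example.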
K-State Test Bed: LJMiner Corpus
- User Interest, Schools, Friends
- User Contact Info
- Community Membership Info
LiveJournal Topology [1]: Tools and Security Model
- © 2007 Denga, Inc.
- LJMindMap.com © 2004 mcfnord
LiveJournal Topology [2]: Definitions
Graph Features [1]: Node, Pair, Link-Dependent
- Node-Dependent Features: specific to one node (vertex) within candidate pair
  - Indegree(u): “source popularity”
  - Outdegree(u): “source fertility”
  - Indegree(v): “target popularity”
  - Outdegree(v): “target fertility”
- Pair-Dependent Features: specific to one candidate pair of nodes (vertices)
  - Common entities: interests, friends, schools, etc.
  - Attributes of common entities
  - Computed from relational query on entities u, v
- Link-Dependent Features: specific to one link (edge) in directed graph
  - Past, predicted duration
  - Diagnosed cause
  - Computed and stored with relationship set
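As a rough illustration of the node-dependent and pair-dependent features listed above, the sketch below computes them for one candidate pair (u, v). It assumes the friends network is given as a list of directed (source, target) edges and that interests are stored as one set per user; the function names, dictionary keys, and toy data are hypothetical, not the LJMiner implementation.

```python
from collections import defaultdict


def build_graph_tables(edges):
    """Index a directed edge list into degree counts and friend sets."""
    indegree = defaultdict(int)
    outdegree = defaultdict(int)
    friends = defaultdict(set)
    for source, target in edges:
        outdegree[source] += 1
        indegree[target] += 1
        friends[source].add(target)
    return indegree, outdegree, friends


def pair_features(u, v, edges, interests):
    """Node- and pair-dependent features for the candidate link u -> v."""
    indegree, outdegree, friends = build_graph_tables(edges)
    return {
        # Node-dependent features
        "indegree_u": indegree.get(u, 0),    # source popularity
        "outdegree_u": outdegree.get(u, 0),  # source fertility
        "indegree_v": indegree.get(v, 0),    # target popularity
        "outdegree_v": outdegree.get(v, 0),  # target fertility
        # Pair-dependent features: counts of common entities
        "common_friends": len(friends.get(u, set()) & friends.get(v, set())),
        "common_interests": len(interests.get(u, set()) & interests.get(v, set())),
    }


# Hypothetical toy network and interest lists.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "a"), ("d", "c")]
interests = {
    "a": {"jazz", "chess", "hiking"},
    "b": {"chess", "hiking", "film"},
}

features = pair_features("a", "b", edges, interests)
print(features)
```

Rebuilding the tables on every call keeps the sketch short; for many candidate pairs the tables would be built once and reused.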
Graph Features [2]: Node and Pair Features in LJMiner
- Graph Features
- Interest-Related Features
LJCrawler System Design
- Data acquisition: client, injector, parser
- Ancillary issues
  - Multi-threading
  - Distribution
  - Storage
- Analytical postprocessing: LJClipper, LJStats
- Distinguishing features of LJCrawler
- Results
  - 200 users/second maximum, 5 users/second allowed
  - Approximately 2 million pages crawled
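The gap between the 200 users/second the crawler can reach and the 5 users/second it is allowed suggests some form of request throttling. The slides do not say how LJCrawler enforces the limit, so the following is only a sketch of one common approach: a minimum interval between requests. The class name `Throttle` and its parameters are hypothetical.

```python
import time


class Throttle:
    """Blocks so that successive wait() calls are at least 1/max_per_second apart."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self.clock = clock    # injectable so the timing logic can be tested
        self.sleep = sleep
        self.next_allowed = None

    def wait(self):
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            # Too early: sleep until the next permitted request time.
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.min_interval


# Assumed limit taken from the slide: 5 users/second, i.e. one request every 0.2 s.
throttle = Throttle(max_per_second=5)
print(throttle.min_interval)
```

A crawler loop would call `throttle.wait()` immediately before each profile request, which caps the request rate regardless of how fast the client threads run.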
Network Statistics: Graph Distance
- 1000 nodes
- 4000 nodes
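The slide does not define graph distance, so the sketch below assumes the usual meaning: the length of the shortest directed path between two users in the friends graph, found by breadth-first search. The function name `graph_distance` and the toy adjacency lists are hypothetical.

```python
from collections import deque


def graph_distance(adjacency, source, target):
    """Shortest directed path length from source to target, or None if unreachable."""
    if source == target:
        return 0
    visited = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


# Hypothetical friends graph as adjacency lists (user -> users they list as friends).
adjacency = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": []}

forward = graph_distance(adjacency, "a", "e")
backward = graph_distance(adjacency, "e", "a")
print(forward, backward)
```

Because the graph is directed, the distance from a to e and the distance from e to a can differ, and one of them may not exist at all.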
Interpretation of Results
- 941-node graph (Hsu et al., 2006): LJCrawler v1 output
- 1000-4000 node graphs: LJCrawler v2 output
Results
- Establishing an Interdisciplinary Research Initiative
  - K-State / KU / UNL collaboration
  - Resources: Linguistic Data Consortium
  - NIST evaluations
- Involving End Users of Machine Translation
  - Document users
  - Machine learning, data mining, info extraction researchers
- Novel Applications
  - Social networks and collaborative recommendation
  - Gisting and beyond
Continuing Work
- Information Extraction and Intelligent IR
  - Learning models for IE: ontologies
  - Latent semantic analysis
- Machine Learning
  - Natural language learning
  - Time series learning and understanding
  - Relational and first-order models
- Automated Reasoning
  - Probabilistic
  - Case-based analogical
- Data Mining and Warehousing
- Grid Computing
Acknowledgements
- K-State Lab for Knowledge Discovery in Databases
  - Vikas Bahirwani
  - Tejaswi Pydimarri
  - Andrew King
- Social Networks, Graph Theory, Graph Algorithms
  - Kirsten Hildrum (IBM T. J. Watson Labs)
  - Todd Easton (K-State, Industrial and Manufacturing Systems Engineering)
- Machine Learning
  - Dan Roth, Cinda Heeren, Jiawei Han (University of Illinois at Urbana-Champaign)
  - AnHai Doan (University of Wisconsin-Madison)
Questions and Discussion


