
  • Number of slides: 105

CMU SCS
Graph Mining
Christos Faloutsos, CMU
iCAST, Jan. 09

Thank you!
• Prof. Hsing-Kuo Kenneth Pao
• Eric, Morgan, Ian, Teenet

Outline
• Problem definition / Motivation
• Static & dynamic laws; generators
• Tools: CenterPiece graphs; Tensors
• Other projects (Virus propagation, e-bay fraud detection)
• Conclusions

Motivation
Data mining: ~ find patterns (rules, outliers)
• Problem#1: What do real graphs look like?
• Problem#2: How do they evolve?
• Problem#3: How to generate realistic graphs
TOOLS
• Problem#4: Who is the ‘master-mind’?
• Problem#5: Track communities over time

Problem#1:
Joint work with Dr. Deepayan Chakrabarti (CMU/Yahoo R.L.)

Graphs – why should we care?
• Intrusion detection – who-contacts-whom
[Figure: normal traffic vs. abnormal traffic; axes: source, destination]

Graphs - why should we care?
[Figures: Internet Map [lumeta.com]; Food Web [Martinez ’91]; Protein Interactions [genomebiology.com]; Friendship Network [Moody ’01]]

Graphs - why should we care?
• IR: bi-partite graphs (doc-terms)
[Figure: documents D1 ... DN linked to terms T1 ... TM]
• web: hyper-text graph
• ... and more:

Graphs - why should we care?
• network of companies & board-of-directors members
• ‘viral’ marketing
• web-log (‘blog’) news propagation
• computer network security: email/IP traffic and anomaly detection
• ...

Problem #1 - network and graph mining
• What does the Internet look like?
• What does the web look like?
• What is ‘normal’/‘abnormal’?
• Which patterns/laws hold?

Graph mining
• Are real graphs random?

Laws and patterns
• Are real graphs random?
• A: NO!!
  – Diameter
  – in- and out-degree distributions
  – other (surprising) patterns

Solution#1
• Power law in the degree distribution [SIGCOMM 99]
[Figure: internet domains, log(degree) vs. log(rank), slope -0.82; labeled points: att.com, ibm.com]
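The rank-versus-degree slope quoted on this slide can be estimated with an ordinary least-squares fit in log-log space. The following is a minimal sketch only: the helper names and the synthetic degree list are illustrative assumptions, not the procedure or data of the SIGCOMM 99 paper.

```python
import math

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x); all values must be > 0."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    var = sum((a - mean_x) ** 2 for a in lx)
    return cov / var

def rank_exponent(degrees):
    """Sort degrees in decreasing order, then fit log(degree) vs. log(rank)."""
    ordered = sorted(degrees, reverse=True)
    ranks = range(1, len(ordered) + 1)
    return loglog_slope(ranks, ordered)

# Synthetic (made-up) degrees that follow degree = 1000 * rank^-0.82 exactly.
synthetic_degrees = [1000 * rank ** -0.82 for rank in range(1, 201)]
slope = rank_exponent(synthetic_degrees)
print(round(slope, 2))
```

On real data the points scatter around the line, so the fitted slope is only an estimate, and a straight-looking log-log plot alone does not prove a power law.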

Solution#1’: Eigen Exponent E
• A2: power law in the eigenvalues of the adjacency matrix
• [Mihail, Papadimitriou ’02]: slope is ½ of rank exponent
[Figure: eigenvalue vs. rank of decreasing eigenvalue, May 2001; Exponent = slope, E = -0.48]

But: How about graphs from other domains?

The Peer-to-Peer Topology [Jovanovic+]
• Count versus degree
• Number of adjacent peers follows a power-law

More power laws:
• citation counts (citeseer.nj.nec.com 6/2001)
[Figure: log(count) vs. log(#citations); labeled point: Ullman]

More power laws:
• web hit counts [w/ A. Montgomery]
[Figure: Web Site Traffic, log(count) vs. log(in-degree), users vs. sites; Zipf; labeled point: ``ebay’’]

epinions.com
• who-trusts-whom [Richardson + Domingos, KDD 2001]
[Figure: count vs. (out) degree; labeled point: trusts-2000-people user]

Motivation
Data mining: ~ find patterns (rules, outliers)
• Problem#1: What do real graphs look like?
• Problem#2: How do they evolve?
• Problem#3: How to generate realistic graphs
TOOLS
• Problem#4: Who is the ‘master-mind’?
• Problem#5: Track communities over time

Problem#2: Time evolution
• with Jure Leskovec (CMU/MLD)
• and Jon Kleinberg (Cornell – sabb. @ CMU)

Evolution of the Diameter
• Prior work on Power Law graphs hints at slowly growing diameter:
  – diameter ~ O(log N)
• What is happening in real data?
• Diameter shrinks over time
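To make the "diameter shrinks" claim concrete, here is a minimal sketch that measures the diameter of one graph snapshot by breadth-first search from every node. The slides do not say which diameter definition is plotted, so the plain longest-shortest-path definition used here is an assumption, and the tiny edge lists are made up; the sketch also assumes an undirected, connected graph.

```python
from collections import deque

def eccentricity(adj, source):
    """Largest hop distance from `source` to any node reachable from it (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())

def diameter(edges):
    """Longest shortest path of an undirected, connected graph given as edge pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return max(eccentricity(adj, node) for node in adj)

# Toy snapshots: a 6-node path, then the same nodes with two extra edges.
early_snapshot = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
later_snapshot = early_snapshot + [(0, 3), (2, 5)]
d_early = diameter(early_snapshot)
d_later = diameter(later_snapshot)
print(d_early, d_later)  # 5 3
```

All-pairs BFS costs O(N * E), which is fine for a toy example but would need sampling or approximation on graphs of the sizes quoted in the talk.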

Diameter – arXiv citation graph
• Citations among physics papers
• 1992 – 2003
• One graph per year
[Figure: diameter vs. time [years]]

Diameter – “Autonomous Systems”
• Graph of Internet
• One graph per day
• 1997 – 2000
[Figure: diameter vs. number of nodes]

Diameter – “Affiliation Network”
• Graph of collaborations in physics
  – authors linked to papers
• 10 years of data
[Figure: diameter vs. time [years]]

Diameter – “Patents”
• Patent citation network
• 25 years of data
[Figure: diameter vs. time [years]]

Temporal Evolution of the Graphs
• N(t) … nodes at time t
• E(t) … edges at time t
• Suppose that N(t+1) = 2 * N(t)
• Q: what is your guess for E(t+1)? Is it 2 * E(t)?
• A: over-doubled!
  – But obeying the ``Densification Power Law’’

Densification – Physics Citations
• Citations among physics papers
• 2003:
  – 29,555 papers, 352,807 citations
[Figure: E(t) vs. N(t), slope 1.69; reference slopes: 1 for a tree, 2 for a clique]
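If the densification power law is read as E(t) proportional to N(t)^a, where a is the slope on the E(t)-versus-N(t) plot, then doubling the nodes multiplies the edges by 2^a. The sketch below only does that arithmetic for the slopes quoted in the talk plus the tree and clique reference slopes; the function name is a hypothetical helper.

```python
def edge_growth_factor(node_growth, exponent):
    """Factor by which E grows when N grows by `node_growth`, assuming E ~ N**exponent."""
    return node_growth ** exponent

# Slopes from the slides: 1 (tree), 1.15, 1.18, 1.66, 1.69, 2 (clique).
slopes = (1.0, 1.15, 1.18, 1.66, 1.69, 2.0)
factors = {a: edge_growth_factor(2.0, a) for a in slopes}
for a in slopes:
    print(f"slope {a}: doubling nodes multiplies edges by {factors[a]:.2f}")
```

For the physics-citation slope of 1.69 the factor is about 3.23, which is the sense in which the edge count "over-doubles".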

Densification – Patent Citations
• Citations among patents granted
• 1999
  – 2.9 million nodes
  – 16.5 million edges
• Each year is a datapoint
[Figure: E(t) vs. N(t), slope 1.66]

Densification – Autonomous Systems
• Graph of Internet
• 2000
  – 6,000 nodes
  – 26,000 edges
• One graph per day
[Figure: E(t) vs. N(t), slope 1.18]

Densification – Affiliation Network
• Authors linked to their publications
• 2002
  – 60,000 nodes
    • 20,000 authors
    • 38,000 papers
  – 133,000 edges
[Figure: E(t) vs. N(t), slope 1.15]

Motivation
Data mining: ~ find patterns (rules, outliers)
• Problem#1: What do real graphs look like?
• Problem#2: How do they evolve?
• Problem#3: How to generate realistic graphs
TOOLS
• Problem#4: Who is the ‘master-mind’?
• Problem#5: Track communities over time

Problem#4: MasterMind – ‘CePS’
• w/ Hanghang Tong, KDD 2006
• htong cs.cmu.edu

Center-Piece Subgraph (CePS)
• Given Q query nodes
• Find Center-piece ( )
• App.
  – Social Networks
  – Law enforcement, …
• Idea:
  – Proximity -> random walk with restarts
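The proximity idea named on this slide, random walk with restarts, can be sketched as a simple power iteration: at each step the walker follows a random edge with probability 1 - c and jumps back to the query node with probability c. This is an illustrative sketch, not the CePS implementation; the restart probability 0.15, the iteration count, and the toy path graph are all assumptions.

```python
def random_walk_with_restart(adj, query, restart=0.15, iterations=200):
    """Approximate steady-state visiting probabilities of a walker that
    restarts at `query` with probability `restart` at every step.
    `adj` maps each node to a non-empty list of neighbours."""
    scores = {node: 0.0 for node in adj}
    scores[query] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in adj}
        for u, neighbours in adj.items():
            # Spread the non-restarting mass evenly over u's neighbours.
            share = (1.0 - restart) * scores[u] / len(neighbours)
            for v in neighbours:
                nxt[v] += share
        nxt[query] += restart  # mass that jumps back to the query node
        scores = nxt
    return scores

# Toy undirected path a - b - c - d, queried from node "a".
toy_graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
proximity = random_walk_with_restart(toy_graph, "a")
print({node: round(score, 3) for node, score in proximity.items()})
```

Nodes farther from the query end up with lower scores, which is what makes the steady-state probability usable as a proximity measure. How the scores of several query nodes are combined for an AND query is not shown here.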

Case Study: AND query
[Figure: query nodes R. Agrawal, Jiawei Han, V. Vapnik, M. Jordan]

Conclusions
• Q1: How to measure the importance?
• A1: RWR+K_SoftAnd
• Q2: How to do it efficiently?
• A2: Graph Partition (Fast CePS)
  – ~90% quality
  – 150x speedup (ICDM’06, b.p. award)

Outline
• Problem definition / Motivation
• Static & dynamic laws; generators
• Tools: CenterPiece graphs; Tensors
• Other projects (Virus propagation, e-bay fraud detection)
• Conclusions

Motivation
Data mining: ~ find patterns (rules, outliers)
• Problem#1: What do real graphs look like?
• Problem#2: How do they evolve?
• Problem#3: How to generate realistic graphs
TOOLS
• Problem#4: Who is the ‘master-mind’?
• Problem#5: Track communities over time

Tensors for time evolving graphs
• [Jimeng Sun+ KDD’06]
• [ “ , SDM’07]
• [CF, Kolda, Sun, SDM’07 tutorial]

Social network analysis
• Static: find community structures
• Dynamic: monitor community structure evolution; spot abnormal individuals; abnormal time-stamps
[Figure: Authors vs. Keywords matrices for DB, one per year (1990, 1991, 1992)]

Application 1: Multiway latent semantic indexing (LSI)
• Projection matrices specify the clusters
• Core tensors give cluster activation level
[Figure: authors vs. keyword tensor over 1990 to 2004, with projection matrices U_authors and U_keyword; clusters DB and DM; labels: Philip Yu, Michael Stonebraker, Pattern, Query]

Bibliographic data (DBLP)
• Papers from VLDB and KDD conferences
• Construct 2nd order tensors with yearly windows
• Each tensor: 4584 × 3741
• 11 timestamps (years)

Multiway LSI
• DB, 1995
  – Authors: michael carey, michael stonebraker, h. jagadish, hector garcia-molina
  – Keywords: queri, parallel, optimization, concurr, objectorient
• DB, 2004
  – Authors: surajit chaudhuri, mitch cherniack, michael stonebraker, ugur cetintemel
  – Keywords: distribut, systems, view, storage, servic, process, cache
• DM, 2004
  – Authors: jiawei han, jian pei, philip s. yu, jianyong wang, charu c. aggarwal
  – Keywords: streams, pattern, support, cluster, index, gener, queri
• Two groups are correctly identified: Databases and Data mining
• People and concepts are drifting over time

Network forensics
• Directional network flows
• A large ISP with 100 POPs, each POP 10 Gbps link capacity [Hotnets 2004]
  – 450 GB/hour with compression
• Task: Identify abnormal traffic pattern and find out the cause
(with Prof. Hui Zhang and Dr. Yinglian Xie)
[Figure: normal traffic vs. abnormal traffic; axes: source, destination]

MDL mining on time-evolving graph (Enron emails)
GraphScope [w. Jimeng Sun, Spiros Papadimitriou and Philip Yu, KDD’07]

Conclusions
Tensor-based methods (WTA/DTA/STA):
• spot patterns and anomalies on time evolving graphs, and
• on streams (monitoring)

Motivation
Data mining: ~ find patterns (rules, outliers)
• Problem#1: What do real graphs look like?
• Problem#2: How do they evolve?
• Problem#3: How to generate realistic graphs
TOOLS
• Problem#4: Who is the ‘master-mind’?
• Problem#5: Track communities over time

Outline
• Problem definition / Motivation
• Static & dynamic laws; generators
• Tools: CenterPiece graphs; Tensors
• Other projects (e-bay fraud detection, blogs, weighted graphs)
• Conclusions

E-bay Fraud detection
w/ Polo Chau & Shashank Pandit, CMU

E-bay Fraud detection
• lines: positive feedbacks
• would you buy from him/her?
• or him/her?

E-bay Fraud detection - NetProbe

Outline
• Problem definition / Motivation
• Static & dynamic laws; generators
• Tools: CenterPiece graphs; Tensors
• Other projects (e-bay fraud detection, blogs, weighted graphs)
• Conclusions

Blog analysis
• with Mary McGlohon (CMU)
• Jure Leskovec (CMU)
• Natalie Glance (now at Google)
• Mat Hurst (now at MSR)
[SDM’07]

Cascades on the Blogosphere
[Figure: three views. Blogosphere: blogs B1 to B4 + posts. Blog network: links among blogs. Post network: links among posts a to e.]
Q1: popularity-decay of a post?
Q2: degree distributions?

Q1: popularity over time
• Post popularity drops-off – exponentially?
• POWER LAW!
• Exponent? -1.6 (close to -1.5: Barabasi’s stack model)
[Figure: # in links vs. days after post, log-log axes, slope -1.6]

Q2: degree distribution
• 44,356 nodes, 122,153 edges. Half of blogs belong to largest connected component.
• in-degree slope: -1.7
• out-degree: -3
• ‘rich get richer’
[Figure: count vs. blog in-degree]

Outline
• Problem definition / Motivation
• Static & dynamic laws; generators
• Tools: CenterPiece graphs; Tensors
• Other projects (e-bay fraud detection, blogs, weighted graphs)
• [work in progress: iCAST data analysis]
• Conclusions

Joint work with
• Leman Akoglu www.andrew.cmu.edu/~lakoglu
• Mary McGlohon www.cs.cmu.edu/~mmcgloho
Thanks to Eric, Morgan, Ian, Teenet, for providing a copy of the dataset

Summary of findings
• Who-contacts-whom graph follows old (and new, surprising) patterns
• Web servers stand out, though
• We are packaging all these tools for open-source release:
  – ADAGE www.cs.cmu.edu/~mmcgloho/pubs/ADAGE.tar.gz
  – OddBall (under development)

Shrinking diameter
No surprise: shrinking diameter, as `expected’

Size of GCC, NLCC
• GCC size grows over time, of course.
• What is your guess about the size of the rest, say the 2nd CC?
  – shrinks?
  – grows?
  – stays the same?
• A: OSCILLATES!
• No ‘surprise’ either: typical behavior of graphs [McGlohon+, KDD’08]
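One way to watch the largest and second-largest connected components as edges arrive is a union-find pass per snapshot. The sketch below is illustrative only: the function name and the six-edge toy stream are assumptions chosen so that the second component grows and then gets absorbed, and recomputing from scratch per snapshot is done for clarity rather than speed.

```python
from collections import Counter

def component_sizes(num_nodes, edges):
    """Sizes of all connected components, largest first (union-find)."""
    parent = list(range(num_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
    counts = Counter(find(node) for node in range(num_nodes))
    return sorted(counts.values(), reverse=True)

# Toy edge stream over 8 nodes, in arrival order.
stream = [(0, 1), (1, 2), (3, 4), (4, 5), (2, 3), (6, 7)]
history = [component_sizes(8, stream[:k])[:2] for k in range(1, len(stream) + 1)]
print(history)  # [[2, 1], [3, 1], [3, 2], [3, 3], [6, 1], [6, 2]]
```

In this toy run the largest component never shrinks, while the second-largest climbs to 3, drops to 1 when it merges into the largest, and starts growing again.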

Packets over time
# Packets over time: bursty behavior, with daily periodicity

Densification law
1. Densification law: obeyed (with slope ~1: tree like (!))
2. ‘hours’ plot has a sudden gap @ 20-80 nodes

‘Weight’ power law
• Weight: super-linear on number of edges – also ‘expected’:
  – the more contacts you have,
  – the even more packets you send!
• Slope 1.3, 1.15; plateau @ 100 edges

OVERALL CONCLUSIONS
• Graphs pose a wealth of fascinating problems
• self-similarity and power laws work, when textbook methods fail!
• New patterns (shrinking diameter!)
• SVD / tensors / RWR: valuable tools
• Intrusion detection: closely related
  – ADAGE, OddBall

References
• L. Akoglu, M. McGlohon, C. Faloutsos. RTM: Laws and a Recursive Generator for Weighted Time-Evolving Graphs. IEEE ICDM, Pisa, Italy, Dec. 2008.
• Jure Leskovec, Jon Kleinberg and Christos Faloutsos. Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations. KDD 2005, Chicago, IL. ("Best Research Paper" award).
• Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, Christos Faloutsos. Realistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication. ECML/PKDD 2005, Porto, Portugal, 2005.
• Jure Leskovec and Christos Faloutsos. Scalable Modeling of Real Graphs using Kronecker Multiplication. ICML 2007, Corvallis, OR, USA.
• M. McGlohon, L. Akoglu, C. Faloutsos. Weighted Graphs and Disconnected Components: Patterns and a Generator. ACM SIGKDD, Las Vegas, NV, USA, Aug. 2008.
• Shashank Pandit, Duen Horng (Polo) Chau, Samuel Wang and Christos Faloutsos. NetProbe: A Fast and Scalable System for Fraud Detection in Online Auction Networks. WWW 2007, Banff, Alberta, Canada, May 8-12, 2007.
• Jimeng Sun, Dacheng Tao, Christos Faloutsos. Beyond Streams and Graphs: Dynamic Tensor Analysis. KDD 2006, Philadelphia, PA.
• Jimeng Sun, Yinglian Xie, Hui Zhang, Christos Faloutsos. Less is More: Compact Matrix Decomposition for Large Sparse Graphs. SDM, Minneapolis, Minnesota, Apr 2007.
• Jimeng Sun, Spiros Papadimitriou, Philip S. Yu, and Christos Faloutsos. GraphScope: Parameter-free Mining of Large Time-evolving Graphs. ACM SIGKDD Conference, San Jose, CA, August 2007.
• Hanghang Tong, Christos Faloutsos, and Jia-Yu Pan. Fast Random Walk with Restart and Its Applications. ICDM 2006, Hong Kong.
• Hanghang Tong, Christos Faloutsos. Center-Piece Subgraphs: Problem Definition and Fast Solutions. KDD 2006, Philadelphia, PA.
• Hanghang Tong, Brian Gallagher, Christos Faloutsos, and Tina Eliassi-Rad. Fast Best-Effort Pattern Matching in Large Attributed Graphs. KDD 2007, San Jose, CA.

Contact info:
www.cs.cmu.edu/~christos
(w/ papers, datasets, code, etc)

Extra: Graph Generators

Problem#3: Generation
• Given a growing graph with count of nodes N1, N2, …
• Generate a realistic sequence of graphs that will obey all the patterns

Problem Definition
• Given a growing graph with count of nodes N1, N2, …
• Generate a realistic sequence of graphs that will obey all the patterns
  – Static Patterns
    • Power Law Degree Distribution
    • Power Law eigenvalue and eigenvector distribution
    • Small Diameter
  – Dynamic Patterns
    • Growth Power Law
    • Shrinking/Stabilizing Diameters

Problem Definition
• Given a growing graph with count of nodes N1, N2, …
• Generate a realistic sequence of graphs that will obey all the patterns
• Idea: Self-similarity
  – Leads to power laws
  – Communities within communities
  – …

Kronecker Product – a Graph
[Figure: intermediate stage, shown with adjacency matrices]

Kronecker Product – a Graph
• Continuing to multiply with G1, we obtain G4 and so on …
[Figure: G4 adjacency matrix]
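The repeated Kronecker multiplication described on these slides can be sketched in a few lines on plain nested lists. The initiator G1 below (a 3-node chain with self-loops) is a hypothetical example, since the slide text does not give the actual initiator matrix, and the function names are assumptions.

```python
def kronecker(a, b):
    """Kronecker product of two matrices given as lists of rows:
    every entry a[i][j] is replaced by the block a[i][j] * b."""
    return [[a_ij * b_kl for a_ij in row_a for b_kl in row_b]
            for row_a in a for row_b in b]

def kronecker_power(g1, k):
    """G_k = G1 (x) G1 (x) ... (x) G1, with k factors (k >= 1)."""
    result = g1
    for _ in range(k - 1):
        result = kronecker(result, g1)
    return result

# Hypothetical initiator: 3-node chain with self-loops (7 non-zero entries).
G1 = [[1, 1, 0],
      [1, 1, 1],
      [0, 1, 1]]
G2 = kronecker_power(G1, 2)
G3 = kronecker_power(G1, 3)
ones_g2 = sum(map(sum, G2))
ones_g3 = sum(map(sum, G3))
print(len(G2), ones_g2, len(G3), ones_g3)  # 9 49 27 343
```

With this initiator the node count grows as 3^k and the non-zero entries as 7^k, so the entry count is a power of the node count with exponent log 7 / log 3, roughly 1.77: a densification-style relation that follows directly from the construction.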

Properties:
• We can PROVE that
  – Degree distribution is multinomial ~ power law
  – Diameter: constant
  – Eigenvalue distribution: multinomial
  – First eigenvector: multinomial
• See [Leskovec+, PKDD’05] for proofs

Problem Definition
• Given a growing graph with nodes N1, N2, …
• Generate a realistic sequence of graphs that will obey all the patterns
  – Static Patterns
    • Power Law Degree Distribution
    • Power Law eigenvalue and eigenvector distribution
    • Small Diameter
  – Dynamic Patterns
    • Growth Power Law
    • Shrinking/Stabilizing Diameters
• First and only generator for which we can prove all these properties

Stochastic Kronecker Graphs
• Create an N1 × N1 probability matrix P1
• Compute the kth Kronecker power Pk
• For each entry puv of Pk, include an edge (u, v) with probability puv
P1:
  0.4  0.2
  0.1  0.3
Kronecker multiplication gives Pk (here k = 2):
  0.16  0.08  0.08  0.04
  0.04  0.12  0.02  0.06
  0.04  0.02  0.12  0.06
  0.01  0.03  0.03  0.09
Flip biased coins to obtain the instance matrix G2
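The three steps on this slide can be sketched as follows, using the 2 × 2 matrix P1 with entries 0.4, 0.2, 0.1, 0.3 from the slide. The helper names and the fixed random seed are assumptions made for illustration; the sketch materialises the full probability matrix, which is only practical for small k.

```python
import random

def kronecker(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a_ij * b_kl for a_ij in row_a for b_kl in row_b]
            for row_a in a for row_b in b]

def kronecker_power(p1, k):
    """k-th Kronecker power of the initiator matrix p1 (k >= 1)."""
    result = p1
    for _ in range(k - 1):
        result = kronecker(result, p1)
    return result

def sample_graph(prob, rng):
    """Flip one biased coin per entry: edge (u, v) exists with probability prob[u][v]."""
    return [[1 if rng.random() < p else 0 for p in row] for row in prob]

P1 = [[0.4, 0.2],
      [0.1, 0.3]]
P2 = kronecker_power(P1, 2)
P2_rounded = [[round(p, 2) for p in row] for row in P2]
G2 = sample_graph(P2, random.Random(0))  # seed fixed only for repeatability

for row in P2_rounded:
    print(row)
```

The expected number of edges is the sum of all entries of Pk, which equals the sum of the entries of P1 raised to the power k; for this P1 the entries sum to 1, so the toy instance is very sparse.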

Experiments
• How well can we match real graphs?
  – arXiv: physics citations:
    • 30,000 papers, 350,000 citations
    • 10 years of data
  – U.S. Patent citation network
    • 4 million patents, 16 million citations
    • 37 years of data
  – Autonomous systems – graph of internet
    • Single snapshot from January 2002
    • 6,400 nodes, 26,000 edges
• We show both static and temporal patterns

(Q: how to fit the parameters?)
A:
• Stochastic version of Kronecker graphs +
• Max likelihood +
• Metropolis sampling
• [Leskovec+, ICML’07]

Experiments on real AS graph
[Figure panels: Degree distribution; Hop plot; Adjacency matrix eigenvalues; Network value]

Conclusions
• Kronecker graphs have:
  – All the static properties
    • Heavy tailed degree distributions
    • Small diameter
    • Multinomial eigenvalues and eigenvectors
  – All the temporal properties
    • Densification Power Law
    • Shrinking/Stabilizing Diameters
  – We can formally prove these results