
  • Number of slides: 105

CMU SCS Graph Mining: Laws, Generators and Tools
Christos Faloutsos, CMU
GATech 08

Thank you!
• Amy Bruckman
• Francine Lyken

Outline
• Problem definition / Motivation
• Static & dynamic laws; generators
• Tools: CenterPiece graphs; Tensors
• Other projects (Virus propagation, e-bay fraud detection)
• Conclusions

Motivation
Data mining: ~ find patterns (rules, outliers)
• Problem#1: What do real graphs look like?
• Problem#2: How do they evolve?
• Problem#3: How to generate realistic graphs
TOOLS
• Problem#4: Who is the ‘master-mind’?
• Problem#5: Track communities over time

Problem#1: Joint work with Dr. Deepayan Chakrabarti (CMU/Yahoo R.L.)

Graphs - why should we care?
[Figures: Internet Map [lumeta.com]; Food Web [Martinez ’91]; Protein Interactions [genomebiology.com]; Friendship Network [Moody ’01]]

Graphs - why should we care?
• IR: bi-partite graphs (doc-terms) [Figure: documents D1 ... DN linked to terms T1 ... TM]
• web: hyper-text graph
• ... and more:

Graphs - why should we care?
• network of companies & board-of-directors members
• ‘viral’ marketing
• web-log (‘blog’) news propagation
• computer network security: email/IP traffic and anomaly detection
• ...

Problem #1 - network and graph mining
• What does the Internet look like?
• What does the web look like?
• What is ‘normal’/‘abnormal’?
• Which patterns/laws hold?

Graph mining
• Are real graphs random?

Laws and patterns
• Are real graphs random?
• A: NO!!
  – Diameter
  – in- and out- degree distributions
  – other (surprising) patterns

Solution#1
• Power law in the degree distribution [SIGCOMM 99]
[Figure: internet domains, log(degree) vs. log(rank); slope -0.82; labeled points: att.com, ibm.com]
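
A rank exponent like the -0.82 above can be estimated by sorting nodes by decreasing degree and fitting a straight line to log(degree) against log(rank). The sketch below is only an illustration: it assumes an ordinary least-squares fit (the slides do not say how the slope was fitted), the function names are hypothetical, and the degree sequence is synthetic, not the Internet data.

```python
import math


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    var = sum((a - mean_x) ** 2 for a in lx)
    return cov / var


def rank_exponent(degrees):
    """Slope of log(degree) vs. log(rank), nodes sorted by decreasing degree."""
    ordered = sorted(degrees, reverse=True)
    ranks = range(1, len(ordered) + 1)
    return loglog_slope(ranks, ordered)


# Synthetic degrees that follow degree = 1000 * rank^-0.82 exactly
# (illustrative values only, not measurements).
synthetic = [1000.0 * r ** -0.82 for r in range(1, 201)]
slope = rank_exponent(synthetic)
print(f"fitted rank exponent: {slope:.2f}")
```

On real data the points scatter around the line, so the fitted slope is an estimate rather than an exact recovery.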

Solution#1’: Eigen Exponent E
• A2: power law in the eigenvalues of the adjacency matrix
[Figure: eigenvalue vs. rank of decreasing eigenvalue, May 2001; Exponent = slope, E = -0.48]

Solution#1’: Eigen Exponent E
• [Papadimitriou, Mihail, ’02]: slope is ½ of rank exponent
[Figure: eigenvalue vs. rank of decreasing eigenvalue, May 2001; Exponent = slope, E = -0.48]

But: How about graphs from other domains?

The Peer-to-Peer Topology [Jovanovic+]
• Count versus degree
• Number of adjacent peers follows a power-law

More power laws: citation counts (citeseer.nj.nec.com 6/2001)
[Figure: log(count) vs. log(#citations); labeled point: Ullman]

More power laws:
• web hit counts [w/ A. Montgomery]
[Figure: Web Site Traffic, log(count) vs. log(in-degree); labels: Zipf, ``ebay’’, users, sites]

epinions.com
• who-trusts-whom [Richardson + Domingos, KDD 2001]
[Figure: count vs. (out) degree; labeled point: trusts-2000-people user]

Motivation
Data mining: ~ find patterns (rules, outliers)
• Problem#1: What do real graphs look like?
• Problem#2: How do they evolve?
• Problem#3: How to generate realistic graphs
TOOLS
• Problem#4: Who is the ‘master-mind’?
• Problem#5: Track communities over time

Problem#2: Time evolution
• with Jure Leskovec (CMU/MLD)
• and Jon Kleinberg (Cornell – sabb. @ CMU)

Evolution of the Diameter
• Prior work on Power Law graphs hints at slowly growing diameter:
  – diameter ~ O(log N)
• What is happening in real data?

Evolution of the Diameter
• Prior work on Power Law graphs hints at slowly growing diameter:
  – diameter ~ O(log N)
• What is happening in real data?
• Diameter shrinks over time

Diameter – ArXiv citation graph
• Citations among physics papers
• 1992 – 2003
• One graph per year
[Figure: diameter vs. time [years]]

Diameter – “Autonomous Systems”
• Graph of Internet
• One graph per day
• 1997 – 2000
[Figure: diameter vs. number of nodes]

Diameter – “Affiliation Network”
• Graph of collaborations in physics – authors linked to papers
• 10 years of data
[Figure: diameter vs. time [years]]

Diameter – “Patents”
• Patent citation network
• 25 years of data
[Figure: diameter vs. time [years]]

Temporal Evolution of the Graphs
• N(t) … nodes at time t
• E(t) … edges at time t
• Suppose that N(t+1) = 2 * N(t)
• Q: what is your guess for E(t+1) =? 2 * E(t)

Temporal Evolution of the Graphs
• N(t) … nodes at time t
• E(t) … edges at time t
• Suppose that N(t+1) = 2 * N(t)
• Q: what is your guess for E(t+1) =? 2 * E(t)
• A: over-doubled!
  – But obeying the ``Densification Power Law’’

Densification – Physics Citations
• Citations among physics papers
• 2003:
  – 29,555 papers, 352,807 citations
[Figure: E(t) vs. N(t); slope ??]

Densification – Physics Citations
• Citations among physics papers
• 2003:
  – 29,555 papers, 352,807 citations
[Figure: E(t) vs. N(t); slope 1.69]

Densification – Physics Citations
• Citations among physics papers
• 2003:
  – 29,555 papers, 352,807 citations
[Figure: E(t) vs. N(t); slope 1.69; reference slope 1: tree]

Densification – Physics Citations
• Citations among physics papers
• 2003:
  – 29,555 papers, 352,807 citations
[Figure: E(t) vs. N(t); slope 1.69; reference slope 2: clique]
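
To make "over-doubled" concrete: if the slope on these plots is read as the exponent a in E(t) ∝ N(t)^a (an interpretation of the log-log plots, not a formula printed on the slides), then doubling the nodes multiplies the edges by 2^a. The small sketch below evaluates that factor for the tree, physics-citation, and clique slopes; the function name is hypothetical.

```python
def edge_growth_factor(node_growth_factor, exponent):
    """Factor by which E(t) grows when N(t) grows by node_growth_factor,
    assuming E(t) is proportional to N(t) ** exponent."""
    return node_growth_factor ** exponent


# Slopes taken from the physics-citation slides: 1 (tree), 1.69 (data), 2 (clique).
for label, a in [("tree", 1.0), ("physics citations", 1.69), ("clique", 2.0)]:
    factor = edge_growth_factor(2, a)
    print(f"{label}: doubling the nodes multiplies the edges by about {factor:.2f}")
```

With a = 1.69 the factor lands between the tree case (edges merely double) and the clique case (edges quadruple).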

Densification – Patent Citations
• Citations among patents granted
• 1999
  – 2.9 million nodes
  – 16.5 million edges
• Each year is a datapoint
[Figure: E(t) vs. N(t); slope 1.66]

Densification – Autonomous Systems
• Graph of Internet
• 2000
  – 6,000 nodes
  – 26,000 edges
• One graph per day
[Figure: E(t) vs. N(t); slope 1.18]

Densification – Affiliation Network
• Authors linked to their publications
• 2002
  – 60,000 nodes
    • 20,000 authors
    • 38,000 papers
  – 133,000 edges
[Figure: E(t) vs. N(t); slope 1.15]

Motivation
Data mining: ~ find patterns (rules, outliers)
• Problem#1: What do real graphs look like?
• Problem#2: How do they evolve?
• Problem#3: How to generate realistic graphs
TOOLS
• Problem#4: Who is the ‘master-mind’?
• Problem#5: Track communities over time

Problem#3: Generation
• Given a growing graph with count of nodes N1, N2, …
• Generate a realistic sequence of graphs that will obey all the patterns

Problem Definition
• Given a growing graph with count of nodes N1, N2, …
• Generate a realistic sequence of graphs that will obey all the patterns
  – Static Patterns
    Power Law Degree Distribution
    Power Law eigenvalue and eigenvector distribution
    Small Diameter
  – Dynamic Patterns
    Growth Power Law
    Shrinking/Stabilizing Diameters

Problem Definition
• Given a growing graph with count of nodes N1, N2, …
• Generate a realistic sequence of graphs that will obey all the patterns
• Idea: Self-similarity
  – Leads to power laws
  – Communities within communities
  – …

Kronecker Product – a Graph
[Figure: intermediate stage; adjacency matrices]

Kronecker Product – a Graph
• Continuing multiplying with G1 we obtain G4 and so on …
[Figure: G4 adjacency matrix]
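
The repeated multiplication can be sketched in a few lines. The initiator matrix below is a hypothetical 3-node example (a path with self-loops); the G1 actually drawn on the slides is not recoverable from the text, and the helper names are illustrative.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of lists."""
    rows_b, cols_b = len(b), len(b[0])
    return [
        [a[i // rows_b][j // cols_b] * b[i % rows_b][j % cols_b]
         for j in range(len(a[0]) * cols_b)]
        for i in range(len(a) * rows_b)
    ]


def kron_power(g1, k):
    """k-th Kronecker power of g1 (k >= 1): g1 multiplied with itself k - 1 times."""
    result = g1
    for _ in range(k - 1):
        result = kron(result, g1)
    return result


# Hypothetical initiator adjacency matrix: path 0-1-2 with self-loops.
G1 = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]
G2 = kron_power(G1, 2)
G4 = kron_power(G1, 4)
print(len(G2), sum(map(sum, G2)))  # nodes and nonzero entries of G2
print(len(G4), sum(map(sum, G4)))  # nodes and nonzero entries of G4
```

Each multiplication scales the node count by the size of G1 and the number of nonzero entries by the number of nonzero entries in G1, which is where the self-similar block structure comes from.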

Properties:
• We can PROVE that
  – Degree distribution is multinomial ~ power law
  – Diameter: constant
  – Eigenvalue distribution: multinomial
  – First eigenvector: multinomial
• See [Leskovec+, PKDD’05] for proofs

Problem Definition
• Given a growing graph with nodes N1, N2, …
• Generate a realistic sequence of graphs that will obey all the patterns
  – Static Patterns
    Power Law Degree Distribution
    Power Law eigenvalue and eigenvector distribution
    Small Diameter
  – Dynamic Patterns
    Growth Power Law
    Shrinking/Stabilizing Diameters
• First and only generator for which we can prove all these properties

Stochastic Kronecker Graphs (skip)
• Create N1 × N1 probability matrix P1
• Compute the kth Kronecker power Pk
• For each entry puv of Pk include an edge (u, v) with probability puv
P1:
  0.4 0.2
  0.1 0.3
Kronecker multiplication gives Pk (here k = 2):
  0.16 0.08 0.08 0.04
  0.04 0.12 0.02 0.06
  0.04 0.02 0.12 0.06
  0.01 0.03 0.03 0.09
Flip biased coins to obtain the instance matrix G2
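
The three steps on this slide can be sketched as follows, using the P1 from the slide. The helper names and the fixed random seed are assumptions made for the illustration; this is a direct, quadratic-time reading of the slide, not the fitting or fast-sampling code from the cited papers.

```python
import random


def kron(a, b):
    """Kronecker product of two matrices given as lists of lists."""
    rows_b, cols_b = len(b), len(b[0])
    return [
        [a[i // rows_b][j // cols_b] * b[i % rows_b][j % cols_b]
         for j in range(len(a[0]) * cols_b)]
        for i in range(len(a) * rows_b)
    ]


def sample_graph(prob, rng):
    """Flip one biased coin per entry: edge (u, v) appears with probability prob[u][v]."""
    return [[1 if rng.random() < p else 0 for p in row] for row in prob]


P1 = [[0.4, 0.2],
      [0.1, 0.3]]          # initiator probabilities from the slide
P2 = kron(P1, P1)          # second Kronecker power
G2 = sample_graph(P2, random.Random(0))  # seed chosen arbitrarily for repeatability

for row in P2:
    print(" ".join(f"{p:.2f}" for p in row))
for row in G2:
    print(row)
```

A different seed gives a different instance matrix; only the edge probabilities in P2 are fixed by the model.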

Experiments
• How well can we match real graphs?
  – Arxiv: physics citations:
    • 30,000 papers, 350,000 citations
    • 10 years of data
  – U.S. Patent citation network
    • 4 million patents, 16 million citations
    • 37 years of data
  – Autonomous systems – graph of internet
    • Single snapshot from January 2002
    • 6,400 nodes, 26,000 edges
• We show both static and temporal patterns

Arxiv – Degree Distribution
[Figure, three panels: Real graph, Deterministic Kronecker, Stochastic Kronecker; count vs. degree]

Arxiv – Scree Plot
[Figure, three panels: Real graph, Deterministic Kronecker, Stochastic Kronecker; Eigenvalue vs. Rank]

Arxiv – Densification
[Figure, three panels: Real graph, Deterministic Kronecker, Stochastic Kronecker; Edges vs. Nodes(t)]

Arxiv – Effective Diameter
[Figure, three panels: Real graph, Deterministic Kronecker, Stochastic Kronecker; Diameter vs. Nodes(t)]

(Q: how to fit the parameters?)
A:
• Stochastic version of Kronecker graphs +
• Max likelihood +
• Metropolis sampling
• [Leskovec+, ICML’07]

Experiments on real AS graph
[Figure, four panels: Degree distribution; Hop plot; Adjacency matrix eigenvalues; Network value]

Conclusions
• Kronecker graphs have:
  – All the static properties
    Heavy tailed degree distributions
    Small diameter
    Multinomial eigenvalues and eigenvectors
  – All the temporal properties
    Densification Power Law
    Shrinking/Stabilizing Diameters
  – We can formally prove these results

Motivation
Data mining: ~ find patterns (rules, outliers)
• Problem#1: What do real graphs look like?
• Problem#2: How do they evolve?
• Problem#3: How to generate realistic graphs
TOOLS
• Problem#4: Who is the ‘master-mind’?
• Problem#5: Track communities over time

Problem#4: MasterMind – ‘CePS’
• w/ Hanghang Tong, KDD 2006
• htong cs.cmu.edu

Center-Piece Subgraph (CePS)
• Given Q query nodes
• Find Center-piece ( )
• App.
  – Social Networks
  – Law Enforcement, …
• Idea:
  – Proximity -> random walk with restarts
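
The proximity idea can be sketched with a plain power-iteration random walk with restart. Everything specific here is an assumption made for illustration: the restart probability of 0.15, the toy path graph, the function names, and the choice to combine the per-query scores by multiplication as a simple stand-in for an AND query. It is not the Fast CePS implementation from the cited papers.

```python
def rwr(adj, source, restart=0.15, iters=200):
    """Random walk with restart from `source` on a weighted adjacency matrix.

    At every step the walker jumps back to `source` with probability `restart`,
    otherwise it follows an outgoing edge chosen in proportion to its weight.
    Returns the (approximate) steady-state visiting probability of each node.
    """
    n = len(adj)
    out_weight = [sum(row) for row in adj]
    scores = [0.0] * n
    scores[source] = 1.0
    for _ in range(iters):
        nxt = [0.0] * n
        for u in range(n):
            if out_weight[u] == 0:
                continue
            share = (1.0 - restart) * scores[u] / out_weight[u]
            for v in range(n):
                if adj[u][v]:
                    nxt[v] += share * adj[u][v]
        nxt[source] += restart
        scores = nxt
    return scores


def and_scores(adj, queries, **kwargs):
    """Score each node by the product of its RWR scores from all query nodes."""
    combined = [1.0] * len(adj)
    for q in queries:
        for node, s in enumerate(rwr(adj, q, **kwargs)):
            combined[node] *= s
    return combined


# Toy undirected path 0-1-2-3-4; the query nodes are the two endpoints.
PATH = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
from_0 = rwr(PATH, 0)
both = and_scores(PATH, [0, 4])
print([round(s, 4) for s in from_0])
print([round(s, 4) for s in both])
```

Nodes that score well under the combined measure are close to every query node at once, which is the intuition behind picking them as the center-piece.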

Case Study: AND query
[Figure: query nodes R. Agrawal, Jiawei Han, V. Vapnik, M. Jordan]

Case Study: AND query
[Figure: resulting center-piece subgraph]

2_SoftAnd query
[Figure: two groups labeled databases and ML/Statistics]

Conclusions
• Q1: How to measure the importance?
• A1: RWR + K_SoftAnd
• Q2: How to do it efficiently?
• A2: Graph Partition (Fast CePS)
  – ~90% quality
  – 150x speedup (ICDM’06, b.p. award)

Outline
• Problem definition / Motivation
• Static & dynamic laws; generators
• Tools: CenterPiece graphs; Tensors
• Other projects (Virus propagation, e-bay fraud detection)
• Conclusions

Motivation
Data mining: ~ find patterns (rules, outliers)
• Problem#1: What do real graphs look like?
• Problem#2: How do they evolve?
• Problem#3: How to generate realistic graphs
TOOLS
• Problem#4: Who is the ‘master-mind’?
• Problem#5: Track communities over time

Tensors for time evolving graphs
• [Jimeng Sun+ KDD’06]
• [ “ , SDM’07]
• [CF, Kolda, Sun, SDM’07 tutorial]

Social network analysis
• Static: find community structures
[Figure: Authors × Keywords matrix for 1990; label: DB]

Social network analysis
• Static: find community structures
[Figure: Authors × Keywords matrices for 1990, 1991, 1992; label: DB]

Social network analysis
• Static: find community structures
• Dynamic: monitor community structure evolution; spot abnormal individuals; abnormal time-stamps

Application 1: Multiway latent semantic indexing (LSI)
[Figure: authors × keyword data, 1990 to 2004, with projection matrices Uauthors and Ukeyword; labels: Philip Yu, Michael Stonebraker, DB, DM, Pattern, Query]
• Projection matrices specify the clusters
• Core tensors give cluster activation level

Bibliographic data (DBLP)
• Papers from VLDB and KDD conferences
• Construct 2nd order tensors (authors × keywords) with yearly windows
• Each tensor: 4584 × 3741
• 11 timestamps (years)

Multiway LSI
Group | Authors | Keywords | Year
DB | michael carey, michael stonebraker, h. jagadish, hector garcia-molina | queri, parallel, optimization, concurr, objectorient | 1995
DB | surajit chaudhuri, mitch cherniack, michael stonebraker, ugur cetintemel | distribut, systems, view, storage, servic, process, cache | 2004
DM | jiawei han, jian pei, philip s. yu, jianyong wang, charu c. aggarwal | streams, pattern, support, cluster, index, gener, queri | 2004
• Two groups are correctly identified: Databases and Data mining
• People and concepts are drifting over time

Network forensics
• Directional network flows
• A large ISP with 100 POPs, each POP 10 Gbps link capacity [Hotnets 2004]
  – 450 GB/hour with compression
• Task: Identify abnormal traffic pattern and find out the cause
[Figure: source vs. destination plots for normal traffic and abnormal traffic]
(with Prof. Hui Zhang and Dr. Yinglian Xie)

Conclusions
Tensor-based methods (WTA/DTA/STA):
• spot patterns and anomalies on time evolving graphs, and
• on streams (monitoring)

Motivation
Data mining: ~ find patterns (rules, outliers)
• Problem#1: What do real graphs look like?
• Problem#2: How do they evolve?
• Problem#3: How to generate realistic graphs
TOOLS
• Problem#4: Who is the ‘master-mind’?
• Problem#5: Track communities over time

Outline
• Problem definition / Motivation
• Static & dynamic laws; generators
• Tools: CenterPiece graphs; Tensors
• Other projects (Virus propagation, e-bay fraud detection, blogs)
• Conclusions

Virus propagation
• How do viruses/rumors propagate?
• Blog influence?
• Will a flu-like virus linger, or will it become extinct soon?

The model: SIS
• ‘Flu’ like: Susceptible-Infected-Susceptible
• Virus ‘strength’ s = β/δ
[Figure: node N with neighbors N1, N2, N3; states Healthy and Infected; transitions labeled Prob. β and Prob. δ]

Epidemic threshold τ
of a graph: the value of τ, such that if strength s = β/δ < τ an epidemic can not happen
Thus,
• given a graph
• compute its epidemic threshold

Epidemic threshold τ
What should τ depend on?
• avg. degree? and/or highest degree?
• and/or variance of degree?
• and/or third moment of degree?
• and/or diameter?

Epidemic threshold
• [Theorem] We have no epidemic, if β/δ < τ = 1/λ1,A

Epidemic threshold
• [Theorem] We have no epidemic, if β/δ < τ = 1/λ1,A
  – τ: epidemic threshold
  – β: attack prob.
  – δ: recovery prob.
  – λ1,A: largest eigenvalue of adj. matrix A
Proof: [Wang+03]
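
The theorem turns the question into an eigenvalue computation. The sketch below estimates the largest eigenvalue of a symmetric adjacency matrix by power iteration and reports τ = 1/λ1,A. The function names and the two toy graphs are hypothetical; iterating with A + I instead of A is a choice made here so that the iteration also settles on bipartite graphs such as a star.

```python
import math


def largest_eigenvalue(adj, iters=500):
    """Largest eigenvalue of a symmetric, nonnegative adjacency matrix.

    Power iteration on A + I (the shift avoids oscillation on bipartite
    graphs); the eigenvalue of A is read off with a Rayleigh quotient.
    """
    n = len(adj)
    x = [1.0 / math.sqrt(n)] * n
    lam = 0.0
    for _ in range(iters):
        ax = [sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        lam = sum(a * b for a, b in zip(x, ax))   # x^T A x with |x| = 1
        y = [a + b for a, b in zip(ax, x)]        # (A + I) x
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return lam


def epidemic_threshold(adj):
    """tau = 1 / (largest eigenvalue of the adjacency matrix)."""
    return 1.0 / largest_eigenvalue(adj)


def below_threshold(adj, beta, delta):
    """True when the virus strength beta / delta is below tau."""
    return beta / delta < epidemic_threshold(adj)


# Toy graphs: a 4-clique and a star with one hub and four leaves.
CLIQUE4 = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
STAR5 = [[1 if (i == 0) != (j == 0) else 0 for j in range(5)] for i in range(5)]

tau_clique = epidemic_threshold(CLIQUE4)
tau_star = epidemic_threshold(STAR5)
print(f"clique: tau = {tau_clique:.4f}, star: tau = {tau_star:.4f}")
print(below_threshold(STAR5, beta=0.1, delta=0.5))
```

For large sparse graphs a sparse eigen-solver would be the usual choice; the dense loops here are only meant to keep the example dependency-free.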

Experiments (Oregon)
[Figure: three curves: β/δ > τ (above threshold); β/δ = τ (at the threshold); β/δ < τ (below threshold)]

Outline
• Problem definition / Motivation
• Static & dynamic laws; generators
• Tools: CenterPiece graphs; Tensors
• Other projects (Virus propagation, e-bay fraud detection, blogs)
• Conclusions

E-bay Fraud detection
w/ Polo Chau & Shashank Pandit, CMU

E-bay Fraud detection
• lines: positive feedbacks
• would you buy from him/her?

E-bay Fraud detection
• lines: positive feedbacks
• would you buy from him/her?
• or him/her?

E-bay Fraud detection - NetProbe

Outline
• Problem definition / Motivation
• Static & dynamic laws; generators
• Tools: CenterPiece graphs; Tensors
• Other projects (Virus propagation, e-bay fraud detection, blogs)
• Conclusions

Blog analysis
• with Mary McGlohon (CMU)
• Jure Leskovec (CMU)
• Natalie Glance (now at Google)
• Mat Hurst (now at MSR)
[SDM’07]

Cascades on the Blogosphere
[Figure, three panels: Blogosphere (blogs B1 to B4 + posts); Blog network (links among blogs); Post network (links among posts a to e)]
Q1: popularity-decay of a post?
Q2: degree distributions?

Q1: popularity over time
[Figure: # in links vs. days after post]
Post popularity drops-off – exponentially?

Q1: popularity over time
[Figure: # in links (log) vs. days after post (log)]
Post popularity drops-off – exponentially? POWER LAW! Exponent?

Q1: popularity over time
[Figure: # in links (log) vs. days after post (log); slope -1.6]
Post popularity drops-off – exponentially? POWER LAW! Exponent? -1.6 (close to -1.5: Barabasi’s stack model)

Q2: degree distribution
44,356 nodes, 122,153 edges. Half of blogs belong to largest connected component.
[Figure: count vs. blog in-degree; slope ??]

Q2: degree distribution
44,356 nodes, 122,153 edges. Half of blogs belong to largest connected component.
[Figure: count vs. blog in-degree]

Q2: degree distribution
44,356 nodes, 122,153 edges. Half of blogs belong to largest connected component.
[Figure: count vs. blog in-degree]
in-degree slope: -1.7
out-degree: -3
‘rich get richer’

OVERALL CONCLUSIONS
• Graphs pose a wealth of fascinating problems
• self-similarity and power laws work, when textbook methods fail!
• New patterns (shrinking diameter!)
• New generator: Kronecker
• SVD / tensors / RWR: valuable tools

Next steps:
• edges with
  – weights; and/or
  – categorical attributes and/or
  – time-stamps
• nodes with attributes
• scalability (hadoop – PetaScale [Bader])

‘Philosophical’ observations
Graph mining brings together:
• ML/AI / IR; Stat, Num. analysis; Systems (DB (Gb/Tb), Networks)
AND
• sociology, epidemiology
• physics (phase transitions, Ising spins, percolation)
• biology (PPI, regulatory gene networks)
• business
  – (blogs; facebook/LinkedIn/2ndLife...)
  – recommendation systems (NetFlix)

References
• Hanghang Tong, Christos Faloutsos, and Jia-Yu Pan. Fast Random Walk with Restart and Its Applications. ICDM 2006, Hong Kong.
• Hanghang Tong, Christos Faloutsos. Center-Piece Subgraphs: Problem Definition and Fast Solutions. KDD 2006, Philadelphia, PA.

References
• Jure Leskovec, Jon Kleinberg and Christos Faloutsos. Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations. KDD 2005, Chicago, IL. ("Best Research Paper" award).
• Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, Christos Faloutsos. Realistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication. ECML/PKDD 2005, Porto, Portugal.

References
• Jure Leskovec and Christos Faloutsos. Scalable Modeling of Real Graphs using Kronecker Multiplication. ICML 2007, Corvallis, OR, USA.
• Shashank Pandit, Duen Horng (Polo) Chau, Samuel Wang and Christos Faloutsos. NetProbe: A Fast and Scalable System for Fraud Detection in Online Auction Networks. WWW 2007, Banff, Alberta, Canada, May 8-12, 2007.
• Jimeng Sun, Dacheng Tao, Christos Faloutsos. Beyond Streams and Graphs: Dynamic Tensor Analysis. KDD 2006, Philadelphia, PA.

References
• Jimeng Sun, Yinglian Xie, Hui Zhang, Christos Faloutsos. Less is More: Compact Matrix Decomposition for Large Sparse Graphs. SDM, Minneapolis, Minnesota, Apr 2007.

Contact info:
www.cs.cmu.edu/~christos
(w/ papers, datasets, code, etc)