
2064bf757a5b7d5af7383a721a984dc6.ppt
- Number of slides: 96
Latent Variable Models of Social Networks and Text. Andrew McCallum, Computer Science Department, University of Massachusetts Amherst. Joint work with Xuerui Wang, Natasha Mohanty, Andres Corrada, Chris Pal, Wei Li, David Mimno and Gideon Mann.
Social Network in an Email Dataset 2
Outline Social Network Analysis with Topic Models • Role Discovery (Author-Recipient-Topic Model, ART) • Group Discovery (Group-Topic Model, GT) • Enhanced Topic Models – Time Localized Topics (Topics-over-Time Model, TOT) – Time Localized Groups (Groups-over-Time Model, GOT) – Markov Dependencies in Topics (Topical N-Grams Model, TNG) • Bibliometric Impact & Transfer Measures using Topics Multi-Conditional Mixtures [AAAI 2006] 3
Clustering words into topics with Latent Dirichlet Allocation [Blei, Ng, Jordan 2003] (a mixed membership model) Generative Process: For each document: • Sample a distribution over topics (multinomial over topics). Example: 70% Iraq war, 30% US election • For each word in doc: – Sample a topic, z. Example: Iraq war – Sample a word from the topic, w (per-topic multinomial over words). Example: “bombing” 4
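The generative process on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four-word vocabulary, the two topic-word tables, and the Dirichlet parameters are made up for the example, and the helper names are hypothetical.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_index(probs, rng):
    """Draw an index from a discrete distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_document(topic_word, vocab, alpha, n_words, rng):
    """Follow the slide's steps: per-document topic mixture, then a topic and a word per token."""
    theta = sample_dirichlet(alpha, rng)        # distribution over topics for this document
    doc = []
    for _ in range(n_words):
        z = sample_index(theta, rng)            # sample a topic
        w = sample_index(topic_word[z], rng)    # sample a word from that topic
        doc.append((z, vocab[w]))
    return theta, doc

# Illustrative (assumed) vocabulary and per-topic word distributions.
vocab = ["bombing", "troops", "ballot", "campaign"]
topic_word = [
    [0.5, 0.5, 0.0, 0.0],   # stands in for an "Iraq war" topic
    [0.0, 0.0, 0.5, 0.5],   # stands in for a "US election" topic
]
rng = random.Random(0)
theta, doc = generate_document(topic_word, vocab, [0.7, 0.3], 10, rng)
```

Each token in `doc` carries both its topic index and its word, which makes the mixed-membership idea visible: one document can mix words from several topics.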
Example topics induced from a large collection of text JOB SCIENCE BALL FIELD STORY MIND DISEASE WATER WORK STUDY GAME MAGNETIC STORIES WORLD BACTERIA FISH JOBS SCIENTISTS TEAM MAGNET TELL DREAM DISEASES SEA CAREER SCIENTIFIC FOOTBALL WIRE CHARACTER DREAMS GERMS SWIM KNOWLEDGE BASEBALL EXPERIENCE NEEDLE THOUGHT CHARACTERS FEVER SWIMMING WORK PLAYERS EMPLOYMENT CURRENT AUTHOR IMAGINATION CAUSE POOL OPPORTUNITIES RESEARCH PLAY COIL READ MOMENT CAUSED LIKE WORKING CHEMISTRY FIELD POLES TOLD THOUGHTS SPREAD SHELL TRAINING TECHNOLOGY PLAYER IRON SETTING OWN VIRUSES SHARK SKILLS MANY BASKETBALL COMPASS TALES REAL INFECTION TANK CAREERS MATHEMATICS COACH LINES PLOT LIFE VIRUS SHELLS POSITIONS BIOLOGY PLAYED CORE TELLING IMAGINE MICROORGANISMS SHARKS FIND FIELD PLAYING ELECTRIC SHORT SENSE PERSON DIVING POSITION PHYSICS HIT DIRECTION INFECTIOUS DOLPHINS CONSCIOUSNESS FICTION FIELD LABORATORY TENNIS FORCE ACTION STRANGE COMMON SWAM OCCUPATIONS STUDIES TEAMS MAGNETS TRUE FEELING CAUSING LONG REQUIRE WORLD GAMES BE EVENTS WHOLE SMALLPOX SEAL OPPORTUNITY SPORTS MAGNETISM SCIENTIST TELLS BEING BODY DIVE EARN STUDYING BAT POLE TALE MIGHT INFECTIONS DOLPHIN ABLE SCIENCES TERRY INDUCED NOVEL HOPE CERTAIN UNDERWATER [Tenenbaum et al] 5
From LDA to Author-Recipient-Topic (ART) [McCallum et al 2005] 7
Inference and Estimation Gibbs Sampling: - Easy to implement - Reasonably fast 8
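To give a feel for why Gibbs sampling is considered easy to implement, here is a minimal collapsed Gibbs sampler. Note the assumption: it is written for plain LDA, not for ART itself, whose sampler would also have to resample a recipient for each word. The function name, hyperparameter values, and the toy corpus of word ids are all illustrative.

```python
import random

def gibbs_lda(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iters=50, seed=0):
    """Collapsed Gibbs sampling for plain LDA (topic mixtures and word distributions integrated out)."""
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]                 # tokens in doc d assigned to topic k
    topic_word = [[0] * vocab_size for _ in range(n_topics)]   # times word w is assigned to topic k
    topic_total = [0] * n_topics                               # tokens assigned to topic k
    assignments = []

    def update(d, w, z, delta):
        doc_topic[d][z] += delta
        topic_word[z][w] += delta
        topic_total[z] += delta

    # Random initialization of topic assignments.
    for d, doc in enumerate(docs):
        zs = [rng.randrange(n_topics) for _ in doc]
        for w, z in zip(doc, zs):
            update(d, w, z, +1)
        assignments.append(zs)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, assignments[d][i], -1)            # remove the token's current assignment
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights, k=1)[0]
                assignments[d][i] = z
                update(d, w, z, +1)                            # add it back under the new topic
    return assignments, doc_topic, topic_word

# Toy corpus of word ids (assumed data, for illustration only).
docs = [[0, 1, 0, 1], [2, 3, 2, 3], [0, 1, 2, 3]]
assignments, doc_topic, topic_word = gibbs_lda(docs, n_topics=2, vocab_size=4)
```

The whole sampler is a loop over tokens that decrements counts, draws a topic from a simple product of smoothed counts, and increments counts again.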
Enron Email Corpus • 250k email messages • 23k people Date: Wed, 11 Apr 2001 06:56:00 -0700 (PDT) From: debra.perlingiere@enron.com To: steve.hooser@enron.com Subject: Enron/TransAlta Contract dated Jan 1, 2001 Please see below. Katalin Kiss of TransAlta has requested an electronic copy of our final draft? Are you OK with this? If so, the only version I have is the original draft without revisions. DP Debra Perlingiere Enron North America Corp. Legal Department 1400 Smith Street, EB 3885 Houston, Texas 77002 dperlin@enron.com 9
Topics, and prominent senders / receivers discovered by ART (topic names assigned by hand) 10
Topics, and prominent senders / receivers discovered by ART Beck = “Chief Operations Officer” Dasovich = “Government Relations Executive” Shapiro = “Vice President of Regulatory Affairs” Steffes = “Vice President of Government Affairs” 11
Comparing Role Discovery Traditional SNA ART Author-Topic distribution over authored topics connection strength (A, B) = distribution over recipients 12
Comparing Role Discovery: Tracy Geaconne vs. Dan McCarty • Traditional SNA: similar roles • ART: different roles • Author-Topic: different roles Geaconne = “Secretary” McCarty = “Vice President” 13
Comparing Role Discovery: Lynn Blair vs. Kimberly Watson • Traditional SNA: different roles • ART: very similar • Author-Topic: very different Blair = “Gas pipeline logistics” Watson = “Pipeline facilities planning” 14
ART: Roles but not Groups (Enron Transwestern Division) • Traditional SNA: block structured • ART: not block structured • Author-Topic: not block structured 23
Outline Social Network Analysis with Topic Models • ✓ Role Discovery (Author-Recipient-Topic Model, ART) • Group Discovery (Group-Topic Model, GT) • Enhanced Topic Models – Time Localized Topics (Topics-over-Time Model, TOT) – Time Localized Groups (Groups-over-Time Model, GOT) – Markov Dependencies in Topics (Topical N-Grams Model, TNG) • Bibliometric Impact & Transfer Measures using Topics • Multi-Conditional Mixtures [AAAI 2006] 24
Groups and Topics • Input: – Observed relations between people – Attributes on those relations (text, or categorical) • Output: – Attributes clustered into “topics” – Groups of people, which vary depending on topic 25
Discovering Groups from Observed Set of Relations. Student Roster: Adams, Bennett, Carter, Davis, Edwards, Frederking. Academic Admiration: Acad(A, B) Acad(C, B) Acad(A, D) Acad(C, D) Acad(B, E) Acad(D, E) Acad(B, F) Acad(D, F) Acad(E, A) Acad(F, A) Acad(E, C) Acad(F, C) Admiration relations among six high school students. 26
Adjacency Matrix Representing Relations. Student Roster: Adams, Bennett, Carter, Davis, Edwards, Frederking. Academic Admiration: Acad(A, B) Acad(C, B) Acad(A, D) Acad(C, D) Acad(B, E) Acad(D, E) Acad(B, F) Acad(D, F) Acad(E, A) Acad(F, A) Acad(E, C) Acad(F, C) [Figure: adjacency matrices of the relations, first in roster order (A B C D E F), then reordered by group (A C B D E F) with A, C in G1; B, D in G2; E, F in G3] 27
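The effect of reordering can be reproduced directly from the Acad relations listed on this slide. The sketch below builds the adjacency matrix twice, once in roster order and once with students sorted by group; the function name `adjacency` is just a label chosen for this example.

```python
students = ["A", "B", "C", "D", "E", "F"]

# Acad(X, Y): X academically admires Y (taken from the slide).
relations = [
    ("A", "B"), ("C", "B"), ("A", "D"), ("C", "D"),
    ("B", "E"), ("D", "E"), ("B", "F"), ("D", "F"),
    ("E", "A"), ("F", "A"), ("E", "C"), ("F", "C"),
]

def adjacency(order, relations):
    """Build a 0/1 adjacency matrix with rows and columns in the given order."""
    index = {name: i for i, name in enumerate(order)}
    matrix = [[0] * len(order) for _ in order]
    for source, target in relations:
        matrix[index[source]][index[target]] = 1
    return matrix

roster_matrix = adjacency(students, relations)
grouped_matrix = adjacency(["A", "C", "B", "D", "E", "F"], relations)

for row in grouped_matrix:
    print(row)
```

In the grouped ordering every 2x2 block is either all ones or all zeros, which is the block structure that the group models are meant to discover.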
Group Model: Partitioning Entities into Groups. Stochastic Blockstructures for Relations [Nowicki, Snijders 2001] [Figure: graphical model; distribution labels: Beta, Multinomial, Dirichlet, Binomial; S: number of entities, G: number of groups] Enhanced with arbitrary number of groups in [Kemp, Griffiths, Tenenbaum 2004] 28
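A rough sketch of how such a blockstructure model generates data is given below. The pairing of distributions is an assumption read off the slide's labels (a Dirichlet prior on the multinomial over groups, a Beta prior on each group-pair relation probability, and a single-trial binomial for each relation); the function and parameter names are hypothetical.

```python
import random

def sample_blockmodel(n_entities, n_groups, rng, dirichlet_alpha=1.0, beta_a=1.0, beta_b=1.0):
    """Sample group memberships and relations from a simple stochastic blockmodel."""
    # Group proportions from a symmetric Dirichlet (normalized Gamma draws).
    draws = [rng.gammavariate(dirichlet_alpha, 1.0) for _ in range(n_groups)]
    total = sum(draws)
    proportions = [d / total for d in draws]

    # One group per entity, drawn from the multinomial over groups.
    groups = [rng.choices(range(n_groups), weights=proportions, k=1)[0]
              for _ in range(n_entities)]

    # One relation probability per ordered pair of groups, drawn from a Beta.
    link_prob = [[rng.betavariate(beta_a, beta_b) for _ in range(n_groups)]
                 for _ in range(n_groups)]

    # Each relation is present or absent depending only on the two entities' groups.
    relations = [
        [1 if i != j and rng.random() < link_prob[groups[i]][groups[j]] else 0
         for j in range(n_entities)]
        for i in range(n_entities)
    ]
    return groups, link_prob, relations

rng = random.Random(1)
groups, link_prob, relations = sample_blockmodel(n_entities=6, n_groups=3, rng=rng)
```

Inference runs this story in reverse: given only `relations`, recover a plausible `groups` assignment.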
Two Relations with Different Attributes. Student Roster: Adams, Bennett, Carter, Davis, Edwards, Frederking. Academic Admiration: Acad(A, B) Acad(C, B) Acad(A, D) Acad(C, D) Acad(B, E) Acad(D, E) Acad(B, F) Acad(D, F) Acad(E, A) Acad(F, A) Acad(E, C) Acad(F, C) Social Admiration: Soci(A, B) Soci(A, D) Soci(A, F) Soci(B, A) Soci(B, C) Soci(B, E) Soci(C, B) Soci(C, D) Soci(C, F) Soci(D, A) Soci(D, C) Soci(D, E) Soci(E, B) Soci(E, D) Soci(E, F) Soci(F, A) Soci(F, C) Soci(F, E) [Figure: adjacency matrices reordered by group. Academic: A, C in G1; B, D in G2; E, F in G3. Social: A, C, E in G1; B, D, F in G2] 29
The Group-Topic Model: Discovering Groups and Topics Simultaneously [Wang, Mohanty, McCallum 2006] [Figure: graphical model; distribution labels: Uniform, Dirichlet, Multinomial, Beta, Multinomial, Dirichlet, Binomial] 30
Inference and Estimation Gibbs Sampling: - Many random variables can be integrated out - Easy to implement - Reasonably fast We assume the relationship is symmetric. 31
Dataset #1: U.S. Senate • 16 years of voting records in the US Senate (1989-2005) • A Senator may respond Yea or Nay to a resolution • 3423 resolutions with text attributes (index terms) • 191 Senators in total across 16 years S. 543 Title: An Act to reform Federal deposit insurance, protect the deposit insurance funds, recapitalize the Bank Insurance Fund, improve supervision and regulation of insured depository institutions, and for other purposes. Sponsor: Sen Riegle, Donald W., Jr. [MI] (introduced 3/5/1991) Cosponsors (2) Latest Major Action: 12/19/1991 Became Public Law No: 102-242. Index terms: Banks and banking, Accounting, Administrative fees, Cost control, Credit, Deposit insurance, Depressed areas and 110 other terms. Adams (D-WA), Nay Akaka (D-HI), Yea Bentsen (D-TX), Yea Biden (D-DE), Yea Bond (R-MO), Yea Bradley (D-NJ), Nay Conrad (D-ND), Nay ... 32
Topics Discovered (U.S. Senate) Mixture of Unigrams: • Education: education, school, aid, children, drug, students, elementary, prevention • Energy: energy, power, water, nuclear, gas, petrol, research, pollution • Military Misc.: government, military, foreign, tax, congress, aid, law, policy • Economic: federal, labor, insurance, aid, tax, business, employee, care Group-Topic Model: • Education + Domestic: education, school, federal, aid, government, tax, energy, research • Foreign: foreign, trade, chemicals, tariff, congress, drugs, communicable, diseases • Economic: labor, insurance, tax, congress, income, minimum, wage, business • Social Security + Medicare: social, security, insurance, medical, care, medicare, disability, assistance 33
Groups Discovered (US Senate) Groups from topic Education + Domestic 34
Senators Who Change Coalition the Most Dependent on Topic, e.g. Senator Shelby (D-AL) votes • with the Republicans on Economic • with the Democrats on Education + Domestic • with a small group of maverick Republicans on Social Security + Medicare 35
Dataset #2: The UN General Assembly • Voting records of the UN General Assembly (1990-2003) • A country may choose to vote Yes, No or Abstain • 931 resolutions with text attributes (titles) • 192 countries in total • Also experiments later with resolutions from 1960-2003 Vote on Permanent Sovereignty of Palestinian People, 87th plenary meeting The draft resolution on permanent sovereignty of the Palestinian people in the occupied Palestinian territory, including Jerusalem, and of the Arab population in the occupied Syrian Golan over their natural resources (document A/54/591) was adopted by a recorded vote of 145 in favour to 3 against with 6 abstentions: In favour: Afghanistan, Argentina, Belgium, Brazil, Canada, China, France, Germany, India, Japan, Mexico, Netherlands, New Zealand, Pakistan, Panama, Russian Federation, South Africa, Spain, Turkey, and 126 other countries. Against: Israel, Marshall Islands, United States. Abstain: Australia, Cameroon, Georgia, Kazakhstan, Uzbekistan, Zambia. 36
Topics Discovered (UN) Mixture of Unigrams: • Everything Nuclear: nuclear, weapons, use, implementation, countries • Human Rights: rights, human, palestine, situation, israel • Security in Middle East: occupied, israel, syria, security, calls Group-Topic Model: • Nuclear Non-proliferation: nuclear, states, united, weapons, nations • Nuclear Arms Race: nuclear, arms, prevention, race, space • Human Rights: rights, human, palestine, occupied, israel 37
Groups Discovered (UN) The countries listed for each group are ordered by their 2005 GDP (PPP), and only 5 countries are shown for groups that have more than 5 members. 38
Groups and Topics, Trends over Time (UN) 40
Outline Social Network Analysis with Topic Models • ✓ Role Discovery (Author-Recipient-Topic Model, ART) • ✓ Group Discovery (Group-Topic Model, GT) • Enhanced Topic Models – Time Localized Topics (Topics-over-Time Model, TOT) – Time Localized Groups (Groups-over-Time Model, GOT) – Markov Dependencies in Topics (Topical N-Grams Model, TNG) • Bibliometric Impact & Transfer Measures using Topics • Multi-Conditional Mixtures [AAAI 2006] 41
Want to Model Trends over Time • Is prevalence of topic growing or waning? • Pattern appears only briefly – Capture its statistics in focused way – Don’t confuse it with patterns elsewhere in time • How do roles, groups, influence shift over time? 42
Topics over Time (TOT) [Wang, McCallum, KDD 2006] [Figure: two graphical model diagrams. Labels: Dirichlet prior; multinomial over topics; topic index z; word w; time stamp t; per-topic multinomial over words with Dirichlet prior; per-topic Beta over time with uniform prior distribution on time stamps; plates over T topics, Nd words and D documents] 43
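The labels on this slide suggest the following generative story for a single document, sketched here as an illustration rather than as the authors' implementation: each token draws a topic from the document's multinomial over topics, then a word from that topic's multinomial, and a time stamp from that topic's Beta distribution. Time is assumed to be normalized to the interval 0 to 1, and all parameter values below are made up.

```python
import random

def generate_tot_tokens(theta, topic_word, topic_beta, n_tokens, rng):
    """Generate (topic, word id, time stamp) triples for one document."""
    tokens = []
    for _ in range(n_tokens):
        z = rng.choices(range(len(theta)), weights=theta, k=1)[0]                  # topic index
        w = rng.choices(range(len(topic_word[z])), weights=topic_word[z], k=1)[0]  # word id
        a, b = topic_beta[z]
        t = rng.betavariate(a, b)                                                  # time stamp from the topic's Beta
        tokens.append((z, w, t))
    return tokens

rng = random.Random(0)
theta = [0.7, 0.3]                           # document's distribution over two topics (assumed)
topic_word = [[0.9, 0.1], [0.2, 0.8]]        # per-topic distributions over a two-word vocabulary (assumed)
topic_beta = [(2.0, 5.0), (5.0, 2.0)]        # topic 0 peaks early in time, topic 1 peaks late (assumed)
tokens = generate_tot_tokens(theta, topic_word, topic_beta, 20, rng)
```

Because each topic has its own Beta over time, a topic that is active only briefly can be captured by a sharply peaked distribution instead of being smeared across the whole timeline.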
State of the Union Address 208 Addresses delivered between January 8, 1790 and January 29, 2002. To increase the number of documents, we split the addresses into paragraphs and treated them as ‘documents’. One-line paragraphs were excluded. Stop words were removed. • 17,156 ‘documents’ • 21,534 words • 669,425 tokens Our scheme of taxation, by means of which this needless surplus is taken from the people and put into the public Treasury, consists of a tariff or duty levied upon importations from abroad and internal-revenue taxes levied upon the consumption of tobacco and spirituous and malt liquors. It must be conceded that none of the things subjected to internal-revenue taxation are, strictly speaking, necessaries. There appears to be no just complaint of this taxation by the consumers of these articles, and there seems to be nothing so well able to bear the burden without hardship to any portion of the people. 1910 44
Comparing TOT against LDA 47
TOT on 17 years of NIPS proceedings 48
Topic Distributions Conditioned on Time [Figure: topic mass (in vertical height) plotted against time] 49
TOT on 17 years of NIPS proceedings TOT LDA 50
TOT versus LDA on my email 51
Discovering Group Structure Trends over Time [Figure: graphical models of the Group Model without Time and the Group Model with Time. Labels: multinomial distribution over groups; group id; per group-pair binomial over relation absent / present; observed relation; per group beta over time; timestamp; G groups] 53
Outline Social Network Analysis with Topic Models • ✓ Role Discovery (Author-Recipient-Topic Model, ART) • ✓ Group Discovery (Group-Topic Model, GT) • Enhanced Topic Models – ✓ Time Localized Topics (Topics-over-Time Model, TOT) – ✓ Time Localized Groups (Groups-over-Time Model, GOT) – Markov Dependencies in Topics (Topical N-Grams Model, TNG) • Bibliometric Impact & Transfer Measures using Topics • Multi-Conditional Mixtures [AAAI 2006] 54
Topics Modeling Phrases • Topics based only on unigrams are often difficult to interpret • Topic discovery itself is confused because important meaning / distinctions are carried by phrases. 55
Topic Interpretability • LDA: algorithm, genetic, problems, efficient • Topical N-grams: genetic algorithms, genetic algorithm, evolutionary computation, evolutionary algorithms, fitness function 56
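Topical N-gram Model [Wang, McCallum 2005] [Figure: graphical model with a chain of topics z1, z2, z3, z4, ...; uni- / bi-gram status y1, y2, y3, y4, ...; words w1, w2, w3, w4, ...; a plate over D documents; per-topic unigram (uni-) and bigram (bi-) word distributions over W words and T topics] 57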
Features of Topical N-Grams model • Easily trained by Gibbs sampling – Can run efficiently on millions of words • Topic-specific phrase discovery – “white house” has special meaning as a phrase in the politics topic, – ... but not in the real estate topic. 58
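The idea of topic-specific phrases can be illustrated with a heavily simplified generative sketch. It is not the full Topical N-Grams model: here the bigram switch depends only on the previous word, and all tables (`unigram`, `bigram`, `bigram_prob`) are hypothetical toy values chosen so that "white" is followed by "house" only in the politics topic.

```python
import random

def generate_tng_words(theta, unigram, bigram, bigram_prob, n_words, rng):
    """Generate (topic, bigram status, word) triples; status 1 means the word continues a phrase."""
    words, prev = [], None
    for _ in range(n_words):
        z = rng.choices(range(len(theta)), weights=theta, k=1)[0]
        follow = bigram.get((z, prev))            # topic-specific continuations of the previous word
        y = 1 if follow and rng.random() < bigram_prob.get(prev, 0.0) else 0
        dist = follow if y == 1 else unigram[z]   # bigram table if continuing a phrase, else unigram topic
        vocab, weights = zip(*dist.items())
        w = rng.choices(vocab, weights=weights, k=1)[0]
        words.append((z, y, w))
        prev = w
    return words

theta = [0.5, 0.5]                                    # topic 0: politics, topic 1: real estate (assumed)
unigram = [
    {"white": 0.5, "vote": 0.5},
    {"white": 0.5, "rent": 0.5},
]
bigram = {(0, "white"): {"house": 1.0}}               # "white house" is a phrase only in topic 0
bigram_prob = {"white": 1.0}                          # toy switch: always try to continue after "white"
rng = random.Random(0)
tokens = generate_tng_words(theta, unigram, bigram, bigram_prob, 30, rng)
```

In the generated sequence, "house" appears only right after "white" and only under the politics topic, mirroring the slide's example.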
Topic Comparison • LDA: learning, optimal, reinforcement, state, problems, policy, dynamic, action, programming, actions, function, markov, methods, decision, rl, continuous, spaces, step, policies, planning • Topical N-grams (2): reinforcement learning, optimal policy, dynamic programming, optimal control, function approximator, prioritized sweeping, finite-state controller, learning system, reinforcement learning rl, function approximators, markov decision problems, markov decision processes, local search, state-action pair, markov decision process, belief states, stochastic policy, action selection, upright position, reinforcement learning methods • Topical N-grams (1): policy, action, states, actions, function, reward, control, agent, q-learning, optimal, goal, learning, space, step, environment, system, problem, steps, sutton, policies 59
Topic Comparison • LDA: motion, visual, field, position, figure, direction, fields, eye, location, retina, receptive, velocity, vision, moving, system, flow, edge, center, light, local • Topical N-grams (2): receptive field, spatial frequency, temporal frequency, visual motion energy, tuning curves, horizontal cells, motion detection, preferred direction, visual processing, area mt, visual cortex, light intensity, directional selectivity, high contrast, motion detectors, spatial phase, moving stimuli, decision strategy, visual stimuli • Topical N-grams (1): motion, response, direction, cells, stimulus, figure, contrast, velocity, model, responses, stimuli, moving, cell, intensity, population, image, center, tuning, complex, directions 60
Topic Comparison • LDA: word, system, recognition, hmm, speech, training, performance, phoneme, words, context, systems, frame, trained, speaker, sequence, speakers, mlp, frames, segmentation, models • Topical N-grams (2): speech recognition, training data, neural network, error rates, neural net, hidden markov model, feature vectors, continuous speech, training procedure, continuous speech recognition, gamma filter, hidden control, speech production, neural nets, input representation, output layers, training algorithm, test set, speech frames, speaker dependent • Topical N-grams (1): speech, word, training, system, recognition, hmm, speaker, performance, phoneme, acoustic, words, context, systems, frame, trained, sequence, phonetic, speakers, mlp, hybrid 61
Outline Social Network Analysis with Topic Models • ✓ Role Discovery (Author-Recipient-Topic Model, ART) • ✓ Group Discovery (Group-Topic Model, GT) • Enhanced Topic Models – ✓ Time Localized Topics (Topics-over-Time Model, TOT) – ✓ Time Localized Groups (Groups-over-Time Model, GOT) – ✓ Markov Dependencies in Topics (Topical N-Grams Model, TNG) • Bibliometric Impact & Transfer Measures using Topics • Multi-Conditional Mixtures [AAAI 2006] 62
Social Networks in Research Literature • Better understand structure of our own research area. • Structure helps us learn a new field. • Aid collaboration • Map how ideas travel through social networks of researchers. • Aids for hiring and finding reviewers! 63
Traditional Bibliometrics • Analyses a small amount of data (e.g. 19 articles from a single issue of a journal) • Uses “journal” as a proxy for “research topic” (but there is no journal for information extraction) • Uses impact measures almost exclusively based on simple citation counts. How can we use topic models to create new, interesting impact measures? Can we create a social network of scientific sub-fields? 64
Our Data • Over 1.6 million research papers, gathered as part of the Rexa.info portal. • Cross-linked references / citations. 65
Previous Systems 66
Previous Systems Cites Research Paper 68
More Entities and Relations Expertise Cites Research Paper Grant Venue Person University Groups 69
Finding Topics with TNG Traditional unigram LDA run on 1.6 million titles / abstracts (200 topics) ... select ~300k papers on ML, NLP, robotics, vision ... Find 200 TNG topics among those papers. 81
Topical Bibliometric Impact Measures [Mann, Mimno, McCallum, 2006] • Topical Citation Counts • Topical Impact Factors • Topical Longevity • Topical Precedence • Topical Diversity • Topical Transfer 82
Topical Diversity Can also be measured on particular papers. . . 83
Topical Diversity Entropy of the topic distribution among papers that cite this paper (this topic). Low Diversity High Diversity 84
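The entropy definition above is easy to compute once each citing paper has been given a topic. The sketch below makes a simplifying assumption that every citing paper carries a single topic label; the function name and the example labels are illustrative only.

```python
import math
from collections import Counter

def topical_diversity(citing_topics):
    """Entropy (in bits) of the topic distribution among the papers that cite a given paper."""
    counts = Counter(citing_topics)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy

# A paper cited only from one topic has low diversity ...
low = topical_diversity(["vision", "vision", "vision", "vision"])
# ... while a paper cited evenly from four topics has high diversity.
high = topical_diversity(["vision", "nlp", "robotics", "learning"])
```

The same function can be applied to a whole topic by pooling the citing papers of all papers assigned to it.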
Topical Transfer from Digital Libraries to other topics (other topic, citations: paper title) • Web Pages, 31: Trawling the Web for Emerging Cyber-Communities, Kumar, Raghavan, ... 1999. • Computer Vision, 14: On being ‘Undigital’ with digital cameras: extending the dynamic... • Video, 12: Lessons learned from the creation and deployment of a terabyte digital video • Graphs, 12: Trawling the Web for Emerging Cyber-Communities • Web Pages, 11: WebBase: a repository of Web pages 85
Topical Transfer Citation counts from one topic to another. Map “producers and consumers” 86
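Counting citations from one topic to another amounts to building a small matrix of (citing topic, cited topic) pairs. The sketch below assumes each paper has one topic label; the paper ids, labels, and citation pairs are invented purely to show the bookkeeping.

```python
from collections import Counter

def topical_transfer(citations, paper_topic):
    """Count citations by (topic of citing paper, topic of cited paper)."""
    counts = Counter()
    for citing, cited in citations:
        counts[(paper_topic[citing], paper_topic[cited])] += 1
    return counts

# Invented example: paper id -> topic label, and (citing, cited) pairs.
paper_topic = {"p1": "Digital Libraries", "p2": "Web Pages", "p3": "Web Pages", "p4": "Video"}
citations = [("p2", "p1"), ("p3", "p1"), ("p4", "p1"), ("p3", "p2")]
transfer = topical_transfer(citations, paper_topic)
```

A topic that is cited heavily by other topics acts as a "producer", and a topic that mostly cites others acts as a "consumer".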
Topical Transfer Through Time • Can we predict which research topics will be “hot” at ICML next year? • . . . based on – the hot topics in “neighboring” venues last year – learned “neighborhood” distances for venue pairs 94
How do Ideas Progress Through Social Networks? Hypothetical Example: “AdaBoost” SIGIR (Info. Retrieval) COLT ICML ICCV (Vision) ACL (NLP) 95
Topic Prediction Models: Static Model, Transfer Model. Linear Regression and Ridge Regression used for coefficient training. 99
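The slides do not spell out the model equations, so the following is only a generic sketch of ridge regression as it might be used to train such coefficients. The assumed setup is that each feature is a topic's share in one neighboring venue last year and the target is its share at the venue of interest this year; the function name and the toy numbers are hypothetical.

```python
def ridge_fit(X, y, lam):
    """Solve (X^T X + lam * I) w = X^T y by Gaussian elimination with partial pivoting."""
    n = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    b = [sum(row[i] * target for row, target in zip(X, y)) for i in range(n)]

    # Forward elimination.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]

    # Back substitution.
    weights = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(A[i][j] * weights[j] for j in range(i + 1, n))
        weights[i] = (b[i] - rest) / A[i][i]
    return weights

def predict(weights, features):
    """Linear prediction from learned coefficients."""
    return sum(w * x for w, x in zip(weights, features))

# Toy data (assumed): two neighboring venues, two training examples.
X = [[1.0, 0.0], [0.0, 1.0]]
y = [2.0, 4.0]
plain = ridge_fit(X, y, lam=0.0)    # ordinary least squares
shrunk = ridge_fit(X, y, lam=1.0)   # the ridge penalty shrinks the coefficients
```

Setting `lam` to zero gives plain linear regression, so the same routine covers both training methods named on the slide.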
Preliminary Results [Figure: mean squared prediction error (smaller is better) of the Transfer Model against the number of venues used for prediction] Transfer Model with Ridge Regression is a good predictor 100
Estimated Neighborhood Distances Transfer into NIPS, 1988-1989 • ML: 0.079 • Neural Computation: 0.023 • UAI: -0.0035 • PAMI: 0.0998 • Theoretical CS: 0.0955 • AI: 0.032 • AAAI: 0.082 101
Outline Social Network Analysis with Topic Models • ✓ Role Discovery (Author-Recipient-Topic Model, ART) • ✓ Group Discovery (Group-Topic Model, GT) • ✓ Enhanced Topic Models – ✓ Time Localized Topics (Topics-over-Time Model, TOT) – ✓ Time Localized Groups (Groups-over-Time Model, GOT) – ✓ Markov Dependencies in Topics (Topical N-Grams Model, TNG) • ✓ Bibliometric Impact & Transfer Measures using Topics • Multi-Conditional Mixtures [AAAI 2006] 102
Outline Social Network Analysis with Topic Models • ✓ Role Discovery (Author-Recipient-Topic Model, ART) • ✓ Group Discovery (Group-Topic Model, GT) • Enhanced Topic Models – ✓ Time Localized Topics (Topics-over-Time Model, TOT) – ✓ Time Localized Groups (Groups-over-Time Model, GOT) – ✓ Markov Dependencies in Topics (Topical N-Grams Model, TNG) • ✓ Bibliometric Impact & Transfer Measures using Topics • Multi-Conditional Mixtures [AAAI 2006] 103
Want a “topic model” with the advantages of CRFs • Use arbitrary, overlapping features of the input. • Undirected graphical model, so we don’t have to think about avoiding cycles. • Integrate naturally with our other CRF components. • Train “discriminatively” (What does this mean? Topic models are unsupervised!) • Natural semi-supervised training 104
“Multi-Conditional Mixtures” Latent Variable Models fit by Multi-way Conditional Probability [McCallum, Wang, Pal, 2005], [McCallum, Pal, Druck, Wang, 2006] • For clustering structured data, à la Latent Dirichlet Allocation & its successors • But an undirected model, like the Harmonium [Welling, Rosen-Zvi, Hinton, 2005] • But trained by a “multi-conditional” objective: O = P(A|B, C) P(B|A, C) P(C|A, B) e.g. A, B, C are different modalities 105
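To make the multi-conditional objective concrete, the sketch below evaluates log O for one observation from an explicit joint probability table over three discrete variables. This is a toy illustration only: a real model would compute each conditional from its parameters rather than from a stored table, and the function name and the uniform table are assumptions of the example.

```python
import math

def multi_conditional_log_objective(joint, a, b, c):
    """Return log P(a|b,c) + log P(b|a,c) + log P(c|a,b) for a joint table indexed as joint[a][b][c]."""
    p_abc = joint[a][b][c]
    p_bc = sum(joint[i][b][c] for i in range(len(joint)))          # marginalize out A
    p_ac = sum(joint[a][j][c] for j in range(len(joint[a])))       # marginalize out B
    p_ab = sum(joint[a][b][k] for k in range(len(joint[a][b])))    # marginalize out C
    return math.log(p_abc / p_bc) + math.log(p_abc / p_ac) + math.log(p_abc / p_ab)

# Uniform joint over three binary variables: every conditional equals 0.5.
uniform = [[[0.125 for _ in range(2)] for _ in range(2)] for _ in range(2)]
log_o = multi_conditional_log_objective(uniform, 0, 1, 0)
```

Each term asks the model to predict one modality from the other two, which is what distinguishes this objective from the joint likelihood P(A, B, C).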
Objective Functions for Parameter Estimation Traditional, joint training (e.g. naive Bayes, most topic models) Traditional mixture model (e.g. LDA) Traditional, conditional training (e.g. MaxEnt classifiers, CRFs) New, multi-conditional Conditional mixtures (e.g. Jebara’s CEM, McCallum CRF string edit distance, ...) Multi-conditional (mostly conditional, generative regularization) Multi-conditional (for semi-sup) Multi-conditional (for transfer learning, 2 tasks, shared hiddens) 106
“Multi-Conditional Learning” (Regularization) [McCallum, Pal, Wang, 2006] 107
Predictive Random Fields: mixture of Gaussians on synthetic data. Data, classify by color Generatively trained [McCallum, Wang, Pal, 2005] Multi-Conditionally-trained [Jebara 1998] 109
Multi-Conditional Mixtures vs. Harmonium on document retrieval task [McCallum, Wang, Pal, 2005] • Multi-Conditional, multi-way conditionally trained • Conditionally-trained, to predict class labels • Harmonium, joint, with class labels and words • Harmonium, joint with words, no labels 110
Multi-Conditional “Topics” Strong positive and negative indicators 111
Outline Social Network Analysis with Topic Models • Role Discovery (Author-Recipient-Topic Model, ART) • Group Discovery (Group-Topic Model, GT) • Enhanced Topic Models – Correlations among Topics (Pachinko Allocation, PAM) – Time Localized Topics (Topics-over-Time Model, TOT) – Markov Dependencies in Topics (Topical N-Grams Model, TNG) • Bibliometric Impact Measures enabled by Topics Multi-Conditional Mixtures 112
Summary 113