
  • Number of slides: 45

VISA: A VIsual Sentiment Analysis System
Dongxu Duan 1, Weihong Qian 1, Shimei Pan 2, Lei Shi 3, Chuang Lin 4
1 IBM Research, China
2 IBM T. J. Watson Research Center
3 Institute of Software, Chinese Academy of Sciences
4 Tsinghua University
Sept. 2012

What is Sentiment Analysis
• Sentiment analysis or opinion mining refers to the application of natural language processing, computational linguistics, and text analytics to identify and extract subjective information in source materials. (From Wikipedia)
• A survey of sentiment analysis work by Pang and Lee in 2008: “Opinion mining and sentiment analysis”, cited 1189 times in Google Scholar, including 326 references
• Probably the earliest study: (shown as an image on the original slide)

Motivation
• The truth: sentiment analysis is becoming even more important
– Corporate
* Brand analysis, sales campaign design, etc.
* Crisis relationship management
– Government
* As we all know...
• Observations:
– Sentiment analysis technologies are becoming deeper and more versatile
* Aspect-oriented, domain-specific lexicon expansion, MT technology
– Average users still rely on rather simple sentiment results
* It is hard for them (even domain experts) to understand sophisticated SA results
– There is a big gap and huge potential for sentiment visualization (visual opinion mining)

Agenda
• Related Works
• Research Problem and Challenges
• Sentiment-Tuple based Data Model
• VISA System Framework
• Visualization Optimizations
• Cases
• User Studies
• Summary

Basic Sentiment Representation
• Raw text/table or simple visualization

Brand Association Map

COBRA (COrporate Brand Reputation Analysis)
Behal et al. (HCI 2009)

Opinion Observer
Liu et al. (KDD 2005); Liu et al. (IW3C2 2005)

Visual Sentiment Analysis of RSS News Feeds
Wanner et al. (VISSW 2009)

Pulse: Mining Customer Opinions from Free Text
Gamon et al. (IDA 2005)

Visualizing Sentiments in Financial Texts
Ahmad and Almas (IV 2005)

Visual Analysis of Conflicting Opinions
Chen et al. (VAST 2006)

Who Votes For What? A Visual Query Language for Opinion Data
Draper and Riesenfeld (Vis 2008)

Visual Opinion Analysis of Customer Feedback Data
Oelke et al. (VAST 2009)
• Figure panels: summary report of printers; scatterplot of customer reviews on printers; circular correlation map

OpinionSeer: Interactive Visualization of Hotel Customer Feedback
Wu et al. (InfoVis 2010)

Taking the Pulse of the Web: Assessing Sentiment on Topics in Online Media
Brew et al. (WebSci 2010)

Understanding Text Corpora with Multiple Facets
Shi et al. (VAST 2010)

Research Problem
• Can we design a sentiment visualization system that:
– Shows how sentiment evolves over time (trend)
– Visualizes both the sentiment analysis results and the structured facet data, e.g. the profile of the reviewer (facet)
– Rather than only showing which document or feature tends to be positive or negative, also demonstrates how the positives/negatives are described in documents (context)
• Most existing sentiment visualizations fail to meet all of these requirements simultaneously
– Our VISA design is based on the TIARA prototype, which already brings together most of these features (trend, context, facet switching)

Retrospect on TIARA Visualization (Emergency Room Record)

Challenges for TIARA Sentiment Visualization
• Failure of the document trend visualization
– Binary/ternary/scored classification of document-level sentiment drops valuable pieces
– Review text shown on the slide: “BUT: It has BED BUGS and they BITE me!!!”

Challenges for TIARA Sentiment Visualization
• Keyword Summarization
– The content visualized is keywords summarized from all the text, which does not reflect the sentiment-centric design
• Structured Facet
– Sentiment-aware facet associations and distributions
– Spatial (location) information
• Comparison
– Categorical and temporal comparison, as well as sentiment comparison
• Compatibility with sentiment analysis engines
– Consumability of all kinds of sentiment analysis results

Sentiment Tuple
• {Aspect, feature, opinion, polarity}
– Aspect: a sub-topic shared by some documents, e.g. in a hotel review, the room, the view, or the service
– Feature: the specific object the users are commenting on, e.g. an entity, person, location, or abstract concept
– Opinion: a particular word or phrase describing a feature
– Polarity: the polarity of the opinion word/phrase in context
• Example: { “view”, + }
• Diagram: Sentiment Analysis Model → tuples (aspect: feature: opinion: polarity) …… → Aggregate
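
The tuple definition above can be sketched as a small data structure. This is only an illustration, not the VISA implementation: the class name `SentimentTuple`, the integer polarity encoding (+1, -1, 0), the `aggregate` helper, and the sample hotel tuples are all assumptions made for this example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SentimentTuple:
    """One {aspect, feature, opinion, polarity} record (hypothetical layout)."""
    aspect: str    # sub-topic, e.g. "view" or "room"
    feature: str   # object being commented on
    opinion: str   # word or phrase describing the feature
    polarity: int  # assumed encoding: +1 positive, -1 negative, 0 neutral


def aggregate(tuples):
    """Count tuples per (aspect, polarity) pair."""
    return Counter((t.aspect, t.polarity) for t in tuples)


# Made-up sample tuples, for illustration only.
tuples = [
    SentimentTuple("view", "harbor", "stunning", +1),
    SentimentTuple("room", "bed", "dirty", -1),
    SentimentTuple("view", "window", "small", -1),
    SentimentTuple("view", "skyline", "beautiful", +1),
]
counts = aggregate(tuples)
```

Counting per (aspect, polarity) is one simple reading of the "Aggregate" step in the diagram; the slides do not specify how the aggregation is actually performed.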

Keyword Summarization (TIARA)
• kth document in the collection → a set of topic probabilities {…, P(Ti | Dk), …}
• A set of topics {T1, …, Ti, …, TN}: rank the topics to present the most valuable ones first
• A set of keywords {W1, …, Wj, …, WM} → a set of word probabilities {…, P(Wj | Ti), …}
• Select a keyword subset for each time segment for content summary: {…}t-1, {…, Wj, …}t, {…}t+1
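
A minimal sketch of the two steps named on this slide, topic ranking and per-segment keyword selection. The slide does not give TIARA's actual ranking or selection criteria, so both rules here are assumptions: topics are ranked by their summed P(T | D) over all documents, and keywords are the top-k words by P(W | T) among the words that occur in the time segment. All names and numbers are made up.

```python
def rank_topics(doc_topic_probs):
    """Rank topics by total probability mass over all documents (assumed criterion)."""
    totals = {}
    for probs in doc_topic_probs.values():
        for topic, p in probs.items():
            totals[topic] = totals.get(topic, 0.0) + p
    # Highest total first; ties broken by topic name for determinism.
    return sorted(totals, key=lambda t: (-totals[t], t))


def keywords_for_segment(topic, word_probs, segment_words, k=2):
    """Top-k words of `topic` by P(W | T), restricted to words seen in the segment."""
    candidates = {w: p for w, p in word_probs[topic].items() if w in segment_words}
    return sorted(candidates, key=lambda w: (-candidates[w], w))[:k]


# Illustrative inputs: P(T | D) per document and P(W | T) per topic.
doc_topic_probs = {
    "d1": {"T1": 0.8, "T2": 0.2},
    "d2": {"T1": 0.3, "T2": 0.7},
    "d3": {"T1": 0.6, "T2": 0.4},
}
word_probs = {
    "T1": {"pain": 0.5, "chest": 0.3, "fever": 0.2},
    "T2": {"cough": 0.6, "fever": 0.4},
}

ranked = rank_topics(doc_topic_probs)
segment_keywords = keywords_for_segment("T1", word_probs, {"chest", "fever", "cough"})
```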

VISA Sentiment Keyword Summarization
• kth document in the collection → a set of topic probabilities {…, P(Ti | Dk), …}
• Aspects/Hotels {C1, …, Ci, …, CN}: let the user select to compare aspects of a hotel or an aspect of several hotels
• A set of sentiment keywords (opinions/features) {W1, …, Wj, …, WM} → a set of word probabilities {…, P(Wj | Ti), …}
• Select a keyword subset for each time segment for sentiment summary: {…}t-1, {…, Wj, …}t, {…}t+1

VISA Mashup Visualization
• Search Filters
• Sentiment Tuple Trend
• Sentiment-Centric Document Ranking
• Sentiment Snippets
• Facet Correlations

VISA Sentiment Visualization Framework
• Offline:
– Document pre-processing
– Sentiment analysis
– Meta data parsing
– Indexing
• Online:
– Data retrieval
– Visualization
– Interactions

Offline Analysis
• Data Analysis Framework (diagram labels): Raw Data, Reader, Filter, OpenNLP Extractor, Segment Extractor, Sentence Extractor, Text Extractor, Dictionary, Sentiment, Entity, Class, No/Not, StatisticManager, Entity Policy
• Output: Sentiment Data and Meta Data (aspect: feature: opinion: polarity), IndexWriter

Offline Analysis
• Diagram labels: Raw Data, Reader, 3rd Party Sentiment Analysis Framework
• Output: Sentiment Data and Meta Data (aspect: feature: opinion: polarity), IndexWriter

Data Server
• Diagram labels: VISA, Hermes, HttpServlet, Query Parser, Data Adapter, Data Retrieval, Lucene Index

Sentiment Trend Optimizations
• Sentiment tuple based negative/positive/(neutral) trends
• Time-sensitive feature/opinion words
• Y axis: sentiment value (positive, negative); X axis: time
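
One way to picture the tuple-based trend is to bucket tuple polarities by time segment and count positives and negatives separately, which gives the two series plotted against the time axis. The sketch below assumes monthly segments, an integer polarity (+1/-1/0), and that neutral tuples are simply not counted; none of these choices is stated in the slides, and the sample records are made up.

```python
from collections import defaultdict
from datetime import date


def sentiment_trend(records):
    """Count positive and negative tuples per (year, month) segment.

    `records` is an iterable of (date, polarity) pairs; polarity 0 (neutral)
    is ignored in this sketch.
    """
    trend = defaultdict(lambda: {"positive": 0, "negative": 0})
    for day, polarity in records:
        key = (day.year, day.month)
        if polarity > 0:
            trend[key]["positive"] += 1
        elif polarity < 0:
            trend[key]["negative"] += 1
    # Return segments in chronological order.
    return dict(sorted(trend.items()))


# Illustrative records only.
records = [
    (date(2011, 7, 3), +1),
    (date(2011, 7, 21), -1),
    (date(2011, 7, 25), +1),
    (date(2011, 6, 30), -1),
    (date(2011, 6, 12), 0),
]
trend = sentiment_trend(records)
```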

Sentiment-Centric Interactions

Case Study: Summarizing Hotel Reviews
• Initial view

Case Study: Summarizing Hotel Reviews
• Switch to the “Family” type only (traveling in this type)

Case Study: Summarizing Hotel Reviews
• Click on the “Free” sentiment word (want to enjoy the free time or free breakfast?)
• It’s 30 min distance from the harbor!

Case Study: Summarizing Hotel Reviews
• For two selected hotels
• Drill down to the “cleanliness” and “room” aspects
• Switch to the negative sentiments

Case Study: Summarizing Hotel Reviews
• Comparing the recent reviews

Case Study: NFL on Twitter
• Tweets crawled from Twitter on the topic of the National Football League (NFL), from 03/2011 to 08/2011 (when the famous lockout happened)
• 665360 tweets from 307973 users, with an average length of 16.8 words
• Tweet collection pre-processing:
– Classify into 5 content topics: “season play”, “player draft”, “lockout bad”, “lockout end” and “football return”
– Categorize according to the subject of the sentiments: 32 NFL teams, by manually creating a relevant subject keyword list for each team (full name/nickname, city, stadium, head coach, owner and star players)
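
The subject categorization step can be sketched as a whole-word keyword lookup. The keyword lists below are short illustrative stand-ins, not the manually built lists used in the study, and the function name `subjects_of` is made up for this example.

```python
import re

# Hypothetical, abbreviated keyword lists (nickname, city, stadium).
TEAM_KEYWORDS = {
    "Green Bay Packers": ["packers", "green bay", "lambeau"],
    "Pittsburgh Steelers": ["steelers", "pittsburgh", "heinz field"],
}


def subjects_of(tweet, team_keywords=TEAM_KEYWORDS):
    """Return the teams whose keywords appear as whole words in the tweet."""
    text = tweet.lower()
    matched = []
    for team, keywords in team_keywords.items():
        # \b anchors avoid matching a keyword inside a longer word.
        if any(re.search(r"\b" + re.escape(k) + r"\b", text) for k in keywords):
            matched.append(team)
    return matched
```

A tweet can mention several teams, so the function returns a list rather than a single label; how the study handled multi-team tweets is not stated in the slides.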

Case Study: NFL on Twitter
• Overview of sentiments on content topics
– Reaches its peak in July, when the new CBA was signed

Case Study: NFL on Twitter
• Subject-comparing view on 4 NFL teams
– “Green Bay Packers”, “Pittsburgh Steelers”, “New York Jets”, “New England Patriots”
– A very large RED “CBA” for the Steelers: the only team to vote “NO” to the CBA
– “Brett Favre” for the Packers: the former NFL all-star quarterback of the Packers, who has claimed several times that he would return. Fans are tired of such news.

User Study: Setup
• Subject
– VISA system with all functionalities
– TripAdvisor.com
– A plain text editor with search function
• Data
– HK hotel cases with 3 hotels’ reviews
– Both structured (ratings) and unstructured (review comments) data inputs
• User
– 12 users (7 male, 5 female), age 26~35
– Each is given a gift as an incentive
• Task
– T1: look up specific sentiment-related information of a hotel (e.g. travelers’ ratings)
– T2: summarize opinions on a general aspect of a hotel (e.g. the view of a hotel)
• Procedure
– Within-subject design: users perform all tasks with all the systems
– Record user demographics, time of completion, satisfaction, and answers to open-ended questions
• Screenshots on the slide: TripAdvisor, Text Editor, VISA

User Study: Objective Results
• Three metrics: elapsed time (in minutes), task completion rate, and task correctness
• Significant advantages of VISA over the compared systems (t-test significance p < 0.004~0.034)

User Study: Subjective Results
• Three metrics: usefulness, usability, and satisfaction

User Study: Open Surveys
• Why VISA is thought better than the baseline systems:
– “mash-up visualizations” and “rich interactions”
– “Mash-up visualizations provide more information and it’s quite intuitive”, “rich interactions make it easy to search what I want to know”
• Improvements to VISA: “it now needs some learning efforts to use VISA”, “It could introduce better UI design and richer interactions”

Summary
• We have presented the VISA system for generic sentiment visualization
– The backend core is the new sentiment-tuple definition, together with the faceted data model
– In visualization, we introduce several critical optimizations over TIARA for sentiment visualization scenarios: sentiment-tuple based trending, sentiment keywords, comparison, sentiment in document context, and interactions
– Evaluated with two real-life case studies
– Conducted a formal user study comparing VISA with two baseline systems, which demonstrated a clear advantage

Thank You (English) · Gracias (Spanish) · Obrigado (Brazilian Portuguese) · Grazie (Italian) · Danke (German) · Merci (French)