
INARC Task 1 (I1.1): Fundamentals of Context-aware Real-time Data Fusion

Fundamentals of Multi-modal Data Fusion on Multimedia Information Networks
Principal Investigator: Thomas Huang
Postdoctoral Researcher: Xi Zhou
Ph.D. Student: Guo-Jun Qi
Electrical and Computer Engineering, University of Illinois at Urbana-Champaign

Project Team
• Principal Investigator: Thomas Huang
• Collaborators:
  – IBM: Charu Aggarwal (QoI and sensor networks)
  – IBM: Zhen Wen (social networks)
  – UIUC: Tarek Abdelzaher (communication networks)
  – CUNY: Heng Ji (natural language processing)
• Postdoctoral researcher: Xi Zhou
• Ph.D. students: Guo-Jun Qi, Mert Dickman, and Zhaowen Wang
• Undergraduate student: Shiyu Chang

Motivation
• Structured information networks
  – Can handle heterogeneous structures with various input types
  – Can effectively model large structured ontological networks at the semantic level
  – Structure is a way to represent context
• Utilization
  – Efficient and effective inference engine
  – Information and knowledge extraction from ontological networks

Contributions to I1.1
• Connections to the constrained conditional model (CCM)
  – Discover constraint links
    • between heterogeneous objects
    • between concept nodes
• Connections to latency analysis
  – Reveal cross-media redundancy and relationships
  – Trade-off between low latency and high quality

Multimedia Information Network
• A graph with both data nodes and concept nodes
  – Edges linking concepts: ontology
  – Edges linking data nodes: similarity, association, and co-occurrence
  – Edges linking concepts and data: attachment of a concept to data

Multimedia Information Network (MINet)
• Data nodes: heterogeneous networks with cross-media contents
  – Videos/images/speech
  – Surrounding text/user tags
  – GPS metadata
• Concept nodes: ontological networks with correlated categories
  – Non-flat concept structure
  – Example links between concepts
    • A is a subclass of B
    • C is a part of D
    • X attacks Y
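To make the node and edge types concrete, here is a minimal Python sketch of a MINet container. The class name `MINet`, its methods, and the example node names are hypothetical illustrations, not the project's implementation; the only rule taken from the slides is that an edge's kind (ontology, data, or attachment) follows from whether its endpoints are concept nodes or data nodes.

```python
from dataclasses import dataclass, field


@dataclass
class MINet:
    """Hypothetical minimal multimedia information network (illustration only)."""
    data_nodes: dict = field(default_factory=dict)    # node id -> media type
    concept_nodes: set = field(default_factory=set)   # concept names
    edges: list = field(default_factory=list)         # (u, v, kind, label)

    def add_data(self, node_id, media):
        self.data_nodes[node_id] = media

    def add_concept(self, name):
        self.concept_nodes.add(name)

    def link(self, u, v, label):
        """Add an edge; its kind is derived from the endpoint types."""
        for n in (u, v):
            if n not in self.data_nodes and n not in self.concept_nodes:
                raise KeyError(f"unknown node: {n}")
        u_concept = u in self.concept_nodes
        v_concept = v in self.concept_nodes
        if u_concept and v_concept:
            kind = "ontology"      # concept-concept, e.g. subclass_of, part_of
        elif not u_concept and not v_concept:
            kind = "data"          # similarity, association, co-occurrence
        else:
            kind = "attachment"    # a concept attached to a data node
        self.edges.append((u, v, kind, label))
        return kind

    def concepts_of(self, data_id):
        """Concepts attached to a data node, in insertion order."""
        return [v if u == data_id else u
                for u, v, kind, _ in self.edges
                if kind == "attachment" and data_id in (u, v)]


net = MINet()
net.add_data("img1", "image")
net.add_data("txt1", "text")
net.add_concept("vehicle")
net.add_concept("tank")
print(net.link("tank", "vehicle", "subclass_of"))   # ontology
print(net.link("img1", "txt1", "co-occurrence"))    # data
print(net.link("img1", "tank", "tagged"))           # attachment
print(net.concepts_of("img1"))                      # ['tank']
```

Deriving the edge kind from the endpoints, rather than passing it in, keeps the three edge families on the previous slide mutually exclusive by construction.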

Network Structure

Potential Army Impact
• Construct large-scale MINets combining
  – Cross-media heterogeneous data networks
    • Examples: battlefield videos/images, satellite images, acoustic sensor signals
  – Ontological concept networks
    • Military-related concepts and their links
• Make better military decisions
  – More timely and more accurate
  – More robust to missing information

Technical Contributions
• Cross-Domain Knowledge Propagation
  – Propagating knowledge in surrounding text to visual data
  – Published in WWW'11, collaboration with Dr. Charu Aggarwal, IBM
• Cross-Category Knowledge Sharing
  – Exploring concept correlations to enhance inference accuracy
  – To appear in CVPR'11, collaboration with Dr. Charu Aggarwal, IBM
• Modeling Context-Aware Image Similarity
  – Using Hierarchical Gaussianization (HG), ICCV'09
  – Application to disaster assessment (collaboration with Prof. Tarek Abdelzaher)
  – Submitted to KDD'11

Cross-Domain Knowledge Propagation: Two Steps
• How can we bridge the domain gap between text and images?
  – Our approach: construct a translator function between text and images that establishes "virtual" links between them.
• How can we annotate images using text labels?
  – Our approach: propagate the text labels to the images via the learned translator.

Challenges
• The model can
  – Work in a constrained environment
    • Missing links between text and images
    • Learn a translation function to link text and images
  – Be resistant to noisy cross-media links, improving QoI
    • Misleading text surrounding images
    • Use a compact intermediate representation to remove nonessential and noisy links
  – Follow a low-rank principle: use the fewest topics for cross-domain translation

Cross-Domain Label Propagation
[Figure, built up over four slides: source labels → cross-domain translator → prediction function]
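The figure chains source labels, a cross-domain translator T, and a prediction function on images. A natural reading, which we assume here rather than take from the slides, is that an image's score sums the source labels weighted by the translator's response to each labelled text. The bilinear form x^T S y used for T, the function names, and the toy numbers are all hypothetical.

```python
def bilinear_translator(S):
    """Hypothetical translator T(x, y) = x^T S y (text features x, image features y)."""
    def T(x, y):
        return sum(x[i] * S[i][j] * y[j]
                   for i in range(len(x)) for j in range(len(y)))
    return T


def predict(source_items, source_labels, target, T):
    """f_T(target) = sum_k y_k * T(x_k, target), with labels y_k in {-1, +1}."""
    return sum(y * T(x, target) for x, y in zip(source_items, source_labels))


# Toy example: two labelled text documents and an identity translator.
texts = [[1.0, 0.0], [0.0, 1.0]]
labels = [+1, -1]
T = bilinear_translator([[1.0, 0.0], [0.0, 1.0]])
score = predict(texts, labels, [0.9, 0.2], T)
print(score > 0)  # True: the image responds more to the positive text
```

No image label is needed to compute the score; the labels come entirely from the text side, which is the point of propagating knowledge across domains.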

Learning Optimal Translator
Learning formulation via optimizing the translator function:
• The first term: maximize cross-domain association over a set of co-occurring source-target instance pairs.
• The second term: minimize the training loss.
• The third term: a regularizer that prefers a concise translator to a tedious one.
• Improve QoI: remove nonessential and noisy observations from the translation process.
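The formula on this slide did not survive extraction. As a hedged sketch only, an objective with the three terms listed could take the shape below. The symbols are our own labels: co-occurrence set 𝒞 with counts c_{k,l}, a monotonically decreasing function χ, a training loss ℓ, trade-off weights γ and λ, and a translator matrix S; the trace norm in the third term follows the slide on low-rank translators.

```latex
\min_{S}\;
\gamma \sum_{(k,l)\in\mathcal{C}} \chi\!\left(c_{k,l}\,T\big(x^{(s)}_k, x^{(t)}_l\big)\right)
\;+\; \sum_{i=1}^{n} \ell\!\left(y_i\, f_T\big(x^{(t)}_i\big)\right)
\;+\; \lambda\,\lVert S\rVert_{*},
\qquad
T\big(x^{(s)}, x^{(t)}\big) = x^{(s)\top} S\, x^{(t)},
\qquad
f_T\big(x^{(t)}\big) = \sum_{k} y^{(s)}_k\, T\big(x^{(s)}_k, x^{(t)}\big)
```

Because χ is decreasing, minimizing the first term pushes T to respond strongly on co-occurring pairs, which is the "maximize cross-domain association" reading of the first bullet.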

Constructing Cross-Domain Translator
[Figure: how to bridge the cross-domain gap? Source instances (text) are mapped by W(s) and target instances (images) by W(t) into a common latent space, where the inner product serves as the translator]
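Reading the figure as projection matrices W(s) and W(t) that map text and image features into a common d-dimensional latent space, the inner product there equals a bilinear form x^T S y with S = W(s)^T W(t). The sketch below checks this identity on toy integer matrices; the dimensions, values, and helper names are arbitrary choices of ours.

```python
def matvec(W, v):
    """Multiply a (rows x cols) list-of-lists matrix by a vector."""
    return [sum(w * vi for w, vi in zip(row, v)) for row in W]


def latent_translator(W_s, W_t, x, y):
    """T(x, y) = <W_s x, W_t y>: inner product in the shared latent space."""
    return sum(a * b for a, b in zip(matvec(W_s, x), matvec(W_t, y)))


def combined_matrix(W_s, W_t):
    """S = W_s^T W_t, shape (text_dim, image_dim)."""
    d = len(W_s)
    return [[sum(W_s[r][i] * W_t[r][j] for r in range(d))
             for j in range(len(W_t[0]))]
            for i in range(len(W_s[0]))]


def bilinear(S, x, y):
    """x^T S y."""
    return sum(x[i] * S[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))


# d = 2 latent topics, 3 text features, 2 visual features (toy numbers).
W_s = [[1, 0, 2], [0, 1, 1]]
W_t = [[1, 1], [2, 0]]
x, y = [1, 2, 3], [4, 5]
print(latent_translator(W_s, W_t, x, y))          # 103
print(bilinear(combined_matrix(W_s, W_t), x, y))  # 103
```

Working with S directly lets a regularizer act on the whole translator at once, while the factored form makes the number of latent topics d explicit.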

Constructing Cross-Domain Translator
• A low-dimensional latent space is preferred
  – Impose a trace-norm regularizer, rather than the normal ℓ2 regularizer, to improve prediction accuracy
  – The trace norm acts as a low-rank prior on the latent space
  – This expresses the principle of concise cross-domain translation: "fewer latent topics (dimensions) are preferred!"
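Why "fewer latent topics" means a low-rank translator: S = W(s)^T W(t) has rank at most d, the number of latent dimensions, and the trace norm is the standard convex surrogate for rank. The sketch below, using exact rational arithmetic and toy matrices of our own choosing, checks the rank bound on two examples.

```python
from fractions import Fraction


def matrix_rank(M):
    """Rank by Gaussian elimination over exact rationals."""
    rows = [[Fraction(v) for v in row] for row in M]
    rank = 0
    n_cols = len(rows[0]) if rows else 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(len(rows)):
            if r != rank and rows[r][col] != 0:
                factor = rows[r][col] / rows[rank][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def combined_matrix(W_s, W_t):
    """S = W_s^T W_t, where W_s is d x text_dim and W_t is d x image_dim."""
    d = len(W_s)
    return [[sum(W_s[r][i] * W_t[r][j] for r in range(d))
             for j in range(len(W_t[0]))]
            for i in range(len(W_s[0]))]


# One latent topic: S is an outer product, so its rank is at most 1.
S1 = combined_matrix([[1, 2, 3]], [[1, -1, 2, 0]])
# Two latent topics: rank at most 2.
S2 = combined_matrix([[1, 0, 2], [0, 1, 1]], [[1, 1], [2, 0]])
print(matrix_rank(S1), matrix_rank(S2))  # 1 2
```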

Experiments: Cross-Domain Dataset
• Text corpus and associated images are crawled from Flickr.com and wikipedia.com.
• We extract and stem all tokens in each text document; their frequencies are used as text features.
• For each image, visual words are extracted with a codebook of size 500.
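The two feature types can be sketched as follows. This is an assumption-laden illustration: stemming is omitted, the tokenizer is a simple regular expression, and a 2-word toy codebook stands in for the 500-word one. Assigning each local descriptor to its nearest codeword under squared Euclidean distance is the usual bag-of-visual-words quantization, but the slides do not say which descriptor or distance was used.

```python
import re
from collections import Counter


def text_features(document):
    """Token-frequency features (stemming, used on the slide, is omitted here)."""
    return Counter(re.findall(r"[a-z0-9]+", document.lower()))


def bovw_histogram(descriptors, codebook):
    """Bag-of-visual-words: count descriptors assigned to each nearest codeword."""
    hist = [0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)),
                      key=lambda k: sum((a - b) ** 2 for a, b in zip(d, codebook[k])))
        hist[nearest] += 1
    return hist


print(text_features("Horses graze; the horses run."))
# A 2-word toy codebook stands in for the 500-word codebook on the slide.
print(bovw_histogram([[1, 1], [9, 8], [0, 2]], [[0, 0], [10, 10]]))  # [2, 1]
```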

Dataset Statistics
Number of crawled text-image pairs for each category:

Category     # Crawled pairs
Birds        930
Buildings    9216
Cars         728
Cat          229
Dog          486
Horses       654
Mountain     4153
Plane        1356
Train        457
Waterfall    22006

Dataset (cont’d)

Compared Algorithms
• Image only
  – Only the visual features are used to train classifiers in the target image domain.
• Translated Learning by minimizing Risk (TLRisk)
  – Transfers text labels in the source domain to the target image domain via a Markov chain.
• Heterogeneous Transfer Learning (HTL)
  – Implicitly constructs a distance function between images by a matrix factorization between images and text documents.

Results
• Average error rates with respect to different numbers of training samples in the image domain
[Plot: x-axis, number of training samples in the image domain (1 to 10); y-axis, average error rate (about 0.23 to 0.31); curves: image only, HTL, RTL, TTI]

Results
• Average error rates with respect to different numbers of text/image co-occurrence pairs (with five training examples)
[Plot: x-axis, number of co-occurrence pairs (500 to 2000); y-axis, average error rate (about 0.23 to 0.275); curves: image only, HTL, RTL, TTI]

Results
• Number of topics in the latent space for establishing the cross-domain translator

Category     # Topics
Birds        11
Buildings    88
Cars         19
Cat          18
Dog          7
Horses       4
Mountain     6
Plane        15
Train        6
Waterfall    21

Buildings: too many building variants!

Revisit Technical Contributions
• Cross-Domain Knowledge Propagation
  – Propagating knowledge in surrounding text to visual data
  – Published in WWW'11, collaboration with Dr. Charu Aggarwal, IBM
• Cross-Category Knowledge Sharing
  – Exploring concept correlations to enhance inference accuracy
  – To appear in CVPR'11, collaboration with Dr. Charu Aggarwal, IBM
• Modeling Image Similarity
  – Hierarchical Gaussianization (HG), ICCV'09
  – Application to disaster assessment (collaboration with Prof. Tarek Abdelzaher)

Future Work (Q3)
• Resource allocation based on heterogeneous links for communication
  – Low redundancy: at the base station, send the most informative messages (text/multimedia data)
  – High quality: at the data center, recover lost information based on redundancy in cross-media links
• Effective linkage analysis with constraints in CCM

Future Work (Q4)
• Develop stochastic and dynamic models and theory for MINets
  – The effect of structural changes in a MINet
    • For latency analysis in communication networks
    • For constrained linkage discovery in CCM
  – Changes of QoI in a dynamic MINet

Path Ahead: Theory and Algorithm
• Construct Cross-Media Analysis (CMA) theory
  – Stochastic model for cross-media relations and redundancy
    • QoI theory in cross-media networks
    • Information recovery based on cross-media redundancy
    • Dynamic model for cross-media networks
    • Analysis of constrained links for CCM
• Practical algorithms for sharing and transmitting information in cross-media links
  – Achieve low latency and high quality in communication networks based on cross-media analysis
  – Application to CCM for robust constrained link discovery
  – Cross-media knowledge sharing and discovery

Collaboration Summary
• INARC I1.1: Prof. Tarek Abdelzaher
  – Cross-media analysis for communication networks
  – Trade-off between low latency and high quality
• INARC I1.2: Dr. Charu Aggarwal
  – Cross-domain knowledge propagation
  – Cross-category knowledge sharing
  – Quality of Information

Publications
• Collaboration with Dr. Charu Aggarwal (IBM)
  – Guo-Jun Qi, Charu Aggarwal, and Thomas Huang. Towards Cross-Domain Knowledge Propagation from Text Corpus to Web Images. To appear in Proc. of the International World Wide Web Conference (WWW 2011), Hyderabad, India, March 28 to April 1, 2011.
  – Guo-Jun Qi, Charu Aggarwal, Yong Rui, Qi Tian, Shiyu Chang, and Thomas Huang. Towards Cross-Category Knowledge Propagation for Learning Visual Concepts. To appear in IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2011), Colorado Springs, Colorado, June 21-23, 2011.
  – Guo-Jun Qi, Charu Aggarwal, and Thomas Huang. Transfer Learning with Distance Functions between Text and Web Images. Submitted to the ACM KDD Conference, 2011.
• Collaboration with Prof. Tarek Abdelzaher
  – Md Y. S. Uddin, Guo-Jun Qi, Tarek Abdelzaher, Thomas Huang, and Guohong Cao. PhotoNet: A Similarity-aware Image Delivery Service for Situation Awareness. IPSN Demo, April 2011.

Thanks! Q&A

Dataset in Target Domain
Number of images for each category in the target domain:

Category     # Positive examples   # Negative examples
Birds        338                   349
Buildings    2301                  2388
Cars         120                   125
Cat          67                    72
Dog          132                   142
Horses       263                   268
Mountain     927                   1065
Plane        509                   549
Train        52                    53
Waterfall    5153                  5737

Learning Optimal Translator
Learning formulation via optimizing the translator function:
• The first term measures the consistency between the translator and the observed co-occurrence of text and images.
• It is summed over the co-occurrence set of source-target pairs.
• Each pair's loss is a monotonically decreasing function of the translator response scaled by the pair's co-occurrence count c_{k,l}, so that a pair with a larger count c_{k,l} is weighted more.
• Co-occurring source and target samples probably share the same labels, so the translator T should respond more strongly to them in order to propagate labels between them.
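A minimal sketch of the first term, assuming the logistic loss log(1 + e^(-z)) as the monotonically decreasing function (the slide does not name one) and a plain dot product as a toy translator. The helper names are hypothetical. The example shows the weighting effect described above: the same weakly misaligned pair costs more when its co-occurrence count is higher.

```python
import math


def chi(z):
    """Hypothetical decreasing function: logistic loss log(1 + exp(-z)), computed stably."""
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


def cooccurrence_term(pairs, T):
    """Sum of chi(c_kl * T(x_k, y_l)) over (x_k, y_l, c_kl) co-occurrence triples."""
    return sum(chi(c * T(x, y)) for x, y, c in pairs)


def dot(x, y):
    """Toy translator: plain inner product (text and image features share a space)."""
    return sum(a * b for a, b in zip(x, y))


# The same weakly misaligned pair (T = -0.5) costs more when it co-occurs more often.
x, y = [1.0, 0.0], [-0.5, 0.0]
print(cooccurrence_term([(x, y, 1)], dot) < cooccurrence_term([(x, y, 4)], dot))  # True
```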

Learning Optimal Predictor
Learning formulation via optimizing the translator function:
• The second term: the loss of the predictor f_T on the training set (e.g., logistic loss).
• It encodes the discriminative knowledge in the training set.
• Large-margin principle: it can reduce the noisy information in the co-occurrence set for the classification task.
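A hedged sketch of the second term using the logistic loss the slide gives as an example. The predictor form f_T(y) = Σ_k y_k T(x_k, y), the dot-product translator, and all names are our illustrative assumptions. Flipping the training labels raises the loss, which is the discriminative signal this term feeds back into the translator.

```python
import math


def logistic_loss(margin):
    """log(1 + exp(-margin)), computed stably."""
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def predictor(source_items, source_labels, T, target):
    """Assumed predictor f_T(target) = sum_k y_k * T(x_k, target)."""
    return sum(y * T(x, target) for x, y in zip(source_items, source_labels))


def training_term(train_set, source_items, source_labels, T):
    """Sum of logistic losses of f_T over labelled target examples (image, label)."""
    return sum(logistic_loss(label * predictor(source_items, source_labels, T, img))
               for img, label in train_set)


def dot(x, y):
    """Toy translator: plain inner product."""
    return sum(a * b for a, b in zip(x, y))


texts, labels = [[1, 0], [0, 1]], [+1, -1]
train = [([1, 0], +1), ([0, 1], -1)]
flipped = [(img, -label) for img, label in train]
print(training_term(train, texts, labels, dot))    # 2 * log(1 + e^-1), about 0.63
print(training_term(flipped, texts, labels, dot))  # about 2.63
```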

Learning Optimal Predictor
Learning formulation via optimizing the translator function:
• The third term encodes the preference for a concise semantic translation over a tedious one.
• This is the principle for constructing the cross-domain translator.
• Nonessential and noisy observations can be filtered out of the translation process.
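The exact trace norm needs an SVD. As a dependency-free stand-in, this sketch swaps in a well-known factored bound: for S = W(s)^T W(t), the trace norm of S is at most (‖W(s)‖_F² + ‖W(t)‖_F²)/2, and the minimum of this bound over all factorizations of S equals the trace norm. This substitution is ours, not necessarily what the authors optimized; the toy numbers show that each additional active latent topic raises the penalty.

```python
import math


def frobenius_sq(W):
    """Squared Frobenius norm of a list-of-lists matrix."""
    return sum(v * v for row in W for v in row)


def factored_trace_norm_bound(W_s, W_t):
    """(||W_s||_F^2 + ||W_t||_F^2) / 2, an upper bound on ||W_s^T W_t||_*."""
    return 0.5 * (frobenius_sq(W_s) + frobenius_sq(W_t))


def rank_one_trace_norm(u, v):
    """Exact trace norm of the outer product u v^T, which equals ||u|| * ||v||."""
    return math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))


one_topic = factored_trace_norm_bound([[1, 0]], [[1, 0]])
two_topics = factored_trace_norm_bound([[1, 0], [0, 1]], [[1, 0], [0, 1]])
print(one_topic, two_topics)  # 1.0 2.0: each extra active topic adds to the penalty
```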

Results
• Number of topics in the latent space for establishing the cross-domain translator

Category     2 training examples   10 training examples
Birds        11                    9
Buildings    88                    102
Cars         19                    3
Cat          18                    2
Dog          7                     5
Horses       4                     1
Mountain     6                     1
Plane        15                    25
Train        6                     3
Waterfall    21                    26

Buildings: too many building variants!

Modeling Context-Aware Image Similarity
• Current method
  – Visual similarity between images
  – Hierarchical Gaussianization, ICCV'09 (Zhou, Huang, et al.)
  – Hard to model image similarity at the semantic level
• Modeling semantic similarity between images
  – Link images to text documents with the translator
  – Compare the associated texts to compare image semantics
  – Advantages
    • The "semantic gap" is smaller for text documents
    • Such similarity reflects semantic-level information
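One hypothetical way to compare images through their associated text: describe each image by its translator responses to a fixed set of text documents, then take the cosine similarity of those profiles. The profile construction, the bilinear translator, and the toy matrix are our assumptions, not the method from the slides. In the example, two visually orthogonal images map to the same text and therefore come out semantically identical.

```python
import math


def bilinear_translator(S):
    """Hypothetical learned translator T(x, y) = x^T S y."""
    def T(x, y):
        return sum(x[i] * S[i][j] * y[j]
                   for i in range(len(x)) for j in range(len(y)))
    return T


def association_profile(image, texts, T):
    """How strongly the image associates with each text document."""
    return [T(t, image) for t in texts]


def cosine(a, b):
    na = math.sqrt(sum(v * v for v in a))
    nb = math.sqrt(sum(v * v for v in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(p * q for p, q in zip(a, b)) / (na * nb)


def semantic_similarity(img_a, img_b, texts, T):
    return cosine(association_profile(img_a, texts, T),
                  association_profile(img_b, texts, T))


texts = [[1, 0], [0, 1]]                 # two text documents (2 text features)
T = bilinear_translator([[1, 1, 0],      # text feature 0 <- visual features 0 and 1
                         [0, 0, 1]])     # text feature 1 <- visual feature 2
a, b, c = [1, 0, 0], [0, 1, 0], [0, 0, 1]
print(cosine(a, b))                          # 0.0: visually unrelated
print(semantic_similarity(a, b, texts, T))   # 1.0: same textual meaning
print(semantic_similarity(a, c, texts, T))   # 0.0
```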

[Diagram: image similarity (target domain) is derived from text similarity (source domain) through the text-image association given by the learned translator T(x, y)]

Path Ahead
• Improve the Quality of Information (QoI) transferred across domains
  – In some cases, the transferred information may hurt the classification task (negative transfer).
  – Construct a new model that falls back to predicting from the target domain alone when the cross-domain information is detected to be noisy.
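One simple way to realize such a fallback, offered purely as a hypothetical sketch (the slide does not say how noisy transfer would be detected), is to compare error on held-out target-domain data and keep the cross-domain model only when it is not worse. The function names, the `tolerance` parameter, and the toy classifiers are ours.

```python
def error_rate(predict, validation):
    """Fraction of (x, label) pairs whose predicted sign disagrees with label in {-1, +1}."""
    wrong = sum(1 for x, label in validation
                if (1 if predict(x) >= 0 else -1) != label)
    return wrong / len(validation)


def choose_predictor(cross_domain, target_only, validation, tolerance=0.0):
    """Keep the cross-domain model only if it is not worse (within tolerance)
    than the target-only model on held-out target data."""
    if error_rate(cross_domain, validation) <= error_rate(target_only, validation) + tolerance:
        return cross_domain
    return target_only


def target_only(x):
    """Toy target-domain classifier."""
    return x


def harmful_transfer(x):
    """Toy transferred classifier that flips every label (negative transfer)."""
    return -x


validation = [(1.0, 1), (-1.0, -1), (2.0, 1)]
print(choose_predictor(harmful_transfer, target_only, validation) is target_only)  # True
```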

Future Work
• Semantic-level image similarity in heterogeneous networks
  – Different heterogeneous sources, e.g., cameras, human annotations, and textual descriptions
  – Fusing heterogeneous sources in the network to learn a more descriptive image similarity
  – Collaboration with Dr. Charu Aggarwal at IBM on sensor networks and with Prof. Tarek Abdelzaher at UIUC on Fact Finder

Linked to INARC Projects
• Collaborators
  – Prof. Tarek Abdelzaher, CS, UIUC (I1.1)
    • Fact Finder: compare image similarity at the semantic level to discover trustworthy sources
  – Dr. Charu Aggarwal, IBM (I1.2)
    • Sensor networks: compare signal similarity using cross-domain knowledge