
Lecture 10: Metadata for Media
SIMS 202: Information Organization and Retrieval
Prof. Ray Larson & Prof. Marc Davis
UC Berkeley SIMS
Tuesday and Thursday 10:30 am - 12:00 pm
Fall 2003
http://www.sims.berkeley.edu/academics/courses/is202/f03/
IS 202 – FALL 2003.09.23 - SLIDE 1

Today’s Agenda
• Review of Last Time
• Metadata for Motion Pictures
  – Representing Video
  – Current Approaches
  – Media Streams
• Discussion Questions
• Action Items for Next Time


The Media Opportunity
• Vastly more media will be produced
• Without ways to manage it (metadata creation and use) we lose the advantages of digital media
• Most current approaches are insufficient and perhaps misguided
• Great opportunity for innovation and invention
• Need interdisciplinary approaches to the problem

What is the Problem?
• Today people cannot easily find, edit, share, and reuse media
• Computers don’t understand media content
  – Media is opaque and data rich
  – We lack structured representations
• Without content representation (metadata), manipulating digital media will remain like word processing with bitmaps

Traditional Media Production Chain vs. Metadata-Centric Production Chain
[Diagram: PRE-PRODUCTION → PRODUCTION → POST-PRODUCTION → DISTRIBUTION, with metadata created and carried through every stage in the metadata-centric chain]

Automated Media Production Process
[Diagram: (1) Active Capture; (2) Annotation and Retrieval (annotation of media assets, reusable online asset database, asset retrieval and reuse); (3) Automatic Editing (Adaptive Media Engine); (4) Personalized/Customized Delivery via web integration and streaming media services (Flash Generator, HTML, Email, WAP) and print/physical media]

Technology Summary
• Media Streams provides a framework for creating metadata throughout the media production cycle to make media assets searchable and reusable
• Active Capture automates direction and cinematography using real-time audio-video analysis in an interactive control loop to create reusable media assets
• Adaptive Media uses adaptive media templates and automatic editing functions to mass customize and personalize media and thereby eliminate the need for editing on the part of end users
• Together, these technologies will automate, personalize, and speed up media production, distribution, and reuse

Active Capture

Active Capture: Reusable Shots

Marc Davis in Godzilla Scene

Evolution of Media Production
• Customized production
  – Skilled creation of one media product
• Mass production
  – Automatic replication of one media product
• Mass customization
  – Skilled creation of adaptive media templates
  – Automatic production of customized media

Central Idea: Movies as Programs
• Movies change from being static data to programs
[Diagram: media from a producer is fed through a Media Parser to yield a Content Representation]
• Shots are inputs to a program that computes new media based on content representation and functional dependency (US Patents 6,243,087 & 5,969,716)

Today’s Agenda
• Review of Last Time
• Metadata for Motion Pictures
  – Representing Video
  – Current Approaches
  – Media Streams
• Discussion Questions
• Action Items for Next Time

Representing Video
• Streams vs. Clips
• Video syntax and semantics
• Ontological issues in video representation

Video is Temporal

Streams vs. Clips

Stream-Based Representation
• Makes annotation pay off
  – The richer the annotation, the more numerous the possible segmentations of the video stream
• Clips
  – Change from being fixed segmentations of the video stream to being the results of retrieval queries based on annotations of the video stream
• Annotations
  – Create representations which make clips, not representations of clips
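
The stream/clip distinction above can be sketched in a few lines: annotations are time intervals layered over one continuous stream, and a "clip" is just the result of a query over them. The names below are invented for illustration, not Media Streams' actual API:

```python
# Sketch: stream-based annotation, where clips are query results rather than
# fixed segmentations. All names here are illustrative.

def find_clips(annotations, descriptor):
    """Return the (start, end) span of every annotation carrying descriptor."""
    return [(a["start"], a["end"]) for a in annotations if descriptor in a["tags"]]

stream = [  # layered annotations over one continuous video stream (seconds)
    {"start": 0,  "end": 45, "tags": {"dog"}},
    {"start": 30, "end": 60, "tags": {"dog", "biting"}},
    {"start": 50, "end": 90, "tags": {"Steve"}},
]

print(find_clips(stream, "dog"))     # [(0, 45), (30, 60)]
print(find_clips(stream, "biting"))  # [(30, 60)]
```

Richer annotation layers yield more possible segmentations; the stream itself is never cut.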

Video Syntax and Semantics
• The Kuleshov Effect
• Video has a dual semantics
  – Sequence-independent invariant semantics of shots
  – Sequence-dependent variable semantics of shots

Ontological Issues for Video
• Video plays with rules for identity and continuity
  – Space
  – Time
  – Person
  – Action

Space and Time: Actual vs. Inferable
• Actual Recorded Space and Time
  – GPS
  – Studio space and time
• Inferable Space and Time
  – Establishing shots
  – Cues and clues

Time: Temporal Durations
• Story (Fabula) Duration
  – Example: Brushing teeth in story world (5 minutes)
• Plot (Syuzhet) Duration
  – Example: Brushing teeth in plot world (1 minute: 6 steps of 10 seconds each)
• Screen Duration
  – Example: Brushing teeth (10 seconds: 2 shots of 5 seconds each)
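
The tooth-brushing example reduces to simple arithmetic: the same event has three distinct durations, which a video metadata schema would need to record as separate fields rather than one "duration":

```python
# The slide's tooth-brushing example as arithmetic: one event, three durations.

story_duration  = 5 * 60   # fabula: 5 minutes in the story world
plot_duration   = 6 * 10   # syuzhet: 6 steps of 10 seconds = 1 minute
screen_duration = 2 * 5    # screen: 2 shots of 5 seconds = 10 seconds

print(story_duration, plot_duration, screen_duration)  # 300 60 10
```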

Character and Continuity
• Identity of character is constructed through
  – Continuity of actor
  – Continuity of role
• Alternative continuities
  – Continuity of actor only
  – Continuity of role only

Representing Action
• Physically-based description for sequence-independent action semantics
  – Abstract vs. conventionalized descriptions
  – Temporally and spatially decomposable actions and subactions
• Issues in describing sequence-dependent action semantics
  – Mental states (emotions vs. expressions)
  – Cultural differences (e.g., bowing vs. greeting)

“Cinematic” Actions
• Cinematic actions support the basic narrative structure of cinema
  – Reactions/Proactions
    • Nodding, screaming, laughing, etc.
  – Focus of Attention
    • Gazing, head-turning, pointing, etc.
  – Locomotion
    • Walking, running, etc.
• Cinematic actions can occur
  – Within the frame/shot boundary
  – Across the frame boundary
  – Across shot boundaries

Today’s Agenda
• Review of Last Time
• Metadata for Motion Pictures
  – Representing Video
  – Current Approaches
  – Media Streams
• Discussion Questions
• Action Items for Next Time

The Search for Solutions
• Current approaches to creating metadata don’t work
  – Signal-based analysis
  – Keywords
  – Natural language
• Need standardized metadata framework
  – Designed for video and rich media data
  – Human and machine readable and writable
  – Standardized and scalable
  – Integrated into media capture, archiving, editing, distribution, and reuse

Signal-Based Parsing
• Practical problem
  – Parsing unstructured, unknown video is very, very hard
• Theoretical problem
  – Mismatch between percepts and concepts

Perceptual/Conceptual Issue
• Similar Percepts / Dissimilar Concepts: Clown Nose vs. Red Sun

Perceptual/Conceptual Issue
• Dissimilar Percepts / Similar Concepts: John Dillinger’s Car vs. Timothy McVeigh’s Car

Signal-Based Parsing
• Effective and useful automatic parsing
  – Video
    • Shot boundary detection
    • Camera motion analysis
    • Low-level visual similarity
    • Feature tracking
    • Face detection
  – Audio
    • Pause detection
    • Audio pattern matching
    • Simple speech recognition
    • Speech vs. music detection
• Approaches to automated parsing
  – At the point of capture, integrate the recording device, the environment, and agents in the environment into an interactive system
  – After capture, use “human-in-the-loop” algorithms to leverage human and machine intelligence
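
Of the parsers listed above, shot boundary detection is the most basic; one common family of approaches thresholds the difference between consecutive frames' intensity histograms. A toy sketch on synthetic frames (real detectors use far more robust features):

```python
# Toy shot-boundary detection: flag a cut where the intensity-histogram
# difference between consecutive frames exceeds a threshold. Frames here are
# flat lists of grayscale pixel values (0-255), purely for illustration.

def histogram(frame, bins=8):
    h = [0] * bins
    for px in frame:
        h[px * bins // 256] += 1
    return h

def shot_boundaries(frames, threshold=0.5):
    """Return indices i where a cut is detected between frames i-1 and i."""
    cuts = []
    for i in range(1, len(frames)):
        a, b = histogram(frames[i - 1]), histogram(frames[i])
        # normalized L1 distance between histograms, in [0, 1]
        diff = sum(abs(x - y) for x, y in zip(a, b)) / (2 * len(frames[i]))
        if diff > threshold:
            cuts.append(i)
    return cuts

dark, bright = [20] * 100, [220] * 100  # frames from two visually distinct shots
print(shot_boundaries([dark, dark, bright, bright]))  # [2]
```

This is exactly the kind of parser that works well on clean cuts and fails on dissolves and camera motion, which is why the slide pairs automatic parsing with "human-in-the-loop" approaches.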

Keywords vs. Semantic Descriptors: dog, biting, Steve

Keywords vs. Semantic Descriptors: dog, biting, Steve

Why Keywords Don’t Work
• Are not a semantic representation
• Do not describe relations between descriptors
• Do not describe temporal structure
• Do not converge
• Do not scale
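
The "do not describe relations" point is the crux of the dog/biting/Steve slides above: as an unordered keyword set, "dog biting Steve" and "Steve biting dog" are the same annotation. A minimal relational descriptor keeps who is doing what to whom:

```python
# Flat keywords vs. a relational descriptor for the dog/biting/Steve example.

keywords_a = {"dog", "biting", "Steve"}   # dog bites Steve
keywords_b = {"Steve", "biting", "dog"}   # Steve bites dog
print(keywords_a == keywords_b)           # True: the shots are indistinguishable

# A minimal (agent, action, patient) descriptor preserves the relation.
shot_a = ("dog", "biting", "Steve")
shot_b = ("Steve", "biting", "dog")
print(shot_a == shot_b)                   # False: who bites whom is retained
```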

Natural Language vs. Visual Language
Jack, an adult male police officer, while walking to the left, starts waving with his left arm, and then has a puzzled look on his face as he turns his head to the right; he then drops his facial expression and stops turning his head, immediately looks up, and then stops looking up after he stops waving but before he stops walking.

Natural Language vs. Visual Language
Jack, an adult male police officer, while walking to the left, starts waving with his left arm, and then has a puzzled look on his face as he turns his head to the right; he then drops his facial expression and stops turning his head, immediately looks up, and then stops looking up after he stops waving but before he stops walking.

Notation for Time-Based Media: Music

Visual Language Advantages
• A language designed as an accurate and readable representation of time-based media
  – For video, especially important for actions, expressions, and spatial relations
• Enables Gestalt view and quick recognition of descriptors due to designed visual similarities
• Supports global use of annotations

Today’s Agenda
• Review of Last Time
• Metadata for Motion Pictures
  – Representing Video
  – Current Approaches
  – Media Streams
• Discussion Questions
• Action Items for Next Time

After Capture: Media Streams

Media Streams Features
• Key features
  – Stream-based representation (better segmentation)
  – Semantic indexing (what things are similar to)
  – Relational indexing (who is doing what to whom)
  – Temporal indexing (when things happen)
  – Iconic interface (designed visual language)
  – Universal annotation (standardized markup schema)
• Key benefits
  – More accurate annotation and retrieval
  – Global usability and standardization
  – Reuse of rich media according to content and structure

Media Streams GUI Components
• Media Time Line
• Icon Space
  – Icon Workshop
  – Icon Palette

Media Time Line
• Visualize video at multiple time scales
• Write and read multi-layered iconic annotations
• One interface for annotation, query, and composition

Media Time Line

Icon Space
• Icon Workshop
  – Utilize categories of video representation
  – Create iconic descriptors by compounding iconic primitives
  – Extend set of iconic descriptors
• Icon Palette
  – Dynamically group related sets of iconic descriptors
  – Reuse descriptive effort of others
  – View and use query results

Icon Space

Icon Space: Icon Workshop
• General to specific (horizontal)
  – Cascading hierarchy of icons with increasing specificity on subordinate levels
• Combinatorial (vertical)
  – Compounding of hierarchically organized icons across multiple axes of description
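
The two axes can be sketched as data: each iconic primitive is a path down a general-to-specific hierarchy (horizontal), and a compound descriptor combines one primitive per axis of description (vertical). The category names below are invented for illustration, not Media Streams' actual icon vocabulary:

```python
# Sketch of compound iconic descriptors: each primitive is a path down a
# general-to-specific hierarchy; a descriptor compounds one primitive per axis.
# Category names are invented for illustration.

def compound(**axes):
    """Combine one hierarchical primitive per descriptive axis."""
    return {axis: tuple(path.split("/")) for axis, path in axes.items()}

desc = compound(
    character="human/adult/male",   # horizontal: increasingly specific levels
    action="locomotion/walking",
    direction="screen/left",
)
print(desc["character"])   # ('human', 'adult', 'male')
print(len(desc))           # 3 axes compounded into one descriptor
```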

Icon Space: Icon Workshop Detail

Icon Space: Icon Palette
• Dynamically group related sets of iconic descriptors
• Collect icon sentences
• Reuse descriptive effort of others

Icon Space: Icon Palette Detail

Video Retrieval in Media Streams
• Same interface for annotation and retrieval
• Assembles responses to queries as well as finds them
• Query responses use semantics to degrade gracefully

Media Streams Technologies
• Minimal video representation distinguishing syntax and semantics
• Iconic visual language for annotating and retrieving video content
• Retrieval-by-composition methods for repurposing video

Non-Technical Challenges
• Standardization of media metadata (MPEG-7)
• Broadband infrastructure and deployment
• Intellectual property and economic models for sharing and reuse of media assets

Today’s Agenda
• Review of Last Time
• Metadata for Motion Pictures
  – Representing Video
  – Current Approaches
  – Media Streams
• Discussion Questions
• Action Items for Next Time

Discussion Questions (Davis)
• John Snydal on Media Streams
  – What is the target audience of users (annotators/retrievers) for Media Streams? In the article the following groups are mentioned:
    • Content providers
    • Video editors
    • News teams
    • Documentary film makers
    • Film archives
    • Stock photo houses
    • Video archivists
    • Video producers
    • (international audience)
    • (illiterate and preliterate people)
  – Is it possible that Media Streams could satisfy the needs, goals, and requirements of all of these groups, or would it be more appropriate to develop separate, tailored applications for the unique needs of each group?

Discussion Questions (Davis)
• danah boyd on Media Streams
  – Icons require visual literacy. Icons are also culturally constructed. Thus, for them to work as an information access bit, people must learn the visual language; it is not inherent. What are the social consequences of a system dependent on unfamiliar cues?

Discussion Questions (Davis)
• danah boyd on Media Streams
  – Films are constructed narratives. But most commonplace storytelling is not. Even in a creative form, people often piece together found objects instead of finding objects to fit their story. (Think teenage girls making collages out of the latest YM.) Storytelling also happens around media far more than through media (i.e., telling a story about a picture rather than using a collection of pictures to tell a story). My guess is that this social phenomenon goes beyond the retrieval issues. Do you think that Media Streams would encourage new behavior regarding storytelling or will it only be useful for those with a constructed narrative in mind? Why (not)?

Discussion Questions (Davis)
• Jesse Mendelsohn on Media Streams
  – Media Streams does not allow iconic descriptions of emotion or scene-interpretation. How would someone searching stock footage for a “suspenseful scene of two men beating each other” go about doing it? The actual sense of “suspense” and the act of “beating” cannot be iconified. Does this limit Media Streams’ ability or is there a way around it within its capabilities as described?

Discussion Questions (Davis)
• Jesse Mendelsohn on Media Streams
  – In order for Media Streams to work well, it relies on the availability of a very large and extensive resource of well-annotated video. Is the current annotation process too primitive and/or time consuming to allow Media Streams to work to its full potential? Will changing how Media Streams can be used to annotate video, or changing video annotation methods in general, make Media Streams more effective?

Today’s Agenda
• Review of Last Time
• Metadata for Motion Pictures
  – Representing Video
  – Current Approaches
  – Media Streams
• Discussion Questions
• Action Items for Next Time

Assignment 4.1
• Phone Metadata Design - Part 1
  – Due Oct 2

Next Time
• Database Design (RRL)
• Readings
  – Handouts in Class
    • Database Modeling and Design, Ch. 2: The ER Model - Basic Concepts (Teorey, T. J.)
    • Logical Database Design and the Relational Model (F. R. McFadden, J. A. Hoffer)