- Number of slides: 24
CONFUCIUS: an Intelligent MultiMedia storytelling interpretation & presentation system
Minhua Eunice Ma
Supervisor: Prof. Paul Mc Kevitt
School of Computing and Intelligent Systems, Faculty of Informatics, University of Ulster, Magee
Objectives of CONFUCIUS
- To interpret natural language stories and movie (drama) script input, and to extract conceptual semantics from the natural language
- To generate 3D animation and virtual worlds automatically from natural language
- To integrate 3D animation with speech and non-speech audio, forming an intelligent multimedia storytelling system that presents multimodal stories
CONFUCIUS’ context diagram
- Input: a story in natural language, or a movie/drama script entered through a tailored menu for script input, written by a storywriter/playwright
- Output to the user/story listener: 3D animation, speech (dialogue) and non-speech audio
Previous systems
- Schank’s Conceptual Dependency (CD) theory (1972)
  - primitives & scripts
  - SAM & PAM
- Automatic text-to-graphics systems
  - WordsEye (Coyne & Sproat, 2001)
  - ‘Micons’ and CD-based language animation (Narayanan et al. 1995)
  - Spoken Image (Ó Nualláin & Smith, 1994) & its successor SONAS (Kelleher et al. 2000)
- Multimodal interactive storytelling
  - AesopWorld
  - KidsRoom
  - Larsen & Petersen’s interactive storytelling
  - Oz
  - Computer games
- Virtual humans & embodied agents
  - BEAT (Cassell et al., 2000)
  - Jack (University of Pennsylvania)
  - Improv (Perlin and Goldberg, 1996)
  - SimHuman
  - Gandalf
  - PPP persona
Architecture of CONFUCIUS
- Input path: natural language stories → script writer → script parser → Natural Language Processing
- Prefabricated objects (knowledge base)
  - language knowledge: lexicon, grammar, etc.
  - visual knowledge (3D graphic library), built with 3D authoring tools from existing 3D models & character models
  - mapping between language knowledge and visual knowledge
- Natural Language Processing produces semantic representations, which drive animation generation together with the visual knowledge
- Text To Speech and sound effects provide the audio
- Synchronizing & fusion yields the 3D world with audio in VRML
Semantic representations
Multimodal semantic representation
- High level: media-independent multimodal semantics, in an XML/frame-based representation
- Intermediate level: media-dependent representations
  - visual media-dependent representation → visual modality
  - audio media-dependent representation → language modality and non-speech audio modality
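The layering can be illustrated with a single frame that is first written out in a media-independent form and then routed to the media-dependent layers. This Java sketch is hypothetical: the `SemanticFrame` record, the role names (`agent`, `utterance`, `sound`) and the routing rule in `modalities()` are assumptions for illustration, not the CONFUCIUS format.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Modalities at the bottom of the representation hierarchy.
enum Modality { VISUAL, LANGUAGE, NON_SPEECH_AUDIO }

// Hypothetical media-independent frame: a predicate plus named roles (role -> filler).
record SemanticFrame(String predicate, Map<String, String> roles) {

    // Serialise the frame as one XML element; role names become child element names.
    // No XML escaping is done, so fillers are assumed to be plain words.
    String toXml() {
        StringBuilder sb = new StringBuilder("<event predicate=\"" + predicate + "\">");
        roles.forEach((role, filler) ->
            sb.append('<').append(role).append('>').append(filler)
              .append("</").append(role).append('>'));
        return sb.append("</event>").toString();
    }

    // Assumed routing rule: every event is shown visually; an "utterance" role is also
    // sent to the language (speech) layer and a "sound" role to non-speech audio.
    Set<Modality> modalities() {
        Set<Modality> out = EnumSet.of(Modality.VISUAL);
        if (roles.containsKey("utterance")) out.add(Modality.LANGUAGE);
        if (roles.containsKey("sound")) out.add(Modality.NON_SPEECH_AUDIO);
        return out;
    }
}
```

For example, `new SemanticFrame("bounce", Map.of("agent", "ball")).toXml()` yields `<event predicate="bounce"><agent>ball</agent></event>`, and the frame is routed only to the visual modality.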
Mental imagery & meaning processing
- Mental world: meanings, communicable ideas, thoughts, manifestable messages, proverbs, examples, parables, etc.
- Communication (mental world → virtual world): simulation as presentation via language or other modalities
- Cognition (physical world → mental world): simulation as image recognition
- Re-cognition (virtual world → mental world): simulation as language understanding
Knowledge base of CONFUCIUS
- Language knowledge
  - semantic knowledge: lexicons (e.g. WordNet)
  - syntactic knowledge: grammars
  - statistical models of language
  - associations between words
- Visual knowledge
  - object model (nouns)
  - event model (event verbs; describes the motion of objects)
  - functional information
  - internal coordinate axes (for spatial reasoning)
  - associations between objects
- World knowledge
  - spatial & qualitative reasoning knowledge
Graphic library
- Objects/props: simple geometry files
- Characters: geometry & joint hierarchy files
- Instantiation
- Motions: animation library (key frames)
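Data Flow Diagram
- Story → script writer → script → script parser
- The script parser splits the script into scene & actor descriptions, dialogues, and a non-speech audio script
- Scene & actor descriptions → natural language processor → visual semantics → animation generator (using the primitives library) → VRML without sound nodes
- Dialogues → TTS
- Non-speech audio script → sound effect driver (using the music library)
- Media coordination combines these streams into the synthesized animation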
Animation generator
- LCS (Lexical Conceptual Structure) representation → verb semantic analysis
- Does the verb match a basic motion in the library?
  - Y: the basic motion is used
  - N: use lexical relations (WordNet) to replace synonyms, apply scripts, etc.
- Motion decomposition → animation controller → motion instantiation → environment placement
- Output: the VRML format of the virtual story world (examples, demo)
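The matching step can be sketched as a library lookup with fallbacks. This is a minimal sketch under assumptions: the class `MotionMatcher`, its three tables and the clip identifiers are hypothetical; the synonym table stands in for WordNet lexical relations, and motion decomposition is treated as a further fallback, which is one reading of the flowchart rather than the documented control flow.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical matcher for the "match basic motions in library?" step.
final class MotionMatcher {
    private static final int MAX_DEPTH = 8;                  // guards against cyclic tables

    private final Map<String, String> library;               // verb -> basic motion clip
    private final Map<String, String> synonyms;              // verb -> related verb (stand-in for WordNet)
    private final Map<String, List<String>> decompositions;  // verb -> simpler verbs

    MotionMatcher(Map<String, String> library, Map<String, String> synonyms,
                  Map<String, List<String>> decompositions) {
        this.library = library;
        this.synonyms = synonyms;
        this.decompositions = decompositions;
    }

    // Returns the basic motions realising the verb, or an empty list if none is found.
    List<String> resolve(String verb) {
        return resolve(verb, 0);
    }

    private List<String> resolve(String verb, int depth) {
        if (depth > MAX_DEPTH) return List.of();
        String clip = library.get(verb);
        if (clip != null) return List.of(clip);                   // Y: direct match in the library
        String related = synonyms.get(verb);
        if (related != null) return resolve(related, depth + 1);  // N: replace by a synonym
        List<String> parts = decompositions.get(verb);
        if (parts == null) return List.of();                      // nothing known about this verb
        List<String> out = new ArrayList<>();
        for (String part : parts) {
            List<String> sub = resolve(part, depth + 1);
            if (sub.isEmpty()) return List.of();                  // any unresolved part fails the verb
            out.addAll(sub);
        }
        return out;
    }
}
```

With a library holding `walk` and `kick`, a synonym entry `stroll → walk`, and a hypothetical decomposition `dribble → [walk, kick]`, `resolve("dribble")` returns the two library clips in order, while a verb with no entry anywhere returns an empty list.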
Categories of events
- Atomic entities
  - Change of physical location, such as position and orientation, e.g. “bounce”, “turn”
  - Change of intrinsic attributes such as shape, size, color and texture, e.g. “bend”, and even visibility, e.g. “disappear”, “fade” (in/out)
- Non-atomic entities
  - Non-character events
    - Two or more individual objects fuse together, e.g. “melt” (in)
    - One object divides into two or more individual parts, e.g. “break” (into pieces)
    - Change of sub-components (their position, size, color), e.g. “blossom”
    - Environment events (weather verbs), e.g. “snow”, “rain”
  - Character events
    - Action verbs
      - Intransitive verbs
      - Transitive verbs
    - Non-action verbs (stative, emotion, possession, mental activities, cognition & perception)
    - Idioms & metaphor verbs
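The leaves of this taxonomy can be encoded as a small Java enum. This is a sketch under assumptions: `EventCategory`, its `atomic` flag and `classify()` are hypothetical names, and the lookup table holds only the example verbs listed on the slide.

```java
import java.util.Map;
import java.util.Optional;

// Leaf categories of the event taxonomy; the flag records atomic vs non-atomic entities.
enum EventCategory {
    CHANGE_LOCATION(true),
    CHANGE_INTRINSIC_ATTRIBUTE(true),
    FUSION(false),
    DIVISION(false),
    SUB_COMPONENT_CHANGE(false),
    ENVIRONMENT(false),
    CHARACTER_ACTION(false),
    CHARACTER_NON_ACTION(false),
    IDIOM_METAPHOR(false);

    final boolean atomic;

    EventCategory(boolean atomic) { this.atomic = atomic; }

    // Only the example verbs from the slide; a real lexicon would be far larger.
    private static final Map<String, EventCategory> EXAMPLES = Map.of(
        "bounce", CHANGE_LOCATION, "turn", CHANGE_LOCATION,
        "bend", CHANGE_INTRINSIC_ATTRIBUTE, "disappear", CHANGE_INTRINSIC_ATTRIBUTE,
        "fade", CHANGE_INTRINSIC_ATTRIBUTE, "melt", FUSION, "break", DIVISION,
        "blossom", SUB_COMPONENT_CHANGE, "snow", ENVIRONMENT, "rain", ENVIRONMENT);

    // Empty when the verb is not in the example table.
    static Optional<EventCategory> classify(String verb) {
        return Optional.ofNullable(EXAMPLES.get(verb));
    }
}
```

Keeping the atomic flag on the category lets a generator decide early whether an event can be realised by changing one object’s fields or needs several objects or sub-components.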
Categories of action verbs
- Intransitive verbs
  - Biped kinematics, e.g. “walk”, “swim”, and other motion models like “fly”
  - Facial expressions, e.g. “laugh”, “anger”
  - Lip movement, e.g. “speak”, “say”
  - Facial expressions and lip movement involve the speech modality
- Transitive verbs
  - Single object, e.g. “throw”, “push”, “kick”
  - Multiple objects
    - direct and indirect objects, e.g. “give”, “pass”, “show”
    - indirect object & the instrument, e.g. “cut”, “hammer”
Visual definition & word sense
- Polysemy: one verb has many word senses; synonymy: many verbs share one word sense
- One word sense maps to many visual definition entries
- Example: “close” (a door)
  1. a normal door (rotation about the y axis)
  2. a sliding door (movement along the x axis)
  3. a rolling shutter door (a combination of rotation about the x axis and movement along the y axis)
- Word sense: the minimal complete unit of meaning in the language modality
- Visual definition entry: the minimal complete unit of meaning in the visual modality
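The many-to-one and one-to-many mappings can be sketched as two lookup tables. In this hypothetical Java sketch, the sense identifier `close#1`, the synonym `shut`, and the `rotate(...)`/`move(...)` primitive notation are assumptions; it also assumes word-sense disambiguation has already resolved polysemy, so each verb maps to a single sense.

```java
import java.util.List;
import java.util.Map;

// One visual definition entry: a description plus the visual primitives it combines (hypothetical notation).
record VisualDefinition(String description, List<String> primitives) {}

// verb -> word sense (synonymy), then word sense -> visual entries keyed by object type.
final class VisualLexicon {
    private final Map<String, String> verbToSense;
    private final Map<String, Map<String, VisualDefinition>> senseToVisual;

    VisualLexicon(Map<String, String> verbToSense,
                  Map<String, Map<String, VisualDefinition>> senseToVisual) {
        this.verbToSense = verbToSense;
        this.senseToVisual = senseToVisual;
    }

    // Picks the visual definition entry for a verb applied to a given kind of object.
    VisualDefinition lookup(String verb, String objectType) {
        String sense = verbToSense.get(verb);
        if (sense == null) throw new IllegalArgumentException("unknown verb: " + verb);
        VisualDefinition def = senseToVisual.getOrDefault(sense, Map.of()).get(objectType);
        if (def == null) {
            throw new IllegalArgumentException("no visual definition of " + sense + " for " + objectType);
        }
        return def;
    }

    // The "close (a door)" example: one word sense, three visual definition entries.
    static VisualLexicon doorExample() {
        Map<String, VisualDefinition> close = Map.of(
            "normal door", new VisualDefinition("rotation about the y axis", List.of("rotate(door, y)")),
            "sliding door", new VisualDefinition("movement along the x axis", List.of("move(door, x)")),
            "rolling shutter door", new VisualDefinition("rotation about x combined with movement along y",
                List.of("rotate(door, x)", "move(door, y)")));
        return new VisualLexicon(Map.of("close", "close#1", "shut", "close#1"),
                                 Map.of("close#1", close));
    }
}
```

Keying the visual entries by object type is what lets the same word sense produce three different motions for three kinds of door.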
Troponyms & verbs derived from adjectives/nouns
- Troponyms
  - a troponym elaborates the manner of a base verb (Fellbaum 1998)
  - examples: “trot”-“walk” (fast), “gulp”-“eat” (quickly)
  - troponym = base verb + adverb: present the base verb and modify its manner (speed, the agent’s state, duration of the activity, iteration, etc.)
- Verbs derived from adjectives or nouns
  - change objects’ properties (size, color, shape) or the world state
  - verbs with affixes such as -en, -ify or -ize, e.g. “lengthen”
  - visualised using the predicates scale() and squash(), or by changing the corresponding property fields of the object in VRML
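Both ideas reduce a verb to something already animatable: a troponym to its base verb plus a manner change, and a derived verb to a property change. This Java sketch is hypothetical: `VerbMapper`, `MotionSpec`, the speed factors, and the axis assigned to each -en verb are illustrative assumptions; `scaleField` returns a value for the `scale` field of a VRML Transform node.

```java
import java.util.Map;

// A motion to animate: the base verb plus a manner modifier (here only speed).
record MotionSpec(String baseVerb, double speedFactor) {}

final class VerbMapper {
    // Hypothetical troponym table; the speed factors are illustrative, not calibrated.
    private static final Map<String, MotionSpec> TROPONYMS = Map.of(
        "trot", new MotionSpec("walk", 1.5),   // "trot" = "walk" (fast)
        "gulp", new MotionSpec("eat", 2.0));   // "gulp" = "eat" (quickly)

    // Hypothetical table of -en verbs and the local axis they scale (0 = x, 1 = y, 2 = z).
    private static final Map<String, Integer> DERIVED_AXIS = Map.of(
        "lengthen", 0, "heighten", 1, "widen", 2);

    // A troponym becomes its base verb with a modified manner; other verbs pass through unchanged.
    static MotionSpec resolve(String verb) {
        return TROPONYMS.getOrDefault(verb, new MotionSpec(verb, 1.0));
    }

    // Value for a Transform's "scale" field after applying a derived verb by the given factor.
    static String scaleField(String verb, double factor) {
        Integer axis = DERIVED_AXIS.get(verb);
        if (axis == null) throw new IllegalArgumentException("not a known derived verb: " + verb);
        double[] s = {1.0, 1.0, 1.0};
        s[axis] = factor;
        return s[0] + " " + s[1] + " " + s[2];
    }
}
```

For instance, `VerbMapper.scaleField("lengthen", 1.5)` returns `"1.5 1.0 1.0"`, which could be written into the object’s Transform as `scale 1.5 1.0 1.0`.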
Representing active & passive voice
- Active and passive voice
- Converse verb pairs such as “give/take”, “buy/sell”, “lend/borrow”
- Both describe the same activity from different points of view
- Realised with the VRML Viewpoint node
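The idea that a converse pair describes one activity from two points of view can be sketched as a normalisation step. In this hypothetical Java sketch, `Transfer`, `ConverseVerbs.normalise` and the rule that the camera follows the grammatical subject are assumptions; the chosen participant would then be used to place a VRML Viewpoint node. Only the converse-pair case is covered, not passive clauses.

```java
import java.util.Set;

// Canonical transfer event: the same activity regardless of which converse verb was used.
record Transfer(String source, String object, String recipient, String viewpointOn) {}

final class ConverseVerbs {
    // Verbs whose subject is the source vs. the recipient of the transferred object.
    private static final Set<String> SOURCE_SUBJECT = Set.of("give", "sell", "lend");
    private static final Set<String> RECIPIENT_SUBJECT = Set.of("take", "buy", "borrow");

    // Normalises "subject VERB object to/from other"; the viewpoint follows the grammatical subject.
    static Transfer normalise(String verb, String subject, String object, String other) {
        if (SOURCE_SUBJECT.contains(verb)) return new Transfer(subject, object, other, subject);
        if (RECIPIENT_SUBJECT.contains(verb)) return new Transfer(other, object, subject, subject);
        throw new IllegalArgumentException("not a converse transfer verb: " + verb);
    }
}
```

“Ann sells a book to Bob” and “Bob buys a book from Ann” then yield the same source, object and recipient, and differ only in whose viewpoint is used.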
Implementation: semantics to VRML
Example: “A ball is bouncing”
(a) Visual definition of “bounce”:
    bounce(ball): [moveTo(ball, [0, 0, 0]), moveTo(ball, [0, 20, 0])]L
(b) VRML code of a static ball:
    #VRML V2.0 utf8
    DEF ball Transform {
      translation 0 0 0
      children [
        Shape {
          appearance Appearance { material Material {} }
          geometry Sphere { radius 5 }
        }
      ]
    }
(c) Output VRML code of a bouncing ball:
    #VRML V2.0 utf8
    DEF ball Transform {
      translation 0 0 0
      children [
        DEF ball-TIMER TimeSensor { loop TRUE cycleInterval 0.5 }
        DEF ball-POS-INTERP PositionInterpolator {
          key [0, 0.5, 1]
          keyValue [0 0 0, 0 20 0, 0 0 0]
        }
        Shape {
          appearance Appearance { material Material {} }
          geometry Sphere { radius 5 }
        }
      ]
    }
    # drive the interpolator from the timer, and the ball's position from the interpolator
    ROUTE ball-TIMER.fraction_changed TO ball-POS-INTERP.set_fraction
    ROUTE ball-POS-INTERP.value_changed TO ball.set_translation
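Since Java is the language named for changing VRML code, the step from the visual definition (a) to the output (c) can be sketched as a Java method that turns the moveTo() targets into a TimeSensor, a PositionInterpolator and two ROUTEs. A minimal sketch under assumptions: the class `VrmlAnimator`, its parameters, and the choice to close the loop by repeating the first target are illustrative, not the CONFUCIUS code; the ROUTE statements are emitted at the top level of the file.

```java
import java.util.List;
import java.util.StringJoiner;

final class VrmlAnimator {
    // Emits a looping PositionInterpolator animation for a sequence of moveTo() targets.
    // The first target is repeated at the end so that each cycle returns to its start.
    static String animate(String name, String geometry, List<double[]> targets, double cycleInterval) {
        if (targets.isEmpty()) throw new IllegalArgumentException("need at least one moveTo target");
        int n = targets.size() + 1;
        StringJoiner keys = new StringJoiner(", ", "[", "]");
        StringJoiner values = new StringJoiner(", ", "[", "]");
        for (int i = 0; i < n; i++) {
            keys.add(num((double) i / (n - 1)));              // evenly spaced keys from 0 to 1
            values.add(vec(targets.get(i % targets.size())));  // wraps back to the first target
        }
        String timer = name + "-TIMER", interp = name + "-POS-INTERP";
        return String.join("\n",
            "#VRML V2.0 utf8",
            "DEF " + name + " Transform {",
            "  translation " + vec(targets.get(0)),
            "  children [",
            "    DEF " + timer + " TimeSensor { loop TRUE cycleInterval " + num(cycleInterval) + " }",
            "    DEF " + interp + " PositionInterpolator {",
            "      key " + keys,
            "      keyValue " + values,
            "    }",
            "    Shape { appearance Appearance { material Material {} } geometry " + geometry + " }",
            "  ]",
            "}",
            "ROUTE " + timer + ".fraction_changed TO " + interp + ".set_fraction",
            "ROUTE " + interp + ".value_changed TO " + name + ".set_translation");
    }

    private static String vec(double[] p) {
        return num(p[0]) + " " + num(p[1]) + " " + num(p[2]);
    }

    // Prints whole numbers without a trailing ".0" to match the hand-written VRML.
    private static String num(double d) {
        return d == Math.rint(d) ? Long.toString((long) d) : Double.toString(d);
    }
}
```

For the bouncing ball, `animate("ball", "Sphere { radius 5 }", List.of(new double[] {0, 0, 0}, new double[] {0, 20, 0}), 0.5)` produces the same `key [0, 0.5, 1]` and `keyValue [0 0 0, 0 20 0, 0 0 0]` fields as output (c).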
Categories of adjectives
- Visually observable
  - Objects’ attributes/states: dark/light, large/small, big/little, white/black (color adj.), long/short, new/old, high/low, full/empty, open/closed
  - Observable human attributes
    - Feelings: happy/sad, angry, excited, surprised, terrified
    - Others: old/young, beautiful/ugly, strong/weak, poor/rich, fat/thin
- Relational adj.: nasal (nose), mural (wall), dental (teeth)
- Visually unobservable
  - Perceivable by other modalities: wet/dry, warm/cold, coarse/smooth, hard/soft, heavy/light
  - Abstract attributes
    - Unobservable human attributes (virtue): good/evil, kind, mean, ambitious
    - Others: easy/difficult, real, important, particular, right/wrong, early/late
- Reference-modifying adj.: possible/impossible, former, past/present, last, other, different/same
Software Analysis
- Java programming language
  - parsing the intermediate representation
  - changing VRML code to create/modify animation
  - integrating modules
- Natural language processing tools
  - GATE (pre-processing)
  - PC-PARSE (morphological and syntactic analysis)
  - WordNet (lexicon, semantic inference)
- 3D graphic modelling
  - existing 3D models on the Internet
  - 3D Studio Max (props & stage)
  - VRML (Virtual Reality Modeling Language) 97, H-Anim 2001 specification
- The actors: embodied agents
  - Microsoft Agent (the narrator and minor actors)
  - Character Studio, Internet Character Animator (protagonists)
Natural Language Processing
- Preprocessing
- PC-PARSE
  - part-of-speech tagger
  - morphological parser (lexicon & morphological rules, features)
  - syntactic parser
- Semantic inference (WordNet 1.6)
- Coreference resolution
- Temporal reasoning
Contributions & prospective applications
- Contributions
  - multimodal semantic representation of natural language
  - automatic animation generation
  - multimodal fusion and coordination
- Prospective applications
  - children’s education
  - multimedia presentation
  - movie/drama production
  - script writing
  - computer games
  - virtual reality
Conclusion
The objectives of CONFUCIUS address challenging problems in language visualisation:
- formalising the meaning of action verbs and states
- mapping language primitives to visual primitives
- building a reusable ‘common sense’ knowledge base for other systems
- performing sophisticated spatial and temporal reasoning
- representing stories by temporal multimedia, which requires significant coordination


