
- Number of slides: 65
Handling Spatially Complex English-to-ASL MT with a Multi-Path Pyramidal Architecture Matt Huenerfauth CLUNCH Presentation November 3, 2003
ASL Machine Translation with Pyramids and Invisible Worlds Matt Huenerfauth CLUNCH Presentation November 3, 2003
Today’s Talk This is work in progress. • ASL Linguistics and Machine Translation • Initial Approaches to ASL MT • Handling Spatially Complex ASL – A Multi-Path MT Architecture. – Adopting some HMS lab technology. – Interesting Linguistic Motivations. • Current and Future Work
Motivations and Applications • Only half of deaf high school graduates can read English at a fourth-grade level – despite sophisticated ASL fluency. • Many efforts to help the deaf access the hearing world forget that English is their 2nd language (& different from ASL). • Applications for a Machine Translation System: – TV captioning, teletype telephones. – Human interpreters intrusive/expensive. – Educational tools, access to information. – Storage and transmission of ASL.
Output: Signing Virtual Humans • Virtual reality models of the human form are now articulate & fast enough to produce ASL. • The ASL generator produces instructions for the avatar, and the avatar performs the signs, producing animated output for the user. • Our problem is how to build these instructions.
Virtual Signing Humans • Photos: Seamless Solutions, Inc.; Simon the Signer (Bangham et al. 2000); Vcom3D Corporation.
ASL Linguistics I • What is ASL? – Real language? Who uses it? – Different than SEE or SSE. • How is it different than English? – Grammar, Vocabulary, Visual/Spatial. – More than the Hands: Simultaneity! – How signs can be changed: Morphology! – Use of Space around the Signer…
ASL Linguistics II • Discourse Space – Put discourse entities on “shelves” for later referential use. – “Agreement” - Pronouns, Possessives, Verbs. – Don’t interpret locations literally. (Bob to the left of Tim.) • Three-Dimensional Space – Space around signer is visually analogous to a real scene. – Classifier Predicates • Signers describe 3D scenes with their hands. • Meaningful handshape and 3D representative movement path.
ASL Linguistics III • Traditional Sentences: (No classifier predicates.) Where does Billy attend college? wh #BILLY IXx GO-TO UNIVERSITY WHERE • Spatially Complex: (Uses classifier predicates.) I parked my car next to his cat. POSSx CAT ClassPred-bent-V-{locate cat in space} POSS1s CAR ClassPred-3-{park next to cat} The truck drove down the windy road. IXx TRUCK ClassPred-3-{drive on windy road}
Initial Approaches to ASL MT Non-statistical Direct and Transfer MT Architectures
Corpora for ASL? • ASL has no written form, so there are no newswires or ready-made sources of text. • Some groups have attempted to record and annotate video tapes, but creating a useful, consistent manual transcription standard and then performing the transcription is very slow work. • No statistical approaches to ASL MT.
Machine Translation Pyramid MT Pyramid Dorr 1998. • Options in MT design. • No stats? higher path: – more work – domain size – subtler divergences handled
Option 1: Direct Translation • What kind of non-statistical translation is possible if all we do is word-level analysis (i.e., morphology, POS & sense tagging)? • Word-for-sign dictionary look-up system. • Probably not sophisticated enough analysis to produce ASL, but could produce SEE.
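A word-for-sign look-up of this kind can be sketched in a few lines of Python. This is only an illustration: the tiny lexicon, the list of skipped function words, and the `#WORD` fingerspelling fallback are assumptions made for the example, not part of the talk. Because the output keeps English word order, it is closer to SEE than to ASL.

```python
# Hypothetical word-to-sign lexicon; entries are illustrative only.
SIGN_LEXICON = {
    "where": "WHERE",
    "billy": "#BILLY",
    "attend": "GO-TO",
    "college": "UNIVERSITY",
}

# Assumed English function words that get no sign in this toy system.
SKIPPED_WORDS = {"does", "the", "a", "an"}


def direct_translate(sentence: str) -> list[str]:
    """Map each English word to a sign gloss, keeping English word order."""
    glosses = []
    for raw in sentence.lower().split():
        word = raw.strip(".,?!")
        if not word or word in SKIPPED_WORDS:
            continue
        # Unknown words fall back to fingerspelling, written here as #WORD.
        glosses.append(SIGN_LEXICON.get(word, "#" + word.upper()))
    return glosses
```

Note that no reordering, non-manual signals, or use of space can come out of a look-up like this, which is the limitation the slide points to.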
Option 2: Transfer Translation • Syntactically analyze English text before crossing over to ASL. – Capture more divergences and handle more complex phenomena. – Can successfully translate many English sentences into ASL. • Some previous work along these lines. – some use deep syntax or simple semantics
Transfer Issues for ASL • ASL Discourse Model: topics, referents in space. • Representing & Generating Non-Manual Signals. • Computational Model of ASL Phonology – facilitate creation of an ASL lexicon – define morphological and phonological operations • Parameterizing ASL Features for Morphology • Note: If system couldn’t handle a particular input, just fall back on direct translation to produce signing output closer to SEE than fluent ASL.
Handling Spatially Complex ASL Failings of direct and transfer approaches to ASL MT.
But what’s the hard part? • Previous ASL generation work has ignored spatially complex ASL sentences. – Classifier predicates and spatial verbs – Very common, very communicatively useful. • Difficult to handle in transfer architecture. (More going on than just syntax with these. )
Translate to a Classifier Predicate The car drove down the bumpy road past my house. POSS1s HOUSE ClassPred-C-{locate house} IXx CAR ClassPred-3-{drive on bumpy road} • Where are the house, the road, and the car? How close? Where does the path start/stop? How do we show that the path is bumpy, winding, or hilly?
Paralinguistic? Iconic? Spatial? • Linguists debate whether classifier predicates are: – Paralinguistic visually iconic gestural movements – Complex non-spatial polymorphemic constructions – Semantically compositional yet still spatially aware • Pushing the boundaries of ‘language’… – May involve gradient information, spatial analogy, scene visualization, and a degree of iconicity. – Not clear traditional linguistic approaches can capture this. – Still seems linguistic however: many constraints…
When the going gets tough… • …the tough try an interlingua. – Hard to address using morphological, syntactic, and simple semantic information of the English text. – Direct and transfer architectures appear insufficient. • What about an interlingual approach? – Problem: Hard to build an interlingua system for an unlimited (or even medium-sized) domain. Lots of overhead! – Interlingual systems exist only for limited domains.
Getting by with limited domain? • Special about ASL: can identify ‘hard’ sentences. – Spatially descriptive text: English spatial verbs describing locations, orientations, or movements; spatial prepositions or adverbs; concrete or animate entities; other common motifs or situations when classifier predicates are used (detect lexically). • Use broad-coverage transfer approach for most inputs, and detect when we need to use something more powerful when we have a spatially complex English input sentence.
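The lexical detection step described above could look roughly like the following Python sketch. The cue lists and the decision rule (a spatial verb plus either a spatial preposition or a concrete entity) are assumptions chosen for illustration; the talk only says that such sentences can be detected lexically.

```python
# Hypothetical cue lists; a real system would need far larger lexicons.
SPATIAL_VERBS = {"drove", "drive", "parked", "park", "walked", "moved"}
SPATIAL_PREPOSITIONS = {"past", "down", "beside", "behind", "under", "next"}
CONCRETE_ENTITIES = {"car", "truck", "cat", "house", "road"}


def is_spatially_complex(sentence: str) -> bool:
    """Flag a sentence that has a spatial verb plus a spatial preposition
    or a known concrete entity (a purely lexical test)."""
    words = {w.strip(".,?!").lower() for w in sentence.split()}
    has_verb = bool(words & SPATIAL_VERBS)
    has_prep = bool(words & SPATIAL_PREPOSITIONS)
    has_entity = bool(words & CONCRETE_ENTITIES)
    return has_verb and (has_prep or has_entity)
```

With these toy lists, "The truck drove down the windy road." is flagged, while "Where does Billy attend college?" is not and would stay on the transfer pathway.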
“Multi-Path” MT? • Whenever possible, use the simpler, easier-to-build MT approach. • Only when needed, use the more sophisticated, resource-intensive one. • We take advantage of the ‘breadth’ of one and the ‘depth’ of the other. • If we add direct translation (to SEE) to the picture, we actually have three pathways.
“Pyramidal” MT? MT Pyramid Dorr 1998. Don’t interpret this picture as a set of options anymore… Now it’s a skeleton for a multi-path MT architecture.
What is our Interlingua? • What is the language-neutral representation between the English and ASL when talking about a spatially complex scene? • Intuitively, the signer has a visualization of the 3D scene which they are discussing. • So, a spatial representation of reality (or the signer’s imagination/conception of this reality) is serving as the interlingua. This sounds rather ambitious… How could the computer model spatial reality?
What about Virtual Reality? • Analyze the English text, construct a 3D virtual reality representation of the scene, and use VR as the basis for generating the spatially iconic classifier predicate movements. • But has anyone ever attempted to construct a 3D virtual reality representation of a changing scene as it is described by English sentences? • Actually, the University of Pennsylvania has.
A Useful Technology Natural Language Command Control of Virtual Reality Scenes
HMS & NLP Labs: 3D Scene NL-Command • Have a virtual reality model of characters and objects in a three-dimensional scene. • Accepts English text input (directions for the characters or objects to follow). • Produces an animation in which the characters obey the English commands. • Updates the 3D scene to show changes. (Badler, Bindiganavale, Allbeck, Schuler, Zhao, Lee, Shin, and Palmer 2000; Schuler 2003.)
An NL-Controlled 3D Scene http://hms.upenn.edu/software/PAR/images.html
NL Command Control • Pipeline, from input to output: English Text → English Syntax Analysis → Selecting a PAR Template from the Actionary and Filling-In Slots → Filled-In PAR → Hierarchical Planning (handle ambiguities, add more detail…) → Animation Script → Animated 3D Scene. • Actionary: PAR Templates for Entity Motions.
NL Command Control (continued) • “Actionary” = Action Dictionary = List of PAR Templates (for entity motions). It is consulted when selecting a PAR template and filling in its slots. • What’s a PAR?
Parameterized Action Representation (This is a subset of PAR info. http://hms.upenn.edu/software/PAR)
  participants:    [ agent:        AGENT
                     objects:      OBJECT list ]
  semantics:       [ motion:       {Object, Translate?, Rotate?}
                     path:         {Direction, Start, End, Distance}
                     termination:  CONDITION
                     duration:     TIME-LENGTH
                     manner:       MANNER ]
  start:           TIME
  prep conditions: CONDITION boolean-exp
  sub-actions:     sub-PARs
  parent action:   PAR
  previous action: PAR
  next action:     PAR
Slide callouts: Arguments, Specify Locomotion, Verb, Adjuncts, Planning Operator.
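As a rough illustration of the slot structure above, a PAR template could be modeled as a small Python data class. This is a sketch under stated assumptions, not the HMS lab's implementation: the class names, the field types, and the `unfilled_slots` helper are hypothetical, and only a few of the slots from the slide are included.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Path:
    """The path slot: {Direction, Start, End, Distance}."""
    direction: Optional[str] = None
    start: Optional[str] = None
    end: Optional[str] = None
    distance: Optional[float] = None


@dataclass
class PAR:
    """A small subset of the PAR slots listed on the slide."""
    name: str
    agent: Optional[str] = None                        # participants: agent
    objects: list[str] = field(default_factory=list)   # participants: objects
    path: Path = field(default_factory=Path)           # semantics: path
    manner: Optional[str] = None                       # semantics: manner
    sub_actions: list["PAR"] = field(default_factory=list)

    def unfilled_slots(self) -> list[str]:
        """Hypothetical helper: report which key slots are still empty."""
        missing = []
        if self.agent is None:
            missing.append("agent")
        if self.path.end is None:
            missing.append("path.end")
        return missing
```

For example, `PAR(name="drive", agent="car", manner="bumpy")` would report `path.end` as unfilled until the analysis of the English text supplies an end point.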
MT Approach to Classifier Predicates Using the HMS NL Command Control Technology
Using this technology… An NL-Controlled 3D Scene http://hms.upenn.edu/software/PAR/images.html
Using this technology… An NL-Controlled 3D Scene • Signing Character • Original image from: Simon the Signer (Bangham et al. 2000)
“Invisible World” Approach • Mini VR scene in front of the signer containing entities from English text. (They’re invisible. ) • Interpret the English sentences as NL commands. Instantiate PARs which position, move, reorient, and otherwise modify the entities in this world. • Update VR model. • Use hand to show changes in the invisible scene. • VR acts as intermediary between English & ASL.
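The “invisible world” can be pictured as a tiny scene model that is updated as each English sentence is interpreted. The Python sketch below is an assumption-laden illustration: the `Entity` and `InvisibleWorld` classes, the coordinate convention, and the waypoint-based `move` method are all hypothetical and stand in for the PAR-driven VR software described in the talk.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    name: str
    position: tuple[float, float, float]  # assumed (x, y, z) in signing space


class InvisibleWorld:
    """Toy 3D scene holding the entities mentioned in the English text."""

    def __init__(self) -> None:
        self.entities: dict[str, Entity] = {}

    def place(self, name: str, position: tuple[float, float, float]) -> None:
        """Position (or reposition) an entity in the scene."""
        self.entities[name] = Entity(name, position)

    def move(self, name: str, waypoints: list[tuple[float, float, float]]):
        """Move an entity along waypoints and return the full path travelled,
        which a generator could later trace with the signer's hand."""
        entity = self.entities[name]
        path = [entity.position] + list(waypoints)
        entity.position = path[-1]
        return path
```

The path returned by `move` is the kind of scene change that the generation side would then have to express as a classifier predicate.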
Interlingual Pathway for ASL Original image: MT Pyramid Dorr 1998. Our MT picture… We now have an interlingual pathway.
Interlingual Pathway for ASL The NL-Command Technology
Interlingual Pathway for ASL This step is harder than it seems…
VR Scene Doesn’t Do It All • Various factors aside from the movement of the scene itself can affect this generation choice: – conventional motifs of expression • e.g. furniture or items in a room – restrictions on use of multiple hands simultaneously – handshape-movement combination constraints • e.g. ‘approaching’ constructions – discourse or semantic concerns/priorities, etc. • There’s generation work to be done!
An NL Engineering Solution • How to create the classifier predicates from VR? – Write rules obeying restrictions that inspect the VR scene, consider English text semantics, and combine many small units/morphemes to slowly produce or narrow in on a classifier predicate output. – Easier approach: Lexicalize classifier predicates as much as possible. Define and specify a big list of classifier predicate templates – their performance and semantics. Fill slots based on info in the VR scene. • HMS: To define the set of possible movement templates, build a PAR “actionary” specifying the animation possibilities.
A Second Actionary: For ASL • The first actionary (list of PAR templates) we saw was used while analyzing the English text. It listed possible types of movements the imaginary entities perform in the virtual reality scene. • This second actionary would describe the possible movements of the signer’s hands while performing one or more interrelated classifier predicates (& discourse/semantic effects). Original image from: Simon the Signer (Bangham et al. 2000)
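A minimal sketch of the lexicalized approach: a list of classifier predicate templates is searched for one that matches the scene event, and its slots are filled from the scene. The handshapes (bent-V, C, 3) come from the example glosses earlier in the talk; the template fields, the entity-type categories, and the output dictionary are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CPTemplate:
    event: str        # assumed event kinds: "locate" or "move"
    entity_type: str  # assumed categories: "animal", "building", "vehicle"
    handshape: str    # handshape used by the classifier predicate


# Hypothetical second actionary: templates for the signer's hands.
ASL_ACTIONARY = [
    CPTemplate("locate", "animal", "bent-V"),
    CPTemplate("locate", "building", "C"),
    CPTemplate("move", "vehicle", "3"),
]


def fill_template(event, entity_type, path):
    """Return a filled classifier predicate, or None when no template
    licenses this combination (the lexicon restricts the output)."""
    for template in ASL_ACTIONARY:
        if (template.event, template.entity_type) == (event, entity_type):
            return {"handshape": template.handshape, "hand_path": list(path)}
    return None
```

Returning `None` for an unlisted combination mirrors the idea that a construction is unavailable when its template is not in the lexicon, while the free `path` argument keeps the spatial productivity.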
Interlingual Pathway for ASL This step could be hard…
Interlingual Pathway for ASL We now have an architecture for the interlingual pathway!
Multi-Path Pyramidal MT MT Pyramid Dorr 1998. Interlingual: Spatial Text Transfer: Most Sentences Direct: Unanalyzable Text
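The three pathways can be tied together by a small dispatcher, sketched below in Python. The function signature and the pathway callables are hypothetical placeholders supplied by the caller; the fall-through from the interlingual pathway to transfer when the former fails is also an assumption, since the talk only states that direct translation is the fallback for inputs the system cannot handle.

```python
from typing import Callable, Optional

# A pathway returns a list of glosses, or None if it cannot handle the input.
Pathway = Callable[[str], Optional[list[str]]]


def translate(sentence: str,
              is_spatial: Callable[[str], bool],
              interlingual: Pathway,
              transfer: Pathway,
              direct: Callable[[str], list[str]]) -> tuple[str, list[str]]:
    """Try the deepest applicable pathway first and fall back downward."""
    if is_spatial(sentence):
        result = interlingual(sentence)
        if result is not None:
            return "interlingual", result
    result = transfer(sentence)
    if result is not None:
        return "transfer", result
    # Direct translation always produces something (SEE-like output).
    return "direct", direct(sentence)
```

Passing the pathways in as arguments keeps the sketch independent of how each one is built, which matches the point that each pathway can be developed to a different depth.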
A Final Consideration Other motivations for the lexicalized classifier predicate “double actionary” architecture…
Interlingual Pathway for ASL Practical engineering motivations for design: Just a hack? Does relying on template actionary limit output too much?
Linguistic Motivations • “Blended Spaces” Lexicalized Classifier Predicate Model of Scott Liddell (2003). – Double-Actionary design analogous to model of how humans generate classifier predicates. – This model assumes signers imagine entities under discussion occupying space before them. – It argues that classifier predicates are stored as a lexicon of templates that are parameterized on locations/orientations of these spatial entities.
Linguistic Motivations (continued) • Both engineering & linguistic motivations.
Liddell’s Argument for Lexicalization • Rejects the assertion that a spatial model is not necessary. – Failings of non-spatial polymorphemic CP models. Unless very many morphemes: under-productive. • Rejects naïve visually representative/analogous paralinguistic description of classifier predicates. – These models are over-productive, predicting unseen ASL constructions corresponding to imaginable movements, but the model can’t explain these restrictions. • Parameterized CP lexicon explains restrictions (template not in lexicon) but incorporates spatial productivity of the visually analogous model.
Summary
Where we’re at… • Seen MT approach for ASL classifier predicates. • Proposed “Multi-Path Pyramidal” architecture. • Uses HMS lab virtual reality software. • Design is analogous to Liddell’s recent CP model. – Reached same design from engineering approach. – System could serve as test-bed for the model. • Survey, analysis, design draft, and specification. Implementation not started yet… Suggestions?
Questions?
Is the VR really an interlingua? • Depends on your definition & how implemented. – Language neutral: 3D coordinates & VR info: not language specific. But ASL PAR selection/filling might use other info. – Semantic representation: Yes, model for 3D spatial domains. – Useful for translation: Let’s consider this… We’ve shown how it can be. – World knowledge beyond input semantics: Yes, in that it handles spatial/physics matters.
Ontology vs. Domain • Special property of ASL: easy to identify ‘hard sentences’ requiring interlingua. – Only need to build interlingual resources to cover these domains (e.g. moving vehicles, furniture layout, etc.). • But limited domains all similar: discuss 3D location, movements, and dimensions. – So the ontological expressiveness of this interlingua doesn’t have to be nearly as powerful as most systems. – Abstract concepts, beliefs/intentions, quantification… – Not just things – but types of things – are limited.
References Cited
Badler, Bindiganavale, Allbeck, Schuler, Zhao, Lee, Shin, and Palmer. 2000. Parameterized action representations and natural language instructions for dynamic behavior modification of embodied agents. AAAI Spring Symposium.
Bangham, Cox, Lincoln, Marshall. 2000. Signing for the deaf using virtual humans. IEE 2000.
Liddell. 2003. Sources of Meaning in ASL Classifier Predicates. In Karen Emmorey (ed.). Perspectives on Classifier Constructions in Sign Languages. Workshop on Classifier Constructions, La Jolla, San Diego, California.
Schuler. 2003. Using model-theoretic semantic interpretation to guide statistical parsing and word recognition in a spoken language interface. Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics (ACL’03), Sapporo, Japan.