c21d3c70ce33f6b63a7e92ee5da4bc38.ppt
- Number of slides: 18
Computer-supported Interaction
Hamed Ketabdar, Shiva Sundaram
Computer-supported interaction
• Technologies that support interaction between humans, machines, and the environment
• Capturing, processing, and retrieving multimedia
Computer-supported interaction
Schedule: Thursday, 14-16 h, FR 0512 C, starting 04.11.2010
Hamed Ketabdar: Ph.D. in Electrical Engineering from the Swiss Federal Institute of Technology in Lausanne (EPFL), Hamed.Ketabdar@telekom.de
Shiva Sundaram: Ph.D. in Electrical Engineering from the University of Southern California (USC), Shiva.Sundaram@telekom.de
Outline
• Multi-modal Interfaces
– Input methods: keyboard, pen, voice, gesture, tactile (touch) …
– Output modalities: audio, video, tactile
– Fusion of modalities
• Speech Processing
– Speech recognition: statistical methods, acoustic modelling, and decoding
– Metadata extraction (age, gender, language, emotion)
– Audio-visual speech recognition
– Multi-lingual speech recognition
• Information Retrieval
– Representation
– Clustering/segmentation/classification
– Integration with other processes
• System and Architecture
– Natural Language Processing
• Translation
Practical sessions
Possibility for small class projects: quickly develop multi-modal interfaces based on our context-aware SDK for iPhone …
User Activity and Context Detection with Mobile Phones
Detect whether you are walking, sitting, in a meeting, at a concert or party, or in an emergency situation …
§ Mobile phones are equipped with microphones and tilt sensors
§ Audio context is detected using the microphone output
§ A physical activity signature is captured using the tilt sensor output
§ Tilt and audio information are combined to detect context and/or user activity
§ Time, duration, and other prior knowledge can also be integrated
Applications:
§ Smart mobile phones
• Control ringing and other functions according to context
§ Surveillance and organization (employees, elderly, children)
• Information about user activity can be used to better organize employees and to care for the elderly and children
§ Smart home environment
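The combination of tilt and audio cues described on this slide can be sketched roughly as follows. This is only an illustration under stated assumptions, not the method used in the project: the thresholds, the variance and RMS features, and the context labels are all hypothetical, and a real system would more likely use trained classifiers than fixed rules.

```python
from math import sqrt
from statistics import pvariance

# Hypothetical thresholds, chosen for illustration only.
MOTION_THRESHOLD = 0.05  # variance of tilt-sensor readings
LOUD_THRESHOLD = 0.3     # RMS of audio samples normalised to [-1, 1]


def motion_level(tilt_samples):
    """Variance of the tilt readings, used as a crude activity signature."""
    return pvariance(tilt_samples)


def audio_level(audio_samples):
    """Root-mean-square amplitude of the audio samples."""
    return sqrt(sum(s * s for s in audio_samples) / len(audio_samples))


def detect_context(tilt_samples, audio_samples):
    """Combine the two cues with simple rules (labels are assumptions)."""
    moving = motion_level(tilt_samples) > MOTION_THRESHOLD
    loud = audio_level(audio_samples) > LOUD_THRESHOLD
    if moving and loud:
        return "party or concert"
    if moving:
        return "walking"
    if loud:
        return "meeting"
    return "sitting"
```

Prior knowledge such as time of day or duration could enter as extra conditions in `detect_context`; that extension is left out to keep the sketch short.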
General Purpose Audio Switches
A switch that can be triggered by speech commands or non-speech events
The commands or events can be learned automatically
§ The switch can be easily reconfigured for a new command or application
Involves:
§ Automatic language/event acquisition
§ Robustness to different sources of variability
Applications:
§ Smart environments, security and surveillance
Dreams:
§ You buy it in a store, as you would a normal mechanical switch
§ It can be installed anywhere, the same way as a normal switch
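One way to picture a switch that learns its trigger and can be reconfigured is a simple template matcher. The sketch below is an assumption made for illustration, not the design behind the slide: the `AudioSwitch` class, its fixed-length feature vectors, and the distance threshold are all hypothetical, and feature extraction from raw audio is assumed to happen elsewhere.

```python
from math import dist


class AudioSwitch:
    """Hypothetical switch that fires when input features resemble a learned template."""

    def __init__(self, threshold):
        self.threshold = threshold  # maximum Euclidean distance that still triggers
        self.template = None        # no command or event learned yet

    def learn(self, examples):
        """Learn (or re-learn) the trigger as the mean of example feature vectors."""
        n = len(examples)
        self.template = [sum(column) / n for column in zip(*examples)]

    def is_triggered(self, features):
        """Return True if the features are close enough to the learned template."""
        if self.template is None:
            return False
        return dist(self.template, features) <= self.threshold
```

Calling `learn` again with examples of a different command or event reconfigures the switch, which mirrors the reconfiguration idea on the slide.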
Call Classification
Anger, Gender, Age, Language, …
Hierarchical design and discriminative training:
§ Discriminative representation of emotional states
§ Efficient fusion of different acoustic features with higher-level information (e.g. duration, message content)
§ Efficient feature selection mechanism, less computational load for feature extraction
[Figure labels: Pitch, Intensity; Discriminative transformation; Combination; Textual data, duration, …; Call classification]
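The fusion of acoustic features with higher-level information could, for instance, take the form of a weighted combination passed through a logistic function. The sketch below is a minimal illustration of that idea only: the feature names, the weights, the bias, and the "angry"/"neutral" labels are hypothetical, and in a discriminatively trained system the weights would be learned from labelled calls rather than set by hand.

```python
from math import exp

# Hypothetical weights and bias, for illustration only; a trained system
# would estimate these from labelled calls.
WEIGHTS = {"pitch": 1.5, "intensity": 1.0, "duration": 0.5, "text": 2.0}
BIAS = -2.5


def anger_probability(features):
    """Fuse per-cue scores (each assumed to lie in [0, 1]) into one probability."""
    z = BIAS + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return 1.0 / (1.0 + exp(-z))


def classify_call(features, threshold=0.5):
    """Label a call as 'angry' when the fused probability reaches the threshold."""
    return "angry" if anger_probability(features) >= threshold else "neutral"
```

The same pattern would apply to gender, age, or language classification with a different set of cues and weights.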
Digital Logging of Physical Activities and Context: Enhancing Emergency and Security/Privacy Functionalities in Mobile Phones
• Unexpected physical events experienced by a mobile phone can be signs of critical security or emergency scenarios:
– The phone is at risk of being lost or stolen: confidential information on the phone can be exposed
– The phone user is having an accident
MobileHCI 2009, Ubicomp 2009
Digital Logging of Physical Activities and Context: Entertainment: What Type of Music Might You Like to Hear?
§ Automatic selection of music based on context:
§ Actual activity of the user
§ Audio activity in the environment
§ Habits and musical taste can also be integrated
11th International ACM Conference on Computers and Accessibility (ASSETS 2009)
Interaction with Mobile User Interface
• Sending commands
• Turning pages
• Zooming
• Click and double click
• Calling an application or service
Motivating Design of Very Small Mobile Devices, Headsets, Wrist Watches, and Portable Music Players
MagiSign: "3D Magnetic Signatures" for User Identification/Authentication
• The user creates an arbitrary 3D signature of their own using a suitably shaped magnet in the 3D space around the device.
• Wider choice for authentication, as the signature can be drawn flexibly in the 3D space around the device.
• A hard copy of a 3D magnetic signature cannot easily be generated.
• Unlike regular signatures, it is not affected by the quality of paper, pen, ink, etc.
• 3D magnetic signature:
– A simple 3D motion
– The user's regular signature drawn in the air!
– Any other combination of even higher complexity, actively using all the 3D space around the device.
• A magnet as a physical key? A magnet personalized in shape and polarity can enhance the authentication process …
• Can be used for accessing a service or data, for entrance doors, or simply in place of a regular signature during a purchase …
Even simple gestures may be used for authentication
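Comparing a freshly drawn 3D signature against an enrolled one requires tolerating differences in drawing speed. One standard technique for this is dynamic time warping (DTW); the sketch below uses it purely as an illustration and is not claimed to be the matching method behind MagiSign. The representation of a signature as a list of 3D magnetometer samples, the function names, and the acceptance threshold are assumptions.

```python
from math import dist, inf


def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of 3D points."""
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment cost of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def verify_signature(template, attempt, threshold):
    """Accept the attempt if its DTW distance to the enrolled template is small."""
    return dtw_distance(template, attempt) <= threshold
```

Because DTW lets samples repeat, the same gesture drawn more slowly still matches the template, while a differently shaped gesture accumulates distance and is rejected once it exceeds the threshold.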
MagiWrite: Write It in the Air!
• Text entry based on magnetic field interaction
• Character-shaped gestures are written in the space around the device
• Suitable for dialling a number, entering a PIN code, selecting a text entry, etc.
• Especially useful for very small mobile devices, where small keypads or touch screens are hard to design and operate
MagiEntertain: Using Magnetic Interaction in Mobile Entertainment Applications (Gaming and Audio Synthesis)
• Conventionally, touch pads and touch screens are used for gaming
• Screen occlusion
• MagiGame: actions of a game avatar such as shooting, jumping, and changing the aim can be controlled
• No screen occlusion, natural gesture-based interaction, more actions per minute, possibility of multi-player gaming on one device
• Adjusting different audio and DJ effects based on the position, orientation, and movements of the magnet
• Changing the sound volume and audio tracks in a portable music player
• New musical instruments …; two players can play on the same instrument
Literature
Basics:
• Lawrence Rabiner and Biing-Hwang Juang: "Fundamentals of Speech Recognition" (Prentice Hall, 1993)
• Bernd Pompino-Marschall: "Einführung in die Phonetik" (de Gruyter, 1995)
• Richard O. Duda, Peter E. Hart, David G. Stork: "Pattern Classification" (Wiley, 2000)
• Keinosuke Fukunaga: "Introduction to Statistical Pattern Recognition" (Academic Press, 1990)
• Thomas H. Cormen et al.: "Introduction to Algorithms" (MIT Press, 1990)
Automatic Speech Recognition:
• Ernst Günter Schukat-Talamazzini: "Automatische Spracherkennung: Grundlagen, statistische Modelle und effiziente Algorithmen" (Vieweg, 1995)
• Andreas Wendemuth: "Grundlagen der stochastischen Sprachverarbeitung" (Oldenbourg, 2004)
• Tanja Schultz and Katrin Kirchhoff: "Multilingual Speech Processing" (Academic Press, 2006)
• Fred Jelinek: "Statistical Methods for Speech Recognition" (MIT Press, 1997)
Exam?
Webpage
• Detailed information about our projects can be found at http://www.deutsche-telekomlaboratories.de/~ketabdar.hamed/
• Updated information, slides, etc. will soon be available at http://www.deutsche-telekomlaboratories.de/~ketabdar.hamed/teachingsection/index.htm