- Number of slides: 20
IBM Research
IBM India Research Laboratory: An Overview, with an Effort to Place It in the Context of FIRE
Debapriyo Majumdar (debapriyo@in.ibm.com)
IBM India Research Lab, Bangalore
FIRE 2008, Kolkata, December 14, 2008
© 2008 IBM Corporation
IBM Research: Overview
§ The largest private research institution in the world
§ Annual R&D budget of around $6 billion (including development)
§ Over 3,000 researchers
§ Mathematics, Computer Science, Physics, Service Science, …
§ Over 40,000 US patents since 1993
− More patents than any other company in the world over the last 15 years
§ Eight labs across the world
IBM Research Labs Worldwide
§ Watson: established 1961 (Columbia University, 1945)
§ Almaden: established 1986 (San Jose, 1952)
§ Austin: established 1995
§ Zürich: established 1956
§ Haifa: established 1972
§ Tokyo: established 1982
§ Beijing: established 1995
§ India (Delhi/Bangalore): established 1998/2005
IBM India and India Research Lab
§ IBM India
− Second-largest IBM employee population outside the US (over 75,000)
− Current technical population: 40,000+
§ India Research Lab
− Delhi, since 1998
− Bangalore, since 2005
− About 150 technical people
§ IBM India locations (map): Delhi, Kolkata, Mumbai, Pune, Hyderabad, Bangalore, Chennai
§ IBM India business units: Application Services; Business Process Transformation Services; India Software Lab (ISL); Global Service Delivery Center; India Research Lab (IRL); Domestic Operations/Others
IRL Focus Areas
§ Business areas: Service Delivery, Infrastructure Services, Application Services, Emerging Solutions, Contact Center Services, Telecom, Software, Systems, Others (Banking, etc.)
§ Technical competencies
− Computer Science: Distributed Systems (systems management, middleware); Information Management (information extraction, data mining); Interaction Technologies (speech); Programming Technologies (parallel and high-performance programming); Software Engineering (model-driven, distributed development)
− Math Science: Operations Research, Algorithms, Optimization, Game Theory
− Service Science: Service Engineering, Service Productivity, Service Management, Service Quality, Service Supply Chains
Why do we care?
§ Services dominate the world's GDP…
[Chart: Japan, United States, China, India]
IRL Focus Areas: In the Context of FIRE
§ Business areas: Service Delivery, Infrastructure Services, Application Services, Emerging Solutions, Contact Center Services, Telecom, Software, Systems, Others (Banking, etc.)
§ Technical competencies
− Computer Science: Distributed Systems (systems management, middleware); Information Management (information extraction, data mining); Interaction Technologies (speech); Programming Technologies (parallel and high-performance programming); Software Engineering (model-driven, distributed development)
− Math Science: Operations Research, Algorithms, Optimization, Game Theory
− Service Science: Service Engineering, Service Productivity, Service Management, Service Quality, Service Supply Chains
Information and Knowledge Management @ IRL
§ Speech recognition and synthesis
− Hindi, Indian English & Hinglish
§ Translation: Hindi-to-English and English-to-Hindi
§ UIMA annotators (rule-based)
− With IIT Bombay
§ Linking structured and unstructured data
§ Learning attributes from noisy or incomplete information
− For example, customer transaction logs
§ More…
Challenges
§ Data
− Noisy
− Incomplete
− Could be ill-structured
§ Problem
− Defining the problem is often our job too
§ Focus on the application
− What you build must work
− Users must be satisfied
− Firefighting
Some Examples…
Speech: Core Technologies
§ Desktop speech recognition (Hindi & Indian English)
− More than 1,100 speakers
− More than 250 hours of broadband speech data
− Vocabulary of 75,000 words
− Accuracy: 90-95%
§ Telephony speech recognition
− 500 speakers each for Hindi, English & Hinglish
− A prototype movie-booking system in Indian English
SENSEI: Voice and Accent Training for Call Centers
§ Challenges
− Growing number of call centers in India
− Agents need to speak with a foreign accent
− Very high attrition rates in call centers
− Hiring involves evaluation and training
§ Solution: Sensei, a tool used for:
− Candidate screening: evaluates a candidate's pronunciation, grammar and fluency
− On-board training: evaluates the correctness of the sounds produced, syllable stress, speaking rate and fluency
− Monitoring: analyzes pre-recorded calls to determine whether the agent maintained the required quality of voice/accent
§ Application: cost reduction by automating accent training and evaluation
(Source: The Hindu, 30 Oct. 2006)
Machine Translation: Linguistic & Statistical
English-Hindi Machine Translation System
Speech Recognition in IBM-IRL
§ Nitendra Rajput, "Statistical Language Modeling for Hindi Speech Recognition," National Symposium on Modelling and Shallow Parsing of Indian Languages (MSPIL), 2006.
§ M. Kumar, N. Rajput, A. Verma, "Hybrid Baseform Builder for Phonetic Languages," International Conference on Intelligent Sensor and Information Processing, Chennai, Jan 2005.
§ Mohit Kumar, Nitendra Rajput, Ashish Verma, "A large-vocabulary continuous speech recognition system for Hindi," IBM Journal of Research and Development, Vol. 48, No. 5/6, 2004.
§ Nitendra Rajput, L. Venkata Subramaniam, Ashish Verma, "Adapting Phonetic Decision Trees Between Languages for Continuous Speech Recognition," Proceedings of the IEEE International Conference on Spoken Language Processing (ICSLP 2000), Beijing, China, Oct 16-20, 2000.
§ Niloy Mukherjee, Nitendra Rajput, L. Venkata Subramaniam, Ashish Verma, "On Deriving a Phoneme Model for a New Language," Proceedings of the IEEE International Conference on Spoken Language Processing (ICSLP 2000), Beijing, China, Oct 16-20, 2000.
§ Raghavendra Udupa U, Tanveer A. Faruquie, Hemanta K. Maji, "An algorithmic framework for the decoding problem in statistical machine translation," COLING 2004.
§ R. Udupa and T. Faruquie, "An English-Hindi statistical machine translation system," Proceedings of the 1st IJCNLP, Sanya, Hainan Island, China.
§ Tanveer Faruquie, Nitendra Rajput, Vimal Raj, "Improving automatic call classification using machine translation," IEEE ICASSP 2007, Honolulu, Hawaii, USA, Jan 2007.
EROCS: Entity RecOgnition in Context of Structured data
§ Original text (complaint): "I have noticed in my statement that you have deducted Rs 750 from my a/c (#20310284) as account maintenance fee. Can you please explain why you have charged this money?"
§ Extracted entities and keywords/features: marked within the same complaint text
§ Linked result: Cust. ID: 0205492; Saving ID: 20310284; Unhappy; Simple Saving A/C
§ Complaint metadata and customer/account data are brought together by automatically linking the complaint with the customer/account
§ Exploit the linked information for analysis in core business
− Up-sell/cross-sell, customer segmentation, campaign assessment, churn analysis
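To make the input and output of the linking step concrete, here is a minimal Python sketch under strong simplifying assumptions: the structured side is a hypothetical in-memory table keyed by account number (the first row reuses the IDs from the complaint example, the second is invented), and an account counts as mentioned only if its exact ID appears as a run of digits in the text. This is a deliberately simplified exact-match lookup, not how EROCS itself recognizes entities; `ACCOUNTS` and `link_complaint` are illustrative names.

```python
import re
from typing import Optional

# Hypothetical structured data: account records keyed by account ID.
# The first record reuses the IDs from the complaint example; the second is invented.
ACCOUNTS = {
    "20310284": {"cust_id": "0205492", "account_type": "Simple Saving A/C"},
    "20319999": {"cust_id": "0311111", "account_type": "Current A/C"},
}


def link_complaint(text: str, accounts: dict) -> Optional[dict]:
    """Return the first account whose exact ID appears as a digit run in the text.

    Simplifying assumption: linking is an exact lookup of numeric tokens,
    not the context-based entity recognition that EROCS performs.
    """
    for token in re.findall(r"\d+", text):
        if token in accounts:
            return {"account_id": token, **accounts[token]}
    return None  # no known account ID is mentioned


complaint = (
    "I have noticed in my statement that you have deducted Rs 750 from my "
    "a/c (#20310284) as account maintenance fee."
)
print(link_complaint(complaint, ACCOUNTS))
# prints {'account_id': '20310284', 'cust_id': '0205492', 'account_type': 'Simple Saving A/C'}
```

The amount `750` is also extracted as a digit run but ignored because it matches no account. A complaint that quotes no account number returns `None`; that is the case where matching on other attributes (names, products, amounts) would be needed.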
EROCS: Entity RecOgnition in Context of Structured data
[Figure: Linkage Discovery]
CallAssist
§ Transcript of call
− Customer to Agent: Hi, I am John … status of a DVD player …
− Agent to Customer: … tell me the brand…?
− Customer to Agent: … I bought a Sony…
§ Present relevant transaction data and a follow-up question to the agent within seconds
§ Consistent, high-quality customer experience
§ Reduce agent training cost
§ Reduce privacy concerns
Transaction data shown to the agent:
Customer      Store Id   Product      Brand
John Smith    S8976      DVD Player   LG
John Parker   S8976      DVD Player   Sony
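The slide does not say how CallAssist selects the transaction data or the follow-up question. As a rough illustration only, the Python sketch below replays the example: it keeps the table rows that are consistent with the transcript so far and, while more than one row remains, suggests asking about a field whose values differ. The matching rule (first name and product, then brand once one is mentioned), the shortened transcript strings, and the names `matching_rows` and `follow_up_question` are all hypothetical.

```python
import re
from typing import Optional

# Transaction rows copied from the CallAssist example table.
TRANSACTIONS = [
    {"customer": "John Smith", "store_id": "S8976", "product": "DVD Player", "brand": "LG"},
    {"customer": "John Parker", "store_id": "S8976", "product": "DVD Player", "brand": "Sony"},
]


def mentions(text: str, phrase: str) -> bool:
    """Case-insensitive whole-word match of a phrase in the transcript."""
    return re.search(rf"\b{re.escape(phrase.lower())}\b", text.lower()) is not None


def matching_rows(transcript: str, rows: list) -> list:
    """Keep rows consistent with what has been said so far.

    Assumed rule: the customer's first name and the product must be mentioned;
    once any known brand is mentioned, only rows with that brand survive.
    """
    said_brands = {r["brand"] for r in rows if mentions(transcript, r["brand"])}
    kept = []
    for r in rows:
        first_name = r["customer"].split()[0]
        if not (mentions(transcript, first_name) and mentions(transcript, r["product"])):
            continue
        if said_brands and r["brand"] not in said_brands:
            continue
        kept.append(r)
    return kept


def follow_up_question(rows: list) -> Optional[str]:
    """Suggest asking about the first field whose value differs among the candidates."""
    for field in ("store_id", "product", "brand"):
        if len({r[field] for r in rows}) > 1:
            return f"Ask the customer for the {field.replace('_', ' ')}."
    return None  # zero or one candidate, or nothing distinguishes them


turn1 = "Hi, I am John ... status of a DVD player"
turn2 = turn1 + " ... I bought a Sony"

step1 = matching_rows(turn1, TRANSACTIONS)
print(len(step1), follow_up_question(step1))  # 2 Ask the customer for the brand.
step2 = matching_rows(turn2, TRANSACTIONS)
print([r["customer"] for r in step2])  # ['John Parker']
```

After the first customer turn, both Johns bought a DVD player at store S8976, so the brand is the only distinguishing field, which matches the agent's question in the transcript; the answer "Sony" leaves a single row.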
That's all for now…
§ IBM Research – India
§ Technical areas, applications, services
§ Examples of:
− Speech-related work…
− Translation…
− Information extraction…
FIRE: It has been a great start!
Thank you!