- Number of slides: 17
CS 4705 Lecture 4: Sound Systems and Text-to-Speech
Sound Systems of Language
• Phonetics
  – The sounds (phones) of the world’s languages, the phonemes they map to, and how they are produced
• Phonology
  – Rules that govern how phones are realized differently in different contexts
• Technologies:
  – Automatic Speech Recognition (ASR) systems take sounds as input and output word hypotheses
  – Text-to-Speech (TTS) systems take text as input and produce speech
Letters and Sounds
• same spelling = different sounds
  – o: comb, tomb, bomb
  – c: court, center, cheese
  – oo: blood, food, good
  – s: reason, surreal, shy
• same sound = different spellings
  – [i]: sea, see, scene, receive, thief
  – [s]: cereal, same, miss
  – [u]: true, few, choose, lieu, do
  – [ay]: prime, buy, rhyme, lie
• combination of letters = single sound
  – ch: child, beach
  – oo: good, foot
  – th: that, bathe
  – gh: laugh
• single letter = combination of sounds
  – x: exit, Texas
  – u: use, music
• ‘silent’ letters
  – k: knife, know
  – e: moose, bone
  – p: psycho, pterodactyl
  – gh: through
Articulators [figure labels]: teeth, lips, alveolar ridge, palate, velum, uvula, pharyngeal, larynx, vocal folds: glottis, trachea
Articulators in action (Sample from the Queen’s University / ATR Labs X-ray Film Database) “Why did Ken set the soggy net on top of his deck?”
Vocal fold vibration [UCLA Phonetics Lab demo]
Places of articulation [figure labels]: dental, labial, alveolar, post-alveolar/palatal, velar, uvular, pharyngeal, laryngeal/glottal
http://www.chass.utoronto.ca/~danhall/phonetics/sammy.html
Articulatory parameters for English consonants (in ARPAbet)
Columns: PLACE OF ARTICULATION; rows: MANNER OF ARTICULATION

MANNER    bilabial  labiodental  interdental  alveolar  palatal  velar  glottal
stop      p b                                 t d                k g    q
fric.               f v          th dh        s z       sh zh           h
affric.                                                 ch jh
nasal     m                                   n                  ng
approx.   w                                   l/r       y
flap                                          dx

VOICING: in each pair, the first symbol is voiceless and the second is voiced
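The chart can also be read as a lookup table from symbol to articulatory features. The following is a minimal sketch in Python, covering only a subset of the symbols; the names `Consonant`, `CONSONANTS`, and `natural_class` are hypothetical, and the cell assignments follow the conventional reading of the chart rather than anything stated explicitly on the slide.

```python
from typing import NamedTuple, Optional


class Consonant(NamedTuple):
    place: str
    manner: str
    voiced: bool


# Subset of the ARPAbet consonant chart (illustrative, not exhaustive).
CONSONANTS = {
    "p": Consonant("bilabial", "stop", False),
    "b": Consonant("bilabial", "stop", True),
    "t": Consonant("alveolar", "stop", False),
    "d": Consonant("alveolar", "stop", True),
    "k": Consonant("velar", "stop", False),
    "g": Consonant("velar", "stop", True),
    "f": Consonant("labiodental", "fricative", False),
    "v": Consonant("labiodental", "fricative", True),
    "th": Consonant("interdental", "fricative", False),
    "dh": Consonant("interdental", "fricative", True),
    "s": Consonant("alveolar", "fricative", False),
    "z": Consonant("alveolar", "fricative", True),
    "sh": Consonant("palatal", "fricative", False),
    "zh": Consonant("palatal", "fricative", True),
    "ch": Consonant("palatal", "affricate", False),
    "jh": Consonant("palatal", "affricate", True),
    "m": Consonant("bilabial", "nasal", True),
    "n": Consonant("alveolar", "nasal", True),
    "ng": Consonant("velar", "nasal", True),
}


def natural_class(place: Optional[str] = None,
                  manner: Optional[str] = None,
                  voiced: Optional[bool] = None) -> list:
    """Return the sorted symbols matching every feature that is not None."""
    return sorted(
        sym for sym, c in CONSONANTS.items()
        if (place is None or c.place == place)
        and (manner is None or c.manner == manner)
        and (voiced is None or c.voiced == voiced)
    )
```

For example, `natural_class(manner="stop", voiced=False)` selects the voiceless stops, which is the kind of class a phonological rule typically refers to.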
American English vowel space [figure: ARPAbet vowels plotted on HIGH–LOW and FRONT–BACK axes]: iy, ih, ix, ey, eh, ae, ax, ah, ux, uw, uh, ow, oy, ao, aa, ay, aw
Acoustic landmarks [figure: phone labels aligned with the utterance “Patricia and Patsy and Sally”; labels include [p] [ix] [t] [ih] [sh] [ax] [ae] [iy] [n] [s] [l]]
Syllables
• Syllabification important for
  – pronunciation: deny/denim
  – speaking rate calculation: syllables per second
  – word recognition in ASR
• (onset) + nucleus + (coda): cat, a, at, to
• Lexical stress: primary, secondary, tertiary
  – telephone
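The (onset) + nucleus + (coda) template and the syllables-per-second measure can be sketched in a few lines of Python. This is an illustrative sketch only: the ARPAbet transcriptions of the example words, the function names, and the 0.5-second duration are assumptions, and counting one syllable per vowel phone is a simplification.

```python
# Vowel symbols taken from the ARPAbet vowel-space slide.
VOWELS = {"aa", "ae", "ah", "ao", "aw", "ax", "ay", "eh", "ey",
          "ih", "ix", "iy", "ow", "oy", "uh", "uw", "ux"}


def split_syllable(phones):
    """Split the phones of ONE syllable into (onset, nucleus, coda)."""
    vowel_positions = [i for i, p in enumerate(phones) if p in VOWELS]
    if len(vowel_positions) != 1:
        raise ValueError("expected exactly one vowel nucleus")
    i = vowel_positions[0]
    return phones[:i], phones[i], phones[i + 1:]


def syllables_per_second(phones, duration_s):
    """Estimate speaking rate by counting one syllable per vowel phone."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return sum(p in VOWELS for p in phones) / duration_s


# Assumed transcriptions of the slide's four examples.
for word, phones in [("cat", ["k", "ae", "t"]), ("a", ["ax"]),
                     ("at", ["ae", "t"]), ("to", ["t", "uw"])]:
    print(word, split_syllable(phones))
```

The four examples cover every combination of optional parts: onset and coda (cat), neither (a), coda only (at), and onset only (to).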
Phonological Rules
• Not all instances of a given phone [x] sound/look alike
• Phoneme /x/ may have many allophones
• Phonological rules map phonemes in context to allophones, e.g.
  – simple rules: /{t, d}/ --> [dx] / V’ _ V
  – FSAs, FSTs
  – declarative constraints: t: V’ _ V
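A context-sensitive rewrite rule of this shape can be applied with a single pass over a phone sequence. The sketch below assumes the rule is the familiar flapping rule, with /t/ or /d/ becoming the flap (written dx in ARPAbet) between a stressed vowel and another vowel. The stress digits on vowels (1 = stressed, 0 = unstressed), the example transcriptions, and the function names are assumptions for illustration.

```python
VOWELS = {"aa", "ae", "ah", "ao", "aw", "ax", "ay", "eh", "ey",
          "ih", "ix", "iy", "ow", "oy", "uh", "uw", "ux"}


def is_vowel(phone):
    # Strip an optional trailing stress digit before the lookup.
    return phone.rstrip("012") in VOWELS


def is_stressed_vowel(phone):
    return is_vowel(phone) and phone.endswith("1")


def apply_flapping(phones):
    """Rewrite t or d as dx when it sits between V' (stressed) and V."""
    out = list(phones)
    for i in range(1, len(phones) - 1):
        if (phones[i] in ("t", "d")
                and is_stressed_vowel(phones[i - 1])
                and is_vowel(phones[i + 1])):
            out[i] = "dx"
    return out


print(apply_flapping(["s", "ih1", "t", "iy0"]))    # "city": context matches
print(apply_flapping(["ax0", "t", "ae1", "k"]))    # "attack": no stressed vowel before t
```

The contexts are checked against the input sequence rather than the partially rewritten output, so one application of the rule never feeds another.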
Allophones of /t/ • What we would consider a single ‘sound’ can be pronounced differently depending on the phonetic context. For example, the phoneme /t/: Figure 4.8: Jurafsky & Martin (2000), page 104.
Application: Word Pronunciation for TTS
• Pronouncing dictionaries (the: [‘dhax], [‘dhiy])
• Problems:
  – Homographs (bass/bass, wind/wind, desert/desert)
  – Abbreviations (dr., st.)
  – Numbers (2125551212)
  – Acronyms (NAACL, IDIAP)
  – Morphological variation (unrelentingly)
  – Proper names and unknown words
• rules + dictionaries / dictionaries + rules
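A pronouncing dictionary with more than one entry per word, plus a simple number rule, might look like the sketch below. Only the two pronunciations of "the" come from the slide (stress marks omitted here); the names `PRON_DICT`, `pronunciations`, and `expand_digits` are hypothetical, and reading a number digit by digit is just one possible strategy, suited to telephone-style strings.

```python
# Each word maps to a list of alternative pronunciations (phone lists).
PRON_DICT = {
    "the": [["dh", "ax"], ["dh", "iy"]],
}

DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def pronunciations(word):
    """Return all dictionary pronunciations, or [] for an unknown word."""
    return PRON_DICT.get(word.lower(), [])


def expand_digits(token):
    """Spell out a digit string one digit at a time."""
    if not token or any(ch not in "0123456789" for ch in token):
        raise ValueError("expected a non-empty string of ASCII digits")
    return [DIGIT_WORDS[int(ch)] for ch in token]


print(pronunciations("The"))
print(" ".join(expand_digits("2125551212")))
```

An empty result from `pronunciations` is the point where a real system would hand off to rules, which is the "dictionaries + rules" ordering named on the slide.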
• Hybrid model:
  – FSTs model individual word pronunciation in lexicon (e.g. reg-noun-stem entry c:k a:ae t:t)
  – FSAs model morphology (e.g. reg-noun-stem + s)
  – FSTs for pronunciation rules (e.g. s --> z)
  – special rules to model name and acronym pronunciation
  – default letter-to-sound rules for other words
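The first three layers of the hybrid model can be imitated in plain Python to show how they compose. This is a stand-in for the FST/FSA cascade, not a real transducer implementation. The `cat` entry mirrors the slide's c:k a:ae t:t example; the `dog` entry, the voicing condition on the s --> z rule, and all names are assumptions, and the extra vowel inserted after sibilants in English plurals is ignored.

```python
# Lexicon layer: each stem is a sequence of letter:phone pairs.
LEXICON = {
    "cat": [("c", "k"), ("a", "ae"), ("t", "t")],   # from the slide
    "dog": [("d", "d"), ("o", "ao"), ("g", "g")],   # hypothetical entry
}

# Phones treated as voiced for the plural rule (illustrative subset).
VOICED = {"b", "d", "g", "v", "dh", "z", "zh", "jh", "m", "n", "ng",
          "l", "r", "w", "y", "aa", "ae", "ao", "iy", "uw"}


def pronounce(word):
    """Return phones for a known stem or stem + s, else None."""
    # Morphology layer: accept reg-noun-stem or reg-noun-stem + s.
    if word in LEXICON:
        stem, suffix = word, ""
    elif word.endswith("s") and word[:-1] in LEXICON:
        stem, suffix = word[:-1], "s"
    else:
        return None  # would fall through to letter-to-sound rules

    # Lexicon layer: read off the phone side of each pair.
    phones = [phone for _letter, phone in LEXICON[stem]]

    # Pronunciation-rule layer: s --> z after a voiced phone (assumed context).
    if suffix:
        phones.append("z" if phones[-1] in VOICED else "s")
    return phones


print(pronounce("cats"), pronounce("dogs"))
```

Returning `None` for an unknown word marks where the special name/acronym rules and the default letter-to-sound rules would take over.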
Inventive (and sometimes useful) Approaches for Pronouncing Unknown Words
• Rhyming analogy: varoom/room, todo/dodo
• Linguistic origin: Infiniti, vingt, Perez
• Abbreviation expansion:
  – spacious living/dining rm w/frplc --> spacious living/dining room with fireplace
  – pls?
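Abbreviation expansion of the kind shown in the real-estate example can be sketched as a table lookup applied token by token. The table below contains only the expansions visible on the slide; the `w/` prefix handling and the function names are assumptions. An abbreviation with no table entry, such as the slide's "pls", is passed through unchanged rather than guessed.

```python
# Expansions taken from the slide's example; a real table would be
# domain-specific and much larger.
ABBREVIATIONS = {
    "rm": "room",
    "frplc": "fireplace",
}


def expand_token(token):
    """Expand one whitespace-delimited token."""
    # Treat a leading "w/" as "with" and expand whatever follows it.
    if token.startswith("w/") and len(token) > 2:
        return "with " + expand_token(token[2:])
    return ABBREVIATIONS.get(token, token)


def expand_abbreviations(text):
    return " ".join(expand_token(tok) for tok in text.split())


print(expand_abbreviations("spacious living/dining rm w/frplc"))
```

Leaving unknown tokens untouched keeps the ambiguity visible for a later stage, for instance letter-to-sound rules or a domain-specific table.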
Summary
• Phones realize phonemes in different contexts
  – Different places and manners of articulation result in acoustic differences that can be detected by ASR systems as well as people
• Versatile FSTs can model phonological as well as morphological and spelling systems
• Many creative approaches toward pronunciation modeling for TTS
• Next time: Read Ch 5