601b5c3eb49e9cff39f36cbbdf255a98.ppt
- Number of slides: 36
Wideband Codecs for Enhanced Voice Quality
Ensuring optimum wideband speech quality in converged VoIP/mobile applications and services
Claude Gravel, VP Engineering, VoiceAge Corporation
Contents
• Introduction
• Why Wideband Speech?
• Deployment Challenges
• AMR-WB Alleviates These Challenges
• Market Momentum / Conclusions / Demo
VoiceAge Corporation – who are we?
• Business: research in low-bit-rate audio compression technologies, IPR licensing, and development of optimized implementations
• Headquarters: Montreal, Canada
• Technologies:
  – AMR: 3GPP, CableLabs narrowband voice codec
  – AMR-WB: 3GPP, ITU-T, CableLabs wideband voice codec
  – VMR-WB: 3GPP2, CableLabs wideband voice codec
  – AMR-WB+: 3GPP, DVB-H audio codec
• Achievements: won every international audio compression standard for which VoiceAge competed in the last 10 years at 3GPP, 3GPP2, ITU, ETSI, TIA, CableLabs
• Implementations: world-class optimized implementations and proprietary solutions on multiple operating systems and processors/platforms (including TI- and ARM-based systems)
• Deployment: more than 2 billion mobile phones and over 500 million PCs currently use VoiceAge’s technologies
International Standards Using ACELP®
Contents
• Introduction
• Why Wideband Speech?
• Deployment Challenges
• AMR-WB Alleviates These Challenges
• Market Momentum / Conclusions / Demo
Speech Synthesis Model Used in CELP/ACELP® Speech Coding
• 1 = air from lungs, modeled by the innovative excitation c(n)
• 2 = vocal cords (periodicity), modeled by long-term prediction, which produces v(n)
• 3 = vocal tract articulators (including jaw, lips, tongue, velum), modeled by short-term prediction, which produces the synthesized speech ŝ(n)
[Diagram: c(n) → long-term prediction → v(n) → short-term prediction → ŝ(n)]
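As a rough illustration of the synthesis chain on this slide, the following Python sketch passes an excitation c(n) through a one-tap long-term (pitch) filter to obtain v(n), then through an all-pole short-term (LPC) filter to obtain ŝ(n). The function names, the single-tap pitch filter and the sign convention for the LPC coefficients are assumptions made for this example; actual CELP/ACELP codecs update these parameters every frame or subframe and draw the excitation from codebooks.

```python
def long_term_synthesis(c, pitch_lag, pitch_gain):
    """Add periodicity: v(n) = c(n) + pitch_gain * v(n - pitch_lag)."""
    v = []
    for n, x in enumerate(c):
        past = v[n - pitch_lag] if n >= pitch_lag else 0.0
        v.append(x + pitch_gain * past)
    return v


def short_term_synthesis(v, lpc):
    """All-pole filter 1/A(z) with A(z) = 1 + sum_k lpc[k-1] * z^-k (assumed convention)."""
    s = []
    for n, x in enumerate(v):
        acc = x
        for k, a in enumerate(lpc, start=1):
            if n >= k:
                acc -= a * s[n - k]
        s.append(acc)
    return s


def synthesize(c, pitch_lag, pitch_gain, lpc):
    """Excitation -> long-term prediction -> short-term prediction -> speech."""
    return short_term_synthesis(long_term_synthesis(c, pitch_lag, pitch_gain), lpc)
```

Feeding a single impulse into `synthesize` shows the two effects separately: the pitch filter repeats the impulse every `pitch_lag` samples, and the LPC filter shapes each repetition with the vocal-tract response.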
Speech Signal
– Basically, the same synthesis model applies to everyone
– So, speech has a “universal” structure or signature
[Figure: 1.25 s waveform of the word “VoiceAge” (“v oi ce a ge”), with marked segments of 180 ms, 45 ms, 45 ms, 70 ms and 45 ms]
• Voiced fricative: quasi periodic + noise; lower energy
• Purely voiced: quasi periodic; high energy; more low-frequency energy; strongly correlated
• Unvoiced: non periodic; low energy; uncorrelated; more high-frequency energy
• Transient: variable energy; fast spectral evolution
What is Wideband Communication?
• Delivers double the audio signal bandwidth
• Substantially increases captured speech information
• Enables digital end-to-end packet-based services to deliver much better speech communication quality than traditional PSTN circuit-switched telephony
• VoIP quality differentiator
[Figure: signal power versus frequency range]
An Emerging Opportunity to Deliver Vastly Improved Speech Quality
Signal Bandwidth
Wideband speech:
• Below 200 Hz: increased naturalness, presence, and comfort
• Above 3400 Hz: increased intelligibility and fricative differentiation
[Figure: spectra of a voiced segment and an unvoiced segment]
Typical Speech Signal Acoustics
[Figure: acoustics of the sentence “Everyone looked extremely confused about the news”]
• Improved voice quality & intelligibility (e.g., s & f differentiation)
• Improved speech naturalness, presence and comfort
Wideband telephony covers much more speech signal information
Why Wideband Speech Now?
• Improved intelligibility, naturalness and presence
  – Reduces listener fatigue
  – Improves hands-free/speakerphone sound quality
  – Improves speaker and speech recognition
• High-quality low-bit-rate wideband codecs
  – G.722.2/AMR-WB at ~7–24 kbps
  – No need to increase network capacity to deliver better quality sound
• Wideband-capable devices are available now
  – Wideband audio microphones and device acoustics are more affordable
• Rising user awareness of enhanced sound quality
  – Wideband teleconferencing
  – Wideband enterprise/ASP IP telephony
  – Wireless/VoIP multimedia services
Speech Coding Technology, Network/Device Capabilities and Market Demand are Converging Towards Pervasive Wideband Communications
Contents
• Introduction
• Why Wideband Speech?
• Deployment Challenges
• AMR-WB Alleviates These Challenges
• Market Momentum / Conclusions / Demo
Voice Processing: Key for Speech Quality Control & Management
[Block diagram; component labels:]
• Analog Domain, PCM I/F
• Voice Processing (Digital Communications Domain): Echo Canceller, Noise Suppressor, Speech Codec, VAD, CNG, DTX, PLC, Variable/Multi-Rate Switching, Jitter Buffer
• Packet / De-Packet [RTP]
• Voice MIBs, System MIBs, SNMP
• Call Processing, Signaling Protocol
• UDP, TCP, IP, MAC Layer, Physical Layer
Codec choice impacts network cost and interoperability, and is a major contributor to the listener quality experience
Speech Coding Attributes
As required by specific applications
Ø Bit rate
  • As low as possible
Ø Delay
  • As little as possible
Ø Quality
  • As high as possible
Difficult to attain all of these often divergent objectives at the same time
Ø Complexity
  • As algorithmically simple as possible to constrain platform processing and memory requirements and reduce battery consumption in mobile devices
Ø Robustness
  • Effective operation under background noise and channel impairment conditions
Ø Standards compliance
  • Open, tested and interoperable solutions
VoIP Speech Quality Challenges
• Missing packets
  – Due to network congestion or transmission errors
  – Wireless networks are more prone to losing packets
• Packet delay
  – Due to network congestion or transmission errors
  – Real-time communication can’t wait too long for packets or retransmission
• Transcoding
  – Needed when end-devices and network equipment support incompatible speech/audio coding technologies, for example when traversing diverse networks such as fixed/mobile environments
  – Increases system costs, adds delays and introduces audio quality impairments
• Background noise
  – Reduces intelligibility and comfort level of conversations
  – Ambient office/workplace/household noise
  – Street/car noise in mobile applications
Speech Processing Techniques for Improving VoIP Voice Quality
• Missing packet impairments can be mitigated through…
  – Sending additional data to help preserve information
    • FEC/repetition of frames
    • Works well for sporadic packet losses but not so well for bursts of lost packets
    • Increases transmitted bit rate to send redundant information
[Figure: frames f(n-2), f(n-1), f(n+1), f(n+2), f(n+3), f(n+4) and packets p(n-1), p(n+1), p(n+2), p(n+3), p(n+4) on a time axis]
A simple forward error correction scheme based on repeating the previous frame in each packet
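The repetition scheme in the figure can be sketched in a few lines of Python. This is a minimal illustration, not a real RTP payload format: the dictionary layout and the function names are assumptions for the example. Each packet carries its own frame plus a copy of the previous one, so a frame whose packet is lost can still be taken from the next packet, at the cost of one extra frame of delay and roughly double the payload.

```python
def packetize_with_repetition(frames):
    """Packet n carries frame n (primary) and a copy of frame n-1 (redundant)."""
    packets = []
    for n, frame in enumerate(frames):
        previous = frames[n - 1] if n > 0 else None
        packets.append({"seq": n, "primary": frame, "redundant": previous})
    return packets


def recover_frames(received, total):
    """Rebuild the frame sequence; None marks frames that could not be recovered."""
    by_seq = {p["seq"]: p for p in received}
    frames = []
    for n in range(total):
        if n in by_seq:
            frames.append(by_seq[n]["primary"])
        elif n + 1 in by_seq:
            # Packet n was lost, but packet n+1 carries a copy of frame n.
            frames.append(by_seq[n + 1]["redundant"])
        else:
            # Two packets in a row are missing: the copy is gone as well.
            frames.append(None)
    return frames
```

Dropping one packet leaves the output intact, while dropping two consecutive packets loses the first of the two frames, which matches the slide's point that repetition handles sporadic losses better than bursts.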
Speech Processing Techniques for Improving VoIP Voice Quality
• Missing packet impairments can be mitigated through… (cont’d)
  – Packet loss concealment (PLC)
    • Techniques used by the decoder to estimate parameter values for missing frames based on the characteristics of preceding frames
    • Can be improved by classifying frames and repeating or adjusting parameters based on heuristics driven by the classes of the frames preceding the missing frame(s)
      – Extrapolate missing frame parameters as a function of the expected frame class (e.g., voiced/unvoiced, stops, nasals, …)
      – E.g., for voiced frames, repeat the pitch parameters
      – Objective: limit abrupt changes in energy that can cause annoying clicks
• Late packet arrival processing can also be leveraged to benefit from some of the information in a packet that arrives too late
  – Can benefit PLC methods as applied to subsequent delayed or lost packets
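A class-driven concealment rule of the kind described above might look like the following sketch. The `FrameParams` fields, the `conceal` function and the attenuation factor are hypothetical and chosen only to illustrate the idea: for a voiced frame the pitch lag of the last good frame is repeated, and in every case the gain is faded gradually so that energy does not change abruptly. The concealment procedures specified for standardized codecs are considerably more elaborate.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FrameParams:
    """Hypothetical per-frame decoder parameters used only in this example."""
    voiced: bool
    pitch_lag: int   # in samples; 0 means no pitch contribution
    gain: float


def conceal(last_good, lost_in_a_row, attenuation=0.5):
    """Estimate parameters for a missing frame from the last good frame.

    lost_in_a_row counts consecutive missing frames (1 for the first loss),
    so the gain decays smoothly instead of dropping to zero at once.
    """
    faded_gain = last_good.gain * (attenuation ** lost_in_a_row)
    if last_good.voiced:
        # Voiced speech is quasi periodic: keep the pitch lag, fade the gain.
        return replace(last_good, gain=faded_gain)
    # Unvoiced speech has no periodicity to repeat: drop the pitch contribution.
    return replace(last_good, pitch_lag=0, gain=faded_gain)
```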
Speech Processing Techniques for Improving VoIP Voice Quality
• Missing packet impairments can be mitigated through… (cont’d)
  – Frame interleaving
    • Each packet contains non-contiguous frames to lower the overall impact of a lost packet on the reconstructed speech signal
    • Introduces delays which may make it unsuitable for real-time speech communication
    • Works well for audio streaming
[Figure: frames f0 … f8 over time; packet 1 = f0, f3, f6; packet 2 = f1, f4, f7; packet 3 = f2, f5, f8]
I.e., loss of packet 2 leads to non-contiguous missing frames, which are easier to compensate for in the decoder through PLC
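The packing shown in the figure corresponds to a simple stride-based interleaver. The sketch below is an illustration under that assumption; `interleave` and `deinterleave` are names invented for the example, and a lost packet is represented by `None`.

```python
def interleave(frames, depth):
    """Spread consecutive frames over `depth` packets (packet i gets frames i, i+depth, ...)."""
    return [frames[i::depth] for i in range(depth)]


def deinterleave(packets, depth, total):
    """Restore frame order; frames from lost packets (None) stay None."""
    frames = [None] * total
    for i, packet in enumerate(packets):
        if packet is None:
            continue  # lost packet: its frames remain gaps for PLC to fill
        frames[i::depth] = packet
    return frames
```

With nine frames and a depth of three, losing the second packet removes frames 1, 4 and 7, so every gap is a single frame surrounded by good frames. The receiver must wait for all three packets before it can play the first frames in order, which is the delay penalty noted on the slide.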
Speech Processing Techniques for Improving VoIP Voice Quality
• Network congestion, which can lead to delayed or dropped packets, can be alleviated by lowering the average communication bit rate…
  – VAD/DTX/CNG
    • Using Voice Activity Detection (VAD), Discontinuous Transmission (DTX) and Comfort Noise Generation (CNG) capabilities to limit the bandwidth consumed during periods of silence in a conversation
  – Adaptive codecs
    • Source controlled
      » Optimal selection of the bit rate and coding scheme based on active speech
    • Network controlled
      » Adapt the bit rate to make best use of varying available bandwidth
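To make the VAD/DTX/CNG idea concrete, here is a deliberately simplified Python sketch. It assumes an energy-threshold VAD and a fixed interval between silence descriptor (SID) updates; both the threshold test and the `sid_interval` parameter are assumptions for illustration, and production codecs use far more robust activity detection. During silence the sender transmits only an occasional SID frame, from which the receiver generates comfort noise.

```python
def frame_energy(samples):
    """Mean-square energy of one non-empty frame of PCM samples."""
    return sum(x * x for x in samples) / len(samples)


def dtx_schedule(frames, threshold, sid_interval=8):
    """Decide per frame what to transmit: SPEECH, SID (comfort-noise update) or NO_DATA."""
    decisions = []
    silent_run = 0
    for samples in frames:
        if frame_energy(samples) >= threshold:
            decisions.append("SPEECH")
            silent_run = 0
        else:
            # First silent frame and every sid_interval-th one after it carry a SID.
            decisions.append("SID" if silent_run % sid_interval == 0 else "NO_DATA")
            silent_run += 1
    return decisions
```

Counting the `NO_DATA` entries in the result gives a quick estimate of how many frames never reach the network during pauses.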
Transcoder-Free Network Design for Fixed/Mobile Convergence
Improving VoIP Speech Quality
Mitigating the main issues impacting VoIP speech quality
• Missing packets / Delayed packets
  – Proper network engineering with integrated QoS mechanisms (in closed systems)
  – Choosing the best speech coding/processing technology (adaptive, enhanced voice quality, robust and extensible)
  – Improved packet loss concealment
  – Late packet arrival processing
  – Time scale modification
  – Adaptive jitter buffering
• Transcoding
  – Transcoder-free network design to avoid increased system costs, delays and audio quality impairments
  – Leverage seamlessly interoperable standards-proven codecs
• Background noise
  – Choose codecs that can readily accommodate background noise suppression algorithms
  – Proven noise suppression in standards selection & characterization testing results
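Adaptive jitter buffering can be illustrated with the running interarrival jitter estimate defined for RTP in RFC 3550, where each new transit-time difference moves the estimate by one sixteenth of the error. The buffer-sizing rule on top of it (a safety multiple of the jitter, clamped between a minimum and a maximum) is a common heuristic assumed here for the example; the function names and default values are hypothetical.

```python
def update_jitter(jitter, transit_prev, transit_now):
    """RFC 3550 style running estimate: J += (|D| - J) / 16."""
    d = abs(transit_now - transit_prev)
    return jitter + (d - jitter) / 16.0


def target_buffer_ms(transits_ms, min_ms=20.0, max_ms=200.0, safety=4.0):
    """Suggest a playout buffer depth from a sequence of packet transit times (ms)."""
    jitter = 0.0
    for prev, now in zip(transits_ms, transits_ms[1:]):
        jitter = update_jitter(jitter, prev, now)
    # Assumed heuristic: buffer a few multiples of the jitter, within fixed bounds.
    return max(min_ms, min(max_ms, safety * jitter))
```

On a quiet network the estimate stays near zero and the buffer sits at its minimum, keeping delay low; when transit times start to vary, the target depth grows so that fewer packets arrive too late to be played.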
Contents
• Introduction
• Why Wideband Speech?
• Deployment Challenges
• AMR-WB Alleviates These Challenges
• Market Momentum / Conclusions / Demo
Why AMR-WB/G.722.2
• AMR-WB/G.722.2 is the right wideband codec for network convergence
  – Very robust
    • Supports dynamic adaptation to mobile network conditions
    • Includes built-in efficient packet loss concealment
    • Performs well even with high bit error rates
  – Multi-rate codec delivers very good quality even at bit rates comparable to those of narrowband (~12 kbps)
    • No need for potentially costly and time-consuming network capacity upgrades
  – Supports VAD/DTX/CNG for enhanced efficiency
  – Low-complexity encoder and decoder
  – Standardized in 3GPP, ITU-T & CableLabs PacketCable 2.0
  – Can interoperate transcoder-free across mobile/IP networks
    • Eliminates latency, impairments, costs
Subjective NB-WB Quality Comparison
[Figure: NB-WB Voice Quality as a Function of Bit Rate. Source: Ericsson Review, No. 3, 2006]
AMR-WB/G.722.2 Greatly Improves Perceived Voice Quality
AMR-WB Subjective Testing Results
[Chart: AMR-WB/G.722.2 Characterization Test, Clean Condition Test (English Language). Vertical axis: MOS, 1.0 to 5.0. Conditions: No Tandem at -26 dBov, Self-Tandem at -26 dBov. Series: G.722 @ 64 kbps, G.722 @ 48 kbps, G.722.2 @ 8.85 kbps, G.722.2 @ 12.65 kbps, G.722.2 @ 18.25 kbps, G.722.2 @ 23.05 kbps]
AMR-WB/G.722.2 Delivers Excellent Wideband Speech Quality Even at Low Bit Rates (e.g., MOS at 8.85 kbps exceeds G.722 at 48 kbps)
AMR-WB CPU efficiency
• AMR-WB/G.722.2 performance on widely deployed communications device processors shows the codec’s relatively low complexity

Mode  Bit rate (kbps)  TI C55x (MIPS)       TI C64x (MIPS)
                       Encoder   Decoder    Encoder   Decoder
0      6.60            19.67     4.88       22.15     5.94
1      8.85            21.24     4.35       23.75     5.00
2     12.65            24.64     4.20       26.98     4.81
3     14.25            27.02     4.30       29.36     4.85
4     15.85            27.23     4.39       29.58     4.88
5     18.25            28.20     4.55       30.68     4.95
6     19.85            29.33     4.61       32.10     4.98
7     23.05            29.13     4.83       31.76     5.05
8     23.85            26.64     5.21       29.97     5.40

ARM9E (MHz), encoder/decoder pairs in source order (only six pairs survive, so their assignment to modes is not recoverable): 39/11, 34/9, 39/8, 41/8, 42/8, 43/9

Supported by most commonly used communications processors
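AMR-WB codes speech in 20 ms frames, so each mode's bit rate corresponds to a fixed number of speech bits per frame. The sketch below derives those sizes from the bit rates listed on this slide and adds a hypothetical helper, `pick_mode`, that selects the highest mode fitting a given bandwidth budget. It ignores RTP/UDP/IP header overhead and is only meant to illustrate network-controlled rate adaptation, not the rate-control logic of any standard or product.

```python
FRAME_MS = 20  # AMR-WB frame duration in milliseconds

# Bit rates of modes 0..8 in bits per second, as listed on the slide.
MODE_BIT_RATES_BPS = [6600, 8850, 12650, 14250, 15850, 18250, 19850, 23050, 23850]


def bits_per_frame(mode):
    """Number of speech bits produced per 20 ms frame in the given mode."""
    return MODE_BIT_RATES_BPS[mode] * FRAME_MS // 1000


def pick_mode(available_bps):
    """Highest mode whose bit rate fits the budget, or None if none fits (hypothetical helper)."""
    fitting = [m for m, rate in enumerate(MODE_BIT_RATES_BPS) if rate <= available_bps]
    return max(fitting) if fitting else None
```

For example, a 13 kbps budget selects mode 2 (12.65 kbps), the rate the deck describes as comparable to narrowband codecs.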
The Standard Solution Advantage
• Open, collaborative and competitive process
• Requirements specifically address target applications
• Published algorithms and source code
  – Permits wider and more effective scrutiny
  – Clearer intellectual property ownership
• Rigorous comparative testing under diverse conditions
  – Background noise types and levels
  – Spoken languages
  – Speaker types
  – Various network impairments
Interoperable, Open and Fully Tested
Ensures that the best technologies are chosen
Interoperability between Fixed/Mobile Network Services
Transcoder-free Interoperability in Fixed/Mobile Convergence
• 3GPP – Wi-Fi/WiMAX – ITU-T interoperability
• AMR-WB/G.722.2 end-to-end across networks
• No need for transcoding at media gateways
• Improves service quality end to end
• Reduces network delays and equipment complexity
• Lowers network costs (equipment costs and licensing)
Contents
• Introduction
• Why Wideband Speech?
• Deployment Challenges
• AMR-WB Alleviates These Challenges
• Market Momentum / Conclusions / Demo
Growing Market Momentum
• Chipset / Silicon Vendors: VeriSilicon, Texas Inst., Freescale, Renesas, ST Micro, …
• Terminal Device Manufacturers: Nokia, Sony-Ericsson, Motorola, Samsung, Panasonic, NEC, CounterPath, Polycom (mobiles, softphones, VoIP terminals, conferencing terminals, …)
• Network Equipment Vendors: Nokia, Ericsson, AudioCodes (gateways, ATA/MTA, softswitches, …)
• Codec Developers: VoiceAge, others…
• Network Operators / Service Providers: T-Mobile trial, wireless operators, cablecos, VoIP ASPs, …
• Test Set Vendors: Ixia, Tektronix, GL Comms, NetHawk, many others
Accelerating Adoption of AMR-WB/G.722.2 leads to Happy Consumers and a Wealthy Telecom Service Value Chain
Successful Ericsson/T-Mobile Trial
[Chart: consumer ratings, more than 90% positive. Category labels: Extremely Good, Good, Quite Good, Nice to Have, Quite Bad, Extremely Bad. Values shown: 35%, 36%, 11%, 4%, 2%, 3%. Source: Ericsson Review, No. 3, 2006]
• 150 consumers participated for 4 weeks in Germany, April/May 2006; the trial confirmed earlier lab MUSHRA tests
  – More than 90% perceived better voice quality & clarity
  – Felt a greater sense of privacy, discretion & comfort due to improved voice quality & intelligibility
  – Could more easily place & complete calls in environments with high background noise
  – Business users highly valued voice quality for improving communication, reducing expenses & giving a positive impression
• Ericsson anticipates positive outcomes for operators
  – More mobile traffic, i.e., more calls for longer durations
  – Can offer enhanced services for conferencing, personalized ringback signals, automatic voice recognition, voice mail, …
  – Can cut costs, e.g., by reducing the cost of acquiring new subscribers and reducing helpdesk costs
Wideband Speech Communications
An Evolutionary Migration
• Wideband speech coding is consistent with narrowband codecs
  – Bit rates comparable to narrowband codecs
  – Similar robustness techniques to handle packet losses and delays can be used
  – Low-complexity implementations available for all popular communications processor types
  – While vastly improving perceived voice quality
• Strategically deploying wideband capability in terminal and network equipment enables evolution to wideband speech communications
  – Compatible with existing network infrastructure
  – No forklift replacements needed
… a graceful evolutionary migration, not a disruptive revolution
Conclusions
Speech communications are rapidly moving to end-to-end digital packets over all networks, wired and wireless, towards fixed/mobile convergence
• Provides an opportunity to vastly improve communications quality through wide-scale deployment of wideband speech
  – Efficient codecs, devices with wideband acoustics and processing are already available
• Many benefits but also some challenges to consistently delivering high-quality voice end to end in real-world deployments
• Enhanced speech coding and processing techniques have been developed to help overcome these challenges
• The selection of standards-based advanced wideband speech coding technologies such as AMR-WB/G.722.2 is one of the fundamental steps towards improving voice quality between diverse devices and converging networks
• Adoption of AMR-WB/G.722.2 in the telecom service delivery value chain is growing; wideband speech quality has been shown to be highly preferred by consumers
Are your devices, systems, solutions, services ready?
Wideband Demo
Hear the rich sound of wideband
Wideband Codecs for Enhanced Voice Quality
Thank you!
claude.gravel@voiceage.com
www.voiceage.com
Come and talk to VoiceAge at Booth #107