Preface

About Speech
Introduction
How Speech Is Produced
The Vocal Tract
Articulatory Phonetics
Phonetic Alphabets
Prosody and Suprasegmentals
Syllables
Dialects
Languages (Other Than English)
Acoustic Phonetics
Phonemics
Articulatory Processes
References

Representing Speech in the Computer
Introduction
Microphones
Sampling
Sampling Rate
Quantization
Speech Digitization
Wave Form Coders
Voice Coders (Vocoders)
The Frequency Domain
The Game of Jumble: Spectrum-Cepstrum, Frequency-Quefrency, Filtering-Liftering
Spectrograms: A Hybrid Representation of Speech
References

Speech Recognition
Introduction
Speech Recognition: What It Is; What It Isn't
Why Is Speech Recognition Easy for Us and Difficult for Our Computers?
A Brief History of Speech Recognition
The Era of ARPA
After ARPA
Three Dimensions of Speech Recognition
Continuous Versus Noncontinuous
Speaker-Independent Versus Speaker-Dependent
Vocabulary Size
Tradeoffs and Interactions
Units of Speech Recognition
Words and Phrases
Syllables
Phonemes
Diphones and Triphones
Representing the Units
Acoustic Features
Comparing the Units
Dynamic Time Warping (DTW)
Hidden Markov Models (HMMs)
Future Challenges I
Artificial Neural Networks (ANNs)
Errors
Types of Errors
Error Tolerances
Performance Evaluation of Speech Recognizers
Error Rates
Other Factors
Error Reduction
Environmental Effects
Human Factors
Subsetting
Vocabulary Selection
Error Detection and Correction
Feedback Systems
Higher Levels of Linguistic Knowledge
Automatic Error Correction
Future Challenges II
References

Speech Synthesis
Introduction and History
Parametric Coding (Electronic Synthesis)
Parameters of Parametric Speech Synthesis
Input Units of Parametric Speech Synthesis
Concatenative Synthesis
Allophone Concatenation
Diphone Concatenation
Demisyllable Concatenation
Waveform of Concatenative Units
Text-to-Speech Processing
Rules and Exceptions
Morphological Analysis
Articulation Effects
Prosody
Special Problems
Concept-to-Speech
Languages of the World
Dialects
Performance Evaluation
Intelligibility
Comprehensibility
Pleasantness/Naturalness
Future Challenges
References

Speaker Recognition, Language Identification, and Lip Synchronization
Speaker Recognition
Speaker Recognition Versus Speech Recognition
Types of Speaker Recognition
Text-Dependent, Text-Independent, and Text-Prompted Speaker Recognition
"Voiceprints"
Methods of Speaker Recognition
Noise
Performance Evaluation of Speaker Recognition Systems
Co-channel Speaker Separation
Language Identification
Four Computational Approaches to Language Identification
Performance Evaluation of Language Identification Systems
Lip Synchronization
Visemes
Mapping Directly From the Speech Signal to Mouth Shapes
Future Challenges
References

Applications in Speech Recognition
Criteria for a Viable Speech Recognition Application
Hands Busy, Eyes Busy
Remoteness
Miniaturization
2001 Won't Be 2001
The Role of Human Factors in Speech Recognition Applications
Application Areas
Assistive Technology
Telecommunications
Command and Control
Data Entry and Retrieval
Education
References

Applications in Speech Synthesis
"At the Tone, the Time Will Be..."
When To Use Text-to-Speech; When To Use Digitally Recorded Speech
Interactive Voice Response Systems (IVRs)
Human Factors Revisited
Application Areas
Aid for Persons With Disabilities
Education
Emergency Scenarios
En Masse Advisories
Information Retrieval
Information Reporting
Electronic Mail and Fax Readers
In the Dark
Toys and Games
Transportation
Government Services
Disguise
References

Applications in Speaker Recognition, Language Identification, and Lip Synchronization
Applications in Speaker Recognition
Access
Authentication
Monitoring
Fraud Prevention
Forensics
Personal Services
Applications in Language Identification
Telecommunications
Communications Monitoring
Public Information Systems
Applications in Automatic Lip Synching
Animation
References

Glossary
About the Author
Index