| |
| |
| |
Introduction | |
| |
| |
Motivations | |
| |
| |
Spoken Language System Architecture | |
| |
| |
Book Organization | |
| |
| |
Target Audiences | |
| |
| |
I | |
| |
| |
| |
Spoken Language Structure | |
| |
| |
Sound and Human Speech Systems | |
| |
| |
Phonetics and Phonology | |
| |
| |
Syllables and Words | |
| |
| |
Syntax and Semantics | |
| |
| |
| |
Probability, Statistics, and Information Theory | |
| |
| |
Probability Theory | |
| |
| |
Estimation Theory | |
| |
| |
Significance Testing | |
| |
| |
Information Theory | |
| |
| |
| |
Pattern Recognition | |
| |
| |
Bayes' Decision Theory | |
| |
| |
How to Construct Classifiers | |
| |
| |
Discriminative Training | |
| |
| |
Unsupervised Estimation Methods | |
| |
| |
Classification and Regression Trees | |
| |
| |
| |
Speech Processing | |
| |
| |
| |
Digital Signal Processing | |
| |
| |
Digital Signals and Systems | |
| |
| |
Continuous-Frequency Transforms | |
| |
| |
Discrete-Frequency Transforms | |
| |
| |
Digital Filters and Windows | |
| |
| |
Digital Processing of Analog Signals | |
| |
| |
Multirate Signal Processing | |
| |
| |
Filterbanks | |
| |
| |
Stochastic Processes | |
| |
| |
| |
Speech Signal Representations | |
| |
| |
Short-Time Fourier Analysis | |
| |
| |
Acoustical Model of Speech Production | |
| |
| |
Linear Predictive Coding | |
| |
| |
Cepstral Processing | |
| |
| |
Perceptually Motivated Representations | |
| |
| |
Formant Frequencies | |
| |
| |
The Role of Pitch | |
| |
| |
| |
Speech Coding | |
| |
| |
Speech Coders Attributes | |
| |
| |
Scalar Waveform Coders | |
| |
| |
Scalar Frequency Domain Coders | |
| |
| |
Code Excited Linear Prediction (CELP) | |
| |
| |
Low-Bit Rate Speech Coders | |
| |
| |
III | |
| |
| |
| |
Hidden Markov Models | |
| |
| |
The Markov Chain | |
| |
| |
Definition of the Hidden Markov Model | |
| |
| |
Continuous and Semicontinuous HMMs | |
| |
| |
Practical Issues in Using HMMs | |
| |
| |
HMM Limitations | |
| |
| |
| |
Acoustic Modeling | |
| |
| |
Variability in the Speech Signal | |
| |
| |
How to Measure Speech Recognition Errors | |
| |
| |
Signal Processing-Extracting Features | |
| |
| |
Phonetic Modeling-Selecting Appropriate Units | |
| |
| |
Acoustic Modeling-Scoring Acoustic Features | |
| |
| |
Adaptive Techniques-Minimizing Mismatches | |
| |
| |
Confidence Measures: Measuring the Reliability | |
| |
| |
Other Techniques | |
| |
| |
Case Study: Whisper | |
| |
| |
| |
Environmental Robustness | |
| |
| |
The Acoustical Environment | |
| |
| |
Acoustical Transducers | |
| |
| |
Adaptive Echo Cancellation (AEC) | |
| |
| |
Multimicrophone Speech Enhancement | |
| |
| |
Environment Compensation Preprocessing | |
| |
| |
Environment Model Adaptation | |
| |
| |
Modeling Nonstationary Noise | |
| |
| |
| |
Language Modeling | |
| |
| |
Formal Language Theory | |
| |
| |
Stochastic Language Models | |
| |
| |
Complexity Measure of Language Models | |
| |
| |
N-Gram Smoothing | |
| |
| |
Adaptive Language Models | |
| |
| |
Practical Issues | |
| |
| |
| |
Basic Search Algorithms | |
| |
| |
Basic Search Algorithms | |
| |
| |
Search Algorithms for Speech Recognition | |
| |
| |
Language Model States | |
| |
| |
Time-Synchronous Viterbi Beam Search | |
| |
| |
Stack Decoding (A* Search) | |
| |
| |
| |
Large-Vocabulary Search Algorithms | |
| |
| |
Efficient Manipulation of a Tree Lexicon | |
| |
| |
Other Efficient Search Techniques | |
| |
| |
N-Best and Multipass Search Strategies | |
| |
| |
Search-Algorithm Evaluation | |
| |
| |
Case Study-Microsoft Whisper | |
| |
| |
IV | |
| |
| |
| |
Text and Phonetic Analysis | |
| |
| |
Modules and Data Flow | |
| |
| |
Lexicon | |
| |
| |
Document Structure Detection | |
| |
| |
Text Normalization | |
| |
| |
Linguistic Analysis | |
| |
| |
Homograph Disambiguation | |
| |
| |
Morphological Analysis | |
| |
| |
Letter-to-Sound | |