Speech Synthesis and Recognition

ISBN-10: 0748408576

ISBN-13: 9780748408573

Edition: 2nd 2001 (Revised)

Authors: John Holmes, Wendy Holmes

List price: $80.95

Description:

With the growing impact of information technology on daily life, speech is becoming increasingly important for providing a natural means of communication between humans and machines. This extensively reworked and updated new edition of Speech Synthesis and Recognition is an easy-to-read introduction to current speech technology. Aimed at advanced undergraduates and graduates in electronic engineering, computer science and information technology, the book is also relevant to professional engineers who need to understand enough about speech technology to be able to apply it successfully and to work effectively with speech experts. No advanced mathematical ability is required and no specialist…

Book details

List price: $80.95
Edition: 2nd
Copyright year: 2001
Publisher: CRC Press LLC
Publication date: 12/6/2001
Binding: Paperback
Pages: 316
Size: 6.00" wide x 9.00" long x 0.75" tall
Weight: 1.100

Preface to the First Edition
Preface to the Second Edition
List of Abbreviations
Human Speech Communication
Value of speech for human-machine communication
Ideas and language
Relationship between written and spoken language
Phonetics and phonology
The acoustic signal
Phonemes, phones and allophones
Vowels, consonants and syllables
Phonemes and spelling
Prosodic features
Language, accent and dialect
Supplementing the acoustic signal
The complexity of speech processing
Chapter 1 summary
Chapter 1 exercises
Mechanisms and Models of Human Speech Production
Introduction
Sound sources
The resonant system
Interaction of laryngeal and vocal tract functions
Radiation
Waveforms and spectrograms
Speech production models
Excitation models
Vocal tract models
Chapter 2 summary
Chapter 2 exercises
Mechanisms and Models of the Human Auditory System
Introduction
Physiology of the outer and middle ears
Structure of the cochlea
Neural response
Psychophysical measurements
Analysis of simple and complex signals
Models of the auditory system
Mechanical filtering
Models of neural transduction
Higher-level neural processing
Chapter 3 summary
Chapter 3 exercises
Digital Coding of Speech
Introduction
Simple waveform coders
Pulse code modulation
Deltamodulation
Analysis/synthesis systems (vocoders)
Channel vocoders
Sinusoidal coders
LPC vocoders
Formant vocoders
Efficient parameter coding
Vocoders based on segmental/phonetic structure
Intermediate systems
Sub-band coding
Linear prediction with simple coding of the residual
Adaptive predictive coding
Multipulse LPC
Code-excited linear prediction
Evaluating speech coding algorithms
Subjective speech intelligibility measures
Subjective speech quality measures
Objective speech quality measures
Choosing a coder
Chapter 4 summary
Chapter 4 exercises
Message Synthesis from Stored Human Speech Components
Introduction
Concatenation of whole words
Simple waveform concatenation
Concatenation of vocoded words
Limitations of concatenating word-size units
Concatenation of sub-word units: general principles
Choice of sub-word unit
Recording and selecting data for the units
Varying durations of concatenative units
Synthesis by concatenating vocoded sub-word units
Synthesis by concatenating waveform segments
Pitch modification
Timing modification
Performance of waveform concatenation
Variants of concatenative waveform synthesis
Hardware requirements
Chapter 5 summary
Chapter 5 exercises
Phonetic synthesis by rule
Introduction
Acoustic-phonetic rules
Rules for formant synthesizers
Table-driven phonetic rules
Simple transition calculation
Overlapping transitions
Using the tables to generate utterances
Optimizing phonetic rules
Automatic adjustment of phonetic rules
Rules for different speaker types
Incorporating intensity rules
Current capabilities of phonetic synthesis by rule
Chapter 6 summary
Chapter 6 exercises
Speech Synthesis from Textual or Conceptual Input
Introduction
Emulating the human speaking process
Converting from text to speech
TTS system architecture
Overview of tasks required for TTS conversion
Text analysis
Text pre-processing
Morphological analysis
Phonetic transcription
Syntactic analysis and prosodic phrasing
Assignment of lexical stress and pattern of word accents
Prosody generation
Timing pattern
Fundamental frequency contour
Implementation issues
Current TTS synthesis capabilities
Speech synthesis from concept
Chapter 7 summary
Chapter 7 exercises
Introduction to automatic speech recognition: template matching
Introduction
General principles of pattern matching
Distance metrics
Filter-bank analysis
Level normalization
End-point detection for isolated words
Allowing for timescale variations
Dynamic programming for time alignment
Refinements to isolated-word DP matching
Score pruning
Allowing for end-point errors
Dynamic programming for connected words
Continuous speech recognition
Syntactic constraints
Training a whole-word recognizer
Chapter 8 summary
Chapter 8 exercises
Introduction to stochastic modelling
Feature variability in pattern matching
Introduction to hidden Markov models
Probability calculations in hidden Markov models
The Viterbi algorithm
Parameter estimation for hidden Markov models
Forward and backward probabilities
Parameter re-estimation with forward and backward probabilities
Viterbi training
Vector quantization
Multi-variate continuous distributions
Use of normal distributions with HMMs
Probability calculations
Estimating the parameters of a normal distribution
Baum-Welch re-estimation
Viterbi training
Model initialization
Gaussian mixtures
Calculating emission probabilities
Baum-Welch re-estimation
Re-estimation using the most likely state sequence
Initialization of Gaussian mixture distributions
Tied mixture distributions
Extension of stochastic models to word sequences
Implementing probability calculations
Using the Viterbi algorithm with probabilities in logarithmic form
Adding probabilities when they are in logarithmic form
Relationship between DTW and a simple HMM
State durational characteristics of HMMs
Chapter 9 summary
Chapter 9 exercises
Introduction to front-end analysis for automatic speech recognition
Introduction
Pre-emphasis
Frames and windowing
Filter banks, Fourier analysis and the mel scale
Cepstral analysis
Analysis based on linear prediction
Dynamic features
Capturing the perceptually relevant information
General feature transformations
Variable-frame-rate analysis
Chapter 10 summary
Chapter 10 exercises
Practical techniques for improving speech recognition performance
Introduction
Robustness to environment and channel effects
Feature-based techniques
Model-based techniques
Dealing with unknown or unpredictable noise corruption
Speaker-independent recognition
Speaker normalization
Model adaptation
Bayesian methods for training and adaptation of HMMs
Adaptation methods based on linear transforms
Discriminative training methods
Maximum mutual information training
Training criteria based on reducing recognition errors
Robustness of recognizers to vocabulary variation
Chapter 11 summary
Chapter 11 exercises
Automatic speech recognition for large vocabularies
Introduction
Historical perspective
Speech transcription and speech understanding
Speech transcription
Challenges posed by large vocabularies
Acoustic modelling
Context-dependent phone modelling
Training issues for context-dependent models
Parameter tying
Training procedure
Methods for clustering model parameters
Constructing phonetic decision trees
Extensions beyond triphone modelling
Language modelling
N-grams
Perplexity and evaluating language models
Data sparsity in language modelling
Discounting
Backing off in language modelling
Interpolation of language models
Choice of more general distribution for smoothing
Improving on simple N-grams
Decoding
Efficient one-pass Viterbi decoding for large vocabularies
Multiple-pass Viterbi decoding
Depth-first decoding
Evaluating LVCSR performance
Measuring errors
Controlling word insertion errors
Performance evaluations
Speech understanding
Measuring and evaluating speech understanding performance
Chapter 12 summary
Chapter 12 exercises
Neural networks for speech recognition
Introduction
The human brain
Connectionist models
Properties of ANNs
ANNs for speech recognition
Hybrid HMM/ANN methods
Chapter 13 summary
Chapter 13 exercises
Recognition of speaker characteristics
Characteristics of speakers
Verification versus identification
Assessing performance
Measures of verification performance
Speaker recognition
Text dependence
Methods for text-dependent/text-prompted speaker recognition
Methods for text-independent speaker recognition
Acoustic features for speaker recognition
Evaluations of speaker recognition performance
Language recognition
Techniques for language recognition
Acoustic features for language recognition
Chapter 14 summary
Chapter 14 exercises
Applications and performance of current technology
Introduction
Why use speech technology?
Speech synthesis technology
Examples of speech synthesis applications
Aids for the disabled
Spoken warning signals, instructions and user feedback
Education, toys and games
Telecommunications
Speech recognition technology
Characterizing speech recognizers and recognition tasks
Typical recognition performance for different tasks
Achieving success with ASR in an application
Examples of ASR applications
Command and control
Education, toys and games
Dictation
Data entry and retrieval
Telecommunications
Applications of speaker and language recognition
The future of speech technology applications
Chapter 15 summary
Chapter 15 exercises
Future research directions in speech synthesis and recognition
Introduction
Speech synthesis
Speech sound generation
Prosody generation and higher-level linguistic processing
Automatic speech recognition
Advantages of statistical pattern-matching methods
Limitations of HMMs for speech recognition
Developing improved recognition models
Relationship between synthesis and recognition
Automatic speech understanding
Chapter 16 summary
Chapter 16 exercises
Further Reading
Books
Journals
Conferences and workshops
The Internet
Reading for individual chapters
References
Solutions to Exercises
Glossary
Index