Preface to the First Edition

Preface to the Second Edition

List of Abbreviations

1  Human Speech Communication
    Value of speech for human-machine communication
    Ideas and language
    Relationship between written and spoken language
    Phonetics and phonology
    The acoustic signal
    Phonemes, phones and allophones
    Vowels, consonants and syllables
    Phonemes and spelling
    Prosodic features
    Language, accent and dialect
    Supplementing the acoustic signal
    The complexity of speech processing
    Chapter 1 summary
    Chapter 1 exercises

2  Mechanisms and Models of Human Speech Production
    Introduction
    Sound sources
    The resonant system
    Interaction of laryngeal and vocal tract functions
    Radiation
    Waveforms and spectrograms
    Speech production models
    Excitation models
    Vocal tract models
    Chapter 2 summary
    Chapter 2 exercises

3  Mechanisms and Models of the Human Auditory System
    Introduction
    Physiology of the outer and middle ears
    Structure of the cochlea
    Neural response
    Psychophysical measurements
    Analysis of simple and complex signals
    Models of the auditory system
    Mechanical filtering
    Models of neural transduction
    Higher-level neural processing
    Chapter 3 summary
    Chapter 3 exercises

4  Digital Coding of Speech
    Introduction
    Simple waveform coders
    Pulse code modulation
    Delta modulation
    Analysis/synthesis systems (vocoders)
    Channel vocoders
    Sinusoidal coders
    LPC vocoders
    Formant vocoders
    Efficient parameter coding
    Vocoders based on segmental/phonetic structure
    Intermediate systems
    Sub-band coding
    Linear prediction with simple coding of the residual
    Adaptive predictive coding
    Multipulse LPC
    Code-excited linear prediction
    Evaluating speech coding algorithms
    Subjective speech intelligibility measures
    Subjective speech quality measures
    Objective speech quality measures
    Choosing a coder
    Chapter 4 summary
    Chapter 4 exercises

5  Message Synthesis from Stored Human Speech Components
    Introduction
    Concatenation of whole words
    Simple waveform concatenation
    Concatenation of vocoded words
    Limitations of concatenating word-size units
    Concatenation of sub-word units: general principles
    Choice of sub-word unit
    Recording and selecting data for the units
    Varying durations of concatenative units
    Synthesis by concatenating vocoded sub-word units
    Synthesis by concatenating waveform segments
    Pitch modification
    Timing modification
    Performance of waveform concatenation
    Variants of concatenative waveform synthesis
    Hardware requirements
    Chapter 5 summary
    Chapter 5 exercises

6  Phonetic Synthesis by Rule
    Introduction
    Acoustic-phonetic rules
    Rules for formant synthesizers
    Table-driven phonetic rules
    Simple transition calculation
    Overlapping transitions
    Using the tables to generate utterances
    Optimizing phonetic rules
    Automatic adjustment of phonetic rules
    Rules for different speaker types
    Incorporating intensity rules
    Current capabilities of phonetic synthesis by rule
    Chapter 6 summary
    Chapter 6 exercises

7  Speech Synthesis from Textual or Conceptual Input
    Introduction
    Emulating the human speaking process
    Converting from text to speech
    TTS system architecture
    Overview of tasks required for TTS conversion
    Text analysis
    Text pre-processing
    Morphological analysis
    Phonetic transcription
    Syntactic analysis and prosodic phrasing
    Assignment of lexical stress and pattern of word accents
    Prosody generation
    Timing pattern
    Fundamental frequency contour
    Implementation issues
    Current TTS synthesis capabilities
    Speech synthesis from concept
    Chapter 7 summary
    Chapter 7 exercises

8  Introduction to Automatic Speech Recognition: Template Matching
    Introduction
    General principles of pattern matching
    Distance metrics
    Filter-bank analysis
    Level normalization
    End-point detection for isolated words
    Allowing for timescale variations
    Dynamic programming for time alignment
    Refinements to isolated-word DP matching
    Score pruning
    Allowing for end-point errors
    Dynamic programming for connected words
    Continuous speech recognition
    Syntactic constraints
    Training a whole-word recognizer
    Chapter 8 summary
    Chapter 8 exercises

9  Introduction to Stochastic Modelling
    Feature variability in pattern matching
    Introduction to hidden Markov models
    Probability calculations in hidden Markov models
    The Viterbi algorithm
    Parameter estimation for hidden Markov models
    Forward and backward probabilities
    Parameter re-estimation with forward and backward probabilities
    Viterbi training
    Vector quantization
    Multi-variate continuous distributions
    Use of normal distributions with HMMs
    Probability calculations
    Estimating the parameters of a normal distribution
    Baum-Welch re-estimation
    Viterbi training
    Model initialization
    Gaussian mixtures
    Calculating emission probabilities
    Baum-Welch re-estimation
    Re-estimation using the most likely state sequence
    Initialization of Gaussian mixture distributions
    Tied mixture distributions
    Extension of stochastic models to word sequences
    Implementing probability calculations
    Using the Viterbi algorithm with probabilities in logarithmic form
    Adding probabilities when they are in logarithmic form
    Relationship between DTW and a simple HMM
    State durational characteristics of HMMs
    Chapter 9 summary
    Chapter 9 exercises

10  Introduction to Front-End Analysis for Automatic Speech Recognition
    Introduction
    Pre-emphasis
    Frames and windowing
    Filter banks, Fourier analysis and the mel scale
    Cepstral analysis
    Analysis based on linear prediction
    Dynamic features
    Capturing the perceptually relevant information
    General feature transformations
    Variable-frame-rate analysis
    Chapter 10 summary
    Chapter 10 exercises

11  Practical Techniques for Improving Speech Recognition Performance
    Introduction
    Robustness to environment and channel effects
    Feature-based techniques
    Model-based techniques
    Dealing with unknown or unpredictable noise corruption
    Speaker-independent recognition
    Speaker normalization
    Model adaptation
    Bayesian methods for training and adaptation of HMMs
    Adaptation methods based on linear transforms
    Discriminative training methods
    Maximum mutual information training
    Training criteria based on reducing recognition errors
    Robustness of recognizers to vocabulary variation
    Chapter 11 summary
    Chapter 11 exercises

12  Automatic Speech Recognition for Large Vocabularies
    Introduction
    Historical perspective
    Speech transcription and speech understanding
    Speech transcription
    Challenges posed by large vocabularies
    Acoustic modelling
    Context-dependent phone modelling
    Training issues for context-dependent models
    Parameter tying
    Training procedure
    Methods for clustering model parameters
    Constructing phonetic decision trees
    Extensions beyond triphone modelling
    Language modelling
    N-grams
    Perplexity and evaluating language models
    Data sparsity in language modelling
    Discounting
    Backing off in language modelling
    Interpolation of language models
    Choice of more general distribution for smoothing
    Improving on simple N-grams
    Decoding
    Efficient one-pass Viterbi decoding for large vocabularies
    Multiple-pass Viterbi decoding
    Depth-first decoding
    Evaluating LVCSR performance
    Measuring errors
    Controlling word insertion errors
    Performance evaluations
    Speech understanding
    Measuring and evaluating speech understanding performance
    Chapter 12 summary
    Chapter 12 exercises

13  Neural Networks for Speech Recognition
    Introduction
    The human brain
    Connectionist models
    Properties of ANNs
    ANNs for speech recognition
    Hybrid HMM/ANN methods
    Chapter 13 summary
    Chapter 13 exercises

14  Recognition of Speaker Characteristics
    Characteristics of speakers
    Verification versus identification
    Assessing performance
    Measures of verification performance
    Speaker recognition
    Text dependence
    Methods for text-dependent/text-prompted speaker recognition
    Methods for text-independent speaker recognition
    Acoustic features for speaker recognition
    Evaluations of speaker recognition performance
    Language recognition
    Techniques for language recognition
    Acoustic features for language recognition
    Chapter 14 summary
    Chapter 14 exercises

15  Applications and Performance of Current Technology
    Introduction
    Why use speech technology?
    Speech synthesis technology
    Examples of speech synthesis applications
    Aids for the disabled
    Spoken warning signals, instructions and user feedback
    Education, toys and games
    Telecommunications
    Speech recognition technology
    Characterizing speech recognizers and recognition tasks
    Typical recognition performance for different tasks
    Achieving success with ASR in an application
    Examples of ASR applications
    Command and control
    Education, toys and games
    Dictation
    Data entry and retrieval
    Telecommunications
    Applications of speaker and language recognition
    The future of speech technology applications
    Chapter 15 summary
    Chapter 15 exercises

16  Future Research Directions in Speech Synthesis and Recognition
    Introduction
    Speech synthesis
    Speech sound generation
    Prosody generation and higher-level linguistic processing
    Automatic speech recognition
    Advantages of statistical pattern-matching methods
    Limitations of HMMs for speech recognition
    Developing improved recognition models
    Relationship between synthesis and recognition
    Automatic speech understanding
    Chapter 16 summary
    Chapter 16 exercises

Further Reading
    Books
    Journals
    Conferences and workshops
    The Internet
    Reading for individual chapters

References

Solutions to Exercises

Glossary

Index