| |
| |
Foreword | |
| |
| |
Preface | |
| |
| |
About the Authors | |
| |
| |
| |
Introduction | |
| |
| |
| |
Knowledge in Speech and Language Processing | |
| |
| |
| |
Ambiguity | |
| |
| |
| |
Models and Algorithms | |
| |
| |
| |
Language, Thought, and Understanding | |
| |
| |
| |
The State of the Art | |
| |
| |
| |
Some Brief History | |
| |
| |
| |
Foundational Insights: 1940s and 1950s | |
| |
| |
| |
The Two Camps: 1957–1970 | |
| |
| |
| |
Four Paradigms: 1970–1983 | |
| |
| |
| |
Empiricism and Finite-State Models Redux: 1983–1993 | |
| |
| |
| |
The Field Comes Together: 1994–1999 | |
| |
| |
| |
The Rise of Machine Learning: 2000–2008 | |
| |
| |
| |
On Multiple Discoveries | |
| |
| |
| |
A Final Brief Note on Psychology | |
| |
| |
| |
Summary | |
| |
| |
| |
Bibliographical and Historical Notes | |
| |
| |
| |
Words | |
| |
| |
| |
2 Regular Expressions and Automata | |
| |
| |
| |
Regular Expressions | |
| |
| |
| |
Basic Regular Expression Patterns | |
| |
| |
| |
Disjunction, Grouping, and Precedence | |
| |
| |
| |
A Simple Example | |
| |
| |
| |
A More Complex Example | |
| |
| |
| |
Advanced Operators | |
| |
| |
| |
Regular Expression Substitution, Memory, and ELIZA | |
| |
| |
| |
Finite-State Automata | |
| |
| |
| |
Using an FSA to Recognize Sheeptalk | |
| |
| |
| |
Formal Languages | |
| |
| |
| |
Another Example | |
| |
| |
| |
Non-Deterministic FSAs | |
| |
| |
| |
Using an NFSA to Accept Strings | |
| |
| |
| |
Recognition as Search | |
| |
| |
| |
Relating Deterministic and Non-Deterministic Automata | |
| |
| |
| |
Regular Languages and FSAs | |
| |
| |
| |
Summary | |
| |
| |
| |
Bibliographical and Historical Notes | |
| |
| |
| |
Exercises | |
| |
| |
| |
3 Words and Transducers | |
| |
| |
| |
Survey of (Mostly) English Morphology | |
| |
| |
| |
Inflectional Morphology | |
| |
| |
| |
Derivational Morphology | |
| |
| |
| |
Cliticization | |
| |
| |
| |
Non-Concatenative Morphology | |
| |
| |
| |
Agreement | |
| |
| |
| |
Finite-State Morphological Parsing | |
| |
| |
| |
Construction of a Finite-State Lexicon | |
| |
| |
| |
Finite-State Transducers | |
| |
| |
| |
Sequential Transducers and Determinism | |
| |
| |
| |
FSTs for Morphological Parsing | |
| |
| |
| |
Transducers and Orthographic Rules | |
| |
| |
| |
The Combination of an FST Lexicon and Rules | |
| |
| |
| |
Lexicon-Free FSTs: The Porter Stemmer | |
| |
| |
| |
Word and Sentence Tokenization | |
| |
| |
| |
Segmentation in Chinese | |
| |
| |
| |
Detection and Correction of Spelling Errors | |
| |
| |
| |
Minimum Edit Distance | |
| |
| |
| |
Human Morphological Processing | |
| |
| |
| |
Summary | |
| |
| |
| |
Bibliographical and Historical Notes | |
| |
| |
| |
Exercises | |
| |
| |
| |
4 N-grams | |
| |
| |
| |
Word Counting in Corpora | |
| |
| |
| |
Simple (Unsmoothed) N-grams | |
| |
| |
| |
Training and Test Sets | |
| |
| |
| |
N-gram Sensitivity to the Training Corpus | |
| |
| |
| |
Unknown Words: Open Versus Closed Vocabulary Tasks | |
| |
| |
| |
Evaluating N-grams: Perplexity | |
| |
| |
| |
Smoothing | |
| |
| |
| |
Laplace Smoothing | |
| |
| |
| |
Good-Turing Discounting | |
| |
| |
| |
Some Advanced Issues in Good-Turing Estimation | |
| |
| |
| |
Interpolation | |
| |
| |
| |
Backoff | |
| |
| |
| |
Advanced: Details of Computing Katz Backoff α and P* | |
| |
| |
| |
Practical Issues: Toolkits and Data Formats | |
| |
| |
| |
Advanced Issues in Language Modeling | |
| |
| |
| |
Advanced Smoothing Methods: Kneser-Ney Smoothing | |
| |
| |
| |
Class-Based N-grams | |
| |
| |
| |
Language Model Adaptation and Web Use | |
| |
| |
| |
Using Longer Distance Information: A Brief Summary | |
| |
| |
| |
Advanced: Information Theory Background | |
| |
| |
| |
Cross-Entropy for Comparing Models | |
| |
| |
| |
Advanced: The Entropy of English and Entropy Rate Constancy | |
| |
| |
| |
Summary | |
| |
| |
| |
Bibliographical and Historical Notes | |
| |
| |
| |
Exercises | |
| |
| |
| |
5 Part-of-Speech Tagging | |
| |
| |
| |
(Mostly) English Word Classes | |
| |
| |
| |
Tagsets for English | |
| |
| |
| |
Part-of-Speech Tagging | |
| |
| |
| |
Rule-Based Part-of-Speech Tagging | |
| |
| |
| |
HMM Part-of-Speech Tagging | |
| |
| |
| |
Computing the Most-Likely Tag Sequence: An Example | |