| |
| |
List of Tables | |
| |
| |
List of Figures | |
| |
| |
Table of Notations | |
| |
| |
Preface | |
| |
| |
Road Map | |
| |
| |
| |
Preliminaries | |
| |
| |
| |
Introduction | |
| |
| |
| |
Rationalist and Empiricist Approaches to Language | |
| |
| |
| |
Scientific Content | |
| |
| |
| |
Questions that linguistics should answer | |
| |
| |
| |
Non-categorical phenomena in language | |
| |
| |
| |
Language and cognition as probabilistic phenomena | |
| |
| |
| |
The Ambiguity of Language: Why NLP Is Difficult | |
| |
| |
| |
Dirty Hands | |
| |
| |
| |
Lexical resources | |
| |
| |
| |
Word counts | |
| |
| |
| |
Zipf's laws | |
| |
| |
| |
Collocations | |
| |
| |
| |
Concordances | |
| |
| |
| |
Further Reading | |
| |
| |
| |
Exercises | |
| |
| |
| |
Mathematical Foundations | |
| |
| |
| |
Elementary Probability Theory | |
| |
| |
| |
Probability spaces | |
| |
| |
| |
Conditional probability and independence | |
| |
| |
| |
Bayes' theorem | |
| |
| |
| |
Random variables | |
| |
| |
| |
Expectation and variance | |
| |
| |
| |
Notation | |
| |
| |
| |
Joint and conditional distributions | |
| |
| |
| |
Determining P | |
| |
| |
| |
Standard distributions | |
| |
| |
| |
Bayesian statistics | |
| |
| |
| |
Exercises | |
| |
| |
| |
Essential Information Theory | |
| |
| |
| |
Entropy | |
| |
| |
| |
Joint entropy and conditional entropy | |
| |
| |
| |
Mutual information | |
| |
| |
| |
The noisy channel model | |
| |
| |
| |
Relative entropy or Kullback-Leibler divergence | |
| |
| |
| |
The relation to language: Cross entropy | |
| |
| |
| |
The entropy of English | |
| |
| |
| |
Perplexity | |
| |
| |
| |
Exercises | |
| |
| |
| |
Further Reading | |
| |
| |
| |
Linguistic Essentials | |
| |
| |
| |
Parts of Speech and Morphology | |
| |
| |
| |
Nouns and pronouns | |
| |
| |
| |
Words that accompany nouns: Determiners and adjectives | |
| |
| |
| |
Verbs | |
| |
| |
| |
Other parts of speech | |
| |
| |
| |
Phrase Structure | |
| |
| |
| |
Phrase structure grammars | |
| |
| |
| |
Dependency: Arguments and adjuncts | |
| |
| |
| |
X' theory | |
| |
| |
| |
Phrase structure ambiguity | |
| |
| |
| |
Semantics and Pragmatics | |
| |
| |
| |
Other Areas | |
| |
| |
| |
Further Reading | |
| |
| |
| |
Exercises | |
| |
| |
| |
Corpus-Based Work | |
| |
| |
| |
Getting Set Up | |
| |
| |
| |
Computers | |
| |
| |
| |
Corpora | |
| |
| |
| |
Software | |
| |
| |
| |
Looking at Text | |
| |
| |
| |
Low-level formatting issues | |
| |
| |
| |
Tokenization: What is a word? | |
| |
| |
| |
Morphology | |
| |
| |
| |
Sentences | |
| |
| |
| |
Marked-up Data | |
| |
| |
| |
Markup schemes | |
| |
| |
| |
Grammatical tagging | |
| |
| |
| |
Further Reading | |
| |
| |
| |
Exercises | |
| |
| |
| |
Words | |
| |
| |
| |
Collocations | |
| |
| |
| |
Frequency | |
| |
| |
| |
Mean and Variance | |
| |
| |
| |
Hypothesis Testing | |
| |
| |
| |
The t test | |
| |
| |
| |
Hypothesis testing of differences | |
| |
| |
| |
Pearson's chi-square test | |
| |
| |
| |
Likelihood ratios | |
| |
| |
| |
Mutual Information | |
| |
| |
| |
The Notion of Collocation | |
| |
| |
| |
Further Reading | |
| |
| |
| |
Statistical Inference: n-gram Models over Sparse Data | |
| |
| |
| |
Bins: Forming Equivalence Classes | |
| |
| |
| |
Reliability vs. discrimination | |
| |
| |
| |
n-gram models | |
| |
| |
| |
Building n-gram models | |
| |
| |
| |
Statistical Estimators | |
| |
| |
| |
Maximum Likelihood Estimation (MLE) | |
| |
| |
| |
Laplace's law, Lidstone's law and the Jeffreys-Perks law | |
| |
| |
| |
Held out estimation | |
| |
| |
| |
Cross-validation (deleted estimation) | |
| |
| |
| |
Good-Turing estimation | |
| |
| |
| |
Briefly noted | |
| |
| |
| |
Combining Estimators | |
| |
| |
| |
Simple linear interpolation | |
| |
| |
| |
Katz's backing-off | |
| |
| |
| |
General linear interpolation | |
| |
| |
| |
Briefly noted | |
| |
| |
| |
Language models for Austen | |
| |
| |
| |
Conclusions | |
| |
| |
| |
Further Reading | |
| |
| |
| |
Exercises | |
| |
| |
| |
Word Sense Disambiguation | |
| |
| |
| |
Methodological Preliminaries | |
| |
| |
| |
Supervised and unsupervised learning | |
| |
| |
| |
Pseudowords | |
| |
| |
| |
Upper and lower bounds on performance | |
| |
| |
| |
Supervised Disambiguation | |
| |
| |
| |
Bayesian classification | |
| |
| |
| |
An information-theoretic approach | |
| |
| |
| |
Dictionary-Based Disambiguation | |
| |
| |
| |
Disambiguation based on sense definitions | |
| |
| |
| |
Thesaurus-based disambiguation | |
| |
| |
| |
Disambiguation based on translations in a second-language corpus | |
| |
| |
| |
One sense per discourse, one sense per collocation | |
| |
| |
| |
Unsupervised Disambiguation | |
| |
| |
| |
What Is a Word Sense? | |
| |
| |
| |
Further Reading | |
| |
| |
| |
Exercises | |
| |
| |
| |
Lexical Acquisition | |
| |
| |
| |
Evaluation Measures | |
| |
| |
| |
Verb Subcategorization | |
| |
| |
| |
Attachment Ambiguity | |
| |
| |
| |
Hindle and Rooth (1993) | |
| |
| |
| |
General remarks on PP attachment | |
| |
| |
| |
Selectional Preferences | |
| |
| |
| |
Semantic Similarity | |
| |
| |
| |
Vector space measures | |
| |
| |
| |
Probabilistic measures | |
| |
| |
| |
The Role of Lexical Acquisition in Statistical NLP | |
| |
| |
| |
Further Reading | |
| |
| |
| |
Grammar | |
| |
| |
| |
Markov Models | |
| |
| |
| |
Markov Models | |
| |
| |
| |
Hidden Markov Models | |
| |
| |
| |
Why use HMMs? | |
| |
| |
| |
General form of an HMM | |
| |
| |
| |
The Three Fundamental Questions for HMMs | |
| |
| |
| |
Finding the probability of an observation | |
| |
| |
| |
Finding the best state sequence | |
| |
| |
| |
The third problem: Parameter estimation | |
| |
| |
| |
HMMs: Implementation, Properties, and Variants | |
| |
| |
| |
Implementation | |
| |
| |
| |
Variants | |
| |
| |
| |
Multiple input observations | |
| |
| |
| |
Initialization of parameter values | |
| |
| |
| |
Further Reading | |
| |
| |
| |
Part-of-Speech Tagging | |
| |
| |
| |
The Information Sources in Tagging | |
| |
| |
| |
Markov Model Taggers | |
| |
| |
| |
The probabilistic model | |
| |
| |
| |
The Viterbi algorithm | |
| |
| |
| |
Variations | |
| |
| |
| |
Hidden Markov Model Taggers | |
| |
| |
| |
Applying HMMs to POS tagging | |
| |
| |
| |
The effect of initialization on HMM training | |
| |
| |
| |
Transformation-Based Learning of Tags | |
| |
| |
| |
Transformations | |
| |
| |
| |
The learning algorithm | |
| |
| |
| |
Relation to other models | |
| |
| |
| |
Automata | |
| |
| |
| |
Summary | |
| |
| |
| |
Other Methods, Other Languages | |
| |
| |
| |
Other approaches to tagging | |
| |
| |
| |
Languages other than English | |
| |
| |
| |
Tagging Accuracy and Uses of Taggers | |
| |
| |
| |
Tagging accuracy | |
| |
| |
| |
Applications of tagging | |
| |
| |
| |
Further Reading | |
| |
| |
| |
Exercises | |
| |
| |
| |
Probabilistic Context Free Grammars | |
| |
| |
| |
Some Features of PCFGs | |
| |
| |
| |
Questions for PCFGs | |
| |
| |
| |
The Probability of a String | |
| |
| |
| |
Using inside probabilities | |
| |
| |
| |
Using outside probabilities | |
| |
| |
| |
Finding the most likely parse for a sentence | |
| |
| |
| |
Training a PCFG | |
| |
| |
| |
Problems with the Inside-Outside Algorithm | |
| |
| |
| |
Further Reading | |
| |
| |
| |
Exercises | |
| |
| |
| |
Probabilistic Parsing | |
| |
| |
| |
Some Concepts | |
| |
| |
| |
Parsing for disambiguation | |
| |
| |
| |
Treebanks | |
| |
| |
| |
Parsing models vs. language models | |
| |
| |
| |
Weakening the independence assumptions of PCFGs | |
| |
| |
| |
Tree probabilities and derivational probabilities | |
| |
| |
| |
There's more than one way to do it | |
| |
| |
| |
Phrase structure grammars and dependency grammars | |
| |
| |
| |
Evaluation | |
| |
| |
| |
Equivalent models | |
| |
| |
| |
Building parsers: Search methods | |
| |
| |
| |
Use of the geometric mean | |
| |
| |
| |
Some Approaches | |
| |
| |
| |
Non-lexicalized treebank grammars | |
| |
| |
| |
Lexicalized models using derivational histories | |
| |
| |
| |
Dependency-based models | |
| |
| |
| |
Discussion | |
| |
| |
| |
Further Reading | |
| |
| |
| |
Exercises | |
| |
| |
| |
Applications and Techniques | |
| |
| |
| |
Statistical Alignment and Machine Translation | |
| |
| |
| |
Text Alignment | |
| |
| |
| |
Aligning sentences and paragraphs | |
| |
| |
| |
Length-based methods | |
| |
| |
| |
Offset alignment by signal processing techniques | |
| |
| |
| |
Lexical methods of sentence alignment | |
| |
| |
| |
Summary | |
| |
| |
| |
Exercises | |
| |
| |
| |
Word Alignment | |
| |
| |
| |
Statistical Machine Translation | |
| |
| |
| |
Further Reading | |
| |
| |
| |
Clustering | |
| |
| |
| |
Hierarchical Clustering | |
| |
| |
| |
Single-link and complete-link clustering | |
| |
| |
| |
Group-average agglomerative clustering | |
| |
| |
| |
An application: Improving a language model | |
| |
| |
| |
Top-down clustering | |
| |
| |
| |
Non-Hierarchical Clustering | |
| |
| |
| |
K-means | |
| |
| |
| |
The EM algorithm | |
| |
| |
| |
Further Reading | |
| |
| |
| |
Exercises | |
| |
| |
| |
Topics in Information Retrieval | |
| |
| |
| |
Some Background on Information Retrieval | |
| |
| |
| |
Common design features of IR systems | |
| |
| |
| |
Evaluation measures | |
| |
| |
| |
The probability ranking principle (PRP) | |
| |
| |
| |
The Vector Space Model | |
| |
| |
| |
Vector similarity | |
| |
| |
| |
Term weighting | |
| |
| |
| |
Term Distribution Models | |
| |
| |
| |
The Poisson distribution | |
| |
| |
| |
The two-Poisson model | |
| |
| |
| |
The K mixture | |
| |
| |
| |
Inverse document frequency | |
| |
| |
| |
Residual inverse document frequency | |
| |
| |
| |
Usage of term distribution models | |
| |
| |
| |
Latent Semantic Indexing | |
| |
| |
| |
Least-squares methods | |
| |
| |
| |
Singular Value Decomposition | |
| |
| |
| |
Latent Semantic Indexing in IR | |
| |
| |
| |
Discourse Segmentation | |
| |
| |
| |
TextTiling | |
| |
| |
| |
Further Reading | |
| |
| |
| |
Exercises | |
| |
| |
| |
Text Categorization | |
| |
| |
| |
Decision Trees | |
| |
| |
| |
Maximum Entropy Modeling | |
| |
| |
| |
Generalized iterative scaling | |
| |
| |
| |
Application to text categorization | |
| |
| |
| |
Perceptrons | |
| |
| |
| |
k Nearest Neighbor Classification | |
| |
| |
| |
Further Reading | |
| |
| |
Tiny Statistical Tables | |
| |
| |
Bibliography | |
| |
| |
Index | |