Foundations of Statistical Natural Language Processing

ISBN-10: 0262133601

ISBN-13: 9780262133609

Edition: 1999

Authors: Christopher D. Manning, Hinrich Schütze

List price: $95.00

Rental notice: supplementary materials (access codes, CDs, etc.) are not guaranteed with rental orders.

Description:

Statistical approaches to processing natural language text have become dominant in recent years. This foundational text is the first comprehensive introduction to statistical natural language processing (NLP) to appear. The book contains all the theory and algorithms needed for building NLP tools. It provides broad but rigorous coverage of mathematical and linguistic foundations, as well as detailed discussion of statistical methods, allowing students and researchers to construct their own implementations. The book covers collocation finding, word sense disambiguation, probabilistic parsing, information retrieval, and other applications.

Book details

Copyright year: 1999
Publisher: MIT Press
Publication date: 5/28/1999
Binding: Hardcover
Pages: 720
Size: 8.00" wide x 9.25" long x 1.75" tall
Weight: 3.520
Language: English

List of Tables
List of Figures
Table of Notations
Preface
Road Map
Preliminaries
Introduction
Rationalist and Empiricist Approaches to Language
Scientific Content
Questions that linguistics should answer
Non-categorical phenomena in language
Language and cognition as probabilistic phenomena
The Ambiguity of Language: Why NLP Is Difficult
Dirty Hands
Lexical resources
Word counts
Zipf's laws
Collocations
Concordances
Further Reading
Exercises
Mathematical Foundations
Elementary Probability Theory
Probability spaces
Conditional probability and independence
Bayes' theorem
Random variables
Expectation and variance
Notation
Joint and conditional distributions
Determining P
Standard distributions
Bayesian statistics
Exercises
Essential Information Theory
Entropy
Joint entropy and conditional entropy
Mutual information
The noisy channel model
Relative entropy or Kullback-Leibler divergence
The relation to language: Cross entropy
The entropy of English
Perplexity
Exercises
Further Reading
Linguistic Essentials
Parts of Speech and Morphology
Nouns and pronouns
Words that accompany nouns: Determiners and adjectives
Verbs
Other parts of speech
Phrase Structure
Phrase structure grammars
Dependency: Arguments and adjuncts
X' theory
Phrase structure ambiguity
Semantics and Pragmatics
Other Areas
Further Reading
Exercises
Corpus-Based Work
Getting Set Up
Computers
Corpora
Software
Looking at Text
Low-level formatting issues
Tokenization: What is a word?
Morphology
Sentences
Marked-up Data
Markup schemes
Grammatical tagging
Further Reading
Exercises
Words
Collocations
Frequency
Mean and Variance
Hypothesis Testing
The t test
Hypothesis testing of differences
Pearson's chi-square test
Likelihood ratios
Mutual Information
The Notion of Collocation
Further Reading
Statistical Inference: n-gram Models over Sparse Data
Bins: Forming Equivalence Classes
Reliability vs. discrimination
n-gram models
Building n-gram models
Statistical Estimators
Maximum Likelihood Estimation (MLE)
Laplace's law, Lidstone's law and the Jeffreys-Perks law
Held out estimation
Cross-validation (deleted estimation)
Good-Turing estimation
Briefly noted
Combining Estimators
Simple linear interpolation
Katz's backing-off
General linear interpolation
Briefly noted
Language models for Austen
Conclusions
Further Reading
Exercises
Word Sense Disambiguation
Methodological Preliminaries
Supervised and unsupervised learning
Pseudowords
Upper and lower bounds on performance
Supervised Disambiguation
Bayesian classification
An information-theoretic approach
Dictionary-Based Disambiguation
Disambiguation based on sense definitions
Thesaurus-based disambiguation
Disambiguation based on translations in a second-language corpus
One sense per discourse, one sense per collocation
Unsupervised Disambiguation
What Is a Word Sense?
Further Reading
Exercises
Lexical Acquisition
Evaluation Measures
Verb Subcategorization
Attachment Ambiguity
Hindle and Rooth (1993)
General remarks on PP attachment
Selectional Preferences
Semantic Similarity
Vector space measures
Probabilistic measures
The Role of Lexical Acquisition in Statistical NLP
Further Reading
Grammar
Markov Models
Markov Models
Hidden Markov Models
Why use HMMs?
General form of an HMM
The Three Fundamental Questions for HMMs
Finding the probability of an observation
Finding the best state sequence
The third problem: Parameter estimation
HMMs: Implementation, Properties, and Variants
Implementation
Variants
Multiple input observations
Initialization of parameter values
Further Reading
Part-of-Speech Tagging
The Information Sources in Tagging
Markov Model Taggers
The probabilistic model
The Viterbi algorithm
Variations
Hidden Markov Model Taggers
Applying HMMs to POS tagging
The effect of initialization on HMM training
Transformation-Based Learning of Tags
Transformations
The learning algorithm
Relation to other models
Automata
Summary
Other Methods, Other Languages
Other approaches to tagging
Languages other than English
Tagging Accuracy and Uses of Taggers
Tagging accuracy
Applications of tagging
Further Reading
Exercises
Probabilistic Context Free Grammars
Some Features of PCFGs
Questions for PCFGs
The Probability of a String
Using inside probabilities
Using outside probabilities
Finding the most likely parse for a sentence
Training a PCFG
Problems with the Inside-Outside Algorithm
Further Reading
Exercises
Probabilistic Parsing
Some Concepts
Parsing for disambiguation
Treebanks
Parsing models vs. language models
Weakening the independence assumptions of PCFGs
Tree probabilities and derivational probabilities
There's more than one way to do it
Phrase structure grammars and dependency grammars
Evaluation
Equivalent models
Building parsers: Search methods
Use of the geometric mean
Some Approaches
Non-lexicalized treebank grammars
Lexicalized models using derivational histories
Dependency-based models
Discussion
Further Reading
Exercises
Applications and Techniques
Statistical Alignment and Machine Translation
Text Alignment
Aligning sentences and paragraphs
Length-based methods
Offset alignment by signal processing techniques
Lexical methods of sentence alignment
Summary
Exercises
Word Alignment
Statistical Machine Translation
Further Reading
Clustering
Hierarchical Clustering
Single-link and complete-link clustering
Group-average agglomerative clustering
An application: Improving a language model
Top-down clustering
Non-Hierarchical Clustering
K-means
The EM algorithm
Further Reading
Exercises
Topics in Information Retrieval
Some Background on Information Retrieval
Common design features of IR systems
Evaluation measures
The probability ranking principle (PRP)
The Vector Space Model
Vector similarity
Term weighting
Term Distribution Models
The Poisson distribution
The two-Poisson model
The K mixture
Inverse document frequency
Residual inverse document frequency
Usage of term distribution models
Latent Semantic Indexing
Least-squares methods
Singular Value Decomposition
Latent Semantic Indexing in IR
Discourse Segmentation
TextTiling
Further Reading
Exercises
Text Categorization
Decision Trees
Maximum Entropy Modeling
Generalized iterative scaling
Application to text categorization
Perceptrons
k Nearest Neighbor Classification
Further Reading
Tiny Statistical Tables
Bibliography
Index