Table of Notation
Preface

Boolean retrieval
    An example information retrieval problem
    A first take at building an inverted index
    Processing Boolean queries
    The extended Boolean model versus ranked retrieval
    References and further reading

The term vocabulary and postings lists
    Document delineation and character sequence decoding
    Determining the vocabulary of terms
    Faster postings list intersection via skip pointers
    Positional postings and phrase queries
    References and further reading

Dictionaries and tolerant retrieval
    Search structures for dictionaries
    Wildcard queries
    Spelling correction
    Phonetic correction
    References and further reading

Index construction
    Hardware basics
    Blocked sort-based indexing
    Single-pass in-memory indexing
    Distributed indexing
    Dynamic indexing
    Other types of indexes
    References and further reading

Index compression
    Statistical properties of terms in information retrieval
    Dictionary compression
    Postings file compression
    References and further reading

Scoring, term weighting, and the vector space model
    Parametric and zone indexes
    Term frequency and weighting
    The vector space model for scoring
    Variant tf-idf functions
    References and further reading

Computing scores in a complete search system
    Efficient scoring and ranking
    Components of an information retrieval system
    Vector space scoring and query operator interaction
    References and further reading

Evaluation in information retrieval
    Information retrieval system evaluation
    Standard test collections
    Evaluation of unranked retrieval sets
    Evaluation of ranked retrieval results
    Assessing relevance
    A broader perspective: System quality and user utility
    Results snippets
    References and further reading

Relevance feedback and query expansion
    Relevance feedback and pseudo relevance feedback
    Global methods for query reformulation
    References and further reading

XML retrieval
    Basic XML concepts
    Challenges in XML retrieval
    A vector space model for XML retrieval
    Evaluation of XML retrieval
    Text-centric versus data-centric XML retrieval
    References and further reading

Probabilistic information retrieval
    Review of basic probability theory
    The probability ranking principle
    The binary independence model
    An appraisal and some extensions
    References and further reading

Language models for information retrieval
    Language models
    The query likelihood model
    Language modeling versus other approaches in information retrieval
    Extended language modeling approaches
    References and further reading

Text classification and Naive Bayes
    The text classification problem
    Naive Bayes text classification
    The Bernoulli model
    Properties of Naive Bayes
    Feature selection
    Evaluation of text classification
    References and further reading

Vector space classification
    Document representations and measures of relatedness in vector spaces
    Rocchio classification
    k nearest neighbor
    Linear versus nonlinear classifiers
    Classification with more than two classes
    The bias-variance tradeoff
    References and further reading

Support vector machines and machine learning on documents
    Support vector machines: The linearly separable case
    Extensions to the support vector machine model
    Issues in the classification of text documents
    Machine-learning methods in ad hoc information retrieval
    References and further reading

Flat clustering
    Clustering in information retrieval
    Problem statement
    Evaluation of clustering
    K-means
    Model-based clustering
    References and further reading

Hierarchical clustering
    Hierarchical agglomerative clustering
    Single-link and complete-link clustering
    Group-average agglomerative clustering
    Centroid clustering
    Optimality of hierarchical agglomerative clustering
    Divisive clustering
    Cluster labeling
    Implementation notes
    References and further reading

Matrix decompositions and latent semantic indexing
    Linear algebra review
    Term-document matrices and singular value decompositions
    Low-rank approximations
    Latent semantic indexing
    References and further reading

Web search basics
    Background and history
    Web characteristics
    Advertising as the economic model
    The search user experience
    Index size and estimation
    Near-duplicates and shingling
    References and further reading

Web crawling and indexes
    Overview
    Crawling
    Distributing indexes
    Connectivity servers
    References and further reading

Link analysis
    The Web as a graph
    PageRank
    Hubs and authorities
    References and further reading

Bibliography
Index