Foreword
Preface

What's it all about?
    Data mining and machine learning
        Describing structural patterns
        Machine learning
        Data mining
    Simple examples: The weather problem and others
        The weather problem
        Contact lenses: An idealized problem
        Irises: A classic numeric dataset
        CPU performance: Introducing numeric prediction
        Labor negotiations: A more realistic example
        Soybean classification: A classic machine learning success
    Fielded applications
        Decisions involving judgment
        Screening images
        Load forecasting
        Diagnosis
        Marketing and sales
    Machine learning and statistics
    Generalization as search
        Enumerating the concept space
        Bias
    Data mining and ethics
    Further reading

Input: Concepts, instances, attributes
    What's a concept?
    What's in an example?
    What's in an attribute?
    Preparing the input
        Gathering the data together
        ARFF format
        Attribute types
        Missing values
        Inaccurate values
        Getting to know your data
    Further reading

Output: Knowledge representation
    Decision tables
    Decision trees
    Classification rules
    Association rules
    Rules with exceptions
    Rules involving relations
    Trees for numeric prediction
    Instance-based representation
    Clusters
    Further reading

Algorithms: The basic methods
    Inferring rudimentary rules
        Missing values and numeric attributes
        Discussion
    Statistical modeling
        Missing values and numeric attributes
        Discussion
    Divide and conquer: Constructing decision trees
        Calculating information
        Highly branching attributes
        Discussion
    Covering algorithms: Constructing rules
        Rules versus trees
        A simple covering algorithm
        Rules versus decision lists
    Mining association rules
        Item sets
        Association rules
        Generating rules efficiently
        Discussion
    Linear models
        Numeric prediction
        Classification
        Discussion
    Instance-based learning
        The distance function
        Discussion
    Further reading

Credibility: Evaluating what's been learned
    Training and testing
    Predicting performance
    Cross-validation
    Other estimates
        Leave-one-out
        The bootstrap
    Comparing data mining schemes
    Predicting probabilities
        Quadratic loss function
        Informational loss function
        Discussion
    Counting the cost
        Lift charts
        ROC curves
        Cost-sensitive learning
        Discussion
    Evaluating numeric prediction
    The minimum description length principle
    Applying MDL to clustering
    Further reading

Implementations: Real machine learning schemes
    Decision trees
        Numeric attributes
        Missing values
        Pruning
        Estimating error rates
        Complexity of decision tree induction
        From trees to rules
        C4.5: Choices and options
        Discussion
    Classification rules
        Criteria for choosing tests
        Missing values, numeric attributes
        Good rules and bad rules
        Generating good rules
        Generating good decision lists
        Probability measure for rule evaluation
        Evaluating rules using a test set
        Obtaining rules from partial decision trees
        Rules with exceptions
        Discussion
    Extending linear classification: Support vector machines
        The maximum margin hyperplane
        Nonlinear class boundaries
        Discussion
    Instance-based learning
        Reducing the number of exemplars
        Pruning noisy exemplars
        Weighting attributes
        Generalizing exemplars
        Distance functions for generalized exemplars
        Generalized distance functions
        Discussion
    Numeric prediction
        Model trees
        Building the tree
        Pruning the tree
        Nominal attributes
        Missing values
        Pseudo-code for model tree induction
        Locally weighted linear regression
        Discussion
    Clustering
        Iterative distance-based clustering
        Incremental clustering
        Category utility
        Probability-based clustering
        The EM algorithm
        Extending the mixture model
        Bayesian clustering
        Discussion

Moving on: Engineering the input and output
    Attribute selection
        Scheme-independent selection
        Searching the attribute space
        Scheme-specific selection
    Discretizing numeric attributes
        Unsupervised discretization
        Entropy-based discretization
        Other discretization methods
        Entropy-based versus error-based discretization
        Converting discrete to numeric attributes
    Automatic data cleansing
        Improving decision trees
        Robust regression
        Detecting anomalies
    Combining multiple models
        Bagging
        Boosting
        Stacking
        Error-correcting output codes
    Further reading

Nuts and bolts: Machine learning algorithms in Java
    Getting started
    Javadoc and the class library
        Classes, instances, and packages
        The weka.core package
        The weka.classifiers package
        Other packages
        Indexes
    Processing datasets using the machine learning programs
        Using M5'
        Generic options
        Scheme-specific options
        Classifiers
        Meta-learning schemes
        Filters
        Association rules
        Clustering
    Embedded machine learning
        A simple message classifier
    Writing new learning schemes
        An example classifier
        Conventions for implementing classifiers
        Writing filters
        An example filter
        Conventions for writing filters

Looking forward
    Learning from massive datasets
    Visualizing machine learning
        Visualizing the input
        Visualizing the output
    Incorporating domain knowledge
    Text mining
        Finding key phrases for documents
        Finding information in running text
        Soft parsing
    Mining the World Wide Web
    Further reading

References
Index
About the authors