| |

| |

Preface | |

| |

| |

| |

Introduction and Preview | |

| |

| |

| |

Multivariate Analysis | |

| |

| |

| |

Data Mining | |

| |

| |

| |

From EDA to Data Mining | |

| |

| |

| |

What Is Data Mining? | |

| |

| |

| |

Knowledge Discovery | |

| |

| |

| |

Machine Learning | |

| |

| |

| |

How Does a Machine Learn? | |

| |

| |

| |

Prediction Accuracy | |

| |

| |

| |

Generalization | |

| |

| |

| |

Generalization Error | |

| |

| |

| |

Overfitting | |

| |

| |

| |

Overview of Chapters | |

| |

| |

Bibliographical Notes | |

| |

| |

| |

Data and Databases | |

| |

| |

| |

Introduction | |

| |

| |

| |

Examples | |

| |

| |

| |

Example: DNA Microarray Data | |

| |

| |

| |

Example: Mixtures of Polyaromatic Hydrocarbons | |

| |

| |

| |

Example: Face Recognition | |

| |

| |

| |

Databases | |

| |

| |

| |

Data Types | |

| |

| |

| |

Trends in Data Storage | |

| |

| |

| |

Databases on the Internet | |

| |

| |

| |

Database Management | |

| |

| |

| |

Elements of Database Systems | |

| |

| |

| |

Structured Query Language (SQL) | |

| |

| |

| |

OLTP Databases | |

| |

| |

| |

Integrating Distributed Databases | |

| |

| |

| |

Data Warehousing | |

| |

| |

| |

Decision Support Systems and OLAP | |

| |

| |

| |

Statistical Packages and DBMSs | |

| |

| |

| |

Data Quality Problems | |

| |

| |

| |

Data Inconsistencies | |

| |

| |

| |

Outliers | |

| |

| |

| |

Missing Data | |

| |

| |

| |

More Variables than Observations | |

| |

| |

| |

The Curse of Dimensionality | |

| |

| |

Bibliographical Notes | |

| |

| |

Exercises | |

| |

| |

| |

Random Vectors and Matrices | |

| |

| |

| |

Introduction | |

| |

| |

| |

Vectors and Matrices | |

| |

| |

| |

Notation | |

| |

| |

| |

Basic Matrix Operations | |

| |

| |

| |

Vectoring and Kronecker Products | |

| |

| |

| |

Eigenanalysis for Square Matrices | |

| |

| |

| |

Functions of Matrices | |

| |

| |

| |

Singular-Value Decomposition | |

| |

| |

| |

Generalized Inverses | |

| |

| |

| |

Matrix Norms | |

| |

| |

| |

Condition Numbers for Matrices | |

| |

| |

| |

Eigenvalue Inequalities | |

| |

| |

| |

Matrix Calculus | |

| |

| |

| |

Random Vectors | |

| |

| |

| |

Multivariate Moments | |

| |

| |

| |

Multivariate Gaussian Distribution | |

| |

| |

| |

Conditional Gaussian Distributions | |

| |

| |

| |

Random Matrices | |

| |

| |

| |

Wishart Distribution | |

| |

| |

| |

Maximum Likelihood Estimation for the Gaussian | |

| |

| |

| |

Joint Distribution of Sample Mean and Sample Covariance Matrix | |

| |

| |

| |

Admissibility | |

| |

| |

| |

James-Stein Estimator of the Mean Vector | |

| |

| |

Bibliographical Notes | |

| |

| |

Exercises | |

| |

| |

| |

Nonparametric Density Estimation | |

| |

| |

| |

Introduction | |

| |

| |

| |

Example: Coronary Heart Disease | |

| |

| |

| |

Statistical Properties of Density Estimators | |

| |

| |

| |

Unbiasedness | |

| |

| |

| |

Consistency | |

| |

| |

| |

Bona Fide Density Estimators | |

| |

| |

| |

The Histogram | |

| |

| |

| |

The Histogram as an ML Estimator | |

| |

| |

| |

Asymptotics | |

| |

| |

| |

Estimating Bin Width | |

| |

| |

| |

Multivariate Histograms | |

| |

| |

| |

Maximum Penalized Likelihood | |

| |

| |

| |

Kernel Density Estimation | |

| |

| |

| |

Choice of Kernel | |

| |

| |

| |

Asymptotics | |

| |

| |

| |

Example: 1872 Hidalgo Postage Stamps of Mexico | |

| |

| |

| |

Estimating the Window Width | |

| |

| |

| |

Projection Pursuit Density Estimation | |

| |

| |

| |

The PPDE Paradigm | |

| |

| |

| |

Projection Indexes | |

| |

| |

| |

Assessing Multimodality | |

| |

| |

Bibliographical Notes | |

| |

| |

Exercises | |

| |

| |

| |

Model Assessment and Selection in Multiple Regression | |

| |

| |

| |

Introduction | |

| |

| |

| |

The Regression Function and Least Squares | |

| |

| |

| |

Random-X Case | |

| |

| |

| |

Fixed-X Case | |

| |

| |

| |

Example: Bodyfat Data | |

| |

| |

| |

Prediction Accuracy and Model Assessment | |

| |

| |

| |

Random-X Case | |

| |

| |

| |

Fixed-X Case | |

| |

| |

| |

Estimating Prediction Error | |

| |

| |

| |

Apparent Error Rate | |

| |

| |

| |

Cross-Validation | |

| |

| |

| |

Bootstrap | |

| |

| |

| |

Instability of LS Estimates | |

| |

| |

| |

Biased Regression Methods | |

| |

| |

| |

Example: PET Yarns and NIR Spectra | |

| |

| |

| |

Principal Components Regression | |

| |

| |

| |

Partial Least Squares Regression | |

| |

| |

| |

Ridge Regression | |

| |

| |

| |

Variable Selection | |

| |

| |

| |

Stepwise Methods | |

| |

| |

| |

All Possible Subsets | |

| |

| |

| |

Criticisms of Variable Selection Methods | |

| |

| |

| |

Regularized Regression | |

| |

| |

| |

Least Angle Regression | |

| |

| |

| |

The Forwards Stagewise Algorithm | |

| |

| |

| |

The LARS Algorithm | |

| |

| |

Bibliographical Notes | |

| |

| |

Exercises | |

| |

| |

| |

Multivariate Regression | |

| |

| |

| |

Introduction | |

| |

| |

| |

The Fixed-X Case | |

| |

| |

| |

Classical Multivariate Regression Model | |

| |

| |

| |

Example: Norwegian Paper Quality | |

| |

| |

| |

Separate and Multivariate Ridge Regressions | |

| |

| |

| |

Linear Constraints on the Regression Coefficients | |

| |

| |

| |

The Random-X Case | |

| |

| |

| |

Classical Multivariate Regression Model | |

| |

| |

| |

Multivariate Reduced-Rank Regression | |

| |

| |

| |

Example: Chemical Composition of Tobacco | |

| |

| |

| |

Assessing the Effective Dimensionality | |

| |

| |

| |

Example: Mixtures of Polyaromatic Hydrocarbons | |

| |

| |

| |

Software Packages | |

| |

| |

Bibliographical Notes | |

| |

| |

Exercises | |

| |

| |

| |

Linear Dimensionality Reduction | |

| |

| |

| |

Introduction | |

| |

| |

| |

Principal Component Analysis | |

| |

| |

| |

Example: The Nutritional Value of Food | |

| |

| |

| |

Population Principal Components | |

| |

| |

| |

Least-Squares Optimality of PCA | |

| |

| |

| |

PCA as a Variance-Maximization Technique | |

| |

| |

| |

Sample Principal Components | |

| |

| |

| |

How Many Principal Components to Retain? | |

| |

| |

| |

Graphical Displays | |

| |

| |

| |

Example: Face Recognition Using Eigenfaces | |

| |

| |

| |

Invariance and Scaling | |

| |

| |

| |

Example: Pen-Based Handwritten Digit Recognition | |

| |

| |

| |

Functional PCA | |

| |

| |

| |

What Can Be Gained from Using PCA? | |

| |

| |

| |

Canonical Variate and Correlation Analysis | |

| |

| |

| |

Canonical Variates and Canonical Correlations | |

| |

| |

| |

Example: COMBO-17 Galaxy Photometric Catalogue | |

| |

| |

| |

Least-Squares Optimality of CVA | |

| |

| |

| |

Relationship of CVA to RRR | |

| |

| |

| |

CVA as a Correlation-Maximization Technique | |

| |

| |

| |

Sample Estimates | |

| |

| |

| |

Invariance | |

| |

| |

| |

How Many Pairs of Canonical Variates to Retain? | |

| |

| |

| |

Projection Pursuit | |

| |

| |

| |

Projection Indexes | |

| |

| |

| |

Optimizing the Projection Index | |

| |

| |

| |

Visualizing Projections Using Dynamic Graphics | |

| |

| |

| |

Software Packages | |

| |

| |

Bibliographical Notes | |

| |

| |

Exercises | |

| |

| |

| |

Linear Discriminant Analysis | |

| |

| |

| |

Introduction | |

| |

| |

| |

Example: Wisconsin Diagnostic Breast Cancer Data | |

| |

| |

| |

Classes and Features | |

| |

| |

| |

Binary Classification | |

| |

| |

| |

Bayes's Rule Classifier | |

| |

| |

| |

Gaussian Linear Discriminant Analysis | |

| |

| |

| |

LDA via Multiple Regression | |

| |

| |

| |

Variable Selection | |

| |

| |

| |

Logistic Discrimination | |

| |

| |

| |

Gaussian LDA or Logistic Discrimination? | |

| |

| |

| |

Quadratic Discriminant Analysis | |

| |

| |

| |

Examples of Binary Misclassification Rates | |

| |

| |

| |

Multiclass LDA | |

| |

| |

| |

Bayes's Rule Classifier | |

| |

| |

| |

Multiclass Logistic Discrimination | |

| |

| |

| |

LDA via Reduced-Rank Regression | |

| |

| |

| |

Example: Gilgaied Soil | |

| |

| |

| |

Examples of Multiclass Misclassification Rates | |

| |

| |

| |

Software Packages | |

| |

| |

Bibliographical Notes | |

| |

| |

Exercises | |

| |

| |

| |

Recursive Partitioning and Tree-Based Methods | |

| |

| |

| |

Introduction | |

| |

| |

| |

Classification Trees | |

| |

| |

| |

Example: Cleveland Heart-Disease Data | |

| |

| |

| |

Tree-Growing Procedure | |

| |

| |

| |

Splitting Strategies | |

| |

| |

| |

Example: Pima Indians Diabetes Study | |

| |

| |

| |

Estimating the Misclassification Rate | |

| |

| |

| |

Pruning the Tree | |

| |

| |

| |

Choosing the Best Pruned Subtree | |

| |

| |

| |

Example: Vehicle Silhouettes | |

| |

| |

| |

Regression Trees | |

| |

| |

| |

The Terminal-Node Value | |

| |

| |

| |

Splitting Strategy | |

| |

| |

| |

Pruning the Tree | |

| |

| |

| |

Selecting the Best Pruned Subtree | |

| |

| |

| |

Example: 1992 Major League Baseball Salaries | |

| |

| |

| |

Extensions and Adjustments | |

| |

| |

| |

Multivariate Responses | |

| |

| |

| |

Survival Trees | |

| |

| |

| |

MARS | |

| |

| |

| |

Missing Data | |

| |

| |

| |

Software Packages | |

| |

| |

Bibliographical Notes | |

| |

| |

Exercises | |

| |

| |

| |

Artificial Neural Networks | |

| |

| |

| |

Introduction | |

| |

| |

| |

The Brain as a Neural Network | |

| |

| |

| |

The McCulloch-Pitts Neuron | |

| |

| |

| |

Hebbian Learning Theory | |

| |

| |

| |

Single-Layer Perceptrons | |

| |

| |

| |

Feedforward Single-Layer Networks | |

| |

| |

| |

Activation Functions | |

| |

| |

| |

Rosenblatt's Single-Unit Perceptron | |

| |

| |

| |

The Perceptron Learning Rule | |

| |

| |

| |

Perceptron Convergence Theorem | |

| |

| |

| |

Limitations of the Perceptron | |

| |

| |

| |

Artificial Intelligence and Expert Systems | |

| |

| |

| |

Multilayer Perceptrons | |

| |

| |

| |

Network Architecture | |

| |

| |

| |

A Single Hidden Layer | |

| |

| |

| |

ANNs Can Approximate Continuous Functions | |

| |

| |

| |

More than One Hidden Layer | |

| |

| |

| |

Optimality Criteria | |

| |

| |

| |

The Backpropagation of Errors Algorithm | |

| |

| |

| |

Convergence and Stopping | |

| |

| |

| |

Network Design Considerations | |

| |

| |

| |

Learning Modes | |

| |

| |

| |

Input Scaling | |

| |

| |

| |

How Many Hidden Nodes and Layers? | |

| |

| |

| |

Initializing the Weights | |

| |

| |

| |

Overfitting and Network Pruning | |

| |

| |

| |

Example: Detecting Hidden Messages in Digital Images | |

| |

| |

| |

Examples of Fitting Neural Networks | |

| |

| |

| |

Related Statistical Methods | |

| |

| |

| |

Projection Pursuit Regression | |

| |

| |

| |

Generalized Additive Models | |

| |

| |

| |

Bayesian Learning for ANN Models | |

| |

| |

| |

Laplace's Method | |

| |

| |

| |

Markov Chain Monte Carlo Methods | |

| |

| |

| |

Software Packages | |

| |

| |

Bibliographical Notes | |

| |

| |

Exercises | |

| |

| |

| |

Support Vector Machines | |

| |

| |

| |

Introduction | |

| |

| |

| |

Linear Support Vector Machines | |

| |

| |

| |

The Linearly Separable Case | |

| |

| |

| |

The Linearly Nonseparable Case | |

| |

| |

| |

Nonlinear Support Vector Machines | |

| |

| |

| |

Nonlinear Transformations | |

| |

| |

| |

The "Kernel Trick" | |

| |

| |

| |

Kernels and Their Properties | |

| |

| |

| |

Examples of Kernels | |

| |

| |

| |

Optimizing in Feature Space | |

| |

| |

| |

Grid Search for Parameters | |

| |

| |

| |

Example: E-mail or Spam? | |

| |

| |

| |

Binary Classification Examples | |

| |

| |

| |

SVM as a Regularization Method | |

| |

| |

| |

Multiclass Support Vector Machines | |

| |

| |

| |

Multiclass SVM as a Series of Binary Problems | |

| |

| |

| |

A True Multiclass SVM | |

| |

| |

| |

Support Vector Regression | |

| |

| |

| |

ϵ-Insensitive Loss Functions | |

| |

| |

| |

Optimization for Linear ϵ-Insensitive Loss | |

| |

| |

| |

Extensions | |

| |

| |

| |

Optimization Algorithms for SVMs | |

| |

| |

| |

Software Packages | |

| |

| |

Bibliographical Notes | |

| |

| |

Exercises | |

| |

| |

| |

Cluster Analysis | |

| |

| |

| |

Introduction | |

| |

| |

| |

What Is a Cluster? | |

| |

| |

| |

Example: Old Faithful Geyser Eruptions | |

| |

| |

| |

Clustering Tasks | |

| |

| |

| |

Hierarchical Clustering | |

| |

| |

| |

Dendrogram | |

| |

| |

| |

Dissimilarity | |

| |

| |

| |

Agglomerative Nesting (agnes) | |

| |

| |

| |

A Worked Example | |

| |

| |

| |

Divisive Analysis (diana) | |

| |

| |

| |

Example: Primate Scapular Shapes | |

| |

| |

| |

Nonhierarchical or Partitioning Methods | |

| |

| |

| |

K-Means Clustering (kmeans) | |

| |

| |

| |

Partitioning Around Medoids (pam) | |

| |

| |

| |

Fuzzy Analysis (fanny) | |

| |

| |

| |

Silhouette Plot | |

| |

| |

| |

Example: Landsat Satellite Image Data | |

| |

| |

| |

Self-Organizing Maps (SOMs) | |

| |

| |

| |

The SOM Algorithm | |

| |

| |

| |

On-line Versions | |

| |

| |

| |

Batch Version | |

| |

| |

| |

Unified Distance Matrix | |

| |

| |

| |

Component Planes | |

| |

| |

| |

Clustering Variables | |

| |

| |

| |

Gene Clustering | |

| |

| |

| |

Principal Component Gene Shaving | |

| |

| |

| |

Example: Colon Cancer Data | |

| |

| |

| |

Block Clustering | |

| |

| |

| |

Two-Way Clustering of Microarray Data | |

| |

| |

| |

Biclustering | |

| |

| |

| |

Plaid Models | |

| |

| |

| |

Example: Leukemia (ALL/AML) Data | |

| |

| |

| |

Clustering Based Upon Mixture Models | |

| |

| |

| |

The EM Algorithm for Finite Mixtures | |

| |

| |

| |

How Many Components? | |

| |

| |

| |

Software Packages | |

| |

| |

Bibliographical Notes | |

| |

| |

Exercises | |

| |

| |

| |

Multidimensional Scaling and Distance Geometry | |

| |

| |

| |

Introduction | |

| |

| |

| |

Example: Airline Distances | |

| |

| |

| |

Two Golden Oldies | |

| |

| |

| |

Example: Perceptions of Color in Human Vision | |

| |

| |

| |

Example: Confusion of Morse Code Signals | |

| |

| |

| |

Proximity Matrices | |

| |

| |

| |

Comparing Protein Sequences | |

| |

| |

| |

Optimal Sequence Alignment | |

| |

| |

| |

Example: Two Hemoglobin Chains | |

| |

| |

| |

String Matching | |

| |

| |

| |

Edit Distance | |

| |

| |

| |

Example: Employee Careers at Lloyds Bank | |

| |

| |

| |

Classical Scaling and Distance Geometry | |

| |

| |

| |

From Dissimilarities to Principal Coordinates | |

| |

| |

| |

Assessing Dimensionality | |

| |

| |

| |

Example: Airline Distances (Continued) | |

| |

| |

| |

Example: Mapping the Protein Universe | |

| |

| |

| |

Distance Scaling | |

| |

| |

| |

Metric Distance Scaling | |

| |

| |

| |

Metric Least-Squares Scaling | |

| |

| |

| |

Sammon Mapping | |

| |

| |

| |

Example: Lloyds Bank Employees | |

| |

| |

| |

Bayesian MDS | |

| |

| |

| |

Nonmetric Distance Scaling | |

| |

| |

| |

Disparities | |

| |

| |

| |

The Stress Function | |

| |

| |

| |

Fitting Nonmetric Distance-Scaling Models | |

| |

| |

| |

How Good Is an MDS Solution? | |

| |

| |

| |

How Many Dimensions? | |

| |

| |

| |

Software Packages | |

| |

| |

Bibliographical Notes | |

| |

| |

Exercises | |

| |

| |

| |

Committee Machines | |

| |

| |

| |

Introduction | |

| |

| |

| |

Bagging | |

| |

| |

| |

Bagging Tree-Based Classifiers | |

| |

| |

| |

Bagging Regression-Tree Predictors | |

| |

| |

| |

Boosting | |

| |

| |

| |

AdaBoost: Boosting by Reweighting | |

| |

| |

| |

Example: Aqueous Solubility in Drug Discovery | |

| |

| |

| |

Convergence Issues and Overfitting | |

| |

| |

| |

Classification Margins | |

| |

| |

| |

AdaBoost and Maximal Margins | |

| |

| |

| |

A Statistical Interpretation of AdaBoost | |

| |

| |

| |

Some Questions About AdaBoost | |

| |

| |

| |

Gradient Boosting for Regression | |

| |

| |

| |

Other Loss Functions | |

| |

| |

| |

Regularization | |

| |

| |

| |

Noisy Class Labels | |

| |

| |

| |

Random Forests | |

| |

| |

| |

Randomizing Tree Construction | |

| |

| |

| |

Generalization Error | |

| |

| |

| |

An Upper Bound on Generalization Error | |

| |

| |

| |

Example: Diagnostic Classification of Four Childhood Tumors | |

| |

| |

| |

Assessing Variable Importance | |

| |

| |

| |

Proximities for Classical Scaling | |

| |

| |

| |

Identifying Multivariate Outliers | |

| |

| |

| |

Treating Unbalanced Classes | |

| |

| |

| |

Software Packages | |

| |

| |

Bibliographical Notes | |

| |

| |

Exercises | |

| |

| |

| |

Latent Variable Models for Blind Source Separation | |

| |

| |

| |

Introduction | |

| |

| |

| |

Blind Source Separation and the Cocktail-Party Problem | |

| |

| |

| |

Independent Component Analysis | |

| |

| |

| |

Applications of ICA | |

| |

| |

| |

Example: Cutaneous Potential Recordings of a Pregnant Woman | |

| |

| |

| |

Connection to Projection Pursuit | |

| |

| |

| |

Centering and Sphering | |

| |

| |

| |

The General ICA Problem | |

| |

| |

| |

Linear Mixing: Noiseless ICA | |

| |

| |

| |

Identifiability Aspects | |

| |

| |

| |

Objective Functions | |

| |

| |

| |

Nonpolynomial-Based Approximations | |

| |

| |

| |

Mutual Information | |

| |

| |

| |

The FastICA Algorithm | |

| |

| |

| |

Example: Identifying Artifacts in MEG Recordings | |

| |

| |

| |

Maximum-Likelihood ICA | |

| |

| |

| |

Kernel ICA | |

| |

| |

| |

Exploratory Factor Analysis | |

| |

| |

| |

The Factor Analysis Model | |

| |

| |

| |

Principal Components FA | |

| |

| |

| |

Maximum-Likelihood FA | |

| |

| |

| |

Example: Twenty-four Psychological Tests | |

| |

| |

| |

Critiques of MLFA | |

| |

| |

| |

Confirmatory Factor Analysis | |

| |

| |

| |

Independent Factor Analysis | |

| |

| |

| |

Software Packages | |

| |

| |

Bibliographical Notes | |

| |

| |

Exercises | |

| |

| |

| |

Nonlinear Dimensionality Reduction and Manifold Learning | |

| |

| |

| |

Introduction | |

| |

| |

| |

Polynomial PCA | |

| |

| |

| |

Principal Curves and Surfaces | |

| |

| |

| |

Curves and Curvature | |

| |

| |

| |

Principal Curves | |

| |

| |

| |

Projection-Expectation Algorithm | |

| |

| |

| |

Bias Reduction | |

| |

| |

| |

Principal Surfaces | |

| |

| |

| |

Multilayer Autoassociative Neural Networks | |

| |

| |

| |

Main Features of the Network | |

| |

| |

| |

Relationship to Principal Curves | |

| |

| |

| |

Kernel PCA | |

| |

| |

| |

PCA in Feature Space | |

| |

| |

| |

Centering in Feature Space | |

| |

| |

| |

Example: Food Nutrition (Continued) | |

| |

| |

| |

Kernel PCA and Metric MDS | |

| |

| |

| |

Nonlinear Manifold Learning | |

| |

| |

| |

Manifolds | |

| |

| |

| |

Data on Manifolds | |

| |

| |

| |

Isomap | |

| |

| |

| |

Local Linear Embedding | |

| |

| |

| |

Laplacian Eigenmaps | |

| |

| |

| |

Hessian Eigenmaps | |

| |

| |

| |

Other Methods | |

| |

| |

| |

Relationships to Kernel PCA | |

| |

| |

| |

Software Packages | |

| |

| |

Bibliographical Notes | |

| |

| |

Exercises | |

| |

| |

| |

Correspondence Analysis | |

| |

| |

| |

Introduction | |

| |

| |

| |

Example: Shoplifting in The Netherlands | |

| |

| |

| |

Simple Correspondence Analysis | |

| |

| |

| |

Two-Way Contingency Tables | |

| |

| |

| |

Row and Column Dummy Variables | |

| |

| |

| |

Example: Hair Color and Eye Color | |

| |

| |

| |

Profiles, Masses, and Centroids | |

| |

| |

| |

Chi-squared Distances | |

| |

| |

| |

Total Inertia and Its Decomposition | |

| |

| |

| |

Principal Coordinates for Row and Column Profiles | |

| |

| |

| |

Graphical Displays | |

| |

| |

| |

Square Asymmetric Contingency Tables | |

| |

| |

| |

Example: Occupational Mobility in England | |

| |

| |

| |

Multiple Correspondence Analysis | |

| |

| |

| |

The Multivariate Indicator Matrix | |

| |

| |

| |

The Burt Matrix | |

| |

| |

| |

Equivalence and an Implication | |

| |

| |

| |

Example: Satisfaction with Housing Conditions | |

| |

| |

| |

A Weighted Least-Squares Approach | |

| |

| |

| |

Software Packages | |

| |

| |

Bibliographical Notes | |

| |

| |

Exercises | |

| |

| |

References | |

| |

| |

Index of Examples | |

| |

| |

Author Index | |

| |

| |

Subject Index | |