Preface

Variability, Information, and Prediction
The Curse of Dimensionality
The Two Extremes
Perspectives on the Curse
Sparsity
Exploding Numbers of Models
Multicollinearity and Concurvity
The Effect of Noise
Coping with the Curse
Selecting Design Points
Local Dimension
Parsimony
Two Techniques
The Bootstrap
Cross-Validation
Optimization and Search
Univariate Search
Multivariate Search
General Searches
Constraint Satisfaction and Combinatorial Search
Notes
Hammersley Points
Edgeworth Expansions for the Mean
Bootstrap Asymptotics for the Studentized Mean
Exercises

Local Smoothers
Early Smoothers
Transition to Classical Smoothers
Global Versus Local Approximations
LOESS
Kernel Smoothers
Statistical Function Approximation
The Concept of Kernel Methods and the Discrete Case
Kernels and Stochastic Designs: Density Estimation
Stochastic Designs: Asymptotics for Kernel Smoothers
Convergence Theorems and Rates for Kernel Smoothers
Kernel and Bandwidth Selection
Linear Smoothers
Nearest Neighbors
Applications of Kernel Regression
A Simulated Example
Ethanol Data
Exercises

Spline Smoothing
Interpolating Splines
Natural Cubic Splines
Smoothing Splines for Regression
Model Selection for Spline Smoothing
Spline Smoothing Meets Kernel Smoothing
Asymptotic Bias, Variance, and MISE for Spline Smoothers
Ethanol Data Example - Continued
Splines Redux: Hilbert Space Formulation
Reproducing Kernels
Constructing an RKHS
Direct Sum Construction for Splines
Explicit Forms
Nonparametrics in Data Mining and Machine Learning
Simulated Comparisons
What Happens with Dependent Noise Models?
Higher Dimensions and the Curse of Dimensionality
Notes
Sobolev Spaces: Definition
Exercises

New Wave Nonparametrics
Additive Models
The Backfitting Algorithm
Concurvity and Inference
Nonparametric Optimality
Generalized Additive Models
Projection Pursuit Regression
Neural Networks
Backpropagation and Inference
Barron's Result and the Curse
Approximation Properties
Barron's Theorem: Formal Statement
Recursive Partitioning Regression
Growing Trees
Pruning and Selection
Regression
Bayesian Additive Regression Trees: BART
MARS
Sliced Inverse Regression
ACE and AVAS
Notes
Proof of Barron's Theorem
Exercises

Supervised Learning: Partition Methods
Multiclass Learning
Discriminant Analysis
Distance-Based Discriminant Analysis
Bayes Rules
Probability-Based Discriminant Analysis
Tree-Based Classifiers
Splitting Rules
Logic Trees
Random Forests
Support Vector Machines
Margins and Distances
Binary Classification and Risk
Prediction Bounds for Function Classes
Constructing SVM Classifiers
SVM Classification for Nonlinearly Separable Populations
SVMs in the General Nonlinear Case
Some Kernels Used in SVM Classification
Kernel Choice, SVMs and Model Selection
Support Vector Regression
Multiclass Support Vector Machines
Neural Networks
Notes
Hoeffding's Inequality
VC Dimension
Exercises

Alternative Nonparametrics
Ensemble Methods
Bayes Model Averaging
Bagging
Stacking
Boosting
Other Averaging Methods
Oracle Inequalities
Bayes Nonparametrics
Dirichlet Process Priors
Polya Tree Priors
Gaussian Process Priors
The Relevance Vector Machine
RVM Regression: Formal Description
RVM Classification
Hidden Markov Models - Sequential Classification
Notes
Proof of Yang's Oracle Inequality
Proof of Lecué's Oracle Inequality
Exercises

Computational Comparisons
Computational Results: Classification
Comparison on Fisher's Iris Data
Comparison on Ripley's Data
Computational Results: Regression
Vapnik's sinc Function
Friedman's Function
Conclusions
Systematic Simulation Study
No Free Lunch
Exercises

Unsupervised Learning: Clustering
Centroid-Based Clustering
K-Means Clustering
Variants
Hierarchical Clustering
Agglomerative Hierarchical Clustering
Divisive Hierarchical Clustering
Theory for Hierarchical Clustering
Partitional Clustering
Model-Based Clustering
Graph-Theoretic Clustering
Spectral Clustering
Bayesian Clustering
Probabilistic Clustering
Hypothesis Testing
Computed Examples
Ripley's Data
Iris Data
Cluster Validation
Notes
Derivatives of Functions of a Matrix
Kruskal's Algorithm: Proof
Prim's Algorithm: Proof
Exercises

Learning in High Dimensions
Principal Components
Main Theorem
Key Properties
Extensions
Factor Analysis
Finding Λ and Ψ
Finding K
Estimating Factor Scores
Projection Pursuit
Independent Components Analysis
Main Definitions
Key Results
Computational Approach
Nonlinear PCs and ICA
Nonlinear PCs
Nonlinear ICA
Geometric Summarization
Measuring Distances to an Algebraic Shape
Principal Curves and Surfaces
Supervised Dimension Reduction: Partial Least Squares
Simple PLS
PLS Procedures
Properties of PLS
Supervised Dimension Reduction: Sufficient Dimensions in Regression
Visualization I: Basic Plots
Elementary Visualization
Projections
Time Dependence
Visualization II: Transformations
Chernoff Faces
Multidimensional Scaling
Self-Organizing Maps
Exercises

Variable Selection
Concepts from Linear Regression
Subset Selection
Variable Ranking
Overview
Traditional Criteria
Akaike Information Criterion (AIC)
Bayesian Information Criterion (BIC)
Choices of Information Criteria
Cross-Validation
Shrinkage Methods
Shrinkage Methods for Linear Models
Grouping in Variable Selection
Least Angle Regression
Shrinkage Methods for Model Classes
Cautionary Notes
Bayes Variable Selection
Prior Specification
Posterior Calculation and Exploration
Evaluating Evidence
Connections Between Bayesian and Frequentist Methods
Computational Comparisons
The n>p Case
When p>n
Notes
Code for Generating Data in Section 10.5
Exercises

Multiple Testing
Analyzing the Hypothesis Testing Problem
A Paradigmatic Setting
Counts for Multiple Tests
Measures of Error in Multiple Testing
Aspects of Error Control
Controlling the Familywise Error Rate
One-Step Adjustments
Stepwise p-Value Adjustments
PCER and PFER
Null Domination
Two Procedures
Controlling the Type I Error Rate
Adjusted p-Values for PFER/PCER
Controlling the False Discovery Rate
FDR and Other Measures of Error
The Benjamini-Hochberg Procedure
A BH Theorem for a Dependent Setting
Variations on BH
Controlling the Positive False Discovery Rate
Bayesian Interpretations
Aspects of Implementation
Bayesian Multiple Testing
Fully Bayes: Hierarchical
Fully Bayes: Decision Theory
Notes
Proof of the Benjamini-Hochberg Theorem
Proof of the Benjamini-Yekutieli Theorem

References | |
| |
| |
Index | |