Preface

Notation

Introduction
    Learning and Statistical Estimation
    Statistical Dependency and Causality
    Characterization of Variables
    Characterization of Uncertainty
    Predictive Learning versus Other Data Analytical Methodologies

Problem Statement, Classical Approaches, and Adaptive Learning
    Formulation of the Learning Problem
        Objective of Learning
        Common Learning Tasks
        Scope of the Learning Problem Formulation
    Classical Approaches
        Density Estimation
        Classification
        Regression
        Solving Problems with Finite Data
        Nonparametric Methods
        Stochastic Approximation
    Adaptive Learning: Concepts and Inductive Principles
        Philosophy, Major Concepts, and Issues
        A Priori Knowledge and Model Complexity
        Inductive Principles
        Alternative Learning Formulations
    Summary

Regularization Framework
    Curse and Complexity of Dimensionality
    Function Approximation and Characterization of Complexity
    Penalization
        Parametric Penalties
        Nonparametric Penalties
    Model Selection (Complexity Control)
        Analytical Model Selection Criteria
        Model Selection via Resampling
        Bias-Variance Tradeoff
        Example of Model Selection
        Function Approximation versus Predictive Learning
    Summary

Statistical Learning Theory
    Conditions for Consistency and Convergence of ERM
    Growth Function and VC Dimension
        VC Dimension for Classification and Regression Problems
        Examples of Calculating VC Dimension
    Bounds on the Generalization
        Classification
        Regression
        Generalization Bounds and Sampling Theorem
    Structural Risk Minimization
        Dictionary Representation
        Feature Selection
        Penalization Formulation
        Input Preprocessing
        Initial Conditions for Training Algorithm
    Comparisons of Model Selection for Regression
        Model Selection for Linear Estimators
        Model Selection for k-Nearest-Neighbor Regression
        Model Selection for Linear Subset Regression
        Discussion
    Measuring the VC Dimension
    VC Dimension, Occam's Razor, and Popper's Falsifiability
    Summary and Discussion

Nonlinear Optimization Strategies
    Stochastic Approximation Methods
        Linear Parameter Estimation
        Backpropagation Training of MLP Networks
    Iterative Methods
        EM Methods for Density Estimation
        Generalized Inverse Training of MLP Networks
    Greedy Optimization
        Neural Network Construction Algorithms
        Classification and Regression Trees
    Feature Selection, Optimization, and Statistical Learning Theory
    Summary

Methods for Data Reduction and Dimensionality Reduction
    Vector Quantization and Clustering
        Optimal Source Coding in Vector Quantization
        Generalized Lloyd Algorithm
        Clustering
        EM Algorithm for VQ and Clustering
        Fuzzy Clustering
    Dimensionality Reduction: Statistical Methods
        Linear Principal Components
        Principal Curves and Surfaces
        Multidimensional Scaling
    Dimensionality Reduction: Neural Network Methods
        Discrete Principal Curves and Self-Organizing Map Algorithm
        Statistical Interpretation of the SOM Method
        Flow-Through Version of the SOM and Learning Rate Schedules