Principles and Theory for Data Mining and Machine Learning

ISBN-10: 0387981349

ISBN-13: 9780387981345

Edition: 2009

Authors: Bertrand Clarke, Ernest Fokoue, Hao Helen Zhang

List price: $219.99

This book gives a mathematically rigorous treatment of the basic techniques at the interface of statistics and computer science, specifically data mining and machine learning. Many practitioners use these techniques with little idea of why they work, how they relate to other techniques, or what their general properties are. It is a more theoretical book on the same subject as the statistical learning text by Hastie, Tibshirani, and Friedman.

Book details

Copyright year: 2009
Publisher: Springer New York
Publication date: 7/30/2009
Binding: Hardcover
Pages: 786
Size: 6.10" wide x 9.25" long x 1.50" tall
Weight: 2.750
Language: English

Variability, Information, and Prediction
The Curse of Dimensionality
The Two Extremes
Perspectives on the Curse
Exploding Numbers of Models
Multicollinearity and Concurvity
The Effect of Noise
Coping with the Curse
Selecting Design Points
Local Dimension
Two Techniques
The Bootstrap
Optimization and Search
Univariate Search
Multivariate Search
General Searches
Constraint Satisfaction and Combinatorial Search
Hammersley Points
Edgeworth Expansions for the Mean
Bootstrap Asymptotics for the Studentized Mean
Local Smoothers
Early Smoothers
Transition to Classical Smoothers
Global Versus Local Approximations
Kernel Smoothers
Statistical Function Approximation
The Concept of Kernel Methods and the Discrete Case
Kernels and Stochastic Designs: Density Estimation
Stochastic Designs: Asymptotics for Kernel Smoothers
Convergence Theorems and Rates for Kernel Smoothers
Kernel and Bandwidth Selection
Linear Smoothers
Nearest Neighbors
Applications of Kernel Regression
A Simulated Example
Ethanol Data
Spline Smoothing
Interpolating Splines
Natural Cubic Splines
Smoothing Splines for Regression
Model Selection for Spline Smoothing
Spline Smoothing Meets Kernel Smoothing
Asymptotic Bias, Variance, and MISE for Spline Smoothers
Ethanol Data Example - Continued
Splines Redux: Hilbert Space Formulation
Reproducing Kernels
Constructing an RKHS
Direct Sum Construction for Splines
Explicit Forms
Nonparametrics in Data Mining and Machine Learning
Simulated Comparisons
What Happens with Dependent Noise Models?
Higher Dimensions and the Curse of Dimensionality
Sobolev Spaces: Definition
New Wave Nonparametrics
Additive Models
The Backfitting Algorithm
Concurvity and Inference
Nonparametric Optimality
Generalized Additive Models
Projection Pursuit Regression
Neural Networks
Backpropagation and Inference
Barron's Result and the Curse
Approximation Properties
Barron's Theorem: Formal Statement
Recursive Partitioning Regression
Growing Trees
Pruning and Selection
Bayesian Additive Regression Trees: BART
Sliced Inverse Regression
Proof of Barron's Theorem
Supervised Learning: Partition Methods
Multiclass Learning
Discriminant Analysis
Distance-Based Discriminant Analysis
Bayes Rules
Probability-Based Discriminant Analysis
Tree-Based Classifiers
Splitting Rules
Logic Trees
Random Forests
Support Vector Machines
Margins and Distances
Binary Classification and Risk
Prediction Bounds for Function Classes
Constructing SVM Classifiers
SVM Classification for Nonlinearly Separable Populations
SVMs in the General Nonlinear Case
Some Kernels Used in SVM Classification
Kernel Choice, SVMs and Model Selection
Support Vector Regression
Multiclass Support Vector Machines
Neural Networks
Hoeffding's Inequality
VC Dimension
Alternative Nonparametrics
Ensemble Methods
Bayes Model Averaging
Other Averaging Methods
Oracle Inequalities
Bayes Nonparametrics
Dirichlet Process Priors
Polya Tree Priors
Gaussian Process Priors
The Relevance Vector Machine
RVM Regression: Formal Description
RVM Classification
Hidden Markov Models - Sequential Classification
Proof of Yang's Oracle Inequality
Proof of Lecue's Oracle Inequality
Computational Comparisons
Computational Results: Classification
Comparison on Fisher's Iris Data
Comparison on Ripley's Data
Computational Results: Regression
Vapnik's sinc Function
Friedman's Function
Systematic Simulation Study
No Free Lunch
Unsupervised Learning: Clustering
Centroid-Based Clustering
K-Means Clustering
Hierarchical Clustering
Agglomerative Hierarchical Clustering
Divisive Hierarchical Clustering
Theory for Hierarchical Clustering
Partitional Clustering
Model-Based Clustering
Graph-Theoretic Clustering
Spectral Clustering
Bayesian Clustering
Probabilistic Clustering
Hypothesis Testing
Computed Examples
Ripley's Data
Iris Data
Cluster Validation
Derivatives of Functions of a Matrix
Kruskal's Algorithm: Proof
Prim's Algorithm: Proof
Learning in High Dimensions
Principal Components
Main Theorem
Key Properties
Factor Analysis
Finding Λ and ψ
Finding K
Estimating Factor Scores
Projection Pursuit
Independent Components Analysis
Main Definitions
Key Results
Computational Approach
Nonlinear PCs and ICA
Nonlinear PCs
Nonlinear ICA
Geometric Summarization
Measuring Distances to an Algebraic Shape
Principal Curves and Surfaces
Supervised Dimension Reduction: Partial Least Squares
Simple PLS
PLS Procedures
Properties of PLS
Supervised Dimension Reduction: Sufficient Dimensions in Regression
Visualization I: Basic Plots
Elementary Visualization
Time Dependence
Visualization II: Transformations
Chernoff Faces
Multidimensional Scaling
Self-Organizing Maps
Variable Selection
Concepts from Linear Regression
Subset Selection
Variable Ranking
Traditional Criteria
Akaike Information Criterion (AIC)
Bayesian Information Criterion (BIC)
Choices of Information Criteria
Cross Validation
Shrinkage Methods
Shrinkage Methods for Linear Models
Grouping in Variable Selection
Least Angle Regression
Shrinkage Methods for Model Classes
Cautionary Notes
Bayes Variable Selection
Prior Specification
Posterior Calculation and Exploration
Evaluating Evidence
Connections Between Bayesian and Frequentist Methods
Computational Comparisons
The n>p Case
When p>n
Code for Generating Data in Section 10.5
Multiple Testing
Analyzing the Hypothesis Testing Problem
A Paradigmatic Setting
Counts for Multiple Tests
Measures of Error in Multiple Testing
Aspects of Error Control
Controlling the Familywise Error Rate
One-Step Adjustments
Stepwise p-Value Adjustments
Null Domination
Two Procedures
Controlling the Type I Error Rate
Adjusted p-Values for PFER/PCER
Controlling the False Discovery Rate
FDR and other Measures of Error
The Benjamini-Hochberg Procedure
A BH Theorem for a Dependent Setting
Variations on BH
Controlling the Positive False Discovery Rate
Bayesian Interpretations
Aspects of Implementation
Bayesian Multiple Testing
Fully Bayes: Hierarchical
Fully Bayes: Decision Theory
Proof of the Benjamini-Hochberg Theorem
Proof of the Benjamini-Yekutieli Theorem