Principles and Theory for Data Mining and Machine Learning

Price: 85.28 USD
Availability: InStock

ISBN-10: 0387981349

ISBN-13: 9780387981345

Edition: 2009

Authors: Bertrand Clarke, Ernest Fokoue, Hao Helen Zhang

List price: $249.99

Description:

It's time for a mathematically rigorous treatment of the basic techniques at the interface of statistics and computer science, specifically data mining and machine learning. Many people use these techniques with little idea of why they work, how they relate to other techniques, or what their general properties are. This book is a more theoretical treatment of the same subject as the statistical learning book by Hastie, Tibshirani, and Friedman.

Book details

Copyright year: 2009
Publisher: Springer New York
Publication date: 7/30/2009
Binding: Hardcover
Pages: 786
Size: 6.10" wide x 9.25" long x 1.50" tall
Weight: 2.750
Language: English



Preface



Variability, Information, and Prediction



The Curse of Dimensionality



The Two Extremes



Perspectives on the Curse



Sparsity



Exploding Numbers of Models



Multicollinearity and Concurvity



The Effect of Noise



Coping with the Curse



Selecting Design Points



Local Dimension



Parsimony



Two Techniques



The Bootstrap



Cross-Validation



Optimization and Search



Univariate Search



Multivariate Search



General Searches



Constraint Satisfaction and Combinatorial Search



Notes



Hammersley Points



Edgeworth Expansions for the Mean



Bootstrap Asymptotics for the Studentized Mean



Exercises



Local Smoothers



Early Smoothers



Transition to Classical Smoothers



Global Versus Local Approximations



LOESS



Kernel Smoothers



Statistical Function Approximation



The Concept of Kernel Methods and the Discrete Case



Kernels and Stochastic Designs: Density Estimation



Stochastic Designs: Asymptotics for Kernel Smoothers



Convergence Theorems and Rates for Kernel Smoothers



Kernel and Bandwidth Selection



Linear Smoothers



Nearest Neighbors



Applications of Kernel Regression



A Simulated Example



Ethanol Data



Exercises



Spline Smoothing



Interpolating Splines



Natural Cubic Splines



Smoothing Splines for Regression



Model Selection for Spline Smoothing



Spline Smoothing Meets Kernel Smoothing



Asymptotic Bias, Variance, and MISE for Spline Smoothers



Ethanol Data Example - Continued



Splines Redux: Hilbert Space Formulation



Reproducing Kernels



Constructing an RKHS



Direct Sum Construction for Splines



Explicit Forms



Nonparametrics in Data Mining and Machine Learning



Simulated Comparisons



What Happens with Dependent Noise Models?



Higher Dimensions and the Curse of Dimensionality



Notes



Sobolev Spaces: Definition



Exercises



New Wave Nonparametrics



Additive Models



The Backfitting Algorithm



Concurvity and Inference



Nonparametric Optimality



Generalized Additive Models



Projection Pursuit Regression



Neural Networks



Backpropagation and Inference



Barron's Result and the Curse



Approximation Properties



Barron's Theorem: Formal Statement



Recursive Partitioning Regression



Growing Trees



Pruning and Selection



Regression



Bayesian Additive Regression Trees: BART



MARS



Sliced Inverse Regression



ACE and AVAS



Notes



Proof of Barron's Theorem



Exercises



Supervised Learning: Partition Methods



Multiclass Learning



Discriminant Analysis



Distance-Based Discriminant Analysis



Bayes Rules



Probability-Based Discriminant Analysis



Tree-Based Classifiers



Splitting Rules



Logic Trees



Random Forests



Support Vector Machines



Margins and Distances



Binary Classification and Risk



Prediction Bounds for Function Classes



Constructing SVM Classifiers



SVM Classification for Nonlinearly Separable Populations



SVMs in the General Nonlinear Case



Some Kernels Used in SVM Classification



Kernel Choice, SVMs and Model Selection



Support Vector Regression



Multiclass Support Vector Machines



Neural Networks



Notes



Hoeffding's Inequality



VC Dimension



Exercises



Alternative Nonparametrics



Ensemble Methods



Bayes Model Averaging



Bagging



Stacking



Boosting



Other Averaging Methods



Oracle Inequalities



Bayes Nonparametrics



Dirichlet Process Priors



Polya Tree Priors



Gaussian Process Priors



The Relevance Vector Machine



RVM Regression: Formal Description



RVM Classification



Hidden Markov Models - Sequential Classification



Notes



Proof of Yang's Oracle Inequality



Proof of Lecue's Oracle Inequality



Exercises



Computational Comparisons



Computational Results: Classification



Comparison on Fisher's Iris Data



Comparison on Ripley's Data



Computational Results: Regression



Vapnik's sinc Function



Friedman's Function



Conclusions



Systematic Simulation Study



No Free Lunch



Exercises



Unsupervised Learning: Clustering



Centroid-Based Clustering



K-Means Clustering



Variants



Hierarchical Clustering



Agglomerative Hierarchical Clustering



Divisive Hierarchical Clustering



Theory for Hierarchical Clustering



Partitional Clustering



Model-Based Clustering



Graph-Theoretic Clustering



Spectral Clustering



Bayesian Clustering



Probabilistic Clustering



Hypothesis Testing



Computed Examples



Ripley's Data



Iris Data



Cluster Validation



Notes



Derivatives of Functions of a Matrix



Kruskal's Algorithm: Proof



Prim's Algorithm: Proof



Exercises



Learning in High Dimensions



Principal Components



Main Theorem



Key Properties



Extensions



Factor Analysis



Finding Λ and ψ



Finding K



Estimating Factor Scores



Projection Pursuit



Independent Components Analysis



Main Definitions



Key Results



Computational Approach



Nonlinear PCs and ICA



Nonlinear PCs



Nonlinear ICA



Geometric Summarization



Measuring Distances to an Algebraic Shape



Principal Curves and Surfaces



Supervised Dimension Reduction: Partial Least Squares



Simple PLS



PLS Procedures



Properties of PLS



Supervised Dimension Reduction: Sufficient Dimensions in Regression



Visualization I: Basic Plots



Elementary Visualization



Projections



Time Dependence



Visualization II: Transformations



Chernoff Faces



Multidimensional Scaling



Self-Organizing Maps



Exercises



Variable Selection



Concepts from Linear Regression



Subset Selection



Variable Ranking



Overview



Traditional Criteria



Akaike Information Criterion (AIC)



Bayesian Information Criterion (BIC)



Choices of Information Criteria



Cross Validation



Shrinkage Methods



Shrinkage Methods for Linear Models



Grouping in Variable Selection



Least Angle Regression



Shrinkage Methods for Model Classes



Cautionary Notes



Bayes Variable Selection



Prior Specification



Posterior Calculation and Exploration



Evaluating Evidence



Connections Between Bayesian and Frequentist Methods



Computational Comparisons



The n>p Case



When p>n



Notes



Code for Generating Data in Section 10.5



Exercises



Multiple Testing



Analyzing the Hypothesis Testing Problem



A Paradigmatic Setting



Counts for Multiple Tests



Measures of Error in Multiple Testing



Aspects of Error Control



Controlling the Familywise Error Rate



One-Step Adjustments



Stepwise p-Value Adjustments



PCER and PFER



Null Domination



Two Procedures



Controlling the Type I Error Rate



Adjusted p-Values for PFER/PCER



Controlling the False Discovery Rate



FDR and other Measures of Error



The Benjamini-Hochberg Procedure



A BH Theorem for a Dependent Setting



Variations on BH



Controlling the Positive False Discovery Rate



Bayesian Interpretations



Aspects of Implementation



Bayesian Multiple Testing



Fully Bayes: Hierarchical



Fully Bayes: Decision theory



Notes



Proof of the Benjamini-Hochberg Theorem



Proof of the Benjamini-Yekutieli Theorem


References


Index