| |
| |
Preface | |
| |
| |
| |
Introduction | |
| |
| |
| |
Applications and problems | |
| |
| |
| |
Definitions and terminology | |
| |
| |
| |
Cross-validation | |
| |
| |
| |
Learning scenarios | |
| |
| |
| |
Outline | |
| |
| |
| |
The PAC Learning Framework | |
| |
| |
| |
The PAC learning model | |
| |
| |
| |
Guarantees for finite hypothesis sets - consistent case | |
| |
| |
| |
Guarantees for finite hypothesis sets - inconsistent case | |
| |
| |
| |
Generalities | |
| |
| |
| |
Deterministic versus stochastic scenarios | |
| |
| |
| |
Bayes error and noise | |
| |
| |
| |
Estimation and approximation errors | |
| |
| |
| |
Model selection | |
| |
| |
| |
Chapter notes | |
| |
| |
| |
Exercises | |
| |
| |
| |
Rademacher Complexity and VC-Dimension | |
| |
| |
| |
Rademacher complexity | |
| |
| |
| |
Growth function | |
| |
| |
| |
VC-dimension | |
| |
| |
| |
Lower bounds | |
| |
| |
| |
Chapter notes | |
| |
| |
| |
Exercises | |
| |
| |
| |
Support Vector Machines | |
| |
| |
| |
Linear classification | |
| |
| |
| |
SVMs - separable case | |
| |
| |
| |
Primal optimization problem | |
| |
| |
| |
Support vectors | |
| |
| |
| |
Dual optimization problem | |
| |
| |
| |
Leave-one-out analysis | |
| |
| |
| |
SVMs - non-separable case | |
| |
| |
| |
Primal optimization problem | |
| |
| |
| |
Support vectors | |
| |
| |
| |
Dual optimization problem | |
| |
| |
| |
Margin theory | |
| |
| |
| |
Chapter notes | |
| |
| |
| |
Exercises | |
| |
| |
| |
Kernel Methods | |
| |
| |
| |
Introduction | |
| |
| |
| |
Positive definite symmetric kernels | |
| |
| |
| |
Definitions | |
| |
| |
| |
Reproducing kernel Hilbert space | |
| |
| |
| |
Properties | |
| |
| |
| |
Kernel-based algorithms | |
| |
| |
| |
SVMs with PDS kernels | |
| |
| |
| |
Representer theorem | |
| |
| |
| |
Learning guarantees | |
| |
| |
| |
Negative definite symmetric kernels | |
| |
| |
| |
Sequence kernels | |
| |
| |
| |
Weighted transducers | |
| |
| |
| |
Rational kernels | |
| |
| |
| |
Chapter notes | |
| |
| |
| |
Exercises | |
| |
| |
| |
Boosting | |
| |
| |
| |
Introduction | |
| |
| |
| |
AdaBoost | |
| |
| |
| |
Bound on the empirical error | |
| |
| |
| |
Relationship with coordinate descent | |
| |
| |
| |
Relationship with logistic regression | |
| |
| |
| |
Standard use in practice | |
| |
| |
| |
Theoretical results | |
| |
| |
| |
VC-dimension-based analysis | |
| |
| |
| |
Margin-based analysis | |
| |
| |
| |
Margin maximization | |
| |
| |
| |
Game-theoretic interpretation | |
| |
| |
| |
Discussion | |
| |
| |
| |
Chapter notes | |
| |
| |
| |
Exercises | |
| |
| |
| |
On-Line Learning | |
| |
| |
| |
Introduction | |
| |
| |
| |
Prediction with expert advice | |
| |
| |
| |
Mistake bounds and Halving algorithm | |
| |
| |
| |
Weighted majority algorithm | |
| |
| |
| |
Randomized weighted majority algorithm | |
| |
| |
| |
Exponential weighted average algorithm | |
| |
| |
| |
Linear classification | |
| |
| |
| |
Perceptron algorithm | |
| |
| |
| |
Winnow algorithm | |
| |
| |
| |
On-line to batch conversion | |
| |
| |
| |
Game-theoretic connection | |
| |
| |
| |
Chapter notes | |
| |
| |
| |
Exercises | |
| |
| |
| |
Multi-Class Classification | |
| |
| |
| |
Multi-class classification problem | |
| |
| |
| |
Generalization bounds | |
| |
| |
| |
Uncombined multi-class algorithms | |
| |
| |
| |
Multi-class SVMs | |
| |
| |
| |
Multi-class boosting algorithms | |
| |
| |
| |
Decision trees | |
| |
| |
| |
Aggregated multi-class algorithms | |
| |
| |
| |
One-versus-all | |
| |
| |
| |
One-versus-one | |
| |
| |
| |
Error-correction codes | |
| |
| |
| |
Structured prediction algorithms | |
| |
| |
| |
Chapter notes | |
| |
| |
| |
Exercises | |
| |
| |
| |
Ranking | |
| |
| |
| |
The problem of ranking | |
| |
| |
| |
Generalization bound | |
| |
| |
| |
Ranking with SVMs | |
| |
| |
| |
RankBoost | |
| |
| |
| |
Bound on the empirical error | |
| |
| |
| |
Relationship with coordinate descent | |
| |
| |
| |
Margin bound for ensemble methods in ranking | |
| |
| |
| |
Bipartite ranking | |
| |
| |
| |
Boosting in bipartite ranking | |
| |
| |
| |
Area under the ROC curve | |
| |
| |
| |
Preference-based setting | |
| |
| |
| |
Second-stage ranking problem | |
| |
| |
| |
Deterministic algorithm | |
| |
| |
| |
Randomized algorithm | |
| |
| |
| |
Extension to other loss functions | |
| |
| |
| |
Discussion | |
| |
| |
| |
Chapter notes | |
| |
| |
| |
Exercises | |
| |
| |
| |
Regression | |
| |
| |
| |
The problem of regression | |
| |
| |
| |
Generalization bounds | |
| |
| |
| |
Finite hypothesis sets | |
| |
| |
| |
Rademacher complexity bounds | |
| |
| |
| |
Pseudo-dimension bounds | |
| |
| |
| |
Regression algorithms | |
| |
| |
| |
Linear regression | |
| |
| |
| |
Kernel ridge regression | |
| |
| |
| |
Support vector regression | |
| |
| |
| |
Lasso | |
| |
| |
| |
Group norm regression algorithms | |
| |
| |
| |
On-line regression algorithms | |
| |
| |
| |
Chapter notes | |
| |
| |
| |
Exercises | |
| |
| |
| |
Algorithmic Stability | |
| |
| |
| |
Definitions | |
| |
| |
| |
Stability-based generalization guarantee | |
| |
| |
| |
Stability of kernel-based regularization algorithms | |
| |
| |
| |
Application to regression algorithms: SVR and KRR | |
| |
| |
| |
Application to classification algorithms: SVMs | |
| |
| |
| |
Discussion | |
| |
| |
| |
Chapter notes | |
| |
| |
| |
Exercises | |
| |
| |
| |
Dimensionality Reduction | |
| |
| |
| |
Principal Component Analysis | |
| |
| |
| |
Kernel Principal Component Analysis (KPCA) | |
| |
| |
| |
KPCA and manifold learning | |
| |
| |
| |
Isomap | |
| |
| |
| |
Laplacian eigenmaps | |
| |
| |
| |
Locally linear embedding (LLE) | |
| |
| |
| |
Johnson-Lindenstrauss lemma | |
| |
| |
| |
Chapter notes | |
| |
| |
| |
Exercises | |
| |
| |
| |
Learning Automata and Languages | |
| |
| |
| |
Introduction | |
| |
| |
| |
Finite automata | |
| |
| |
| |
Efficient exact learning | |
| |
| |
| |
Passive learning | |
| |
| |
| |
Learning with queries | |
| |
| |
| |
Learning automata with queries | |
| |
| |
| |
Identification in the limit | |
| |
| |
| |
Learning reversible automata | |
| |
| |
| |
Chapter notes | |
| |
| |
| |
Exercises | |
| |
| |
| |
Reinforcement Learning | |
| |
| |
| |
Learning scenario | |
| |
| |
| |
Markov decision process model | |
| |
| |
| |
Policy | |
| |
| |
| |
Definition | |
| |
| |
| |
Policy value | |
| |
| |
| |
Policy evaluation | |
| |
| |
| |
Optimal policy | |
| |
| |
| |
Planning algorithms | |
| |
| |
| |
Value iteration | |
| |
| |
| |
Policy iteration | |
| |
| |
| |
Linear programming | |
| |
| |
| |
Learning algorithms | |
| |
| |
| |
Stochastic approximation | |
| |
| |
| |
TD(0) algorithm | |
| |
| |
| |
Q-learning algorithm | |
| |
| |
| |
SARSA | |
| |
| |
| |
TD(�) algorithm | |
| |
| |
| |
Large state space | |
| |
| |
| |
Chapter notes | |
| |
| |
Conclusion | |
| |
| |
| |
Linear Algebra Review | |
| |
| |
| |
Vectors and norms | |
| |
| |
| |
Norms | |
| |
| |
| |
Dual norms | |
| |
| |
| |
Matrices | |
| |
| |
| |
Matrix norms | |
| |
| |
| |
Singular value decomposition | |
| |
| |
| |
Symmetric positive semidefinite (SPSD) matrices | |
| |
| |
| |
Convex Optimization | |
| |
| |
| |
Differentiation and unconstrained optimization | |
| |
| |
| |
Convexity | |
| |
| |
| |
Constrained optimization | |
| |
| |
| |
Chapter notes | |
| |
| |
| |
Probability Review | |
| |
| |
| |
Probability | |
| |
| |
| |
Random variables | |
| |
| |
| |
Conditional probability and independence | |
| |
| |
| |
Expectation, Markov's inequality, and moment-generating function | |
| |
| |
| |
Variance and Chebyshev's inequality | |
| |
| |
| |
Concentration inequalities | |
| |
| |
| |
Hoeffding's inequality | |
| |
| |
| |
McDiarmid's inequality | |
| |
| |
| |
Other inequalities | |
| |
| |
| |
Binomial distribution: Slud's inequality | |
| |
| |
| |
Normal distribution: tail bound | |
| |
| |
| |
Khintchine-Kahane inequality | |
| |
| |
| |
Chapter notes | |
| |
| |
| |
Exercises | |
| |
| |
| |
Notation | |
| |
| |
References | |
| |
| |
Index | |