Preface

Introduction
    What is a Neural Network?
    The Human Brain
    Models of a Neuron
    Neural Networks Viewed As Directed Graphs
    Feedback
    Network Architectures
    Knowledge Representation
    Learning Processes
    Learning Tasks
    Concluding Remarks
    Notes and References

Rosenblatt's Perceptron
    Introduction
    Perceptron
    The Perceptron Convergence Theorem
    Relation Between the Perceptron and Bayes Classifier for a Gaussian Environment
    Computer Experiment: Pattern Classification
    The Batch Perceptron Algorithm
    Summary and Discussion
    Notes and References
    Problems

Model Building through Regression
    Introduction
    Linear Regression Model: Preliminary Considerations
    Maximum a Posteriori Estimation of the Parameter Vector
    Relationship Between Regularized Least-Squares Estimation and MAP Estimation
    Computer Experiment: Pattern Classification
    The Minimum-Description-Length Principle
    Finite Sample-Size Considerations
    The Instrumental-Variables Method
    Summary and Discussion
    Notes and References
    Problems

The Least-Mean-Square Algorithm
    Introduction
    Filtering Structure of the LMS Algorithm
    Unconstrained Optimization: A Review
    The Wiener Filter
    The Least-Mean-Square Algorithm
    Markov Model Portraying the Deviation of the LMS Algorithm from the Wiener Filter
    The Langevin Equation: Characterization of Brownian Motion
    Kushner's Direct-Averaging Method
    Statistical LMS Learning Theory for Small Learning-Rate Parameter
    Computer Experiment I: Linear Prediction
    Computer Experiment II: Pattern Classification
    Virtues and Limitations of the LMS Algorithm
    Learning-Rate Annealing Schedules
    Summary and Discussion
    Notes and References
    Problems

Multilayer Perceptrons
    Introduction
    Some Preliminaries
    Batch Learning and On-Line Learning
    The Back-Propagation Algorithm
    XOR Problem
    Heuristics for Making the Back-Propagation Algorithm Perform Better
    Computer Experiment: Pattern Classification
    Back Propagation and Differentiation
    The Hessian and Its Role in On-Line Learning
    Optimal Annealing and Adaptive Control of the Learning Rate
    Generalization
    Approximations of Functions
    Cross-Validation
    Complexity Regularization and Network Pruning
    Virtues and Limitations of Back-Propagation Learning
    Supervised Learning Viewed as an Optimization Problem
    Convolutional Networks
    Nonlinear Filtering
    Small-Scale Versus Large-Scale Learning Problems
    Summary and Discussion
    Notes and References
    Problems

Kernel Methods and Radial-Basis Function Networks
    Introduction
    Cover's Theorem on the Separability of Patterns
    The Interpolation Problem
    Radial-Basis-Function Networks
    K-Means Clustering
    Recursive Least-Squares Estimation of the Weight Vector
    Hybrid Learning Procedure for RBF Networks
    Computer Experiment: Pattern Classification
    Interpretations of the Gaussian Hidden Units
    Kernel Regression and Its Relation to RBF Networks
    Summary and Discussion
    Notes and References
    Problems

Support Vector Machines
    Introduction
    Optimal Hyperplane for Linearly Separable Patterns
    Optimal Hyperplane for Nonseparable Patterns
    The Support Vector Machine Viewed as a Kernel Machine
    Design of Support Vector Machines
    XOR Problem
    Computer Experiment: Pattern Classification
    Regression: Robustness Considerations
    Optimal Solution of the Linear Regression Problem
    The Representer Theorem and Related Issues
    Summary and Discussion
    Notes and References
    Problems

Regularization Theory
    Introduction
    Hadamard's Conditions for Well-Posedness
    Tikhonov's Regularization Theory
    Regularization Networks
    Generalized Radial-Basis-Function Networks
    The Regularized Least-Squares Estimator: Revisited
    Additional Notes of Interest on Regularization
    Estimation of the Regularization Parameter
    Semisupervised Learning
    Manifold Regularization: Preliminary Considerations
    Differentiable Manifolds
    Generalized Regularization Theory
    Spectral Graph Theory
    Generalized Representer Theorem
    Laplacian Regularized Least-Squares Algorithm
    Experiments on Pattern Classification Using Semisupervised Learning
    Summary and Discussion
    Notes and References
    Problems

Principal-Components Analysis
    Introduction
    Principles of Self-Organization
    Self-Organized Feature Analysis
    Principal-Components Analysis: Perturbation Theory
    Hebbian-Based Maximum Eigenfilter
    Hebbian-Based Principal-Components Analysis
    Case Study: Image Coding
    Kernel Principal-Components Analysis
    Basic Issues Involved in the Coding of Natural Images
    Kernel Hebbian Algorithm
    Summary and Discussion
    Notes and References
    Problems

Self-Organizing Maps
    Introduction
    Two Basic Feature-Mapping Models
    Self-Organizing Map
    Properties of the Feature Map
    Computer Experiment I: Disentangling Lattice Dynamics Using SOM
    Contextual Maps
    Hierarchical Vector Quantization
    Kernel Self-Organizing Map
    Computer Experiment II: Disentangling Lattice Dynamics Using Kernel SOM
    Relationship Between Kernel SOM and Kullback-Leibler Divergence
    Summary and Discussion
    Notes and References
    Problems

Information-Theoretic Learning Models
    Introduction
    Entropy
    Maximum-Entropy Principle
    Mutual Information
    Kullback-Leibler Divergence
    Copulas
    Mutual Information as an Objective Function to Be Optimized
    Maximum Mutual Information Principle
    Infomax and Redundancy Reduction
    Spatially Coherent Features
    Spatially Incoherent Features
    Independent-Components Analysis
    Sparse Coding of Natural Images and Comparison with ICA Coding
    Natural-Gradient Learning for Independent-Components Analysis
    Maximum-Likelihood Estimation for Independent-Components Analysis
    Maximum-Entropy Learning for Blind Source Separation
    Maximization of Negentropy for Independent-Components Analysis
    Coherent Independent-Components Analysis
    Rate Distortion Theory and Information Bottleneck
    Optimal Manifold Representation of Data
    Computer Experiment: Pattern Classification
    Summary and Discussion
    Notes and References
    Problems

Stochastic Methods Rooted in Statistical Mechanics
    Introduction
    Statistical Mechanics
    Markov Chains
    Metropolis Algorithm
    Simulated Annealing
    Gibbs Sampling
    Boltzmann Machine
    Logistic Belief Nets
    Deep Belief Nets
    Deterministic Annealing
    Analogy of Deterministic Annealing with Expectation-Maximization Algorithm
    Summary and Discussion
    Notes and References
    Problems

Dynamic Programming
    Introduction
    Markov Decision Process
    Bellman's Optimality Criterion
    Policy Iteration
    Value Iteration
    Approximate Dynamic Programming: Direct Methods
    Temporal-Difference Learning
    Q-Learning
    Approximate Dynamic Programming: Indirect Methods
    Least-Squares Policy Evaluation
    Approximate Policy Iteration
    Summary and Discussion
    Notes and References
    Problems

Neurodynamics
    Introduction
    Dynamic Systems
    Stability of Equilibrium States
    Attractors
    Neurodynamic Models
    Manipulation of Attractors as a Recurrent Network Paradigm
    Hopfield Model
    The Cohen-Grossberg Theorem
    Brain-State-in-a-Box Model
    Strange Attractors and Chaos
    Dynamic Reconstruction of a Chaotic Process
    Summary and Discussion
    Notes and References
    Problems

Bayesian Filtering for State Estimation of Dynamic Systems
    Introduction
    State-Space Models
    Kalman Filters
    The Divergence Phenomenon and Square-Root Filtering
    The Extended Kalman Filter
    The Bayesian Filter
    Cubature Kalman Filter: Building on the Kalman Filter
    Particle Filters
    Computer Experiment: Comparative Evaluation of Extended Kalman and Particle Filters
    Kalman Filtering in Modeling of Brain Functions
    Summary and Discussion
    Notes and References
    Problems

Dynamically Driven Recurrent Networks
    Introduction
    Recurrent Network Architectures
    Universal Approximation Theorem
    Controllability and Observability
    Computational Power of Recurrent Networks
    Learning Algorithms
    Back Propagation Through Time
    Real-Time Recurrent Learning
    Vanishing Gradients in Recurrent Networks
    Supervised Training Framework for Recurrent Networks Using Nonlinear Sequential State Estimators
    Computer Experiment: Dynamic Reconstruction of Mackey-Glass Attractor
    Adaptivity Considerations
    Case Study: Model Reference Applied to Neurocontrol
    Summary and Discussion
    Notes and References
    Problems

Bibliography

Index