Preface

Stochastic Search and Optimization: Motivation and Supporting Results

Introduction

General Background

Formal Problem Statement; General Types of Problems and Solutions; Global versus Local Search

Meaning of "Stochastic" in Stochastic Search and Optimization

Some Principles of Stochastic Search and Optimization

Some Key Points

Limits of Performance: No Free Lunch Theorems

Gradients, Hessians, and Their Connection to Optimization of "Smooth" Functions

Definition of Gradient and Hessian in the Context of Loss Functions

First- and Second-Order Conditions for Optimization

Deterministic Search and Optimization: Steepest Descent and Newton-Raphson Search

Steepest Descent Method

Newton-Raphson Method and Deterministic Convergence Rates

Concluding Remarks

Exercises

Direct Methods for Stochastic Search

Introduction

Random Search with Noise-Free Loss Measurements

Some Attributes of Direct Random Search

Three Algorithms for Random Search

Example Implementations

Random Search with Noisy Loss Measurements

Nonlinear Simplex (Nelder-Mead) Algorithm

Basic Method

Adaptation for Noisy Loss Measurements

Concluding Remarks

Exercises

Recursive Estimation for Linear Models

Formulation for Estimation with Linear Models

Linear Model

Mean-Squared and Least-Squares Estimation

Least-Mean-Squares and Recursive-Least-Squares for Static θ

Introduction

Basic LMS Algorithm

LMS Algorithm in Adaptive Signal Processing and Control

Basic RLS Algorithm

Connection of RLS to the Newton-Raphson Method

Extensions to Multivariate RLS and Weighted Summands in Least-Squares Criterion

LMS, RLS, and Kalman Filter for Time-Varying θ

Introduction

LMS

RLS

Kalman Filter

Case Study: Analysis of Oboe Reed Data

Concluding Remarks

Exercises

Stochastic Approximation for Nonlinear Root-Finding

Introduction

Potpourri of Stochastic Approximation Examples

Convergence of Stochastic Approximation

Background

Convergence Conditions

On the Gain Sequence and Connection to ODEs

Asymptotic Normality and Choice of Gain Sequence

Extensions to Basic Stochastic Approximation

Joint Parameter and State Evolution

Adaptive Estimation and Higher-Order Algorithms

Iterate Averaging

Time-Varying Functions

Concluding Remarks

Exercises

Stochastic Gradient Form of Stochastic Approximation

Root-Finding Stochastic Approximation as a Stochastic Gradient Method

Basic Principles

Stochastic Gradient Algorithm

Implementation in General Nonlinear Regression Problems

Connection of LMS to Stochastic Gradient SA

Neural Network Training

Discrete-Event Dynamic Systems

Image Restoration

Concluding Remarks

Exercises

Stochastic Approximation and the Finite-Difference Method

Introduction and Contrast of Gradient-Based and Gradient-Free Algorithms

Some Motivating Examples for Gradient-Free Stochastic Approximation

Finite-Difference Algorithm

Convergence Theory

Bias in Gradient Estimate

Convergence

Asymptotic Normality

Practical Selection of Gain Sequences

Several Finite-Difference Examples

Some Extensions and Enhancements to the Finite-Difference Algorithm

Concluding Remarks

Exercises

Simultaneous Perturbation Stochastic Approximation

Background

Form and Motivation for Standard SPSA Algorithm

Basic Algorithm

Relationship of Gradient Estimate to True Gradient

Basic Assumptions and Supporting Theory for Convergence

Asymptotic Normality and Efficiency Analysis

Practical Implementation

Step-by-Step Implementation

Choice of Gain Sequences

Numerical Examples

Some Extensions: Optimal Perturbation Distribution; One-Measurement Form; Global, Discrete, and Constrained Optimization

Adaptive SPSA

Introduction and Basic Algorithm

Implementation Aspects of Adaptive SPSA

Theory on Convergence and Efficiency of Adaptive SPSA

Concluding Remarks

Appendix: Conditions for Asymptotic Normality

Exercises

Annealing-Type Algorithms

Introduction to Simulated Annealing and Motivation from the Physics of Cooling

Simulated Annealing Algorithm

Basic Algorithm

Modifications for Noisy Loss Function Measurements

Some Examples

Global Optimization via Annealing Algorithms Based on Stochastic Approximation

Concluding Remarks

Appendix: Convergence Theory for Simulated Annealing Based on Stochastic Approximation

Exercises

Evolutionary Computation I: Genetic Algorithms

Introduction

Some Historical Perspective and Motivating Applications

Brief History

Early Motivation

Coding of Elements for Searching

Introduction

Standard Bit Coding

Gray Coding

Real-Number Coding

Standard Genetic Algorithm Operations

Selection and Elitism

Crossover

Mutation and Termination

Overview of Basic GA Search Approach

Practical Guidance and Extensions: Coefficient Values, Constraints, Noisy Fitness Evaluations, Local Search, and Parent Selection

Examples

Concluding Remarks

Exercises

Evolutionary Computation II: General Methods and Theory

Introduction

Overview of Evolution Strategy and Evolutionary Programming with Comparisons to Genetic Algorithms

Schema Theory

What Makes a Problem Hard?

Convergence Theory

No Free Lunch Theorems

Concluding Remarks

Exercises

Reinforcement Learning via Temporal Differences

Introduction

Delayed Reinforcement and Formulation for Temporal Difference Learning

Basic Temporal Difference Algorithm

Batch and Online Implementations of TD Learning

Some Examples

Connections to Stochastic Approximation

Concluding Remarks

Exercises

Statistical Methods for Optimization in Discrete Problems

Introduction to Multiple Comparisons Over a Finite Set

Statistical Comparisons Test Without Prior Information

Multiple Comparisons Against One Candidate with Known Noise Variance(s)

Multiple Comparisons Against One Candidate with Unknown Noise Variance(s)

Extensions to Bonferroni Inequality; Ranking and Selection Methods in Optimization Over a Finite Set

Concluding Remarks

Exercises

Model Selection and Statistical Information

Bias-Variance Tradeoff

Bias and Variance as Contributors to Model Prediction Error

Interpretation of the Bias-Variance Tradeoff

Bias-Variance Analysis for Linear Models

Model Selection: Cross-Validation

The Information Matrix: Applications and Resampling-Based Computation

Introduction

Fisher Information Matrix: Definition and Two Equivalent Forms

Two Key Properties of the Information Matrix: Connections to the Covariance Matrix of Parameter Estimates

Selected Applications

Resampling-Based Calculation of the Information Matrix

Concluding Remarks

Exercises

Simulation-Based Optimization I: Regeneration, Common Random Numbers, and Selection Methods

Background

Focus of Chapter and Roles of Search and Optimization in Simulation

Statement of the Optimization Problem

Regenerative Systems

Background and Definition of the Loss Function L(θ)

Estimators of L(θ)

Estimates Related to the Gradient of L(θ)

Optimization with Finite-Difference and Simultaneous Perturbation Gradient Estimators

Common Random Numbers

Introduction

Theory and Examples for Common Random Numbers

Partial Common Random Numbers for Finite Samples

Selection Methods for Optimization with Discrete-Valued θ

Concluding Remarks

Exercises

Simulation-Based Optimization II: Stochastic Gradient and Sample Path Methods

Framework for Gradient Estimation

Some Issues in Gradient Estimation

Gradient Estimation and the Interchange of Derivative and Integral

Pure Likelihood Ratio/Score Function and Pure Infinitesimal Perturbation Analysis

Gradient Estimation Methods in Root-Finding Stochastic Approximation: The Hybrid LR/SF and IPA Setting

Sample Path Optimization

Concluding Remarks

Exercises

Markov Chain Monte Carlo

Background

Metropolis-Hastings Algorithm

Gibbs Sampling

Sketch of Theoretical Foundation for Gibbs Sampling

Some Examples of Gibbs Sampling

Applications in Bayesian Analysis

Concluding Remarks

Exercises

Optimal Design for Experimental Inputs

Introduction

Motivation

Finite-Sample and Asymptotic (Continuous) Designs

Precision Matrix and D-Optimality

Linear Models

Background and Connections to D-Optimality

Some Properties of Asymptotic Designs

Orthogonal Designs

Sketch of Algorithms for Finding Optimal Designs

Response Surface Methodology

Nonlinear Models

Introduction

Methods for Coping with Dependence on θ

Concluding Remarks

Appendix: Optimal Design in Dynamic Models

Exercises

Selected Results from Multivariate Analysis

Multivariate Calculus and Analysis

Some Useful Results in Matrix Theory

Exercises

Some Basic Tests in Statistics

Standard One-Sample Test

Some Basic Two-Sample Tests

Comments on Other Aspects of Statistical Testing

Exercises

Probability Theory and Convergence

Basic Properties

Convergence Theory

Definitions of Convergence

Examples and Counterexamples Related to Convergence

Dominated Convergence Theorem

Convergence in Distribution and Central Limit Theorem

Exercises

Random Number Generation

Background and Introduction to Linear Congruential Generators

Transformation of Uniform Random Numbers to Other Distributions

Exercises

Markov Processes

Background on Markov Processes

Discrete Markov Chains

Exercises

Answers to Selected Exercises

References

Frequently Used Notation

Index