Preface

Stochastic Search and Optimization: Motivation and Supporting Results
  Introduction
    General Background
    Formal Problem Statement; General Types of Problems and Solutions; Global versus Local Search
    Meaning of "Stochastic" in Stochastic Search and Optimization
  Some Principles of Stochastic Search and Optimization
    Some Key Points
    Limits of Performance: No Free Lunch Theorems
  Gradients, Hessians, and Their Connection to Optimization of "Smooth" Functions
    Definition of Gradient and Hessian in the Context of Loss Functions
    First- and Second-Order Conditions for Optimization
  Deterministic Search and Optimization: Steepest Descent and Newton-Raphson Search
    Steepest Descent Method
    Newton-Raphson Method and Deterministic Convergence Rates
  Concluding Remarks
  Exercises

Direct Methods for Stochastic Search
  Introduction
  Random Search with Noise-Free Loss Measurements
    Some Attributes of Direct Random Search
    Three Algorithms for Random Search
    Example Implementations
  Random Search with Noisy Loss Measurements
  Nonlinear Simplex (Nelder-Mead) Algorithm
    Basic Method
    Adaptation for Noisy Loss Measurements
  Concluding Remarks
  Exercises

Recursive Estimation for Linear Models
  Formulation for Estimation with Linear Models
    Linear Model
    Mean-Squared and Least-Squares Estimation
  Least-Mean-Squares and Recursive-Least-Squares for Static θ
    Introduction
    Basic LMS Algorithm
    LMS Algorithm in Adaptive Signal Processing and Control
    Basic RLS Algorithm
    Connection of RLS to the Newton-Raphson Method
    Extensions to Multivariate RLS and Weighted Summands in Least-Squares Criterion
  LMS, RLS, and Kalman Filter for Time-Varying θ
    Introduction
    LMS
    RLS
    Kalman Filter
  Case Study: Analysis of Oboe Reed Data
  Concluding Remarks
  Exercises

Stochastic Approximation for Nonlinear Root-Finding
  Introduction
  Potpourri of Stochastic Approximation Examples
  Convergence of Stochastic Approximation
    Background
    Convergence Conditions
    On the Gain Sequence and Connection to ODEs
  Asymptotic Normality and Choice of Gain Sequence
  Extensions to Basic Stochastic Approximation
    Joint Parameter and State Evolution
    Adaptive Estimation and Higher-Order Algorithms
    Iterate Averaging
    Time-Varying Functions
  Concluding Remarks
  Exercises

Stochastic Gradient Form of Stochastic Approximation
  Root-Finding Stochastic Approximation as a Stochastic Gradient Method
    Basic Principles
    Stochastic Gradient Algorithm
    Implementation in General Nonlinear Regression Problems
    Connection of LMS to Stochastic Gradient SA
  Neural Network Training
  Discrete-Event Dynamic Systems
  Image Restoration
  Concluding Remarks
  Exercises

Stochastic Approximation and the Finite-Difference Method
  Introduction and Contrast of Gradient-Based and Gradient-Free Algorithms
  Some Motivating Examples for Gradient-Free Stochastic Approximation
  Finite-Difference Algorithm
  Convergence Theory
    Bias in Gradient Estimate
    Convergence
    Asymptotic Normality
  Practical Selection of Gain Sequences
  Several Finite-Difference Examples
  Some Extensions and Enhancements to the Finite-Difference Algorithm
  Concluding Remarks
  Exercises

Simultaneous Perturbation Stochastic Approximation
  Background
  Form and Motivation for Standard SPSA Algorithm
    Basic Algorithm
    Relationship of Gradient Estimate to True Gradient
  Basic Assumptions and Supporting Theory for Convergence
  Asymptotic Normality and Efficiency Analysis
  Practical Implementation
    Step-by-Step Implementation
    Choice of Gain Sequences
    Numerical Examples
  Some Extensions: Optimal Perturbation Distribution; One-Measurement Form; Global, Discrete, and Constrained Optimization
  Adaptive SPSA
    Introduction and Basic Algorithm
    Implementation Aspects of Adaptive SPSA
    Theory on Convergence and Efficiency of Adaptive SPSA
  Concluding Remarks
  Appendix: Conditions for Asymptotic Normality
  Exercises

Annealing-Type Algorithms
  Introduction to Simulated Annealing and Motivation from the Physics of Cooling
  Simulated Annealing Algorithm
    Basic Algorithm
    Modifications for Noisy Loss Function Measurements
  Some Examples
  Global Optimization via Annealing Algorithms Based on Stochastic Approximation
  Concluding Remarks
  Appendix: Convergence Theory for Simulated Annealing Based on Stochastic Approximation
  Exercises

Evolutionary Computation I: Genetic Algorithms
  Introduction
  Some Historical Perspective and Motivating Applications
    Brief History
    Early Motivation
  Coding of Elements for Searching
    Introduction
    Standard Bit Coding
    Gray Coding
    Real-Number Coding
  Standard Genetic Algorithm Operations
    Selection and Elitism
    Crossover
    Mutation and Termination
  Overview of Basic GA Search Approach
  Practical Guidance and Extensions: Coefficient Values, Constraints, Noisy Fitness Evaluations, Local Search, and Parent Selection
  Examples
  Concluding Remarks
  Exercises

Evolutionary Computation II: General Methods and Theory
  Introduction
  Overview of Evolution Strategy and Evolutionary Programming with Comparisons to Genetic Algorithms
  Schema Theory
  What Makes a Problem Hard?
  Convergence Theory
  No Free Lunch Theorems
  Concluding Remarks
  Exercises

Reinforcement Learning via Temporal Differences
  Introduction
  Delayed Reinforcement and Formulation for Temporal Difference Learning
  Basic Temporal Difference Algorithm
  Batch and Online Implementations of TD Learning
  Some Examples
  Connections to Stochastic Approximation
  Concluding Remarks
  Exercises

Statistical Methods for Optimization in Discrete Problems
  Introduction to Multiple Comparisons Over a Finite Set
  Statistical Comparisons Test Without Prior Information
  Multiple Comparisons Against One Candidate with Known Noise Variance(s)
  Multiple Comparisons Against One Candidate with Unknown Noise Variance(s)
  Extensions to Bonferroni Inequality; Ranking and Selection Methods in Optimization Over a Finite Set
  Concluding Remarks
  Exercises

Model Selection and Statistical Information
  Bias-Variance Tradeoff
    Bias and Variance as Contributors to Model Prediction Error
    Interpretation of the Bias-Variance Tradeoff
    Bias-Variance Analysis for Linear Models
  Model Selection: Cross-Validation
  The Information Matrix: Applications and Resampling-Based Computation
    Introduction
    Fisher Information Matrix: Definition and Two Equivalent Forms
    Two Key Properties of the Information Matrix: Connections to the Covariance Matrix of Parameter Estimates
    Selected Applications
    Resampling-Based Calculation of the Information Matrix
  Concluding Remarks
  Exercises

Simulation-Based Optimization I: Regeneration, Common Random Numbers, and Selection Methods
  Background
    Focus of Chapter and Roles of Search and Optimization in Simulation
    Statement of the Optimization Problem
  Regenerative Systems
    Background and Definition of the Loss Function L(θ)
    Estimators of L(θ)
    Estimates Related to the Gradient of L(θ)
  Optimization with Finite-Difference and Simultaneous Perturbation Gradient Estimators
  Common Random Numbers
    Introduction
    Theory and Examples for Common Random Numbers
    Partial Common Random Numbers for Finite Samples
  Selection Methods for Optimization with Discrete-Valued θ
  Concluding Remarks
  Exercises

Simulation-Based Optimization II: Stochastic Gradient and Sample Path Methods
  Framework for Gradient Estimation
    Some Issues in Gradient Estimation
    Gradient Estimation and the Interchange of Derivative and Integral
    Pure Likelihood Ratio/Score Function and Pure Infinitesimal Perturbation Analysis
  Gradient Estimation Methods in Root-Finding Stochastic Approximation: The Hybrid LR/SF and IPA Setting
  Sample Path Optimization
  Concluding Remarks
  Exercises

Markov Chain Monte Carlo
  Background
  Metropolis-Hastings Algorithm
  Gibbs Sampling
  Sketch of Theoretical Foundation for Gibbs Sampling
  Some Examples of Gibbs Sampling
  Applications in Bayesian Analysis
  Concluding Remarks
  Exercises

Optimal Design for Experimental Inputs
  Introduction
    Motivation
    Finite-Sample and Asymptotic (Continuous) Designs
    Precision Matrix and D-Optimality
  Linear Models
    Background and Connections to D-Optimality
    Some Properties of Asymptotic Designs
    Orthogonal Designs
    Sketch of Algorithms for Finding Optimal Designs
    Response Surface Methodology
  Nonlinear Models
    Introduction
    Methods for Coping with Dependence on θ
  Concluding Remarks
  Appendix: Optimal Design in Dynamic Models
  Exercises

Selected Results from Multivariate Analysis
  Multivariate Calculus and Analysis
  Some Useful Results in Matrix Theory
  Exercises

Some Basic Tests in Statistics
  Standard One-Sample Test
  Some Basic Two-Sample Tests
  Comments on Other Aspects of Statistical Testing
  Exercises

Probability Theory and Convergence
  Basic Properties
  Convergence Theory
    Definitions of Convergence
    Examples and Counterexamples Related to Convergence
    Dominated Convergence Theorem
    Convergence in Distribution and Central Limit Theorem
  Exercises

Random Number Generation
  Background and Introduction to Linear Congruential Generators
  Transformation of Uniform Random Numbers to Other Distributions
  Exercises

| |

Markov Processes | |

| |

| |

| |

Background on Markov Processes | |

| |

| |

| |

Discrete Markov Chains | |

| |

| |

Exercises | |

| |

| |

Answers to Selected Exercises

References

Frequently Used Notation

Index