Preface to the Second Edition

Preface to the First Edition

Acknowledgments

The Challenges of Dynamic Programming
    A Dynamic Programming Example: A Shortest Path Problem
    The Three Curses of Dimensionality
    Some Real Applications
    Problem Classes
    The Many Dialects of Dynamic Programming
    What Is New in This Book?
    Pedagogy
    Bibliographic Notes

Some Illustrative Models
    Deterministic Problems
    Stochastic Problems
    Information Acquisition Problems
    A Simple Modeling Framework for Dynamic Programs
    Bibliographic Notes
    Problems

Introduction to Markov Decision Processes
    The Optimality Equations
    Finite Horizon Problems
    Infinite Horizon Problems
    Value Iteration
    Policy Iteration
    Hybrid Value-Policy Iteration
    Average Reward Dynamic Programming
    The Linear Programming Method for Dynamic Programs
    Monotone Policies*
    Why Does It Work?**
    Bibliographic Notes
    Problems

Introduction to Approximate Dynamic Programming
    The Three Curses of Dimensionality (Revisited)
    The Basic Idea
    Q-Learning and SARSA
    Real-Time Dynamic Programming
    Approximate Value Iteration
    The Post-Decision State Variable
    Low-Dimensional Representations of Value Functions
    So Just What Is Approximate Dynamic Programming?
    Experimental Issues
    But Does It Work?
    Bibliographic Notes
    Problems

Modeling Dynamic Programs
    Notational Style
    Modeling Time
    Modeling Resources
    The States of Our System
    Modeling Decisions
    The Exogenous Information Process
    The Transition Function
    The Objective Function
    A Measure-Theoretic View of Information**
    Bibliographic Notes
    Problems

Policies
    Myopic Policies
    Lookahead Policies
    Policy Function Approximations
    Value Function Approximations
    Hybrid Strategies
    Randomized Policies
    How to Choose a Policy?
    Bibliographic Notes
    Problems

Policy Search
    Background
    Gradient Search
    Direct Policy Search for Finite Alternatives
    The Knowledge Gradient Algorithm for Discrete Alternatives
    Simulation Optimization
    Why Does It Work?**
    Bibliographic Notes
    Problems

Approximating Value Functions
    Lookup Tables and Aggregation
    Parametric Models
    Regression Variations
    Nonparametric Models
    Approximations and the Curse of Dimensionality
    Why Does It Work?**
    Bibliographic Notes
    Problems

Learning Value Function Approximations
    Sampling the Value of a Policy
    Stochastic Approximation Methods
    Recursive Least Squares for Linear Models
    Temporal Difference Learning with a Linear Model
    Bellman's Equation Using a Linear Model
    Analysis of TD(0), LSTD, and LSPE Using a Single State
    Gradient-Based Methods for Approximate Value Iteration*
    Least Squares Temporal Differencing with Kernel Regression*
    Value Function Approximations Based on Bayesian Learning*
    Why Does It Work?**
    Bibliographic Notes
    Problems

Optimizing While Learning
    Overview of Algorithmic Strategies
    Approximate Value Iteration and Q-Learning Using Lookup Tables
    Statistical Bias in the Max Operator
    Approximate Value Iteration and Q-Learning Using Linear Models
    Approximate Policy Iteration
    The Actor-Critic Paradigm
    Policy Gradient Methods
    The Linear Programming Method Using Basis Functions
    Approximate Policy Iteration Using Kernel Regression*
    Finite Horizon Approximations for Steady-State Applications
    Bibliographic Notes
    Problems

Adaptive Estimation and Stepsizes
    Learning Algorithms and Stepsizes
    Deterministic Stepsize Recipes
    Stochastic Stepsizes
    Optimal Stepsizes for Nonstationary Time Series
    Optimal Stepsizes for Approximate Value Iteration
    Convergence
    Guidelines for Choosing Stepsize Formulas
    Bibliographic Notes
    Problems

Exploration Versus Exploitation
    A Learning Exercise: The Nomadic Trucker
    An Introduction to Learning
    Heuristic Learning Policies
    Gittins Indexes for Online Learning
    The Knowledge Gradient Policy
    Learning with a Physical State
    Bibliographic Notes
    Problems

Value Function Approximations for Resource Allocation Problems
    Value Functions versus Gradients
    Linear Approximations
    Piecewise-Linear Approximations
    Solving a Resource Allocation Problem Using Piecewise-Linear Functions
    The SHAPE Algorithm
    Regression Methods
    Cutting Planes*
    Why Does It Work?**
    Bibliographic Notes
    Problems

Dynamic Resource Allocation Problems
    An Asset Acquisition Problem
    The Blood Management Problem
    A Portfolio Optimization Problem
    A General Resource Allocation Problem
    A Fleet Management Problem
    A Driver Management Problem
    Bibliographic Notes
    Problems

Implementation Challenges
    Will ADP Work for Your Problem?
    Designing an ADP Algorithm for Complex Problems
    Debugging an ADP Algorithm
    Practical Issues
    Modeling Your Problem
    Online versus Offline Models
    If It Works, Patent It!

Bibliography

Index