
Reinforcement Learning: An Introduction


ISBN-10: 0262193981

ISBN-13: 9780262193986

Edition: 1st, 1998

Authors: Richard S. Sutton, Andrew G. Barto

List price: $75.00

Description:

This volume offers an account of the key ideas and algorithms of reinforcement learning, with discussion ranging from the history of the field's intellectual foundations to recent developments and applications.

Book details

List price: $75.00
Edition: 1st
Copyright year: 1998
Publisher: MIT Press
Publication date: 2/26/1998
Binding: Hardcover
Pages: 344
Size: 7.01" wide x 9.29" long x 1.08" tall
Weight: 1.738 lbs
Language: English

Contents
Series Foreword
Preface
The Problem
Introduction
Reinforcement Learning
Examples
Elements of Reinforcement Learning
An Extended Example: Tic-Tac-Toe
Summary
History of Reinforcement Learning
Bibliographical Remarks
Evaluative Feedback
An n-Armed Bandit Problem
Action-Value Methods
Softmax Action Selection
Evaluation Versus Instruction
Incremental Implementation
Tracking a Nonstationary Problem
Optimistic Initial Values
Reinforcement Comparison
Pursuit Methods
Associative Search
Conclusions
Bibliographical and Historical Remarks
The Reinforcement Learning Problem
The Agent-Environment Interface
Goals and Rewards
Returns
Unified Notation for Episodic and Continuing Tasks
The Markov Property
Markov Decision Processes
Value Functions
Optimal Value Functions
Optimality and Approximation
Summary
Bibliographical and Historical Remarks
Elementary Solution Methods
Dynamic Programming
Policy Evaluation
Policy Improvement
Policy Iteration
Value Iteration
Asynchronous Dynamic Programming
Generalized Policy Iteration
Efficiency of Dynamic Programming
Summary
Bibliographical and Historical Remarks
Monte Carlo Methods
Monte Carlo Policy Evaluation
Monte Carlo Estimation of Action Values
Monte Carlo Control
On-Policy Monte Carlo Control
Evaluating One Policy While Following Another
Off-Policy Monte Carlo Control
Incremental Implementation
Summary
Bibliographical and Historical Remarks
Temporal-Difference Learning
TD Prediction
Advantages of TD Prediction Methods
Optimality of TD(0)
Sarsa: On-Policy TD Control
Q-Learning: Off-Policy TD Control
Actor-Critic Methods
R-Learning for Undiscounted Continuing Tasks
Games, Afterstates, and Other Special Cases
Summary
Bibliographical and Historical Remarks
A Unified View
Eligibility Traces
n-Step TD Prediction
The Forward View of TD(λ)
The Backward View of TD(λ)
Equivalence of Forward and Backward Views
Sarsa(λ)
Q(λ)
Eligibility Traces for Actor-Critic Methods
Replacing Traces
Implementation Issues
Variable λ
Conclusions
Bibliographical and Historical Remarks
Generalization and Function Approximation
Value Prediction with Function Approximation
Gradient-Descent Methods
Linear Methods
Control with Function Approximation
Off-Policy Bootstrapping
Should We Bootstrap?
Summary
Bibliographical and Historical Remarks
Planning and Learning
Models and Planning
Integrating Planning, Acting, and Learning
When the Model Is Wrong
Prioritized Sweeping
Full vs. Sample Backups
Trajectory Sampling
Heuristic Search
Summary
Bibliographical and Historical Remarks
Dimensions of Reinforcement Learning
The Unified View
Other Frontier Dimensions
Case Studies
TD-Gammon
Samuel's Checkers Player
The Acrobot
Elevator Dispatching
Dynamic Channel Allocation
Job-Shop Scheduling
References
Summary of Notation
Index