Modern Processor Design: Fundamentals of Superscalar Processors

ISBN-10: 0070570647

ISBN-13: 9780070570641

Edition: 2005

Authors: John P. Shen, Mikko Lipasti

List price: $173.33

Book details

Copyright year: 2005
Publisher: McGraw-Hill Higher Education
Publication date: 7/7/2004
Binding: Hardcover
Pages: 656
Size: 7.50" wide x 9.25" long x 1.25" tall
Weight: 2.530 lb
Language: English

Table of Contents
Additional Resources
Preface
Processor Design
The Evolution of Microprocessors
Instruction Set Processor Design
Digital Systems Design
Architecture, Implementation, and Realization
Instruction Set Architecture
Dynamic-Static Interface
Principles of Processor Performance
Processor Performance Equation
Processor Performance Optimizations
Performance Evaluation Method
Instruction-Level Parallel Processing
From Scalar to Superscalar
Limits of Instruction-Level Parallelism
Machines for Instruction-Level Parallelism
Summary
Pipelined Processors
Pipelining Fundamentals
Pipelined Design
Arithmetic Pipeline Example
Pipelining Idealism
Instruction Pipelining
Pipelined Processor Design
Balancing Pipeline Stages
Unifying Instruction Types
Minimizing Pipeline Stalls
Commercial Pipelined Processors
Deeply Pipelined Processors
Summary
Memory and I/O Systems
Introduction
Computer System Overview
Key Concepts: Latency and Bandwidth
Memory Hierarchy
Components of a Modern Memory Hierarchy
Temporal and Spatial Locality
Caching and Cache Memories
Main Memory
Virtual Memory Systems
Demand Paging
Memory Protection
Page Table Architectures
Memory Hierarchy Implementation
Input/Output Systems
Types of I/O Devices
Computer System Busses
Communication with I/O Devices
Interaction of I/O Devices and Memory Hierarchy
Summary
Superscalar Organization
Limitations of Scalar Pipelines
Upper Bound on Scalar Pipeline Throughput
Inefficient Unification into a Single Pipeline
Performance Lost Due to a Rigid Pipeline
From Scalar to Superscalar Pipelines
Parallel Pipelines
Diversified Pipelines
Dynamic Pipelines
Superscalar Pipeline Overview
Instruction Fetching
Instruction Decoding
Instruction Dispatching
Instruction Execution
Instruction Completion and Retiring
Summary
Superscalar Techniques
Instruction Flow Techniques
Program Control Flow and Control Dependences
Performance Degradation Due to Branches
Branch Prediction Techniques
Branch Misprediction Recovery
Advanced Branch Prediction Techniques
Other Instruction Flow Techniques
Register Data Flow Techniques
Register Reuse and False Data Dependences
Register Renaming Techniques
True Data Dependences and the Data Flow Limit
The Classic Tomasulo Algorithm
Dynamic Execution Core
Reservation Stations and Reorder Buffer
Dynamic Instruction Scheduler
Other Register Data Flow Techniques
Memory Data Flow Techniques
Memory Accessing Instructions
Ordering of Memory Accesses
Load Bypassing and Load Forwarding
Other Memory Data Flow Techniques
Summary
The PowerPC 620
Introduction
Experimental Framework
Instruction Fetching
Branch Prediction
Fetching and Speculation
Instruction Dispatching
Instruction Buffer
Dispatch Stalls
Dispatch Effectiveness
Instruction Execution
Issue Stalls
Execution Parallelism
Execution Latency
Instruction Completion
Completion Parallelism
Cache Effects
Conclusions and Observations
Bridging to the IBM POWER3 and POWER4
Summary
Intel's P6 Microarchitecture
Introduction
Basics of the P6 Microarchitecture
Pipelining
In-Order Front-End Pipeline
Out-of-Order Core Pipeline
Retirement Pipeline
The In-Order Front End
Instruction Cache and ITLB
Branch Prediction
Instruction Decoder
Register Alias Table
Allocator
The Out-of-Order Core
Reservation Station
Retirement
The Reorder Buffer
Memory Subsystem
Memory Access Ordering
Load Memory Operations
Basic Store Memory Operations
Deferring Memory Operations
Page Faults
Summary
Acknowledgments
Survey of Superscalar Processors
Development of Superscalar Processors
Early Advances in Uniprocessor Parallelism: The IBM Stretch
First Superscalar Design: The IBM Advanced Computer System
Instruction-Level Parallelism Studies
By-Products of DAE: The First Multiple-Decoding Implementations
IBM Cheetah, Panther, and America
Decoupled Microarchitectures
Other Efforts in the 1980s
Wide Acceptance of Superscalar
A Classification of Recent Designs
RISC and CISC Retrofits
Speed Demons: Emphasis on Clock Cycle Time
Brainiacs: Emphasis on IPC
Processor Descriptions
Compaq / DEC Alpha
Hewlett-Packard PA-RISC Version 1.0
Hewlett-Packard PA-RISC Version 2.0
IBM POWER
Intel i960
Intel IA32--Native Approaches
Intel IA32--Decoupled Approaches
x86-64
MIPS
Motorola
PowerPC--32-bit Architecture
PowerPC--64-bit Architecture
PowerPC-AS
SPARC Version 8
SPARC Version 9
Verification of Superscalar Processors
Acknowledgments
Advanced Instruction Flow Techniques
Introduction
Static Branch Prediction Techniques
Single-Direction Prediction
Backwards Taken/Forwards Not-Taken
Ball/Larus Heuristics
Profiling
Dynamic Branch Prediction Techniques
Basic Algorithms
Interference-Reducing Predictors
Predicting with Alternative Contexts
Hybrid Branch Predictors
The Tournament Predictor
Static Predictor Selection
Branch Classification
The Multihybrid Predictor
Prediction Fusion
Other Instruction Flow Issues and Techniques
Target Prediction
Branch Confidence Prediction
High-Bandwidth Fetch Mechanisms
High-Frequency Fetch Mechanisms
Summary
Advanced Register Data Flow Techniques
Introduction
Value Locality and Redundant Execution
Causes of Value Locality
Quantifying Value Locality
Exploiting Value Locality without Speculation
Memoization
Instruction Reuse
Basic Block and Trace Reuse
Data Flow Region Reuse
Concluding Remarks
Exploiting Value Locality with Speculation
The Weak Dependence Model
Value Prediction
The Value Prediction Unit
Speculative Execution Using Predicted Values
Performance of Value Prediction
Concluding Remarks
Summary
Executing Multiple Threads
Introduction
Synchronizing Shared-Memory Threads
Introduction to Multiprocessor Systems
Fully Shared Memory, Unit Latency, and Lack of Contention
Instantaneous Propagation of Writes
Coherent Shared Memory
Implementing Cache Coherence
Multilevel Caches, Inclusion, and Virtual Memory
Memory Consistency
The Coherent Memory Interface
Concluding Remarks
Explicitly Multithreaded Processors
Chip Multiprocessors
Fine-Grained Multithreading
Coarse-Grained Multithreading
Simultaneous Multithreading
Implicitly Multithreaded Processors
Resolving Control Dependences
Resolving Register Data Dependences
Resolving Memory Data Dependences
Concluding Remarks
Executing the Same Thread
Fault Detection
Prefetching
Branch Resolution
Concluding Remarks
Summary
Index