Modern Processor Design Fundamentals of Superscalar Processors

Name: Modern Processor Design Fundamentals of Superscalar Processors
Price: 106.42 USD
Availability: InStock
ISBN: 9780070570641
ISBN-10: 0070570647
ISBN-13: 9780070570641
Edition: 2005
Authors: John P. Shen, Mikko Lipasti
List price: $173.33

Book details

Copyright year: 2005
Publisher: McGraw-Hill Higher Education
Publication date: 7/7/2004
Binding: Hardcover
Pages: 656
Size: 7.50" wide x 9.25" long x 1.25" tall
Weight: 2.530
Language: English

Table of Contents

Additional Resources
Preface

Processor Design
The Evolution of Microprocessors
Instruction Set Processor Design
Digital Systems Design
Architecture, Implementation, and Realization
Instruction Set Architecture
Dynamic-Static Interface
Principles of Processor Performance
Processor Performance Equation
Processor Performance Optimizations
Performance Evaluation Method
Instruction-Level Parallel Processing
From Scalar to Superscalar
Limits of Instruction-Level Parallelism
Machines for Instruction-Level Parallelism
Summary

Pipelined Processors
Pipelining Fundamentals
Pipelined Design
Arithmetic Pipeline Example
Pipelining Idealism
Instruction Pipelining
Pipelined Processor Design
Balancing Pipeline Stages
Unifying Instruction Types
Minimizing Pipeline Stalls
Commercial Pipelined Processors
Deeply Pipelined Processors
Summary

Memory and I/O Systems
Introduction
Computer System Overview
Key Concepts: Latency and Bandwidth
Memory Hierarchy
Components of a Modern Memory Hierarchy
Temporal and Spatial Locality
Caching and Cache Memories
Main Memory
Virtual Memory Systems
Demand Paging
Memory Protection
Page Table Architectures
Memory Hierarchy Implementation
Input/Output Systems
Types of I/O Devices
Computer System Busses
Communication with I/O Devices
Interaction of I/O Devices and Memory Hierarchy
Summary

Superscalar Organization
Limitations of Scalar Pipelines
Upper Bound on Scalar Pipeline Throughput
Inefficient Unification into a Single Pipeline
Performance Lost Due to a Rigid Pipeline
From Scalar to Superscalar Pipelines
Parallel Pipelines
Diversified Pipelines
Dynamic Pipelines
Superscalar Pipeline Overview
Instruction Fetching
Instruction Decoding
Instruction Dispatching
Instruction Execution
Instruction Completion and Retiring
Summary

Superscalar Techniques
Instruction Flow Techniques
Program Control Flow and Control Dependences
Performance Degradation Due to Branches
Branch Prediction Techniques
Branch Misprediction Recovery
Advanced Branch Prediction Techniques
Other Instruction Flow Techniques
Register Data Flow Techniques
Register Reuse and False Data Dependences
Register Renaming Techniques
True Data Dependences and the Data Flow Limit
The Classic Tomasulo Algorithm
Dynamic Execution Core
Reservation Stations and Reorder Buffer
Dynamic Instruction Scheduler
Other Register Data Flow Techniques
Memory Data Flow Techniques
Memory Accessing Instructions
Ordering of Memory Accesses
Load Bypassing and Load Forwarding
Other Memory Data Flow Techniques
Summary

The PowerPC 620
Introduction
Experimental Framework
Instruction Fetching
Branch Prediction
Fetching and Speculation
Instruction Dispatching
Instruction Buffer
Dispatch Stalls
Dispatch Effectiveness
Instruction Execution
Issue Stalls
Execution Parallelism
Execution Latency
Instruction Completion
Completion Parallelism
Cache Effects
Conclusions and Observations
Bridging to the IBM POWER3 and POWER4
Summary

Intel's P6 Microarchitecture
Introduction
Basics of the P6 Microarchitecture
Pipelining
In-Order Front-End Pipeline
Out-of-Order Core Pipeline
Retirement Pipeline
The In-Order Front End
Instruction Cache and ITLB
Branch Prediction
Instruction Decoder
Register Alias Table
Allocator
The Out-of-Order Core
Reservation Station
Retirement
The Reorder Buffer
Memory Subsystem
Memory Access Ordering
Load Memory Operations
Basic Store Memory Operations
Deferring Memory Operations
Page Faults
Summary
Acknowledgments

Survey of Superscalar Processors
Development of Superscalar Processors
Early Advances in Uniprocessor Parallelism: The IBM Stretch
First Superscalar Design: The IBM Advanced Computer System
Instruction-Level Parallelism Studies
By-Products of DAE: The First Multiple-Decoding Implementations
IBM Cheetah, Panther, and America
Decoupled Microarchitectures
Other Efforts in the 1980s
Wide Acceptance of Superscalar
A Classification of Recent Designs
RISC and CISC Retrofits
Speed Demons: Emphasis on Clock Cycle Time
Brainiacs: Emphasis on IPC
Processor Descriptions
Compaq / DEC Alpha
Hewlett-Packard PA-RISC Version 1.0
Hewlett-Packard PA-RISC Version 2.0
IBM POWER
Intel i960
Intel IA32--Native Approaches
Intel IA32--Decoupled Approaches
x86-64
MIPS
Motorola
PowerPC--32-bit Architecture
PowerPC--64-bit Architecture
PowerPC-AS
SPARC Version 8
SPARC Version 9
Verification of Superscalar Processors
Acknowledgments

Advanced Instruction Flow Techniques
Introduction
Static Branch Prediction Techniques
Single-Direction Prediction
Backwards Taken/Forwards Not-Taken
Ball/Larus Heuristics
Profiling
Dynamic Branch Prediction Techniques
Basic Algorithms
Interference-Reducing Predictors
Predicting with Alternative Contexts
Hybrid Branch Predictors
The Tournament Predictor
Static Predictor Selection
Branch Classification
The Multihybrid Predictor
Prediction Fusion
Other Instruction Flow Issues and Techniques
Target Prediction
Branch Confidence Prediction
High-Bandwidth Fetch Mechanisms
High-Frequency Fetch Mechanisms
Summary

Advanced Register Data Flow Techniques
Introduction
Value Locality and Redundant Execution
Causes of Value Locality
Quantifying Value Locality
Exploiting Value Locality without Speculation
Memoization
Instruction Reuse
Basic Block and Trace Reuse
Data Flow Region Reuse
Concluding Remarks
Exploiting Value Locality with Speculation
The Weak Dependence Model
Value Prediction
The Value Prediction Unit
Speculative Execution Using Predicted Values
Performance of Value Prediction
Concluding Remarks
Summary

Executing Multiple Threads
Introduction
Synchronizing Shared-Memory Threads
Introduction to Multiprocessor Systems
Fully Shared Memory, Unit Latency, and Lack of Contention
Instantaneous Propagation of Writes
Coherent Shared Memory
Implementing Cache Coherence
Multilevel Caches, Inclusion, and Virtual Memory
Memory Consistency
The Coherent Memory Interface
Concluding Remarks
Explicitly Multithreaded Processors
Chip Multiprocessors
Fine-Grained Multithreading
Coarse-Grained Multithreading
Simultaneous Multithreading
Implicitly Multithreaded Processors
Resolving Control Dependences
Resolving Register Data Dependences
Resolving Memory Data Dependences
Concluding Remarks
Executing the Same Thread
Fault Detection
Prefetching
Branch Resolution
Concluding Remarks
Summary

Index