
Programming Massively Parallel Processors: A Hands-On Approach

ISBN-10: 0123814723

ISBN-13: 9780123814722

Edition: 1st (2010)

Authors: David B. Kirk, Wen-mei W. Hwu

List price: $69.95

Book details

Copyright year: 2010
Publisher: Elsevier Science & Technology
Publication date: 2/22/2010
Binding: Paperback
Pages: 280
Size: 7.50" wide x 9.21" long x 0.75" tall
Weight: 1.298 lbs
Language: English

Wen-mei W. Hwu is the Walter J. ("Jerry") Sanders III-Advanced Micro Devices Endowed Chair in Electrical and Computer Engineering in the Coordinated Science Laboratory of the University of Illinois at Urbana-Champaign. From 1997 to 1999, Dr. Hwu served as the chairman of the Computer Engineering Program at the University of Illinois. Dr. Hwu received his Ph.D. degree in Computer Science from the University of California, Berkeley. His research interests are in the areas of architecture, implementation, and software for high-performance computer systems. He is the director of the OpenIMPACT project, which has delivered new compiler and computer architecture technologies to the computer…    

Preface
Acknowledgments
Dedication
Introduction
GPUs as Parallel Computers
Architecture of a Modern GPU
Why More Speed or Parallelism?
Parallel Programming Languages and Models
Overarching Goals
Organization of the Book
History of GPU Computing
Evolution of Graphics Pipelines
The Era of Fixed-Function Graphics Pipelines
Evolution of Programmable Real-Time Graphics
Unified Graphics and Computing Processors
GPGPU: An Intermediate Step
GPU Computing
Scalable GPUs
Recent Developments
Future Trends
Introduction to CUDA
Data Parallelism
CUDA Program Structure
A Matrix-Matrix Multiplication Example
Device Memories and Data Transfer
Kernel Functions and Threading
Summary
Function declarations
Kernel launch
Predefined variables
Runtime API
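
The four summary topics above (function declarations, kernel launch, predefined variables, and the runtime API) fit in one minimal sketch; the kernel and variable names here are illustrative, not taken from the book's source code:

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Function declaration: __global__ marks a kernel callable from the host.
__global__ void addOne(float *data, int n) {
    // Predefined variables locate this thread within the grid.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

int main(void) {
    const int n = 256;
    float h[256];
    for (int i = 0; i < n; ++i) h[i] = (float)i;

    // Runtime API: allocate device memory and copy the input over.
    float *d;
    cudaMalloc((void **)&d, n * sizeof(float));
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);

    // Kernel launch: <<<grid, block>>> sets the thread organization.
    addOne<<<n / 64, 64>>>(d, n);

    cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);
    printf("h[10] = %f\n", h[10]);  // expect 11.0 on a CUDA-capable device
    return 0;
}
```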
CUDA Threads
CUDA Thread Organization
Using blockIdx and threadIdx
Synchronization and Transparent Scalability
Thread Assignment
Thread Scheduling and Latency Tolerance
Summary
Exercises
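
The thread-organization ideas in this chapter (2D blocks, blockIdx and threadIdx) can be sketched as follows; the kernel name and tile dimensions are made up for illustration:

```cuda
// Each thread handles one pixel of a width x height image; a 2D grid
// of 2D blocks covers the whole domain.
__global__ void invertImage(unsigned char *img, int width, int height) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    // Guard: edge blocks may extend past the image.
    if (row < height && col < width)
        img[row * width + col] = 255 - img[row * width + col];
}

// Host-side launch, using ceiling division so partial blocks
// cover the right and bottom edges:
//   dim3 block(16, 16);
//   dim3 grid((width + 15) / 16, (height + 15) / 16);
//   invertImage<<<grid, block>>>(d_img, width, height);
```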
CUDA Memories
Importance of Memory Access Efficiency
CUDA Device Memory Types
A Strategy for Reducing Global Memory Traffic
Memory as a Limiting Factor to Parallelism
Summary
Exercises
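
The "strategy for reducing global memory traffic" covered in this chapter is tiling with shared memory; a compact sketch (tile width and names chosen here for illustration, and matrix width assumed to be a multiple of the tile width):

```cuda
#define TILE 16

// Each block stages TILE x TILE tiles of M and N in on-chip shared
// memory, so each global-memory element is read once per tile rather
// than once per thread.
__global__ void matMulTiled(const float *M, const float *N,
                            float *P, int width) {
    __shared__ float Ms[TILE][TILE];
    __shared__ float Ns[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    for (int t = 0; t < width / TILE; ++t) {
        // Cooperative load: each thread brings in one element per tile.
        Ms[threadIdx.y][threadIdx.x] = M[row * width + t * TILE + threadIdx.x];
        Ns[threadIdx.y][threadIdx.x] = N[(t * TILE + threadIdx.y) * width + col];
        __syncthreads();  // wait until the tile is fully loaded

        for (int k = 0; k < TILE; ++k)
            acc += Ms[threadIdx.y][k] * Ns[k][threadIdx.x];
        __syncthreads();  // wait before overwriting the tile
    }
    P[row * width + col] = acc;
}
```

The shared-memory arrays also illustrate the chapter's last point: each block consumes 2 * TILE * TILE * 4 bytes of shared memory, so tile size trades off memory reuse against how many blocks fit on a streaming multiprocessor.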
Performance Considerations
More on Thread Execution
Global Memory Bandwidth
Dynamic Partitioning of SM Resources
Data Prefetching
Instruction Mix
Thread Granularity
Measured Performance and Summary
Exercises
Floating Point Considerations
Floating-Point Format
Normalized Representation of M
Excess Encoding of E
Representable Numbers
Special Bit Patterns and Precision
Arithmetic Accuracy and Rounding
Algorithm Considerations
Summary
Exercises
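
The single-precision format this chapter analyzes (a sign bit, an 8-bit excess-127 exponent E, and a 23-bit mantissa M with an implied leading 1 for normalized numbers) can be inspected directly; this host-side sketch uses illustrative names:

```cuda
#include <stdio.h>
#include <string.h>

int main(void) {
    float f = -6.5f;           // -6.5 = -1.101b * 2^2
    unsigned int bits;
    memcpy(&bits, &f, sizeof bits);

    unsigned int sign = bits >> 31;
    unsigned int E    = (bits >> 23) & 0xFF;  // stored excess-127 exponent
    unsigned int M    = bits & 0x7FFFFF;      // 23 fraction bits

    // Normalized value = (-1)^sign * (1.M) * 2^(E - 127)
    printf("sign=%u  exponent=%d  mantissa=0x%06X\n",
           sign, (int)E - 127, M);  // sign=1, exponent=2, mantissa=0x500000
    return 0;
}
```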
Application Case Study: Advanced MRI Reconstruction
Application Background
Iterative Reconstruction
Computing F^H d
Determine the Kernel Parallelism Structure
Getting Around the Memory Bandwidth Limitation
Using Hardware Trigonometry Functions
Experimental Performance Tuning
Final Evaluation
Exercises
Application Case Study: Molecular Visualization and Analysis
Application Background
A Simple Kernel Implementation
Instruction Execution Efficiency
Memory Coalescing
Additional Performance Comparisons
Using Multiple GPUs
Exercises
Parallel Programming and Computational Thinking
Goals of Parallel Programming
Problem Decomposition
Algorithm Selection
Computational Thinking
Exercises
A Brief Introduction to OpenCL
Background
Data Parallelism Model
Device Architecture
Kernel Functions
Device Management and Kernel Launch
Electrostatic Potential Map in OpenCL
Summary
Exercises
Conclusion And Future Outlook
Goals Revisited
Memory Architecture Evolution
Large Virtual and Physical Address Spaces
Unified Device Memory Space
Configurable Caching and Scratch Pad
Enhanced Atomic Operations
Enhanced Global Memory Access
Kernel Execution Control Evolution
Function Calls within Kernel Functions
Exception Handling in Kernel Functions
Simultaneous Execution of Multiple Kernels
Interruptible Kernels
Core Performance
Double-Precision Speed
Better Control Flow Efficiency
Programming Environment
A Bright Outlook
Matrix Multiplication Host-Only Version Source Code
matrixmul.cu
matrixmul_gold.cpp
matrixmul.h
assist.h
Expected Output
GPU Compute Capabilities
GPU Compute Capability Tables
Memory Coalescing Variations
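
The appendix's compute-capability tables can also be checked at run time with the standard runtime API; a minimal query sketch:

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        // prop.major/prop.minor form the compute capability,
        // e.g. 1.3 or 2.0 for the generations covered in this book.
        printf("Device %d: %s, compute capability %d.%d\n",
               d, prop.name, prop.major, prop.minor);
    }
    return 0;
}
```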
Index