Preface

Acknowledgments

Dedication

Introduction
    GPUs as Parallel Computers
    Architecture of a Modern GPU
    Why More Speed or Parallelism?
    Parallel Programming Languages and Models
    Overarching Goals
    Organization of the Book

History Of GPU Computing
    Evolution of Graphics Pipelines
        The Era of Fixed-Function Graphics Pipelines
        Evolution of Programmable Real-Time Graphics
        Unified Graphics and Computing Processors
        GPGPU: An Intermediate Step
    GPU Computing
        Scalable GPUs
        Recent Developments
    Future Trends

Introduction To Cuda
    Data Parallelism
    Cuda Program Structure
    A Matrix-Matrix Multiplication Example
    Device Memories and Data Transfer
    Kernel Functions and Threading
    Summary
        Function declarations
        Kernel launch
        Predefined variables
        Runtime API

Cuda Threads
    Cuda Thread Organization
    Using blockIdx and threadIdx
    Synchronization and Transparent Scalability
    Thread Assignment
    Thread Scheduling and Latency Tolerance
    Summary
    Exercises

Cuda Memories
    Importance of Memory Access Efficiency
    CUDA Device Memory Types
    A Strategy for Reducing Global Memory Traffic
    Memory as a Limiting Factor to Parallelism
    Summary
    Exercises

Performance Considerations
    More on Thread Execution
    Global Memory Bandwidth
    Dynamic Partitioning of SM Resources
    Data Prefetching
    Instruction Mix
    Thread Granularity
    Measured Performance and Summary
    Exercises

Floating Point Considerations
    Floating-Point Format
        Normalized Representation of M
        Excess Encoding of E
    Representable Numbers
    Special Bit Patterns and Precision
    Arithmetic Accuracy and Rounding
    Algorithm Considerations
    Summary
    Exercises

Application Case Study: Advanced MRI Reconstruction
    Application Background
    Iterative Reconstruction
    Computing F^Hd
        Determine the Kernel Parallelism Structure
        Getting Around the Memory Bandwidth Limitation
        Using Hardware Trigonometry Functions
        Experimental Performance Tuning
    Final Evaluation
    Exercises

Application Case Study: Molecular Visualization and Analysis
    Application Background
    A Simple Kernel Implementation
    Instruction Execution Efficiency
    Memory Coalescing
    Additional Performance Comparisons
    Using Multiple GPUs
    Exercises

Parallel Programming and Computational Thinking
    Goals of Parallel Programming
    Problem Decomposition
    Algorithm Selection
    Computational Thinking
    Exercises

A Brief Introduction To OpenCL
    Background
    Data Parallelism Model
    Device Architecture
    Kernel Functions
    Device Management and Kernel Launch
    Electrostatic Potential Map in OpenCL
    Summary
    Exercises

Conclusion And Future Outlook
    Goals Revisited
    Memory Architecture Evolution
        Large Virtual and Physical Address Spaces
        Unified Device Memory Space
        Configurable Caching and Scratch Pad
        Enhanced Atomic Operations
        Enhanced Global Memory Access
    Kernel Execution Control Evolution
        Function Calls within Kernel Functions
        Exception Handling in Kernel Functions
        Simultaneous Execution of Multiple Kernels
        Interruptible Kernels
    Core Performance
        Double-Precision Speed
        Better Control Flow Efficiency
    Programming Environment
    A Bright Outlook

Matrix Multiplication Host-Only Version Source Code
    matrixmul.cu
    matrixmul_gold.cpp
    matrixmul.h
    assist.h
    Expected Output

GPU Compute Capabilities
    GPU Compute Capability Tables
    Memory Coalescing Variations

Index