Programming Massively Parallel Processors: A Hands-On Approach

ISBN-10: 0123814723
ISBN-13: 9780123814722
Edition: 2010
List price: $69.95

Book details

List price: $69.95
Copyright year: 2010
Publisher: Elsevier Science & Technology Books
Publication date: 1/22/2010
Binding: Paperback
Pages: 280
Size: 7.50" wide x 9.25" long x 0.75" tall
Weight: 1.584
Language: English

Wen-mei W. Hwu is the Walter J. ("Jerry") Sanders III-Advanced Micro Devices Endowed Chair in Electrical and Computer Engineering in the Coordinated Science Laboratory of the University of Illinois at Urbana-Champaign. From 1997 to 1999, Dr. Hwu served as the chairman of the Computer Engineering Program at the University of Illinois. Dr. Hwu received his Ph.D. degree in Computer Science from the University of California, Berkeley. His research interests are in the areas of architecture, implementation, and software for high-performance computer systems. He is the director of the OpenIMPACT project, which has delivered new compiler and computer architecture technologies to the computer industry since 1987. He also serves as the Soft Systems Theme leader of the MARCO/DARPA Gigascale Silicon Research Center (GSRC) and on the Executive Committees of both the GSRC and the MARCO/DARPA Center for Circuit and System Solutions. For his contributions to the areas of compiler optimization and computer architecture, he received the 1993 Eta Kappa Nu Outstanding Young Electrical Engineer Award, the 1994 Xerox Award for Faculty Research, the 1994 University Scholar Award of the University of Illinois, the 1997 Eta Kappa Nu Holmes MacDonald Outstanding Teaching Award, the 1998 ACM SigArch Maurice Wilkes Award, the 1999 ACM Grace Murray Hopper Award, and the 2001 Tau Beta Pi Daniel C. Drucker Eminent Faculty Award. He served as the Franklin Woeltge Distinguished Professor of Electrical and Computer Engineering from 2000 to 2004. He is a fellow of IEEE and ACM.

Preface
Acknowledgments
Dedication
Introduction
GPUs as Parallel Computers
Architecture of a Modern GPU
Why More Speed or Parallelism?
Parallel Programming Languages and Models
Overarching Goals
Organization of the Book
History of GPU Computing
Evolution of Graphics Pipelines
The Era of Fixed-Function Graphics Pipelines
Evolution of Programmable Real-Time Graphics
Unified Graphics and Computing Processors
GPGPU: An Intermediate Step
GPU Computing
Scalable GPUs
Recent Developments
Future Trends
Introduction to CUDA
Data Parallelism
CUDA Program Structure
A Matrix-Matrix Multiplication Example
Device Memories and Data Transfer
Kernel Functions and Threading
Summary
Function declarations
Kernel launch
Predefined variables
Runtime API
CUDA Threads
CUDA Thread Organization
Using blockIdx and threadIdx
Synchronization and Transparent Scalability
Thread Assignment
Thread Scheduling and Latency Tolerance
Summary
Exercises
CUDA Memories
Importance of Memory Access Efficiency
CUDA Device Memory Types
A Strategy for Reducing Global Memory Traffic
Memory as a Limiting Factor to Parallelism
Summary
Exercises
Performance Considerations
More on Thread Execution
Global Memory Bandwidth
Dynamic Partitioning of SM Resources
Data Prefetching
Instruction Mix
Thread Granularity
Measured Performance and Summary
Exercises
Floating Point Considerations
Floating-Point Format
Normalized Representation of M
Excess Encoding of E
Representable Numbers
Special Bit Patterns and Precision
Arithmetic Accuracy and Rounding
Algorithm Considerations
Summary
Exercises
Application Case Study: Advanced MRI Reconstruction
Application Background
Iterative Reconstruction
Computing F<sup>H</sup>d
Determine the Kernel Parallelism Structure
Getting Around the Memory Bandwidth Limitation
Using Hardware Trigonometry Functions
Experimental Performance Tuning
Final Evaluation
Exercises
Application Case Study: Molecular Visualization and Analysis
Application Background
A Simple Kernel Implementation
Instruction Execution Efficiency
Memory Coalescing
Additional Performance Comparisons
Using Multiple GPUs
Exercises
Parallel Programming and Computational Thinking
Goals of Parallel Programming
Problem Decomposition
Algorithm Selection
Computational Thinking
Exercises
A Brief Introduction to OpenCL
Background
Data Parallelism Model
Device Architecture
Kernel Functions
Device Management and Kernel Launch
Electrostatic Potential Map in OpenCL
Summary
Exercises
Conclusion and Future Outlook
Goals Revisited
Memory Architecture Evolution
Large Virtual and Physical Address Spaces
Unified Device Memory Space
Configurable Caching and Scratch Pad
Enhanced Atomic Operations
Enhanced Global Memory Access
Kernel Execution Control Evolution
Function Calls within Kernel Functions
Exception Handling in Kernel Functions
Simultaneous Execution of Multiple Kernels
Interruptible Kernels
Core Performance
Double-Precision Speed
Better Control Flow Efficiency
Programming Environment
A Bright Outlook
Matrix Multiplication Host-Only Version Source Code
matrixmul.cu
matrixmul_gold.cpp
matrixmul.h
assist.h
Expected Output
GPU Compute Capabilities
GPU Compute Capability Tables
Memory Coalescing Variations
Index
