
CUDA Fortran for Scientists and Engineers: Best Practices for Efficient CUDA Fortran Programming


ISBN-10: 0124169708

ISBN-13: 9780124169708

Edition: 2014

Authors: Gregory Ruetsch, Massimiliano Fatica


Description:

CUDA Fortran for Scientists and Engineers shows how high-performance application developers can leverage the power of GPUs using Fortran, the familiar language of scientific computing and supercomputer performance benchmarking. The authors presume no prior parallel computing experience, and cover the basics along with best practices for efficient GPU computing using CUDA Fortran. To add CUDA Fortran to existing Fortran codes, they explain how to understand the target GPU architecture, identify computationally intensive parts of the code, and modify the code to manage data and parallelism and optimize performance - all in Fortran, without having to rewrite in another language.…
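The opening chapters listed in the contents below build this workflow up from minimal examples. As a rough illustration of what such a port looks like in practice, here is a short CUDA Fortran sketch (written for this listing, not excerpted from the book; the module, kernel, and variable names are illustrative): a loop over array elements becomes an attributes(global) kernel, data is staged in device arrays, and the kernel is launched with an execution configuration of blocks and threads.

    module simple_kernels
    contains
      attributes(global) subroutine increment(a, b)
        implicit none
        integer, intent(inout) :: a(:)
        integer, value :: b
        integer :: i
        ! global thread index: each thread updates one array element
        i = blockDim%x * (blockIdx%x - 1) + threadIdx%x
        if (i <= size(a)) a(i) = a(i) + b
      end subroutine increment
    end module simple_kernels

    program increment_test
      use cudafor
      use simple_kernels
      implicit none
      integer, parameter :: n = 1024, tpb = 256   ! tpb = threads per block
      integer :: a(n), b
      integer, device :: a_d(n)                   ! device-resident copy of a

      a = 1
      b = 3
      a_d = a                                     ! host-to-device copy via assignment
      call increment<<<(n + tpb - 1)/tpb, tpb>>>(a_d, b)
      a = a_d                                     ! device-to-host copy via assignment
      if (all(a == b + 1)) then
         print *, 'Test passed'
      else
         print *, 'Test failed'
      end if
    end program increment_test

With the PGI/NVIDIA Fortran compilers the book targets, a source file with the .cuf suffix is treated as CUDA Fortran, so a sketch like this compiles without extra flags; details are covered in the "Compiling CUDA Fortran Code" section of the contents.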

Book details

Copyright year: 2014
Publisher: Elsevier Science & Technology
Publication date: 10/24/2013
Binding: Paperback
Pages: 338
Size: 7.50" wide x 9.21" long x 0.75" tall
Weight: 1.540 lbs
Language: English

Greg Ruetsch is a Senior Applied Engineer at NVIDIA, where he works on CUDA Fortran and performance optimization of HPC codes. He holds a Bachelor's degree in mechanical and aerospace engineering from Rutgers University and a Ph.D. in applied mathematics from Brown University. Prior to joining NVIDIA, he held research positions at Stanford University's Center for Turbulence Research and at Sun Microsystems Laboratories.

Massimiliano Fatica is the manager of the Tesla HPC Group at NVIDIA, where he works in the area of GPU computing (high-performance computing and clusters). He holds a Laurea in Aeronautical Engineering and a Ph.D. in Theoretical and Applied Mechanics from the University of Rome "La Sapienza". Prior to joining NVIDIA, he was a research staff member at Stanford University, where he worked at the Center for Turbulence Research and the Center for Integrated Turbulent Simulations on applications for the Stanford Streaming Supercomputer.

Acknowledgments
Preface
CUDA Fortran Programming
Introduction
A Brief History of GPU Computing
Parallel Computation
Basic Concepts
A First CUDA Fortran Program
Extending to Larger Arrays
Multidimensional Arrays
Determining CUDA Hardware Features and Limits
Single and Double Precision
Error Handling
Compiling CUDA Fortran Code
Separate Compilation
Performance Measurement and Metrics
Measuring Kernel Execution Time
Host-Device Synchronization and CPU Timers
Timing via CUDA Events
Command Line Profiler
The nvprof Profiling Tool
Instruction, Bandwidth, and Latency Bound Kernels
Memory Bandwidth
Theoretical Peak Bandwidth
Effective Bandwidth
Actual Data Throughput vs. Effective Bandwidth
Optimization
Transfers between Host and Device
Pinned Memory
Batching Small Data Transfers
Asynchronous Data Transfers (Advanced Topic)
Device Memory
Declaring Data in Device Code
Coalesced Access to Global Memory
Texture Memory
Local Memory
Constant Memory
On-Chip Memory
L1 Cache
Registers
Shared Memory
Memory Optimization Example: Matrix Transpose
Partition Camping (Advanced Topic)
Execution Configuration
Thread-Level Parallelism
Instruction-Level Parallelism
Instruction Optimization
Device Intrinsics
Compiler Options
Divergent Warps
Kernel Loop Directives
Reductions in CUF Kernels
Streams in CUF Kernels
Instruction-Level Parallelism in CUF Kernels
Multi-GPU Programming
CUDA Multi-GPU Features
Peer-to-Peer Communication
Peer-to-Peer Direct Transfers
Peer-to-Peer Transpose
Multi-GPU Programming with MPI
Assigning Devices to MPI Ranks
MPI Transpose
GPU-Aware MPI Transpose
Case Studies
Monte Carlo Method
CURAND
Computing π with CUF Kernels
IEEE-754 Precision (Advanced Topic)
Computing π with Reduction Kernels
Reductions with Atomic Locks (Advanced Topic)
Accuracy of Summation
Option Pricing
Finite Difference Method
Nine-Point 1D Finite Difference Stencil
Data Reuse and Shared Memory
The x-Derivative Kernel
Derivatives in y and z
Nonuniform Grids
2D Laplace Equation
Applications of Fast Fourier Transform
CUFFT
Spectral Derivatives
Convolution
Poisson Solver
Appendices
Tesla Specifications
System and Environment Management
Environment Variables
General
Command Line Profiler
Just-in-Time Compilation
nvidia-smi System Management Interface
Enabling and Disabling ECC
Compute Mode
Persistence Mode
Calling CUDA C from CUDA Fortran
Calling CUDA C Libraries
Calling User-Written CUDA C Code
Source Code
Texture Memory
Matrix Transpose
Thread- and Instruction-Level Parallelism
Multi-GPU Programming
Peer-to-Peer Transpose
MPI Transpose with Host MPI Transfers
MPI Transpose with Device MPI Transfers
Finite Difference Code
Spectral Poisson Solver
References
Index