CUDA Fortran for Scientists and Engineers: Best Practices for Efficient CUDA Fortran Programming

ISBN-10: 0124169708
ISBN-13: 9780124169708
Edition: N/A


Book details

Publisher: Elsevier Science & Technology
Publication date: 10/24/2013
Binding: Paperback
Pages: 338
Size: 7.50" wide x 9.00" long x 0.75" tall
Weight: 1.584
Language: English

CUDA Fortran for Scientists and Engineers shows how high-performance application developers can leverage the power of GPUs using Fortran, the familiar language of scientific computing and supercomputer performance benchmarking. The authors presume no prior parallel computing experience, and cover the basics along with best practices for efficient GPU computing using CUDA Fortran. In order to add CUDA Fortran to existing Fortran codes, they explain how to understand the target GPU architecture, identify computationally intensive parts of the code, and modify the code to manage the data and parallelism and optimize performance - all in Fortran, without having to rewrite in another language. Each concept is illustrated with actual examples so you can immediately evaluate the performance of your code in comparison.

Leverage the power of GPU computing with PGI's CUDA Fortran compiler
Gain insights from members of the CUDA Fortran language development team
Offers deep coverage of optimizing Fortran code for GPU architectures
Includes multi-GPU programming in CUDA Fortran, covering both peer-to-peer and message passing interface (MPI) approaches
Includes full source code for all the examples and several case studies
Download source code and slides from the book's companion website

Massimiliano Fatica is the manager of the Tesla HPC Group at NVIDIA, where he works in the area of GPU computing (high-performance computing and clusters). He holds a laurea in Aeronautical Engineering and a Ph.D. in Theoretical and Applied Mechanics from the University of Rome "La Sapienza." Prior to joining NVIDIA, he was a research staff member at Stanford University, where he worked at the Center for Turbulence Research and the Center for Integrated Turbulent Simulations on applications for the Stanford Streaming Supercomputer.

Greg Ruetsch is a Senior Applied Engineer at NVIDIA, where he works on CUDA Fortran and performance optimization of HPC codes. He holds a Bachelor's degree in mechanical and aerospace engineering from Rutgers University and a Ph.D. in applied mathematics from Brown University. Prior to joining NVIDIA, he held research positions at Stanford University's Center for Turbulence Research and Sun Microsystems Laboratories.

Acknowledgments
Preface
CUDA Fortran Programming
Introduction
A Brief History of GPU Computing
Parallel Computation
Basic Concepts
A First CUDA Fortran Program
Extending to Larger Arrays
Multidimensional Arrays
Determining CUDA Hardware Features and Limits
Single and Double Precision
Error Handling
Compiling CUDA Fortran Code
Separate Compilation
Performance Measurement and Metrics
Measuring Kernel Execution Time
Host-Device Synchronization and CPU Timers
Timing via CUDA Events
Command Line Profiler
The nvprof Profiling Tool
Instruction, Bandwidth, and Latency Bound Kernels
Memory Bandwidth
Theoretical Peak Bandwidth
Effective Bandwidth
Actual Data Throughput vs. Effective Bandwidth
Optimization
Transfers between Host and Device
Pinned Memory
Batching Small Data Transfers
Asynchronous Data Transfers (Advanced Topic)
Device Memory
Declaring Data in Device Code
Coalesced Access to Global Memory
Texture Memory
Local Memory
Constant Memory
On-Chip Memory
L1 Cache
Registers
Shared Memory
Memory Optimization Example: Matrix Transpose
Partition Camping (Advanced Topic)
Execution Configuration
Thread-Level Parallelism
Instruction-Level Parallelism
Instruction Optimization
Device Intrinsics
Compiler Options
Divergent Warps
Kernel Loop Directives
Reductions in CUF Kernels
Streams in CUF Kernels
Instruction-Level Parallelism in CUF Kernels
Multi-GPU Programming
CUDA Multi-GPU Features
Peer-to-Peer Communication
Peer-to-Peer Direct Transfers
Peer-to-Peer Transpose
Multi-GPU Programming with MPI
Assigning Devices to MPI Ranks
MPI Transpose
GPU-Aware MPI Transpose
Case Studies
Monte Carlo Method
CURAND
Computing π with CUF Kernels
IEEE-754 Precision (Advanced Topic)
Computing π with Reduction Kernels
Reductions with Atomic Locks (Advanced Topic)
Accuracy of Summation
Option Pricing
Finite Difference Method
Nine-Point 1D Finite Difference Stencil
Data Reuse and Shared Memory
The x-Derivative Kernel
Derivatives in y and z
Nonuniform Grids
2D Laplace Equation
Applications of Fast Fourier Transform
CUFFT
Spectral Derivatives
Convolution
Poisson Solver
Appendices
Tesla Specifications
System and Environment Management
Environment Variables
General
Command Line Profiler
Just-in-Time Compilation
nvidia-smi System Management Interface
Enabling and Disabling ECC
Compute Mode
Persistence Mode
Calling CUDA C from CUDA Fortran
Calling CUDA C Libraries
Calling User-Written CUDA C Code
Source Code
Texture Memory
Matrix Transpose
Thread- and Instruction-Level Parallelism
Multi-GPU Programming
Peer-to-Peer Transpose
MPI Transpose with Host MPI Transfers
MPI Transpose with Device MPI Transfers
Finite Difference Code
Spectral Poisson Solver
References
Index
