Introduction to High Performance Computing for Scientists and Engineers

ISBN-10: 143981192X

ISBN-13: 9781439811924

Edition: 2010

Authors: Georg Hager, Gerhard Wellein

List price: $46.99

Book details

Copyright year: 2010
Publisher: CRC Press
Publication date: 7/7/2010
Binding: Paperback
Pages: 356
Size: 5.75" wide x 9.00" long x 0.75" tall
Weight: 1.342 lbs
Language: English

Author affiliation: University of Erlangen-Nuremberg, Germany

Table of contents

Foreword
Preface
About the authors
List of acronyms and abbreviations
Modern processors
  Stored-program computer architecture
  General-purpose cache-based microprocessor architecture
    Performance metrics and benchmarks
    Transistors galore: Moore's Law
    Pipelining
    Superscalarity
    SIMD
  Memory hierarchies
    Cache
    Cache mapping
    Prefetch
  Multicore processors
  Multithreaded processors
  Vector processors
    Design principles
    Maximum performance estimates
    Programming for vector architectures
Basic optimization techniques for serial code
  Scalar profiling
    Function- and line-based runtime profiling
    Hardware performance counters
    Manual instrumentation
  Common sense optimizations
    Do less work!
    Avoid expensive operations!
    Shrink the working set!
  Simple measures, large impact
    Elimination of common subexpressions
    Avoiding branches
    Using SIMD instruction sets
  The role of compilers
    General optimization options
    Inlining
    Aliasing
    Computational accuracy
    Register optimizations
    Using compiler logs
  C++ optimizations
    Temporaries
    Dynamic memory management
    Loop kernels and iterators
Data access optimization
  Balance analysis and lightspeed estimates
    Bandwidth-based performance modeling
    The STREAM benchmarks
  Storage order
  Case study: The Jacobi algorithm
  Case study: Dense matrix transpose
  Algorithm classification and access optimizations
    O(N)/O(N)
    O(N²)/O(N²)
    O(N³)/O(N²)
  Case study: Sparse matrix-vector multiply
    Sparse matrix storage schemes
    Optimizing JDS sparse MVM
Parallel computers
  Taxonomy of parallel computing paradigms
  Shared-memory computers
    Cache coherence
    UMA
    ccNUMA
  Distributed-memory computers
  Hierarchical (hybrid) systems
  Networks
    Basic performance characteristics of networks
    Buses
    Switched and fat-tree networks
    Mesh networks
    Hybrids
Basics of parallelization
  Why parallelize?
  Parallelism
    Data parallelism
    Functional parallelism
  Parallel scalability
    Factors that limit parallel execution
    Scalability metrics
    Simple scalability laws
    Parallel efficiency
    Serial performance versus strong scalability
    Refined performance models
    Choosing the right scaling baseline
    Case study: Can slower processors compute faster?
    Load imbalance
Shared-memory parallel programming with OpenMP
  Short introduction to OpenMP
    Parallel execution
    Data scoping
    OpenMP worksharing for loops
    Synchronization
    Reductions
    Loop scheduling
    Tasking
    Miscellaneous
  Case study: OpenMP-parallel Jacobi algorithm
  Advanced OpenMP: Wavefront parallelization
Efficient OpenMP programming
  Profiling OpenMP programs
  Performance pitfalls
    Ameliorating the impact of OpenMP worksharing constructs
    Determining OpenMP overhead for short loops
    Serialization
    False sharing
  Case study: Parallel sparse matrix-vector multiply
Locality optimizations on ccNUMA architectures
  Locality of access on ccNUMA
    Page placement by first touch
    Access locality by other means
  Case study: ccNUMA optimization of sparse MVM
  Placement pitfalls
    NUMA-unfriendly OpenMP scheduling
    File system cache
  ccNUMA issues with C++
    Arrays of objects
    Standard Template Library
Distributed-memory parallel programming with MPI
  Message passing
  A short introduction to MPI
    A simple example
    Messages and point-to-point communication
    Collective communication
    Nonblocking point-to-point communication
    Virtual topologies
  Example: MPI parallelization of a Jacobi solver
    MPI implementation
    Performance properties
Efficient MPI programming
  MPI performance tools
  Communication parameters
  Synchronization, serialization, contention
    Implicit serialization and synchronization
    Contention
  Reducing communication overhead
    Optimal domain decomposition
    Aggregating messages
    Nonblocking vs. asynchronous communication
    Collective communication
  Understanding intranode point-to-point communication
Hybrid parallelization with MPI and OpenMP
  Basic MPI/OpenMP programming models
    Vector mode implementation
    Task mode implementation
    Case study: Hybrid Jacobi solver
  MPI taxonomy of thread interoperability
  Hybrid decomposition and mapping
  Potential benefits and drawbacks of hybrid programming
Topology and affinity in multicore environments
  Topology
  Thread and process placement
    External affinity control
    Affinity under program control
  Page placement beyond first touch
Solutions to the problems
Bibliography
Index