| |
| |
Foreword | |
| |
| |
| |
Preface | |
| |
| |
| |
Introduction | |
| |
| |
| |
Performance with OpenMP | |
| |
| |
| |
A First Glimpse of OpenMP | |
| |
| |
| |
The OpenMP Parallel Computer | |
| |
| |
| |
Why OpenMP? | |
| |
| |
| |
History of OpenMP | |
| |
| |
| |
Navigating the Rest of the Book | |
| |
| |
| |
Getting Started with OpenMP | |
| |
| |
| |
Introduction | |
| |
| |
| |
OpenMP from 10,000 Meters | |
| |
| |
| |
OpenMP Compiler Directives or Pragmas | |
| |
| |
| |
Parallel Control Structures | |
| |
| |
| |
Communication and Data Environment | |
| |
| |
| |
Synchronization | |
| |
| |
| |
Parallelizing a Simple Loop | |
| |
| |
| |
Runtime Execution Model of an OpenMP Program | |
| |
| |
| |
Communication and Data Scoping | |
| |
| |
| |
Synchronization in the Simple Loop Example | |
| |
| |
| |
Final Words on the Simple Loop Example | |
| |
| |
| |
A More Complicated Loop | |
| |
| |
| |
Explicit Synchronization | |
| |
| |
| |
The reduction Clause | |
| |
| |
| |
Expressing Parallelism with Parallel Regions | |
| |
| |
| |
Concluding Remarks | |
| |
| |
| |
Exercises | |
| |
| |
| |
Exploiting Loop-Level Parallelism | |
| |
| |
| |
Introduction | |
| |
| |
| |
Form and Usage of the parallel do Directive | |
| |
| |
| |
Clauses | |
| |
| |
| |
Restrictions on Parallel Loops | |
| |
| |
| |
Meaning of the parallel do Directive | |
| |
| |
| |
Loop Nests and Parallelism | |
| |
| |
| |
Controlling Data Sharing | |
| |
| |
| |
General Properties of Data Scope Clauses | |
| |
| |
| |
The shared Clause | |
| |
| |
| |
The private Clause | |
| |
| |
| |
Default Variable Scopes | |
| |
| |
| |
Changing Default Scoping Rules | |
| |
| |
| |
Parallelizing Reduction Operations | |
| |
| |
| |
Private Variable Initialization and Finalization | |
| |
| |
| |
Removing Data Dependences | |
| |
| |
| |
Why Data Dependences Are a Problem | |
| |
| |
| |
The First Step: Detection | |
| |
| |
| |
The Second Step: Classification | |
| |
| |
| |
The Third Step: Removal | |
| |
| |
| |
Summary | |
| |
| |
| |
Enhancing Performance | |
| |
| |
| |
Ensuring Sufficient Work | |
| |
| |
| |
Scheduling Loops to Balance the Load | |
| |
| |
| |
Static and Dynamic Scheduling | |
| |
| |
| |
Scheduling Options | |
| |
| |
| |
Comparison of Runtime Scheduling Behavior | |
| |
| |
| |
Concluding Remarks | |
| |
| |
| |
Exercises | |
| |
| |
| |
Beyond Loop-Level Parallelism: Parallel Regions | |
| |
| |
| |
Introduction | |
| |
| |
| |
Form and Usage of the parallel Directive | |
| |
| |
| |
Clauses on the parallel Directive | |
| |
| |
| |
Restrictions on the parallel Directive | |
| |
| |
| |
Meaning of the parallel Directive | |
| |
| |
| |
Parallel Regions and SPMD-Style Parallelism | |
| |
| |
| |
threadprivate Variables and the copyin Clause | |
| |
| |
| |
The threadprivate Directive | |
| |
| |
| |
The copyin Clause | |
| |
| |
| |
Work-Sharing in Parallel Regions | |
| |
| |
| |
A Parallel Task Queue | |
| |
| |
| |
Dividing Work Based on Thread Number | |
| |
| |
| |
Work-Sharing Constructs in OpenMP | |
| |
| |
| |
Restrictions on Work-Sharing Constructs | |
| |
| |
| |
Block Structure | |
| |
| |
| |
Entry and Exit | |
| |
| |
| |
Nesting of Work-Sharing Constructs | |
| |
| |
| |
Orphaning of Work-Sharing Constructs | |
| |
| |
| |
Data Scoping of Orphaned Constructs | |
| |
| |
| |
Writing Code with Orphaned Work-Sharing Constructs | |
| |
| |
| |
Nested Parallel Regions | |
| |
| |
| |
Directive Nesting and Binding | |
| |
| |
| |
Controlling Parallelism in an OpenMP Program | |
| |
| |
| |
Dynamically Disabling the parallel Directives | |
| |
| |
| |
Controlling the Number of Threads | |
| |
| |
| |
Dynamic Threads | |
| |
| |
| |
Runtime Library Calls and Environment Variables | |
| |
| |
| |
Concluding Remarks | |
| |
| |
| |
Exercises | |
| |
| |
| |
Synchronization | |
| |
| |
| |
Introduction | |
| |
| |
| |
Data Conflicts and the Need for Synchronization | |
| |
| |
| |
Getting Rid of Data Races | |
| |
| |
| |
Examples of Acceptable Data Races | |
| |
| |
| |
Synchronization Mechanisms in OpenMP | |
| |
| |
| |
Mutual Exclusion Synchronization | |
| |
| |
| |
The Critical Section Directive | |
| |
| |
| |
The atomic Directive | |
| |
| |
| |
Runtime Library Lock Routines | |
| |
| |
| |
Event Synchronization | |
| |
| |
| |
Barriers | |
| |
| |
| |
Ordered Sections | |
| |
| |
| |
The master Directive | |
| |
| |
| |
Custom Synchronization: Rolling Your Own | |
| |
| |
| |
The flush Directive | |
| |
| |
| |
Some Practical Considerations | |
| |
| |
| |
Concluding Remarks | |
| |
| |
| |
Exercises | |
| |
| |
| |
Performance | |
| |
| |
| |
Introduction | |
| |
| |
| |
Key Factors That Impact Performance | |
| |
| |
| |
Coverage and Granularity | |
| |
| |
| |
Load Balance | |
| |
| |
| |
Locality | |
| |
| |
| |
Synchronization | |
| |
| |
| |
Performance-Tuning Methodology | |
| |
| |
| |
Dynamic Threads | |
| |
| |
| |
Bus-Based and NUMA Machines | |
| |
| |
| |
Concluding Remarks | |
| |
| |
| |
Exercises | |
| |
| |
| |
A Quick Reference to OpenMP | |
| |
| |
References | |
| |
| |
Index | |