| |
| |
Preface | |
| |
| |
Contributors | |
| |
| |
| |
Programming Model | |
| |
| |
| |
ClusterGOP: A High-Level Programming Environment for Clusters | |
| |
| |
| |
| |
Introduction | |
| |
| |
| |
GOP Model and ClusterGOP Architecture | |
| |
| |
| |
VisualGOP | |
| |
| |
| |
The ClusterGOP Library | |
| |
| |
| |
MPMD Programming Support | |
| |
| |
| |
Programming Using ClusterGOP | |
| |
| |
| |
Summary | |
| |
| |
| |
The Challenge of Providing a High-Level Programming Model for High-Performance Computing | |
| |
| |
| |
| |
Introduction | |
| |
| |
| |
HPC Architectures | |
| |
| |
| |
HPC Programming Models: The First Generation | |
| |
| |
| |
The Second Generation of HPC Programming Models | |
| |
| |
| |
OpenMP for DMPs | |
| |
| |
| |
Experiments with OpenMP on DMPs | |
| |
| |
| |
Conclusions | |
| |
| |
| |
SAT: Toward Structured Parallelism Using Skeletons | |
| |
| |
| |
| |
Introduction | |
| |
| |
| |
SAT: A Methodology Outline | |
| |
| |
| |
Skeletons and Collective Operations | |
| |
| |
| |
Case Study: Maximum Segment Sum (MSS) | |
| |
| |
| |
Performance Aspect in SAT | |
| |
| |
| |
Conclusions and Related Work | |
| |
| |
| |
Bulk-Synchronous Parallelism: An Emerging Paradigm of High-Performance Computing | |
| |
| |
| |
| |
The BSP Model | |
| |
| |
| |
BSP Programming | |
| |
| |
| |
Conclusions | |
| |
| |
| |
Cilk Versus MPI: Comparing Two Parallel Programming Styles on Heterogeneous Systems | |
| |
| |
| |
| |
Introduction | |
| |
| |
| |
Experiments | |
| |
| |
| |
Results | |
| |
| |
| |
Conclusion | |
| |
| |
| |
Nested Parallelism and Pipelining in OpenMP | |
| |
| |
| |
| |
Introduction | |
| |
| |
| |
OpenMP Extensions for Nested Parallelism | |
| |
| |
| |
OpenMP Extensions for Thread Synchronization | |
| |
| |
| |
Summary | |
| |
| |
| |
OpenMP for Chip Multiprocessors | |
| |
| |
| |
| |
Introduction | |
| |
| |
| |
3SoC Architecture Overview | |
| |
| |
| |
The OpenMP Compiler/Translator | |
| |
| |
| |
Extensions to OpenMP for DSEs | |
| |
| |
| |
Optimization for OpenMP | |
| |
| |
| |
Implementation | |
| |
| |
| |
Performance Evaluation | |
| |
| |
| |
Conclusions | |
| |
| |
| |
Architectural and System Support | |
| |
| |
| |
Compiler and Run-Time Parallelization Techniques for Scientific Computations on Distributed-Memory Parallel Computers | |
| |
| |
| |
| |
Introduction | |
| |
| |
| |
Background Material | |
| |
| |
| |
Compiling Regular Programs on DMPCs | |
| |
| |
| |
Compiler and Run-Time Support for Irregular Programs | |
| |
| |
| |
Library Support for Irregular Applications | |
| |
| |
| |
Related Works | |
| |
| |
| |
Concluding Remarks | |
| |
| |
| |
Enabling Partial-Cache Line Prefetching Through Data Compression | |
| |
| |
| |
| |
Introduction | |
| |
| |
| |
Motivation of Partial Cache-Line Prefetching | |
| |
| |
| |
Cache Design Details | |
| |
| |
| |
Experimental Results | |
| |
| |
| |
Related Work | |
| |
| |
| |
Conclusion | |
| |
| |
| |
MPI Atomicity and Concurrent Overlapping I/O | |
| |
| |
| |
| |
Introduction | |
| |
| |
| |
Concurrent Overlapping I/O | |
| |
| |
| |
Implementation Strategies | |
| |
| |
| |
Experiment Results | |
| |
| |
| |
Summary | |
| |
| |
| |
Code Tiling: One Size Fits All | |
| |
| |
| |
| |
Introduction | |
| |
| |
| |
Cache Model | |
| |
| |
| |
Code Tiling | |
| |
| |
| |
Data Tiling | |
| |
| |
| |
Finding Optimal Tile Sizes | |
| |
| |
| |
Experimental Results | |
| |
| |
| |
Related Work | |
| |
| |
| |
Conclusion | |
| |
| |
| |
Data Conversion for Heterogeneous Migration/Checkpointing | |
| |
| |
| |
| |
Introduction | |
| |
| |
| |
Migration and Checkpointing | |
| |
| |
| |
Data Conversion | |
| |
| |
| |
Coarse-Grain Tagged RMR in MigThread | |
| |
| |
| |
Microbenchmarks and Experiments | |
| |
| |
| |
Related Work | |
| |
| |
| |
Conclusions and Future Work | |
| |
| |
| |
Receiving-Message Prediction and Its Speculative Execution | |
| |
| |
| |
| |
Background | |
| |
| |
| |
Receiving-Message Prediction Meth | |