Fine-Grain Scheduling under Resource Constraints
Mutation Scheduling: A Unified Approach to Compiling for Fine-Grain Parallelism
Compiler Techniques for Fine-Grain Execution on Workstation Clusters Using PAPERS
Solving Alignment Using Elementary Linear Algebra
Detecting and Using Affinity in an Automatic Data Distribution Tool
Array Distribution in Data-Parallel Programs
Communication-Free Parallelization via Affine Transformations
Finding Legal Reordering Transformations Using Mappings
A New Algorithm for Global Optimization for Parallelism and Locality
Polaris: Improving the Effectiveness of Parallelizing Compilers
A Formal Approach to the Compilation of Data-Parallel Languages
The Data Partitioning Graph: Extending Data and Control Dependencies for Data Partitioning
Detecting Value-Based Scalar Dependence
Minimal Data Dependence Abstractions for Loop Transformations
Differences in Algorithmic Parallelism in Control Flow and Call Multigraphs
Flow-Insensitive Interprocedural Alias Analysis in the Presence of Pointers
Incremental Generation of Index Sets for Array Statement Execution on Distributed-Memory Machines
A Unified Data-Flow Framework for Optimizing Communication
Interprocedural Communication Optimizations for Distributed Memory Compilation
Analysis of Event Synchronization in Parallel Programs
Computing Communication Sets for Control Parallel Programs
Optimizing Parallel SPMD Programs
An Overview of the Opus Language and Runtime System
SIMPLE Performance Results in ZPL
Cid: A Parallel, "Shared-Memory" C for Distributed-Memory Machines
EQ: Overview of a New Language Approach for Prototyping Scientific Computation
Reshaping Access Patterns for Generating Sparse Codes
Evaluating Two Loop Transformations for Reducing Multiple-Writer False Sharing
Parallelizing Tree Algorithms: Overhead vs. Parallelism
Autoscheduling in a Distributed Shared-Memory Environment
Optimizing Array Distributions in Data-Parallel Programs
Automatic Reduction Tree Generation for Fine-Grain Parallel Architectures when Iteration Count is Unknown