2013 IEEE High Performance
Extreme Computing Conference
(HPEC ’13)
Seventeenth Annual HPEC Conference
10 - 12 September 2013
Westin Hotel, Waltham, MA USA
A Mechanism to Improve the Performance of Hybrid MPI-OpenMP Applications in Grid
Shikha Mehrotra, C-DAC; Shamjith K V, C-DAC; Prachi Pandey, C-DAC; Asvija B, C-DAC; Sridharan R, C-DAC
Abstract: In the current scenario of grid computing, heterogeneous resources are distributed across different administrative domains
and geographical boundaries. Every node in a cluster contains multicore CPUs, so that distributed memory across nodes and
shared memory within a node coexist, paving the way for hybrid architectures. The hybrid programming approach combines MPI and OpenMP
libraries to exploit this hierarchical multicore architecture. A clear understanding of the requirements of such hybrid applications and knowledge of the
system architecture help to boost application performance. Scheduling these hybrid applications on the grid is therefore a critical
task for obtaining better performance. In this paper, we outline our attempt to improve the scheduling mechanism for
hybrid applications based on the requirements of the application.
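The abstract does not describe the scheduling mechanism itself. As a purely illustrative sketch of requirement-aware placement, the Python below assumes a hybrid job states how many MPI ranks it needs and how many OpenMP threads each rank runs, and that each rank must fit on a single node. The names `Node` and `place_hybrid_job` and the greedy policy are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_cores: int


def place_hybrid_job(nodes, n_ranks, threads_per_rank):
    """Greedy placement of a hybrid job (hypothetical policy, not the paper's).

    Each MPI rank needs threads_per_rank free cores on one node, because
    OpenMP threads share memory and cannot span nodes.
    Returns {node name: ranks placed there}, or None if the job does not fit.
    """
    if n_ranks <= 0 or threads_per_rank <= 0:
        raise ValueError("n_ranks and threads_per_rank must be positive")

    placement = {}
    remaining = n_ranks
    # Visit the nodes with the most free cores first so the ranks stay
    # packed on as few nodes as possible.
    for node in sorted(nodes, key=lambda n: n.free_cores, reverse=True):
        if remaining == 0:
            break
        fit = min(node.free_cores // threads_per_rank, remaining)
        if fit > 0:
            placement[node.name] = fit
            remaining -= fit
    return placement if remaining == 0 else None
```

For example, with nodes offering 8, 4, and 16 free cores, a job of 5 ranks with 4 threads each is placed as 4 ranks on the 16-core node and 1 rank on the 8-core node, while a job of 8 such ranks is rejected because only 7 ranks fit.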
Expanding the High Performance Embedded Computing Tool Chest - Mixing C and Java™
Nazario Irizarry, The MITRE Corporation
Abstract: High performance embedded computing systems are often implemented in the C language to achieve the utmost in speed. In
light of continued budget reductions and the ever-present desire for quicker development timelines, safer and more productive
languages need to be used as well. Java is often overlooked due to the perception that it is slow. Oracle Java 7 and C were compared to
better understand their relative performance in single and multicore applications. Java performed as well as C in many of the tests. The
quantitative findings and the conditions under which Java performs well help design solutions that exploit Java's code safety and
productivity.
Re-Introduction of Communication-Avoiding FMM-Accelerated FFTs with GPU Acceleration
M Harper Langston, Reservoir Labs; Muthu Baskaran, Reservoir Labs; Benoit Meister, Reservoir Labs; Nicolas Vasilache,
Reservoir Labs; Richard Lethin, Reservoir Labs
Abstract: As distributed memory systems grow larger, communication demands have increased. Unfortunately, while the costs of
arithmetic operations continue to decrease rapidly, communication costs have not. As a result, there has been a growing interest in
communication-avoiding algorithms for some of the classic problems in numerical computing. For example, there have been exciting
innovations in the development of communication-avoiding Fast Fourier Transforms (FFTs). A previously developed low-
communication FFT, however, has remained largely out of the picture, partially due to its reliance on the Fast Multipole Method
(FMM), an algorithm which aids in accelerating dense computations. In light of the renewed interest in this method and other low-
communication FFTs, we have begun an algorithmic investigation and re-implementation design for the FMM-FFT, which exploits the
ability to tune the precision of the result (due to the mathematical nature of the FMM) to reduce power-burning communication and
computation, potentially reducing the energy required for the fundamental transform of digital signal processing.
We reintroduce this algorithm and discuss innovations we have developed to separate the distinct portions of the FMM into
a CPU-dedicated process, relying on inter-processor communication for approximate interactions, and a GPU-dedicated process for
dense interactions with no communication.
Accelerating a Novel Particle-based Fluid Simulation on the GPU
Zhilu Chen, WPI; James Kingsley, WPI; Xinming Huang, Worcester Polytechnic Institute; Erkan Tuzel,
Abstract: Stochastic Rotation Dynamics (SRD) is a novel particle-based simulation method that can be used to model complex fluids,
such as binary and ternary mixtures, and polymer solutions, in either two or three dimensions. Although SRD is efficient compared to
traditional methods, it is still computationally expensive for large system sizes, e.g. when using a large array of particles to simulate
dense polymer solutions. Recently, as the power offered by Graphics Processing Units (GPUs) has risen, General Purpose GPU (GPGPU)
computing has been introduced as an effective way to improve performance for parallel computation tasks. This work focuses on the
acceleration of SRD simulations using Nvidia's GPGPU architecture, CUDA. We find that while the speed improvements delivered by
GPU acceleration vary with the simulation version and parameters used, our GPU implementation runs around 10 times faster than the
CPU version for basic simulations, and up to 50 times faster for polymers in solution.
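For readers unfamiliar with SRD, the core of the method is a collision step: particles are sorted into grid cells, and within each cell the velocities relative to the cell's mean velocity are rotated by a fixed angle. The following is a minimal 2D CPU sketch in Python, assuming equal particle masses and omitting the streaming step and the random grid shift that full SRD implementations usually include. It is not the authors' CUDA implementation, and the function name and parameters are illustrative.

```python
import math
import random


def srd_collision_step(positions, velocities, cell_size, alpha, rng):
    """One 2D SRD collision step for equal-mass particles (illustrative sketch).

    positions, velocities: lists of (x, y) tuples.
    alpha: fixed rotation angle in radians; each cell rotates by +alpha or
    -alpha with equal probability.
    Returns a new list of velocities; positions are not modified.
    """
    # Sort particle indices into square collision cells.
    cells = {}
    for i, (x, y) in enumerate(positions):
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        cells.setdefault(key, []).append(i)

    new_velocities = list(velocities)
    for members in cells.values():
        n = len(members)
        # Mean velocity of the cell (centre-of-mass velocity for equal masses).
        ux = sum(velocities[i][0] for i in members) / n
        uy = sum(velocities[i][1] for i in members) / n
        angle = alpha if rng.random() < 0.5 else -alpha
        c, s = math.cos(angle), math.sin(angle)
        for i in members:
            # Rotate the velocity relative to the cell mean, then add the mean back.
            dx = velocities[i][0] - ux
            dy = velocities[i][1] - uy
            new_velocities[i] = (ux + c * dx - s * dy, uy + s * dx + c * dy)
    return new_velocities
```

Because the rotation acts only on velocities relative to the cell mean, the step conserves momentum and kinetic energy within each cell. Each cell is processed independently of the others, which is the kind of structure that maps naturally onto parallel hardware.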
GPU Accelerated Elevation Map based Registration of Aerial Images
Joseph Fernando, University of Dayton Research
Abstract: This paper proposes a lower-latency implementation of the georegistration algorithm proposed by Jovanovic et al. The
algorithm has been modified to mitigate registration errors and has been parallelized to map onto a Graphics Processing Unit (GPU).
Also, the target image offset and the painting value computations have been combined into a single loop to eliminate the use of shared
memory. The equations and the algorithm required to generate accurate orthorectified and georegistered images from digital satellite
images and aerial photographs are presented. The modified algorithm has been implemented in the Compute Unified Device
Architecture (CUDA) to reduce latency. A fixed coordinate system is used to represent the image, focal, and projection planes.
Experimental results show that the proposed algorithm is capable of generating accurate georegistered images for high-flying airborne
vehicles. While this method has been tested using aerial photographs, it can be extended to satellite images as well as other image
data. A speedup of over 10x has been achieved over the CPU version.