2013 IEEE High Performance Extreme Computing Conference (HPEC ’13)
Seventeenth Annual HPEC Conference
10 - 12 September 2013
Westin Hotel, Waltham, MA USA
D4M 2.0 Schema: A General Purpose High Performance Schema for the Accumulo Database
Jeremy Kepner, MIT
Abstract: Non-traditional, relaxed consistency, triple store databases are the backbone of many web companies (e.g., Google Big
Table, Amazon Dynamo, and Facebook Cassandra). The Apache Accumulo database is a high performance open source relaxed
consistency database that is widely used for government applications. Obtaining the full benefits of Accumulo requires using novel
schemas. The Dynamic Distributed Dimensional Data Model (D4M)[http://www.mit.edu/~kepner/D4M] provides a uniform
mathematical framework based on associative arrays that encompasses both traditional (i.e., SQL) and non-traditional databases.
For non-traditional databases, D4M naturally leads to a general purpose schema that can be used to fully index and rapidly query
every unique string in a dataset. The D4M 2.0 Schema has been applied with little or no customization to cyber, bioinformatics,
scientific citation, free text, and social media data. The D4M 2.0 Schema is simple, requires minimal parsing, and achieves the
highest published Accumulo ingest rates. The benefits of the D4M 2.0 Schema are independent of the D4M interface. Any interface
to Accumulo can achieve these benefits by using the D4M 2.0 Schema.
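To make the schema concrete, here is a minimal Python sketch of the "exploded" pattern the abstract describes: every field/value pair of a record becomes a single column key, written to an edge table and its transpose so that any unique string can be found with one row lookup. The record ID, field names, and the '|' separator below are illustrative; this is the general pattern, not the authors' code.

    # Sketch of a D4M 2.0-style exploded schema: each field/value pair of a
    # record becomes one column key, written to an edge table and its
    # transpose so any unique string is indexed for fast lookup.
    def explode(record_id, record):
        triples, triples_t = [], []
        for field, value in record.items():
            col = field + "|" + str(value)           # e.g. "src_ip|192.168.1.1"
            triples.append((record_id, col, "1"))    # edge table: row -> column
            triples_t.append((col, record_id, "1"))  # transpose: column -> row
        return triples, triples_t

    # Example: one network-log record. Querying every record containing
    # "192.168.1.1" is then a single row lookup on the transpose table.
    edges, edges_t = explode("row_000017",
                             {"src_ip": "192.168.1.1", "dst_port": 443})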
Understanding Query Performance in Accumulo
Scott Sawyer, MIT Lincoln Laboratory; B. David O'Gwynn, MIT Lincoln Laboratory; An Tran, MIT Lincoln Laboratory; Tamara Yu, MIT
Lincoln Laboratory
Abstract: Open-source, BigTable-like distributed databases provide a scalable storage solution for data-intensive applications. The
simple key–value storage schema provides fast record ingest and retrieval, nearly independent of the quantity of data stored.
However, real applications must support non-trivial queries that require careful key design and value indexing. We study an Apache
Accumulo–based big data system designed for a network situational awareness application. The application’s storage schema and
data retrieval requirements are analyzed. We then characterize the corresponding Accumulo performance bottlenecks. Queries are
shown to be communication-bound and server-bound in different situations. Inefficiencies in the Accumulo software stack limit
network and I/O performance. Additionally, in some situations, parallel clients can contend for server-side resources. Maximizing
data retrieval rates for practical queries requires effective key design, indexing, and client parallelization.
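The client-parallelization point can be illustrated with a short sketch. The Python fragment below is illustrative only: scan_range stands in for a real Accumulo scanner (here it reads an in-memory table), and the shard boundaries are hypothetical tablet split points. It shows one logical range query split into per-shard scans issued by concurrent client threads.

    from concurrent.futures import ThreadPoolExecutor

    def scan_range(table, lo, hi):
        # Stand-in for a server-side scan: return sorted (key, value)
        # pairs with lo <= key < hi from an in-memory table.
        return [(k, table[k]) for k in sorted(table) if lo <= k < hi]

    def parallel_query(table, shard_bounds, lo, hi, workers=4):
        # Clip the query range against each shard (tablet) boundary pair,
        # then scan the surviving subranges in parallel client threads.
        subranges = [(max(lo, a), min(hi, b))
                     for a, b in shard_bounds if a < hi and b > lo]
        with ThreadPoolExecutor(max_workers=workers) as pool:
            parts = pool.map(lambda r: scan_range(table, r[0], r[1]),
                             subranges)
        return [kv for part in parts for kv in part]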
Big Snapshot Stitching with Scarce Overlap
Alexandros-Stavros Iliopoulos, Duke University; Jun Hu, Department of Computer Science, Duke University; Nikos Pitsianis, Dept. of
Electrical & Computer Engineering, Aristotle University of Thessaloniki; Xiaobai Sun, Dept. of Computer Science, Duke University
Abstract: We address three properties that arise in gigapixel-scale image stitching for snapshot images captured with a novel micro-
camera array system named AWARE-2. This system, developed last year by Brady et al. [1], features a greatly extended field of view
and high optical resolution, offering unique sensing capabilities for a host of important applications. However, three simultaneously
arising conditions pose a challenge to existing approaches to image stitching, with regard to the quality of the output image, as well
as the automation and efficiency of the image composition process. Put simply, they may be described by the sparse, geometrically
irregular, and noisy (S.I.N.) overlap amongst the fields of view of the constituent micro-cameras. We introduce a computational
pipeline for image stitching under these conditions, which is scalable in terms of complexity and efficiency. The pipeline also
substantially reduces or eliminates ghosting effects due to micro-camera misalignment, without manual intervention. Our
present implementation of the pipeline leverages the combined use of multicore and GPU architectures. We present experimental
results with the pipeline on real image data acquired with the AWARE-2 system.
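For readers unfamiliar with the underlying registration step, the standard baseline is translation estimation by phase correlation, sketched below in Python with NumPy. This is the textbook method, not the authors' pipeline, which is designed precisely for the S.I.N. overlap conditions where such a baseline degrades.

    import numpy as np

    def phase_correlation_shift(a, b):
        # Estimate the integer (dy, dx) translation aligning image b to
        # image a from the phase of their cross-power spectrum.
        cross = np.fft.fft2(a) * np.conj(np.fft.fft2(b))
        cross /= np.abs(cross) + 1e-12        # keep only the phase
        corr = np.fft.ifft2(cross).real
        dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
        # Fold wrap-around indices into negative shifts.
        if dy > a.shape[0] // 2: dy -= a.shape[0]
        if dx > a.shape[1] // 2: dx -= a.shape[1]
        return dy, dx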
Biquad Implementation of an IIR Filter for IQ Mismatch Correction in an SoC RF Receiver
Karen Gettings, MIT Lincoln Laboratory; Michael Ericson; Andrew Bolstad; Xiao Wang
Abstract: This paper presents an IQ mismatch correction design and implementation that is part of a system-on-chip (SoC) that also
includes a homodyne RF receiver and a sparse non-linear equalizer. It uses IIR filters to help the RF receiver achieve an image
rejection ratio greater than 80 dB. The IIR filters are implemented using biquad structures to minimize power consumption by
limiting the number of bits used per tap. The design was implemented in 65 nm CMOS technology and is estimated to achieve a
power efficiency of 150 GOPS per watt.
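As background on the structure named in the abstract: a biquad is a second-order IIR section, and cascading several realizes a high-order filter while keeping coefficient word lengths (bits per tap) small. Below is a minimal Python sketch of one section in transposed direct form II; the coefficients are placeholders, not the chip's design values.

    def biquad(x, b0, b1, b2, a1, a2):
        # y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2],
        # realized in transposed direct form II with two state registers.
        s1 = s2 = 0.0
        y = []
        for xn in x:
            yn = b0 * xn + s1
            s1 = b1 * xn - a1 * yn + s2
            s2 = b2 * xn - a2 * yn
            y.append(yn)
        return y

    # A high-order IIR filter is built as a cascade of such sections, e.g.
    # y = biquad(biquad(x, *stage1_coeffs), *stage2_coeffs)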
Miniature Radar for Mobile Devices
Praveen Sharma, MIT Lincoln Laboratory
Abstract: We developed a miniature, low-cost radar (radio detection and ranging) sensor for mobile devices. A radar differs from
other mobile sensors in that it provides diverse capabilities such as detection, tracking, ranging, and imaging. As a proof-of-concept
for a radar sensor, we prototyped two X-band radars using miniaturized X-band antennas: an X-band bi-static radar and a time-
division multiplexed, multiple-input multiple-output (MIMO), eight-element phased-array radar. Using these radar sensors, we also
demonstrated real-time data acquisition and signal processing, supporting both standalone and distributed applications. In
particular, Range-Time-Intensity (RTI), Doppler-Time-Intensity (DTI), and Range-Angle-Intensity (RAI) were presented as the
illustrative signal processing products, and an Android smartphone was used as the illustrative processing platform.
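The RTI and DTI products named above are standard radar displays. The sketch below is the textbook processing in Python/NumPy, not the authors' implementation; it assumes a matrix of pulse-compressed returns with pulses on the first axis and range bins on the second.

    import numpy as np

    def rti(data):
        # Range-Time-Intensity: log magnitude of each pulse's range profile.
        return 20 * np.log10(np.abs(data) + 1e-12)

    def dti(data, range_bin, win=64, hop=16):
        # Doppler-Time-Intensity: sliding FFT over slow time (pulses) at a
        # fixed range bin; each frame is one column of the display.
        x = data[:, range_bin]
        frames = [np.fft.fftshift(np.fft.fft(x[i:i + win]))
                  for i in range(0, len(x) - win + 1, hop)]
        return 20 * np.log10(np.abs(np.array(frames)) + 1e-12)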
3D FFT for FPGAs
Martin Herbordt, Boston University; Ben Humphries, Boston University
Abstract: The 3D FFT is critical in electrostatics computations such as those used in Molecular Dynamics simulations. On FPGAs,
however, the 3D FFT was thought to be inefficient relative to other methods such as convolution-based implementations of
multigrid. We find the opposite: a simple design using less than half the chip resources, and operating at a very conservative
frequency, takes less than 50 µs for 32^3 and 200 µs for 64^3 single-precision data points, numbers similar to the best published
for GPUs. The significance is that this is a critical piece in implementing a large-scale FPGA-based MD engine: even a single FPGA is
capable of keeping the FFT off the critical path for a large fraction of possible simulations.
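As a reference for what the FPGA design computes: the 3D FFT factors into 1D FFTs applied along each axis in turn. The NumPy sketch below mirrors the row/column/plane passes a hardware pipeline would stream through; it is illustrative only, not the FPGA implementation.

    import numpy as np

    n = 32
    x = np.random.randn(n, n, n) + 1j * np.random.randn(n, n, n)

    y = np.fft.fft(x, axis=0)   # 1D FFTs along the first dimension,
    y = np.fft.fft(y, axis=1)   # then the second,
    y = np.fft.fft(y, axis=2)   # then the third.

    assert np.allclose(y, np.fft.fftn(x))   # matches the direct 3D FFT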