2013 IEEE High Performance Extreme Computing Conference (HPEC ’13)
Seventeenth Annual HPEC Conference
10 - 12 September 2013
Westin Hotel, Waltham, MA USA
D4M 2.0 Schema: A General Purpose High Performance Schema for the Accumulo Database
Jeremy Kepner, MIT
Abstract: Non-traditional, relaxed consistency, triple store databases are the backbone of many web companies (e.g., Google Big
Table, Amazon Dynamo, and Facebook Cassandra). The Apache Accumulo database is a high performance open source relaxed
consistency database that is widely used for government applications. Obtaining the full benefits of Accumulo requires using novel
schemas. The Dynamic Distributed Dimensional Data Model (D4M)[http://www.mit.edu/~kepner/D4M] provides a uniform
mathematical framework based on associative arrays that encompasses both traditional (i.e., SQL) and non-traditional databases.
For non-traditional databases, D4M naturally leads to a general purpose schema that can be used to fully index and rapidly query
every unique string in a dataset. The D4M 2.0 Schema has been applied with little or no customization to cyber, bioinformatics,
scientific citation, free text, and social media data. The D4M 2.0 Schema is simple, requires minimal parsing, and achieves the
highest published Accumulo ingest rates. The benefits of the D4M 2.0 Schema are independent of the D4M interface. Any interface
to Accumulo can achieve these benefits by using the D4M 2.0 Schema.
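To make the schema concrete, here is a minimal Python sketch of the "exploded" pattern the abstract describes: every field/value pair of a record becomes a single column key, written to an edge table and its transpose so that any unique string can be found with one row lookup. The record ID, field names, and the '|' separator below are illustrative; this is the general pattern, not the authors' code.

    # Sketch of a D4M 2.0-style exploded schema: each field/value pair of a
    # record becomes one column key, written to an edge table and its
    # transpose so any unique string is indexed for fast lookup.
    def explode(record_id, record):
        triples, triples_t = [], []
        for field, value in record.items():
            col = field + "|" + str(value)           # e.g. "src_ip|192.168.1.1"
            triples.append((record_id, col, "1"))    # edge table: row -> column
            triples_t.append((col, record_id, "1"))  # transpose: column -> row
        return triples, triples_t

    # Example: one network-log record. Querying every record containing
    # "192.168.1.1" is then a single row lookup on the transpose table.
    edges, edges_t = explode("row_000017",
                             {"src_ip": "192.168.1.1", "dst_port": 443})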
Understanding Query Performance in Accumulo
Scott Sawyer, MIT Lincoln Laboratory; B. David O'Gwynn, MIT Lincoln Laboratory; An Tran, MIT Lincoln Laboratory; Tamara Yu, MIT
Lincoln Laboratory
Abstract: Open-source, BigTable-like distributed databases provide a scalable storage solution for data-intensive applications. The
simple key–value storage schema provides fast record ingest and retrieval, nearly independent of the quantity of data stored.
However, real applications must support non-trivial queries that require careful key design and value indexing. We study an Apache
Accumulo–based big data system designed for a network situational awareness application. The application’s storage schema and
data retrieval requirements are analyzed. We then characterize the corresponding Accumulo performance bottlenecks. Queries are
shown to be communication-bound and server-bound in different situations. Inefficiencies in the Accumulo software stack limit
network and I/O performance. Additionally, in some situations, parallel clients can contend for server-side resources. Maximizing
data retrieval rates for practical queries requires effective key design, indexing, and client parallelization.
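The client-parallelization point can be illustrated with a short sketch. The Python fragment below is illustrative only: scan_range stands in for a real Accumulo scanner (here it reads an in-memory table), and the shard boundaries are hypothetical tablet split points. It shows one logical range query split into per-shard scans issued by concurrent client threads.

    from concurrent.futures import ThreadPoolExecutor

    def scan_range(table, lo, hi):
        # Stand-in for a server-side scan: return sorted (key, value)
        # pairs with lo <= key < hi from an in-memory table.
        return [(k, table[k]) for k in sorted(table) if lo <= k < hi]

    def parallel_query(table, shard_bounds, lo, hi, workers=4):
        # Clip the query range against each shard (tablet) boundary pair,
        # then scan the surviving subranges in parallel client threads.
        subranges = [(max(lo, a), min(hi, b))
                     for a, b in shard_bounds if a < hi and b > lo]
        with ThreadPoolExecutor(max_workers=workers) as pool:
            parts = pool.map(lambda r: scan_range(table, r[0], r[1]),
                             subranges)
        return [kv for part in parts for kv in part]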
Big Snapshot Stitching with Scarce Overlap
Alexandros-Stavros Iliopoulos, Duke University; Jun Hu, Department of Computer Science, Duke University; Nikos Pitsianis, Dept. of
Electrical & Computer Engineering, Aristotle University of Thessaloniki; Xiaobai Sun, Dept. of Computer Science, Duke University
Abstract: We address three properties that arise in gigapixel-scale image stitching for snapshot images captured with a novel micro-
camera array system named AWARE-2. This system, developed last year by Brady et al. [1], features a greatly extended field of view
and high optical resolution, offering unique sensing capabilities for a host of important applications. However, three simultaneously
arising conditions pose a challenge to existing approaches to image stitching, with regard to the quality of the output image, as well
as the automation and efficiency of the image composition process. Put simply, they may be described by the sparse, geometrically
irregular, and noisy (S.I.N.) overlap amongst the fields of view of the constituent micro-cameras. We introduce a computational
pipeline for image stitching under these conditions, which is scalable in terms of complexity and efficiency. The pipeline also
substantially reduces or eliminates ghosting effects due to micro-camera misalignment, without manual intervention. Our
present implementation of the pipeline leverages the combined use of multicore and GPU architectures. We present experimental
results with the pipeline on real image data acquired with the AWARE-2 system.
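For readers unfamiliar with the underlying registration step, the standard baseline is translation estimation by phase correlation, sketched below in Python with NumPy. This is the textbook method, not the authors' pipeline, which is designed precisely for the S.I.N. overlap conditions where such a baseline degrades.

    import numpy as np

    def phase_correlation_shift(a, b):
        # Estimate the integer (dy, dx) translation aligning image b to
        # image a from the phase of their cross-power spectrum.
        cross = np.fft.fft2(a) * np.conj(np.fft.fft2(b))
        cross /= np.abs(cross) + 1e-12        # keep only the phase
        corr = np.fft.ifft2(cross).real
        dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
        # Fold wrap-around indices into negative shifts.
        if dy > a.shape[0] // 2: dy -= a.shape[0]
        if dx > a.shape[1] // 2: dx -= a.shape[1]
        return dy, dx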
Biquad Implementation of an IIR Filter for IQ Mismatch Correction in an SoC RF Receiver
Karen Gettings, MIT Lincoln Laboratory; Michael Ericson; Andrew Bolstad; Xiao Wang
Abstract: This paper presents an IQ mismatch correction design and implementation that is part of a system-on-chip (SoC) that also
includes a homodyne RF receiver and a sparse non-linear equalizer. It uses IIR filters to help the RF receiver achieve an image
rejection ratio greater than 80 dB. The IIR filters are implemented using biquad structures to minimize power consumption by
limiting the number of bits used per tap. The design was implemented in 65 nm CMOS technology and is estimated to achieve a
power efficiency of 150 GOPS per watt.
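As background on the structure named in the abstract: a biquad is a second-order IIR section, and cascading several realizes a high-order filter while keeping coefficient word lengths (bits per tap) small. Below is a minimal Python sketch of one section in transposed direct form II; the coefficients are placeholders, not the chip's design values.

    def biquad(x, b0, b1, b2, a1, a2):
        # y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2],
        # realized in transposed direct form II with two state registers.
        s1 = s2 = 0.0
        y = []
        for xn in x:
            yn = b0 * xn + s1
            s1 = b1 * xn - a1 * yn + s2
            s2 = b2 * xn - a2 * yn
            y.append(yn)
        return y

    # A high-order IIR filter is built as a cascade of such sections, e.g.
    # y = biquad(biquad(x, *stage1_coeffs), *stage2_coeffs)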
Miniature Radar for Mobile Devices
Praveen Sharma, MIT Lincoln Laboratory
Abstract: We developed a miniature, low-cost radar (radio detection and ranging) sensor for mobile devices. A radar differs from
other mobile sensors in that it provides diverse capabilities such as detection, tracking, ranging, and imaging. As a proof-of-concept
for a radar sensor, we prototyped two X-band radars using miniaturized X-band antennas: an X-band bi-static radar and a time-
division multiplexed, multiple-input multiple-output (MIMO), eight-element phased-array radar. Using these radar sensors, we also
demonstrated real-time data acquisition and signal processing, supporting both standalone and distributed applications. In
particular, Range-Time-Intensity (RTI), Doppler-Time-Intensity (DTI), and Range-Angle-Intensity (RAI) were presented as the
illustrative signal processing products, and an Android smartphone was used as the illustrative processing platform.
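The RTI and DTI products named above are standard radar displays. The sketch below is the textbook processing in Python/NumPy, not the authors' implementation; it assumes a matrix of pulse-compressed returns with pulses on the first axis and range bins on the second.

    import numpy as np

    def rti(data):
        # Range-Time-Intensity: log magnitude of each pulse's range profile.
        return 20 * np.log10(np.abs(data) + 1e-12)

    def dti(data, range_bin, win=64, hop=16):
        # Doppler-Time-Intensity: sliding FFT over slow time (pulses) at a
        # fixed range bin; each frame is one column of the display.
        x = data[:, range_bin]
        frames = [np.fft.fftshift(np.fft.fft(x[i:i + win]))
                  for i in range(0, len(x) - win + 1, hop)]
        return 20 * np.log10(np.abs(np.array(frames)) + 1e-12)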
3D FFT for FPGAs
Martin Herbordt, Boston University; Ben Humphries, Boston University
Abstract: The 3D FFT is critical in electrostatics computations such as those used in Molecular Dynamics simulations. On FPGAs,
however, the 3D FFT was thought to be inefficient relative to other methods such as convolution-based implementations of
multigrid. We find the opposite: a simple design using less than half the chip resources, and operating at a very conservative
frequency, takes less than 50 µs for 32^3 and 200 µs for 64^3 single-precision data points, numbers similar to the best published
for GPUs. The significance is that this is a critical piece in implementing a large-scale FPGA-based MD engine: even a single FPGA is
capable of keeping the FFT off the critical path for a large fraction of possible simulations.
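As a reference for what the FPGA design computes: the 3D FFT factors into 1D FFTs applied along each axis in turn. The NumPy sketch below mirrors the row/column/plane passes a hardware pipeline would stream through; it is illustrative only, not the FPGA implementation.

    import numpy as np

    n = 32
    x = np.random.randn(n, n, n) + 1j * np.random.randn(n, n, n)

    y = np.fft.fft(x, axis=0)   # 1D FFTs along the first dimension,
    y = np.fft.fft(y, axis=1)   # then the second,
    y = np.fft.fft(y, axis=2)   # then the third.

    assert np.allclose(y, np.fft.fftn(x))   # matches the direct 3D FFT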