IEEE High Performance Extreme Computing Conference (HPEC 2012)

2013 IEEE High Performance Extreme Computing Conference (HPEC ’13)
Seventeenth Annual HPEC Conference
10-12 September 2013
Westin Hotel, Waltham, MA USA


Instructor: Deshanand Singh, Director of High-Level Design at Altera

In recent years, Field-Programmable Gate Arrays (FPGAs) have become extremely powerful computational platforms that can efficiently solve many complex problems. Modern FPGAs comprise effectively millions of programmable elements, signal processing elements and high-speed interfaces, all of which are necessary to deliver a complete solution. The power of FPGAs is unlocked via low-level hardware description languages such as VHDL and Verilog, which allow designers to explicitly specify the behavior of each programmable element. While these languages provide a means to create highly efficient logic circuits, they are akin to “assembly language” programming for modern processors. This is a serious limiting factor for both productivity and the adoption of FPGAs on a wider scale.

In this tutorial, we use the OpenCL language to explore techniques that allow us to program FPGAs at a level of abstraction closer to traditional software-centric approaches. OpenCL is an industry-standard parallel language based on C. It lets designers take full advantage of the capabilities of FPGAs while providing a high-level design entry language that is familiar to a wide range of programmers.

Because this is a tutorial, we will walk you through progressively more complex examples. We will start with Lincoln Lab’s own HPEC Challenge time-domain FIR filter (TDFIR) benchmark. We selected it because it is simple and familiar to the HPEC community. Lincoln’s benchmark includes data and a test harness that verifies correctness. The optimal OpenCL implementation requires slightly different styles of coding to exploit the underlying architectures of a CPU, GPU and FPGA.

Next we will examine an implementation of single-precision general matrix multiplication (SGEMM). It is an example of a highly parallel algorithm for which efficient circuit structures are well known. We show how this application can be implemented in OpenCL and how the high-level description can be optimized to generate the most efficient circuit in hardware.

Finally, we will show a detailed case study of a video compression algorithm written in OpenCL. We will walk through the optimization steps required to transform the code from an initial working implementation into one that provides significantly higher performance than a modern GPU. In addition to the performance benefits of the FPGA implementation, we will show board-level power measurements that demonstrate compelling power efficiency as well.

The tutorial will contain live demonstrations and is supported by Altera and BittWare. It will include an overview of OpenCL, but attendees will gain more from this tutorial if they arrive with some basic knowledge of OpenCL.
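
For attendees who want a feel for the first example before the session, the time-domain FIR computation can be sketched as a plain reference model. This is an illustration only, not the HPEC Challenge code or the tutorial's OpenCL kernel: the function name `tdfir`, the full-convolution output length, and the sample data are assumptions made for this sketch.

```python
def tdfir(signal, taps):
    """Reference time-domain FIR: full convolution of a complex signal with taps.

    Assumption for this sketch: the output has len(signal) + len(taps) - 1 samples.
    """
    n_out = len(signal) + len(taps) - 1
    out = [0j] * n_out
    # Each output sample depends only on the inputs, not on other outputs,
    # so a parallel port could assign one output index n to each work-item.
    for n in range(n_out):
        acc = 0j
        for k, h in enumerate(taps):
            i = n - k
            if 0 <= i < len(signal):  # skip taps that fall outside the signal
                acc += h * signal[i]
        out[n] = acc
    return out


if __name__ == "__main__":
    x = [1 + 0j, 2 + 0j, 3 + 0j]   # hypothetical input samples
    h = [1 + 0j, 1j]               # hypothetical filter taps
    print(tdfir(x, h))
```

The outer loop over output samples is where the data parallelism lies. How the inner multiply-accumulate loop is best expressed is the kind of detail that would differ between CPU, GPU and FPGA targets.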
