
An Environment for Accessing Remote Field-Programmable Custom Computing Machine Accelerators with Multi-Memory Banks
Author:
Jadhav, Shrikant Shridhar, author.
ISBN:
9780355982046
Physical Description:
1 electronic resource (136 pages)
General Note:
Source: Dissertation Abstracts International, Volume: 79-10(E), Section: B.
Advisors: Christopher Doss; Clay Gloster. Committee members: Corey Graves; Youngsoo Kim; Daniel Limbrick.
Abstract:
Recent research shows that a Field-Programmable Custom Computing Machine (FCCM) can achieve an order-of-magnitude speedup for High-Performance Computing (HPC) applications. An FCCM, within the context of our research, is a host processor connected to one or more Field-Programmable Gate Arrays (FPGAs). For some applications, FCCMs provide better performance than conventional processors and consume less energy than Graphics Processing Units (GPUs). FCCMs can also be integrated into cloud-computing environments and data centers to make these hardware accelerators more accessible. FCCMs are a viable choice for accelerating applications in areas that require significant processing capability, including bioinformatics, neuroinformatics, physics, and computer vision.
While extensive research is underway to accelerate computationally intensive tasks in HPC applications on FCCMs, few studies report frameworks and multi-memory architectures that provide access to FCCMs from a remote site. The remote environment introduced in this thesis benefits users with limited hardware/software knowledge by letting them access FCCM accelerators from anywhere in the world. It also reduces overall system cost, since users are not required to invest in expensive FCCM design and infrastructure development.
Memory bandwidth is a major factor in developing FCCMs with execution times shorter than those of traditional processors. To achieve high performance, FCCM processing elements must be able to consume and produce data at the highest possible data rates. With multiple memory banks, the number of memory accesses required to bring all data into the FCCM processing unit can be significantly reduced. Similarly, outputs can be written in fewer memory write operations. Increasing memory bandwidth in this way improves the overall performance of the FCCM accelerator. However, adding multiple memories adds extra hardware to the FCCM, increasing its size and complexity, because an interface is needed for each additional memory.
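The bank-count argument above can be illustrated with a rough cycle count. This is a minimal sketch under a simplified model that is an assumption here, not the thesis's architecture: every bank delivers one word per cycle and the data is spread evenly across banks. The function name and example sizes are hypothetical.

```python
def read_cycles(num_words: int, num_banks: int) -> int:
    """Cycles needed to fetch num_words under the assumed model:
    each bank supplies one word per cycle, data evenly distributed."""
    if num_words < 0 or num_banks < 1:
        raise ValueError("need num_words >= 0 and num_banks >= 1")
    # Ceiling division: a partially filled final cycle still costs one cycle.
    return (num_words + num_banks - 1) // num_banks


# Hypothetical workload: 1024 input words.
single_bank = read_cycles(1024, 1)  # one bank: one word per cycle
four_banks = read_cycles(1024, 4)   # four banks: four words per cycle
```

Under this model the access count falls in proportion to the number of banks, which is the effect the paragraph describes. Each added bank still costs one more memory interface.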
In this thesis, we present a Java-based environment to access FCCM accelerators from a remote site. We have developed a hardware/software interface that supports multiple FPGAs connected to a server. This remote environment allows users with limited knowledge of the underlying hardware/software to take full advantage of FCCM accelerators by accessing FCCMs on a remote server from a client machine. We also provide a multi-memory architecture for accepting multiple inputs and producing multiple outputs per clock cycle. The architecture includes processor cores with pipelined functional units specifically tailored for each application. Additionally, we present an approach to achieve an order-of-magnitude speedup over a traditional software implementation executing on a conventional multi-core processor. Even though the clock frequency of the FCCM is an order of magnitude lower than that of a conventional multi-core processor, the FCCM is significantly faster.
This thesis also presents a case study using the Taylor Series to demonstrate the merits of the local FCCM. The Taylor Series is a computationally intensive primitive used in many HPC applications, e.g., computer vision [10], cellular networks [53], and deep neural networks [26]. In our experiments, we executed the Taylor Series in software and compared execution times with an FCCM interfaced to the same machine. Our experiments show that our multi-memory architecture is approximately 481X faster than software executing the Taylor Series on a typical server. We also implemented other computational primitives, i.e., Power, Natural Logarithm, and Exponential, on a local FCCM and compared their execution times with those of their respective software implementations. The experiments show that, with our multi-memory architecture, we achieved a speedup of one to two orders of magnitude over the software implementations.
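For readers unfamiliar with the kind of software baseline discussed above, here is a minimal sketch of a Taylor Series evaluation. The abstract does not say which function or how many terms the thesis used, so the choice of the Maclaurin series for e^x, the default of 20 terms, and the function name are all assumptions made for illustration.

```python
def exp_taylor(x: float, terms: int = 20) -> float:
    """Approximate e**x with the first `terms` terms of its Maclaurin series:
    sum over n of x**n / n!  (illustrative choice, not the thesis's code)."""
    total = 0.0
    term = 1.0  # current term x**n / n!, starting at n = 0
    for n in range(terms):
        total += term
        # Derive the next term from the current one to avoid
        # recomputing powers and factorials from scratch.
        term *= x / (n + 1)
    return total
```

Each term depends on the previous one through a single multiply and divide. A long, regular chain of arithmetic like this is the sort of workload the pipelined functional units described in the abstract are meant to accelerate.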
This thesis also presents results that demonstrate the merits of a remote FCCM with the multi-memory architecture. In our experiments, we executed the FCCM implementation of the same applications on the remote server machine and transmitted the data to be processed from a client machine. Our experiments show that the execution time for the remote FCCM is constrained not by the specialized processor but by the network bandwidth. On an Internet connection with 300 Mbps upload and 900 Mbps download speeds, executing an application on the remote FCCM through our remote framework, with two clients and two servers, can achieve a 10X speedup over a software implementation of the same application on the local machine.
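The observation that the network, not the accelerator, bounds remote execution time can be made concrete with a back-of-the-envelope model. The sketch below assumes that inputs travel at the 300 Mbps upload rate and results return at the 900 Mbps download rate, and it ignores latency and protocol overhead. The function names, data sizes, and timings are hypothetical and are not measurements from the thesis.

```python
def transfer_seconds(input_bytes: int, output_bytes: int,
                     up_mbps: float = 300.0, down_mbps: float = 900.0) -> float:
    """Time to send inputs and receive outputs, ignoring latency (assumed model)."""
    upload = input_bytes * 8 / (up_mbps * 1e6)      # client to server
    download = output_bytes * 8 / (down_mbps * 1e6)  # server to client
    return upload + download


def remote_speedup(local_software_s: float, fccm_compute_s: float,
                   transfer_s: float) -> float:
    """Speedup of remote FCCM execution over local software execution."""
    return local_software_s / (fccm_compute_s + transfer_s)


# Hypothetical job: 37.5 MB of input and 112.5 MB of output.
network_time = transfer_seconds(37_500_000, 112_500_000)
```

When the FCCM compute time is small compared with the transfer time, the achievable speedup is set almost entirely by the transfer time. That is consistent with the bandwidth-bound behavior reported above.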
Local Note:
School code: 1544
Available:*
| Shelf Number | Item Barcode | Shelf Location | Status |
|---|---|---|---|
| XX(678714.1) | 678714-1001 | Proquest E-Thesis Collection | Searching... |