Fault Tolerance through Invariant Checks in Applications Using Linear Algebraic Methods
Title:
Fault Tolerance through Invariant Checks in Applications Using Linear Algebraic Methods
Author:
Loh, Felix Da Yuan, author.
ISBN:
9780438082496
Physical Description:
1 electronic resource (150 pages)
General Note:
Source: Dissertation Abstracts International, Volume: 79-11(E), Section: B.
Advisors: Parameswaran Ramanathan. Committee members: Yu Hen Hu; Mikko Lipasti; Dan Negrut; Kewal Saluja.
Abstract:
Graphics processing units (GPUs) have become a popular platform for scientific computing applications, many of which are based on linear algebra. As the minimum feature size of transistors decreases, GPUs are becoming more vulnerable to transient faults caused by events such as alpha particle strikes, power fluctuations and electronic noise. In addition, the likelihood of a fault increases as more GPU computing nodes are used in supercomputers to meet the increasingly demanding computational requirements of scientific applications. Consequently, there are concerns that GPU-based supercomputer systems will suffer from very high fault rates. In order to ensure reliability, it is necessary to use fault tolerance (FT) techniques.
This thesis presents low-overhead FT techniques for several commonly used linear algebraic applications that run on GPUs, focusing mainly on applications that operate with sparse matrices. These FT techniques exploit the invariant properties of the algorithms used in these applications and take advantage of the parallel execution model of GPUs to allow for low-overhead error detection.
This thesis introduces and studies efficient error checking schemes for three popular matrix factorization techniques: Householder QR factorization, left-looking Cholesky factorization, and right-looking LU factorization. It also explores lightweight invariant checking methods for the preconditioned conjugate gradient (PCG) and biconjugate gradient stabilized (BiCGSTAB) iterative solvers and introduces an efficient checking method for the Lanczos eigensolver, as well as fault injection mechanisms for NVIDIA GPUs that allow for the simulation of transient, non-instantaneous faults.
This thesis carefully evaluates these FT methods on a contemporary NVIDIA GPU platform, and the results show that the aforementioned error checking strategies have high error coverage and are significantly more efficient than prior FT techniques on a GPU system.
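To make the idea of an invariant check concrete, here is a minimal sketch in plain Python. It is not taken from the thesis, and the invariants the thesis actually uses are not stated in this record. The sketch assumes an unpreconditioned conjugate gradient solver and checks one well-known property: in exact arithmetic, the recursively updated residual r always equals b - A x. The function names and the `fault_at` injection hook are hypothetical and exist only for illustration.

```python
import math

def matvec(A, v):
    """Dense matrix-vector product (a real solver would use a sparse format)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def residual_invariant_ok(A, b, x, r, tol=1e-8):
    """Check the CG invariant r == b - A x up to a relative tolerance."""
    true_r = [bi - axi for bi, axi in zip(b, matvec(A, x))]
    drift = math.sqrt(sum((t - ri) ** 2 for t, ri in zip(true_r, r)))
    return drift <= tol * max(1.0, math.sqrt(dot(b, b)))

def cg_with_check(A, b, iters=50, tol=1e-10, check_every=1, fault_at=None):
    """Plain CG on an SPD matrix; returns (x, iteration where a fault was detected or None).

    fault_at is a hypothetical fault-injection hook: at that iteration one
    entry of the product A*p is corrupted to mimic a transient fault.
    """
    x = [0.0] * len(b)
    r = b[:]          # residual for the initial guess x = 0
    p = r[:]
    rs = dot(r, r)
    for k in range(iters):
        if math.sqrt(rs) <= tol:
            break
        q = matvec(A, p)
        if fault_at == k:
            q[0] += 1.0  # injected error (assumption: a single corrupted entry)
        alpha = rs / dot(p, q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        # Invariant check: costs one extra matrix-vector product per check.
        if (k + 1) % check_every == 0 and not residual_invariant_ok(A, b, x, r):
            return x, k
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x, None
```

Checking every iteration doubles the number of matrix-vector products, so a practical scheme would likely raise `check_every` or use a cheaper invariant. Note also that this particular check only catches errors that make x and r inconsistent with each other; a corrupted scalar such as alpha updates both consistently and would pass unnoticed.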
Notes:
School code: 0262
Available:*
Call Number | Item Number | Shelf Location | Location / Status / Due Date |
---|---|---|---|
XX(694428.1) | 694428-1001 | ProQuest E-Thesis Collection | Searching... |