Eylem Seç
Automatic Selection of MapReduce Machine Learning Algorithms: A Model Building Approach
Başlık:
Automatic Selection of MapReduce Machine Learning Algorithms: A Model Building Approach
Yazar:
Franklin, Bryan M., author.
ISBN:
9780355979749
Yazar Ek Girişi:
Fiziksel Tanımlama:
1 electronic resource (265 pages)
Genel Not:
Source: Dissertation Abstracts International, Volume: 79-10(E), Section: B.
Includes supplementary digital materials.
Advisors: Laura E. Brown Committee members: Timothy Havens; Benjamin Ong; Thomas Oommen.
Özet:
As the amount of information available for data mining grows larger, the amount of time needed to train models on those huge volumes of data also grows longer. Techniques such as sub-sampling and parallel algorithms have been employed to deal with this growth. Some studies have shown that sub-sampling can have adverse effects on the quality of models produced, and the degree to which it affects different types of learning algorithms varies. Parallel algorithms perform well when enough computing resources (e.g. cores, memory) are available, however for a limited sized cluster the growth in data will still cause an unacceptable growth in model training time. In addition to the data size mitigation problem, picking which algorithms are well suited to a particular dataset, can be a challenge. While some studies have looked at selection criteria for picking a learning algorithm based on the properties of the dataset, the additional complexity of parallel learners or possible run time limitations has not been considered. This study explores run time and model quality results of various techniques for dealing with large datasets, including using different numbers of compute cores, sub-sampling the datasets, and exploiting the iterative anytime nature of the training algorithms. The algorithms were studied using MapReduce implementations of four supervised learning algorithms, logistic regression, tree induction, bagged trees, and boosted stumps for binary classification using probabilistic models. Evaluation of these techniques was done using a modified form of learning curves which has a temporal component. Finally, the data collected was used to train a set of models to predict which type of parallel learner best suits a particular dataset, given run time limitations and the number of compute cores to be used. The predictions of those models were then compared to the actual results of running the algorithms on the datasets they were attempting to predict.
Notlar:
School code: 0129
Tüzel Kişi Ek Girişi:
Mevcut:*
Yer Numarası | Demirbaş Numarası | Shelf Location | Lokasyon / Statüsü / İade Tarihi |
---|---|---|---|
XX(679943.1) | 679943-1001 | Proquest E-Tez Koleksiyonu | Arıyor... |
On Order
Liste seç
Bunu varsayılan liste yap.
Öğeler başarıyla eklendi
Öğeler eklenirken hata oldu. Lütfen tekrar deneyiniz.
:
Select An Item
Data usage warning: You will receive one text message for each title you selected.
Standard text messaging rates apply.