Automatic Selection of MapReduce Machine Learning Algorithms: A Model Building Approach

Title:

Automatic Selection of MapReduce Machine Learning Algorithms: A Model Building Approach

Author:

Franklin, Bryan M., author.

ISBN:

9780355979749

Author Added Entry:

Franklin, Bryan M., author.

Physical Description:

1 electronic resource (265 pages)

General Note:

Source: Dissertation Abstracts International, Volume: 79-10(E), Section: B.

Includes supplementary digital materials.

Advisors: Laura E. Brown. Committee members: Timothy Havens; Benjamin Ong; Thomas Oommen.

Abstract:

As the amount of information available for data mining grows, so does the time needed to train models on it. Techniques such as sub-sampling and parallel algorithms have been employed to deal with this growth. Some studies have shown that sub-sampling can adversely affect the quality of the models produced, and that the degree of the effect varies across types of learning algorithms. Parallel algorithms perform well when enough computing resources (e.g., cores, memory) are available; however, on a cluster of limited size, growth in data will still cause an unacceptable growth in model training time. In addition to the problem of mitigating data size, choosing which algorithms are well suited to a particular dataset can be a challenge. While some studies have looked at criteria for selecting a learning algorithm based on the properties of the dataset, they have not considered the additional complexity of parallel learners or possible run time limitations. This study explores the run time and model quality results of various techniques for dealing with large datasets, including using different numbers of compute cores, sub-sampling the datasets, and exploiting the iterative, anytime nature of the training algorithms. The techniques were studied using MapReduce implementations of four supervised learning algorithms (logistic regression, tree induction, bagged trees, and boosted stumps) for binary classification using probabilistic models. The techniques were evaluated using a modified form of learning curves that has a temporal component. Finally, the data collected were used to train a set of models to predict which type of parallel learner best suits a particular dataset, given run time limitations and the number of compute cores to be used. The predictions of those models were then compared to the actual results of running the algorithms on the datasets in question.

Notes:

School code: 0129

Subject Heading:

Computer science.

Artificial intelligence.

Corporate Name Added Entry:

Michigan Technological University. Computer Science.

Electronic Access:

http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqm&rft_dat=xri:pqdiss:10791719

Availability:

Call Number	Item Number	Shelf Location	Location / Status / Due Date
XX(679943.1)	679943-1001	ProQuest E-Thesis Collection	Searching...
