Joint-Stochastic Spectral Inference for Robust Co-Occurrence Modeling and Latent Topic Analysis
Title:
Joint-Stochastic Spectral Inference for Robust Co-Occurrence Modeling and Latent Topic Analysis
Author:
Lee, Moontae, author.
ISBN:
9780438027527
Added Author:
Physical Description:
1 electronic resource (126 pages)
General Note:
Source: Dissertation Abstracts International, Volume: 79-10(E), Section: B.
Advisors: David Mimno. Committee members: David Bindel; Peter Frazier.
Abstract:
Co-occurrence information is a powerful statistic that can model various discrete objects by their joint instances with other objects. By transforming unsupervised problems of learning low-dimensional geometry into provable decompositions of co-occurrence information, spectral inference provides fast algorithms and optimality guarantees for non-linear dimensionality reduction or latent topic analysis. Spectral approaches reduce the dependence on the original training examples and produce substantial gains in efficiency, but at a cost: a) the algorithms perform poorly on real data that does not necessarily follow the underlying models; b) users can no longer infer information about individual examples, which is often important for real-world applications; c) model complexity grows rapidly as the number of objects increases, requiring careful curation of the vocabulary.
The first issue is called model-data mismatch, a fundamental problem common to every spectral inference method for latent variable models. As real data never follows any particular computational model, this issue must be addressed for spectral inference to be practical beyond synthetic settings. For the second issue, users could revisit probabilistic inference to infer information about individual examples, but this brings back all the drawbacks of traditional approaches. One method was recently developed for spectral inference, but it works only on tiny models and quickly loses performance on datasets whose underlying structures exhibit realistic correlations. While probabilistic inference also suffers from the third issue, the problem is more serious for spectral inference because co-occurrence information easily exceeds storage capacity as the vocabulary grows.
We cast the learning problem in the framework of Joint Stochastic Matrix Factorization (JSMF), showing that existing methods violate the theoretical conditions necessary for a good solution to exist. We propose novel rectification paradigms for handling the model-data mismatch; with them, the Rectified Anchor Word Algorithm (RAWA) is able to learn quality latent structures and their interactions even on small, noisy data. We also propose Prior Aware Dual Decomposition (PADD), which considers the learned interactions as well as the learned latent structures to robustly infer example-specific information. Beyond the theoretical guarantees, our experimental results show that RAWA recovers low-dimensional geometry on various textual and non-textual datasets with quality comparable to probabilistic Gibbs sampling, and that PADD substantially outperforms the recently developed method for learning low-dimensional representations of individual examples.
Although this thesis does not address the complexity issue for large vocabularies, we have developed new methods that can drastically compress co-occurrence information and learn from the compressed statistics alone without losing much precision. By providing the capability to operate on millions of objects and billions of examples, we complete the set of tools needed to make spectral inference a robust and scalable competitor to probabilistic inference for unsupervised latent structure learning. We hope our research serves as an initial basis for a new perspective that combines the benefits of both the spectral and probabilistic worlds.
Notes:
School code: 0058
Added Corporate Author:
Available:*
Call Number | Item Number | Shelf Location | Location / Status / Due Date |
---|---|---|---|
XX(681901.1) | 681901-1001 | ProQuest E-Thesis Collection | Searching... |