Exploit Grammar Induction on Pattern and Structure Discovery
Title:
Exploit Grammar Induction on Pattern and Structure Discovery
Author:
Li, Yuan, author.
ISBN:
9780438116504
Physical Description:
1 electronic resource (113 pages)
General Note:
Source: Dissertation Abstracts International, Volume: 79-11(E), Section: B.
Advisors: Jim X. Chen. Committee members: Xinyuan Wang; Harry Wechsler; Qi Wei.
Abstract:
Time series have attracted the attention of researchers and scientists for many decades. They can be studied from two angles: the inter-data level and the intra-data level.
At the inter-data level, researchers focus on traditional tasks such as clustering, classification, and anomaly detection. All of these tasks rely on a measure of similarity between datasets. Most existing work on time series similarity search focuses on shape-based similarity. While some existing approaches work well for short time series, they typically fail to produce satisfactory results when the sequence is long. For long sequences, it is more appropriate to base similarity on higher-level structures. In this thesis, we present a histogram-based representation for time series data, similar to the "bag of words" approach that is widely accepted by the text mining and information retrieval communities. Our approach is shown to outperform existing methods in clustering, classification, and anomaly detection on several real datasets.
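To make the histogram idea concrete, here is a minimal sketch of one way a "bag of words" representation for a time series could be built. It is not the method from the thesis: the sliding-window size, the three-letter alphabet, the breakpoints, and the helper names (`to_word`, `bag_of_words`, `euclidean`) are all assumptions chosen for illustration.

```python
from collections import Counter
from statistics import mean, pstdev

# Assumed alphabet of size 3; the breakpoints split a standard normal
# distribution into three roughly equiprobable regions.
BREAKPOINTS = (-0.43, 0.43)
ALPHABET = "abc"


def znorm(window):
    """Rescale a window to zero mean and unit variance (flat windows map to zeros)."""
    mu = mean(window)
    sigma = pstdev(window)
    if sigma < 1e-12:
        return [0.0] * len(window)
    return [(x - mu) / sigma for x in window]


def to_word(window, segments):
    """Average the normalized window over equal-width segments, one letter per segment."""
    z = znorm(window)
    seg_len = len(z) // segments  # assumes len(window) is divisible by segments
    letters = []
    for i in range(segments):
        avg = mean(z[i * seg_len:(i + 1) * seg_len])
        idx = sum(avg > b for b in BREAKPOINTS)  # number of breakpoints below avg
        letters.append(ALPHABET[idx])
    return "".join(letters)


def bag_of_words(series, window=8, segments=4):
    """Histogram of the words produced by every sliding window over the series."""
    words = (to_word(series[i:i + window], segments)
             for i in range(len(series) - window + 1))
    return Counter(words)


def euclidean(h1, h2):
    """Distance between two word histograms; missing words count as zero."""
    keys = set(h1) | set(h2)
    return sum((h1[k] - h2[k]) ** 2 for k in keys) ** 0.5
```

Because two series are compared through their word counts rather than point by point, their lengths and the positions of their patterns do not have to line up, which is the property that makes this kind of representation attractive for long sequences.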
So far, there has been relatively little work on studying time series data at the intra-data level. Reductionism is a common approach to understanding the nature of a complex object: the object is reduced to its parts or fundamental components, together with the interactions or relationships among those parts. To understand the hidden structure or hierarchy of a time series dataset, we need a new approach and new tools with which we can grasp the hidden patterns and the relationships between them. Grammar induction is such a tool; it helps us reach that goal without requiring users to know much detail about the data. We demonstrate that grammar induction on time series can effectively identify repeated patterns without much prior knowledge. We also develop a motif visualization system based on grammar induction, with which the repeated patterns hidden in a time series dataset can be discovered and the hierarchy of patterns can be presented in an intuitive way.
Because of their practical and theoretical impact on data compression, pattern discovery, and computation theory, many grammar induction algorithms have been introduced in recent years. Most existing work on learning a grammar for a given sequence is based on a deterministic approach. Some of the deterministic approaches used by grammar induction algorithms can be categorized as greedy heuristics. In addition, many different grammars can be learned from a given sequence. The smallest grammar problem has been defined by some researchers to evaluate the different grammars learned from a given language by different algorithms. We introduce non-deterministic approaches to grammar induction for a given language. Our grammar induction algorithm can effectively identify smaller grammars than a well-known grammar induction algorithm. The experimental results presented in this work illustrate that our approach and algorithms are feasible for difficult problems such as identifying patterns in DNA sequences.
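As a rough illustration of the deterministic greedy heuristics mentioned above, the sketch below repeatedly replaces the most frequent adjacent pair of symbols with a new rule, in the style of Re-Pair. It is not the non-deterministic algorithm of the thesis, which the abstract does not describe. The rule names (`R0`, `R1`, ...) and the size measure (total number of symbols on the right-hand sides of all rules) are assumptions made for this example.

```python
from collections import Counter


def count_pairs(seq):
    """Count adjacent pairs left to right, skipping overlaps such as the middle of 'aaa'."""
    counts = Counter()
    last_end = {}
    for i in range(len(seq) - 1):
        pair = (seq[i], seq[i + 1])
        if last_end.get(pair, -1) > i:  # overlaps the previous occurrence of this pair
            continue
        counts[pair] += 1
        last_end[pair] = i + 2
    return counts


def replace_pair(seq, pair, symbol):
    """Replace each non-overlapping occurrence of pair with symbol, scanning left to right."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def induce_grammar(text):
    """Greedy induction: stop when no adjacent pair occurs at least twice."""
    seq = list(text)  # terminals are single characters; rule names have two or more
    rules = {}
    while True:
        counts = count_pairs(seq)
        if not counts:
            break
        pair, freq = max(counts.items(), key=lambda kv: kv[1])
        if freq < 2:
            break
        symbol = f"R{len(rules) + 1}"
        rules[symbol] = list(pair)
        seq = replace_pair(seq, pair, symbol)
    rules["R0"] = seq  # start rule
    return rules


def expand(rules, symbol="R0"):
    """Recursively expand a symbol back into the string it derives."""
    if symbol not in rules:
        return symbol
    return "".join(expand(rules, s) for s in rules[symbol])


def grammar_size(rules):
    """Assumed size measure: total symbols on the right-hand sides of all rules."""
    return sum(len(rhs) for rhs in rules.values())
```

Each induced rule corresponds to a repeated pattern, and rules that reference other rules expose the hierarchy among patterns. Comparing `grammar_size` across algorithms on the same input is one simple way to judge which of them comes closer to the smallest grammar.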
Notes:
School code: 0883
Available:*
Call Number | Item Number | Shelf Location | Location / Status / Due Date |
---|---|---|---|
XX(692869.1) | 692869-1001 | ProQuest E-Thesis Collection | Searching... |