Adaptation and Augmentation: Towards Better Rescoring Strategies for Automatic Speech Recognition and Spoken Term Detection

Title:

Adaptation and Augmentation: Towards Better Rescoring Strategies for Automatic Speech Recognition and Spoken Term Detection

Author:

Ma, Min, author. (orcid)0000-0002-1816-1772

ISBN:

9780355978803

Author Added Entry:

Ma, Min, author.

Physical Description:

1 electronic resource (161 pages)

General Note:

Source: Dissertation Abstracts International, Volume: 79-10(E), Section: B.

Advisors: Michael I. Mandel; Andrew Rosenberg. Committee members: Rivka Levitan; Michael I. Mandel; Andrew Rosenberg; Changhe Yuan.

Abstract:

Selecting the best prediction from a set of candidates is an essential problem for many spoken language processing tasks, including automatic speech recognition (ASR) and spoken keyword spotting (KWS). Generally, the selection is determined by a confidence score assigned to each candidate. Calibrating these confidence scores (i.e., rescoring them) can lead to better selections and improve system performance. This dissertation focuses on using tailored language models to rescore ASR hypotheses as well as keyword search results for ASR-based KWS.

This dissertation introduces three kinds of rescoring techniques: (1) Freezing most model parameters while fine-tuning the output layer in order to adapt neural network language models (NNLMs) from the written domain to the spoken domain. Experiments on a large-scale Italian corpus show a 30.2% relative reduction in perplexity at the word-cluster level and a 2.3% relative reduction in WER in a state-of-the-art Italian ASR system. (2) Incorporating source application information associated with speech queries. By exploring a range of adaptation model architectures, we achieve a 21.3% relative reduction in perplexity compared to a fine-tuned baseline. Initial experiments using a state-of-the-art Italian ASR system show a 3.0% relative reduction in WER on top of an unadapted 5-gram LM. In addition, human evaluations show significant improvements by using the source application information. (3) Marrying machine learning algorithms (classification and ranking) with a variety of signals to rescore keyword search results in the context of KWS for low-resource languages. These systems, built for the IARPA BABEL Program, enhance search performance in terms of maximum term-weighted value (MTWV) across six different low-resource languages: Vietnamese, Tagalog, Pashto, Turkish, Zulu and Tamil.

Notes:

School code: 0046

Subject Heading:

Computer science.

Artificial intelligence.

Corporate Name Added Entry:

City University of New York. Computer Science.

Electronic Access:

http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqm&rft_dat=xri:pqdiss:10792971

Available:*

Call Number	Item Number	Shelf Location	Location / Status / Due Date
XX(680151.1)	680151-1001	Proquest E-Tez Koleksiyonu	Searching...
