A Joint Parsing System for Visual Scene Understanding
Title:
A Joint Parsing System for Visual Scene Understanding
Author:
Qi, Hang, author.
ISBN:
9780438019713
Physical Description:
1 electronic resource (93 pages)
General Note:
Source: Dissertation Abstracts International, Volume: 79-10(E), Section: B.
Advisor: Song-Chun Zhu. Committee members: Demetri Terzopoulos; Wei Wang; Ying Nian Wu.
Abstract:
The computer vision community has long focused on classic tasks such as object detection, human attribute classification, and action recognition. While state-of-the-art performance improves every year across a wide range of tasks, it remains a challenge to organize these individual pieces into an integral system that parses visual scenes and events jointly. In this dissertation, we explore the problem of joint visual scene parsing in a restricted visual Turing test scenario that encourages explicit concept grounding. The goal is to build a scalable computer vision system that leverages advances in individual modules across various tasks and exploits the inherent correlations and constraints between them for a comprehensive understanding of visual scenes.
This dissertation contains three main parts.
Firstly, we describe a restricted visual Turing test scenario that evaluates computer vision systems across various tasks with a domain ontology and explicitly tests the grounding of concepts with formal queries. We present a benchmark for evaluating long-range recognition and event reasoning in videos captured from a network of cameras. The data and queries distinguish this benchmark from visual question answering on images and from video captioning in that it emphasizes explicit grounding of concepts in a restricted ontology via formal-language queries.
Secondly, we propose a scalable system that leverages off-the-shelf computer vision modules to parse cross-view videos jointly. The system defines a unified knowledge representation for information sharing and is extendable to new tasks and domains. To fuse information from multiple modules and camera views, we propose a joint parsing method that integrates view-centric proposals into scene-centric parse graphs representing a coherent understanding of cross-view scenes. Our key observations are that overlapping fields of view embed rich appearance and geometry correlations, and that knowledge fragments corresponding to individual vision tasks are governed by consistency constraints available in commonsense knowledge. The proposed method captures such correlations and constraints explicitly and generates semantic scene-centric parse graphs. Quantitative experiments show that scene-centric predictions outperform view-centric proposals.
Thirdly, we discuss a principled method to construct parse graph knowledge bases that retain rich structures and grounding details. By casting questions into graph fragments, we present a graph-matching-based question-answering system that retrieves answers to questions via graph pattern matching.
Notes:
School code: 0031
Holdings:
Call Number | Item Number | Shelf Location | Location / Status / Due Date
---|---|---|---
XX(682417.1) | 682417-1001 | Proquest E-Thesis Collection | Searching...
On Order