A Joint Parsing System for Visual Scene Understanding
Title:
A Joint Parsing System for Visual Scene Understanding
Author:
Qi, Hang, author.
ISBN:
9780438019713
Physical Description:
1 electronic resource (93 pages)
General Note:
Source: Dissertation Abstracts International, Volume: 79-10(E), Section: B.
Advisor: Song-Chun Zhu. Committee members: Demetri Terzopoulos; Wei Wang; Ying Nian Wu.
Abstract:
The computer vision community has long focused on classic tasks such as object detection, human attribute classification, and action recognition. While state-of-the-art performance improves every year across a wide range of tasks, it remains a challenge to organize these individual pieces into an integral system that parses visual scenes and events jointly. In this dissertation, we explore the problem of joint visual scene parsing in a restricted visual Turing test scenario that encourages explicit concept grounding. The goal is to build a scalable computer vision system that leverages advances in individual modules across various tasks and exploits the inherent correlations and constraints between them for a comprehensive understanding of visual scenes.
This dissertation contains three main parts.
Firstly, we describe a restricted visual Turing test scenario that evaluates computer vision systems across various tasks with a domain ontology and explicitly tests the grounding of concepts with formal queries. We present a benchmark for evaluating long-range recognition and event reasoning in videos captured from a network of cameras. The data and queries distinguish this benchmark from visual question answering on images and from video captioning in that it emphasizes explicit grounding of concepts in a restricted ontology via formal-language queries.
Secondly, we propose a scalable system that leverages off-the-shelf computer vision modules to parse cross-view videos jointly. The system defines a unified knowledge representation for information sharing and is extendable to new tasks and domains. To fuse information from multiple modules and camera views, we propose a joint parsing method that integrates view-centric proposals into scene-centric parse graphs representing a coherent understanding of cross-view scenes. Our key observations are that overlapping fields of view embed rich appearance and geometry correlations, and that knowledge fragments corresponding to individual vision tasks are governed by consistency constraints available in commonsense knowledge. The proposed method captures such correlations and constraints explicitly and generates semantic scene-centric parse graphs. Quantitative experiments show that scene-centric predictions outperform view-centric proposals.
Thirdly, we discuss a principled method to construct parse graph knowledge bases that retain rich structures and grounding details. By casting questions into graph fragments, we present a graph-matching-based question-answering system that retrieves answers to questions via graph pattern matching.
Notes:
School code: 0031
Holdings:
Call Number | Item Number | Shelf Location | Location / Status / Due Date
---|---|---|---
XX(682417.1) | 682417-1001 | Proquest E-Thesis Collection | Searching...
On Order