Xiao, Houping, author. (orcid)0000-0002-6981-8842

Multi-sourced Information Trustworthiness Analysis: Applications and Theory

9780438061347

1 electronic resource (246 pages)

Source: Dissertation Abstracts International, Volume: 79-10(E), Section: B.

Advisors: Jing Gao. Committee members: Varun Chandola; Lu Su; Aidong Zhang.

In the era of Big Data, data entries, even those describing the same objects or events, can come from a variety of sources. Some sources typically provide accurate information, but for reasons such as recording errors, device malfunction, background noise, and intent to manipulate the data, other sources may contain noisy or even erroneous information. It is therefore inevitable that information from multiple sources conflicts. To discover useful knowledge, which is usually deeply buried in such complicated multi-sourced data, we have to conduct information trustworthiness analysis on all available data sources. In this thesis, we propose a series of approaches to multi-sourced information trustworthiness analysis, including reliability-aware information integration and inconsistency detection, to efficiently and effectively discover trustworthy and untrustworthy information, respectively.

In reliability-aware information integration, it is critical to identify reliable sources that more often provide accurate information, so that we can pay more attention to their information and better discover the truths (i.e., trustworthy information). Unfortunately, there is no oracle telling us a priori which information source is more reliable. To correctly identify the truths, in Part I of this thesis, we develop novel information integration methods that incorporate the estimation of source reliability. We explore the power of source reliability estimation in both data-level and model-level information. The objective is to jointly estimate which source is reliable and which piece of information is correct, where the information could be the raw data in data-level information integration or the model parameter in model-level information integration. In this part, we prove theoretical properties of the proposed approaches and demonstrate their impact on real applications, such as indoor floorplan construction and crowdsourced question answering.
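The joint estimation described above can be illustrated with a minimal sketch. It is a generic iterative weighted-averaging scheme for continuous data, not the method developed in the thesis: the function name `truth_discovery`, the squared-error loss, and the negative-log weight rule are all assumptions chosen for illustration. It assumes at least two sources, each reporting a value for every object.

```python
import math

def truth_discovery(claims, n_iter=20, eps=1e-9):
    """Jointly estimate truths and source weights (illustrative sketch).

    claims[k][i] is the value source k reports for object i.
    Assumes at least two sources and no missing values.
    """
    n_sources = len(claims)
    n_objects = len(claims[0])
    weights = [1.0] * n_sources  # start by trusting all sources equally
    truths = [0.0] * n_objects

    for _ in range(n_iter):
        # Step 1: estimate each truth as the weighted mean of the claims.
        total_w = sum(weights)
        truths = [
            sum(weights[k] * claims[k][i] for k in range(n_sources)) / total_w
            for i in range(n_objects)
        ]
        # Step 2: score each source by its squared distance to the truths.
        # eps keeps every loss strictly positive.
        losses = [
            sum((claims[k][i] - truths[i]) ** 2 for i in range(n_objects)) + eps
            for k in range(n_sources)
        ]
        total_loss = sum(losses)
        # Step 3: a source with a small share of the total loss gets a
        # large weight (hypothetical weight rule, chosen for illustration).
        weights = [-math.log(loss / total_loss) for loss in losses]

    return truths, weights

# Two sources that roughly agree and one that deviates strongly.
example_claims = [
    [10.0, 20.0, 30.0],
    [10.1, 19.9, 30.1],
    [15.0, 12.0, 40.0],
]
example_truths, example_weights = truth_discovery(example_claims)
```

In this toy input the third source ends up with the smallest weight, so the estimated truths stay close to the values on which the first two sources agree.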

On the other hand, when unexpected disagreement is encountered across diverse information sources, i.e., when data entities receive inconsistent information across multiple data sources, this may raise a red flag and require in-depth investigation. Part II of this thesis conducts inconsistency detection among multiple information sources to detect anomalies. We develop a series of tensor-decomposition-based algorithms for detecting inconsistent information in an unsupervised learning setting. By representing dynamic multi-sourced data as tensors, we propose different tensor-decomposition-based approaches, including an online method with theoretical guarantees for large-scale applications, to capture the common patterns across sources. An anomaly indicator is proposed that identifies inconsistencies by comparing source inputs with the common patterns. The proposed frameworks have been applied to a wide variety of applications, from cybersecurity to hotel reviews to computer networks.
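The idea of comparing source inputs with a common pattern can be sketched as follows. This is only an illustration under strong assumptions, not one of the thesis's algorithms: it fits a rank-1 CP model to a small dense tensor by alternating least squares (a batch method, not the online method mentioned above), and the names `rank1_fit` and `inconsistency_scores`, as well as the per-entity squared-residual score, are hypothetical.

```python
def rank1_fit(x, n_iter=50):
    """Fit x[s][e][f] ~ a[s] * b[e] * c[f] by alternating least squares.

    x is a dense (sources x entities x features) nested list.
    """
    S, E, F = len(x), len(x[0]), len(x[0][0])
    a, b, c = [1.0] * S, [1.0] * E, [1.0] * F

    def sq(v):
        # Squared Euclidean norm of a vector.
        return sum(t * t for t in v)

    for _ in range(n_iter):
        # Each update is the least-squares solution for one factor
        # while the other two are held fixed.
        a = [sum(x[s][e][f] * b[e] * c[f] for e in range(E) for f in range(F))
             / (sq(b) * sq(c)) for s in range(S)]
        b = [sum(x[s][e][f] * a[s] * c[f] for s in range(S) for f in range(F))
             / (sq(a) * sq(c)) for e in range(E)]
        c = [sum(x[s][e][f] * a[s] * b[e] for s in range(S) for e in range(E))
             / (sq(a) * sq(b)) for f in range(F)]
    return a, b, c

def inconsistency_scores(x, a, b, c):
    """Per-entity squared residual between the inputs and the common pattern."""
    S, E, F = len(x), len(x[0]), len(x[0][0])
    return [
        sum((x[s][e][f] - a[s] * b[e] * c[f]) ** 2
            for s in range(S) for f in range(F))
        for e in range(E)
    ]

# Three sources agree on four entities with two features each,
# except that source 2 reports conflicting values for entity 1.
base = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
tensor = [[row[:] for row in base] for _ in range(3)]
tensor[2][1] = [8.0, 1.0]

a, b, c = rank1_fit(tensor)
scores = inconsistency_scores(tensor, a, b, c)
flagged = max(range(len(scores)), key=scores.__getitem__)
```

Here `flagged` is the index of the entity whose reports deviate most from the shared low-rank pattern. A realistic setting would need a higher rank, handling of missing entries, and a principled threshold on the scores.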

To sum up, in this thesis we conduct novel multi-sourced information trustworthiness analysis to discover trustworthy information or to detect untrustworthy information. For trustworthy information discovery, the proposed reliability-aware information integration framework gives us a tool to identify reliable sources and discover the true information of data entities from conflicting multi-sourced data. For untrustworthy information detection, the developed inconsistency detection approaches let us detect malicious data entities that receive inconsistent information across all available data sources. The frameworks we developed have been effectively applied in many areas, including hotel review analysis, cybersecurity, and computer networks, and have the potential to be applied to many other areas, such as healthcare, mobile sensing, and crowdsourcing. With advances in technology and devices, both the amount of data and the number of sources in our world are still exploding, so there are great opportunities as well as numerous research challenges for inferring useful knowledge from multiple sources of massive data collections.

School code: 0656

Computer science.

State University of New York at Buffalo. Computer Science and Engineering.

http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqm&rft_dat=xri:pqdiss:10823306