

Title:
Deep Learning Based Facial Computing - Data, Algorithms and Applications
Author:
Li, Wei.
ISBN:
9780438002623
Physical Description:
1 electronic resource (106 pages)
General Note:
Source: Dissertation Abstracts International, Volume: 79-10(E), Section: B.
Advisor: Zhigang Zhu. Committee members: Yingli Tian; Jizhong Xiao; Lijun Yin; Jianting Zhang; Zhigang Zhu.
Abstract:
In this thesis, a complete task workflow of facial computing is introduced, comprising three components: data, algorithms, and applications. The research then focuses on the two most important tasks in facial computing: facial expression recognition and facial action unit (AU) detection, since the former is a key indicator of people's emotions and the latter provides the basic elements for more complex facial analysis tasks.
In the facial expression recognition part, we propose a recursive framework to recognize facial expressions from images in real scenes. Unlike traditional approaches, which typically focus on developing and refining algorithms to improve recognition performance on an existing dataset, we integrate four important components in a recursive manner: facial dataset generation, facial expression recognition model building, interactive interfaces for testing and new data collection, and finally dataset evaluation and cleansing. To start, we create a candid-images-for-facial-expression (CIFE) dataset from Web images. We then apply a convolutional neural network (CNN) to CIFE and build a CNN model for web image expression classification. To increase recognition accuracy, we fine-tune the CNN model and thus obtain a better CNN facial expression recognition model. Based on the fine-tuned CNN model, we design a facial expression game engine and collect a new, more balanced dataset, GaMo, whose images are collected from the various facial expressions users make while playing the game. Finally, we run yet another recursive step: a self-evaluation of the quality of the data labeling, and propose a self-cleansing mechanism for improving the quality of the data. We evaluate the GaMo and CIFE datasets and show that our recursive framework helps build a better facial expression model for real-scene facial expression tasks.
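The self-cleansing step described above can be illustrated with a minimal sketch: flag samples where a trained model confidently disagrees with the dataset label. The function name, threshold, and confidence criterion below are illustrative assumptions, not the thesis's actual mechanism.

```python
import numpy as np

def flag_suspect_labels(probs, labels, threshold=0.9):
    """Flag samples where the model confidently disagrees with the
    dataset label -- a simple stand-in for a self-cleansing check.
    probs: (N, K) softmax outputs; labels: (N,) integer class labels."""
    pred = probs.argmax(axis=1)    # model's predicted class per sample
    conf = probs.max(axis=1)       # model's confidence in that class
    return (pred != labels) & (conf >= threshold)

# Toy run: 3 samples, 3 expression classes.
probs = np.array([[0.95, 0.03, 0.02],
                  [0.10, 0.80, 0.10],
                  [0.02, 0.02, 0.96]])
labels = np.array([0, 1, 0])       # third label disagrees with a confident model
flags = flag_suspect_labels(probs, labels)   # only the third sample is flagged
```

Flagged samples could then be re-labeled or dropped before retraining, which is the spirit of the recursive dataset-cleansing loop.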
In the AU detection part, we propose a deep learning based approach to facial action unit detection that enhances and crops regions of interest of face images. The approach is implemented by adding two novel nets (i.e., layers), the enhancing layers and the cropping layers, to a pretrained convolutional neural network (CNN) model. For the enhancing layers (denoted E-Net), we design an attention map based on facial landmark features and apply it to a pretrained neural network to conduct enhanced learning. For the cropping layers (denoted C-Net), we crop facial regions around the detected landmarks and design individual convolutional layers to learn deeper features for each facial region. We then combine the E-Net and the C-Net into the Enhancing and Cropping Net (EAC-Net), which learns both feature-enhancing and region-cropping functions effectively. The EAC-Net integrates three important elements, i.e., transfer learning, attention coding, and region-of-interest processing, making our AU detection approach more efficient and more robust to facial position and orientation changes. Our approach shows a significant performance improvement over state-of-the-art methods when tested on the BP4D and DISFA AU datasets. We have also studied the performance of the proposed EAC-Net under two very challenging conditions: (1) faces with partial occlusion and (2) faces with large head pose variations. Experimental results show that (1) the EAC-Net learns facial AU correlations effectively and predicts AUs reliably even with only half of a face visible, especially the lower half; and (2) the EAC-Net also works well under very large head poses, significantly outperforming a baseline approach. The results further show that the EAC-Net performs much better without face alignment as pre-processing than with it, in terms of both computational efficiency and AU detection accuracy.
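The two ideas behind the E-Net and C-Net can be sketched on a single feature-map channel: build a landmark-centered attention map, re-weight features with it, and cut fixed windows around each landmark. The Gaussian attention form, residual weighting, and window size below are assumptions for illustration; the thesis's actual layers operate inside a CNN and may differ.

```python
import numpy as np

def landmark_attention(h, w, landmarks, sigma=3.0):
    """Attention map peaking at the given (row, col) landmarks.
    A Gaussian bump per landmark is an illustrative choice."""
    ys, xs = np.mgrid[0:h, 0:w]
    amap = np.zeros((h, w))
    for (ly, lx) in landmarks:
        bump = np.exp(-((ys - ly) ** 2 + (xs - lx) ** 2) / (2 * sigma ** 2))
        amap = np.maximum(amap, bump)
    return amap

def enhance(features, amap):
    """E-Net idea: re-weight feature responses by the attention map,
    with a residual path so un-attended regions are not zeroed out."""
    return features * (1.0 + amap)

def crop_regions(features, landmarks, size=4):
    """C-Net idea: cut a fixed window around each landmark so later
    layers can learn region-specific features."""
    half = size // 2
    return [features[max(0, y - half):y + half, max(0, x - half):x + half]
            for (y, x) in landmarks]

feat = np.ones((16, 16))                       # stand-in for one feature-map channel
amap = landmark_attention(16, 16, [(5, 5), (10, 11)])
enhanced = enhance(feat, amap)                 # same shape, boosted near landmarks
crops = crop_regions(feat, [(5, 5), (10, 11)]) # one window per landmark
```

Keeping the residual `1.0 +` term means the enhancing branch can only amplify, never suppress, the pretrained features, which matches the intuition of "enhanced learning" around landmarks.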
To better address the problem of effectively fusing temporal information in AU detection, we propose a C-Net based, region-of-interest (ROI) adapted, LSTM-based temporal fusion approach. The optimal selection of multiple LSTM layers to form the best LSTM net is carried out to best fuse temporal features.
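The temporal-fusion idea can be sketched as a single LSTM layer run over per-frame ROI feature vectors, with the final hidden state serving as the fused descriptor. The weight shapes, gate ordering, and random initialization below are illustrative assumptions; the thesis selects among multiple LSTM layers, which this sketch does not model.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_fuse(seq, Wx, Wh, b):
    """Run one LSTM layer over a sequence of per-frame features and
    return the final hidden state as the fused temporal descriptor.
    seq: (T, D); Wx: (4H, D); Wh: (4H, H); b: (4H,).
    Gate order in the stacked weights: input, forget, output, candidate."""
    H = Wh.shape[1]
    h, c = np.zeros(H), np.zeros(H)
    for x in seq:
        z = Wx @ x + Wh @ h + b
        i, f, o, g = np.split(z, 4)
        i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
        c = f * c + i * g          # update cell state
        h = o * np.tanh(c)         # emit hidden state
    return h

rng = np.random.default_rng(0)
T, D, H = 5, 8, 4                  # 5 frames, 8-dim ROI features, 4 hidden units
fused = lstm_fuse(rng.normal(size=(T, D)),
                  rng.normal(size=(4 * H, D)) * 0.1,
                  rng.normal(size=(4 * H, H)) * 0.1,
                  np.zeros(4 * H))
```

In the full approach, `seq` would come from C-Net ROI features of consecutive frames, and the fused state would feed the AU classifier.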
Local Note:
School code: 1606
Available:
| Shelf Number | Item Barcode | Shelf Location |
|---|---|---|
| XX(677953.1) | 677953-1001 | Proquest E-Thesis Collection |


