Research Questions

Could you briefly introduce your research?

I am a Ph.D. student at the University of South Carolina. I passed my dissertation defense this September, so I will graduate this December. My research area is computer vision and machine learning, with a recent focus on deep learning. Our lab focuses on improving facial behavior recognition using machine learning methods. Facial behavior recognition includes facial expression recognition and facial action unit recognition; I can explain facial action units in more detail later. The problem is: given a facial image, the learned model should tell which expressions or facial action units are present in that image. In the first three years of my Ph.D. program, we mainly extracted hand-designed features such as LBP and HOG, and then used AdaBoost or SVM for classification. In the most recent three years, we have used deep learning methods, especially convolutional neural networks. We mainly use the Caffe deep learning framework with C++. Beyond using standard CNNs, we also develop customized layers within the framework, writing our own forward and backward propagation, including the forward_gpu and backward_gpu functions. We have published our methods at international conferences including CVPR and NIPS.


A facial action unit is the smallest observable muscle movement, the basic component of a facial expression. There are 46 facial action units in total, whose combinations can produce thousands of expressions. If we can recognize all the facial action units, then we can recognize any kind of facial expression, such as nervous, excited, or sleepy. For example, the expression smile can be decomposed into cheek raiser (AU6), lip corner puller (AU12), and lips part (AU25).
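The decomposition above can be illustrated with a toy mapping; this is a sketch of the idea, not a real recognizer, and it only encodes the smile example from the text (a real system would cover all 46 AUs and many expressions, with a more careful matching rule).

```python
# Toy sketch: an expression is recognized when all of its action units are
# detected. Only the smile example from the text is encoded here.
EXPRESSION_TO_AUS = {
    "smile": {"AU6", "AU12", "AU25"},  # cheek raiser, lip corner puller, lips part
}

def expressions_from_aus(detected_aus):
    """Return every expression whose AUs are all present in the detected set."""
    detected = set(detected_aus)
    return [name for name, aus in EXPRESSION_TO_AUS.items() if aus <= detected]

print(expressions_from_aus({"AU6", "AU12", "AU25"}))  # ['smile']
print(expressions_from_aus({"AU6"}))                  # []
```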

Introduce the Internship:

I did an internship at the Correlated Solutions company. They do digital image correlation, which uses an optical, non-contact method to measure the deformation, vibration, or strain of the surface of different materials or products. The measurement scale can be as small as microns, and the time scale can be as small as nanoseconds. The general steps of digital image correlation include camera calibration, image collection, image correlation computation, and post-processing. Their product is quite mature and written in C++, but many customers want to do the post-processing themselves in Python. My work was to write a wrapper exposing their C++ classes and functions to Python users. I also developed some post-processing functions to meet customers' needs.
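As a small illustration of the post-processing side (not the wrapper itself), here is a sketch of computing engineering strain from displacements measured along a uniform grid; the data is synthetic and the function name is my own, not the company's API.

```python
import numpy as np

# Post-processing sketch: engineering strain du/dx from displacements on a
# uniform 1-D grid, via central differences (np.gradient).
def engineering_strain(displacement, spacing):
    """Strain du/dx computed by central differences on a uniform grid."""
    return np.gradient(displacement, spacing)

x = np.linspace(0.0, 10.0, 11)   # measurement positions (mm)
u = 0.002 * x                    # displacements for a uniform 0.2% stretch
strain = engineering_strain(u, x[1] - x[0])
print(strain)  # ~0.002 everywhere for a uniform stretch
```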

Challenge:
The biggest challenge was that the project required knowledge of physics and materials science that I was not familiar with. To address this, first, I asked my supervisor, who is the CEO of this small company. Since he is very busy, I wrote down all my questions and ideas and used slides to discuss them with him. Second, I asked him for books and papers covering the domain knowledge. By the end of the internship, the project was finished with his help.
Python packages: NumPy, SciPy, Pandas, Matplotlib, threading, scikit-learn



Introduce the IB-CNN work

Motivation: CNNs have been successfully applied in many areas, but they overfit easily. Dropout was proposed to randomly drop some features; it not only reduces overfitting but also acts as an ensemble model that improves performance. With the same motivation, but instead of randomly dropping features, we drop redundant features deliberately and keep only the discriminative ones. This requires a feature selection method. I have a lot of experience using AdaBoost for feature selection, so we wanted to combine AdaBoost feature selection and classification with the CNN in a unified framework. The general idea is to substitute the standard decision layer with our proposed incremental boosting layer, which is part of the CNN framework and can be optimized with stochastic gradient descent.

1. Dropout acts as an ensemble model.
2. Feature selection reduces overfitting.

Technical details: The input to our proposed incremental boosting layer is the set of output features from a fully connected layer. In each iteration, the layer sequentially selects discriminative features as weak classifiers; each selected feature is a weak classifier, implemented as a one-level decision tree (a decision stump). Each weak classifier has two parameters: the threshold of the decision stump and the confidence rate. The strong classifier is the ensemble of the selected weak classifiers, weighted by their confidence rates. That is the forward pass. In the backward pass, we use gradient descent to update the thresholds of the decision stumps, and an incremental rule to update the confidence rates: if a weak classifier is selected frequently, its confidence rate becomes high compared with less frequently selected weak classifiers.
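The forward pass described above can be sketched roughly as follows. The thresholds, confidence rates, and selected feature indices here are hypothetical placeholders; in the real layer they are learned jointly with the CNN and updated as described in the backward pass.

```python
import numpy as np

def stump(feature, threshold):
    """One-level decision tree (decision stump): +1 if feature > threshold, else -1."""
    return np.where(feature > threshold, 1.0, -1.0)

def strong_classifier(features, selected_idx, thresholds, confidences):
    """Confidence-weighted ensemble of the selected weak classifiers."""
    score = np.zeros(features.shape[0])
    for i, t, c in zip(selected_idx, thresholds, confidences):
        score += c * stump(features[:, i], t)
    return score / np.sum(confidences)  # normalized strong decision in [-1, 1]

# Toy batch of 4 samples with 6 fully-connected-layer features each.
X = np.array([[0.9, 0.1, 0.4, 0.8, 0.2, 0.5],
              [0.2, 0.7, 0.6, 0.1, 0.9, 0.3],
              [0.8, 0.2, 0.5, 0.7, 0.1, 0.6],
              [0.1, 0.8, 0.3, 0.2, 0.7, 0.4]])
scores = strong_classifier(X, selected_idx=[0, 3],
                           thresholds=[0.5, 0.5], confidences=[2.0, 1.0])
print(scores)  # [ 1. -1.  1. -1.]
```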

Experiments: In the experiments, our proposed method improves performance substantially over the baseline. We also conducted two experiments to support our motivation and demonstrate the robustness of the method. First, setting the learning rate is tricky; usually one tries different learning rates and picks the best. In the first experiment, we varied the learning rate and found that our method achieves stable and better performance than the baseline across the range. In the second experiment, we observed that for the traditional method, performance is sensitive to the number of input features of the decision layer. So we varied the decision layer's feature count starting at 64 and doubling repeatedly up to about 2000 features. The baseline's performance first increased and then dropped dramatically as the feature count grew, because of overfitting, while our method remained stable and better, because it selects the discriminative features and discards the redundant ones.

Challenge:
1. I needed to insert a boosting layer into the Caffe source code, which meant understanding the source and figuring out how to implement the boosting idea in the existing open-source Caffe code. That was two years ago, when there were very few reference materials online.
First, I read the source code. There is a lot of it, but I found I did not need to understand all of it. I identified the layer most closely related to my boosting layer, namely the decision layer (the fully connected inner product layer), then searched the whole project and studied only the code wherever that layer's variables appear. That enormously reduced my workload.
Second, reading the code alone did not tell me exactly what it was doing, so I also printed out intermediate results, checking the values and the sizes of the vectors, to get a better understanding.

Third, even though I could not find many materials online about modifying the source code, I found some, and I communicated with people in the online community who had more experience.

Introduce the ICCV paper:
Project on Optimal Filter Size
Motivation: In AlexNet, the filter sizes are 11*11 and 5*5 for the first two convolutional layers, and 3*3 for the rest. ZF-Net proposed 7*7, then 5*5, and then 3*3 for the subsequent layers. GoogLeNet uses the Inception module, which concatenates 5*5, 3*3, and 1*1 filters. The residual network (ResNet) uses all 3*3 filters with a small receptive field. But all of these CNNs use fixed filter sizes, which are predefined hyperparameters. In our proposed method, the filter size is a variable that can be optimized from the training data using gradient descent.

How? We first define an upper-bound filter and a lower-bound filter whose sizes differ by 2. For example, if the lower-bound filter is 3*3, the upper-bound filter is 5*5, and the lower-bound filter is the inner part of the upper-bound filter. In the forward pass, we compute the interpolation of the activations from the upper-bound and lower-bound filters. In the backward pass, we compute the derivative of the loss with respect to the kernel size using the upper-bound and lower-bound filters, based on the definition of the derivative.
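The interpolation idea can be sketched as follows. The linear interpolation weight, the variable names, and the derivative expression are my own illustrative assumptions; the paper's exact formulation may differ.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Plain 'valid' cross-correlation with a square kernel."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
upper = rng.standard_normal((5, 5))   # 5*5 upper-bound filter
lower = np.zeros_like(upper)
lower[1:4, 1:4] = upper[1:4, 1:4]     # 3*3 lower bound = inner part, zero-padded

k = 3.7                               # continuous kernel size in [3, 5]
alpha = (k - 3.0) / 2.0               # interpolation weight in [0, 1]
act = (1 - alpha) * conv2d_valid(image, lower) + alpha * conv2d_valid(image, upper)

# Under this linear interpolation, the derivative of the activation with
# respect to k is the difference of the two activations over the size gap.
dact_dk = (conv2d_valid(image, upper) - conv2d_valid(image, lower)) / 2.0
print(act.shape)  # (4, 4)
```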

Challenge: We faced a challenge in the mathematical derivative calculation. Although I majored in mathematics as an undergraduate, I could not be sure my theoretical part was correct. There was one real problem: the function is not continuous everywhere, only piecewise continuous, so how do we handle the derivative at the points of discontinuity?
To solve this challenge: 1. I read a lot of reference material online about computing derivatives of discontinuous functions. 2. My advisor recommended a professor from another department with a very strong mathematics background, and I met with him once a week. After several meetings, we finally solved the problem by defining a continuous k and the corresponding filter.

Introduce the CVPR paper:
We published this paper in 2014, when deep learning was not yet very popular. A training process for facial expression recognition is usually performed sequentially in three individual stages: feature learning, feature selection, and classifier construction. Extensive empirical studies are needed to search for an optimal combination of feature representation, selected feature sets, and classifier to achieve good recognition performance. We proposed a novel boosted deep belief network that performs the three training stages (feature learning, feature selection, and classifier construction) iteratively in a unified framework. My contribution was to design, implement, and combine the boosting algorithm with the deep belief network. The boosting algorithm makes decisions based on features selected from the DBN-learned features and backpropagates the error to the DBN, so the whole framework is unified and can be optimized jointly with gradient descent.

The biggest challenge was writing the backpropagation and passing the error correctly to the lower layers of Restricted Boltzmann Machines. First, the sign function in AdaBoost is not differentiable. Second, how could we know the backpropagated error was correct? For the first problem, we used a continuous function to approximate the sign function. There are many options, including the sigmoid function, the hyperbolic tangent, and polynomial functions. After many experiments, and after consulting a professor with a strong mathematics background, we settled on x/sqrt(x^2+1). To verify the backpropagated error, we designed toy input examples, completely correctly classified, completely wrong, and borderline, and checked whether the backpropagated errors made sense.
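The smooth sign approximation above is easy to check numerically. Differentiating x/sqrt(x^2+1) gives 1/(x^2+1)^(3/2), which is everywhere defined, so the boosting decision becomes differentiable:

```python
import numpy as np

# Smooth approximation of the sign function and its closed-form derivative.
def smooth_sign(x):
    return x / np.sqrt(x**2 + 1)

def smooth_sign_grad(x):
    # d/dx [x (x^2+1)^(-1/2)] = (x^2+1)^(-3/2)
    return 1.0 / (x**2 + 1) ** 1.5

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(smooth_sign(x))       # approaches -1/+1 for large |x|, 0 at x = 0
print(smooth_sign_grad(x))  # largest slope at x = 0, vanishing in the tails
```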

Introduce the ECCV paper:
Studies in psychology show that not all facial regions are equally important in recognizing facial expressions; in fact, different facial regions make different contributions to different expressions. Motivated by this, we proposed a novel framework named the Feature Disentangling Machine (FDM) to effectively select active features for different facial expressions.

Introduce the ICIP paper:
This paper is about face registration, which is a major and critical step for face analysis. Existing facial activity recognition systems often employ coarse face alignment based on a few fiducial points, such as the eyes, and extract features from an equal-sized grid: the input image is divided into multiple equal-sized patches, and features such as LBP or HOG histograms are extracted from each patch. But we believed the alignment of the patches could be improved, because of variation in face pose, facial deformation, and person-specific geometry. For example, the jaw is long for a person with a long face but short for a person with a flat face, and when a person smiles the face becomes even longer, so a patch at the same position does not contain the same facial content across different facial images. We proposed a deformable grid with arbitrary patch shapes. The relationship between the standard equal-sized grid and the mean facial shape is defined; a new face image can then be deformed based on its own shape and that defined relationship. In addition, there are by-product features, including the lengths and angles of the deformed grid edges, which can be used to improve facial action unit recognition.
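The equal-sized-grid baseline described above can be sketched as follows. A plain intensity histogram stands in for the LBP/HOG histograms, and the grid size is arbitrary; this is only the baseline, not the proposed deformable grid.

```python
import numpy as np

def grid_histograms(image, rows, cols, bins=8):
    """Divide an image into equal-sized patches and concatenate per-patch
    normalized intensity histograms into one feature vector."""
    H, W = image.shape
    ph, pw = H // rows, W // cols
    feats = []
    for r in range(rows):
        for c in range(cols):
            patch = image[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw]
            hist, _ = np.histogram(patch, bins=bins, range=(0, 256))
            feats.append(hist / hist.sum())   # normalize each patch histogram
    return np.concatenate(feats)              # rows * cols * bins features

img = np.arange(64 * 64).reshape(64, 64) % 256   # synthetic 64x64 grayscale image
features = grid_histograms(img, rows=4, cols=4)
print(features.shape)  # (128,) = 4*4 patches * 8 bins
```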

Introduce the Forest fire detection method:
While pursuing my master's degree, I took part in a project on forest fire and smoke detection, which was both a research project and an industrial project. I was mainly responsible for the fire detection algorithm. We used the C++ programming language to develop the algorithms with the OpenCV library, and we published some papers from this project.
My work: develop the forest fire detection algorithms and publish papers based on them.
My biggest challenge: our algorithm often mistook red flags and red cars for fire, because their color and texture are very similar to fire. We then extracted edge information: the edges of flags and cars are smooth, while the edge of fire has a zigzag shape. But there was still a problem: we could not always extract the smooth boundary of the flag or car cleanly, so we additionally extracted dynamic (temporal) features.
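One simple way to quantify the smooth-versus-zigzag edge cue described above is a boundary roughness score; the measure here (boundary perimeter divided by the bounding-box half-perimeter) is my own illustrative choice, not necessarily the one used in the project.

```python
import numpy as np

def perimeter(points):
    """Total length of a closed polyline given as an (N, 2) array."""
    d = np.diff(np.vstack([points, points[:1]]), axis=0)
    return np.sqrt((d**2).sum(axis=1)).sum()

def roughness(points):
    """Perimeter relative to the bounding-box perimeter: near pi/4 for a
    circle, 1.0 for a rectangle, larger for a zigzag boundary."""
    w = np.ptp(points[:, 0])
    h = np.ptp(points[:, 1])
    return perimeter(points) / (2 * (w + h))

t = np.linspace(0, 2 * np.pi, 200, endpoint=False)
smooth = np.c_[np.cos(t), np.sin(t)]                 # smooth, flag-like boundary
jagged = np.c_[(1 + 0.3 * np.sin(20 * t)) * np.cos(t),
               (1 + 0.3 * np.sin(20 * t)) * np.sin(t)]  # flame-like zigzag

print(roughness(smooth) < roughness(jagged))  # True: zigzag boundary scores higher
```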


