Botabekov M., Rashiduly A.
Al-Farabi Kazakh National University
Development of programs for recognition
of human emotional state.
Abstract–In this paper, we dabble in machine learning and face recognition to predict whether an image from a live webcam shows a smiling subject or not. First, we use an existing dataset, the "Olivetti faces dataset", and classify each of its 400 faces into one of two categories: smiling or not smiling. Then, we train a support vector classifier on this dataset to predict whether a face depicts a smiling person or not. We do this using the scikit-learn machine learning library for Python. Finally, we integrate this classifier into a live loop, using OpenCV to capture a frame from the webcam, extract a face, and annotate the image with the result of the machine learning prediction.
Human facial expressions, emotions, gestures, body language, eye contact, and all the other forms of nonverbal communication have been an integral part of human life since the birth of mankind. Being humans, we leverage this fact in many day-to-day actions and conversations; in fact, most of the exchange of knowledge depends on these factors. In this project, I developed this idea by giving machines the power to extract and understand these emotions using a supervised machine learning algorithm.
A Tkinter-based Python GUI was implemented with machine learning tools built on scikit-learn to classify the dataset, train a linear support vector machine, and then apply this classifier to the live frames. This report briefly describes all the attributes of the software development process and discusses the machine learning algorithm used to learn the objective.
The system developed is made available as open-source software, and compiled executables are also provided. The results obtained by the trained support vector classifier on live frames from the video camera are quite robust, and the overall accuracy of the system is more than 90%.
The results obtained are robust in terms of feature detection and extraction with the laptop's built-in webcam; however, under varying light conditions around the room, the camera's performance is impacted, and as a result the classifier gets confused as to which category, happy or sad, to put the input frame in.
Introduction
Facial expressions have been an inherent part of human life for ages. They have played a singularly important role in human lives and demonstrate why the human species differs from others, not just in terms of intelligence but in feeling empathy too.
Facial expressions and other human gestures go hand in hand. The same facial expression, which is interpreted one way with a particular hand shape and movement, might be completely misunderstood if the hand gestures are altered.
This is the basic idea and motivation behind my project. How cool would it be to teach a machine to identify these facial expressions and emotions just like human beings do? I decided to make use of proven machine learning techniques and software tools, couple them with the knowledge gained in pattern recognition, and apply them to this interest, and I was successful in doing so.
Figure: (a) Sad expression; (b) Happy expression.
The system developed had an accuracy greater than 90% and was able to detect the emotions of people with or without glasses or facial hair, regardless of gender. The software was built entirely in Python (a programming language) with a powerful machine learning tool called scikit-learn.
This was the expected and obtained outcome of the system. The idea proposed was to detect the emotion based on a smiling or non-smiling face. Based on the supervised image classification by the user, the algorithm trains a support vector classifier (based on the feature extraction), and this classifier is then used to classify the given input frame into either of the two categories. Applying machine learning techniques to human classification problems is indeed an intuitive and challenging task which involves knowledge in a multitude of domains.
The hierarchical components are listed below:
1. Dataset image classification by the user.
2. Saving the result and training a support vector classifier (SVM) based on it.
3. Using the classifier to detect the expression through the webcam feed.
4. A packaged GUI.
As proposed, the software has been developed and is available as open source.
Related work
A study has shown that people can recognize facial expressions in motion and moving frames (for example, in a movie, a fast slideshow, a documentary, or a video) far better than in a static photograph. The study was conducted by scientists from the Max Planck Institute for Biological Cybernetics in Tübingen, Germany; to gain this advantage, the video sequence should be at least 1/10th of a second long.
Project Plan
The system spans many domains, and thus the plan for the project development was divided into four major categories:
1- This part includes fetching the dataset, separating the images from the data part, checking the dataset integrity, and classifying the dataset into two separate categories (smiling faces denoted as 1 and non-smiling faces denoted as 0).
2- This section takes into account the results of the previous section (i.e., the dataset classification into the two categories 1 and 0) and trains a linear support vector machine classifier, which then takes random images from the testing part of the dataset to verify the integrity of the classifier.
3- After the classifier has been trained in the previous stage, this stage applies the classifier to the video frames from the camera, after detecting the faces in the frames using the Haar cascade frontal face classifier.
4- This is where all the functionalities are coupled together: a Python-based Tkinter GUI is coded, all the functions are grouped together, and the executable (.exe) is compiled, deliberately made standalone so that it can be used on a computer without any Python support available.
A. Classifying the Dataset & Storing the Result (.xml)
The Olivetti faces dataset has been used to cluster different types of images/human faces into two separate groups of similar facial expressions. This clustering of the images into two groups is done by the user, and once the 400 images are classified into either of the categories, the result is automatically stored so as to record how many of the 400 faces are categorized as smiling and how many as non-smiling (a code sketch of this step follows the list below).
1. The Olivetti faces dataset has 40 different people (subjects) in various facial expressions: with open eyes, closed eyes, smiling, non-smiling, with or without glasses, with or without facial hair, and combinations of all of these.
2. The classified images are represented as follows:
3. The bar graph is computed and displayed as follows:
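As a rough sketch of this step (assuming scikit-learn is installed; the all-zero labels and the results.xml filename are illustrative placeholders, not the project's exact names), the dataset can be fetched and the user's per-image classification stored as XML:

# Sketch: fetch the Olivetti faces and store per-image labels as XML.
import xml.etree.ElementTree as ET
from sklearn.datasets import fetch_olivetti_faces

faces = fetch_olivetti_faces()            # 400 grayscale images, 64x64 each
print(faces.images.shape)                 # (400, 64, 64)

# Suppose the user has marked each face: 1 = smiling, 0 = non-smiling.
labels = [0] * 400                        # illustrative placeholder labels

root = ET.Element("results")
for index, label in enumerate(labels):
    ET.SubElement(root, "face", index=str(index), smiling=str(label))
ET.ElementTree(root).write("results.xml")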
B. Training the Linear Support Vector Classifier
In the previous section we finished fetching the dataset, verifying its integrity, classifying it into two different groups, and saving the .xml file as a result. We also plotted a graph of the total number of images in each category, as well as each category separately. In this section we will compute the mean cross-validation score of the results and train a support vector classifier (SVM) based on this.
Linear support vector machine (linear SVM) classifiers are a set of supervised learning methods used for classification and/or regression. This section trains the classifier and tests it with sample test data, as sketched below.
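A minimal sketch of this training step, assuming scikit-learn; the random placeholder labels stand in for the user's real classification from step A, and LinearSVC is one reasonable choice of linear SVM (the paper does not specify the exact class used):

# Sketch: train a linear SVM on the Olivetti faces, assuming scikit-learn.
# The random `targets` are placeholders for the user's labels from step A.
import numpy as np
from sklearn.datasets import fetch_olivetti_faces
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import LinearSVC

faces = fetch_olivetti_faces()
X = faces.data                            # each 64x64 image flattened to 4096 features
targets = np.random.randint(0, 2, 400)    # placeholder smiling/non-smiling labels

clf = LinearSVC()
# Mean cross-validation score over 5 folds.
scores = cross_val_score(clf, X, targets, cv=5)
print("mean %.3f +/- %.3f" % (scores.mean(), scores.std()))

# Hold out a test set and measure accuracy on it.
X_train, X_test, y_train, y_test = train_test_split(X, targets, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy: %.2f" % clf.score(X_test, y_test))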
Figure: The classifier result
The trained classifier has been validated against the training data, and the confusion matrix has also been plotted. The results are as follows: the mean score of 0.756 with a deviation of +/- 0.024 is actually pretty decent, and the accuracy on the testing data is 0.86, so the classifier result looks quite robust.
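As a small illustration (reusing the clf, X_test, and y_test names from the training sketch above), the confusion matrix can be computed with scikit-learn:

# Sketch: confusion matrix for the trained classifier, reusing names
# from the training sketch above.
from sklearn.metrics import confusion_matrix

y_pred = clf.predict(X_test)
# Rows are true classes (0 = non-smiling, 1 = smiling), columns are predictions.
print(confusion_matrix(y_test, y_pred))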
C. Face Recognition and Emotion Prediction Using the Trained SVM Classifier (in OpenCV)
Since the classifier has now been trained, it was time to test it on live video, where the user can actually see whether the machine has learnt its objective or not, i.e., its purpose and robustness.
To determine this, or specifically to apply our linear SVM classifier to the frame, we have to detect the face in the frame. This is where we use the Haar cascade classifier, which detects the faces in the frame by applying the Haar cascade transform, as sketched below.
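A rough sketch of this detection step, assuming the opencv-python package, where cv2.data.haarcascades points at the cascade files bundled with OpenCV:

# Sketch: detect faces in one webcam frame with a Haar cascade (OpenCV).
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

capture = cv2.VideoCapture(0)             # built-in webcam (assumed present)
ret, frame = capture.read()               # grab a single frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
faces = cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)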
Once we have detected the face, we extract it from the frame and convert it to the same size as the images in the dataset. This step is extremely crucial in order to be able to apply our classifier to the frame; if this step is not done correctly, the emotion recognition won't be successful.
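Continuing the sketch above, each detected face can be cropped, resized to the 64x64 Olivetti image size, rescaled to the dataset's 0-1 pixel range, and passed to the trained classifier (the clf name again comes from the training sketch):

# Sketch: crop, resize, and classify each detected face, then annotate the frame.
for (x, y, w, h) in faces:
    face = gray[y:y + h, x:x + w]
    face = cv2.resize(face, (64, 64)) / 255.0   # match the dataset's 0-1 scaling
    prediction = clf.predict(face.reshape(1, -1))[0]
    text = "smiling" if prediction == 1 else "not smiling"
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.putText(frame, text, (x, y - 10),
                cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)

cv2.imshow("smile detection", frame)
cv2.waitKey(0)
capture.release()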
I have uploaded testing screenshots of the face detection as well as the emotion detection in this step, which was done with optimized coefficients.
The screenshots are as follows:
Figure: (a) Non-Smiling Face; (b) Smiling Face; (c) Non-Smiling Face.
As is evident from the pictures above, the trained linear support vector classifier works efficiently under good lighting conditions.
D. Python Tkinter GUI & System Integration
The application window is displayed as follows at runtime:
The functions of the different buttons and labels in the GUI are explained as follows:
Image index: Keeps track of the index number of the images. The maximum number of images available in the database is 400, and the label shows the current image being referenced. This can be reset to the first image in the index by pressing the Reset button.
Smiling: This button marks the current image being displayed as a smiling face (i.e., a happy face) and increments the happy count by 1. Reset by pressing the Reset button.
Not smiling: This button marks the current image being displayed as a non-smiling face (i.e., a sad face) and increments the sad count by 1. Can be reset by the Reset button.
Reset: Resets all the indexes and the count values.
Quit Application: Exits the application, and control is returned to Windows gracefully.
Load the Trained Classifier & Test Output: This button loads the results .xml file, which is the result of the image classification in the first step, and trains a linear support vector classifier based on it. After the classifier is trained, a window opens containing the video feed from the web camera. Once the user's face is detected, the classifier predicts the resulting expression based on the learning objective.
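A minimal Tkinter sketch of the GUI described above; the widget names and callbacks are illustrative placeholders, not the project's exact code:

# Sketch: a bare-bones Tkinter GUI with the buttons described above.
import tkinter as tk

class SmileApp(tk.Tk):
    def __init__(self):
        super().__init__()
        self.title("Smile classifier")
        self.index = 0                      # current image index (0..399)
        self.happy = 0                      # count of images marked smiling
        self.sad = 0                        # count of images marked non-smiling
        self.label = tk.Label(self, text="Image index: 0")
        self.label.pack()
        tk.Button(self, text="Smiling", command=self.mark_smiling).pack()
        tk.Button(self, text="Not smiling", command=self.mark_not_smiling).pack()
        tk.Button(self, text="Reset", command=self.reset).pack()
        tk.Button(self, text="Quit Application", command=self.destroy).pack()

    def mark_smiling(self):
        self.happy += 1
        self.advance()

    def mark_not_smiling(self):
        self.sad += 1
        self.advance()

    def advance(self):
        # Move to the next image, capped at the last of the 400 images.
        self.index = min(self.index + 1, 399)
        self.label.config(text="Image index: %d" % self.index)

    def reset(self):
        self.index = self.happy = self.sad = 0
        self.label.config(text="Image index: 0")

SmileApp().mainloop()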
References
http://stackoverflow.com/questions/18640804/facial-expression-classification-in-real-time-using-svm/
http://nbviewer.jupyter.org/github/flothesof/posts/blob/master/20150107_SmileRecognition.ipynb
http://flothesof.github.io/tag/machine-learning.html
http://flothesof.github.io/smile-recognition.html
https://realpython.com/blog/python/face-recognition-with-python/
https://realpython.com/blog/python/face-detection-in-python-using-a-webcam/
http://docs.opencv.org/2.4/modules/contrib/doc/facerec/facerec_tutorial.html
https://habrahabr.ru/post/133909/
https://habrahabr.ru/post/301096/
https://habrahabr.ru/post/135244/
https://habrahabr.ru/post/134857/
http://flothesof.github.io/tag/opencv.html
1. Otsu, N., "A Threshold Selection Method from Gray-Level Histograms," IEEE Transactions on Systems, Man, and Cybernetics, Vol. 9, No. 1, 1979, pp. 62-66.
2. Canny, J., "A Computational Approach to Edge Detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 8, No. 6, 1986, pp. 679-698.