Botabekov M., Rashiduly A.
Al-Farabi Kazakh National University
Development of programs for recognition
of human emotional state.
Abstract–In this paper, we dabble in machine learning and face recognition to predict whether an image from a live webcam shows a smiling subject or not. First, we use an existing dataset, the "Olivetti faces dataset", and classify each of its 400 faces into one of two categories: smiling or not smiling. Then, we train a support vector classifier on this dataset to predict whether a face depicts a smiling person or not. We do this using the scikit-learn machine learning library for Python. Finally, we integrate this classifier into a live loop, using OpenCV to capture a frame from the webcam, extract a face, and annotate the image with the result of the machine learning prediction.
Human facial expressions, emotions, gestures, body language, eye contact, and all the other forms of nonverbal communication have been an integral part of human life since the birth of mankind. Being humans, we leverage this fact in many day-to-day actions and conversations; in fact, most of the exchange of knowledge depends on these factors. In this project, I developed this idea by giving machines the power to extract and understand these emotions using a supervised machine learning algorithm.
A Tkinter-based Python GUI was implemented with machine learning tools built on scikit-learn to classify the dataset, train a linear support vector machine, and then apply this classifier to the live frames. This report briefly describes all the attributes of the software development process and discusses the machine learning algorithm used to learn the objective.
The system developed is made available as open-source software, and compiled executables are also provided. The results obtained by the trained support vector classifier on live frames from the video camera are quite robust, and the overall accuracy of the system is more than 90%.
The results obtained are robust in terms of feature detection and extraction with the laptop's built-in webcam; however, under varying light conditions around the room, the camera's performance is impacted, and as a result the classifier gets confused as to which category, happy or sad, to put the input frame in.
Introduction
Facial expressions have been an inherent part of human life for ages. They have played a singularly important role in human lives and demonstrate why the human species differs from others, not just in terms of intelligence but in feeling empathy too.
Facial expressions and other human gestures go hand in hand. The same facial expression, which is interpreted one way with a particular hand shape and movement, might be completely misunderstood if the hand gestures are altered.
This is the basic idea and motivation behind my project. How cool would it be to teach a machine to identify these facial expressions and emotions just like human beings do? I decided to make use of proven machine learning techniques and software tools, couple them with the knowledge gained in pattern recognition, and apply them to this interest, and I was successful in doing so.
Figure: (a) Sad expression; (b) Happy expression.
The system developed had an accuracy greater than 90% and was able to detect the emotions of people with or without glasses or facial hair, regardless of gender. The software was built entirely in Python (a programming language) with a powerful machine learning tool called scikit-learn.
This was the expected and obtained outcome of the system. The idea proposed was to detect the emotion based on a smiling or non-smiling face. Based on the supervised image classification by the user, the algorithm trains a support vector classifier (based on the feature extraction), and this classifier is then used to classify the given input frame into either of the two categories. Applying machine learning techniques to human classification problems is indeed an intuitive and challenging task which involves knowledge in a multitude of domains.
The hierarchical components are listed below:
1. Dataset image classification by the user.
2. Saving the result and training a support vector classifier (SVM) based on it.
3. Using the classifier to detect the expression through the webcam feed.
4. A packaged GUI.
As proposed, the software has been developed and is available as open source.
Related work
A study has shown that people can recognize facial expressions in motion and moving frames (for example, in a movie, a fast slideshow, a documentary, or a video) far better than in a static photograph. The study was conducted by scientists from the Max Planck Institute for Biological Cybernetics in Tübingen, Germany; to gain this advantage, the video sequence should be at least 1/10th of a second long.
Project Plan
The system spans many domains, and thus the plan for the project development was divided into four major categories:
1- This part includes fetching the dataset, separating the images from the data part, checking the dataset integrity, and classifying the dataset into two separate categories (smiling faces denoted as 1 and non-smiling faces denoted as 0).
2- This section takes into account the results of the previous section (i.e., the dataset classification into the two categories 1 and 0) and trains a linear support vector machine classifier, which then takes random images from the testing part of the dataset to verify the integrity of the classifier.
3- After the classifier has been trained in the previous stage, this stage applies the classifier to the video frames from the camera, after detecting the faces in the frames using the Haar cascade frontal face classifier.
4- This is where all the functionalities are coupled together: a Python-based Tkinter GUI is coded, all the functions are grouped together, and the executable (.exe) is compiled, deliberately made standalone so that it can be used on a computer without any Python support available.
A. Classifying the Dataset & Storing the Result (.xml)
The Olivetti faces dataset has been used to cluster different types of images/human faces into two separate groups of similar facial expressions. This clustering of the images into two groups is done by the user, and once the 400 images are classified into either of the categories, the result is automatically stored so as to record how many of the 400 faces are categorized as smiling and how many as non-smiling (a code sketch of this step follows the list below).
1. The Olivetti faces dataset has 40 different people (subjects) in various facial expressions: with open eyes, closed eyes, smiling, non-smiling, with or without glasses, with or without facial hair, and combinations of all of these.
2. The classified images are represented as follows:
3. The bar graph is computed and displayed as follows:
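As a rough sketch of this step (assuming scikit-learn is installed; the all-zero labels and the results.xml filename are illustrative placeholders, not the project's exact names), the dataset can be fetched and the user's per-image classification stored as XML:

# Sketch: fetch the Olivetti faces and store per-image labels as XML.
import xml.etree.ElementTree as ET
from sklearn.datasets import fetch_olivetti_faces

faces = fetch_olivetti_faces()            # 400 grayscale images, 64x64 each
print(faces.images.shape)                 # (400, 64, 64)

# Suppose the user has marked each face: 1 = smiling, 0 = non-smiling.
labels = [0] * 400                        # illustrative placeholder labels

root = ET.Element("results")
for index, label in enumerate(labels):
    ET.SubElement(root, "face", index=str(index), smiling=str(label))
ET.ElementTree(root).write("results.xml")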
B. Training the Linear Support Vector Classifier
In the previous section we finished fetching the dataset, verifying its integrity, classifying it into two different groups, and saving the .xml file as a result. We also plotted a graph of the total number of images in each category, as well as each category separately. In this section we will compute the mean cross-validation score of the results and train a support vector classifier (SVM) based on this.
Linear support vector machine (linear SVM) classifiers are a set of supervised learning methods used for classification and/or regression. This section trains the classifier and tests it with sample test data, as sketched below.
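A minimal sketch of this training step, assuming scikit-learn; the random placeholder labels stand in for the user's real classification from step A, and LinearSVC is one reasonable choice of linear SVM (the paper does not specify the exact class used):

# Sketch: train a linear SVM on the Olivetti faces, assuming scikit-learn.
# The random `targets` are placeholders for the user's labels from step A.
import numpy as np
from sklearn.datasets import fetch_olivetti_faces
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import LinearSVC

faces = fetch_olivetti_faces()
X = faces.data                            # each 64x64 image flattened to 4096 features
targets = np.random.randint(0, 2, 400)    # placeholder smiling/non-smiling labels

clf = LinearSVC()
# Mean cross-validation score over 5 folds.
scores = cross_val_score(clf, X, targets, cv=5)
print("mean %.3f +/- %.3f" % (scores.mean(), scores.std()))

# Hold out a test set and measure accuracy on it.
X_train, X_test, y_train, y_test = train_test_split(X, targets, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy: %.2f" % clf.score(X_test, y_test))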
Figure: The classifier result
The trained classifier has been validated against the training data, and the confusion matrix has also been plotted. The results are as follows: the mean score of 0.756 with a deviation of +/- 0.024 is actually pretty decent, and the accuracy on the testing data is 0.86, so the classifier result looks quite robust.
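As a small illustration (reusing the clf, X_test, and y_test names from the training sketch above), the confusion matrix can be computed with scikit-learn:

# Sketch: confusion matrix for the trained classifier, reusing names
# from the training sketch above.
from sklearn.metrics import confusion_matrix

y_pred = clf.predict(X_test)
# Rows are true classes (0 = non-smiling, 1 = smiling), columns are predictions.
print(confusion_matrix(y_test, y_pred))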
C. Face Recognition and Emotion Prediction Using the Trained SVM Classifier (in OpenCV)
Since the classifier has now been trained, it was time to test it on live video, where the user can actually see whether the machine has learnt its objective or not, i.e., its purpose and robustness.
To determine this, or specifically to apply our linear SVM classifier to the frame, we have to detect the face in the frame. This is where we use the Haar cascade classifier, which detects the faces in the frame by applying the Haar cascade transform, as sketched below.
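A rough sketch of this detection step, assuming the opencv-python package, where cv2.data.haarcascades points at the cascade files bundled with OpenCV:

# Sketch: detect faces in one webcam frame with a Haar cascade (OpenCV).
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

capture = cv2.VideoCapture(0)             # built-in webcam (assumed present)
ret, frame = capture.read()               # grab a single frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
faces = cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)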
Once we have detected the face, we extract it from the frame and convert it to the same size as the images in the dataset. This step is extremely crucial in order to be able to apply our classifier to the frame; if this step is not done correctly, the emotion recognition won't be successful.
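Continuing the sketch above, each detected face can be cropped, resized to the 64x64 Olivetti image size, rescaled to the dataset's 0-1 pixel range, and passed to the trained classifier (the clf name again comes from the training sketch):

# Sketch: crop, resize, and classify each detected face, then annotate the frame.
for (x, y, w, h) in faces:
    face = gray[y:y + h, x:x + w]
    face = cv2.resize(face, (64, 64)) / 255.0   # match the dataset's 0-1 scaling
    prediction = clf.predict(face.reshape(1, -1))[0]
    text = "smiling" if prediction == 1 else "not smiling"
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.putText(frame, text, (x, y - 10),
                cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)

cv2.imshow("smile detection", frame)
cv2.waitKey(0)
capture.release()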
I have uploaded testing screenshots of the face detection as well as the emotion detection in this step, which was done with optimized coefficients.
The screenshots are as follows:
Figure: (a) Non-Smiling Face; (b) Smiling Face; (c) Non-Smiling Face.
As is evident from the pictures above, the trained linear support vector classifier works efficiently under good lighting conditions.
D. Python Tkinter GUI & System Integration
The application window is displayed as follows at runtime:
The functions of the different buttons and labels in the GUI are explained as follows:
Image index: Keeps track of the index number of the images. The maximum number of images available in the database is 400, and the label shows the current image being referenced. This can be reset to the first image in the index by pressing the Reset button.
Smiling: This button marks the current image being displayed as a smiling face (i.e., a happy face) and increments the happy count by 1. Reset by pressing the Reset button.
Not smiling: This button marks the current image being displayed as a non-smiling face (i.e., a sad face) and increments the sad count by 1. Can be reset by the Reset button.
Reset: Resets all the indexes and the count values.
Quit Application: Exits the application, and control is returned to Windows gracefully.
Load the Trained Classifier & Test Output: This button loads the results .xml file, which is the result of the image classification in the first step, and trains a linear support vector classifier based on it. After the classifier is trained, a window opens containing the video feed from the web camera. Once the user's face is detected, the classifier predicts the resulting expression based on the learning objective.
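A minimal Tkinter sketch of the GUI described above; the widget names and callbacks are illustrative placeholders, not the project's exact code:

# Sketch: a bare-bones Tkinter GUI with the buttons described above.
import tkinter as tk

class SmileApp(tk.Tk):
    def __init__(self):
        super().__init__()
        self.title("Smile classifier")
        self.index = 0                      # current image index (0..399)
        self.happy = 0                      # count of images marked smiling
        self.sad = 0                        # count of images marked non-smiling
        self.label = tk.Label(self, text="Image index: 0")
        self.label.pack()
        tk.Button(self, text="Smiling", command=self.mark_smiling).pack()
        tk.Button(self, text="Not smiling", command=self.mark_not_smiling).pack()
        tk.Button(self, text="Reset", command=self.reset).pack()
        tk.Button(self, text="Quit Application", command=self.destroy).pack()

    def mark_smiling(self):
        self.happy += 1
        self.advance()

    def mark_not_smiling(self):
        self.sad += 1
        self.advance()

    def advance(self):
        # Move to the next image, capped at the last of the 400 images.
        self.index = min(self.index + 1, 399)
        self.label.config(text="Image index: %d" % self.index)

    def reset(self):
        self.index = self.happy = self.sad = 0
        self.label.config(text="Image index: 0")

SmileApp().mainloop()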
References
http://stackoverflow.com/questions/18640804/facial-expression-classification-in-real-time-using-svm/
http://nbviewer.jupyter.org/github/flothesof/posts/blob/master/20150107_SmileRecognition.ipynb
http://flothesof.github.io/tag/machine-learning.html
http://flothesof.github.io/smile-recognition.html
https://realpython.com/blog/python/face-recognition-with-python/
https://realpython.com/blog/python/face-detection-in-python-using-a-webcam/
http://docs.opencv.org/2.4/modules/contrib/doc/facerec/facerec_tutorial.html
https://habrahabr.ru/post/133909/
https://habrahabr.ru/post/301096/
https://habrahabr.ru/post/135244/
https://habrahabr.ru/post/134857/
http://flothesof.github.io/tag/opencv.html
1. Otsu, N., "A Threshold Selection Method from Gray-Level Histograms," IEEE Transactions on Systems, Man, and Cybernetics, Vol. 9, No. 1, 1979, pp. 62-66.
2. Canny, J., "A Computational Approach to Edge Detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 8, No. 6, 1986, pp. 679-698.