Masliy R.V., Kulyk A.Y.

Vinnytsia National Technical University

Design of a verification stage for a boosting-based face detection method

Boosting-based face detection methods can detect faces in images in real time with high probability (above 90%), but their false alarm probability is not low enough. The reason is the low discriminative power of the simple classifiers in the last stages of the cascade of strong classifiers.

This paper proposes to improve the boosting-based face detection method presented in [1] by adding a verification stage. Verification consists of checking the image regions that the cascade of strong classifiers has identified as faces (“face candidates”) with a classifier that must have high detection accuracy but need not be very fast.

A classifier that meets these requirements can be built with the approach proposed in [2]. This approach yields a classifier that applies Bayes' rule in the form of a likelihood ratio. To decide that a “face candidate” is a face image, the likelihood ratio must exceed a certain threshold:

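$$\frac{P(\text{image} \mid \text{face})}{P(\text{image} \mid \text{non-face})} > \lambda, \qquad (1)$$

where $P(\text{image} \mid \text{face})$ is the conditional probability distribution of the image pixel values when a face image is presented at the classifier input, $P(\text{image} \mid \text{non-face})$ is the conditional probability distribution of the image pixel values when a non-face image is presented at the classifier input, and $\lambda$ is the threshold.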
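The decision rule in formula (1) can be illustrated with a minimal sketch. The source only says that a certain threshold is used; the prior-based threshold below is the textbook Bayes choice and is an assumption here, as are the function names.

```python
def prior_based_threshold(prior_face: float) -> float:
    """Threshold P(non-face) / P(face) that follows from Bayes' rule.

    Assumption: the paper does not say how its threshold is chosen;
    this is only the standard minimum-error choice.
    """
    return (1.0 - prior_face) / prior_face


def passes_likelihood_ratio_test(p_image_given_face: float,
                                 p_image_given_nonface: float,
                                 threshold: float) -> bool:
    """Return True when P(image|face) / P(image|non-face) > threshold.

    The cross-multiplied form avoids dividing by a zero non-face likelihood.
    """
    return p_image_given_face > threshold * p_image_given_nonface
```

With a face prior of 0.1 the threshold is 9, so a candidate whose face likelihood is 20 times its non-face likelihood is accepted, while one with a ratio of 5 is rejected.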

The advantage of building the classifier on the likelihood ratio is that the conditional probability distributions $P(\text{image} \mid \text{face})$ and $P(\text{image} \mid \text{non-face})$ can be estimated from separate sets of “face” and “non-face” images, which is simpler than estimating the distribution $P(\text{face} \mid \text{image})$. The disadvantage is the difficulty of obtaining a representative set of training images, especially for estimating $P(\text{image} \mid \text{non-face})$.

There are two approaches to representing an image for the classifier: global and local.

The global approach tries to model the joint behavior of all input variables. However, limits on computation and memory do not allow a function of all input variables to be built directly. The practical solution is to reduce the dimension of the input data with a transform (cosine transform, Fourier transform, principal component analysis). Higher-order models and nonparametric methods are then used to describe the joint behavior of the reduced set of variables.
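As an illustration of dimension reduction by a transform, the sketch below applies an unnormalized one-dimensional DCT-II and keeps only the first few coefficients. It is a minimal example of the general idea, not a method taken from the paper; the function names and the choice of keeping the lowest-frequency coefficients are assumptions.

```python
import math


def dct_ii(signal):
    """Unnormalized DCT-II of a sequence of numbers."""
    n = len(signal)
    return [
        sum(x * math.cos(math.pi / n * (i + 0.5) * k)
            for i, x in enumerate(signal))
        for k in range(n)
    ]


def reduce_dimension(signal, keep):
    """Keep the first `keep` (lowest-frequency) DCT coefficients.

    Assumption: low frequencies carry most of the signal energy,
    which is typical for natural images but not guaranteed.
    """
    return dct_ii(signal)[:keep]
```

For a constant signal all the energy falls into the first coefficient, so a single retained value describes it completely.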

In the local approach, the input variables are combined into sets such that relations within each set are modeled more accurately than relations between sets. A set can be defined as a group of pixels or as a group of transformed variables that satisfy certain mathematical properties. The same pixel or transformed variable may belong to several sets. The local approach is based on the assumption that, for a given object, each pixel is statistically related to some pixels more strongly than to others; in that case a global representation is not optimal for representing a face image.

The local approach suits image representation better when the image is decorrelated first, because the statistical dependence then concentrates in small sets of variables that can form local features. The wavelet transform is chosen as the decorrelating transform; it makes the input data simpler to model in wavelet space than in the original data space. In addition, the locality and multiscale properties of the wavelet transform make it possible to describe a wide range of data characteristics effectively through a set of local features. The local features are formed with local binary patterns (LBP); the characteristics of the LBPs used are given in Table 1.
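The subband structure produced by a wavelet transform can be sketched as follows. The source does not name the wavelet, so the Haar wavelet with averaging normalization is assumed here for simplicity; the function names and the subband naming convention (first letter for filtering along rows, second along columns) are also assumptions, and conventions differ between authors.

```python
def haar_level(image):
    """One level of a 2D Haar decomposition on 2x2 blocks.

    `image` is a list of rows of numbers. An odd trailing row or
    column is dropped. Returns a dict of four half-size subbands.
    """
    bands = {"LL": [], "LH": [], "HL": [], "HH": []}
    for r in range(0, len(image) - 1, 2):
        out = {name: [] for name in bands}
        for c in range(0, len(image[0]) - 1, 2):
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            out["LL"].append((a + b + d + e) / 4)  # local average
            out["LH"].append((a + b - d - e) / 4)  # change between rows
            out["HL"].append((a - b + d - e) / 4)  # change between columns
            out["HH"].append((a - b - d + e) / 4)  # diagonal change
        for name in bands:
            bands[name].append(out[name])
    return bands


def haar_pyramid(image, levels=3):
    """Repeat the decomposition on the LL band.

    pyramid[0] is the finest level; this indexing is not claimed
    to match the level numbering used in the paper.
    """
    pyramid, current = [], image
    for _ in range(levels):
        bands = haar_level(current)
        pyramid.append(bands)
        current = bands["LL"]
    return pyramid
```

A 56x56 input halves cleanly three times, giving subbands of 28x28, 14x14 and 7x7 coefficients.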

Table 1. Characteristics of the LBPs used

| LBP mark | Number of wavelet coefficients | LBP kernel size (wavelet coefficients) |
|---|---|---|
| LBP4 | 4 | 1 |
| LBP8 | 8 | 1 |
|  | 16 | 2x2 |
|  | 32 | 2x2 |
|  | 32 | 4x4 |
|  | 64 | 4x4 |
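The two simplest patterns in Table 1 can be sketched with the standard LBP definition, in which each neighbour is compared with the centre value and contributes one bit. How the authors compute LBP codes on wavelet coefficients is not specified, so the comparison rule, the bit order and the names below are assumptions; the patterns with 2x2 and 4x4 kernels are not covered.

```python
# Neighbour offsets (row, column), listed clockwise from the top-left.
NEIGHBOURS_8 = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                (1, 1), (1, 0), (1, -1), (0, -1)]
# Top, right, bottom, left.
NEIGHBOURS_4 = [(-1, 0), (0, 1), (1, 0), (0, -1)]


def lbp_code(band, row, col, offsets):
    """LBP code of the coefficient at (row, col) of a 2D subband.

    Bit i is set when the i-th neighbour is >= the centre value.
    The caller must keep (row, col) away from the border.
    """
    center = band[row][col]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if band[row + dr][col + dc] >= center:
            code |= 1 << bit
    return code
```

With four neighbours the code takes one of 16 values and with eight neighbours one of 256, which bounds the number of histogram bins needed per feature.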

The set of local features used for image representation can be divided into several groups:

1. Wavelet coefficients taken from a single subband. These local features are the most localized in frequency and orientation. Seven local features of this kind are defined on the subbands: level 1 LL (LBP8), level 1 LH (LBP8), level 1 HL (LBP8), level 2 LH (), level 2 HL (), level 3 LH (), level 3 HL ().

2. Wavelet coefficients of one orientation taken from subbands of different frequencies. These local features describe visual patterns that span the frequency bands of several subbands. Six such local features are defined using the following pairs: level 1 LL (LBP4) – level 1 HL (LBP4), level 1 LL (LBP4) – level 1 LH (LBP4), level 1 LH (LBP4) – level 2 LH (), level 1 HL (LBP4) – level 2 HL (), level 2 LH () – level 3 LH (), level 2 HL () – level 3 HL ().

3. Wavelet coefficients from subbands of the same frequency but different orientations. These local features describe the horizontal and vertical components of visual patterns. Three such local features are defined using the following pairs: level 1 LH (LBP8) – level 1 HL (LBP8), level 2 LH () – level 2 HL (), level 3 LH () – level 3 HL ().

Because the wavelet coefficients are decorrelated, we assume that the wavelet coefficients within a local feature are strongly statistically dependent, while different local features are statistically independent of each other. The likelihood ratio can then be written as follows:

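$$\prod_{k} \frac{P_k(\text{feature}_k \mid \text{face})}{P_k(\text{feature}_k \mid \text{non-face})} > \lambda, \qquad (2)$$

where $\text{feature}_k$ is the value of the k-th local feature in the “face candidate”, $P_k(\text{feature}_k \mid \text{face})$ is the conditional probability distribution of the values of $\text{feature}_k$ when a “face” image is presented at the classifier input, $P_k(\text{feature}_k \mid \text{non-face})$ is the conditional probability distribution of the values of $\text{feature}_k$ when a “non-face” image is presented at the classifier input, and $\lambda$ is the threshold.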
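Because the logarithm is monotonic, the test in formula (2) can equivalently be evaluated as a sum, which avoids numerical underflow when many small probabilities are multiplied. Writing the threshold as $\lambda$ and assuming that $\lambda$ and all probabilities involved are positive, this standard rewriting is:

```latex
\sum_{k} \log \frac{P_k(\text{feature}_k \mid \text{face})}
                   {P_k(\text{feature}_k \mid \text{non-face})} > \log \lambda
```

Each term can be precomputed per feature value, so evaluating a candidate reduces to table lookups and additions.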

Since the local features can be computed at different spatial positions within the “face candidate”, the likelihood ratio can be written as follows:

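$$\prod_{x,y} \prod_{k} \frac{P_k(f_k(x,y) \mid \text{face})}{P_k(f_k(x,y) \mid \text{non-face})} > \lambda, \qquad (3)$$

where k is the index of the local feature, (x,y) are the coordinates of the local feature location, $f_k(x,y)$ is the value of the k-th local feature at the point with coordinates (x,y), $P_k(f_k(x,y) \mid \text{face})$ is the conditional probability distribution of the values of $f_k(x,y)$ when a “face” image is presented at the classifier input, and $P_k(f_k(x,y) \mid \text{non-face})$ is the conditional probability distribution of the values of $f_k(x,y)$ when a “non-face” image is presented at the classifier input.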
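A minimal sketch of evaluating formula (3) for one candidate is given below. It works with logarithms of the per-feature ratios and assumes that each feature has a single lookup table shared by all positions; the data layout and all names are hypothetical and are not taken from the paper.

```python
import math


def verify_candidate(feature_maps, log_ratio_tables, threshold):
    """Decide whether a face candidate passes the test of formula (3).

    feature_maps[k][y][x] is the quantized value of feature k at (x, y).
    log_ratio_tables[k][value] holds
        log(P_k(value | face) / P_k(value | non-face)).
    Assumption: the tables do not depend on the position (x, y).
    """
    total = 0.0
    for k, feature_map in enumerate(feature_maps):
        table = log_ratio_tables[k]
        for row in feature_map:
            for value in row:
                total += table[value]
    # Comparing sums of logarithms is equivalent to comparing the product
    # of ratios with the threshold.
    return total > math.log(threshold)
```

For example, one feature observed at four positions with ratios 0.5, 2, 2 and 2 gives a product of 4, which passes a threshold of 3 and fails a threshold of 5.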

Using normalized training sets of “face” and “non-face” images, the conditional probability distributions $P_k(f_k(x,y) \mid \text{face})$ and $P_k(f_k(x,y) \mid \text{non-face})$ are formed for each local feature as histograms.
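Histogram estimation of the two distributions for one local feature can be sketched as follows. Add-one (Laplace) smoothing is used so that empty bins do not produce zero probabilities; the smoothing, the integer bin encoding and the function name are assumptions not stated in the source.

```python
import math
from collections import Counter


def estimate_log_ratio_table(face_values, nonface_values, num_bins):
    """Build a table of log(P(value | face) / P(value | non-face)).

    face_values and nonface_values are feature values in
    range(num_bins), collected from the two training sets.
    """
    face_counts = Counter(face_values)
    nonface_counts = Counter(nonface_values)
    table = []
    for value in range(num_bins):
        # Add-one smoothing keeps both probabilities strictly positive.
        p_face = (face_counts[value] + 1) / (len(face_values) + num_bins)
        p_nonface = (nonface_counts[value] + 1) / (len(nonface_values) + num_bins)
        table.append(math.log(p_face / p_nonface))
    return table
```

Values that occur more often in face images get positive entries and values that occur more often in non-face images get negative ones.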

To simplify processing at the verification stage, the face candidates are normalized by resizing them to 56x56 pixels.
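The size normalization can be sketched with nearest-neighbour resampling. The source does not state which interpolation is used, so this choice and the function name are assumptions made only to keep the example short.

```python
def resize_nearest(image, size=56):
    """Resize a 2D list of pixel values to size x size pixels.

    Each output pixel copies the source pixel whose index is the
    output index scaled by the size ratio and rounded down.
    """
    src_h, src_w = len(image), len(image[0])
    return [
        [image[(r * src_h) // size][(c * src_w) // size]
         for c in range(size)]
        for r in range(size)
    ]
```

A smoother interpolation such as bilinear would usually be preferred in practice, since nearest-neighbour resampling introduces blocky artifacts when enlarging small candidates.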

Whether a “face candidate” is a face image is decided by formula (3).

Applying the proposed verification stage to the boosting-based method [1] reduced the number of false detections from 45 to 15, while the detection probability decreased only slightly (from 95.3 % to 93.9 %). Testing was performed on a set of frontal face images from the BioID database.

References:

1. Masliy R. Face detection in grayscale images / R. Masliy, A. Kulyk // Advanced Computer Systems and Networks: Design and Application : Proceedings of the 4th International Conference ACSN-2009. – Lviv, 2009. – P. 170–172.

2. Schneiderman H. Feature-Centric Evaluation for Efficient Cascaded Object Detection // Proc. of IEEE Conf. Computer Vision and Pattern Recognition, 2004. – Vol. 2. – P. 29–36.