PEOPLE RECOGNITION BY MOBILE ROBOTS

Master's student Bashirov A.M.

International University of Information Technologies, Almaty, Kazakhstan

e-mail: aidos.bashirov@gmail.com

 

         This paper addresses the problem of detecting and identifying persons with a mobile robot by sensor fusion of thermal and colour vision information. In the proposed system, people are first detected with a thermal camera, using image analysis techniques to segment the persons in the thermal images. This information is then used to segment the corresponding regions of the colour images, using an affine transformation to solve the image correspondence between the two cameras. After segmentation, the region of the image containing a person is further divided into regions corresponding to the person's head, torso and legs. Temperature and colour features are then extracted from each region for input to a pattern recognition system. Three alternative classification methods were investigated in experiments with a moving mobile robot and moving persons in an office environment. The best identification performance was obtained with a dynamic recognition method based on a Bayes classifier, which takes into account evidence accumulated over a sequence of images.

         Until recently, most mobile robotics research concentrated on problems of mobility, especially navigation, and human beings were usually considered merely as "obstacles" in the environment. Recently, however, there has been much interest in so-called service robots that operate in populated environments [1,5], and the topic of human-robot interaction has attracted much attention. To enable this interaction, a mobile robot needs the ability to recognise persons in its surroundings. The recognition problem can be decomposed into three subproblems: detection (how many people are there?), localisation (where are they?) and identification (who are they?).

         In this paper, we investigate a recognition method that is based on a learned model of the person's whole appearance, including face, hair, clothes, shoes, etc., using fusion of both temperature and colour information. The robot is equipped with an infrared camera and a pan-tilt colour camera, which are positioned very close to one another (see Fig. 1). The thermal camera enables very robust and quick detection and segmentation of persons, without requiring any background model or map of the environment.

Fig. 1. PeopleBot equipped with a thermal and colour camera in a populated corridor.

         By using an affine transformation to solve the image correspondence between the two cameras, it is then straightforward to segment the corresponding regions of the colour images. A set of features is extracted from the segmented regions for input to a pattern recognition system. In this paper, we investigated three different identification methods, including a dynamic method that takes into account a sequence of images. Experimental results are then presented, followed by conclusions and future work, including a discussion of the practical issues of integrating such a method into a real-world application.
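
         To illustrate the correspondence step, the following sketch maps bounding boxes found in the thermal image into colour-image coordinates using a pre-calibrated 2x3 affine matrix. The matrix values and function names are illustrative assumptions, not the calibration used in our system.

import numpy as np

# Hypothetical 2x3 affine matrix mapping thermal pixel coordinates to
# colour pixel coordinates (e.g. estimated offline from point correspondences).
A = np.array([[1.9, 0.0, 12.0],
              [0.0, 1.9,  8.0]])

def map_box_to_colour(box, A):
    """Map a thermal-image bounding box (x0, y0, x1, y1) into the colour image."""
    x0, y0, x1, y1 = box
    corners = np.array([[x0, y0, 1.0],
                        [x1, y0, 1.0],
                        [x1, y1, 1.0],
                        [x0, y1, 1.0]])   # corners in homogeneous coordinates
    mapped = corners @ A.T                # apply the affine transform, shape (4, 2)
    xs, ys = mapped[:, 0], mapped[:, 1]
    return xs.min(), ys.min(), xs.max(), ys.max()

# Example: a person segmented in the lower-resolution thermal image.
print(map_box_to_colour((40, 10, 70, 110), A))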

         Recognition Methods

         Most existing work on people recognition concerns non-mobile applications, e.g., security cameras or identity verification systems, where the detection and tracking subproblems are simpler than in the mobile case. People detection is often solved easily by background subtraction for vision systems, or by some other means (e.g., the person uses an ATM or logs onto a computer system). State-of-the-art tracking methods can also be applied for localisation of persons, for example, using Kalman filters for tracking single persons [10] or the condensation algorithm for tracking multiple persons in video image sequences. Detection and localisation of multiple persons is more complex because of well-known problems such as occlusion, segmentation and data association. The rest of this section deals with the subproblem of person identification. Most automatic methods for people identification use biometrics. A biometric is a measure based on physiological or behavioural characteristics of a person, including appearance, social behaviour, biodynamics, natural physiognomy (e.g., skull measurements, fingerprints, retinal scans) and imposed physiognomy (e.g., dog-tags, bracelets). Some of these identification methods require interaction with the subject (e.g., fingerprints, iris, retina, handwriting). Below we briefly describe the most common features used in person identification. We only consider features that are accessible without subject intervention (non-invasive methods), which makes them suitable for use by autonomous mobile robots.
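
         For the localisation subproblem mentioned above, a standard constant-velocity Kalman filter over a person's position can serve as a minimal sketch; this is a generic textbook filter, not the tracker of any of the cited systems, and the noise parameters are illustrative.

import numpy as np

dt = 0.1                                        # assumed time between frames (s)
F = np.block([[np.eye(2), dt * np.eye(2)],
              [np.zeros((2, 2)), np.eye(2)]])   # constant-velocity motion model
H = np.hstack([np.eye(2), np.zeros((2, 2))])    # only position is observed
Q = 0.01 * np.eye(4)                            # process noise (tuning value)
R = 0.5 * np.eye(2)                             # measurement noise (tuning value)

def kalman_step(x, P, z):
    """One predict/update cycle for the state x = [px, py, vx, vy]."""
    x = F @ x                                   # predict state
    P = F @ P @ F.T + Q                         # predict covariance
    y = z - H @ x                               # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)              # Kalman gain
    return x + K @ y, (np.eye(4) - K @ H) @ P

x, P = np.zeros(4), np.eye(4)
for z in [np.array([1.0, 0.5]), np.array([1.2, 0.6]), np.array([1.4, 0.7])]:
    x, P = kalman_step(x, P, z)
print(x)                                        # estimated position and velocity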

         Face. Face recognition is one of the most reliable methods to recognise humans. Most existing face recognition systems are vision-based (using either a single image or an image sequence). However, there are also techniques using range data (analysis of the 3D shape of the face [2]) and thermal information. Existing algorithms can be divided into:

Holistic methods, which use the whole face region as input. Different approaches use eigenfaces (principal component analysis), fisherfaces [2], support vector machines, genetic algorithms and artificial neural networks (a minimal eigenfaces sketch is given after this list).

Feature-based methods, based on local features such as the eyes, nose or mouth. Representative examples include graph matching methods [5], hidden Markov models and self-organizing feature maps.

Hybrid methods, which combine analysis of both the whole face and local features, similar to the human perception system.
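
         As a concrete example of the holistic approach, the minimal eigenfaces sketch below computes principal components of a set of aligned, flattened face images and identifies a probe face by nearest neighbour in the reduced space; the data shapes and matching rule are illustrative assumptions.

import numpy as np

def train_eigenfaces(faces, k=20):
    """faces: (n_images, n_pixels) matrix of flattened, aligned face images."""
    mean = faces.mean(axis=0)
    centred = faces - mean
    _, _, Vt = np.linalg.svd(centred, full_matrices=False)   # PCA via SVD
    eigenfaces = Vt[:k]                       # top-k principal components
    coeffs = centred @ eigenfaces.T           # projections of the training faces
    return mean, eigenfaces, coeffs

def identify(face, mean, eigenfaces, coeffs, labels):
    """Project a probe face and return the label of the nearest training face."""
    w = (face - mean) @ eigenfaces.T
    return labels[int(np.argmin(np.linalg.norm(coeffs - w, axis=1)))]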

         Despite many advances in this field, a major problem – sensitivity to pose and illumination variation – still exists. Recent trends point towards methods based on 3D geometric models of the face.

         Voice. Speaker recognition is the automatic process of recognising who is speaking on the basis of characteristics (physiological and behavioural) found in speech waves. This information exists in both short- and long-term spectral features. Most speaker recognition systems are designed for verification of identity. Existing techniques for speaker identification can be divided into text-dependent and text-independent methods. The latter do not rely on a specific text being spoken, which makes them more useful in mobile robotics. The two most successful approaches are based on vector quantization and hidden Markov models. Variability introduced by the speaker, recording conditions and background noise means that speaker identification remains an open research problem.
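
         To make the vector-quantization approach concrete, the sketch below trains one small k-means codebook per speaker and identifies a speaker by the lowest average quantization error. It assumes that short-term spectral feature vectors (e.g. MFCCs) have already been extracted; the codebook size and the plain k-means routine are illustrative choices.

import numpy as np

def kmeans(X, k=16, iters=50, seed=0):
    """Plain k-means on feature vectors X of shape (n, dim); assumes n >= k.
    Returns a (k, dim) codebook."""
    rng = np.random.default_rng(seed)
    codebook = X[rng.choice(len(X), k, replace=False)].astype(float)
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - codebook[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                codebook[j] = X[assign == j].mean(axis=0)
    return codebook

def quantization_error(X, codebook):
    """Mean distance from each feature vector to its nearest codeword."""
    d = np.linalg.norm(X[:, None, :] - codebook[None, :, :], axis=2)
    return d.min(axis=1).mean()

def identify_speaker(features, codebooks):
    """codebooks: dict mapping speaker name to a codebook trained on that speaker."""
    return min(codebooks, key=lambda s: quantization_error(features, codebooks[s]))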

         Gait. Psychophysiological experiments and biomechanics studies provide evidence that the gait signature may contain characteristics unique to each individual. Recognition methods can be divided into model-based methods, which incorporate the kinematics and dynamics of the human body [8], and model-free methods. The majority of gait recognition systems are vision-based, which leads to problems with proper segmentation of persons and with occlusions.
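
         A very simple model-free gait cue, given only for illustration, is the periodicity of the silhouette's bounding-box aspect ratio over time; the autocorrelation-based period estimate below is a common baseline rather than a method from the cited work.

import numpy as np

def gait_period(aspect_ratios, min_lag=5):
    """Estimate the gait period (in frames) from a 1-D signal of silhouette
    width/height ratios, one value per frame."""
    x = np.asarray(aspect_ratios, dtype=float)
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]   # autocorrelation, lags >= 0
    if ac[0] != 0:
        ac = ac / ac[0]
    # The first prominent peak after a minimum lag corresponds to the stride period.
    return int(min_lag + np.argmax(ac[min_lag:]))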

         Other. Other features that can be used in recognition systems are the whole appearance of persons (including face, hair, clothes, shoes, etc.) [7], shape and proportions of the body, weight and lip movements. These have been less well studied than the features described above, but they can still be used as a complementary input in systems combining multiple cues. Some examples of systems using multiple cues include [4] (face, voice), (face, fingerprint, hand geometry) and [34] (whole body appearance, voice, face).
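
         When several such cues are available, a simple way to combine them is a weighted sum of per-cue log-likelihoods, equivalent to a naive Bayes combination under an independence assumption; the cue names, weights and scores below are purely illustrative.

import numpy as np

def fuse_cues(cue_scores, weights):
    """cue_scores: dict cue -> dict person -> likelihood of the observation given that person.
    Returns the identity with the highest weighted sum of log-likelihoods."""
    persons = next(iter(cue_scores.values())).keys()
    def total(p):
        return sum(weights[c] * np.log(cue_scores[c][p] + 1e-12) for c in cue_scores)
    return max(persons, key=total)

scores = {"face":  {"anna": 0.7, "bob": 0.2},
          "voice": {"anna": 0.4, "bob": 0.5}}
print(fuse_cues(scores, {"face": 1.0, "voice": 0.5}))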

         Recognition Systems in Mobile Robotics

         The most popular sensors in existing systems for people recognition by mobile robots are colour vision [3] and laser scanners. However, most systems based solely on vision are restricted to stationary robots, i.e., recognition from a moving platform is not possible, because of the dependency on background subtraction methods for segmentation. On the other hand, systems based on laser scanners are often dependent on both a complete, accurate model of the environment and accurate self-localisation in order to carry out people detection by background subtraction on the moving robot. (In fact, most of the laser-based systems only track "moving objects" rather than humans, since they only measure distances.) Thermal vision helps to overcome some of the limitations of colour vision, since humans have a distinctive thermal profile compared to non-living objects. An interesting example is a system that detects pedestrians from a moving platform using thermal information. Some approaches combine different modalities, using laser for detection and tracking of people, and vision for identification. Existing systems are still far from human capabilities.
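
         As a minimal illustration of why thermal imagery simplifies detection from a moving platform, the sketch below thresholds a thermal image around typical human surface temperatures and keeps sufficiently large connected components; the temperature limits, blob-size threshold and use of scipy.ndimage are assumptions, not details of the system described in this paper.

import numpy as np
from scipy import ndimage

def detect_people_thermal(thermal, t_low=28.0, t_high=38.0, min_pixels=200):
    """thermal: 2-D array of temperatures in degrees Celsius.
    Returns bounding boxes (x0, y0, x1, y1) of sufficiently large warm blobs."""
    mask = (thermal >= t_low) & (thermal <= t_high)   # keep warm pixels only
    labels, n = ndimage.label(mask)                   # connected components
    boxes = []
    for i in range(1, n + 1):
        ys, xs = np.nonzero(labels == i)
        if len(xs) >= min_pixels:                     # discard small warm spots
            boxes.append((xs.min(), ys.min(), xs.max(), ys.max()))
    return boxes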

References

1. H. Asoh, S. Hayamizu, I. Hara, Y. Motomura, S. Akaho, and T. Matsui. Socially embedded learning of office-conversant robot Jijo-2. In Proc. of the Int. Joint Conference on Artificial Intelligence (IJCAI), 1997.

2. P.N. Belhumeur, J. Hespanha, and D.J. Kriegman. Eigenfaces vs. fisherfaces: Recognition using class specific linear projection. In ECCV (1), pages 45–58, 1996.

3. S. M. Brown, C. L. Lisetti, and A. H. Marpaung. Cherry, the little red robot with a mission and a personality! In Papers from the AAAI Fall Symposium on Human-Robot Interaction, North Falmouth, Massachusetts, November 15-17 2002.

4. R. Brunelli and D. Falavigna. Person identification using multiple cues. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(10):955–966, 1995.

5. W. Burgard, A.B. Cremers, D. Fox, D. Hähnel, G. Lakemeyer, D. Schulz, W. Steiner, and S. Thrun. Experiences with an interactive museum tour-guide robot. Artificial Intelligence, 114(1-2), 1999.

6. G. Cielniak, M. Bennewitz, and W. Burgard. Where is . . . ? Learning and utilizing motion patterns of persons with mobile robots. In Proc. of the Int. Joint Conference on Artificial Intelligence (IJCAI), Acapulco, Mexico, August 9-15, 2003.

7. G. Cielniak and T. Duckett. Person identification by mobile robots in indoor environments. In Proceedings of the IEEE International Workshop on Robotic Sensing (ROSE), Örebro, Sweden, 2003.

8. D. Cunado, M.S. Nixon, and J.N. Carter. Automatic extraction and description of human gait models for recognition purposes. Computer Vision and Image Understanding, 90(1):1–41, 2003.

9. R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. Wiley, New York, 2nd edition, 2000.

10. U. Frese and T. Duckett. A multigrid approach for accelerating relaxation-based SLAM. In Proc. IJCAI Workshop on Reasoning with Uncertainty in Robotics, Acapulco, Mexico, August 9, 2003.