PEOPLE RECOGNITION BY MOBILE ROBOTS
Master's student Bashirov A.M.
International Information Technology University,
Almaty, Kazakhstan
e-mail: aidos.bashirov@gmail.com
This paper addresses the problem of
detecting and identifying persons with a mobile robot, by sensory fusion of
thermal and colour vision information. In the proposed system, people are first
detected with a thermal camera, using image analysis techniques
to segment the persons in the thermal images. This information is then used to
segment the corresponding regions of the colour images, using an affine
transformation to solve the image correspondence between the two cameras. After segmentation, the region of the image
containing a person is further divided into regions corresponding to the person’s
head, torso and legs. Temperature and colour features are then extracted
from each region for input to a pattern recognition system. Three
alternative classification methods were investigated in experiments with a
moving mobile robot and moving persons in an office environment. The
best identification performance was obtained with a dynamic recognition method
based on a Bayes classifier, which takes into account
evidence accumulated in a sequence of images.
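The dynamic recognition method can be illustrated with a short, hypothetical sketch (Python/NumPy) of recursive Bayesian evidence accumulation over an image sequence. The per-frame likelihoods, the uniform prior and the assumption that frames are conditionally independent given the identity are illustrative simplifications, not the exact model evaluated in the experiments.

import numpy as np

def accumulate_evidence(frame_likelihoods, prior):
    """Recursive Bayes update over a sequence of frames.

    frame_likelihoods: iterable of arrays p(z_t | person_k), one per frame.
    prior: array of prior probabilities p(person_k).
    Returns the posterior over person identities after all frames.
    """
    posterior = np.asarray(prior, dtype=float)
    for likelihood in frame_likelihoods:
        posterior = posterior * np.asarray(likelihood, dtype=float)
        posterior /= posterior.sum()          # renormalise to a proper distribution
    return posterior

# Hypothetical example: three known persons, four frames of evidence.
prior = np.array([1/3, 1/3, 1/3])
frames = [np.array([0.2, 0.5, 0.3]),
          np.array([0.1, 0.7, 0.2]),
          np.array([0.3, 0.4, 0.3]),
          np.array([0.2, 0.6, 0.2])]
print(accumulate_evidence(frames, prior))     # probability mass concentrates on person 2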
Until recently, most mobile robotics
research concentrated on problems of mobility, especially navigation, and human
beings were usually considered merely as
“obstacles” in the environment. Recently, however, there has been much interest
in so-called service robots that operate in populated environments [1,5], and the
topic of human-robot interaction has attracted much attention. To enable this
interaction, a mobile robot needs the ability to recognise persons in its surroundings. The recognition problem
can be decomposed into three subproblems: detection (how
many people are there?), localisation (where are they?) and identification (who are they?).
In this paper, we investigate a
recognition method that is based on a learned model
of the person’s whole appearance, including face, hair, clothes, shoes, etc., using
fusion of both temperature and colour information. The robot is equipped with an
infrared camera and a pan-tilt colour camera, which are positioned very close
to one another (see Fig. 1). The thermal camera enables very robust and quick
detection and segmentation of persons, without requiring any background model
or map of the environment.

Fig. 1. PeopleBot equipped with a thermal and colour camera in a populated corridor.

By using an affine transformation to solve the
image correspondence between the two cameras, it is then straightforward to
segment the corresponding regions of the colour images. A set of features is extracted
from the segmented regions for input to a pattern recognition system. In this
paper, we investigated three different identification methods, including a
dynamic method that takes into account a sequence of images. Experimental
results are presented in Section 4, followed by conclusions
and future work in Section 5, including a discussion of the practical issues
of integrating such a method into a real world application.
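To make the image-correspondence step concrete, the following sketch (Python/NumPy) maps pixel coordinates of a person's bounding box from the thermal image into the colour image with a fixed 2x3 affine transform. The numerical values of the transform are invented placeholders; in practice they would be obtained by calibrating the two cameras once for the fixed rig.

import numpy as np

# Hypothetical affine transform (2x3 matrix) from thermal to colour image
# coordinates; real values would come from a one-off calibration of the cameras.
A = np.array([[1.95, 0.00, 14.0],
              [0.00, 1.95, 22.0]])

def thermal_to_colour(points, affine):
    """Map Nx2 pixel coordinates from the thermal image into the colour image."""
    pts = np.asarray(points, dtype=float)
    homogeneous = np.hstack([pts, np.ones((pts.shape[0], 1))])   # append 1 to each point
    return homogeneous @ affine.T

# Map the corners of a thermal bounding box around a detected person.
thermal_box = [(60, 20), (100, 20), (100, 110), (60, 110)]
colour_box = thermal_to_colour(thermal_box, A)
print(colour_box)   # corresponding region to crop from the colour image

In the real system, the region segmented in the thermal image would be mapped in this way and the corresponding region cropped from the colour frame before feature extraction.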
Recognition Methods
Most existing work on people
recognition concerns non-mobile applications, e.g., security
cameras or identity verification systems, where the detection and tracking
subproblems are simpler than in the mobile case. People detection is often solved
easily by background subtraction for vision systems, or by some other method
(e.g., the person uses an ATM or logs onto a computer system).
State-of-the-art tracking methods can also be applied for localisation of persons,
for example, using Kalman filters for tracking single persons [10] or the
condensation algorithm for tracking multiple persons in video image sequences.
Detection and localisation of multiple persons is more complex because of
well-known problems such as occlusion, segmentation and data association. The
rest of this section deals with the subproblem of person identification. Most
automatic methods for people identification use biometrics. A biometric is a
measure based on physiological or behavioural characteristics of a person,
including appearance, social behaviour, biodynamics, natural physiognomy (e.g.,
skull measurements, fingerprints, retinal scans) and imposed physiognomy (e.g.,
dog-tags, bracelets). Some of these identification methods require interaction
with the subject (e.g., fingerprints, iris, retina, handwriting). Below we briefly
describe the most common features used in person identification. We only consider
features that are accessible without subject intervention (non-invasive methods),
which makes them suitable for use by autonomous mobile robots.
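For a stationary camera, the background-subtraction detection mentioned above can be approximated in a few lines. The sketch below (Python/NumPy) keeps a per-pixel running-average background model; the threshold and update rate are illustrative values, not tuned parameters.

import numpy as np

def detect_foreground(frame, background, threshold=25.0, alpha=0.05):
    """Simple background subtraction with a running-average background model.

    frame, background: greyscale images as float arrays of equal shape.
    Returns (foreground mask, updated background).
    """
    diff = np.abs(frame - background)
    mask = diff > threshold                                   # pixels that changed enough
    background = (1 - alpha) * background + alpha * frame     # slow background update
    return mask, background

As discussed later, such a model breaks down on a moving platform, which is one motivation for the thermal-camera detection used in our system.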
Face. Face
recognition is one of the most reliable methods to recognise humans. Most of
the existing face recognition systems are vision-based (using either a single
image or sequence). However, there are also techniques using range data (analysis
of the 3D shape of the face [2]) and thermal information. Existing algorithms
can be divided into:
– Holistic
methods, which use the whole face region as input. Different approaches use
eigenfaces (principal component analysis; a short sketch follows this list),
fisherfaces [2], support vector machines, genetic algorithms and artificial
neural networks.
– Feature-based
methods, based on local features such as the eyes, nose or mouth.
Representative examples include graph matching methods [5], hidden Markov
models and self-organizing feature maps.
– Hybrid
methods, which combine analysis of both the whole face and local features,
similar to the human perception system.
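As a rough illustration of the eigenface idea referenced in the first item, the sketch below (Python/NumPy) flattens face images into vectors, builds a PCA basis from training faces and identifies a new face by nearest neighbour in the projected subspace. The matrix shapes and the number of components are assumptions for illustration only.

import numpy as np

def train_eigenfaces(faces, n_components=20):
    """faces: (num_images, num_pixels) matrix of flattened training face images."""
    mean_face = faces.mean(axis=0)
    centred = faces - mean_face
    # Principal components via SVD of the centred data matrix.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    basis = vt[:n_components]                     # top eigenfaces
    projections = centred @ basis.T               # training faces in eigenspace
    return mean_face, basis, projections

def identify(face, mean_face, basis, projections, labels):
    """Nearest-neighbour identification in eigenface space."""
    coeffs = (face - mean_face) @ basis.T
    distances = np.linalg.norm(projections - coeffs, axis=1)
    return labels[int(np.argmin(distances))]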
Despite many advances in this field, a
major problem – sensitivity to pose and illumination
variation – still exists. Recent trends point towards methods based
on 3D geometrical models of the face.
Voice. Speaker recognition is
the automatic process of recognizing who is speaking on the basis of
characteristics (physiological and behavioural) found in speech waves.
This information exists in both short-term and long-term spectral features. Most
speaker recognition systems are designed for verification of identity. Existing
techniques for speaker identification can be divided into text-dependent and
text-independent methods. The latter do not rely on a specific text being spoken,
which makes them more useful in mobile robotics. The two most successful
approaches are based on vector quantization and hidden Markov models.
Variability caused by the speaker, the recording conditions and background noise
means that speaker identification remains an open issue for further research.
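A minimal sketch of the vector-quantisation approach (Python/SciPy), under the assumption that each enrolled speaker is represented by a codebook of short-term spectral feature vectors (e.g. MFCCs, whose extraction is omitted here): an unknown utterance is assigned to the speaker whose codebook quantises its features with the lowest average distortion. The codebook size is an arbitrary illustrative choice.

import numpy as np
from scipy.cluster.vq import kmeans, vq

def train_codebook(features, codebook_size=16):
    """features: (num_frames, num_coeffs) spectral feature vectors for one speaker."""
    codebook, _ = kmeans(features.astype(float), codebook_size)
    return codebook

def identify_speaker(features, codebooks):
    """Return the name of the speaker whose codebook gives the lowest distortion."""
    best_name, best_distortion = None, np.inf
    for name, codebook in codebooks.items():
        _, distances = vq(features.astype(float), codebook)   # distance to nearest code vector
        distortion = distances.mean()
        if distortion < best_distortion:
            best_name, best_distortion = name, distortion
    return best_name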
Gait. Psychophysiological
experiments and biomechanics studies provide evidence that a person's gait
signature may contain characteristics unique to that individual. Recognition
methods can be divided into model-based methods, which incorporate the
kinematics and dynamics of the human body [8], and model-free methods.
The majority of gait recognition systems are based on vision, which leads
to problems with proper segmentation of persons and with occlusion.
Other. Other features that can
be used in recognition systems are the whole appearance
of persons (including face, hair, clothes, shoes, etc.) [7], shape and proportions
of the body, weight and lip movements. These have been less well studied
than the features described above, but they can still be used as a
complementary input in systems combining multiple cues. Some examples of
systems using multiple cues include [4] (face, voice), (face, fingerprint, hand
geometry) and [34] (whole body appearance, voice, face).
Recognition Systems in Mobile
Robotics
The most popular sensors in existing
systems for people recognition by mobile robots
are colour vision [3] and laser scanners. However, most systems based
solely on vision are restricted to stationary robots, i.e., recognition from a
moving platform is not possible, because of the dependency on background subtraction
methods for segmentation. On the other hand, systems based on laser
scanners are often dependent on both a complete, accurate model of the environment
and accurate self-localisation in order to carry out people detection by
background subtraction on the moving robot. (In fact, most of the laser-based
systems only track “moving objects” rather than humans, since they only measure
distances.) Thermal vision helps to overcome some of the limitations of
colour vision, since humans have a distinctive thermal profile compared to non-living
objects. An interesting example of a system detecting pedestrians from a
moving platform using thermal information is described in. Some approaches
combine different modalities, using laser for detection and tracking of
people, and vision for identification. Existing systems are still far away from
human capabilities.
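The distinctive thermal profile of people suggests a simple detection scheme that does not depend on a background model: threshold the thermal image around typical body-surface temperatures and keep sufficiently large connected regions. The sketch below (Python/SciPy) illustrates this; the temperature band and minimum blob size are illustrative assumptions, not calibrated values.

import numpy as np
from scipy import ndimage

def segment_people(thermal_image, t_min=28.0, t_max=38.0, min_pixels=200):
    """thermal_image: 2-D array of temperatures in degrees Celsius.

    Returns a list of bounding boxes (slice tuples) of warm blobs large
    enough to plausibly be a person.
    """
    warm = (thermal_image >= t_min) & (thermal_image <= t_max)
    labelled, _ = ndimage.label(warm)                          # connected components
    boxes = []
    for label_idx, blob_slice in enumerate(ndimage.find_objects(labelled), start=1):
        if np.count_nonzero(labelled[blob_slice] == label_idx) >= min_pixels:
            boxes.append(blob_slice)
    return boxes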
References
1. H. Asoh, S. Hayamizu, I. Hara, Y. Motomura, S. Akaho, and T. Matsui. Socially embedded
learning of office-conversant robot Jijo-2. In Proc. of the Int. Joint Conference on
Artificial Intelligence (IJCAI), 1997.
2. P.N. Belhumeur, J. Hespanha, and D.J. Kriegman. Eigenfaces vs. fisherfaces: Recognition
using class specific linear projection. In ECCV (1), pages 45–58, 1996.
3. S.M. Brown, C.L. Lisetti, and A.H. Marpaung. Cherry, the little red robot with a
mission and a personality! In Papers from the AAAI Fall Symposium on Human-Robot
Interaction, North Falmouth, Massachusetts, November 15-17, 2002.
4. R. Brunelli and D. Falavigna. Person identification using multiple cues. IEEE Transactions
on Pattern Analysis and Machine Intelligence, 17(10):955–966, 1995.
5. W. Burgard, A.B. Cremers, D. Fox, D. Hähnel, G. Lakemeyer, D. Schulz, W. Steiner,
and S. Thrun. Experiences with an interactive museum tour-guide robot.
Artificial Intelligence, 114(1-2), 1999.
6. G. Cielniak, M. Bennewitz, and W. Burgard. Where is ...? Learning and utilizing motion
patterns of persons with mobile robots. In Proc. of the Int. Joint Conference on
Artificial Intelligence (IJCAI), Acapulco, Mexico, August 9-15, 2003.
7. G. Cielniak and T. Duckett. Person identification by mobile robots in indoor environments.
In Proceedings of the IEEE International Workshop on Robotic Sensing (ROSE),
Örebro, Sweden, 2003.
8. D. Cunado, M.S. Nixon, and J.N. Carter. Automatic extraction and description of
human gait models for recognition purposes. Computer Vision and Image
Understanding, 90(1):1–41, 2003.
9. R.O. Duda, P.E. Hart, and D.G. Stork. Pattern Classification. Wiley, New York, 2nd
edition, 2000.
10. U. Frese and T. Duckett. A multigrid approach for accelerating relaxation-based SLAM.
In Proc. IJCAI Workshop on Reasoning with Uncertainty in Robotics, Acapulco,
Mexico, August 9, 2003.