Ecology / 1. State of the biosphere and its effect on human health.
E.D. Konstantinova, PhD, A.N. Varaksin, Prof.
Institute of Industrial Ecology, Ural Branch of the Russian Academy of Sciences,
20 Sofia Kovalevskaya Str., 620990 Ekaterinburg, Russia
Environmental Risk Factors and Public Health: Multifactor Models Based
on Hierarchical Classification Methods
Introduction
Within the contemporary paradigm of the epidemiology of non-communicable
diseases, in contrast to that of communicable diseases, one speaks not of simple
cause-and-effect relationships but of the so-called web of causation. A disease
is associated with many causes; each of them can increase the risk of the
disease but is neither necessary nor sufficient for it to occur (Beaglehole,
Bonita, Kjellstrom, 1994; Fletcher, Fletcher, Wagner, 1996). Human health is
determined by simultaneous exposure to many risk factors (RF), including
environmental, social and familial ones, and their effects should therefore be
assessed jointly. Thus, the task of multifactor analysis is to reveal the RF
combinations that affect public health the most.
The Classification Tree (CT) method is one of the methods for classifying
observations (Breiman, Friedman, Stone, Olshen, 1984). Applying classification
methods to problems of RF effects on public health rests on the assumption that
the predictors which allow disease cases and healthy people to be classified
reliably are the risk factors with the strongest effect on disease occurrence.
Classification methods have not yet been widely applied to problems of RF
effects on disease incidence. The reason is that a decision rule produced by a
classification method is normally expected to be of high quality (a high
percentage of correctly identified objects). This is essentially unattainable
for health-related RF, since risk factors only increase the probability of a
disease and do not cause it with certainty. Even a large complex of RF cannot
guarantee that a disease occurs (just as the absence of RF does not guarantee
its absence), so a disease cannot be accurately predicted from knowledge of
risk factors alone. The specific feature of the CT method is that it can work
with a decision rule of less than high quality: one that does not necessarily
distinguish accurately between a disease case and a healthy person, but does
divide the cohort under consideration into groups with high and low disease
incidence. This turns out to be sufficient for solving problems of RF effects
on health.
Generally, environmental pollution (environmental RF) is classified as
nonmodifiable, while familial, behavioral and social factors are mostly
modifiable. It is therefore essential to reveal the complex of
non-environmental factors capable of compensating as much as possible for the
negative effects of environmental pollution on public health.
The aim. To develop dedicated statistical models describing multifactor effects
of the living environment on public health; to elaborate an algorithm for
revealing a complex of factors that compensate for negative environmental
effects; and to formulate the most effective recommendations for maintaining
and restoring children’s health.
Materials and Methods
In the course of a preventive examination of 441 children attending preschool
institutions in the city of Ekaterinburg, data were obtained on their health
status and on the risk factors for health impairment. Health status is defined
by the presence or absence of respiratory diseases and diseases of the
musculoskeletal system. The risk factors considered in this work comprise data
on atmospheric air pollution by motor vehicle emissions, the quality of potable
water and the type of stove (electric or gas) used by the child’s family, as
well as data on the family itself, including parents’ bad habits, housing
conditions, mother’s education, etc. (E.D. Konstantinova, A.N. Varaksin, A.A.
Zhivoderov, I.V. Zhovner, 2007).
To reveal the complex of factors with the strongest negative health effects, as
well as the factors compensating for the negative effects of environmental
pollution, we used algorithms based on the idea of hierarchical classification
(Classification Tree Method) (E.D. Konstantinova, A.N. Varaksin, 2009; 2010;
2011).
Results
Two algorithms were developed to solve these tasks.
Algorithm No.1. Revealing a risk factor
complex with the strongest negative effects on public health.
This algorithm is based on the Classification Tree Method. Fig. 1 shows a
classification tree for respiratory diseases (pathology D10). We shall explain
the essence of the algorithm. Initially, all children in our sample are
assigned to the root node (node No.1 in Fig. 1). The number n=441 above the
root node is the sample size (number of children), and W=24.0% is the
prevalence of D10 pathology in this sample. At the first stage of tree growth
the root node is divided into two nodes. Of the risk factors considered, the
Classification Tree Method chooses the factor “Child’s Physical Activity” for
the first split (the factor has two levels, physical activity and physical
inactivity; physical inactivity is regarded as a risk factor that increases the
probability of disease). According to the CT method, this division yields the
two groups of children whose D10 prevalence rates W differ the most (W=29.5%
for n=235 physically inactive children and W=18.4% for n=206 physically active
children).
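The choice of the first split can be illustrated with a minimal Python sketch.
It assumes binary risk factors and uses the criterion described above (the
largest gap in prevalence W between the two resulting groups); standard CART
implementations typically use an impurity measure such as the Gini index
instead. The records, the factor names `inactive` and `gas_stove`, and the
helper names are hypothetical and are not the study data.

```python
def prevalence(rows):
    """Share of disease cases (W) in a group of records."""
    return sum(r["sick"] for r in rows) / len(rows)


def best_split(rows, factors):
    """Return (factor, gap) for the binary factor whose two groups
    differ most in prevalence, or None if no factor divides the group."""
    best = None
    for factor in factors:
        exposed = [r for r in rows if r[factor]]
        unexposed = [r for r in rows if not r[factor]]
        if not exposed or not unexposed:
            continue  # this factor does not divide the group
        gap = abs(prevalence(exposed) - prevalence(unexposed))
        if best is None or gap > best[1]:
            best = (factor, gap)
    return best


# Synthetic toy cohort (illustration only, not the Ekaterinburg data).
cohort = [
    {"inactive": 1, "gas_stove": 1, "sick": 1},
    {"inactive": 1, "gas_stove": 0, "sick": 1},
    {"inactive": 1, "gas_stove": 1, "sick": 1},
    {"inactive": 1, "gas_stove": 0, "sick": 0},
    {"inactive": 0, "gas_stove": 1, "sick": 0},
    {"inactive": 0, "gas_stove": 0, "sick": 1},
    {"inactive": 0, "gas_stove": 1, "sick": 0},
    {"inactive": 0, "gas_stove": 0, "sick": 0},
]

factor, gap = best_split(cohort, ["inactive", "gas_stove"])
print(factor, gap)
```

In this toy cohort the factor `inactive` separates groups with W=75% and W=25%,
while `gas_stove` gives two groups with equal prevalence, so `inactive` is
selected for the first split.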
Fig. 1. Classification
tree for respiratory organs pathology.
Obviously, a single risk factor cannot immediately separate children into
disease cases and healthy children (i.e. produce groups with W=100% and W=0%),
which is why the division process should be continued. Each of nodes No.2 and
No.3 may be divided by the same principle (maximum difference in W between the
two resulting groups). As shown in Fig. 1, both nodes are divided by the risk
factor “Level of Car Emission” (in our study this factor has three levels: low,
middle and high). For children living in a highly polluted environment, node
No.2 yields terminal node No.5 (which is not divided further); this node
corresponds to a group of children with a very high prevalence rate, W=36.0%.
The other node (node No.4) is divided at the next stage of tree growth by the
factor “Type of Stove in a Child’s Apartment”: electric or gas.
This split produces two terminal nodes (No.8 and No.9), and the division of
this branch of the tree ends here. One of them (node No.9) corresponds to a
group of children with high D10 pathology prevalence (W=30.0%). Node No.7 is
divided at the third stage of tree growth by the same factor, “Type of Stove in
a Child’s Apartment” (as with node No.4), giving two terminal nodes (No.10 and
No.11). Node No.10 corresponds to a group of children with low D10 pathology
prevalence (W=16%), and node No.11 to a group whose prevalence equals the
whole-sample average (W=24.0%).
We now select the nodes with low W values (nodes No.6 and No.10) and the nodes
with high W values (nodes No.5 and No.9). From these, the following decision
rule may be formulated:
- a child belongs to the group with low D10 pathology prevalence if the child
is sufficiently physically active and, in addition, lives in an area with a low
level of car emission in atmospheric air or in an apartment with an electric
stove. The two terminal nodes No.6 and No.10 comprise a total of 106 children,
among whom the average D10 pathology prevalence is W0=13.2%; and
- a child belongs to the group with high D10 pathology prevalence if the factor
“Physical Inactivity” is present together with a high level of car emission in
atmospheric air or a gas stove in the apartment. A gas stove is a well-known
risk factor for respiratory diseases; for the Ekaterinburg children, however,
our analysis ranks it behind the risk factors “Physical Inactivity” and “High
Level of Car Emission”. The two terminal nodes No.5 and No.9 comprise a total
of 178 children, among whom the average D10 pathology prevalence is W1=32.0%.
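The decision rule can be written down as a short function. This is only an
illustrative encoding of the two conditions stated in the rule; the function
name, the argument names and the label "intermediate" for children who fall
into neither group are assumptions made for the example.

```python
def d10_risk_group(active, emission, stove):
    """Assign a child to a D10 prevalence group using the rule in the text.

    active   -- True if the child is sufficiently physically active
    emission -- "low", "middle" or "high" level of car emission
    stove    -- "electric" or "gas"
    """
    if active and (emission == "low" or stove == "electric"):
        return "low"       # terminal nodes No.6 and No.10
    if not active and (emission == "high" or stove == "gas"):
        return "high"      # terminal nodes No.5 and No.9
    return "intermediate"  # remaining children (label is an assumption)


print(d10_risk_group(True, "middle", "electric"))
print(d10_risk_group(False, "high", "electric"))
print(d10_risk_group(True, "high", "gas"))
```

The three calls illustrate one child from each group: an active child with an
electric stove, an inactive child in a high-emission area, and an active child
in a high-emission area with a gas stove.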
Thus, the relative risk associated with the above complex of risk factors is
RR=W1/W0=32.0/13.2=2.4; the 95% confidence interval for RR is (1.4 to 4.1).
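The relative risk and its confidence interval can be checked with a few lines
of Python. The sketch below assumes the usual log-based (Katz) interval for a
ratio of two proportions, which the text does not name explicitly, and it
reconstructs the case counts by rounding W1·n1 and W0·n0 to whole children,
which is also an assumption.

```python
import math


def relative_risk_ci(cases1, n1, cases0, n0, z=1.96):
    """Relative risk of group 1 versus group 0 with a log-based CI."""
    rr = (cases1 / n1) / (cases0 / n0)
    # Standard error of ln(RR) for two independent proportions.
    se = math.sqrt(1 / cases1 - 1 / n1 + 1 / cases0 - 1 / n0)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper


n1, n0 = 178, 106            # group sizes given in the text
cases1 = round(0.320 * n1)   # reconstructed from W1=32.0% (assumption)
cases0 = round(0.132 * n0)   # reconstructed from W0=13.2% (assumption)

rr, lower, upper = relative_risk_ci(cases1, n1, cases0, n0)
print(f"RR = {rr:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
```

Under these assumptions the script prints RR = 2.42 with a 95% CI of
(1.42, 4.13), which agrees with the values reported in the Conclusions.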
Algorithm No.2. Revealing complexes
of socioeconomic and familial factors compensating the negative public health
effects from environment pollution.
A new algorithm for revealing risk factor complexes is presented, based on the
idea of hierarchical classification (Classification Tree Method). The algorithm
is applied to reveal a complex of non-environmental factors that compensate for
the negative effects of environmental pollution on the health of preschool
children in the city of Ekaterinburg. It shows that changing some behavioral
factors may reduce the prevalence of diseases of the musculoskeletal system, as
well as of the circulatory system, among Ekaterinburg preschoolers by a factor
of 2.0 to 2.5.
As an example, let us consider how to reveal factors that compensate for the
negative effect of atmospheric air pollution on children’s health, observed as
an increase in the prevalence of diseases of the musculoskeletal system (class
D13), Fig. 2.
In this example the initial sample consists of children exposed to the risk
factor “Atmospheric Air Pollution with Car Emissions”. This group is
represented by node No.1 in Fig. 2 (110 children out of the total of 441). The
prevalence of class D13 pathology among them is W=24.5%.
Fig. 2. Classification
tree for revealing a complex of factors compensating the negative effects from car
emission in atmospheric air on prevalence rate of diseases of musculoskeletal system
(class D13).
The Classification Tree algorithm chooses the factor “Quality of Potable Water”
for the first split. In node No.2, children drinking good-quality water show a
disease prevalence of W=18.2% (1.35 times lower). The other nodes leading to
low W values are nodes No.4 and No.8; they correspond to a sufficient level of
the child’s physical activity and a non-smoking mother, respectively.
Thus, according to the algorithm, the negative effect of car emissions in
atmospheric air on children’s health, seen as an increased prevalence of
diseases of the musculoskeletal system, may be compensated by drinking
good-quality potable water, sufficient physical activity of the child and a
non-smoking mother. Following these recommendations may reduce D13 prevalence
among Ekaterinburg preschoolers from W=24.5% (node No.1) to W=10.8% (node
No.8), i.e. make it about 2.3 times lower.
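The idea of following the low-prevalence branch among exposed children can be
sketched in Python as a simple greedy search. This is an illustration under
stated assumptions, not the authors’ algorithm: at each step it keeps the
favorable level of whichever remaining modifiable factor gives the lowest
prevalence. The factor names, the `min_size` threshold and the records are
hypothetical, and the data are synthetic.

```python
def prevalence(rows):
    """Share of disease cases (W) in a group of records."""
    return sum(r["sick"] for r in rows) / len(rows)


def compensating_path(rows, protective_factors, min_size=2):
    """Greedily pick protective factors that lower prevalence the most.

    Returns a list of (factor, group size, W) along the chosen branch.
    """
    path, current = [], rows
    remaining = list(protective_factors)
    while remaining:
        best = None
        for factor in remaining:
            subset = [r for r in current if r[factor]]
            if len(subset) < min_size or len(subset) == len(current):
                continue  # too few children, or no actual split
            w = prevalence(subset)
            if w < prevalence(current) and (best is None or w < best[1]):
                best = (factor, w, subset)
        if best is None:
            break  # no remaining factor lowers prevalence further
        factor, w, current = best
        path.append((factor, len(current), w))
        remaining.remove(factor)
    return path


FACTORS = ["clean_water", "active", "nonsmoking_mother"]
# Synthetic records for exposed children: three factor flags, then "sick".
raw = [
    (1, 1, 1, 0), (1, 1, 1, 0), (1, 1, 0, 1), (1, 1, 0, 0),
    (1, 0, 1, 1), (1, 0, 0, 1), (0, 1, 1, 1), (0, 1, 0, 1),
    (0, 0, 1, 1), (0, 0, 0, 1), (0, 0, 1, 1), (0, 1, 0, 1),
]
exposed = [dict(zip(FACTORS + ["sick"], row)) for row in raw]

path = compensating_path(exposed, FACTORS)
for factor, n, w in path:
    print(f"{factor}: n={n}, W={w:.0%}")
```

Stopping the loop after the second step corresponds to ending the splitting at
an earlier node, at the cost of a smaller reduction in prevalence.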
The splitting could have been stopped at node No.4. In that case the
recommendation would read: if a child lives in an area with a high level of car
emission in atmospheric air, D13 prevalence may be reduced (2.1 times) by
drinking good-quality potable water and ensuring a sufficient level of physical
activity. Thus, “Smoking Mother” is not the most significant risk factor in
this particular web of causation (which does not mean, of course, that smoking
is not a risk factor at all). One should remember that the effect of any factor
is assessed against the background of the other factors, since risk factors act
in combination.
Conclusions
1. We describe an approach to using the Classification Tree Method, an
efficient method for revealing the combination of risk factors with the
strongest negative effects on children’s health. The results obtained with the
CT method are conveniently represented as diagrams, which makes the conclusions
based on them compelling and easy to interpret both for researchers in medical
and environmental monitoring and for practical healthcare specialists.
2. The method is applied to analyse the effects of risk factor combinations on
the incidence of respiratory diseases among Ekaterinburg preschoolers. The most
adverse combination comprises a child’s insufficient physical activity coupled
with a high level of car emission in atmospheric air and indoor air pollution
by the combustion products of natural gas from a gas stove (an electric stove
is the healthier option). This combination of risk factors raises the
prevalence of children’s respiratory diseases from W0=13.2% to W1=32.0%, i.e.
by a factor of 2.42 (relative risk RR=W1/W0=2.42; 95% CI for RR from 1.42 to
4.13).
3. A method is developed for revealing complexes of socioeconomic and
behavioral factors capable of compensating for the negative public health
effects of environmental pollution. The method is based on hierarchical
classification in which a certain (small) number of candidate splits of the
classification tree is searched at each step.
4. The data on Ekaterinburg preschoolers show that children’s health benefits
considerably from such measures as parents giving up smoking in the child’s
presence, ensuring that the child has sufficient physical activity, and the
family drinking good-quality potable water. All these factors are modifiable
(they can be changed if the child’s parents so wish), so the corresponding
risks can be eliminated, in contrast to a nonmodifiable environmental factor
such as atmospheric air pollution in a large industrial city.
Following these recommendations may reduce the prevalence of musculoskeletal
diseases among Ekaterinburg preschoolers from W1=24.5% to W0=10.8% (RR=2.27;
95% CI for RR from 1.02 to 6.06).
Acknowledgments
This study was supported by the Research Programme “Fundamental Science for
Medicine” of the Ural Branch of the Russian Academy of Sciences, Project
No.12-П-2-1033.
References
1. Beaglehole R., Bonita R., Kjellstrom T. (1994). Basic epidemiology:
Teacher's guide. 2nd ed. Geneva, Switzerland: World Health Organization.
2. Fletcher R., Fletcher S., Wagner E. (1996). Clinical epidemiology: The
essentials. Williams & Wilkins.
3. Breiman L., Friedman J., Stone C., Olshen R.A. (1984).
Classification and Regression Trees. Chapman and Hall.
4. Konstantinova E.D., Varaksin A.N., Zhivoderov A.A., Zhovner I.V. (2007).
Ecological and social factors and children's health in the industrial city //
Ural Medical Journal (Ekaterinburg). No. 11(39). P.48-52. (in Russian).
5. Konstantinova E.D., Varaksin A.N. (2009). Classification trees method in
problems of the risk factors influence on children health // Ecological Systems
and Devices (Moscow). No. 10. P.23-28. (in Russian).
6. Konstantinova E.D., Varaksin A.N. (2010). Development of methods to find
factors compensating health impact from adverse environmental pollution //
Ecological Systems and Devices (Moscow). No. 5. P.35-38. (in Russian).
7. Konstantinova E.D., Varaksin A.N. (2011). Elaboration and application of a
new hierarchical classification algorithm in epidemiological research // 23rd
Annual Conference of the International Society for Environmental Epidemiology.
Barcelona (Spain), 13-16 September 2011. Abstract No. 00389.