Ecology/1. The state of the biosphere and its influence on human health.
E.D. Konstantinova, PhD, A.N. Varaksin, Prof.
Institute of Industrial Ecology, Ural Branch of the Russian Academy of Sciences, Ekaterinburg, Russia (620990 Ekaterinburg, 20 Sofia Kovalevskaya Str.)
Environmental Risk Factors and Public Health: Multifactor Models Based on Hierarchical Classification Methods
Within the contemporary paradigm of the epidemiology of non-communicable diseases, in contrast to the epidemiology of communicable diseases, one deals not with simple cause-and-effect relationships but with the so-called web of causation. A disease is associated with many causes, each of which can increase the risk of the disease but is neither necessary nor sufficient for it to occur (Beaglehole, Bonita, Kjellstrom, 1994; Fletcher, Fletcher, Wagner, 1996). Human health is shaped by simultaneous exposure to many risk factors (RF), including ecological, social, familial and other factors, and their effects should therefore be assessed jointly. Thus, the task of multifactor analysis is to reveal the RF combinations that affect public health the most.
The Classification Tree (CT) method is one of the methods for classifying observations (Breiman, Friedman, Olshen, Stone, 1984). Applying classification methods to the study of RF effects on public health rests on the assumption that the set of predictors allowing the most reliable separation of disease cases from healthy people is the set of risk factors with the strongest effect on disease occurrence.
Classification methods have not yet been widely applied to problems of RF effects on disease incidence. The reason is that a decision rule built by a classification method is expected to be of high quality, i.e. to identify a high percentage of objects correctly. For health-related RF this is essentially unattainable, since risk factors only increase the probability of contracting a disease and do not cause it. Even a large complex of RF cannot guarantee that a disease will occur (just as the absence of RF does not guarantee the absence of disease), so a disease cannot be predicted accurately by a mathematical description based on knowledge of risk factors, i.e. a high-quality decision rule cannot be built. The distinctive feature of the CT method is that it can work with a decision rule of moderate quality, which does not necessarily distinguish a disease case from a healthy person accurately but does divide the cohort under study into groups with high and low disease incidence. This turns out to be sufficient for studying RF health effects.
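A minimal numeric sketch of this point, on a hypothetical cohort (not the study data): 100 children in a high-incidence group with 30 cases and 100 children in a low-incidence group with 10 cases. Read as a classifier that labels everyone in the first group "ill", the rule is correct for only 60% of children, yet the two groups differ threefold in prevalence. The function `rule_performance` is a hypothetical helper written for illustration.

```python
def rule_performance(n_high: int, cases_high: int, n_low: int, cases_low: int):
    """Accuracy of a two-group rule versus the prevalence contrast it achieves.

    The rule predicts 'ill' for every child in the high-risk group and
    'healthy' for every child in the low-risk group (hypothetical setup).
    """
    correct = cases_high + (n_low - cases_low)  # true positives + true negatives
    accuracy = correct / (n_high + n_low)
    w_high = cases_high / n_high                # prevalence W in the high-risk group
    w_low = cases_low / n_low                   # prevalence W in the low-risk group
    return accuracy, w_high, w_low, w_high / w_low


# Hypothetical cohort, not the study data.
acc, w1, w0, rr = rule_performance(100, 30, 100, 10)
print(f"accuracy={acc:.0%}, W1={w1:.0%}, W0={w0:.0%}, RR={rr:.1f}")
# accuracy=60%, W1=30%, W0=10%, RR=3.0
```

A rule this weak would be rejected as a diagnostic classifier, but as a way of stratifying a cohort by risk it is exactly what the analysis needs.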
Environmental pollution (an environmental RF) is generally classified as non-modifiable, whereas familial, behavioral and social factors are mostly modifiable. It is therefore essential to reveal the complex of non-environmental factors capable of compensating as fully as possible for the negative effects of environmental pollution on public health.
The aim of this work is to develop dedicated statistical models describing the multifactor effects of the living environment on public health, to elaborate an algorithm for revealing a complex of factors that compensate for negative environmental effects, and to formulate the most effective recommendations for maintaining and restoring children’s health.
Materials and Methods
During a preventive medical examination of 441 children attending preschool institutions in the City of Ekaterinburg, data were obtained on their health status and on the risk factors to which they were exposed. Health status was defined by the presence or absence of respiratory diseases and of diseases of the musculoskeletal system. The risk factors considered in this work comprise data on atmospheric air pollution by motor vehicle emissions, the quality of potable water and the type of stove (electric or gas) used by the child’s family, as well as data on the child’s family, including the parents’ bad habits, housing conditions, the mother’s education, etc. (E.D. Konstantinova, A.N. Varaksin, A.A. Zhivoderov, I.V. Zhovner, 2007).
To reveal the complex of factors with the strongest negative health effects, as well as the factors compensating for the negative effects of environmental pollution, we used algorithms based on the idea of hierarchical classification (the Classification Tree method) (E.D. Konstantinova, A.N. Varaksin, 2009; 2010; 2011).
Two algorithms were developed for these tasks.
Algorithm No.1. Revealing a risk factor complex with the strongest negative effects on public health.
This algorithm is based on the Classification Tree method. Fig. 1 shows a classification tree for respiratory diseases (pathology D10). Let us explain the essence of the algorithm. Initially, all children in our sample are assigned to the root node (node No. 1 in Fig. 1). The number n = 441 above the root node is the sample size (number of children), and W = 24.0% is the prevalence of D10 pathology in this sample. At the first stage of tree growth the root node is divided into two nodes. Of the risk factors we considered, the CT method chooses the factor “Child’s Physical Activity” for the first split (the factor has two gradations, physical activity and physical inactivity; physical inactivity is regarded as a risk factor that increases the probability of disease). According to the CT method, this division produces the two groups of children whose D10 prevalence rates W differ the most: W = 29.5% for the n = 235 physically inactive children and W = 18.4% for the n = 206 physically active children.
Fig. 1. Classification tree for respiratory organ pathology (D10).
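The split selection described above can be sketched in Python for binary factors. This is an illustration under stated assumptions, not the authors' implementation: each child is a dictionary of 0/1 factor levels plus an `ill` outcome, the criterion is the gap |W1 - W0| as the text describes it, and `min_size` (a minimum group size, assumed here to be 20) keeps the search from isolating very small groups. Standard CART (Breiman et al., 1984) usually ranks splits by an impurity measure such as the Gini index instead; for a binary outcome the Gini decrease equals 2·p0·p1·(W1 - W0)², where p0 and p1 are the shares of the node on each side, so it also favors a large gap in W but discounts very unbalanced splits. The toy cohort and all function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Child = Dict[str, int]  # binary factor name -> 0/1 level; "ill" -> 0/1 outcome


def prevalence(group: List[Child]) -> float:
    """W: share of children in the group who have the pathology."""
    return sum(c["ill"] for c in group) / len(group)


def best_binary_split(node: List[Child], factors: List[str], min_size: int = 20
                      ) -> Optional[Tuple[str, float, float, float]]:
    """Return (factor, |W1 - W0|, W0, W1) for the split with the largest gap in W."""
    best: Optional[Tuple[str, float, float, float]] = None
    for f in factors:
        level0 = [c for c in node if c[f] == 0]
        level1 = [c for c in node if c[f] == 1]
        if len(level0) < min_size or len(level1) < min_size:
            continue  # skip splits that would leave a very small group
        w0, w1 = prevalence(level0), prevalence(level1)
        if best is None or abs(w1 - w0) > best[1]:
            best = (f, abs(w1 - w0), w0, w1)
    return best


# Hypothetical toy cohort of 100 children (NOT the study data).
toy = []
for i in range(100):
    inactive = int(i < 50)
    gas_stove = i % 2
    ill = int((inactive and i % 5 < 2) or (not inactive and i % 10 == 1))
    toy.append({"inactive": inactive, "gas_stove": gas_stove, "ill": ill})

best = best_binary_split(toy, ["inactive", "gas_stove"])
print(best)  # the 'inactive' split wins, with W0 = 0.1 and W1 = 0.4
```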
Obviously, a single risk factor cannot at once divide children into disease cases and healthy children (i.e. produce groups with W = 100% and W = 0%), so the division process has to be continued. Each of nodes No. 2 and No. 3 can be divided according to the same principle (the maximum difference between the W values of the two resulting groups). As shown in Fig. 1, both nodes No. 2 and No. 3 are divided by the risk factor “Level of Car Emission” (in our study this factor has three gradations: low, middle and high level of car emission). For children living in a highly polluted environment, node No. 2 gives rise to terminal node No. 5 (which is not divided further); this node corresponds to children with a very high prevalence rate, W = 36.0%. The other node (node No. 4) is divided at the next stage by the factor “Type of Stove in a Child’s Apartment”: electric or gas stove.
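A factor with three ordered gradations, such as “Level of Car Emission”, still produces a binary split: the tree chooses a cut point between levels. Reading the text, node No. 2 is split into high versus low or middle emission, and node No. 3 into low versus middle or high. The sketch below enumerates the k - 1 cut points of an ordered factor and keeps the one with the largest gap in W; for an unordered factor with k levels one would instead examine all 2^(k-1) - 1 groupings of levels. Function names and the toy node are hypothetical.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple

Child = Dict[str, Any]  # e.g. {"emission": "high", "ill": 1}
Split = Tuple[Tuple[str, ...], Tuple[str, ...]]


def prevalence(group: List[Child]) -> float:
    """W: share of children in the group who have the pathology."""
    return sum(c["ill"] for c in group) / len(group)


def ordinal_splits(levels: Sequence[str]) -> List[Split]:
    """All cut points of an ordered factor: (lower block, upper block)."""
    return [(tuple(levels[:k]), tuple(levels[k:])) for k in range(1, len(levels))]


def best_ordinal_split(node: List[Child], factor: str, levels: Sequence[str],
                       min_size: int = 20) -> Optional[Tuple[Split, float]]:
    """Return the cut point with the largest gap in W, and that gap."""
    best: Optional[Tuple[Split, float]] = None
    for lower, upper in ordinal_splits(levels):
        left = [c for c in node if c[factor] in lower]
        right = [c for c in node if c[factor] in upper]
        if len(left) < min_size or len(right) < min_size:
            continue  # skip cut points that leave a very small group
        gap = abs(prevalence(right) - prevalence(left))
        if best is None or gap > best[1]:
            best = ((lower, upper), gap)
    return best


# Hypothetical node (NOT the study data): 30 children per emission level.
cases = {"low": 3, "middle": 6, "high": 15}
node = [{"emission": lvl, "ill": int(i < k)} for lvl, k in cases.items() for i in range(30)]

levels = ["low", "middle", "high"]
print(ordinal_splits(levels))
print(best_ordinal_split(node, "emission", levels))  # cut between middle and high
```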
This split gives two terminal nodes (No. 8 and No. 9), and the division of this branch of the tree ends here. One of these terminal nodes (node No. 9) corresponds to a group of children with high D10 prevalence (W = 30.0%). Node No. 7 is also divided at the third stage of tree growth by the factor “Type of Stove in a Child’s Apartment”: electric or gas stove (as for node No. 4). This split yields two terminal nodes (No. 10 and No. 11). Node No. 10 corresponds to a group of children with low D10 prevalence (W = 16%), and node No. 11 to a group whose prevalence equals the average for the whole sample (W = 24.0%).
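Growing the whole tree repeats the split search in each new node until a stopping rule makes the node terminal. The paper does not state its stopping rules, so the sketch below assumes three of them: a maximum depth of three stages (matching the depth of Fig. 1), a minimum group size, and a minimum worthwhile gap in W. It handles binary factors only and returns each terminal node with its path, size n and prevalence W; all names and the toy cohort are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Child = Dict[str, int]                     # binary factor -> 0/1 level; "ill" -> 0/1
Leaf = Tuple[Tuple[str, ...], int, float]  # (path of factor levels, n, W)


def prevalence(group: List[Child]) -> float:
    return sum(c["ill"] for c in group) / len(group)


def grow(node: List[Child], factors: List[str], path: Tuple[str, ...] = (),
         depth: int = 0, max_depth: int = 3, min_size: int = 20,
         min_gap: float = 0.05) -> List[Leaf]:
    """Split recursively on the largest W gap; return the terminal nodes."""
    best: Optional[Tuple[str, float]] = None
    if depth < max_depth:
        for f in factors:
            level0 = [c for c in node if c[f] == 0]
            level1 = [c for c in node if c[f] == 1]
            if len(level0) < min_size or len(level1) < min_size:
                continue
            gap = abs(prevalence(level1) - prevalence(level0))
            if gap >= min_gap and (best is None or gap > best[1]):
                best = (f, gap)
    if best is None:  # stopping rule met: this is a terminal node
        return [(path, len(node), prevalence(node))]
    f = best[0]
    remaining = [g for g in factors if g != f]  # a binary factor cannot split twice on one branch
    leaves: List[Leaf] = []
    for level in (0, 1):
        child = [c for c in node if c[f] == level]
        leaves += grow(child, remaining, path + (f"{f}={level}",), depth + 1,
                       max_depth, min_size, min_gap)
    return leaves


# Hypothetical toy cohort of 100 children (NOT the study data).
toy = []
for i in range(100):
    inactive = int(i < 50)
    ill = int((inactive and i % 5 < 2) or (not inactive and i % 10 == 1))
    toy.append({"inactive": inactive, "gas_stove": i % 2, "ill": ill})

for path, n, w in grow(toy, ["inactive", "gas_stove"]):
    print(" & ".join(path), f"n={n}", f"W={w:.0%}")
# inactive=0 & gas_stove=0 n=25 W=0%
# inactive=0 & gas_stove=1 n=25 W=20%
# inactive=1 n=50 W=40%
```

As in Fig. 1, different branches may end at different depths: here the physically inactive branch stops after one split because the stove type no longer changes W there.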
We now select the nodes with low W values (nodes No. 6 and No. 10) and those with high W values (nodes No. 5 and No. 9). On this basis the following decision rule can be formulated:
- a child belongs to the group with low D10 prevalence if he or she is sufficiently physically active and, in addition, lives either in an area with a low level of car emissions in the atmospheric air or in an apartment with an electric stove. Terminal nodes No. 6 and No. 10 comprise 106 children in total, with an average D10 prevalence of W0 = 13.2%; and
- a child belongs to the group with high D10 prevalence if the factor “Physical Inactivity” is present and, in addition, the child lives either in an area with a high level of car emissions in the atmospheric air or in an apartment with a gas stove. A gas stove is a well-known risk factor for respiratory diseases; however, for Ekaterinburg children (as our analysis revealed) it ranks below the risk factors “Physical Inactivity” and “High Level of Car Emission”. Terminal nodes No. 5 and No. 9 comprise 178 children in total, with an average D10 prevalence of W1 = 32.0%.
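The decision rule above can be written as a small function. The node layout is reconstructed from the text, since Fig. 1 itself is not reproduced here, and is therefore an assumption: physically active children (node No. 3) go to node No. 6 at low emission and otherwise to node No. 7, which splits into node No. 10 (electric stove) and node No. 11 (gas stove); physically inactive children (node No. 2) go to node No. 5 at high emission and otherwise to node No. 4, which splits into node No. 8 (electric stove) and node No. 9 (gas stove). Children in nodes No. 8 and No. 11 fall into neither group of the rule. Function names are hypothetical.

```python
LOW_RISK_NODES = {6, 10}    # W0 = 13.2%, 106 children in total
HIGH_RISK_NODES = {5, 9}    # W1 = 32.0%, 178 children in total
EMISSION_LEVELS = ("low", "middle", "high")


def terminal_node(physically_active: bool, emission: str, gas_stove: bool) -> int:
    """Terminal node of the Fig. 1 tree, as reconstructed from the text (assumption)."""
    if emission not in EMISSION_LEVELS:
        raise ValueError(f"unknown emission level: {emission!r}")
    if physically_active:                 # node No. 3
        if emission == "low":
            return 6
        return 11 if gas_stove else 10    # node No. 7 split by stove type
    if emission == "high":                # node No. 2: physical inactivity
        return 5
    return 9 if gas_stove else 8          # node No. 4 split by stove type


def d10_risk_group(physically_active: bool, emission: str, gas_stove: bool) -> str:
    """Apply the decision rule: 'low', 'high', or 'neither' (nodes No. 8 and No. 11)."""
    node = terminal_node(physically_active, emission, gas_stove)
    if node in LOW_RISK_NODES:
        return "low"
    if node in HIGH_RISK_NODES:
        return "high"
    return "neither"


print(d10_risk_group(True, "middle", False))   # low (node No. 10)
print(d10_risk_group(False, "low", True))      # high (node No. 9)
print(d10_risk_group(True, "high", True))      # neither (node No. 11)
```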
Thus, the relative risk associated with this complex of risk factors is RR = W1/W0 = 32.0/13.2 = 2.42; the 95% confidence interval for RR is 1.42 to 4.13.
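The relative risk and its confidence interval can be checked numerically. The paper reports percentages rather than case counts, so the counts below are back-calculated (32.0% of 178 ≈ 57 cases, 13.2% of 106 ≈ 14 cases), and the interval method is an assumption: the common log-scale (Katz) Wald interval with SE(ln RR) = sqrt(1/a - 1/n1 + 1/c - 1/n0). With these inputs it reproduces the interval of 1.42 to 4.13 quoted in the conclusions. The function name is hypothetical.

```python
import math


def relative_risk_ci(a: int, n1: int, c: int, n0: int, z: float = 1.96):
    """Relative risk (a/n1)/(c/n0) with a log-scale (Katz) confidence interval."""
    rr = (a / n1) / (c / n0)
    se_log = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)  # SE of ln(RR)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


# Case counts back-calculated from the reported W1, W0 and group sizes (assumption).
rr, lower, upper = relative_risk_ci(a=57, n1=178, c=14, n0=106)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
# RR = 2.42, 95% CI 1.42 to 4.13
```

The interval is asymmetric around 2.42 because it is symmetric on the log scale; the small low-risk group (14 cases) dominates its width.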
Algorithm No.2. Revealing complexes of socioeconomic and familial factors that compensate for the negative public health effects of environmental pollution.
We propose a new algorithm for revealing risk factor complexes based on the idea of hierarchical classification (the Classification Tree method). The algorithm is applied to reveal a complex of non-environmental factors that compensate for the negative effects of environmental pollution on the health of preschool children in the City of Ekaterinburg. It shows that changing some behavioral factors may reduce the prevalence of diseases of the musculoskeletal system, as well as diseases of the circulatory system, among Ekaterinburg preschoolers by a factor of 2.0 to 2.5.
As an example, let us consider how to reveal factors compensating for the negative effects of atmospheric air pollution on children’s health, registered as an increased prevalence of diseases of the musculoskeletal system (class D13), Fig. 2.
In this example the initial sample consists of children exposed to the risk factor “Atmospheric Air Pollution with Car Emissions”. This group is represented by node No. 1 in Fig. 2 (110 of the total 441 children). Their prevalence of class D13 pathology is W = 24.5%.
Fig. 2. Classification tree for revealing a complex of factors compensating for the negative effects of car emissions in atmospheric air on the prevalence of diseases of the musculoskeletal system (class D13).
The Classification Tree algorithm chooses the factor “Quality of Potable Water” for the first split. In node No. 2, children drinking good-quality water show a disease prevalence of W = 18.2% (1.35 times lower). The other nodes leading to low W values are No. 4 and No. 8; they correspond, respectively, to sufficient physical activity of the child and a non-smoking mother.
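Algorithm No. 2 is not spelled out in full here; item 3 of the conclusions describes it as a hierarchical classification that examines a small number of splitting variants at each step. One possible reading, sketched below as an assumption rather than the authors' code, is a beam search inside the exposed group: at each step every unused modifiable factor is tried, only children at its favorable level (e.g. quality water, sufficient activity, non-smoking mother) are kept, and the `beam` subgroups with the lowest W survive to the next step. The beam width, minimum group size, factor names and toy data are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Child = Dict[str, int]  # modifiable factor -> 1 if favourable level; "ill" -> 0/1
Path = Tuple[str, ...]


def prevalence(group: List[Child]) -> float:
    return sum(c["ill"] for c in group) / len(group)


def compensating_paths(exposed: List[Child], factors: List[str], steps: int = 3,
                       beam: int = 2, min_size: int = 10
                       ) -> List[Tuple[Path, int, float]]:
    """Beam search over chains of favourable factor levels that lower W."""
    frontier: List[Tuple[Path, List[Child]]] = [((), exposed)]
    for _ in range(steps):
        seen: Set[FrozenSet[str]] = set()
        candidates: List[Tuple[Path, List[Child]]] = []
        for path, group in frontier:
            for f in factors:
                key = frozenset(path + (f,))
                if f in path or key in seen:
                    continue  # skip reused factors and re-ordered duplicates
                seen.add(key)
                sub = [c for c in group if c[f] == 1]  # keep the favourable branch only
                if len(sub) >= min_size:
                    candidates.append((path + (f,), sub))
        if not candidates:
            break
        candidates.sort(key=lambda pc: prevalence(pc[1]))  # lowest W first
        frontier = candidates[:beam]
    return [(path, len(group), prevalence(group)) for path, group in frontier]


# Hypothetical exposed group of 80 children (NOT the study data).
exposed = []
for i in range(80):
    child = {"good_water": int(i < 40), "active": int(i % 2 == 0),
             "nonsmoking_mother": int(i % 4 != 3)}
    unfavourable = 3 - sum(child.values())
    child["ill"] = int(i % 5 <= unfavourable)  # more unfavourable factors, more illness
    exposed.append(child)

for path, n, w in compensating_paths(exposed, ["good_water", "active", "nonsmoking_mother"]):
    print(" + ".join(path), f"n={n}", f"W={w:.1%}")
# active + good_water + nonsmoking_mother n=20 W=20.0%
```

Keeping several variants per step, rather than one greedy path, guards against discarding a factor that pays off only in combination with others.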
Thus, according to the proposed algorithm, the negative effects of car emissions in atmospheric air on children’s health, manifested as an increased prevalence of diseases of the musculoskeletal system, may be compensated by drinking good-quality potable water, sufficient physical activity of the child and a non-smoking mother. Following these recommendations may reduce D13 prevalence among Ekaterinburg preschoolers from W = 24.5% (node No. 1) to W = 10.8% (node No. 8), i.e. about 2.3 times (24.5/10.8 ≈ 2.27).
When growing the tree, one could have stopped at node No. 4. In that case the recommendation would read as follows: if a child lives in an area with a high level of car emissions in the atmospheric air, D13 prevalence may be reduced 2.1 times by drinking good-quality potable water and providing the child with a sufficient level of physical activity. Thus, “Smoking Mother” is not the most significant risk factor in this particular web of causation (which, of course, does not mean that smoking is not a risk factor at all). At the same time, one should remember that the effect of any factor is assessed against the background of other factors, since risk factors act jointly.
Conclusions
1. We describe an approach to using the Classification Tree method, an efficient method for revealing the combinations of risk factors with the strongest negative effects on children’s health. Results obtained by the CT method are conveniently represented as diagrams, so conclusions based on them are compelling and easily interpreted both by researchers in medical and environmental monitoring and by practical healthcare specialists.
2. The method is applied to analyzing the effects of risk factor combinations on the incidence of respiratory diseases among Ekaterinburg preschoolers. The most adverse combinations comprise a child’s insufficient physical activity coupled with a high level of car emissions in the atmospheric air or with indoor air pollution by natural gas combustion products from a gas stove in the child’s apartment (an electric stove is the healthier option). This combination of risk factors increases the incidence of children’s respiratory diseases from W0 = 13.2% to W1 = 32.0%, i.e. 2.42 times (relative risk RR = W1/W0 = 2.42; 95% CI for RR 1.42 to 4.13).
3. A method is developed for revealing complexes of socioeconomic and behavioral factors capable of compensating for the negative public health effects of environmental pollution. The method is based on hierarchical classification that examines a small number of alternative classification tree splits at each step.
4. Data on Ekaterinburg preschoolers show that children’s health benefits considerably from measures such as parents’ giving up smoking in their child’s presence, providing the child with sufficient physical activity, and the family’s drinking good-quality potable water. All these factors are modifiable (they can be changed at the parents’ will) and, unlike a non-modifiable environmental factor such as atmospheric air pollution in a big industrial city, can be eliminated.
Following these recommendations may reduce the incidence of musculoskeletal diseases among Ekaterinburg preschoolers from W1 = 24.5% to W0 = 10.8% (RR = 2.27; 95% CI for RR 1.02 to 6.06).
This study was supported by the Research Programme “Fundamental Science for Medicine” of the Ural Branch of the Russian Academy of Sciences, Project No. 12-P-2-1033.
References
1. Beaglehole R., Bonita R., Kjellstrom T. (1994). Basic Epidemiology: Teacher's Guide. 2nd ed. Geneva, Switzerland: World Health Organization.
2. Fletcher R., Fletcher S., Wagner E. (1996). Clinical epidemiology: The essentials. Williams & Wilkins.
3. Breiman L., Friedman J., Olshen R.A., Stone C. (1984). Classification and Regression Trees. Chapman and Hall.
4. Konstantinova E.D., Varaksin A.N., Zhivoderov A.A., Zhovner I.V. (2007). Ecological and social factors and children's health in the industrial city // Ural Medical Journal (Ekaterinburg). No. 11(39). P. 48-52. (in Russian).
5. Konstantinova E.D., Varaksin A.N. (2009). Classification trees method in problems of the risk factors influence on children health // Ecological Systems and Devices (Moscow). No. 10. P. 23-28. (in Russian).
6. Konstantinova E.D., Varaksin A.N. (2010). Development of methods to find factors compensating health impact from adverse environmental pollution // Ecological Systems and Devices (Moscow). No. 5. P. 35-38. (in Russian).
7. Konstantinova E.D., Varaksin A.N. (2011). Elaboration and application of a new hierarchical classification algorithm in epidemiological research // 23rd Annual Conference of the International Society for Environmental Epidemiology. Barcelona (Spain), 13-16 September 2011. Abstract No. 00389.