
Sergiy Illichevskyy

Postgraduate student at Taras Shevchenko National University of Kyiv, Ukraine

Modeling an Insurance Company with Bayesian Networks

Bayesian networks (BNs), also known as belief networks (or Bayes nets for short), belong to the family of probabilistic graphical models (GMs). These graphical structures are used to represent knowledge about an uncertain domain. In particular, each node in the graph represents a random variable, while the edges between the nodes represent probabilistic dependencies among the corresponding random variables.

These conditional dependencies in the graph are often estimated by using known statistical and computational methods. Hence, BNs combine principles from graph theory, probability theory, computer science, and statistics. GMs with undirected edges are generally called Markov random fields or Markov networks. These networks provide a simple definition of independence between any two distinct nodes based on the concept of a Markov blanket. Markov networks are popular in fields such as statistical physics and computer vision. BNs correspond to another GM structure, known as a directed acyclic graph (DAG), which is popular in the statistics, machine learning, and artificial intelligence communities. BNs are both mathematically rigorous and intuitively understandable. They enable an effective representation and computation of the joint probability distribution (JPD) over a set of random variables.

The structure of a DAG is defined by two sets: the set of nodes (vertices) and the set of directed edges. The nodes represent random variables and are drawn as circles labeled by the variable names. The edges represent direct dependence among the variables and are drawn as arrows between nodes. In particular, an edge from node Xi to node Xj represents a statistical dependence between the corresponding variables. Thus, the arrow indicates that a value taken by variable Xj depends on the value taken by variable Xi, or roughly speaking, that variable Xi “influences” Xj. Node Xi is then referred to as a parent of Xj and, similarly, Xj is referred to as a child of Xi. An extension of these genealogical terms is often used to define the set of “descendants” – the nodes that can be reached on a directed path from a given node – and the set of “ancestors” – the nodes from which a given node can be reached on a directed path. The acyclic structure of the graph guarantees that no node can be its own ancestor or its own descendant. This condition is of vital importance for the factorization of the joint probability of a collection of nodes, as seen below.
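The factorization guaranteed by this acyclicity can be written explicitly. As a minimal sketch in standard notation (where Pa(Xi) denotes the set of parents of Xi in the DAG), the JPD over the variables X1, ..., Xn decomposes as

\[
P(X_1, X_2, \ldots, X_n) \;=\; \prod_{i=1}^{n} P\big(X_i \mid \mathrm{Pa}(X_i)\big).
\]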

Note that although the arrows represent direct causal connections between the variables, the reasoning process can operate on BNs by propagating information in any direction. A BN reflects a simple conditional independence statement: each variable is independent of its non-descendants in the graph, given the state of its parents. This property is used to reduce, sometimes significantly, the number of parameters that are required to characterize the JPD of the variables. This reduction provides an efficient way to compute the posterior probabilities given the evidence. In addition to the DAG structure, which is often considered the “qualitative” part of the model, one needs to specify the “quantitative” parameters of the model. The parameters are described in a manner consistent with a Markovian property, where the conditional probability distribution (CPD) at each node depends only on its parents.
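The size of this reduction can be made concrete with a standard counting argument (the figures below are illustrative, not taken from a specific model in the text). A full JPD over n binary variables requires 2^n − 1 independent parameters, whereas a BN in which each node has at most k parents requires at most n·2^k:

\[
\underbrace{2^{n}-1}_{\text{full JPD}} \qquad \text{vs.} \qquad \sum_{i=1}^{n} 2^{|\mathrm{Pa}(X_i)|} \;\le\; n\,2^{k}.
\]

For example, with n = 20 binary variables and at most k = 3 parents per node, the full JPD needs 2^20 − 1 = 1,048,575 parameters, while the BN needs at most 20 · 2^3 = 160.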

For discrete random variables, this conditional probability is often represented by a table, listing the local probability that a child node takes on each of its feasible values – for each combination of values of its parents. The joint distribution of a collection of variables can be determined uniquely by these local conditional probability tables (CPTs).

Following the above discussion, a more formal definition of a BN can be given. A Bayesian network $B = \langle G, \Theta \rangle$ is an annotated acyclic graph that represents a JPD over a set of random variables. The first component, $G$, is the DAG that encodes the independence assumptions, by which each variable $X_i$ is independent of its non-descendants given its parents in $G$. The second component, $\Theta$, denotes the set of parameters of the network; it contains a parameter $\theta_{x_i \mid \pi_i} = P_B(x_i \mid \pi_i)$ for each realization $x_i$ of $X_i$ conditioned on $\pi_i$, the set of values taken by the parents of $X_i$ in $G$. The JPD represented by $B$ is then exactly the factorization shown above, with each factor $P(x_i \mid \pi_i)$ given by $\theta_{x_i \mid \pi_i}$. For simplicity of representation, we omit the subscript $B$ henceforth. If $X_i$ has no parents, its local probability distribution is said to be unconditional; otherwise, it is conditional. If the variable represented by a node is observed, then the node is said to be an evidence node; otherwise, the node is said to be hidden or latent.

Consider the following example, which illustrates some of the characteristics of BNs. The example shown in Figure 1 has a structure similar to the classical “earthquake” example in Pearl. It considers a person who might suffer from a back injury, an event represented by the variable Back (denoted by B). Such an injury can cause a backache, an event represented by the variable Ache (denoted by A).

The back injury might result from a wrong sport activity, represented by the variable Sport (denoted by S) or from new uncomfortable chairs installed at the person’s office, represented by the variable Chair (denoted by C). In the latter case, it is reasonable to assume that a co-worker will suffer and report a similar backache syndrome, an event represented by the variable Worker (denoted by W). All variables are binary; thus, they are either true (denoted by “T”) or false (denoted by “F”).
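As a concrete illustration, the sketch below encodes this five-node network in Python and answers a diagnostic query by brute-force enumeration over the joint distribution. All CPT values are hypothetical placeholders (the text does not give any numbers); only the structure follows the parent sets described above: B depends on S and C, W depends on C, and A depends on B.

from itertools import product

# Hypothetical CPT entries (each value is the probability of "True");
# the text specifies no numbers, so these are illustrative assumptions only.
P_SPORT = 0.2                                       # P(S=T)
P_CHAIR = 0.3                                       # P(C=T)
P_WORKER = {True: 0.9, False: 0.01}                 # P(W=T | C)
P_BACK = {(True, True): 0.9, (True, False): 0.6,    # P(B=T | S, C)
          (False, True): 0.5, (False, False): 0.05}
P_ACHE = {True: 0.7, False: 0.1}                    # P(A=T | B)

def p(prob_true, value):
    """P(X = value) for a binary X with P(X=T) = prob_true."""
    return prob_true if value else 1.0 - prob_true

def joint(s, c, w, b, a):
    """Joint probability via the BN factorization P(S)P(C)P(W|C)P(B|S,C)P(A|B)."""
    return (p(P_SPORT, s) * p(P_CHAIR, c) * p(P_WORKER[c], w)
            * p(P_BACK[(s, c)], b) * p(P_ACHE[b], a))

# Diagnostic query: P(B=T | A=T, W=T), computed by summing out the
# unobserved variables (S, C) in the numerator and (S, C, B) in the denominator.
numerator = sum(joint(s, c, True, True, True)
                for s, c in product([True, False], repeat=2))
denominator = sum(joint(s, c, True, b, True)
                  for s, c, b in product([True, False], repeat=3))
print(f"P(Back=T | Ache=T, Worker=T) = {numerator / denominator:.3f}")

Note that the query propagates evidence both “against” the arrows (from Ache up to Back) and “across” them (the observed Worker raises the probability of Chair, which in turn bears on Back), illustrating the earlier remark that reasoning in a BN is not restricted to the direction of the edges.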

Learning a BN from data is commonly classified along two dimensions: whether the network structure is known or unknown, and whether the variables are fully or only partially observable. In the simplest case, with known structure and full observability, the parameters of each CPT can be estimated from the observed frequencies, optionally combined with a Dirichlet prior. Such an approach results in a maximum a posteriori (MAP) estimate and is also known as the equivalent sample size (ESS) method. In general, the other learning cases are computationally intractable. In the second case, with known structure and partial observability, one can use the EM (expectation-maximization) algorithm to find a locally optimal maximum-likelihood estimate of the parameters [4]. Markov chain Monte Carlo (MCMC) is an alternative approach that has been used to estimate the parameters.
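As a sketch of this MAP estimate in standard notation (the symbols below are not defined in the text itself): let N_ijk denote the number of training cases in which variable X_i takes its k-th value while its parents take their j-th configuration, and let α_ijk be the corresponding Dirichlet hyperparameter, with the α's summing to the equivalent sample size. The estimated CPT entry is then

\[
\hat{\theta}_{ijk} \;=\; \frac{N_{ijk} + \alpha_{ijk}}{N_{ij} + \alpha_{ij}},
\qquad N_{ij} = \sum_{k} N_{ijk}, \quad \alpha_{ij} = \sum_{k} \alpha_{ijk}.
\]

A larger equivalent sample size pulls the estimate more strongly toward the prior, which is useful when few training cases reach a given parent configuration.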

References

1. Aksoy, S. (2006). Parametric Models: Bayesian Belief Networks, Lecture Notes, Department of Computer Engineering, Bilkent University, available at http://www.cs.bilkent.edu.tr/saksoy/courses/cs551/slides/cs551_parametric4.pdf.

2. Boutilier, C., Friedman, N., Goldszmidt, M. & Koller, D. (1996). Context-specific independence in Bayesian networks, in Proceedings of the 12th Conference on Uncertainty in Artificial Intelligence, Portland, August 1–4, 1996, pp. 115–123.

3. Friedman, N. & Goldszmidt, M. (1996). Learning Bayesian networks with local structure, in Proceedings of the 12th Conference on Uncertainty in Artificial Intelligence, Portland, August 1–4, 1996.