Sergiy Illichevskyy

Postgraduate student at the Taras Shevchenko National University of Kyiv, Ukraine

Bayesian Networks as a Tool for Modeling Insurance Companies

Bayesian networks (BNs) have become extremely popular models over the last decade. They have been applied in various areas, such as machine learning, text mining, natural language processing, speech recognition, signal processing, bioinformatics, error-control codes, medical diagnosis, weather forecasting, and cellular networks. The name BN might be misleading: although the use of Bayesian statistics in conjunction with a BN provides an efficient approach to avoiding overfitting, the use of BN models does not necessarily imply a commitment to Bayesian statistics. In fact, practitioners often follow frequentist methods to estimate the parameters of the BN.
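
As a minimal sketch of the frequentist route, the snippet below estimates one conditional probability table of a BN by maximum likelihood, i.e. by relative frequencies. The two-node edge, the variable names, and the data are invented purely for illustration.

```python
from collections import Counter

# Hypothetical training data for one BN edge: (parent_state, child_state) pairs,
# e.g. parent = "claim filed this year?", child = "policy renewed?".
data = [("yes", "no"), ("yes", "no"), ("yes", "yes"),
        ("no", "yes"), ("no", "yes"), ("no", "no")]

# Frequentist (maximum-likelihood) estimate of the conditional probability
# table P(child | parent): relative frequencies within each parent state.
joint_counts = Counter(data)
parent_counts = Counter(parent for parent, _ in data)
cpt = {(p, c): n / parent_counts[p] for (p, c), n in joint_counts.items()}

for (p, c), prob in sorted(cpt.items()):
    print(f"P(child={c} | parent={p}) = {prob:.2f}")
```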

On the other hand, in a general form of the graph, the nodes can represent not only random variables but also hypotheses, beliefs, and latent variables. Such a structure is intuitively appealing and convenient for representing both causal and probabilistic semantics. It is ideal for combining prior knowledge, which often comes in causal form, with observed data. BNs can be used, even in the case of missing data, to learn causal relationships, to gain an understanding of various problem domains, and to predict future events.
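
The sketch below illustrates the mechanism behind such predictions: the joint distribution factorizes into one conditional distribution per node given its parents, and a query is answered by summing the factorized joint over the unobserved variables. The three-node chain and all probabilities are invented purely for illustration.

```python
from itertools import product

# A tiny illustrative DAG: Risk -> Claim -> Payout, with made-up probabilities.
p_risk = {"high": 0.3, "low": 0.7}
p_claim_given_risk = {"high": {"yes": 0.6, "no": 0.4},
                      "low":  {"yes": 0.1, "no": 0.9}}
p_payout_given_claim = {"yes": {"large": 0.5, "small": 0.5},
                        "no":  {"large": 0.0, "small": 1.0}}

def joint(risk, claim, payout):
    # The BN factorization: P(R, C, P) = P(R) * P(C | R) * P(P | C)
    return (p_risk[risk]
            * p_claim_given_risk[risk][claim]
            * p_payout_given_claim[claim][payout])

# Predictive query by enumeration: P(payout = "large")
p_large = sum(joint(r, c, "large")
              for r, c in product(p_risk, ("yes", "no")))
print(f"P(payout = large) = {p_large:.3f}")
```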

To compensate for zero occurrences of some sequences in the training dataset, one can use appropriate (mixtures of) conjugate prior distributions, such as a Dirichlet prior for the multinomial case or a normal-Wishart prior for the Gaussian case. Such an approach results in a maximum a posteriori (MAP) estimate and is also known as the equivalent sample size (ESS) method. This first learning case, with known structure and full observability, can thus be handled in closed form; in general, the other learning cases are computationally intractable. In the second case, with known structure and partial observability, one can use the EM (expectation-maximization) algorithm to find a locally optimal maximum-likelihood estimate of the parameters. MCMC is an alternative approach that has been used to estimate the parameters of the BN model. In the third case, the goal is to learn a DAG that best explains the data. This is an NP-hard problem, since the number of DAGs on N variables is superexponential in N. One approach is to proceed with the simplest assumption that the variables are conditionally independent given a class, which is represented by a single parent node common to all the variable nodes.
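
As a hedged sketch of the ESS idea, the snippet below turns raw counts for a single multinomial distribution into a MAP estimate by adding Dirichlet pseudo-counts corresponding to an equivalent sample size; the counts, state names, and ESS value are invented for illustration.

```python
# Observed counts for one multinomial CPT column; "large" was never observed,
# so the maximum-likelihood estimate would assign it probability zero.
counts = {"none": 8, "small": 2, "large": 0}

# Dirichlet conjugate prior expressed through an equivalent sample size (ESS):
# pretend we had seen `ess` extra observations spread uniformly over the states.
ess = 3.0
pseudo = {state: ess / len(counts) for state in counts}

# MAP estimate: (count + pseudo-count) / (N + ESS).
total = sum(counts.values()) + ess
map_estimate = {s: (counts[s] + pseudo[s]) / total for s in counts}

print(map_estimate)  # every state now receives non-zero probability
```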

This structure corresponds to the naive BN, which, surprisingly, is found to provide reasonably good results in some practical problems. To compute the Bayesian score in the fourth case, with partial observability and unknown graph structure, one has to marginalize out the hidden nodes as well as the parameters. Since this is usually intractable, it is common to use an asymptotic approximation to the posterior called the Bayesian information criterion (BIC), also known as the minimum description length (MDL) approach. In this case one trades off the likelihood term against a penalty term associated with the model complexity. An alternative approach is to perform local search steps inside the M step of the EM algorithm, known as structural EM, which presumably converges to a local maximum of the BIC score.
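
A minimal sketch of that trade-off, assuming the standard form BIC = log-likelihood - (d/2)*log N, is given below; the two candidate structures, their log-likelihoods, and their parameter counts are invented for illustration.

```python
import math

def bic_score(log_likelihood, num_free_params, num_samples):
    # BIC / MDL approximation to the Bayesian score: the likelihood reward is
    # offset by a complexity penalty that grows with the number of free
    # parameters d and (logarithmically) with the sample size N.
    return log_likelihood - 0.5 * num_free_params * math.log(num_samples)

# Hypothetical comparison of two candidate structures on the same data:
# a naive BN (fewer parameters) vs. a denser DAG (better fit, more parameters).
print(bic_score(log_likelihood=-1200.0, num_free_params=15, num_samples=500))
print(bic_score(log_likelihood=-1180.0, num_free_params=60, num_samples=500))
```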

It is well known that classic machine learning methods such as hidden Markov models (HMMs), neural networks, and Kalman filters can be considered special cases of BNs. Specific types of BN models have been developed to handle stochastic processes, known as dynamic BNs, and counterfactual information, known as functional BNs.
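
As an illustration of the HMM-as-BN view, the sketch below evaluates the joint probability of a hidden and an observed sequence using the dynamic-BN factorization P(z_1) * prod_t P(z_t | z_{t-1}) * prod_t P(x_t | z_t); the two-state model and all probabilities are invented.

```python
# An HMM viewed as a dynamic BN unrolled over time.
initial = {"calm": 0.8, "stormy": 0.2}
transition = {"calm":   {"calm": 0.9, "stormy": 0.1},
              "stormy": {"calm": 0.4, "stormy": 0.6}}
emission = {"calm":   {"few_claims": 0.7, "many_claims": 0.3},
            "stormy": {"few_claims": 0.2, "many_claims": 0.8}}

def joint(hidden, observed):
    # P(z_1) * P(x_1 | z_1), then P(z_t | z_{t-1}) * P(x_t | z_t) for t >= 2.
    p = initial[hidden[0]] * emission[hidden[0]][observed[0]]
    for prev, cur, obs in zip(hidden, hidden[1:], observed[1:]):
        p *= transition[prev][cur] * emission[cur][obs]
    return p

print(joint(["calm", "calm", "stormy"],
            ["few_claims", "few_claims", "many_claims"]))
```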

The variable-order Bayesian network (VOBN) model extends the position weight matrix (PWM) model, the fixed-order Markov model (MM), including HMMs, the variable-order Markov (VOM) model, and the BN model. The PWM model is arguably the simplest and the most common context-independent model for DNA sequence classification. The basic assumption of the PWM model is that the random variables (e.g., nucleotides at different positions of the sequence) are statistically independent. Since this model has no memory, it can be regarded as a fixed-order MM of order 0. In contrast, higher fixed-order models, such as MMs, HMMs, and interpolated MMs, rely on the statistical dependencies within the data to indicate repeating motifs in the sequence. VOM models stand between these two types of models with respect to the number of model parameters. VOM models do not ignore statistical dependencies between variables in the sequence; yet they take into account only those dependencies that are statistically significant. In contrast to fixed-order MMs, where the order is the same for all positions and for all contexts, in VOM models the order may vary for each position, based on its context. Unlike VOM models, which are homogeneous and allow statistical dependencies only between adjacent variables in the sequence, VOBN models are inhomogeneous and allow statistical dependencies between nonadjacent positions, in a manner similar to BN models.
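
To make the memory distinction concrete, the sketch below scores the same short DNA sequence under an order-0 PWM (independent positions) and under a first-order MM (each base conditioned on the previous one); all probability tables are invented for illustration.

```python
seq = "ACGT"

# PWM: positions are independent, each position has its own base distribution.
pwm = [{"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
       {"A": 0.1, "C": 0.6, "G": 0.2, "T": 0.1},
       {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
       {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7}]
p_pwm = 1.0
for pos, base in enumerate(seq):
    p_pwm *= pwm[pos][base]

# First-order MM: each base depends on the previous base (one shared table).
start = {"A": 0.4, "C": 0.2, "G": 0.2, "T": 0.2}
trans = {"A": {"A": 0.1, "C": 0.5, "G": 0.2, "T": 0.2},
         "C": {"A": 0.2, "C": 0.1, "G": 0.5, "T": 0.2},
         "G": {"A": 0.2, "C": 0.2, "G": 0.1, "T": 0.5},
         "T": {"A": 0.5, "C": 0.2, "G": 0.2, "T": 0.1}}
p_mm = start[seq[0]]
for prev, cur in zip(seq, seq[1:]):
    p_mm *= trans[prev][cur]

print(f"PWM likelihood: {p_pwm:.4f}, first-order MM likelihood: {p_mm:.4f}")
```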

Yet, as opposed to BN models, where the order of the model at a given node depends only on the size of its set of parents, in VOBN models the order also depends on the context, i.e., on the specific observed realization in each set of parents. As a result, the number of parameters that need to be estimated in VOBN models is potentially smaller than in BN models, yielding a smaller chance of overfitting the VOBN model to the training dataset. Context-specific BNs are closely related to, yet constructed differently from, VOBN models. To summarize, the VOBN model can be regarded as an extension of the PWM, fixed-order MM, VOM, and BN models. If statistical dependencies exist only between adjacent positions in the sequence and the memory length is identical for all contexts, the VOBN model degenerates to an inhomogeneous fixed-order MM.
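
A small sketch of the context-dependent memory idea: the conditional distribution for the next symbol is retrieved under the longest suffix of the observed context that the model actually stores, falling back to shorter suffixes (and ultimately to an order-0 table). The context tree below is invented for illustration.

```python
# Context-dependent memory length, as used in VOM/VOBN-style models.
context_tree = {
    "":   {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},  # order-0 fallback
    "G":  {"A": 0.10, "C": 0.20, "G": 0.10, "T": 0.60},  # order-1 context
    "CG": {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},  # order-2 context
}

def next_symbol_distribution(context):
    # Try the longest stored suffix of the context first, then shorter ones.
    for k in range(len(context), -1, -1):
        suffix = context[len(context) - k:]
        if suffix in context_tree:
            return suffix, context_tree[suffix]

print(next_symbol_distribution("ACG"))  # uses the order-2 context "CG"
print(next_symbol_distribution("AAG"))  # only the order-1 context "G" is stored
print(next_symbol_distribution("AAA"))  # no stored suffix -> order-0 fallback
```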
