Evgeny Faizrakhmanov

National Research Tomsk Polytechnic University, Russian Federation

Data integration in heterogeneous information systems

This article describes technologies used to build data integration systems. As an example, it uses a system that integrates well test data between two large production information systems. The problems encountered during implementation are described in detail, together with the approaches used to solve them.

The aim of data integration technologies is to overcome the many forms of heterogeneity characteristic of information systems. Systems differ in performance, use different data types, run on different hardware platforms, and have different data management tools, middleware, data models, user interfaces, etc.

There are two approaches to data integration: the older, syntactic one and the newer, semantic one. The former relies on the formal similarity of the data being merged, the latter on the similarity of their content [1].

One example of data integration in heterogeneous information systems is «Integra», a well test data integration system developed by TPU together with LLC "Siberian Center of High Technologies". Its purpose is to transfer data between the databases of two different industrial automated information systems (AIS).

A major problem in integrating these systems is that their databases were built on different platforms and according to different design concepts: the first runs on SQL Server 2005, the second on Oracle 11g.

All drillhole study (well test) data are tied to three main entities: 'Deposits', 'Geologic beds' and 'Drillholes'. Figure 1 shows how these entities are related in the two databases. As the figure shows, the SQL Server database contains an additional table, 'Hierarchical relationships', which describes how the associated entities are interrelated, whereas in the Oracle database the tables reference one another directly. To synchronize the data correctly, these entities must be matched between the two databases. This requires solving two problems:

1. Different relationships between tables.

2. Different data formats.


Fig. 1. Relationship schema
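To illustrate why a link table such as 'Hierarchical relationships' makes lookups independent of the concrete schema, here is a minimal Python sketch. It assumes the table stores rows of the form (parent type, parent id, child type, child id); these columns, the type labels and the sample ids are hypothetical and are not taken from the actual «Integra» or SQL Server schema.

```python
from collections import defaultdict

# Hypothetical rows of the 'Hierarchical relationships' table:
# (parent_type, parent_id, child_type, child_id). Column layout, type
# labels and ids are illustrative assumptions, not the real schema.
HIERARCHY = [
    ("Deposit", 1, "Bed", 10),
    ("Deposit", 1, "Bed", 11),
    ("Bed", 10, "Drillhole", 100),
    ("Bed", 11, "Drillhole", 101),
    ("Deposit", 2, "Bed", 20),
    ("Bed", 20, "Drillhole", 200),
]


def build_children(rows):
    """Index the link table: (type, id) -> list of (child_type, child_id)."""
    children = defaultdict(list)
    for parent_type, parent_id, child_type, child_id in rows:
        children[(parent_type, parent_id)].append((child_type, child_id))
    return children


def descendants_of_type(rows, root, wanted_type):
    """Return sorted ids of all descendants of `root` of type `wanted_type`.

    Only the link table is consulted, so the walk does not depend on
    which concrete tables lie between a deposit and its drillholes.
    """
    children = build_children(rows)
    found, stack, seen = [], [root], {root}
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            if child in seen:
                continue  # guard against cycles in the link table
            seen.add(child)
            if child[0] == wanted_type:
                found.append(child[1])
            stack.append(child)
    return sorted(found)


if __name__ == "__main__":
    print(descendants_of_type(HIERARCHY, ("Deposit", 1), "Drillhole"))  # [100, 101]
    print(descendants_of_type(HIERARCHY, ("Deposit", 1), "Bed"))        # [10, 11]
```

If a new intermediate entity type is introduced, only new rows appear in the link table; the traversal code stays the same.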

Thanks to the additional table in the SQL Server database, we can be sure that even if the database schema changes, we can still retrieve the drillholes and geologic beds belonging to a given deposit. The architecture of the Oracle database does not provide this guarantee. To work around this, the system checks all tables related to the 'Deposits' table; if the required table ('Geologic beds' or 'Drillholes') is not among them, it checks all tables related to those, one level lower. This is repeated until the desired table is reached. In this way, deposits can be associated with their drillholes and geologic beds.
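The level-by-level search described above amounts to a breadth-first search over the graph of table relationships. The following Python sketch assumes the relationships have already been read into an adjacency list (in Oracle they could, for example, be derived from referential constraints in the data dictionary). The table names other than 'Deposits', 'Geologic beds' and 'Drillholes' are invented for illustration.

```python
from collections import deque

# Hypothetical relationship graph of the Oracle schema: each table maps to
# the tables it is directly linked to. 'License areas', 'Well pads' and
# 'Bed intervals' are invented intermediate tables.
RELATED_TABLES = {
    "Deposits": ["License areas", "Geologic beds"],
    "License areas": ["Deposits", "Well pads"],
    "Geologic beds": ["Deposits", "Bed intervals"],
    "Well pads": ["License areas", "Drillholes"],
    "Bed intervals": ["Geologic beds", "Drillholes"],
    "Drillholes": ["Well pads", "Bed intervals"],
}


def find_join_path(graph, start, target):
    """Breadth-first search from `start` to `target`.

    Tables directly related to `start` are checked first, then the tables
    related to those, and so on. Returns the shortest chain of tables
    (a join path) or None if `target` cannot be reached.
    """
    previous = {start: None}  # table -> table it was reached from
    queue = deque([start])
    while queue:
        table = queue.popleft()
        if table == target:
            # Walk back through `previous` to rebuild the path.
            path = []
            while table is not None:
                path.append(table)
                table = previous[table]
            return path[::-1]
        for neighbour in graph.get(table, []):
            if neighbour not in previous:
                previous[neighbour] = table
                queue.append(neighbour)
    return None


if __name__ == "__main__":
    print(find_join_path(RELATED_TABLES, "Deposits", "Drillholes"))
    # ['Deposits', 'License areas', 'Well pads', 'Drillholes']
```

When several paths of equal length exist, the one returned depends on the order of the adjacency lists, so in practice the chosen path may need to be fixed or reviewed before it is turned into a chain of joins.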

The second problem is much easier to solve than the first. The system converts data according to the types specified in the database documentation. The 'Name' field was chosen for matching records. In both information systems, values in this field follow the same entry rule but differ in letter case. The data transfer system therefore converts them to a single case and, if the values match, associates their unique identifiers. Otherwise, it reports the records that could not be matched, and the operator can specify the corresponding object manually.
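A minimal Python sketch of the case-insensitive matching step, assuming each system's records are available as a mapping from unique identifier to 'Name'. Treating an ambiguous match (several targets with the same normalized name) as unmatched is an added assumption, not something stated in the description above; the sample identifiers are hypothetical.

```python
def match_by_name(source, target):
    """Match records of two systems by their 'Name' field.

    `source` and `target` map unique identifiers to names. Names are
    brought to a single case with str.casefold() before comparison.
    Returns (mapping source_id -> target_id, list of unmatched source ids).
    """
    # Index the target system by normalized name.
    target_by_name = {}
    for target_id, name in target.items():
        target_by_name.setdefault(name.casefold(), []).append(target_id)

    mapping, unmatched = {}, []
    for source_id, name in source.items():
        candidates = target_by_name.get(name.casefold(), [])
        if len(candidates) == 1:
            mapping[source_id] = candidates[0]
        else:
            # No match, or an ambiguous one (assumption): report it so
            # the operator can choose the corresponding object manually.
            unmatched.append(source_id)
    return mapping, unmatched


if __name__ == "__main__":
    # Hypothetical sample data from the two systems.
    sql_server = {1: "DRILLHOLE 101", 2: "Drillhole 102", 3: "Drillhole 999"}
    oracle = {"A1": "drillhole 101", "A2": "DRILLHOLE 102"}
    print(match_by_name(sql_server, oracle))  # ({1: 'A1', 2: 'A2'}, [3])
```

`str.casefold()` is used rather than `lower()` because it is designed for caseless comparison and also handles Cyrillic names.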

Thus, a data integration system was developed that integrates data between heterogeneous information systems. In addition, the problem of automatically validating the schema of matched data was solved. Because it was technically impossible to organize semantic relationships in the data schema of the Oracle 11g database, we used the syntactic approach, which led to additional difficulties in implementing the system.

References

1. L. S. Chernyak. Data integration: syntax and semantics // Open Systems. 2010. URL: http://www.osp.ru/os/2010/06/11170978/

data-integration.htm

3. Data integration // Wikipedia. URL: http://en.wikipedia.org/wiki/Data_integration
