Espolov B

Espolov B.K., Maulenov K.S., Zharlykasov B.J.

A. Baitursynov Kostanai State University, Kazakhstan

Software model Open Computing Language

OpenCL is worth considering after realizing what a hardware and software architecture CUDA, because CUDA - this is a de facto standard, and OpenCL - it is still a new standard that is designed to develop a variety of applications for heterogeneous systems.

In OpenCL programming model based on the concept of core (kernel). Core - a function that will be executed in parallel on the accelerator certain number of threads. Kernel programmer defined as functions on some extension of the C language. Creating core is divided into several stages, each of which corresponds to the appropriate function call OpenCL on the CPU. In the beginning you want to convert the source code of one or more nuclei, given as a string array in OpenCL-program, then the resulting program should be compiled for your device [1]. After compiling OpenCL - program from it can be obtained kernel functions corresponding to its source code.

To run the kernel on the accelerator you must specify its arguments. Core arguments can be either scalar values and address buffers in memory accelerator. Since, in general, the accelerator has no direct access to the memory of CPU cores are working with addresses in the memory of the accelerator. OpenCL provides a mechanism for allocating memory buffers on the accelerator and the exchange of data between memory and CPU accelerator.

Run the desired engine is carried out by calling the library OpenCL on the CPU. Accelerator, which will run the kernel, is given command queuing, which passed a request to perform kernel.

Platform model gives a high level description of a heterogeneous system.

The central element of this model is the notion of the host - the primary device that controls the OpenCL - performs all calculations and user interaction. Host is always present in a single copy, while OpenCL - devices that run OpenCL - instructions can be presented in the plural. OpenCL - the device can be CPU, GPU, DSP or any other processor in the system, the system supports the establishment of OpenCL - drivers.

OpenCL - the device is logically divided into a model to compute units [2, 3], which in turn are divided into processing elements. Calculations on OpenCL - devices actually occur in the processing elements. In Figure 1 schematically depicted OpenCL - of the platform 3 devices.

Описание: Схема платформы OpenCL

Figure 1. Logical device OpenCL

Execution model describes an abstract representation of how instruction streams are executed in a heterogeneous system.

The kernel is created in the host program and then using a special command is queued for execution in one of the OpenCL devices. During the execution of said instruction OpenCL Runtime System creates integer index space, each element of which is called the global identifier (global ID). Each instance of the kernel is done separately for each value of a global identifier . Engine instance is called work-item. Thus, each work-item is uniquely identified by its global identifier .

A plurality of work-item is divided into groups . Such a group is called a work-group. With each work-group referenced by its unique identifier (work-group ID). All work-item in a work-group uniquely identified within its group number : local ID. Thus, each work-item is defined as by a unique global ID or by a combination of work-group ID and local ID within their group .

Figure 2. Example two-dimensional NDRange

All work-item within a work-group are executed in parallel processing elements on a single computing unit OpenCL-device. This is guaranteed by the standard, while absolutely no guarantee that a few work-item from different groups are performed in parallel. This important property of parallelism should always remember when developing OpenCL-programs. Space indexes N-dimensionally and commonly called NDRange. In the case of the standard version OpenCL 1.1 dimension N takes the values 1,2 or 3. Thus, the grid coordinates global ID and local ID N-dimension, ie, N coordinates are determined.

An illustrative example of a two-dimensional NDRange, shown in Figure 2, where G_x и G_y - by global identifiers, W_x и W_y - number of groups, а L_x и L_y - number of local identifiers in NDRange. Another important concept model of computation is the context, the definition of which (by calling special functions OpenCL API) is the first task OpenCL-application. Context determines the runtime cores, which includes the following components: the device core, program objects (program objects, source and executable code for future cores), memory objects.

If you draw a line between the CUDA and OpenCL and ask the obvious question that plagues many novice developers. And what actually better CUDA or OpenCL and answers may not be quite so straightforward. For starters, you can conduct a test, based on the simulation will be based on the evolution of the particle system. The result will be displayed as a number of FPS (frames per 1 second), but in this case the display will be turned off and FPS, will mean the number of iterations in one second. Thus the result is obtained as shown in Table 1.

Table 1. Comparative analysis

number of particles	OpenCL FPS	CUDA FPS
2048	2147.2	2698.9
4096	883.3	1172.33
8192	348.21	501.67
16384	117.1	183.14

Before you jump to conclusions directly test, it should be noted that the test was done using the NVidia GeForce GTX 650Ti with updated drivers (relevance February 2014 - not a beta version), the characteristics of the graphics card have been exposed to a minimum.

Now directly to the results. This test was won by CUDA, which was not surprising, since CUDA is by far the best product in this area, but this does not mean that you should not use OpenCL. Already at this stage it is worth noting that because OpenCL also shows good results, and this can only mean one thing - the leader CUDA present opponent (in fact he is not one), which seeks to become a leader in this sector. And in the future, these aspirations must bring new fruit development as another performance boost.

Литература:

1. Munshi A., Gaster B., Mattson T, Fung J., Ginsburg D., OpenCL Programming Guide, Издательство: Addison-Wesley Professional, 2011 г. – 648 с.

2. Spinellis D., Code Reading: The Open Source Perspective, Издательство: Вильямс, 2004 г. – 528 с.

3. Кормен Т. X., Лейзерсон Ч. И., Ривест Р. Л., Штайн К. Алгоритмы: построение и анализ, 2-е издание.-М.: Изд.дом "Вильямс", 2005 г. – 736 с.