IMPLEMENTATION OF PARALLEL COMPUTING IN A MULTIPROCESSOR SYSTEM

Zhoranova N. Zh., Tazhibai A

 

  Many tasks require a large number of computing operations, which consume considerable resources even on modern hardware; moreover, it can be said with confidence that no matter what speed computer technology reaches, there will always be problems whose solution takes considerable time. Many of these complex tasks require that the result be obtained in the least possible time, or even within a strictly limited time. Such problems include, for example, weather forecasting, image processing and image recognition in the control of machinery. On the other hand, reducing the execution time of each operation in the microprocessor is itself a great technical problem.

   The obvious way to increase computing speed is to use not just one computing device, but several working together on one problem. This approach is known as parallel computing. Despite the seeming simplicity of the idea, it is often far from a trivial task, both for the design of computer hardware and for the development of algorithms. The first problem is that, for a task to be solved with the help of parallel computing, the algorithm for its solution must allow parallelization, and not every problem can be solved by a parallel algorithm. The other, equally important problem is the construction of a system on which such parallel computing can actually be carried out.

   There are two basic approaches to parallel computing in microprocessor systems, called single-threaded and multithreaded parallelism. The difference lies in the use of one or more execution threads for parallel computing.

  Single-threaded parallelism is the parallel execution of operations within a single thread of execution. The possibility of single-threaded parallelism is determined by the architecture of the microprocessor, namely its ability to fetch from memory and perform several operations at once.

  Multithreaded parallelism is the use of multiple threads to achieve parallel execution of operations. To provide multithreaded parallelism, a system with multiple processors or processor cores is needed.

The principles of organizing single-threaded and multithreaded parallelism, their features, advantages and disadvantages are very different and have little in common, both in the implementation of the computer system and in the construction of software.

Single-threaded parallelism

As already described above, single-threaded parallelism is implemented within a single thread of execution. The possibility of single-threaded parallel execution depends almost entirely on the microprocessor architecture: it determines the microprocessor's ability to execute instructions in parallel.

Single-threaded parallelism has its own advantages and disadvantages.

Advantages:

· No need for synchronization: all operations are performed within a single thread, and therefore in a strictly defined sequence.

· No need for parallelism support at the operating system level.

· No need for shared resource management facilities (arbitration).

Disadvantages:

· Difficulty in using algorithms with conditional branches.

· The need to adapt the program to use the resources of the microprocessor effectively, for example, when moving from one processor model to another.

The following methods are used to achieve single-threaded parallelism:

· Packing data so that a single instruction processes several values at once. For example, two pairs of 8-bit values can be added with one 16-bit operation, provided the possibility of overflow is excluded (a sketch is given after this list). The method has very limited application.

· Superscalar architecture. The microprocessor control unit independently analyzes the instruction stream for opportunities for parallel execution and manages several functional units.

· Vector processing. The microprocessor has instructions that perform the same operation on a whole group of operands. Several operands are packed into one vector register. This method is similar to the first, but here the support for parallelism is built into the microprocessor architecture. Vector registers tend to have a greater capacity. The program has to be adapted to use vector instructions, or an optimizing compiler has to be used.

· Microprocessors with explicit parallelism. The method is similar to the second one, but the program for such a processor contains explicit indications of which operations should be carried out in parallel. Parallelization of computations in this case is the sole responsibility of the programmer or of an optimizing compiler.
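The data-packing technique from the first item can be illustrated with a short C sketch. The helper pack2 and the concrete values below are purely illustrative assumptions, and the trick only works as long as neither 8-bit lane can overflow into its neighbour:

#include <stdint.h>
#include <stdio.h>

/* Pack two 8-bit values into one 16-bit word: high byte and low byte. */
static uint16_t pack2(uint8_t hi, uint8_t lo) {
    return (uint16_t)(((uint16_t)hi << 8) | lo);
}

int main(void) {
    /* Two pairs of 8-bit operands; the values are kept small so that
       neither byte lane can carry into its neighbour. */
    uint16_t a = pack2(10, 20);
    uint16_t b = pack2(30, 40);

    /* One 16-bit addition performs both 8-bit additions at once. */
    uint16_t sum = (uint16_t)(a + b);

    printf("%u %u\n", (unsigned)(sum >> 8), (unsigned)(sum & 0xFF)); /* prints 40 60 */
    return 0;
}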

Therefore, special technologies for creating multiprocessor systems have been developed. This made it possible to process data in parallel and, therefore, faster. Accordingly, operating systems supporting multiprocessor technology were created, such as Solaris (Sun Microsystems), the Unix-like systems Irix (SGI) and AIX (IBM), Red Hat Linux, and Windows XP. Consider the Solaris operating system, version 2.4. Solaris 2.4 is a Unix-like system developed by Sun Microsystems.

The Solaris 2.4 operating system has the concept of a thread. A thread is a sequence of instructions executed within the context of a process. This operating system supports multithreaded processes. The word "multithreaded" means a set of controlled threads of execution. A traditional UNIX process contains a single thread of control, whereas a multithreaded process comprises a number of threads that run independently. Because each thread executes program code independently, parallelization leads to:

· improved application responsiveness,

· more efficient use of multiprocessors,

· improved program structure,

· use of fewer system resources,

· improved performance.

Definition of concurrency and parallel execution. Concurrency occurs when at least two threads exist in a process at the same time. Parallelism occurs when at least two threads are executed simultaneously. In a multithreaded process on a single processor, the processor switches its resources between the threads, which results in concurrent execution. In the same multithreaded process on a shared-memory multiprocessor system, each thread of the process can run on a separate processor at the same time, which results in parallel execution. When the process has as many threads as there are processors, or fewer, the threads support system together with the operating system ensures that each thread runs on its own processor.

Threads are visible only inside their process, where they share process resources such as the address space, open files, and so on. Each thread has a unique identifier, a register set, a stack, a signal mask and a priority. Because threads share the process's instructions and most of its data, a change made to data by one thread can be seen by the other threads of the process. When a thread needs to interact with other threads of the same process, it can do so without involving the operating system. Threads are the main programming interface in multithreaded programming. User-level threads are managed in user space and therefore do not require kernel context switches. An application can have thousands of threads and still consume few kernel resources; how many kernel resources the application consumes is largely determined by the application itself. By default, threads are "lightweight". To get more control over a thread, the application can bind it; when the application binds a thread to an execution resource, the thread becomes a kernel resource. The functions for working with threads, such as thr_create(), thr_join(), thr_self() and so on, are provided by the libthread library. The thr_create function creates a thread according to the given parameters, the thr_join function waits for the termination of another thread of the process, and the thr_self function returns the identifier of the calling thread.
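A minimal sketch, assuming a Solaris system with the libthread library (compiled with cc example.c -lthread), showing how the calls named above fit together; error checking is omitted for brevity:

#include <thread.h>
#include <stdio.h>

/* Worker function: each thread reports its own identifier via thr_self(). */
static void *worker(void *arg) {
    int n = *(int *)arg;
    printf("worker %d running as thread %u\n", n, (unsigned)thr_self());
    return NULL;
}

int main(void) {
    thread_t tid[2];
    int args[2] = { 0, 1 };
    int i;

    /* thr_create(stack_base, stack_size, start_routine, arg, flags, new_thread) */
    for (i = 0; i < 2; i++)
        thr_create(NULL, 0, worker, &args[i], 0, &tid[i]);

    /* thr_join waits for each created thread to terminate. */
    for (i = 0; i < 2; i++)
        thr_join(tid[i], NULL, NULL);

    return 0;
}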

    Using a larger number of processors speeds up a program, but greatly complicates the work of programmers. Nevertheless, when working with large amounts of data it is better to use multiprocessor systems.

   Thus, according to the observation later called "Moore's Law", the number of transistors on a chip doubles roughly every two years, and until about 2000-2001 the clock frequency grew along with it. Since the beginning of the 2000s, however, the approach has changed: the development of affordable server processors has proceeded not by increasing the clock frequency, but in the direction of adding processors. Now not only the number of CPUs grows, but also the number of cores in each of them. This happened because chip manufacturing now operates at the nanometre scale and has run into physical limitations, in particular the size of the atomic lattice.

Today, parallel computing is used by only a small number of programs. This is due primarily to the fact that the development of applications using parallel computations is more complex and more time-consuming than development based on traditional sequential computing.

Modern DBMSs use multithreaded schemes of work, and at first glance it seems that parallel computing is provided automatically: each user corresponds to a connection that runs in a separate thread.

Users can send their requests to the database completely in parallel. The problem is that not all requests (including data modification operations) can be processed in parallel. This is due to the application logic and the problems associated with "dirty" reads. In MS SQL, an inefficiently configured locking mechanism does not allow all users to work with the system as much in parallel as they could. Full parallelism is only possible for systems in which all requests are read-only. It is also necessary to single out routine procedures with long execution times. In some cases they can be parallelized well and thus accelerated, but for this they must be programmed accordingly; otherwise such routine procedures will be executed in one thread, use only a limited share of the server resources and, accordingly, run much longer.

An example can be given from simulation tasks, such as the problem of calculating the trajectories of interacting physical bodies. Such problems are solved, as a rule, by computer modeling: if the time interval is divided into small steps, a new coordinate can be calculated iteratively for each of the bodies, and the smaller the step, the higher the accuracy of the calculation. Previously, on single-processor servers, it made no sense to parallelize this task over the data: in a loop over the set of all bodies, the coordinates were simply calculated for each body in turn.

Another example is the parallel posting of documents in a 1C system. Here a pool of 1C sessions is needed, together with a task coordinator that gives each session its own document to post. As soon as a session has posted its document and is released, the coordinator gives it a new one to post. One may ask: what about the sequence and exact chronology? The coordinator is responsible for this by means of locks. If a document is related at the data level to documents being posted at the same time by other sessions, for example through the same item or client, it will wait for its turn; if it is not related, it will be posted in parallel. Each individual processor posts its own document in parallel, and the work is thereby accelerated. If the data are not closely related to each other, the acceleration can be very significant, sometimes tens of times. Of course, this depends on the server resources; on weak servers the acceleration will be negligible.
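A rough sketch of the coordinator idea described above, again using the Solaris threads API; the pool size, the number of documents and the "posting" work are illustrative assumptions, and the related-data locking that the coordinator would perform is left out for brevity. A shared counter protected by a mutex plays the role of the coordinator, and each worker thread repeatedly takes the next free document:

#include <thread.h>
#include <synch.h>
#include <stdio.h>

#define NUM_DOCS    8   /* hypothetical number of documents to post */
#define NUM_WORKERS 2   /* hypothetical size of the session pool    */

static mutex_t lock;        /* protects the shared "next document" index */
static int next_doc = 0;

/* Each worker asks the coordinator (the shared counter) for the next
   document until none are left; unrelated documents are thus posted
   in parallel by different workers. */
static void *worker(void *arg) {
    (void)arg;
    for (;;) {
        int doc;
        mutex_lock(&lock);
        doc = (next_doc < NUM_DOCS) ? next_doc++ : -1;
        mutex_unlock(&lock);
        if (doc < 0)
            return NULL;
        printf("thread %u posting document %d\n", (unsigned)thr_self(), doc);
    }
}

int main(void) {
    thread_t tid[NUM_WORKERS];
    int i;

    mutex_init(&lock, USYNC_THREAD, NULL);
    for (i = 0; i < NUM_WORKERS; i++)
        thr_create(NULL, 0, worker, NULL, 0, &tid[i]);
    for (i = 0; i < NUM_WORKERS; i++)
        thr_join(tid[i], NULL, NULL);
    return 0;
}

In this sketch the "posting" is only a print statement; in a real system each worker would execute the document-posting logic, and the coordinator would also check data-level locks before handing a document out.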

Many languages now support multithreading and the possibility of parallel computing. But, in my opinion, it is implemented in a way that is inconvenient for developers: the complexity of the design outweighs the advantages offered by the scalability of the system.
