Yerezhepova A.K., Master's student, A. Baitursynov Kostanay State University, Kostanay.

Ivanova I.V., Candidate of Pedagogical Sciences, Senior Lecturer, Department of Software, A. Baitursynov Kostanay State University, Kostanay.

 

Voice Quality in IP Telephony Networks

This article examines methods for assessing voice quality in IP telephony networks, with the aim of minimizing the cost of re-evaluation when the influencing factors change. The choice of a quality assessment method is justified, a comparison of the methods is compiled, and the parameters measured by each method are analyzed.

Keywords: IP telephony, MOS, PESQ, R-factor, E-model.

The parameters associated with the terminal that affect the quality of services based on QoS mechanisms at the link layer of the OSI model are the codec type and delay; the parameters associated with the network are packet loss, delay, and delay variation. This means that, to achieve a given level of quality, the terminal and the network should be treated as a single complex, with a specific set of requirements imposed on each of them.

The total delay consists of the delay for coding and packetizing the speech signal, routing delays in the network, the signal propagation delay, and the delay associated with the buffer size. Since jitter is introduced by the network but compensated by the terminal, it can be concluded that the terminal delay is constant, while the network delay is a function of distance and of the number of routing points. Consequently, the task is to choose a method whose assessment covers the factors that affect the transmitted speech along the entire path from the speaker to the listener. It is also necessary to identify a method that, when quality parameters change, re-evaluates only the changes and adjusts the existing estimate.
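The delay decomposition described above can be illustrated with a minimal sketch. It assumes a constant terminal delay (coding, packetization, jitter buffer) and a network delay that grows with distance and with the number of routing points. The per-kilometre and per-hop constants, the names, and the sample values are illustrative assumptions, not figures from this article.

```python
from dataclasses import dataclass

# Illustrative assumptions, not values taken from the article.
PROPAGATION_MS_PER_KM = 0.005  # roughly 5 microseconds per km in optical fibre
PER_HOP_MS = 0.5               # hypothetical average routing delay per hop


@dataclass
class TerminalDelay:
    """Constant delay contributed by the terminal, in milliseconds."""
    coding_ms: float
    packetization_ms: float
    jitter_buffer_ms: float

    def total(self) -> float:
        return self.coding_ms + self.packetization_ms + self.jitter_buffer_ms


def network_delay_ms(distance_km: float, hops: int) -> float:
    """Network delay as a function of distance and number of routing points."""
    return distance_km * PROPAGATION_MS_PER_KM + hops * PER_HOP_MS


def total_delay_ms(terminal: TerminalDelay, distance_km: float, hops: int) -> float:
    """End-to-end delay: constant terminal part plus variable network part."""
    return terminal.total() + network_delay_ms(distance_km, hops)


# Hypothetical terminal settings used only to exercise the functions.
terminal = TerminalDelay(coding_ms=5.0, packetization_ms=20.0, jitter_buffer_ms=40.0)
print(total_delay_ms(terminal, distance_km=1000, hops=10))
```

Keeping the terminal part separate from the network part mirrors the argument that only the network term needs to be recomputed when the route changes.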

 

For each QoS class, requirements are imposed on the end-to-end delay:

 

- 4 "Supreme": delay up to 10 ms;

- 3 "High": delay up to 100 ms;

- 2 "Medium": delay up to 150 ms;

- 1 "Available": delay up to 400 ms.
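The class boundaries listed above can be turned into a simple lookup. The sketch below uses the thresholds exactly as given in the list and assumes that "up to" is an inclusive bound; the function name is hypothetical.

```python
# Upper end-to-end delay bounds (ms) per QoS class, as listed in the text.
QOS_CLASSES = [
    (4, "Supreme", 10),
    (3, "High", 100),
    (2, "Medium", 150),
    (1, "Available", 400),
]


def qos_class_for_delay(delay_ms):
    """Return (class number, name) of the best class whose bound is met.

    Assumption: the bound is inclusive. Returns None when the delay
    exceeds the limit of the lowest class.
    """
    if delay_ms < 0:
        raise ValueError("delay must be non-negative")
    for number, name, limit_ms in QOS_CLASSES:
        if delay_ms <= limit_ms:
            return number, name
    return None


print(qos_class_for_delay(75))
```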

 

The connection setup time is defined as the post-dial delay. The basic requirements for the post-dial delay are defined in ITU-T E.721:

 

- Local call: < 3 s;

- Long-distance call: < 5 s;

- International call: < 8 s.

There are three network classes, which take into account delay variation and packet loss but not propagation delay or routing delay (Table 1).

 

Table 1

Class    Packet loss    Delay variation
I        0.5%           up to 10 ms
II       1%             up to 20 ms
III      2%             up to 40 ms
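As a sketch of how Table 1 could be applied, the function below assigns a network class to a measured pair of packet loss and delay variation. It assumes that both limits are inclusive and that a network belongs to the best class whose two limits it satisfies; the function name is hypothetical.

```python
# Limits from Table 1: (class, max packet loss in %, max delay variation in ms).
NETWORK_CLASSES = [
    ("I", 0.5, 10),
    ("II", 1.0, 20),
    ("III", 2.0, 40),
]


def network_class(packet_loss_pct, delay_variation_ms):
    """Return the best class whose limits are both met, or None if none is."""
    for name, max_loss, max_variation in NETWORK_CLASSES:
        if packet_loss_pct <= max_loss and delay_variation_ms <= max_variation:
            return name
    return None


# Low loss but 15 ms of delay variation fails class I and falls into class II.
print(network_class(0.3, 15))
```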

 

 

 

When evaluating service quality in IP networks, it must be taken into account that the network performance requirements of data applications differ significantly from those of applications that deliver voice. Many methods for assessing voice quality in IP telephony networks have been developed. They differ in their algorithms, in the parameters they evaluate, and in the rating scale itself. It is therefore necessary to analyze the approaches taken by the different types of methods in order to compare them, and to determine which method takes into account the influence of both the network and the terminal.

 

Subjective quality assessment methods are based on statistical processing of ratings given by a sufficiently large number of expert listeners. These ratings depend substantially on the age and sex of the speaker, the speaking rate, and other circumstances. Tests for obtaining subjective ratings are carried out under simulated real conditions, such as background noise, background speech of other people, and so on. The quantitative results of these tests reflect the average quality level in terms of listening effort, intelligibility, and naturalness of sound.

 

The most widely used subjective quality assessment method is described in ITU-T P.800 and is known as the MOS (Mean Opinion Score) technique. According to it, the quality of speech obtained when the signal passes from the speaker (source) through the communication system to the listener (receiver) is estimated as the arithmetic mean of all the ratings given by the experts after listening to the tested transmission path.
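The arithmetic-mean step can be sketched as follows. The sketch assumes the usual five-point opinion scale (1 = bad, 5 = excellent); the ratings are hypothetical and serve only to exercise the function.

```python
def mean_opinion_score(ratings):
    """Arithmetic mean of expert ratings on an assumed 1..5 opinion scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for rating in ratings:
        if not 1 <= rating <= 5:
            raise ValueError(f"rating out of range: {rating}")
    return sum(ratings) / len(ratings)


# Hypothetical ratings from eight expert listeners.
expert_ratings = [4, 5, 3, 4, 4, 5, 4, 3]
print(mean_opinion_score(expert_ratings))
```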

 

The objective method is based on the so-called E-model, which relies on measuring the characteristics of the terminals and the network. After the E-model was created, a large number of tests were conducted in which the level of the distorting network factors was varied. The data from these tests were used in the E-model to calculate objective ratings. The result of a calculation according to the E-model is a number called the R-factor ("rating factor").

The E-model provides a multi-criteria evaluation of speech quality in IP networks, expressed as the R-factor, whose value ranges from 0 to 100, where 100 represents the highest level of quality. However, the theoretical maximum of the R-factor is reduced from 100 to 93.2, which corresponds to a MOS estimate of 4.4.

In practice, the value of the R-factor varies from 0 to 93.2, which corresponds to MOS ratings from 1 to 4.4. The R-factor is determined by the following formula:

R = Ro - Is - Id - Ie + A,

where:

Ro = 93.2 is the initial value of the R-factor;

Is is the distortion introduced by the codec and by noise in the channel;

Id is the distortion due to the total end-to-end delay in the network;

Ie is the distortion introduced by the equipment, including packet loss;

A is the so-called advantage factor.
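The formula can be sketched in a few lines of code. The R-to-MOS conversion used here is the standard mapping from ITU-T G.107; it is not given in this article and is included only to show how R = 93.2 corresponds to a MOS of about 4.4. The function names and the impairment values in the example are hypothetical.

```python
R0 = 93.2  # initial value of the R-factor, as stated in the text


def r_factor(i_s, i_d, i_e, a=0.0, r0=R0):
    """R = Ro - Is - Id - Ie + A."""
    return r0 - i_s - i_d - i_e + a


def r_to_mos(r):
    """Convert an R-factor to an estimated MOS (mapping from ITU-T G.107).

    The polynomial dips slightly below 1 for very small R, so the result
    is clamped to the lower end of the MOS scale.
    """
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    mos = 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6
    return max(1.0, mos)


# No impairments: the default R-factor and its MOS equivalent.
print(round(r_to_mos(r_factor(0, 0, 0)), 2))

# Hypothetical impairment values, chosen only for illustration.
r = r_factor(i_s=5, i_d=10, i_e=11)
print(round(r, 1), round(r_to_mos(r), 2))
```

Because each impairment enters as a separate term, a change in one factor (for example, a new codec affecting Ie) can be re-evaluated without recomputing the others.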

Psychoacoustic quality assessment methods take into account the characteristics of human perception of sound in general and of voice in particular. A distinctive feature of these methods is that they only estimate the subjective quality of a signal, using hardware and software tools. They are therefore closer to the objective methods, but are built on the features of human subjective perception of sound.

The goal of any method for assessing speech signal quality is to achieve a high degree of correlation with subjective statistical tests, which remain the most accurate assessment of voice quality.

Most methods are based on comparing the original and coded signals using a psychoacoustic model, which assesses how noticeable the distortions in the coded signal are to a person. A psychoacoustic model converts the audio signal into its internal representation in the human auditory system, which is then compared with the internal representation of the original signal.

The most common is the PESQ score, defined in ITU-T Recommendation P.862. It is an objective method for determining the quality of voice communication in a telephone system, which predicts the results of a subjective quality assessment by expert listeners. To determine speech quality, PESQ compares the input (reference) signal with its distorted version at the output of the communication system.

The result of comparing the input and output signals is a quality score analogous to the mean opinion score (MOS). The PESQ results are then calibrated against a large database of MOS ratings.

To compare the methods described above, it is necessary to define the parameters that affect the level of voice distortion, the naturalness of the voice, and the delay introduced by the network and the terminal. Therefore, the parameters chosen for comparing the assessment methods are those whose evaluation allows the voice service provided, as well as the terminal and network settings, to be assigned to a specific quality class.

 

The following main qualitative characteristics were selected:

- Total delay in the transmission of voice information between subscribers;

- Connection setup time;

- Probability of packet loss;

- Level of voice distortion;

- Absence or presence of echo;

- Distortions introduced by the codec.

The following main quantitative characteristics were selected:

- Overall assessment of transmission quality;

- Speech quality perceived by the listener;

- End-to-end delay.

 

The MOS methodology evaluates the absence or presence of echo, voice distortion, and end-to-end delay; the overall assessment of speech quality is a subjective assessment by experts. This assessment is formed as an arithmetic mean, where the main evaluation parameters are clarity, naturalness of the voice, and the level of listening effort. This technique cannot be used to compare mathematical models and, as a result, does not make it possible to identify the impact of an individual factor.

 

From this viewpoint, the E-model and PESQ methods can be considered. The E-model covers almost all of the selected parameters, except for speech quality as perceived directly by the listener. This method evaluates the distortion introduced by the network and by the terminals, each separately. The R-factor calculation takes 20 parameters into account, of which the main ones are:

- One-way delay;

- Packet loss rate;

- Loss of data due to jitter buffer overflow;

- Distortion introduced by converting the analog signal to digital form and subsequently compressing it (codec signal processing);

- The effect of echo;

- Total end-to-end delay;

- Distortion introduced by the equipment.

The PESQ score takes the following factors into account:

- Distortion introduced by signal coding;

- Transmission errors;

- Packet loss;

- Packet delay and delay variation;

- Filtering of signals in analog network components.

At the same time, its assessment does not include some factors that depend on network parameters and on speech perception:

- Changes in the signal level in the network;

- Presence of echo;

- Round-trip delay.

It follows that the PESQ score is similar to the E-model score but takes fewer network factors into account.

Table 2

Factors                                                                     MOS    E-model    PESQ
Total delay in the transmission of voice information between subscribers   -      +          -
Connection time                                                             -      +          -
Probability of packet loss                                                  -      +          +
Level of voice distortion                                                   +      +          +
Absence or presence of echo                                                 +      +          -
Distortions introduced by the codec                                         -      +          +
Overall transmission quality                                                +      +          +
Speech quality perceived by the listener                                    +      -          -
End-to-end delay                                                            -      +          -
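The comparison in Table 2 can also be expressed as data, which makes it easy to count how many of the selected factors each method covers. The sketch below transcribes the "+" and "-" marks of Table 2 as booleans; the factor names are shortened and the helper names are hypothetical.

```python
METHODS = ("MOS", "E-model", "PESQ")

# Coverage per factor in the order (MOS, E-model, PESQ), transcribed from Table 2.
TABLE_2 = {
    "Total delay between subscribers": (False, True, False),
    "Connection time": (False, True, False),
    "Probability of packet loss": (False, True, True),
    "Level of voice distortion": (True, True, True),
    "Absence or presence of echo": (True, True, False),
    "Distortions introduced by the codec": (False, True, True),
    "Overall transmission quality": (True, True, True),
    "Speech quality perceived by the listener": (True, False, False),
    "End-to-end delay": (False, True, False),
}


def coverage(table=TABLE_2):
    """Number of factors covered by each method."""
    return {
        method: sum(row[i] for row in table.values())
        for i, method in enumerate(METHODS)
    }


def uncovered(method, table=TABLE_2):
    """Factors that the given method does not take into account."""
    i = METHODS.index(method)
    return [factor for factor, row in table.items() if not row[i]]


print(coverage())
print(uncovered("E-model"))
```

Counting the marks this way gives a quick check of which method accounts for the most factors and which factors each one leaves out.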

 

 

 

Thus, the MOS method gives a clear assessment of quality, but it cannot identify the specific parameters that fail to meet the characteristics of the network classes. Moreover, whenever the network settings, terminals, coding, and so on are changed, the group of experts has to be reconvened, which is a time-consuming process. The E-model and PESQ techniques assess voice quality more precisely in relation to the classes of services and networks, and they point to the specific parameters that can be improved. Their evaluation depends on the codec used in the tests, so the quality assessment must be repeated when the codec is changed. In terms of the parameters it takes into account, the E-model is the more accurate method for evaluating the quality of transmitted speech.