Yerezhepova A.K., master's student, A. Baitursynov Kostanay State University, Kostanay.
Ivanova I.V., Candidate of Pedagogical Sciences, senior lecturer, Department of Software, A. Baitursynov Kostanay State University, Kostanay.
Voice quality in IP telephony networks
This article examines methods for assessing voice quality in IP telephony networks, with the aim of minimizing the cost of re-evaluation when the influencing factors change. The choice of quality assessment method is justified. A comparison of the methods is compiled, and the parameters measured by each method are analyzed.
Keywords: IP telephony, MOS, PESQ, R-factor, E-model.
For services that rely on QoS mechanisms at the link layer of the OSI model, the terminal-related parameters that affect quality are the codec type and delay, and the network-related parameters are packet loss, delay and delay variation (jitter). This means that, for a given level of quality, the terminal and the network should be treated as a single system, with a specific set of requirements placed on each of them.
The total delay consists of the coding and packetization delay of the speech signal, the routing delays in the network, the signal propagation delay and the delay associated with buffering. Since jitter is introduced by the network but compensated by the terminal, it can be concluded that the terminal delay is constant, while the network delay is a function of distance and of the number of routing points. The task, therefore, is to choose a method whose assessment covers the factors that affect the transmitted speech along the whole path from the speaker to the listener. The method must also be able to re-evaluate quality when the quality parameters change, by adjusting the existing estimate.
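The split between a constant terminal delay and a distance-dependent network delay can be sketched as a simple delay budget. This is a minimal illustration, not part of the original study: the class name, the field names, the propagation speed and all numeric values are assumptions chosen only to show the arithmetic.

```python
from dataclasses import dataclass

# Assumption: signals propagate at roughly 200 km per millisecond in fibre.
PROPAGATION_KM_PER_MS = 200.0


@dataclass
class DelayBudget:
    """One-way delay components; every figure passed in is illustrative."""
    coding_ms: float             # codec delay (terminal)
    packetization_ms: float      # time to fill one packet (terminal)
    jitter_buffer_ms: float      # playout buffer depth (terminal)
    routing_ms_per_point: float  # delay added at each routing point (network)
    routing_points: int          # number of routing points on the path
    distance_km: float           # length of the path

    def terminal_delay_ms(self) -> float:
        # Constant for a given terminal configuration.
        return self.coding_ms + self.packetization_ms + self.jitter_buffer_ms

    def network_delay_ms(self) -> float:
        # Grows with distance and with the number of routing points.
        routing = self.routing_ms_per_point * self.routing_points
        propagation = self.distance_km / PROPAGATION_KM_PER_MS
        return routing + propagation

    def total_delay_ms(self) -> float:
        return self.terminal_delay_ms() + self.network_delay_ms()


# Hypothetical path: 10 routing points over 2000 km.
budget = DelayBudget(15.0, 20.0, 40.0, 1.5, 10, 2000.0)
print(budget.terminal_delay_ms(), budget.network_delay_ms(), budget.total_delay_ms())
```

Keeping the two parts separate makes it possible to recompute only the network part when the route changes, which is the kind of incremental re-evaluation the text asks for.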
For each QoS class, a requirement is imposed on the end-to-end delay:
- class 4 ("Supreme"): delay up to 10 ms;
- class 3 ("High"): delay up to 100 ms;
- class 2 ("Medium"): delay up to 150 ms;
- class 1 ("Available"): delay up to 400 ms.
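The class limits listed above can be applied mechanically to a measured delay. The following sketch assumes that each limit is an inclusive upper bound and that a delay above 400 ms falls outside all classes; the function name and this boundary handling are illustrative choices, not something the text specifies.

```python
from typing import Optional, Tuple

# (class number, class name, upper delay limit in ms), strictest class first.
QOS_CLASSES = [
    (4, "Supreme", 10.0),
    (3, "High", 100.0),
    (2, "Medium", 150.0),
    (1, "Available", 400.0),
]


def qos_class_for_delay(delay_ms: float) -> Optional[Tuple[int, str]]:
    """Return the best QoS class whose delay limit is met, or None."""
    for number, name, limit_ms in QOS_CLASSES:
        if delay_ms <= limit_ms:
            return number, name
    return None  # slower than the weakest class


print(qos_class_for_delay(120.0))
```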
The connection establishment time is defined as the post-dial delay. The basic requirements for the post-dial delay are defined in ITU-T E.721:
- local call: < 3 s;
- long-distance call: < 5 s;
- international call: < 8 s.
There are three network classes, which take into account delay variation and packet loss but not the propagation delay or the routing delay.
Table 1
| Class | Packet loss | Delay variation |
|---|---|---|
| I | 0.5% | up to 10 ms |
| II | 1% | up to 20 ms |
| III | 2% | up to 40 ms |
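As an illustration of how Table 1 might be used, the sketch below assigns a network class from measured packet loss and delay variation. It assumes that both limits must be met at once and that they are inclusive; the function and constant names are hypothetical.

```python
from typing import Optional

# (class, maximum packet loss in %, maximum delay variation in ms) from Table 1.
NETWORK_CLASSES = [
    ("I", 0.5, 10.0),
    ("II", 1.0, 20.0),
    ("III", 2.0, 40.0),
]


def network_class(packet_loss_pct: float, jitter_ms: float) -> Optional[str]:
    """Return the best class for which both limits hold, or None."""
    for name, max_loss_pct, max_jitter_ms in NETWORK_CLASSES:
        if packet_loss_pct <= max_loss_pct and jitter_ms <= max_jitter_ms:
            return name
    return None  # worse than class III


# Low loss but moderate jitter: the jitter limit decides the class.
print(network_class(0.3, 15.0))
```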
When evaluating quality of service in IP networks, it must be taken into account that data applications and voice applications place significantly different requirements on network performance. Many methods have been created for assessing voice quality in IP telephony networks. They differ in their algorithms, in the parameters they evaluate and in their rating scales. The approaches taken by the different types of methods therefore need to be analyzed and compared in order to determine which method accounts for the influence of both the network and the terminal.
Subjective methods of quality assessment are based on statistical processing of the ratings given by a sufficiently large number of listeners (experts). These ratings depend significantly on the age and sex of the speaker, the speed at which phrases are uttered and other circumstances. Tests for obtaining subjective ratings are carried out under simulated real conditions, such as background noise and the background speech of other people. The quantitative results of these tests reflect the average level of quality, listening effort, intelligibility and naturalness of sound.
The most widely used subjective method is described in ITU-T P.800 and is known as MOS (Mean Opinion Score). According to this method, the quality of speech obtained by passing the signal from the speaker (source) through the communication system to the listener (receiver) is estimated as the arithmetic mean of all the scores given by the experts after listening to the transmission path under test.
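The averaging step is simple enough to show directly. The sketch assumes the usual five-point listening scale (1 = bad, 5 = excellent); the function name and the sample scores are made up for illustration.

```python
from typing import Iterable


def mean_opinion_score(scores: Iterable[int]) -> float:
    """Arithmetic mean of listener scores on an assumed 1..5 scale."""
    values = list(scores)
    if not values:
        raise ValueError("at least one score is required")
    if any(not 1 <= s <= 5 for s in values):
        raise ValueError("scores must be on the 1..5 scale")
    return sum(values) / len(values)


# Five hypothetical listeners rating the same test transmission.
print(mean_opinion_score([4, 4, 3, 5, 4]))
```

Because only the mean is retained, the result carries no information about which impairment caused a low score.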
The objective method is based on the so-called E-model, which relates quality to the measured characteristics of terminals and the network. After the E-model was created, a large number of tests were conducted in which the level of network impairments was varied. The data from these tests were used in the E-model to calculate objective ratings. The result of an E-model calculation is a number called the R-factor ("rating factor").
The E-model is a multi-criteria evaluation of speech quality in IP networks. Its result, the R-factor, takes values from 0 to 100, where 100 represents the highest level of quality. However, the theoretical maximum of the R-factor is reduced from 100 to 93.2, which corresponds to a MOS of 4.4.
In practice, the R-factor varies from 0 to 93.2, which corresponds to MOS ratings from 1 to 4.4. The R-factor is determined by the following formula: R = Ro - Is - Id - Ie + A, where:
Ro = 93.2 is the initial value of the R-factor;
Is is the impairment introduced by the codec and by noise in the channel;
Id is the impairment due to the total end-to-end delay in the network;
Ie is the impairment introduced by the equipment, including packet loss;
A is the so-called advantage factor.
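A short sketch of the formula, together with a conversion from R to MOS, may help to see how the two scales relate. The conversion is not given in the text; the sketch assumes the R-to-MOS mapping published in ITU-T G.107, and the impairment values in the example are invented.

```python
def r_factor(i_s: float, i_d: float, i_e: float,
             advantage: float = 0.0, r0: float = 93.2) -> float:
    """R = Ro - Is - Id - Ie + A, using the symbols defined in the text."""
    return r0 - i_s - i_d - i_e + advantage


def r_to_mos(r: float) -> float:
    """Estimated MOS for a given R (assumed mapping from ITU-T G.107)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    mos = 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6
    # The polynomial dips slightly below 1 for very small R; clamp it here.
    return max(1.0, mos)


# With no impairments, R stays at 93.2, which maps to a MOS of about 4.4.
print(round(r_to_mos(r_factor(0.0, 0.0, 0.0)), 2))

# Hypothetical impairments: delay costs 10 points, equipment and loss 11.
print(round(r_to_mos(r_factor(0.0, 10.0, 11.0)), 2))
```

Since each impairment enters as a separate term, a change in one factor (for example a longer route) only requires recomputing that term.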
Psychoacoustic quality assessment methods take into account the characteristics of human perception of sound in general and of speech in particular. Their distinguishing feature is that the subjective quality of a signal is only estimated, using hardware and software. They are therefore closer to the objective methods, but are built on the features of human subjective perception of sound.
The goal of any method of assessing speech signal quality is to achieve a high degree of correlation with subjective statistical tests, which remain the most accurate assessment of voice quality.
Most methods are based on comparing the original and coded signals using a psychoacoustic model, which assesses how noticeable the distortions in the coded signal are to a person. A psychoacoustic model converts the audio signal into its internal representation in the human auditory system; this representation is then compared with the internal representation of the original signal.
The most common is the PESQ (Perceptual Evaluation of Speech Quality) score, defined in ITU-T Recommendation P.862. It is an objective method of determining the quality of voice communication in a telephone system that predicts the results of a subjective assessment by expert listeners. To determine speech quality, PESQ compares the input (reference) signal with its distorted version at the output of the communication system. The result of the comparison is a quality score analogous to the subjective MOS. The PESQ results are then calibrated against a large database of MOS ratings.
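The text does not say which calibration is applied. One published mapping from a raw PESQ score to the MOS scale is the logistic function in ITU-T P.862.1; the sketch below assumes that mapping and does not implement the PESQ comparison itself. The function name is illustrative.

```python
import math


def pesq_to_mos_lqo(raw_pesq: float) -> float:
    """Map a raw PESQ score to the MOS scale (assumed P.862.1 mapping)."""
    return 0.999 + 4.0 / (1.0 + math.exp(-1.4945 * raw_pesq + 4.6607))


# Raw PESQ scores run from about -0.5 to 4.5; the mapped values stay
# between roughly 1 and 4.5 and increase with the raw score.
for raw in (-0.5, 2.0, 3.5, 4.5):
    print(raw, round(pesq_to_mos_lqo(raw), 2))
```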
To compare the methods described above, it is necessary to establish the parameters that affect the level of voice distortion, the naturalness of its sound and the delay introduced by the network and the terminal. The parameters chosen for comparing the assessment methods are therefore those whose evaluation allows the voice service provided, as well as the terminal and network settings, to be assigned to a specific quality class.
The following main qualitative characteristics were selected:
- the total delay in transmitting voice information between subscribers;
- the connection establishment time;
- the probability of packet loss;
- the level of voice distortion;
- the absence or presence of echo;
- distortions introduced by the codec.
The following main quantitative characteristics were selected:
- an overall assessment of transmission quality;
- the speech quality perceived by the listener;
- the end-to-end delay.
The MOS methodology evaluates the absence or presence of echo, voice distortion and end-to-end delay; the overall assessment of speech quality is a subjective judgment by experts. This assessment is formed as an arithmetic mean, where the main evaluation criteria are clarity, naturalness of the voice and the listening effort required. The technique is not suited to comparison with mathematical models and, as a result, makes it impossible to identify the impact of a single factor.
From this viewpoint, the E-model and PESQ can be considered. The E-model covers almost all of the selected parameters, except for the speech quality as perceived directly by the listener. The method evaluates the distortion introduced by the network and by the terminals, each individually. The calculation of the R-factor takes 20 parameters into account, the main ones being:
- one-way delay;
- packet loss rate;
- data loss due to jitter buffer overflow;
- distortion introduced by converting the analog signal to digital form and by subsequent compression (codec signal processing);
- the effect of echo;
- the total end-to-end delay;
- distortion introduced by equipment.
The PESQ score takes the following factors into account:
- distortion from coding the signal;
- transmission errors;
- packet loss;
- packet delay and its variation;
- filtering of signals in analog network components.
At the same time, its assessment does not include some factors that depend on network parameters and on the perception of speech:
- changes in the signal level in the network;
- the presence of echo;
- round-trip delay.
It follows that the PESQ score is similar to the E-model, but its estimate covers fewer network factors.
Table 2
| Factors | MOS | E-model | PESQ |
|---|---|---|---|
| The total delay of the transmission of voice information between subscribers | - | + | - |
| The connection time | - | + | - |
| The probability of packet loss | - | + | + |
| The level of distortion of voice | + | + | + |
| The absence or presence of echo | + | + | - |
| Distortions introduced by codec | - | + | + |
| Overall transmission quality | + | + | + |
| The speech quality perceived by the listener | + | - | - |
| Delay from end to end | - | + | - |
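The comparison in Table 2 can also be tallied programmatically. The sketch below only re-encodes the plus and minus marks from the table; the shortened factor labels and the function name are illustrative.

```python
METHODS = ("MOS", "E-model", "PESQ")

# True means the method covers the factor ("+" in Table 2).
COVERAGE = {
    "total delay between subscribers": (False, True, False),
    "connection time": (False, True, False),
    "probability of packet loss": (False, True, True),
    "level of voice distortion": (True, True, True),
    "absence or presence of echo": (True, True, False),
    "distortions introduced by codec": (False, True, True),
    "overall transmission quality": (True, True, True),
    "speech quality perceived by listener": (True, False, False),
    "end-to-end delay": (False, True, False),
}


def factors_covered() -> dict:
    """Count how many of the nine factors each method covers."""
    return {
        method: sum(row[i] for row in COVERAGE.values())
        for i, method in enumerate(METHODS)
    }


print(factors_covered())
```

By this count the E-model covers eight of the nine factors, while MOS and PESQ cover four each.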
Thus the MOS method gives a clear assessment of quality, but it cannot identify the specific parameters that fail to meet the characteristics of the network classes. Moreover, whenever the network settings, terminals, coding and so on change, the group of experts has to be convened again, which is a time-consuming process. The E-model and PESQ assess voice quality more precisely in relation to the classes of services and networks, and indicate the specific parameters that can be improved. Their evaluation depends on the codec used in the tests, so when the codec changes the quality assessment must be repeated. In terms of the parameters it uses, the E-model is the more accurate method of evaluating the quality of transmitted speech.