Korzh R.A.

Krivoy Rog National University, Ukraine

TOWARD AUTOMATIC MUSIC TRANSCRIPTION: PROBLEMS AND A SOLUTION

 

Introduction

At present, a great number of developments are dedicated to the problems of automatic music transcription. The field started in 1975, when James Moorer designed the first automatic transcription system, which could transcribe duets [1].

A general approach consists of the following. First, the audio signal is decomposed into a harmonic series using various mathematical methods [2]. Second, local harmonic peaks are grouped into harmonic sets by various criteria. Finally, the resulting information is used to estimate the timbres of the musical instruments [3].
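To make the first two steps concrete, the following minimal sketch (in Python with NumPy and SciPy; the paper itself prescribes no particular tools, and the test tone and thresholds are illustrative assumptions) computes a magnitude spectrum, picks local peaks, and groups peaks near integer multiples of the lowest peak into one harmonic set.

```python
# A minimal sketch of spectral peak picking and harmonic grouping.
# The 220 Hz test tone and all thresholds are illustrative assumptions.
import numpy as np
from scipy.fft import rfft, rfftfreq
from scipy.signal import find_peaks

fs = 44100                                   # sampling rate, Hz
t = np.arange(0, 0.5, 1 / fs)
x = np.sin(2 * np.pi * 220 * t) + 0.5 * np.sin(2 * np.pi * 440 * t)

spectrum = np.abs(rfft(x * np.hanning(len(x))))
freqs = rfftfreq(len(x), 1 / fs)

# Step 1: local harmonic peaks in the magnitude spectrum.
peaks, _ = find_peaks(spectrum, height=spectrum.max() * 0.05)

# Step 2: group peaks lying near integer multiples of a candidate f0.
f0 = freqs[peaks[0]]                         # lowest peak as candidate
harmonic_set = [f for f in freqs[peaks]
                if abs(f / f0 - round(f / f0)) < 0.05]
print(f0, harmonic_set)                      # ~220 Hz and its multiples
```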

A complete transcription would require that the pitch, timing and instrument of all sound events be resolved. As this can be very hard, or even theoretically impossible in some cases, the goal is usually redefined either as notating as many of the constituent sounds as possible (complete transcription) or as transcribing only some well-defined part of the music signal, for example the dominant melody or the most prominent drum sounds (partial transcription) [1].

 

Critical analysis

According to [4], most automatic systems have been designed for very particular classes of musical pieces. This means that they are not able to convert real music information into an object-oriented format. Let us consider this kind of restriction in more detail.

Music signals are characteristically non-stationary: the sound evolves over time, but over sufficiently short periods it is often considered "quasi-stationary" [5]. This property of local stationarity suggests the discrete short-time Fourier transform (fig. 1a) as a signal representation, and it is widely used by researchers [1]. However, this approach has the following disadvantages (illustrated in the sketch after this list):

-       it is necessary to provide increased resolution in the high-frequency bands, which a single fixed analysis window cannot provide;

-       a sound object can be located at the edges of a window and must be identified as reliably as objects concentrated at the centre of the window function.
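The trade-off behind both points can be seen numerically. The sketch below (an illustration under assumed parameters, using scipy.signal.stft) analyses a signal containing a low and a high note with two window lengths: the short window gives frequency bins too coarse for the low note, while the long window gives time steps too coarse to place onsets precisely.

```python
# A minimal sketch of the fixed time-frequency trade-off of the STFT.
# The 55 Hz / 1760 Hz test tones and window sizes are assumptions.
import numpy as np
from scipy.signal import stft

fs = 44100
t = np.arange(0, 1.0, 1 / fs)
x = np.sin(2 * np.pi * 55 * t) + np.sin(2 * np.pi * 1760 * t)  # A1 + A6

for nperseg in (512, 4096):
    f, frames, Z = stft(x, fs=fs, nperseg=nperseg)
    df = f[1] - f[0]              # width of one frequency bin, Hz
    dt = frames[1] - frames[0]    # step between analysis frames, s
    print(f"window={nperseg:5d}: df={df:6.1f} Hz, dt={dt * 1e3:5.1f} ms")
# window=  512: df ~ 86 Hz (cannot separate semitones near 55 Hz)
# window= 4096: df ~ 11 Hz, but dt ~ 46 ms (onsets smeared in time)
```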

[Figure 1: two time-frequency tiling diagrams, a) and b); time on the horizontal axis, frequency on the vertical axis]

Figure 1 – Time-frequency resolution: a) Fourier transform, b) wavelet transform

 

These two problems can be solved by using the wavelet transform (fig. 1b). Here the basis functions are localized both in the time domain and in the frequency domain (in contrast to the Fourier transform, whose sinusoidal basis functions have ideal frequency localization but no time localization) and act as window functions.
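As a comparison point, the wavelet analogue can be sketched with the PyWavelets package (a third-party choice assumed here; the paper does not name an implementation). The scale parameter replaces the fixed window: small scales analyse high frequencies with fine time localization, large scales analyse low frequencies with fine frequency localization.

```python
# A minimal sketch of multiresolution analysis with a continuous
# wavelet transform (Morlet); the signal and scales are assumptions.
import numpy as np
import pywt

fs = 44100
t = np.arange(0, 1.0, 1 / fs)
x = np.sin(2 * np.pi * 55 * t) + np.sin(2 * np.pi * 1760 * t)

scales = np.geomspace(8, 512, num=48)       # log-spaced analysis scales
coeffs, freqs = pywt.cwt(x, scales, 'morl', sampling_period=1 / fs)

print(coeffs.shape)                 # (48, 44100): one row per scale
print(freqs.max(), freqs.min())     # ~4.5 kHz down to ~70 Hz covered
```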

Now let us consider the fundamental problems of complex music analysis:

1.     partial overlapping strongly affects the accuracy of the result and can cause simultaneously sounding notes to be incorrectly merged or separated;

2.     timbre overlapping, especially between the parts of different instruments;

3.     spectral resolution: higher frequencies require higher resolution for a more detailed spectral analysis of the sound wave;

4.     metre estimation, especially in the case of input music signals with an amplitude-aligned representation.

Taking these aspects into consideration, the following restrictions are widespread in today's developments:

1.     musical polyphony degree (a limit on the number of simultaneously sounding objects within a piece of music);

2.     object polyphony degree (a limit on the number of simultaneously sounding objects within a single musical instrument);

3.     instrumental polyphony degree (a limit on the number of simultaneously sounding musical instruments within a piece of music).

 

Description of the proposed method

Figure 2 shows the algorithm of the new approach, whose aim is to achieve the highest possible quality of conversion from sound information to an object model by adapting to the specific sound data.

Before an arranger writes down the notes of a piece, he or she first listens to it and determines the instrumental polyphony degree; that is, the arranger forms a list of the instruments used in the piece of music. He or she then transcribes the part of each musical instrument separately [6].

The proposed approach follows the same idea as the arranger's workflow. To show this, let us describe each block of the algorithm (fig. 2) in more detail.

The input data block receives the values of the music signal si (i = 1, 2, …, n). If the input data is empty or contains only part of the whole audio signal, a corresponding message is shown (block 3) and the process repeats until it is aborted or the system receives correct data. Otherwise, the process proceeds to block 4.

In the tone envelope generation block, harmonics are formed that correspond to the frequency band of the piano, the instrument with the widest harmonic range (table 1).
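A hypothetical sketch of this block follows; the semitone grid and the number of harmonics are assumptions, with only the 27–4200 Hz piano band taken from table 1. Conveniently, an equal-tempered semitone grid over that band yields 88 fundamentals, matching the piano keyboard.

```python
# A hypothetical sketch of tone envelope generation: fundamentals on an
# equal-tempered semitone grid over the piano band, with their harmonics.
import numpy as np

F_LOW, F_HIGH = 27.0, 4200.0          # piano band from table 1, Hz

def tone_grid(f_low=F_LOW, f_high=F_HIGH, n_harmonics=8):
    """Rows: one fundamental and its integer-multiple harmonics."""
    n_semitones = int(np.floor(12 * np.log2(f_high / f_low)))
    fundamentals = f_low * 2.0 ** (np.arange(n_semitones + 1) / 12.0)
    return np.outer(fundamentals, np.arange(1, n_harmonics + 1))

grid = tone_grid()
print(grid.shape)          # (88, 8): 88 fundamentals, as on a piano
```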

Block 5 performs timbre signal decomposition using the discrete wavelet transform. The harmonics defined in the previous block are used as basis functions, which makes it possible to localize the timbre components of various musical instruments.
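Since a discrete wavelet transform with instrument harmonics as basis functions is specific to the proposed method, the stand-in sketch below uses a standard Daubechies wavelet from PyWavelets only to show the shape of the decomposition: each level splits off one octave-like detail band, in the spirit of fig. 1b.

```python
# A stand-in sketch of block 5 with a standard wavelet (db4); the
# paper's custom harmonic basis functions are not modelled here.
import numpy as np
import pywt

fs = 44100
t = np.arange(0, 1.0, 1 / fs)
x = np.sin(2 * np.pi * 220 * t)

coeffs = pywt.wavedec(x, wavelet='db4', level=6)
# coeffs[0] is the coarsest approximation; coeffs[1:] are detail bands,
# each covering roughly one octave from low to high frequencies.
for k, c in enumerate(coeffs):
    label = 'approximation' if k == 0 else f'detail level {7 - k}'
    print(f'{label}: {len(c)} coefficients')
```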

The instrument identification block determines onset and offset times from the minimal harmonic components (fig. 3).
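The identification criterion itself ("minimal harmonic components") is not specified in enough detail to reproduce, so the sketch below substitutes a simple short-time energy threshold as a hypothetical stand-in for locating onset and offset times.

```python
# A hypothetical stand-in for onset/offset detection: threshold the
# short-time energy envelope; frame size and threshold are assumptions.
import numpy as np

def onsets_offsets(x, fs, frame=1024, threshold=0.01):
    """Return (onsets, offsets) in seconds of segments above threshold."""
    n = len(x) // frame
    energy = np.array([np.mean(x[i * frame:(i + 1) * frame] ** 2)
                       for i in range(n)])
    active = np.concatenate(([False], energy > threshold, [False]))
    edges = np.flatnonzero(np.diff(active.astype(int)))
    return edges[::2] * frame / fs, edges[1::2] * frame / fs

fs = 44100
t = np.arange(0, 2.0, 1 / fs)
x = np.where(t >= 0.5, np.sin(2 * np.pi * 440 * t), 0.0)
print(onsets_offsets(x, fs))   # onset near 0.5 s, offset near 2.0 s
```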

Figure 2 – Algorithm of the proposed method

 

  

T = {T1, T2, T3, …, TNins}

 

Figure 3 – T-set

Table 1 – Frequency ranges of some musical instruments

Instrument             | Boundary frequencies, Hz
                       | low       | high
Piano (concert grand)  | 27        | 4200
Contrabass             | 40        | 300
Violin                 | 210       | 2800
Oboe                   | 230       | 1480
Flute                  | 240       | 2300

 

If the T-set is not defined, a corresponding message is shown (block 8). Otherwise, the process proceeds to block 9.

The time-frequency signal decomposition block performs a discrete wavelet transform for each elementary T-image; the elements of the T-set are used as basis functions.

The scaling operations run recurrently, using the known relations of harmonic distribution in musical instruments [7]. The result is a W-set that contains the source audio signal decomposed into timbre components.

Block 10 initializes the learning parameters used to detect sound patterns within the W-set. The information received from block 11 is then synthesized and converted into MIDI (Musical Instrument Digital Interface) format.
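The paper states only that the result is written out as MIDI; the sketch below, using the third-party mido package (an assumption), shows how a short list of detected notes could be serialized. The note list itself is invented for illustration.

```python
# A hypothetical sketch of the MIDI output step using the mido package.
# The detected-note list (pitch, onset, duration in ticks) is invented.
import mido

notes = [(69, 0, 480), (72, 480, 480), (76, 960, 960)]  # A4, C5, E5

mid = mido.MidiFile(ticks_per_beat=480)
track = mido.MidiTrack()
mid.tracks.append(track)

cursor = 0                             # running position in ticks
for pitch, onset, duration in notes:   # sequential, non-overlapping
    track.append(mido.Message('note_on', note=pitch, velocity=64,
                              time=onset - cursor))       # delta time
    track.append(mido.Message('note_off', note=pitch, velocity=64,
                              time=duration))
    cursor = onset + duration

mid.save('transcription.mid')
```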

 

Results

The proposed approach is expected to give good results in spectral resolution and in cases of harmonic overlapping, owing to the wavelet transform algorithms. Overlapping timbres can be separated by a collection of basis wavelet functions, and the musical metre can be estimated after multipitch analysis, once enough note information is available for each instrumental part of the piece.

 

Conclusions

A critical analysis of existing methods in the field of automatic music transcription has been performed. The fundamental problems and negative aspects of extracting sounding objects have been considered. Finally, a new approach has been introduced; it uses an adaptive scheme that learns the character of a specific music signal.

Its main advantage is that the conversion of sound information is based on timbre analysis of the specific music signal. In contrast to previous developments, in which timbre patterns were defined beforehand and applied to all audio signals regardless of their style and class, the proposed method forms its collection of timbre data during timbre signal decomposition. This approach should increase quality and decrease identification errors across different musical styles.

 

References:

1.     Klapuri A., Davy M. Signal Processing Methods for Music Transcription. — New York: Springer, 2006.

2.     Ellis D. Extracting Information from Music Audio. — LabROSA, Dept. of Electrical Engineering, Columbia University, NY, March 15, 2006.

3.     Фадеев А. С. Идентификация музыкальных объектов на основе непрерывного вейвлет-преобразования [Identification of musical objects based on the continuous wavelet transform]: dissertation. — Томский политехнический университет, 2008.

4.     Martin K. D. Toward automatic sound source recognition: Identifying musical instruments // Proc. of the 1998 NATO Advanced Study Institute on Computational Hearing, Il Ciocco, Italy, July 1998.

5.     Лукин А. Введение в цифровую обработку сигналов (математические основы) [Introduction to digital signal processing (mathematical foundations)]. — Лаборатория компьютерной графики и мультимедиа, МГУ, 2007.

6.     Андреева А. В. Развитие звуковысотного слуха младших школьников при обучении игре на скрипке [Developing the pitch hearing of primary school pupils learning to play the violin]: dissertation. — Нижний Новгород, 2009.

7.     Способин И. В. Элементарная теория музыки [Elementary theory of music]. — М.: Государственное музыкальное издательство, 1963.