PhD Thesis

Melody Extraction from Polyphonic Music Signals

Salamon, J. (2013). Melody Extraction from Polyphonic Music Signals. Ph.D. thesis, Universitat Pompeu Fabra, Barcelona, Spain, 2013.
[MTG][EURASIP][PDF][BibTeX][Slides]

Thesis advisors:

Dr. Emilia Gómez, Pompeu Fabra University (MTG, UPF)
Dr. Xavier Serra, Pompeu Fabra University (MTG, UPF)

Defense board:

Dr. Geoffroy Peeters, Institut de Recherche et Coordination Acoustique/Musique (ANASYN, IRCAM Paris)
Dr. Juan Pablo Bello, New York University (MARL, NYU)
Dr. Fabien Gouyon, Institute for Systems and Computer Engineering of Porto (SMC, INESC Porto)

Abstract

Music was the first mass-market industry to be completely restructured by digital technology, and today we can have access to thousands of tracks stored locally on our smartphone and millions of tracks through cloud-based music services. Given the vast quantity of music at our fingertips, we now require novel ways of describing, indexing, searching and interacting with musical content. In this thesis we focus on a technology that opens the door to a wide range of such applications: automatically estimating the pitch sequence of the melody directly from the audio signal of a polyphonic music recording, also referred to as melody extraction. Whilst identifying the pitch of the melody is something human listeners can do quite well, doing this automatically is highly challenging. We present a novel method for melody extraction based on the tracking and characterisation of the pitch contours that form the melodic line of a piece. We show how different contour characteristics can be exploited in combination with auditory streaming cues to identify the melody out of all the pitch content in a music recording using both heuristic and model-based approaches. The performance of our method is assessed in an international evaluation campaign where it is shown to obtain state-of-the-art results. In fact, it achieves the highest mean overall accuracy obtained by any algorithm that has participated in the campaign to date. We demonstrate the applicability of our method both for research and end-user applications by developing systems that exploit the extracted melody pitch sequence for similarity-based music retrieval (version identification and query-by-humming), genre classification, automatic transcription and computational music analysis. The thesis also provides a comprehensive comparative analysis and review of the current state-of-the-art in melody extraction and a first of its kind analysis of melody extraction evaluation methodology.

Audio Examples

Here are some audio examples of melody pitch sequences extracted using the model-free algorithm presented in the thesis (requires Flash). For each example there are three audio files:

In the first, you can listen to an excerpt of the original song.
In the second, you can listen to the pitch estimated by the algorithm.
In the third, you can listen to the original song in the left channel and the estimated pitch in the right channel (I recommend using headphones for this file).

NOTE: the extracted f0 sequence is synthesised using a single sinusoid!
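For readers who want to reproduce this kind of playback from their own f0 sequences, here is a minimal sketch of single-sinusoid synthesis. It is not the code used to generate the examples on this page. The names `synthesize_f0` and `left_right_mix`, the 10 ms hop, the 8 kHz sample rate and the convention that f0 values of 0 or below mark unvoiced frames are all assumptions made for illustration.

```python
import numpy as np


def synthesize_f0(f0_hz, hop_s=0.01, sr=8000):
    """Render a frame-wise f0 sequence (in Hz) as a single sinusoid.

    Assumption: frames with f0 <= 0 are unvoiced and rendered as silence.
    """
    f0 = np.asarray(f0_hz, dtype=float)
    hop = int(round(hop_s * sr))         # samples per analysis frame
    f0_samples = np.repeat(f0, hop)      # hold each frame's f0 for one hop
    voiced = f0_samples > 0
    inst_freq = np.where(voiced, f0_samples, 0.0)
    # Integrate the instantaneous frequency so that the phase stays
    # continuous when the pitch changes between frames (avoids clicks).
    phase = 2.0 * np.pi * np.cumsum(inst_freq) / sr
    return np.sin(phase) * voiced        # mute the unvoiced samples


def left_right_mix(original, synth):
    """Stack the original (left) and the synthesised pitch (right)."""
    n = min(len(original), len(synth))
    return np.stack([original[:n], synth[:n]], axis=1)


if __name__ == "__main__":
    # Hypothetical f0 sequence: 0.5 s at 440 Hz, a 0.2 s gap, 0.5 s at 330 Hz.
    f0 = [440.0] * 50 + [0.0] * 20 + [330.0] * 50
    tone = synthesize_f0(f0)
    stereo = left_right_mix(np.zeros_like(tone), tone)
    print(tone.shape, stereo.shape)
```

Holding each frame's f0 constant for one hop is the simplest choice; interpolating f0 between voiced frames would give smoother glides at the cost of a little more code.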

Figure 4.5
Vocal jazz: