Justin Salamon

Melody Extraction by Contour Classification

21/7/2015

Due to the scarcity of labeled data, most melody extraction algorithms do not rely on fully data-driven processing blocks but rather on careful engineering. For example, the Melodia melody extraction algorithm employs a pitch contour selection stage that relies on a number of heuristics for selecting the melodic output. In this paper we explore the use of a discriminative model to perform purely data-driven melodic contour selection. Specifically, a discriminative binary classifier is trained to distinguish melodic from non-melodic contours. This classifier is then used to predict likelihoods for a track’s extracted contours, and these scores are decoded to generate a single melody output. The results are compared with the Melodia algorithm and with a generative model used in a previous study. We show that the discriminative model outperforms the generative model in terms of contour classification accuracy, and that the melody output from our proposed system performs comparably to Melodia. The results are complemented with an error analysis and avenues for future improvements.
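If you're curious what "contour classification plus decoding" can look like in practice, here is a minimal sketch in Python. It is not the code from the paper: the feature set, the random forest classifier and the greedy per-frame decoding below are stand-ins for the actual pipeline (which operates on Melodia's pitch contours and their features).

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def train_contour_classifier(features, labels):
        # features: (n_contours, n_features) per-contour descriptors
        # labels: 1 = melodic contour, 0 = non-melodic contour
        clf = RandomForestClassifier(n_estimators=100, class_weight='balanced', random_state=0)
        clf.fit(features, labels)
        return clf

    def decode_melody(clf, features, contour_times, contour_pitches, hop=0.0058, threshold=0.5):
        # Greedy decoding: at each frame keep the pitch of the most probable melodic contour.
        probs = clf.predict_proba(features)[:, 1]
        t_end = max(t[-1] for t in contour_times)
        times = np.arange(0.0, t_end + hop, hop)
        melody = np.zeros_like(times)   # 0 Hz = unvoiced frame
        best = np.zeros_like(times)
        for p, t, f in zip(probs, contour_times, contour_pitches):
            if p < threshold:
                continue
            idx = np.round(np.asarray(t) / hop).astype(int)
            better = p > best[idx]
            melody[idx[better]] = np.asarray(f)[better]
            best[idx[better]] = p
        return times, melody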

For further details please see our paper:

R. Bittner, J. Salamon, S. Essid and J. P. Bello. "Melody Extraction by Contour Classification". Proc. 16th International Society for Music Information Retrieval Conference (ISMIR 2015), Malaga, Spain, Oct. 2015.
[ISMIR][PDF][BibTex]

0 Comments

Feature Learning with Deep Scattering for Urban Sound Analysis

10/6/2015

In this paper we evaluate the scattering transform as an alternative signal representation to the mel-spectrogram in the context of unsupervised feature learning for urban sound classification. We show that we can obtain comparable (or better) performance using the scattering transform whilst reducing both the amount of training data required for feature learning and the size of the learned codebook by an order of magnitude. In both cases the improvement is attributed to the local phase invariance of the representation. We also observe improved classification of sources in the background of the auditory scene, a result that provides further support for the importance of temporal modulation in sound segregation.
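For context, here is a rough sketch of computing a scattering representation as an alternative front-end to the log-mel-spectrogram. It assumes the kymatio package is available and uses illustrative parameter values, not the configuration from the paper:

    import numpy as np
    import librosa
    from kymatio.numpy import Scattering1D   # assumes the kymatio package is installed

    y, sr = librosa.load('audio_clip.wav', sr=44100, duration=4.0)   # placeholder file

    # Baseline front-end: log-mel-spectrogram.
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=40, hop_length=1024)
    log_mel = librosa.power_to_db(mel)

    # Alternative front-end: first- and second-order scattering coefficients,
    # locally invariant to translation (and hence to local phase) over ~2**J samples.
    J, Q = 8, 8                                # illustrative values, not the paper's settings
    scattering = Scattering1D(J=J, shape=y.shape[-1], Q=Q)
    S = scattering(y)                          # (n_coefficients, n_frames)
    print(log_mel.shape, S.shape)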

For further details please see our paper:

J. Salamon and J. P. Bello. "Feature Learning with Deep Scattering for Urban Sound Analysis", 2015 European Signal Processing Conference (EUSIPCO), Nice, France, August 2015.
[EURASIP][PDF][BibTex]

0 Comments

Tony: A New Tool for Transcribing Melodies

3/4/2015

We present Tony, a software tool for the interactive annotation of melodies from monophonic audio recordings, and evaluate its usability and the accuracy of its note extraction method. The scientific study of acoustic performances of melodies, whether sung or played, requires the accurate transcription of notes and pitches. To achieve the desired transcription accuracy for a particular application, researchers manually correct results obtained by automatic methods. Tony is an interactive tool directly aimed at making this correction task efficient. It provides (a) state-of-the-art algorithms for pitch and note estimation, (b) visual and auditory feedback for easy error-spotting, (c) an intelligent graphical user interface through which the user can rapidly correct estimation errors, and (d) extensive export functions enabling further processing in other applications. We show that Tony’s built-in automatic note transcription method compares favourably with existing tools. We report annotation times for a set of 96 solo vocal recordings and study the effect of the piece, the number of edits made, and the annotator’s increasing mastery of the software. Tony is Open Source software, with source code and compiled binaries for Windows, Mac OS X and Linux available from:
https://code.soundsoftware.ac.uk/projects/tony/
Screenshot of the Tony interface on OS X.
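Tony's automatic pitch track is based on the pYIN algorithm. If all you need is a frame-level pitch estimate in Python, without Tony's note segmentation and interactive correction, a rough (and much more limited) equivalent is librosa's pYIN implementation; the file name below is a placeholder:

    import numpy as np
    import librosa

    y, sr = librosa.load('vocal_take.wav', sr=44100)   # placeholder recording
    f0, voiced_flag, voiced_prob = librosa.pyin(
        y, fmin=librosa.note_to_hz('C2'), fmax=librosa.note_to_hz('C6'), sr=sr)
    times = librosa.times_like(f0, sr=sr)

    # Keep only voiced frames, e.g. for subsequent note-level processing.
    voiced_times = times[voiced_flag]
    voiced_f0 = f0[voiced_flag]
    print('%d voiced frames out of %d' % (voiced_flag.sum(), len(f0)))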
For further details please check out our paper:

M. Mauch, C. Cannam, R. Bittner, G. Fazekas, J. Salamon, J. Dai, J. P. Bello, and S. Dixon. Computer-aided melody note transcription using the Tony software: Accuracy and efficiency. In First International Conference on Technologies for Music Notation and Representation (TENOR), Paris, France, May 2015.
[TENOR][PDF][BibTex]

Unsupervised Feature Learning for Urban Sound Classification

5/2/2015

Recent studies have demonstrated the potential of unsupervised feature learning for sound classification. In this paper we further explore the application of the spherical k-means algorithm for feature learning from audio signals, here in the domain of urban sound classification. Spherical k-means is a relatively simple technique that has recently been shown to be competitive with other more complex and time consuming approaches. We study how different parts of the processing pipeline influence performance, taking into account the specificities of the urban sonic environment. We evaluate our approach on the largest public dataset of urban sound sources available for research, and compare it to a baseline system based on MFCCs. We show that feature learning can outperform the baseline approach by configuring it to capture the temporal dynamics of urban sources. The results are complemented with error analysis and some proposals for future research.
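For readers who haven't come across it, here is an illustrative sketch of spherical k-means codebook learning and hard-assignment encoding on (e.g. PCA-whitened log-mel) patches. The patch size, codebook size and whitening step are placeholders, not the configuration used in the paper:

    import numpy as np

    def spherical_kmeans(X, k=200, n_iter=30, seed=0):
        # X: (n_samples, n_dims) patches, e.g. PCA-whitened log-mel patches.
        rng = np.random.default_rng(seed)
        X = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-8)   # project to unit sphere
        D = X[rng.choice(len(X), size=k, replace=False)].copy()     # initialise centroids from data
        for _ in range(n_iter):
            assign = (X @ D.T).argmax(axis=1)                       # nearest centroid by cosine similarity
            for j in range(k):
                members = X[assign == j]
                if len(members):
                    c = members.sum(axis=0)
                    D[j] = c / (np.linalg.norm(c) + 1e-8)           # re-normalise centroid
        return D

    def encode(X, D):
        # Hard-assignment encoding followed by pooling (normalised histogram) over time.
        X = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-8)
        assign = (X @ D.T).argmax(axis=1)
        hist = np.bincount(assign, minlength=len(D)).astype(float)
        return hist / max(len(X), 1)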
J. Salamon and J. P. Bello. "Unsupervised Feature Learning for Urban Sound Classification", in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brisbane, Australia, April 2015.
[IEEE][DOI][PDF][BibTeX][Copyright]

Announcing the Urban Sound dataset and taxonomy

23/10/2014

We are pleased to announce the release of UrbanSound, a dataset containing 27 hours of field recordings with over 3000 labelled sound source occurrences from 10 sound classes. The dataset focuses on sounds that occur in urban acoustic environments.

To facilitate comparable research on urban sound source classification, we are also releasing a second version of this dataset, UrbanSound8K, with 8732 excerpts limited to 4 seconds (also with source labels) and pre-sorted into 10 stratified folds. In addition to the source ID, both datasets include a (subjective) salience label for each source occurrence: foreground / background.
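To keep results comparable across studies, please train and test using the 10 provided folds rather than re-shuffling the excerpts. Here is a hedged sketch of how that might look in Python, assuming a metadata CSV with 'fold' and 'class' columns (check the companion website below for the actual file names and layout):

    import pandas as pd

    # Hypothetical path and column names; see the companion website for the
    # actual metadata layout.
    meta = pd.read_csv('UrbanSound8K/metadata/UrbanSound8K.csv')

    for test_fold in range(1, 11):
        train = meta[meta['fold'] != test_fold]
        test = meta[meta['fold'] == test_fold]
        # Train a classifier on the `train` excerpts and evaluate on `test` here.
        print('fold %d: %d train / %d test excerpts' % (test_fold, len(train), len(test)))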

The datasets are released for research purposes under a Creative Commons Attribution Noncommercial License, and are available online at the dataset companion website:


http://urbansounddataset.weebly.com/

This companion website also contains further information about each dataset, including the Urban Sound Taxonomy from which the 10 sound classes in this dataset were selected.

The datasets and taxonomy will be presented at the ACM Multimedia 2014 conference in Orlando in a couple of weeks. For those interested, please see our paper:

J. Salamon, C. Jacoby and J. P. Bello, "A Dataset and Taxonomy for Urban Sound Research", in Proc. 22nd ACM International Conference on Multimedia, Orlando USA, Nov. 2014.

I will also be at ISMIR 2014 next week if you would like to discuss the datasets and taxonomy.

I hope you find the datasets useful for your work and look forward to seeing some of you at ISMIR and ACM-MM in the coming weeks!


3 papers to make MIR a better place

1/9/2014

 
This year I've collaborated on 3 papers for the ISMIR 2014 conference, and they are all about making MIR a more reproducible, transparent, and reliable field of research. In a nutshell, they're about making MIR a better place :)

The first, led by Rachel Bittner (MARL @ NYU), describes MedleyDB, a new dataset of multitrack recordings we have compiled and annotated, primarily for melody extraction evaluation. Unlike previous datasets, it contains over 100 songs, most of which are full-length (rather than excerpts), in a variety of musical genres, and of professional quality (not only in the recording, but also in the content):

  • R. Bittner, J. Salamon, M. Tierney, M. Mauch, C. Cannam and J. P. Bello. "MedleyDB: A Multitrack Dataset for Annotation-Intensive MIR Research", in Proc. 15th International Society for Music Information Retrieval Conference (ISMIR 2014), Taipei, Taiwan, October 2014.

We hope this new dataset will help shed light on the remaining challenges in melody extraction (we have identified a few ourselves in the paper), and allow researchers to evaluate their algorithms on a more realistic dataset. The dataset can also be used for research in musical instrument identification, source separation, multiple f0 tracking, and any other MIR task that benefits from the availability of multitrack audio data. Congratulations to my co-authors Rachel, Mike, Matthias, Chris and Juan!


The second paper, led by Eric Humphrey (MARL @ NYU), introduces JAMS, a new specification we've been working on for representing MIR annotations. JAMS stands for JSON Annotated Music Specification and, as you can imagine, it is JSON-based:

  • E. J. Humphrey, J. Salamon, O. Nieto, J. Forsyth, R. M. Bittner and J. P. Bello. "JAMS: A JSON Annotated Music Specification for Reproducible MIR Research", in Proc. 15th International Society for Music Information Retrieval Conference (ISMIR 2014), Taipei, Taiwan, October 2014.

The three main concepts behind JAMS are:

  1. Comprehensive annotation: moving away from lab files, a JAMS file can store comprehensive annotation data and annotation metadata in a structured way that can be easily loaded from and saved to disk.
  2. Multiple annotations: sometimes an annotation should be considered a reference rather than a ground truth, in that different annotators may produce different references (e.g. chord annotations). JAMS makes it possible to store multiple annotations for the same recording in a single file.
  3. Multiple tasks: traditionally, the annotation for each MIR task (e.g. melody extraction, chord recognition, genre identification, etc.) is stored in a separate file. JAMS can store the annotations of different tasks for the same recording in a single JAMS file which, in addition to keeping things tidy, facilitates the development and evaluation of algorithms that use or extract multiple musical facets at once.

As with all new specifications / protocols / conventions, the real success of JAMS depends on its adoption by the community. We are fully aware that this is but a proposal, a first step, and hope to develop / improve JAMS by actively discussing it with the MIR community. To ease adoption, we're providing a python library for loading / saving / manipulating JAMS files, and have ported the annotations of several of the most commonly used corpora in MIR into JAMS. Congratulations to my co-authors Eric, Uri (Oriol), Jon, Rachel and Juan!
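To make the format a bit more concrete before moving on, here is a toy, hand-rolled example of the kind of thing a JAMS file can hold: one recording with a melody annotation and two (possibly conflicting) chord annotations from different annotators, all in a single JSON document. The field names below are purely illustrative and do not follow the official JAMS schema; see the paper and the JAMS repository for the actual specification.

    import json

    # Toy illustration of the JAMS idea: annotation data + metadata, for multiple
    # tasks and multiple annotators, in one JSON document. Field names are
    # illustrative only, not the official schema.
    jam = {
        "file_metadata": {"title": "Example Track", "artist": "Unknown", "duration": 30.0},
        "annotations": [
            {
                "namespace": "melody",
                "annotation_metadata": {"annotator": "annotator_1", "tool": "Tony"},
                "data": [{"time": 0.00, "duration": 0.5, "value": 220.0},
                         {"time": 0.50, "duration": 0.5, "value": 246.9}],
            },
            {
                "namespace": "chord",
                "annotation_metadata": {"annotator": "annotator_1"},
                "data": [{"time": 0.0, "duration": 2.0, "value": "A:min"}],
            },
            {
                "namespace": "chord",  # a second, possibly conflicting, chord annotation
                "annotation_metadata": {"annotator": "annotator_2"},
                "data": [{"time": 0.0, "duration": 2.0, "value": "C:maj"}],
            },
        ],
    }

    with open("example.jams", "w") as fh:
        json.dump(jam, fh, indent=2)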

The third paper, led by Colin Raffel (LabROSA @ Columbia), describes mir_eval, an open-source python library that implements the most common evaluation measures for a large selection of MIREX tasks including melody extraction, chord recognition, beat detection, onset detection, structural segmentation and source separation:
  • C. Raffel, B. McFee, E. J. Humphrey, J. Salamon, O. Nieto, D. Liang and D. P. W. Ellis. "mir_eval: A Transparent Implementation of Common MIR Metrics", Proc. 15th International Society for Music Information Retrieval Conference (ISMIR 2014), Taipei, Taiwan, October 2014.
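As an example of how this looks in practice, here is how one might compute the standard melody extraction metrics with mir_eval. The file names are placeholders, and each file is assumed to contain a two-column (time, frequency) time series:

    import mir_eval

    # Load reference and estimated melody time series (placeholder file names).
    ref_time, ref_freq = mir_eval.io.load_time_series('reference_melody.csv')
    est_time, est_freq = mir_eval.io.load_time_series('estimated_melody.csv')

    # Compute the standard melody metrics (voicing recall/false alarm,
    # raw pitch accuracy, raw chroma accuracy, overall accuracy).
    scores = mir_eval.melody.evaluate(ref_time, ref_freq, est_time, est_freq)
    for name, value in scores.items():
        print('%s: %.3f' % (name, value))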
We hope this (a) makes the life of MIR researchers easier by providing an easy-to-use MIR DIY library and, more importantly, (b) promotes transparency and reproducibility in MIR research by ensuring researchers use the same evaluation code (as opposed to every researcher re-implementing their own eval code, as is the case right now) and by making that code available online for inspection. Congratulations to my co-authors Colin, Brian, Eric, Uri (Oriol), Dawen and Dan!

Looking forward to discussing these papers and ideas with everyone at ISMIR 2014! See you in Taipei ^_^

JNMR review of tonic ID algorithms for Indian classical music published

1/4/2014

 
Our article reviewing and comparing tonic identification algorithms for Indian classical music has just been published in the Journal of New Music Research.

Abstract: The tonic is a fundamental concept in Indian art music. It is the base pitch, which an artist chooses in order to construct the melodies during a rag(a) rendition, and all accompanying instruments are tuned using the tonic pitch. Consequently, tonic identification is a fundamental task for most computational analyses of Indian art music, such as intonation analysis, melodic motif analysis and rag recognition. In this paper we review existing approaches for tonic identification in Indian art music and evaluate them on six diverse datasets for a thorough comparison and analysis. We study the performance of each method in different contexts such as the presence/absence of additional metadata, the quality of audio data, the duration of audio data, music tradition (Hindustani/Carnatic) and the gender of the singer (male/female). We show that the approaches that combine multi-pitch analysis with machine learning provide the best performance in most cases (90% identification accuracy on average), and are robust across the aforementioned contexts compared to the approaches based on expert knowledge. In addition, we also show that the performance of the latter can be improved when additional metadata is available to further constrain the problem. Finally, we present a detailed error analysis of each method, providing further insights into the advantages and limitations of the methods.

  • S. Gulati, A. Bellur, J. Salamon, H. G. Ranjani, V. Ishwar, H. A. Murthy, and X. Serra, "Automatic Tonic Identification in Indian Art Music: Approaches and Evaluation", J. of New Music Research, 43(1):53–71, Mar. 2014.
        [Taylor & Francis][DOI][PDF][BibTex]
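As a (much simplified) illustration of the expert-knowledge family of approaches discussed in the article, here is a sketch that builds a pitch histogram from a pre-computed predominant-pitch track and picks the strongest peak within a plausible tonic range. It is not one of the evaluated algorithms, and the best-performing methods in our comparison combine multi-pitch analysis with machine learning:

    import numpy as np

    def estimate_tonic(f0_hz, fmin=100.0, fmax=260.0, cents_per_bin=10):
        # f0_hz: frame-level predominant pitch in Hz (0 for unvoiced frames).
        f0 = np.asarray(f0_hz, dtype=float)
        f0 = f0[f0 > 0]
        cents = 1200 * np.log2(f0 / 55.0)                # map to a cent scale (55 Hz reference)
        bins = np.arange(0, cents.max() + cents_per_bin, cents_per_bin)
        hist, edges = np.histogram(cents, bins=bins)
        centres_hz = 55.0 * 2 ** (((edges[:-1] + edges[1:]) / 2) / 1200)
        in_range = (centres_hz >= fmin) & (centres_hz <= fmax)   # plausible tonic range
        return centres_hz[in_range][hist[in_range].argmax()]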

Congratulations to all of the authors, and in particular to Sankalp Gulati for all the effort he put into this paper.

IEEE SPM Melody Extraction Review published online

16/2/2014

 
IEEE SPM cover
Our review article on melody extraction algorithms for the IEEE Signal Processing Magazine is finally available online! The printed edition will be coming out in March 2014:

J. Salamon, E. Gómez, D. P. W. Ellis and G. Richard, "Melody Extraction from Polyphonic Music Signals: Approaches, Applications and Challenges", IEEE Signal Processing Magazine, 31(2):118-134, Mar. 2014.

Abstract—Melody extraction algorithms aim to produce a sequence of frequency values corresponding to the pitch of the dominant melody from a musical recording. Over the past decade melody extraction has emerged as an active research topic, comprising a large variety of proposed algorithms spanning a wide range of techniques. This article provides an overview of these techniques, the applications for which melody extraction is useful, and the challenges that remain. We start with a discussion of ‘melody’ from both musical and signal processing perspectives, and provide a case study which interprets the output of a melody extraction algorithm for specific excerpts. We then provide a comprehensive comparative analysis of melody extraction algorithms based on the results of an international evaluation campaign. We discuss issues of algorithm design, evaluation and applications which build upon melody extraction. Finally, we discuss some of the remaining challenges in melody extraction research in terms of algorithmic performance, development, and evaluation methodology.
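If you would like to try melody extraction yourself, the MELODIA Vamp plugin can be run from Python via the vamp module. A rough sketch, assuming both the plugin and the vamp package are installed (the audio file is a placeholder):

    import numpy as np
    import librosa
    import vamp   # assumes the vamp Python module and the MELODIA plugin are installed

    audio, sr = librosa.load('song.wav', sr=44100, mono=True)   # placeholder file
    result = vamp.collect(audio, sr, 'mtg-melodia:melodia')
    hop, f0 = result['vector']        # hop size between frames, and the f0 sequence

    # MELODIA marks unvoiced frames with negative values.
    f0 = np.asarray(f0)
    print('%d voiced frames out of %d' % ((f0 > 0).sum(), len(f0)))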

For further information about this article please visit my Research page.

Melody Extraction Review Published in IEEE Signal Processing Magazine 

6/7/2013

 
Our review article on melody extraction algorithms has been accepted for publication in the IEEE Signal Processing Magazine!

Here are the full details (including a link to a preprint of the article):

J. Salamon, E. Gómez, D. P. W. Ellis and G. Richard, "Melody Extraction from Polyphonic Music Signals: Approaches, Applications and Challenges", IEEE Signal Processing Magazine, In Press (2013).
[IEEE][DOI][PDF][BibTeX][Copyright]

The paper provides a detailed review of the current state of the art in melody extraction. For a slightly longer description here's the abstract:

Melody extraction algorithms aim to produce a sequence of frequency values corresponding to the pitch of the dominant melody from a musical recording. Over the past decade melody extraction has emerged as an active research topic, comprising a large variety of proposed algorithms spanning a wide range of techniques. This article provides an overview of these techniques, the applications for which melody extraction is useful, and the challenges that remain. We start with a discussion of `melody' from both musical and signal processing perspectives, and provide a case study which interprets the output of a melody extraction algorithm for specific excerpts. We then provide a comprehensive comparative analysis of melody extraction algorithms based on the results of an international evaluation campaign. We discuss issues of algorithm design, evaluation and applications which build upon melody extraction. Finally, we discuss some of the remaining challenges in melody extraction research in terms of algorithmic performance, development, and evaluation methodology.
A special thanks to the co-authors of the article: Emilia Gómez, Dan Ellis and Gaël Richard!

Paper on version identification and query-by-humming published

15/11/2012

Our paper:

J. Salamon, J. Serrà and E. Gómez, "Tonal Representations for Music Retrieval: From Version Identification to Query-by-Humming", International Journal of Multimedia Information Retrieval, special issue on Hybrid Music Information Retrieval, In Press.

has now been officially accepted for publication. The paper compares different tonal representations (melody, bass line and harmony) for version identification (automatically detecting cover songs). We also show how our approach to version ID can be easily adapted for query-by-humming (QBH, i.e. searching for a song stuck in your head by singing or humming part of the melody), and since both the melody extraction (using MELODIA) and the matching are fully automatic, this is a fully automatic audio-to-audio QBH system prototype!
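For intuition, here is a toy sketch of melody-based matching for QBH: make the query's pitch contour key-invariant and compare it against each candidate's contour with a simple DTW distance. This is only an illustration; the paper uses different tonal representations and a different matching algorithm.

    import numpy as np

    def normalise(contour_cents):
        # Make the contour key-invariant by subtracting its median pitch.
        c = np.asarray(contour_cents, dtype=float)
        return c - np.median(c)

    def dtw_distance(a, b):
        # Plain DTW on two 1-D pitch contours (in cents).
        a, b = normalise(a), normalise(b)
        D = np.full((len(a) + 1, len(b) + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, len(a) + 1):
            for j in range(1, len(b) + 1):
                cost = abs(a[i - 1] - b[j - 1])
                D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        return D[-1, -1] / (len(a) + len(b))

    # Rank the songs in a collection by similarity to a hummed query:
    # results = sorted(collection.items(), key=lambda kv: dtw_distance(query_contour, kv[1]))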

We're also planning to put all the queries we recorded for the experiments online, together with a list of the songs in the music collections we used for evaluation (unfortunately we can't share the songs themselves because they are protected by copyright law). I'll write a new post once the files are up.

I'd like to thank my co-authors Joan Serrà and Emilia Gómez for their excellent work. Hope you find the article interesting!