Justin Salamon

Few-shot Drum Transcription in Polyphonic Music

11/10/2020

Data-driven approaches to automatic drum transcription (ADT) are often limited to a predefined, small vocabulary of percussion instrument classes. Such models cannot recognize out-of-vocabulary classes, nor can they adapt to finer-grained vocabularies. In this work, we address open-vocabulary ADT by introducing few-shot learning to the task. We train a Prototypical Network on a synthetic dataset and evaluate the model on multiple real-world ADT datasets with polyphonic accompaniment. We show that, given just a handful of selected examples at inference time, we can match, and in some cases outperform, a state-of-the-art supervised ADT approach under a fixed-vocabulary setting. At the same time, we show that our model can successfully generalize to finer-grained or extended vocabularies unseen during training, a scenario where supervised approaches cannot operate at all. We provide a detailed analysis of our experimental results, including a breakdown of performance by sound class and by polyphony.
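
As a rough sketch of how the prototypical network operates at inference time (the function, embedding network, and tensor shapes below are illustrative assumptions, not the paper's code): each class prototype is the mean embedding of its few support examples, and each query is scored by its distance to every prototype.

```python
import torch
import torch.nn.functional as F

def prototypical_inference(embed, support, support_labels, queries, n_classes):
    """Classify query examples by distance to class prototypes.

    embed:          trained embedding network (hypothetical)
    support:        tensor of labeled support-example features
    support_labels: long tensor of class indices, shape (n_support,)
    queries:        tensor of unlabeled example features
    """
    z_support = embed(support)   # (n_support, d)
    z_query = embed(queries)     # (n_query, d)

    # A class prototype is the mean embedding of its support examples.
    prototypes = torch.stack([
        z_support[support_labels == c].mean(dim=0)
        for c in range(n_classes)
    ])                           # (n_classes, d)

    # Squared Euclidean distance from every query to every prototype;
    # softmax over negative distances yields per-class scores.
    dists = torch.cdist(z_query, prototypes) ** 2
    return F.softmax(-dists, dim=1)   # (n_query, n_classes)
```

Because prototypes are computed from the support examples at inference time, extending the vocabulary to a new percussion class requires only a handful of labeled examples of it, and no retraining.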

To learn more, please read our paper:

Few-Shot Drum Transcription in Polyphonic Music
Y. Wang, J. Salamon, M. Cartwright, N. J. Bryan, J. P. Bello
In 21st International Society for Music Information Retrieval Conference (ISMIR), Montreal, Canada (virtual), Oct. 2020.

You can find more related materials, including a short video presentation and a poster, here:

https://program.ismir2020.net/poster_1-14.html

Disentangled Multidimensional Metric Learning for Music Similarity

3/5/2020

Music similarity search is useful for a variety of creative tasks, such as replacing one music recording with another that has a similar "feel", a common task in video editing. For this task, it is typically necessary to define a similarity metric to compare one recording to another. Music similarity, however, is hard to define and depends on multiple simultaneous notions of similarity (e.g. genre, mood, instrument, tempo). While prior work ignores this issue, we embrace it and introduce the concept of multidimensional similarity, unifying both global and specialized similarity metrics into a single, semantically disentangled multidimensional similarity metric. To do so, we adapt a variant of deep metric learning called conditional similarity networks to the audio domain and extend it using track-based information to control the specificity of our model. We evaluate our method and show that our single multidimensional model outperforms both specialized similarity spaces and alternative baselines. We also run a user study and show that human annotators favor our approach as well.
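
The conditional-similarity mechanism can be sketched compactly: one shared embedding, plus a learned non-negative mask per similarity condition that selects the subspace in which that notion of similarity is compared. The PyTorch snippet below is a minimal illustration under assumed names and dimensions, not our implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionalSimilarity(nn.Module):
    """One shared embedding; learned masks carve out a subspace per
    similarity condition (e.g. genre, mood, instrument, tempo)."""

    def __init__(self, backbone, embed_dim, n_conditions):
        super().__init__()
        self.backbone = backbone                      # audio encoder (assumed)
        self.masks = nn.Parameter(torch.rand(n_conditions, embed_dim))

    def forward(self, x, condition):
        z = self.backbone(x)                          # (batch, embed_dim)
        m = F.relu(self.masks[condition])             # non-negative mask
        return z * m                                  # masked (disentangled) embedding

def triplet_loss(anchor, positive, negative, margin=0.1):
    """Standard triplet loss applied to the masked embeddings."""
    d_pos = (anchor - positive).pow(2).sum(dim=1)
    d_neg = (anchor - negative).pow(2).sum(dim=1)
    return F.relu(d_pos - d_neg + margin).mean()
```

Training with triplets sampled per condition pushes each masked subspace to encode one notion of similarity, while the shared backbone still provides a single global embedding.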

Disentangled Multidimensional Metric Learning for Music Similarity
J. Lee, N.J. Bryan, J. Salamon, Z. Jin, J. Nam
In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Barcelona, Spain, May 2020.
[IEEE][PDF][BibTeX][Copyright]

Deep Salience Representations for F0 Estimation in Polyphonic Music

15/7/2017

Estimating fundamental frequencies in polyphonic music remains a notoriously difficult task in Music Information Retrieval. While other tasks, such as beat tracking and chord recognition, have seen improvement with the application of deep learning models, little work has been done to apply deep learning methods to fundamental-frequency-related tasks, including multi-f0 and melody tracking, primarily due to the scarce availability of labeled data. In this work, we describe a fully convolutional neural network for learning salience representations for estimating fundamental frequencies, trained using a large, semi-automatically generated f0 dataset. We demonstrate the effectiveness of our model for learning salience representations for both multi-f0 and melody tracking in polyphonic audio, and show that our models achieve state-of-the-art performance on several multi-f0 and melody datasets. We conclude with directions for future research.
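
For a sense of the model family (layer sizes and input shapes below are assumptions, not the paper's exact architecture): a stack of 2-D convolutions maps a harmonically stacked CQT to a salience map of the same time-frequency shape, with a sigmoid squashing each bin into [0, 1].

```python
import torch
import torch.nn as nn

class SalienceCNN(nn.Module):
    """Fully convolutional sketch: input (batch, harmonics, freq, time),
    output a salience map of the same freq x time shape in [0, 1]."""

    def __init__(self, n_harmonics=6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(n_harmonics, 128, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv2d(128, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, kernel_size=1),  # 1x1 conv -> one salience channel
        )

    def forward(self, hcqt):
        return torch.sigmoid(self.net(hcqt)).squeeze(1)  # (batch, freq, time)
```

Per-frame peak picking on the resulting map yields multi-f0 estimates; keeping only the most salient peak per frame yields a melody estimate.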

Deep Salience Representations for f0 Estimation in Polyphonic Music
R. M. Bittner, B. McFee, J. Salamon, P. Li, and J. P. Bello.
In 18th International Society for Music Information Retrieval Conference (ISMIR), Suzhou, China, Oct. 2017.

Best Student Paper Award at 2017 AES International Conference on Semantic Audio

23/6/2017

I'm excited to report that our paper "Pitch Contours as a Mid-Level Representation for Music Informatics" has won the Best Student Paper Award at the 2017 AES International Conference on Semantic Audio. The paper, led and presented by my colleague Rachel Bittner, proposes a factored architecture for a variety of pitch-informed MIR tasks such as predominant and multiple-f0 estimation and genre, gender, and singing style classification, with pitch contours serving as a powerful and semantically rich mid-level representation.

So... should all machine learning for music be end-to-end? See what we found in the full paper:

Pitch Contours as a Mid-Level Representation for Music Informatics
R. M. Bittner, J. Salamon, J. J. Bosch, and J. P. Bello.
In AES Conference on Semantic Audio, Erlangen, Germany, Jun. 2017.
[PDF]

Pitch Contours as a Mid-Level Representation for Music Informatics

14/4/2017

Content-based Music Informatics includes tasks that involve estimating the pitched content of music, such as the main melody or the bass line. To date, the field lacks a good machine representation that models the human perception of pitch; instead, each task uses its own specific, tailored representation. This paper proposes factoring pitch estimation problems into two stages, where the output of the first stage, for all tasks, is a multipitch contour representation. Further, we propose the adoption of pitch contours as a unit of pitch organization. We review existing work on contour extraction and characterization, and present experiments that demonstrate the discriminability of pitch contours.
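
To make the two-stage factoring concrete, here is a hedged sketch of the second stage: once pitch contours (time-continuous sequences of f0 values with per-frame salience) have been extracted, each contour is summarized by contour-level features that downstream tasks can consume. The exact feature set below is indicative only, not the paper's list.

```python
import numpy as np

def contour_features(times, f0_hz, salience):
    """Summarize one pitch contour (1-D arrays of equal length) with
    contour-level features for downstream tasks (melody selection,
    classification). Feature set here is illustrative."""
    times, f0_hz, salience = map(np.asarray, (times, f0_hz, salience))
    cents = 1200 * np.log2(f0_hz / 55.0)  # pitch in cents relative to 55 Hz
    return {
        "duration": times[-1] - times[0],  # contour length in seconds
        "pitch_mean": cents.mean(),        # average pitch height
        "pitch_std": cents.std(),          # pitch spread (vibrato, slides)
        "salience_mean": salience.mean(),  # how prominent the contour is
        "salience_std": salience.std(),    # stability of its prominence
    }
```

A downstream task then operates on these per-contour feature vectors rather than on raw audio, which is what makes the first stage reusable across tasks.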

Agree? Disagree? Get the full details here:

Pitch Contours as a Mid-Level Representation for Music Informatics
R. M. Bittner, J. Salamon, J. J. Bosch, and J. P. Bello.
In AES Conference on Semantic Audio, Erlangen, Germany, Jun. 2017.
[PDF]


Ensemble: A Hybrid Human-Machine System for Generating Melody Scores From Audio

20/5/2016

Music transcription is a highly complex task that is difficult for automated algorithms, and equally challenging for people, even those with many years of musical training. Furthermore, there is a shortage of high-quality datasets for training automated transcription algorithms. In this research, we explore a semi-automated, crowdsourced approach to generating music transcriptions: we first run an automatic melody transcription algorithm on a (polyphonic) song to produce a series of discrete notes representing the melody, and then solicit the crowd to correct this melody. We present a novel web-based interface that enables the crowd to correct transcriptions, report results from an experiment probing the ability of non-experts to perform this challenging task, and analyze the characteristics and actions of workers and how they correlate with transcription performance.
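
As an illustration of what the automatic stage hands to the crowd, the sketch below collapses a frame-level melody f0 track into discrete notes by taking the median pitch of each voiced segment; this is a deliberate simplification for illustration, not the note segmentation actually used in Ensemble.

```python
import numpy as np

def f0_to_notes(times, f0_hz, min_dur=0.1):
    """Collapse a frame-level melody f0 track (0 Hz = unvoiced) into
    discrete notes; returns (onset_sec, offset_sec, midi_pitch) tuples."""
    notes, start = [], None
    for i, f in enumerate(f0_hz):
        if f > 0 and start is None:
            start = i                                  # note onset
        elif f <= 0 and start is not None:
            if times[i] - times[start] >= min_dur:     # drop very short blips
                midi = 69 + 12 * np.log2(np.median(f0_hz[start:i]) / 440.0)
                notes.append((times[start], times[i], int(round(midi))))
            start = None
    if start is not None and times[-1] - times[start] >= min_dur:
        midi = 69 + 12 * np.log2(np.median(f0_hz[start:]) / 440.0)  # trailing note
        notes.append((times[start], times[-1], int(round(midi))))
    return notes
```

The crowd's job is then to fix exactly the errors such a stage makes: wrong pitches, merged or split notes, and misplaced onsets and offsets.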

For further details, check out our paper:

Ensemble: A Hybrid Human-Machine System for Generating Melody Scores From Audio
T. Tse, J. Salamon, A. Williams, H. Jiang and E. Law
In 17th International Society for Music Information Retrieval Conference (ISMIR), New York City, USA, Aug. 2016.
[ISMIR][PDF][BibTeX]
[Figure: Screenshot of the Ensemble interface]
