Justin Salamon

Disentangled Multidimensional Metric Learning for Music Similarity

3/5/2020

Music similarity search is useful for a variety of creative tasks such as replacing one music recording with another recording with a similar "feel", a common task in video editing. For this task, it is typically necessary to define a similarity metric to compare one recording to another. Music similarity, however, is hard to define and depends on multiple simultaneous notions of similarity (e.g., genre, mood, instrument, tempo). While prior work ignores this issue, we embrace it: we introduce the concept of multidimensional similarity and unify both global and specialized similarity metrics into a single, semantically disentangled multidimensional similarity metric. To do so, we adapt a variant of deep metric learning called conditional similarity networks to the audio domain and extend it using track-based information to control the specificity of our model. We evaluate our method and show that our single, multidimensional model outperforms both specialized similarity spaces and alternative baselines. We also run a user study and show that our approach is favored by human annotators as well.

Disentangled Multidimensional Metric Learning for Music Similarity
J. Lee, N.J. Bryan, J. Salamon, Z. Jin, J. Nam
In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Barcelona, Spain, May 2020.
[IEEE][PDF][BibTeX][Copyright]
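
To make the idea of a disentangled, multidimensional metric more concrete, here is a minimal sketch of a masked triplet loss in the general style of conditional similarity networks. It is not the paper's implementation: the 4-D embedding, the two subspace names, the mask layout and the margin are all assumptions chosen for illustration.

```python
import math

# Hypothetical layout: a 4-D embedding split into two 2-D subspaces,
# one per similarity notion. Names and sizes are illustrative only.
MASKS = {
    "genre": [1, 1, 0, 0],
    "mood": [0, 0, 1, 1],
}


def masked_distance(x, y, mask):
    """Euclidean distance restricted to the dimensions selected by mask."""
    return math.sqrt(sum(m * (a - b) ** 2 for a, b, m in zip(x, y, mask)))


def masked_triplet_loss(anchor, positive, negative, mask, margin=0.1):
    """Hinge triplet loss computed inside a single similarity subspace."""
    d_pos = masked_distance(anchor, positive, mask)
    d_neg = masked_distance(anchor, negative, mask)
    return max(0.0, d_pos - d_neg + margin)


anchor = [0.0, 0.0, 0.0, 0.0]
positive = [0.1, 0.0, 0.9, 0.9]  # close to the anchor in "genre", far in "mood"
negative = [1.0, 1.0, 0.0, 0.0]  # far from the anchor in "genre", close in "mood"

genre_loss = masked_triplet_loss(anchor, positive, negative, MASKS["genre"])
mood_loss = masked_triplet_loss(anchor, positive, negative, MASKS["mood"])
```

The same triplet is already satisfied in the "genre" subspace (zero loss) but violated in the "mood" subspace, which is the sense in which a single embedding can carry several notions of similarity at once.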

Birdvox-full-night: A Dataset and Benchmark For Avian Flight Call Detection

17/4/2018

We've just released BirdVox-full-night, a challenging new dataset for machine learning on bioacoustic data! Details about the dataset and the models we benchmarked are provided in our ICASSP 2018 paper:

Birdvox-Full-Night: A Dataset and Benchmark for Avian Flight Call Detection
V. Lostanlen, J. Salamon, A. Farnsworth, S. Kelling, and J. P. Bello
In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Calgary, Canada, Apr. 2018.
[PDF][Copyright]

This article addresses the automatic detection of vocal, nocturnally migrating birds from a network of acoustic sensors. Thus far, owing to the lack of annotated continuous recordings, existing methods had been benchmarked in a binary classification setting (presence vs. absence). Instead, with the aim of comparing them in event detection, we release BirdVox-full-night, a dataset of 62 hours of audio comprising 35402 flight calls of nocturnally migrating birds, as recorded from 6 sensors. We find a large performance gap between energy-based detection functions and data-driven machine listening. The best model is a deep convolutional neural network trained with data augmentation. We correlate recall with the density of flight calls over time and frequency and identify the main causes of false alarms.
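
The difference between clip-level classification and event detection can be illustrated with a small sketch: peak-pick a detection function, then match the detected times against annotated call times within a tolerance window. This is not the paper's evaluation code; the threshold, hop size, tolerance and the toy detection curve are all hypothetical values.

```python
def pick_peaks(detection, threshold, hop_seconds):
    """Return the times (in seconds) of local maxima at or above threshold."""
    times = []
    for i in range(1, len(detection) - 1):
        is_peak = detection[i] > detection[i - 1] and detection[i] >= detection[i + 1]
        if is_peak and detection[i] >= threshold:
            times.append(i * hop_seconds)
    return times


def count_matches(reference, estimated, tolerance):
    """Greedy one-to-one matching of event times within +/- tolerance seconds."""
    reference, estimated = sorted(reference), sorted(estimated)
    i = j = hits = 0
    while i < len(reference) and j < len(estimated):
        if abs(reference[i] - estimated[j]) <= tolerance:
            hits += 1
            i += 1
            j += 1
        elif estimated[j] < reference[i]:
            j += 1  # unmatched detection: a false alarm
        else:
            i += 1  # unmatched annotation: a missed call
    return hits


def precision_recall(reference, estimated, tolerance):
    """Event-level precision and recall for a given matching tolerance."""
    hits = count_matches(reference, estimated, tolerance)
    precision = hits / len(estimated) if estimated else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


# Toy detection function sampled every 0.5 s (illustrative values only).
detection = [0.0, 0.2, 0.9, 0.3, 0.1, 0.1, 0.8, 0.2, 0.0, 0.7, 0.1]
estimated = pick_peaks(detection, threshold=0.5, hop_seconds=0.5)
reference = [1.1, 3.2, 6.0]  # hypothetical annotated call times
precision, recall = precision_recall(reference, estimated, tolerance=0.25)
```

Unlike a presence/absence label per clip, this kind of scoring penalizes both false alarms and missed calls at the level of individual events in a continuous recording.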

You can download the dataset here: https://wp.nyu.edu/birdvox/birdvox-full-night/

You can also check out additional bioacoustic datasets for machine learning we have released as part of the BirdVox project here: https://wp.nyu.edu/birdvox/codedata/#datasets

Finally, if you're at ICASSP 2018 and want to learn more, be sure to grab my esteemed colleague Vincent Lostanlen for a chat!


New Book Chapter: Sound Analysis in Smart Cities

3/10/2017

This chapter introduces the concept of smart cities and discusses the importance of sound as a source of information about urban life. It describes a wide range of applications for the computational analysis of urban sounds and focuses on two high-impact areas, audio surveillance and noise pollution monitoring, which sit at the intersection of dense sensor networks and machine listening. For sensor networks we focus on the pros and cons of mobile versus static sensing strategies, and on the description of a low-cost solution to acoustic sensing that supports distributed machine listening. For sound event detection and classification we focus on the challenges presented by this task, solutions including feature design and learning strategies, and how a combination of convolutional networks and data augmentation results in the current state of the art. We close with a discussion about the potential and challenges of mobile sensing, the limitations imposed by the data currently available for research, and a few areas for future exploration.

Sound analysis in smart cities
J. P. Bello, C. Mydlarz, and J. Salamon.
In T. Virtanen, M. D. Plumbley, and D. P. W. Ellis, editors, Computational Analysis of Sound Scenes and Events, pages 373–397. Springer International Publishing, 2018.
[Springer][PDF][BibTeX]


Bettering the World Through Entrepreneurship: Urban Issues

14/8/2017

The NYU Entrepreneurial Institute has written an article about how technology & entrepreneurship can help solve urban problems, with a focus on our SONYC project. Read the full article here.

Can the Internet of Things & AI Solve Urban Noise?

26/4/2017

New York City is a loud place. In fact, 9 of 10 adults in NYC are exposed to harmful levels of noise, which can lead to sleep loss, hearing loss and even heart disease. Can anything be done about it? The SONYC research project combines internet of things sensing technology, machine learning, data science and citizen science to tackle noise pollution head on at city scale. Come hear about a sensor that can recognize sounds like a human and how we plan to use it to improve quality of life in NYC.

The talk will be followed by a talk from my colleague, astrophysicist Federica Bianco:
Twinkle twinkle little city-light!

When: Wednesday, April 26, 2017, 7:30pm to 10:00pm
Where: SingleCut Beersmiths, 19-33 37th Street NY, 11105 United States (map)
Registration: https://tasteofscience.org/ny-events/thescientificcity

Deep Convolutional Neural Networks and Data Augmentation For Environmental Sound Classification

20/1/2017

The ability of deep convolutional neural networks (CNN) to learn discriminative spectro-temporal patterns makes them well suited to environmental sound classification. However, the relative scarcity of labeled data has impeded the exploitation of this family of high-capacity models. This study has two primary contributions: first, we propose a deep convolutional neural network architecture for environmental sound classification. Second, we propose the use of audio data augmentation for overcoming the problem of data scarcity and explore the influence of different augmentations on the performance of the proposed CNN architecture. Combined with data augmentation, the proposed model produces state-of-the-art results for environmental sound classification. We show that the improved performance stems from the combination of a deep, high-capacity model and an augmented training set: this combination outperforms both the proposed CNN without augmentation and a “shallow” dictionary learning model with augmentation. Finally, we examine the influence of each augmentation on the model’s classification accuracy for each class, and observe that the accuracy for each class is influenced differently by each augmentation, suggesting that the performance of the model could be improved further by applying class-conditional data augmentation.
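
As a rough illustration of what audio data augmentation means in practice, the sketch below expands one labeled clip into several training examples by mixing in background noise at different signal-to-noise ratios. Noise mixing is just one simple augmentation picked for this example, and the SNR values, sample rate, class label and synthetic signals are assumptions, not the settings used in the paper.

```python
import math
import random


def rms(signal):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))


def mix_at_snr(clean, noise, snr_db):
    """Add noise to clean, scaled so the mix has the requested SNR in dB."""
    gain = rms(clean) / (rms(noise) * 10 ** (snr_db / 20))
    return [c + gain * n for c, n in zip(clean, noise)]


def augment(clip, label, noise, snrs_db=(20, 10, 0)):
    """Return the original example plus one noisy copy per SNR, same label."""
    examples = [(clip, label)]
    for snr_db in snrs_db:
        examples.append((mix_at_snr(clip, noise, snr_db), label))
    return examples


# Synthetic stand-ins for real audio: a 440 Hz tone and Gaussian noise.
random.seed(0)
clean = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(800)]
noise = [random.gauss(0.0, 1.0) for _ in range(800)]

augmented = augment(clean, "siren", noise)  # "siren" is a hypothetical label
```

Each augmented copy keeps the original label, so the training set grows without any extra annotation effort. A class-conditional scheme, as suggested at the end of the abstract, would choose the set of augmentations per class instead of applying the same ones everywhere.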

​For further details see our paper:

Deep Convolutional Neural Networks and Data Augmentation For Environmental Sound Classification
​J. Salamon and J. P. Bello
IEEE Signal Processing Letters, In Press, 2017.
[IEEE][PDF][BibTeX][Copyright]


Fusing Shallow and Deep Learning for Bioacoustic Bird Species Classification

15/12/2016

Automated classification of organisms to species based on their vocalizations would contribute tremendously to abilities to monitor biodiversity, with a wide range of applications in the field of ecology. In particular, automated classification of migrating birds’ flight calls could yield new biological insights and conservation applications for birds that vocalize during migration. In this paper we explore state-of-the-art classification techniques for large-vocabulary bird species classification from flight calls. In particular, we contrast a “shallow learning” approach based on unsupervised dictionary learning with a deep convolutional neural network combined with data augmentation. We show that the two models perform comparably on a dataset of 5428 flight calls spanning 43 different species, with both significantly outperforming an MFCC baseline. Finally, we show that by combining the models using a simple late-fusion approach we can further improve the results, obtaining a state-of-the-art classification accuracy of 0.96.
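
Late fusion combines models at the level of their outputs rather than their features. The abstract only says the approach is "simple", so the weighted average of class probabilities below is an assumption, as are the three-class toy probabilities.

```python
def late_fusion(probs_a, probs_b, weight_a=0.5):
    """Weighted average of two models' class-probability vectors."""
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(probs_a, probs_b)]


def predict(probs):
    """Index of the most likely class."""
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical outputs of the two models for one flight-call clip.
shallow_probs = [0.5, 0.4, 0.1]  # dictionary-learning model
deep_probs = [0.2, 0.7, 0.1]     # convolutional network

fused_probs = late_fusion(shallow_probs, deep_probs)
fused_class = predict(fused_probs)
```

Here the two models disagree on the top class, and the fused prediction follows the model that is more confident, which is the basic way late fusion can correct errors that either model would make alone.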

Fusing Shallow and Deep Learning for Bioacoustic Bird Species Classification
J. Salamon, J. P. Bello, A. Farnsworth and S. Kelling
In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, USA, March 2017.

[IEEE][PDF][BibTeX][Copyright]


Three New Datasets For Bioacoustic Machine Learning

23/11/2016

We're happy to announce the release of 3 new datasets for research on automatic bioacoustic bird species recognition. The datasets were compiled for our recently published study "Towards the Automatic Classification of Avian Flight Calls for Bioacoustic Monitoring", and are freely available on the Dryad Digital Repository:

  • CLO-43SD: 5,428 labeled audio clips of flight calls from 43 different species of North American woodwarblers (in the family Parulidae). The clips came from a variety of recording conditions, including clean recordings obtained using highly-directional shotgun microphones, recordings obtained from noisier field recordings using omnidirectional microphones, and recordings obtained from birds in captivity.
  • CLO-WTSP: 16,703 labeled audio clips captured by remote acoustic sensors deployed in Ithaca, NY and NYC over the fall 2014 and spring 2015 migration seasons. Each clip is labeled to indicate whether it contains a flight call from the target species White-Throated Sparrow (WTSP), a flight call from a non-target species, or no flight call at all.
  • CLO-SWTH: 179,111 labeled audio clips captured by remote acoustic sensors deployed in Ithaca, NY and NYC over the fall 2014 and spring 2015 migration seasons. Each clip is labeled to indicate whether it contains a flight call from the target species Swainson's Thrush (SWTH), a flight call from a non-target species, or no flight call at all.

Rosetta Stone For Warblers’ Migration Calls. Source: https://www.allaboutbirds.org/a-rosetta-stone-for-identifying-warblers-migration-calls/

CLO-43SD is targeted at the closed-set N-class problem (identify which of these 43 known species produced the flight call in this clip), while CLO-WTSP and CLO-SWTH are targeted at the binary open-set problem (given a clip, determine whether it contains a flight call from the target species or not). The latter two come pre-sorted into two subsets: Fall 2014 and Spring 2015. In our study we used the fall subset for training and the spring subset for testing, simulating adversarial yet realistic conditions that require a high level of model generalization.
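
The season-based protocol for the two binary datasets can be sketched as follows. The metadata records, field names and label strings are hypothetical and do not reflect the actual file layout of the datasets; the point is only the mapping to a binary target and the train/test split by season.

```python
# Hypothetical clip metadata; field names and values are illustrative only.
CLIPS = [
    {"file": "a.wav", "season": "fall2014", "label": "target"},
    {"file": "b.wav", "season": "fall2014", "label": "other_call"},
    {"file": "c.wav", "season": "fall2014", "label": "no_call"},
    {"file": "d.wav", "season": "spring2015", "label": "target"},
    {"file": "e.wav", "season": "spring2015", "label": "no_call"},
]


def to_binary(label):
    """Binary open-set target: 1 for the target species, 0 for anything else."""
    return 1 if label == "target" else 0


def split_by_season(clips, train_season="fall2014", test_season="spring2015"):
    """Train on one migration season and test on the other."""
    train = [(c["file"], to_binary(c["label"])) for c in clips
             if c["season"] == train_season]
    test = [(c["file"], to_binary(c["label"])) for c in clips
            if c["season"] == test_season]
    return train, test


train_set, test_set = split_by_season(CLIPS)
```

Splitting by season rather than at random keeps recording conditions and non-target vocalizations from the test season out of training, which is what makes the evaluation demanding.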

For further details about the datasets, see our article:

Towards the Automatic Classification of Avian Flight Calls for Bioacoustic Monitoring
J. Salamon, J. P. Bello, A. Farnsworth, M. Robbins, S. Keen, H. Klinck and S. Kelling
PLOS ONE 11(11): e0166866, 2016. doi: 10.1371/journal.pone.0166866. 
[PLOS ONE][PDF][BibTeX]

You can download all 3 datasets from the Dryad Digital Repository at this link.

Towards the Automatic Classification of Avian Flight Calls for Bioacoustic Monitoring

23/11/2016

A white-throated sparrow, one of the species targeted in the study. Image by Simon Pierre Barrette, license CC-BY-SA 3.0.
Automatic classification of animal vocalizations has great potential to enhance the monitoring of species movements and behaviors. This is particularly true for monitoring nocturnal bird migration, where automated classification of migrants’ flight calls could yield new biological insights and conservation applications for birds that vocalize during migration. In this paper we investigate the automatic classification of bird species from flight calls, and in particular the relationship between two different problem formulations commonly found in the literature: classifying a short clip containing one of a fixed set of known species (N-class problem) and the continuous monitoring problem, the latter of which is relevant to migration monitoring. We implemented a state-of-the-art audio classification model based on unsupervised feature learning and evaluated it on three novel datasets, one for studying the N-class problem including over 5000 flight calls from 43 different species, and two realistic datasets for studying the monitoring scenario comprising hundreds of thousands of audio clips that were compiled by means of remote acoustic sensors deployed in the field during two migration seasons. We show that the model achieves high accuracy when classifying a clip to one of N known species, even for a large number of species. In contrast, the model does not perform as well in the continuous monitoring case. Through a detailed error analysis (that included full expert review of false positives and negatives) we show the model is confounded by varying background noise conditions and previously unseen vocalizations. We also show that the model needs to be parameterized and benchmarked differently for the continuous monitoring scenario. Finally, we show that despite the reduced performance, given the right conditions the model can still characterize the migration pattern of a specific species. The paper concludes with directions for future research.

The full article is available freely (open access) on PLOS ONE:


​Towards the Automatic Classification of Avian Flight Calls for Bioacoustic Monitoring
J. Salamon, J. P. Bello, A. Farnsworth, M. Robbins, S. Keen, H. Klinck and S. Kelling
PLOS ONE 11(11): e0166866, 2016. doi: 10.1371/journal.pone.0166866. 
[PLOS ONE][PDF][BibTeX]

Along with this study, we have also published the three new datasets for bioacoustic machine learning that were compiled for this study.


SONYC featured in New York Times, NPR, Wired and more

7/11/2016

Today SONYC was featured in several major news outlets including the New York Times, NPR and Wired! This follows NYU's press release about the official launch of the SONYC project.

Needless to say I'm thrilled about the coverage the project's launch is receiving. Hopefully it is a sign of the great things yet to come from this project, though, I should note, it has already resulted in several scientific publications.

Here's the complete list of media articles (that I could find) covering SONYC. The WNYC radio segment includes a few words from yours truly :)

  • To Create a Quieter City, They’re Recording the Sounds of New York
  • BBC World Service - World Update (first minute, then from 36:21)
  • Mapping New York City's Excessively Loud Sounds
  • New York, come usare i microfoni per una città più silenziosa
  • Scientists Are Tracking New York Noisiness in Order to Quiet It Down
  • NYU Scientists are Trying to Reduce Noise Pollution in New York City
  • Researchers Are Recording New York to Make it Quieter
  • Sounds of New York City (German Public Radio)
  • NYC’s $5 Million Noise Pollution Project
  • Mapping the Sounds of New York City Streets
  • New UrbanEars project has NYU teaming up with Ohio State to battle noise pollution
  • NYU Launches Research Initiative to Combat NYC Noise Pollution
  • Smart microphones are recording city sounds to help create a quieter New York
  • NYU Moves Forward with Study of City Noise
  • How to Take on NYC’s Scary Noise Problem
  • Research Initiative Looks to Tame Urban Noise Pollution

If you're interested in learning more about the SONYC project, have a look at the SONYC website. You can also check out the SONYC intro video.