Justin Salamon

DESED: A New Dataset for Sound Event Detection and a Comparative Evaluation of SOTA Systems

4/5/2020

We present a comparative analysis of the performance of state-of-the-art sound event detection systems. In particular, we study the robustness of the systems to noise and signal degradation, which is known to impact model generalization. Our analysis is based on the results of task 4 of the DCASE 2019 challenge, where submitted systems were evaluated not only on real-world recordings but also on a series of synthetic soundscapes that allow us to carefully control for different soundscape characteristics. Our results show that while systems overall exhibit significant improvements compared to previous work, they still suffer from biases that could prevent them from generalizing to real-world scenarios.

New Dataset: DESED (Domestic Environment Sound Event Detection)

The study was performed using a new dataset for sound event detection, DESED:
  • Dataset website: https://project.inria.fr/desed/
  • Includes both recorded and synthesized soundscapes
  • Includes isolated sound events allowing users to generate new data using Scaper
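Scaper draws each event attribute (label, onset, SNR, and so on) from a user-specified distribution tuple, so every generated soundscape is a fresh random instantiation. Below is a minimal plain-Python sketch of that sampling idea, not Scaper's actual implementation; the tuple formats follow its conventions, while the labels and ranges are illustrative rather than taken from DESED.

```python
import random

def sample_spec(spec, rng):
    """Draw a concrete value from a Scaper-style distribution tuple."""
    kind = spec[0]
    if kind == 'const':
        return spec[1]
    if kind == 'choose':
        return rng.choice(spec[1])
    if kind == 'uniform':
        return rng.uniform(spec[1], spec[2])
    if kind == 'normal':
        return rng.gauss(spec[1], spec[2])
    raise ValueError(f'unknown distribution: {kind!r}')

rng = random.Random(0)
# One sampled event specification (illustrative labels and ranges):
event = {
    'label': sample_spec(('choose', ['dog_bark', 'speech', 'dishes']), rng),
    'event_time': sample_spec(('uniform', 0.0, 8.0), rng),  # onset in a 10 s clip
    'snr': sample_spec(('normal', 10.0, 3.0), rng),         # dB above background
}
```

Because the specification is stochastic, regenerating with a new seed yields a new, fully annotated soundscape, which is what makes the isolated events useful for creating additional training data.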

Sound Event Detection in Synthetic Domestic Environments
R. Serizel, N. Turpault, A. Shah, J. Salamon

In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Barcelona, Spain, May 2020.
[IEEE][PDF][BibTeX][Copyright]

Chirping up the Right Tree: Incorporating Biological Taxonomies into Deep Bioacoustic Classifiers

3/5/2020

Class imbalance in the training data hinders the generalization ability of machine listening systems. In the context of bioacoustics, this issue may be circumvented by aggregating species labels into super-groups of higher taxonomic rank: genus, family, order, and so forth. However, different applications of machine listening to wildlife monitoring may require different levels of granularity. This paper introduces TaxoNet, a deep neural network for structured classification of signals from living organisms. TaxoNet is trained as a multitask and multilabel model, following a new architectural principle in end-to-end learning named "hierarchical composition": shallow layers extract a shared representation to predict a root taxon, while deeper layers specialize recursively to lower-rank taxa. In this way, TaxoNet is capable of handling taxonomic uncertainty, out-of-vocabulary labels, and open-set deployment settings. An experimental benchmark on two new bioacoustic datasets (ANAFCC and BirdVox-14SD) leads to state-of-the-art results in bird species classification. Furthermore, on a task of coarse-grained classification, TaxoNet also outperforms a flat single-task model trained on aggregate labels.

Try it out: pip install birdvoxdetect (https://github.com/BirdVox/birdvoxdetect)

Chirping up the Right Tree: Incorporating Biological Taxonomies into Deep Bioacoustic Classifiers
J. Cramer, V. Lostanlen, A. Farnsworth, J. Salamon, J.P. Bello
In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Barcelona, Spain, May 2020.
[IEEE][PDF][BibTeX][Copyright]

Few-Shot Sound Event Detection

3/5/2020

Locating perceptually similar sound events within a continuous recording is a common task for various audio applications. However, current tools require users to manually listen to and label all the locations of the sound events of interest, which is tedious and time-consuming. In this work, we (1) adapt state-of-the-art metric-based few-shot learning methods to automate the detection of similar-sounding events, requiring only one or few examples of the target event, (2) develop a method to automatically construct a partial set of labeled examples (negative samples) to reduce user labeling effort, and (3) develop an inference-time data augmentation method to increase detection accuracy. To validate our approach, we perform extensive comparative analysis of few-shot learning methods for the task of keyword detection in speech. We show that our approach successfully adapts closed-set few-shot learning approaches to an open-set sound event detection problem.
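Metric-based few-shot methods of this kind typically embed the one or few examples of the target event, average them into a prototype, and flag query frames whose embeddings fall within a distance threshold of it. A toy sketch of that matching step, with random vectors standing in for learned embeddings (this is an illustration of the general prototype idea, not the authors' model):

```python
import numpy as np

def prototype(support):
    """Average the few support embeddings into a single class prototype."""
    return support.mean(axis=0)

def detect(queries, proto, threshold):
    """Flag query embeddings that fall within `threshold` of the prototype."""
    dists = np.linalg.norm(queries - proto, axis=1)
    return dists < threshold

rng = np.random.default_rng(0)
dim = 32
support = rng.normal(loc=1.0, size=(5, dim))    # 5 "shots" of the target event
queries = np.vstack([
    rng.normal(loc=1.0, size=(3, dim)),         # frames similar to the target
    rng.normal(loc=-1.0, size=(3, dim)),        # dissimilar background frames
])
hits = detect(queries, prototype(support), threshold=9.0)
```

In a real system the embeddings would come from a trained network, and the threshold (or an open-set decision rule) determines whether a query matches the target event or none of the known classes.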

Few-Shot Sound Event Detection
Y. Wang, J. Salamon, N.J. Bryan and J.P. Bello
In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Barcelona, Spain, May 2020.
[IEEE][PDF][BibTeX][Copyright]

Disentangled Multidimensional Metric Learning for Music Similarity

3/5/2020

Music similarity search is useful for a variety of creative tasks, such as replacing one music recording with another that has a similar "feel", a common task in video editing. This typically requires a similarity metric for comparing one recording to another. Music similarity, however, is hard to define and depends on multiple simultaneous notions of similarity (e.g., genre, mood, instrument, tempo). While prior work ignores this issue, we embrace it and introduce the concept of multidimensional similarity, unifying global and specialized similarity metrics into a single, semantically disentangled multidimensional similarity metric. To do so, we adapt a variant of deep metric learning called conditional similarity networks to the audio domain and extend it using track-based information to control the specificity of our model. We evaluate our method and show that our single multidimensional model outperforms both specialized similarity spaces and alternative baselines. We also conduct a user study and show that our approach is favored by human annotators as well.
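The conditional similarity network idea can be pictured as routing each notion of similarity to its own subspace of a shared embedding via per-condition masks. A toy illustration with fixed, disjoint masks (in the actual method the masks are learned, and the condition names below are only for illustration):

```python
import numpy as np

dim, n_conditions = 8, 4
# One mask per similarity notion (e.g. genre, mood, instrument, tempo), so each
# notion is measured in its own disentangled subspace of a shared embedding.
masks = np.zeros((n_conditions, dim))
for c in range(n_conditions):
    masks[c, 2 * c:2 * c + 2] = 1.0    # fixed, disjoint 2-dim subspaces

def conditional_distance(x, y, condition):
    """Distance between two track embeddings under a single similarity notion."""
    return np.linalg.norm((x - y) * masks[condition])

track_a = np.array([0., 0., 5., 5., 0., 0., 0., 0.])
track_b = np.zeros(8)
d_cond0 = conditional_distance(track_a, track_b, 0)  # identical in subspace 0
d_cond1 = conditional_distance(track_a, track_b, 1)  # differ in subspace 1
```

Two tracks can thus be close under one notion of similarity (here, condition 0) while far apart under another, which is exactly the behavior a single flat metric cannot express.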

Disentangled Multidimensional Metric Learning for Music Similarity
J. Lee, N.J. Bryan, J. Salamon, Z. Jin, J. Nam
In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Barcelona, Spain, May 2020.
[IEEE][PDF][BibTeX][Copyright]

