Justin Salamon
  • Home
  • News
  • Research
  • Publications
  • Code/Data
  • Melody Extraction
  • PhD Thesis
  • Contact
    • Music
    • Music Technology

Elected to the IEEE Audio and Acoustic Signal Processing Technical Committee

14/1/2020

I'm happy to report I've been elected to the IEEE Audio and Acoustic Signal Processing Technical Committee. 

The AASP TC's mission is to support, nourish and lead scientific and technological development in all areas of audio and acoustic signal processing. These areas are currently seeing increased levels of interest and significant growth, providing fertile ground for a broad range of specific and interdisciplinary research and development. Ranging from array processing for microphones and loudspeakers to music genre classification, from psychoacoustics to machine learning, and from consumer electronics devices to blue-sky research, this remit encompasses countless technical challenges and many hot topics. The TC numbers some 30 appointed volunteer members drawn roughly equally from leading academic and industrial organizations around the world, unified by the common aim of offering their expertise in the service of the scientific community.

Looking forward to doing my bit for this excellent scientific community!


SONYC-UST: A Multilabel Dataset from an Urban Acoustic Sensor Network

5/1/2020

SONYC Urban Sound Tagging (SONYC-UST) is a dataset for the development and evaluation of machine listening systems for real-world urban noise monitoring. It consists of 3068 audio recordings from the "Sounds of New York City" (SONYC) acoustic sensor network.
Via the Zooniverse citizen science platform, volunteers tagged the presence of 23 fine-grained classes that were chosen in consultation with the New York City Department of Environmental Protection. These 23 fine-grained classes can be grouped into eight coarse-grained classes:
[Figure: the SONYC-UST taxonomy]
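Because each recording can carry several tags and the fine-grained classes nest inside coarse-grained ones, coarse labels can be derived from fine ones: a coarse class is present whenever at least one of its fine classes is. The Python sketch below illustrates this under our own assumptions; the dictionary is a small hypothetical subset with placeholder label names, not the official SONYC-UST taxonomy files or annotation format.

```python
from typing import Dict, List, Set

# Hypothetical, illustrative subset of a fine -> coarse mapping.
# These strings are placeholders, not the dataset's official label identifiers.
FINE_TO_COARSE: Dict[str, str] = {
    "small-engine": "engine",
    "large-engine": "engine",
    "jackhammer": "machinery-impact",
    "car-horn": "alert-signal",
    "siren": "alert-signal",
    "dog-barking": "dog",
}


def coarse_tags(fine_tags: Set[str], mapping: Dict[str, str]) -> Set[str]:
    """A coarse class is present if at least one of its fine classes is present."""
    return {mapping[tag] for tag in fine_tags if tag in mapping}


def tag_matrix(recordings: List[Set[str]], classes: List[str]) -> List[List[int]]:
    """Encode each recording's tag set as a binary multilabel row vector."""
    return [[int(c in tags) for c in classes] for tags in recordings]
```

For example, a recording tagged with a siren, a car horn and a barking dog maps to the two coarse classes alert-signal and dog, which a multilabel classifier would see as a binary target row.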

For more details please see:

SONYC Urban Sound Tagging (SONYC-UST): A Multilabel Dataset from an Urban Acoustic Sensor Network
M. Cartwright, A. E. Mendez Mendez, J. Cramer, V. Lostanlen, G. Dove, H.-H. Wu, J. Salamon, O. Nov, and J.P. Bello
Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019), pages 35-39, New York University, NY, USA, Oct. 2019.

Download SONYC-UST: 
https://doi.org/10.5281/zenodo.3338310

TriCycle: Audio Representation Learning from Sensor Network Data Using Self-Supervision

4/11/2019

Self-supervised representation learning with deep neural networks is a powerful tool for machine learning tasks with limited labeled data but extensive unlabeled data. To learn representations, self-supervised models are typically trained on a pretext task to predict structure in the data (e.g., audio-visual correspondence, short-term temporal sequence, word sequence) that is indicative of higher-level concepts relevant to a target, downstream task. Sensor networks are promising yet unexplored sources of data for self-supervised learning: they collect large amounts of unlabeled yet timestamped data over extended periods of time and typically exhibit long-term temporal structure (e.g., over hours, months, years) not observable at the short time scales previously explored in self-supervised learning (e.g., seconds). This structure can be present even in single-modal data and therefore could be exploited for self-supervision in many types of sensor networks. In this work, we present a model for learning audio representations by predicting the long-term, cyclic temporal structure in audio data collected from an urban acoustic sensor network. We then demonstrate the utility of the learned audio representation in an urban sound event detection task with limited labeled data.
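One way to expose long-term cyclic structure as a prediction target is to place each recording's timestamp on a set of circles, so that, for example, 23:59 and 00:00 end up close together. The sketch below is an illustration under our own assumptions (daily, weekly and yearly cycles, each encoded as a cos/sin pair); the function names are hypothetical and the paper's exact targets and loss may differ.

```python
import math
from datetime import datetime
from typing import List


def cyclic_encode(value: float, period: float) -> List[float]:
    """Map a periodic quantity onto the unit circle as [cos, sin]."""
    angle = 2.0 * math.pi * (value / period)
    return [math.cos(angle), math.sin(angle)]


def cyclic_targets(ts: datetime) -> List[float]:
    """Six-dimensional target for one timestamp: daily, weekly, yearly cycles (assumed)."""
    seconds_into_day = ts.hour * 3600 + ts.minute * 60 + ts.second
    frac_day = seconds_into_day / 86400.0
    day_of_week = ts.weekday() + frac_day               # Monday = 0
    day_of_year = ts.timetuple().tm_yday - 1 + frac_day  # 1 January = 0
    return (
        cyclic_encode(seconds_into_day, 86400.0)
        + cyclic_encode(day_of_week, 7.0)
        + cyclic_encode(day_of_year, 365.25)
    )
```

An audio encoder trained to regress these targets from a clip alone has to pick up on cues that vary with time of day, day of week and season, which is the kind of long-term structure the abstract refers to.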

Read the full paper here:

TriCycle: Audio Representation Learning from Sensor Network Data Using Self-Supervision
M. Cartwright, J. Cramer, J. Salamon, and J.P. Bello
In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, NY, USA, Oct. 2019.
[IEEE][PDF][BibTeX][Copyright]

Robust Sound Event Detection in Bioacoustic Sensor Networks

24/10/2019

The innovation: We present a context-adaptive deep network that uses an auxiliary sub-network to model the background environment, improving robustness to changing environments at inference time.
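The idea of a context-adaptive layer can be sketched in NumPy. This is a generic illustration under our own assumptions, not the published architecture: a hypothetical auxiliary network (here a single linear map) turns a long-term context vector, such as summary statistics of the background, into softmax weights over K basis weight matrices, and the final affine layer uses their weighted combination. All names and sizes are hypothetical.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


class ContextAdaptiveDense:
    """Affine layer whose weights depend on a context vector (illustrative only)."""

    def __init__(self, n_in: int, n_out: int, n_ctx: int, n_basis: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # K basis weight matrices and biases for the adaptive affine layer.
        self.W = rng.normal(0.0, 0.1, size=(n_basis, n_out, n_in))
        self.b = np.zeros((n_basis, n_out))
        # Auxiliary "network": one linear map from context to basis logits.
        self.A = rng.normal(0.0, 0.1, size=(n_basis, n_ctx))

    def __call__(self, x: np.ndarray, context: np.ndarray) -> np.ndarray:
        alpha = softmax(self.A @ context)        # (n_basis,) mixing weights
        W = np.tensordot(alpha, self.W, axes=1)  # (n_out, n_in) adapted weights
        b = alpha @ self.b                       # (n_out,) adapted bias
        return W @ x + b
```

The same input x produces different outputs under different contexts, because the weights themselves, not just the input features, change with the background conditions.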

The results: The proposed model produces state-of-the-art results on flight call detection that are robust to environmental changes across time and space.

The surprise: Interestingly, we find that neither context adaptation nor PCEN pre-processing helps much on its own, but combining the two leads to dramatic gains.

The tech: We release BirdVoxDetect, an open-source tool for automatically detecting avian flight calls in continuous audio recordings: https://github.com/BirdVox/birdvoxdetect

Installing BirdVoxDetect (assuming Python is installed) is as easy as running:
$ pip install birdvoxdetect

Full paper:
Robust Sound Event Detection in Bioacoustic Sensor Networks
V. Lostanlen, J. Salamon, A. Farnsworth, S. Kelling, and J.P. Bello
PLoS ONE 14(10): e0214168, 2019. DOI: https://doi.org/10.1371/journal.pone.0214168
[PLoS ONE][PDF][BibTeX]

[Figure: model block diagram]
Abstract:
Bioacoustic sensors, sometimes known as autonomous recording units (ARUs), can record sounds of wildlife over long periods of time in scalable and minimally invasive ways. Deriving per-species abundance estimates from these sensors requires detection, classification, and quantification of animal vocalizations as individual acoustic events. Yet, variability in ambient noise, both over time and across sensors, hinders the reliability of current automated systems for sound event detection (SED), such as convolutional neural networks (CNN) in the time-frequency domain. In this article, we develop, benchmark, and combine several machine listening techniques to improve the generalizability of SED models across heterogeneous acoustic environments. As a case study, we consider the problem of detecting avian flight calls from a ten-hour recording of nocturnal bird migration, recorded by a network of six ARUs in the presence of heterogeneous background noise. Starting from a CNN yielding state-of-the-art accuracy on this task, we introduce two noise adaptation techniques, respectively integrating short-term (60 ms) and long-term (30 min) context. First, we apply per-channel energy normalization (PCEN) in the time-frequency domain, which applies short-term automatic gain control to every subband in the mel-frequency spectrogram. Secondly, we replace the last dense layer in the network by a context-adaptive neural network (CA-NN) layer, i.e., an affine layer whose weights are dynamically adapted at prediction time by an auxiliary network taking long-term summary statistics of spectrotemporal features as input. We show that PCEN reduces temporal overfitting across dawn vs. dusk audio clips whereas context adaptation on PCEN-based summary statistics reduces spatial overfitting across sensor locations. Moreover, combining them yields state-of-the-art results that are unmatched by artificial data augmentation alone.
We release a pre-trained version of our best performing system under the name of BirdVoxDetect, a ready-to-use detector of avian flight calls in field recordings.
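PCEN as described in the abstract can be sketched in a few lines of NumPy: a first-order low-pass filter tracks each subband's recent energy, the spectrogram is divided by that running estimate (automatic gain control), and a root compression follows. The parameter defaults below (smoothing coefficient s, gain alpha, bias delta, power r) are illustrative assumptions rather than the values used in the paper; in practice s would be derived from the desired time constant (60 ms in the abstract), and librosa.pcen offers a maintained implementation.

```python
import numpy as np


def pcen(E: np.ndarray, s: float = 0.025, alpha: float = 0.98,
         delta: float = 2.0, r: float = 0.5, eps: float = 1e-6) -> np.ndarray:
    """Per-channel energy normalization of a (n_bands, n_frames) energy spectrogram.

    s: smoothing coefficient of the per-band low-pass filter (sets the time scale)
    alpha: gain normalization exponent; delta, r: bias and exponent of the root compression
    """
    E = np.asarray(E, dtype=float)
    M = np.empty_like(E)
    # Initialize the smoother with the first frame so there is no start-up transient.
    M[:, 0] = E[:, 0]
    for t in range(1, E.shape[1]):
        # First-order IIR low-pass, applied independently to every subband.
        M[:, t] = (1.0 - s) * M[:, t - 1] + s * E[:, t]
    # Automatic gain control (divide by smoothed energy), then root compression.
    return (E / (eps + M) ** alpha + delta) ** r - delta ** r
```

Because the denominator follows each band's own recent energy, a loud but stationary background is largely normalized away, while a sudden onset produces a large response that decays as the smoother catches up.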

Build a Web-Based Ukulele Tuner using CREPE

7/10/2019

The YouTube coding channel The Coding Train just published its latest Coding Challenge: "Ukulele Tuner with Machine Learning Pitch Detection Model", where they use our CREPE model to build a web-based ukulele tuner! Awesome!

They use CREPE via the ml5.js library, which offers a variety of ML algorithms for the web. It was a delightful surprise to discover that CREPE is being used to power its pitchDetection function.
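At its core, a tuner compares an estimated fundamental frequency with the target pitch of each string. The Python sketch below assumes a pitch estimator such as CREPE has already produced a frequency in Hz and a confidence value; the standard GCEA tuning with A4 = 440 Hz, the confidence threshold, and the function names are our assumptions, not code from the video or from ml5.js.

```python
import math
from typing import Dict, Optional, Tuple

# Standard GCEA ukulele tuning in equal temperament, A4 = 440 Hz.
UKULELE_STRINGS: Dict[str, float] = {
    "G4": 440.0 * 2 ** (-2 / 12),  # ~392.00 Hz
    "C4": 440.0 * 2 ** (-9 / 12),  # ~261.63 Hz
    "E4": 440.0 * 2 ** (-5 / 12),  # ~329.63 Hz
    "A4": 440.0,
}


def cents(f: float, ref: float) -> float:
    """Signed distance from ref to f in cents (100 cents = 1 semitone)."""
    return 1200.0 * math.log2(f / ref)


def tune(f0: float, confidence: float, threshold: float = 0.5) -> Optional[Tuple[str, float]]:
    """Return (nearest string, cents offset), or None if the estimate is unreliable."""
    if confidence < threshold or f0 <= 0:
        return None  # e.g. silence or noise: leave the tuner display unchanged
    name = min(UKULELE_STRINGS, key=lambda s: abs(cents(f0, UKULELE_STRINGS[s])))
    return name, cents(f0, UKULELE_STRINGS[name])
```

A negative offset means the string is flat and should be tightened; a positive offset means it is sharp. Gating on confidence keeps the display from jumping around between notes.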

You can watch the full video on The Coding Train's YouTube channel.

What's Broken in Music Informatics Research? Three Uncomfortable Statements

20/6/2019

Companion website for my invited talk at the Machine Learning for Music Discovery workshop at ICML 2019: "What's Broken in Music Informatics Research? Three Uncomfortable Statements".

VIDEO:

PAPER:

What's Broken in Music Informatics Research? Three Uncomfortable Statements
Justin Salamon

Invited talk, Machine Learning for Music Discovery workshop, International Conference on Machine Learning (ICML), Long Beach, California, USA, June 2019.
[PDF][Video]


SUPPLEMENTARY MATERIAL:

Figure 1 (see paper for details): (a) ground-truth melody f0 (pitch) sequence, (b) first algorithm's melody estimate, (c) second algorithm's melody estimate.
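Melody estimates like those in Figure 1 are commonly scored frame by frame against the ground-truth f0 sequence. As an illustration of one standard metric (not necessarily the one discussed in the talk), the sketch below computes raw pitch accuracy: the fraction of voiced reference frames whose estimate lies within 50 cents of the reference. mir_eval provides a reference implementation in its mir_eval.melody module; this simplified version is our own and treats any f0 <= 0 as unvoiced.

```python
import numpy as np


def raw_pitch_accuracy(ref_hz, est_hz, tol_cents: float = 50.0) -> float:
    """Fraction of voiced reference frames whose estimate is within tol_cents.

    Frames with f0 <= 0 are treated as unvoiced (a simplifying convention).
    """
    ref_hz = np.asarray(ref_hz, dtype=float)
    est_hz = np.asarray(est_hz, dtype=float)
    voiced = ref_hz > 0
    if not voiced.any():
        return 0.0
    # Frames where both reference and estimate carry a pitch can be compared in cents.
    ok = voiced & (est_hz > 0)
    err = np.full(ref_hz.shape, np.inf)  # unvoiced estimates count as errors
    err[ok] = np.abs(1200.0 * np.log2(est_hz[ok] / ref_hz[ok]))
    return float(np.sum(err[voiced] <= tol_cents) / np.sum(voiced))
```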

Bioacoustics Datasets: A New Website Listing Bioacoustics Datasets and Repositories

25/5/2019


I couldn't find a centralized list of #bioacoustics datasets (for #machinelearning or otherwise), so I created this page. Feedback welcome! #opendata https://t.co/42YHDBiBtM

— Justin Salamon (@justin_salamon) May 24, 2019
To explore Bioacoustic Datasets, a centralized list of bioacoustics datasets and repositories, visit: https://bioacousticsdatasets.weebly.com

OpenL3: A Competitive and Open Deep Audio Embedding

7/5/2019

We're excited to announce the release of OpenL3, an open-source deep audio embedding based on the self-supervised L3-Net. OpenL3 is an improved version of L3-Net, and outperforms VGGish and SoundNet (and the original L3-Net) on several sound recognition tasks. Most importantly, OpenL3 is open source and readily available for everyone to use: if you have TensorFlow installed, just run pip install openl3 and you're good to go!

Full details are provided in our paper:

Look, Listen and Learn More: Design Choices for Deep Audio Embeddings
J. Cramer, H.-H. Wu, J. Salamon, and J. P. Bello.
IEEE Int. Conf. on Acoustics, Speech and Signal Proc. (ICASSP), pp. 3852-3856, Brighton, UK, May 2019.
[IEEE][PDF][BibTeX][Copyright]

How well does it work?

Here's a comparison of classification results on three environmental sound datasets using embeddings from OpenL3 (blue), SoundNet (orange) and VGGish (green) as input to a simple 2-layer MLP:
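[Figure: classification results using OpenL3, SoundNet and VGGish embeddings on three environmental sound datasets]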
Using OpenL3 we are able to match the current state of the art on UrbanSound8K, the most challenging of the three datasets, using a simple MLP without any of the tricks usually necessary for relatively small datasets (such as data augmentation). 

Using OpenL3

Installing OpenL3, a Python module, is as easy as calling (assuming TensorFlow is already installed):
$ pip install openl3

Once installed, using OpenL3 in Python can be done like this (the simplest use case, without setting custom parameter values):
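import openl3
import soundfile as sf

# Load the audio: sf.read returns the samples and the sample rate
audio, sr = sf.read('/path/to/file.wav')
# embedding has one row per analysis frame; timestamps gives each frame's time in seconds
# (newer openl3 releases expose this function as openl3.get_audio_embedding)
embedding, timestamps = openl3.get_embedding(audio, sr)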

We also provide a command-line interface (CLI) that can be launched by calling "openl3" from the command line:
$ openl3 /path/to/file.wav

The API (both python and CLI) includes more options such as changing the hop size used to extract the embedding, the output dimensionality of the embedding and several other parameters. A good place to start is the OpenL3 tutorial.

How was OpenL3 trained?

OpenL3 is an improved version of L3-Net by Arandjelovic and Zisserman. It is trained on a subset of AudioSet using self-supervision, exploiting the correspondence between sound and visual objects in video data.
The embedding is obtained by taking the output of the final convolutional layer of the audio subnetwork. For more details please see our paper.
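The audio-visual correspondence pretext task needs only unlabeled video: a video frame paired with audio from the same clip is a positive example, while a frame paired with audio from a different clip is a negative one. The sketch below generates such labeled pairs at the level of video IDs; the 50/50 sampling ratio, the function name and the omission of time alignment within a clip are simplifying assumptions, not details of the OpenL3 training pipeline.

```python
import random
from typing import List, Tuple


def sample_avc_pairs(video_ids: List[str], n_pairs: int, seed: int = 0) -> List[Tuple[str, str, int]]:
    """Sample (image_source, audio_source, label) triples for audio-visual correspondence.

    label = 1 when the frame and the audio come from the same video (positive),
    label = 0 when the audio comes from a different video (negative).
    """
    if len(video_ids) < 2:
        raise ValueError("need at least two videos to form negative pairs")
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        img_vid = rng.choice(video_ids)
        if rng.random() < 0.5:
            pairs.append((img_vid, img_vid, 1))  # positive: same video
        else:
            aud_vid = rng.choice([v for v in video_ids if v != img_vid])
            pairs.append((img_vid, aud_vid, 0))  # negative: mismatched video
    return pairs
```

A binary classifier trained on these labels never sees a human annotation, yet to solve the task its audio branch must learn features that describe what is making the sound.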

We look forward to seeing what the community does with OpenL3!

...and, if you're attending ICASSP 2019, be sure to stop by our poster on Friday, May 17 between 13:30-15:30 (session MLSP-P17: Deep Learning V, Poster Area G, paper 2149)! 

HistoryTracker: Minimizing Human Interactions in Baseball Game Annotation

4/5/2019

The sports data tracking systems available today are based on specialized hardware (high-definition cameras, speed radars, RFID) to detect and track targets on the field. While effective, implementing and maintaining these systems poses a number of challenges, including high cost and the need for close human monitoring. On the other hand, the sports analytics community has been exploring human computation and crowdsourcing in order to produce tracking data that is trustworthy, cheaper and more accessible. However, state-of-the-art methods require a large number of users to perform the annotation, or put too much burden on a single user. We propose HistoryTracker, a methodology that facilitates the creation of tracking data for baseball games by warm-starting the annotation process using a vast collection of historical data. We show that HistoryTracker helps users produce tracking data in a fast and reliable way.

HistoryTracker: Minimizing Human Interactions in Baseball Game Annotation
J. P. Ono, A. Gjoka, J. Salamon, C. A. Dietrich, and C. T. Silva
Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (CHI'19), Glasgow, UK, May 2019.
[ACM][PDF][BibTeX]

The paper received a CHI 2019 Honorable Mention Award.
[Video: 30-second teaser]
[Video: Jorge's full presentation at CHI 2019]

DCASE 2019 Workshop in NYC: Call for Papers

22/3/2019

The 4th Workshop on Detection and Classification of Acoustic Scenes and Events, DCASE 2019, will be held in New York City on October 25-26, 2019:
http://dcase.community/workshop2019/

The workshop immediately follows WASPAA 2019 and SANE 2019, also hosted in New York, offering a full week of exciting audio-related research!

As in previous years the workshop is organized in conjunction with the DCASE challenge. We aim to bring together researchers from many different universities and companies with an interest in the topic, and provide the opportunity for scientific exchange of ideas and opinions.

The technical program will include invited speakers on the topic of computational everyday sound analysis and recognition, and oral and poster presentations of accepted papers. In addition, a special poster session will be dedicated to the DCASE 2019 challenge entries and results.

We invite submissions on the topics of computational analysis of acoustic scenes and sound events, including but not limited to:

Tasks in computational environmental audio analysis
  • Acoustic scene classification
  • Sound event detection and localization
  • Audio tagging
  • Challenges in real-life applications (e.g., rare events, overlapping sound events, weak labels)

Methods for computational environmental audio analysis
  • Signal processing methods
  • Machine learning methods
  • Auditory-motivated methods
  • Cross-disciplinary methods involving, e.g., acoustics, biology, psychology, geography, materials science, transport science

Resources, applications, and evaluation of computational environmental audio analysis
  • Publicly available datasets or software, taxonomies and ontologies, evaluation procedures
  • Ethics, privacy, responsible research
  • Applications
  • Description of systems submitted to the DCASE 2019 Challenge, expanded from the challenge technical report submissions to include evaluation results and comparison.

Reproducible research with open-source code and open data is encouraged (but not mandatory).

Important notice for challenge participants
Note that while each DCASE challenge submission must be accompanied by a technical report describing the system, such reports must, in order to be considered for publication at the peer-reviewed workshop, be augmented with final results from the challenge and a careful analysis of those results in the context of the other submissions, in a way that provides meaningful, usable insight.

IMPORTANT DATES
  • 05 Jul 2019: Workshop paper abstract submission
  • 12 Jul 2019: Workshop paper submission
  • 23 Aug 2019: Notification of paper acceptance
  • 25 Oct 2019 - 26 Oct 2019: Workshop

We look forward to receiving your submissions!

GENERAL CHAIRS
Juan P. Bello, New York University
Mark Cartwright, New York University

PROGRAM CHAIRS
Daniel P. W. Ellis, Google, Inc.
Michael Mandel, Brooklyn College (CUNY)
Justin Salamon, Adobe Research

LOCAL ORGANIZATION
Vincent Lostanlen, Cornell University

    NEWS

    Machine listening research, code, data & hacks!

