Blog Archives

Compute Deep Image & Audio Embeddings with OpenL3

27/1/2020

OpenL3 computes deep image & audio embeddings using a self-supervised L3-Net model and can be installed with a simple "pip install openl3"!

Back in May 2019 we announced the release of OpenL3, an open-source Python library for computing deep audio embeddings using an improved L3-Net architecture trained on AudioSet. Today we're happy to announce the release of OpenL3 v0.3.0, which can also compute deep image embeddings!

Installing OpenL3 is easy (requires TensorFlow):

pip install openl3

And computing image embeddings is as easy as:

import openl3
from skimage.io import imread
image = imread('/path/to/file.png')
emb = openl3.get_image_embedding(image, embedding_size=512)

Computing audio embeddings is equally simple:

import openl3
import soundfile
audio, sr = soundfile.read('/path/to/file.wav')
embedding, timestamps = openl3.get_audio_embedding(audio, sr)
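
A common next step is to collapse the frame-level output into a single vector per clip and compare clips. The sketch below assumes the returned embedding is an array of shape (frames, dimensions), and it uses random arrays as stand-ins for real embeddings so it runs without TensorFlow. The helpers clip_vector and cosine_similarity are hypothetical names for illustration, not part of OpenL3.

```python
import numpy as np

def clip_vector(frame_embeddings):
    """Average frame-level embeddings of shape (T, D) into one (D,) vector."""
    frame_embeddings = np.asarray(frame_embeddings, dtype=float)
    if frame_embeddings.ndim != 2 or frame_embeddings.shape[0] == 0:
        raise ValueError("expected a non-empty (frames, dims) array")
    return frame_embeddings.mean(axis=0)

def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return float(np.dot(a, b) / denom)

# Stand-ins for the `embedding` arrays of two clips (random, illustration only).
rng = np.random.default_rng(0)
emb_a = rng.normal(size=(50, 512))
emb_b = rng.normal(size=(80, 512))

vec_a = clip_vector(emb_a)
vec_b = clip_vector(emb_b)
print(vec_a.shape, cosine_similarity(vec_a, vec_b))
```

Mean pooling is only one simple choice; depending on the task, keeping the per-frame embeddings may work better.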

You can even process video files directly to obtain both image and audio embeddings:

openl3.process_video_file(video_filepath, output_dir='/path/to/output/folder')
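
To reuse saved results later, the output files can be read back with NumPy. This is a minimal sketch that assumes each output is a .npz archive holding "embedding" and "timestamps" arrays; the exact file names and keys are worth checking against the OpenL3 Tutorial. It writes a small stand-in file first so that it runs without OpenL3 installed, and load_embedding_file is a hypothetical helper.

```python
import os
import tempfile
import numpy as np

def load_embedding_file(path):
    """Load an embedding archive; assumes 'embedding' and 'timestamps' keys."""
    with np.load(path) as data:
        return data["embedding"], data["timestamps"]

# Write a stand-in archive with the assumed layout (placeholder values).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "example_audio.npz")
np.savez(path, embedding=np.zeros((5, 512)), timestamps=np.arange(5) * 0.1)

embedding, timestamps = load_embedding_file(path)
print(embedding.shape, timestamps.shape)
```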

Full instructions and all available options are described in the OpenL3 Tutorial. There's also an OpenL3 command line interface (CLI) if you want to script outside of Python.

Full details about the embedding models and how they were trained can be found in:

Look, Listen and Learn More: Design Choices for Deep Audio Embeddings
J. Cramer, H.-H. Wu, J. Salamon, and J. P. Bello.
IEEE Int. Conf. on Acoustics, Speech and Signal Proc. (ICASSP), pp 3852-3856, Brighton, UK, May 2019.

We hope the machine learning community, including both computer vision and machine listening researchers, finds OpenL3 useful in its work!

Adobe Audio Research Interns Submit Papers to ICASSP, CVPR, ICLR, CHI, IEEE VR, INTERSPEECH, ISMIR

24/1/2020

This past summer we had the pleasure of hosting 14 brilliant students (most of them pursuing a PhD, but also some Master's and Bachelor's students) for audio research internships in San Francisco and Seattle. We worked together on a wide range of audio-related topics including speech processing, music, video, animation, spatial audio, VR, DAFX... the works! The projects also covered a range of disciplines, including machine learning (with a healthy dose of deep learning), signal processing and human-computer interaction, and a range of problems within these disciplines, such as self-supervision and representation learning, metric learning, classification, transformation and generation (synthesis).

As a mentor, it was an incredible experience to guide, collaborate with, and learn from this diverse group of people coming from a variety of disciplines and universities in 5 different countries. Between interns and mentors we spanned 11 different countries of origin, making it a truly international group! I'm also delighted to say that a large proportion of the internship projects have resulted in paper submissions to top-tier venues including ICASSP, CVPR, ICLR, CHI and IEEE VR, with upcoming submissions to INTERSPEECH and ISMIR in preparation!

The 2019 audio research interns were:

Emma Frid, KTH Royal Institute of Technology
Nathan Keil, Rensselaer Polytechnic Institute
Jongpil Lee, Korea Advanced Institute of Science and Technology
Stylianos Mimilakis, Technical University of Ilmenau
Max Morrison, Northwestern University
Kaizhi Qian, University of Illinois at Urbana-Champaign
Lucas Rencker, University of Surrey
Oona Risse-Adams, University of California, Santa Cruz
Jiaqi Su, Princeton University
Zhenyu Tang, University of Maryland-College Park
Yapeng Tian, University of Rochester
Yu Wang, New York University
Karren Yang, Massachusetts Institute of Technology
Yang Zhou, University of Massachusetts Amherst

I look forward to keeping in touch with everyone and hope we get to collaborate again in the future!

The 2019 San Francisco audio research mentors and interns (Seattle team we love you!)

Elected to the IEEE Audio and Acoustic Signal Processing Technical Committee

14/1/2020

I'm happy to report I've been elected to the IEEE Audio and Acoustic Signal Processing Technical Committee.

The AASP TC's mission is to support, nourish and lead scientific and technological development in all areas of audio and acoustic signal processing. These areas are currently seeing increased levels of interest and significant growth providing a fertile ground for a broad range of specific and interdisciplinary research and development. Ranging from array processing for microphones and loudspeakers to music genre classification, from psychoacoustics to machine learning, from consumer electronics devices to blue-sky research, this remit encompasses countless technical challenges and many hot topics. The TC numbers some 30 appointed volunteer members drawn roughly equally from leading academic and industrial organizations around the world, unified by the common aim to offer their expertise in the service of the scientific community.

Looking forward to doing my bit for this excellent scientific community!

SONYC-UST: A Multilabel Dataset from an Urban Acoustic Sensor Network

5/1/2020

SONYC Urban Sound Tagging (SONYC-UST) is a dataset for the development and evaluation of machine listening systems for real-world urban noise monitoring. It consists of 3068 audio recordings from the "Sounds of New York City" (SONYC) acoustic sensor network.

Via the Zooniverse citizen science platform, volunteers tagged the presence of 23 fine-grained classes that were chosen in consultation with the New York City Department of Environmental Protection. These 23 fine-grained classes can be grouped into eight coarse-grained classes:

The SONYC-UST taxonomy
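
Grouping fine-grained tags into coarse-grained ones amounts to a logical OR over each group. The sketch below illustrates the idea with a small subset of class names and a made-up annotation. The mapping shown and the to_coarse helper are assumptions for demonstration only; the authoritative taxonomy is the one distributed with the dataset.

```python
# Illustrative subset only; the full taxonomy is distributed with SONYC-UST.
FINE_TO_COARSE = {
    "small-sounding-engine": "engine",
    "large-sounding-engine": "engine",
    "car-horn": "alert-signal",
    "siren": "alert-signal",
    "dog-barking-whining": "dog",
}

def to_coarse(fine_tags):
    """Map {fine_class: present?} to {coarse_class: present?} via logical OR."""
    coarse = {name: False for name in FINE_TO_COARSE.values()}
    for fine_name, present in fine_tags.items():
        if fine_name not in FINE_TO_COARSE:
            raise KeyError(f"unknown fine-grained class: {fine_name}")
        coarse[FINE_TO_COARSE[fine_name]] |= bool(present)
    return coarse

# A made-up annotation for one recording.
example = {"car-horn": True, "siren": False, "large-sounding-engine": False}
coarse_tags = to_coarse(example)
print(coarse_tags)
```

A coarse class is marked present as soon as any of its fine-grained classes is present, which is why a single "car-horn" tag is enough to flag "alert-signal" here.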

For more details please see:

SONYC Urban Sound Tagging (SONYC-UST): A Multilabel Dataset from an Urban Acoustic Sensor Network
M. Cartwright, A. E. Mendez Mendez, J. Cramer, V. Lostanlen, G. Dove, H.-H. Wu, J. Salamon, O. Nov, and J. P. Bello.
Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019), pages 35-39, New York University, NY, USA, Oct. 2019.

Download SONYC-UST: https://doi.org/10.5281/zenodo.3338310
