Compute Deep Image & Audio Embeddings with OpenL3

27/1/2020

OpenL3 computes deep image & audio embeddings using a self-supervised L3-Net model and can be installed with a simple "pip install openl3"!

Back in May 2019 we announced the release of OpenL3, an open-source Python library for computing deep audio embeddings using an improved L3-Net architecture trained on AudioSet. Today we're happy to announce the release of OpenL3 v0.3.0, which can also compute deep image embeddings!

Installing OpenL3 is easy (requires TensorFlow):

pip install openl3

And computing image embeddings is as easy as:

import openl3
from skimage.io import imread
image = imread('/path/to/file.png')
emb = openl3.get_image_embedding(image, embedding_size=512)
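Once you have embeddings, a typical next step is comparing them. The sketch below is a hypothetical helper, not part of OpenL3, that computes cosine similarity between two embeddings, assuming each one describes a single image (any leading batch dimension is flattened away).

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embeddings (hypothetical helper).

    Assumes each input holds one embedding; arrays are flattened to 1-D.
    """
    a = np.ravel(np.asarray(a, dtype=float))
    b = np.ravel(np.asarray(b, dtype=float))
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        # The angle between vectors is undefined if either has zero length
        raise ValueError("cosine similarity is undefined for zero vectors")
    return float(np.dot(a, b) / denom)
```

For example, passing two outputs of openl3.get_image_embedding(..., embedding_size=512) returns a value in [-1, 1], where higher means the two images are closer in embedding space.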

Computing audio embeddings is equally simple:

import openl3
import soundfile
audio, sr = soundfile.read('/path/to/file.wav')
embedding, timestamps = openl3.get_audio_embedding(audio, sr)
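get_audio_embedding returns one embedding per analysis window together with that window's timestamp. The sketch below illustrates that kind of windowing; it is not OpenL3's implementation. It assumes 1-second windows and a 0.1 s hop (hop_size=0.1), and the function name frame_audio, the mono downmix, and the padding scheme are illustrative assumptions. Any resampling the library performs is omitted.

```python
import numpy as np

def frame_audio(audio, sr, hop_size=0.1, window_size=1.0, center=True):
    """Split audio into overlapping windows and return (frames, timestamps).

    Illustrative sketch only; window and padding choices are assumptions.
    """
    audio = np.asarray(audio, dtype=float)
    frame_len = int(round(window_size * sr))
    hop_len = int(round(hop_size * sr))
    if audio.ndim == 2:
        # Assumed: downmix (samples, channels) audio to mono by averaging
        audio = audio.mean(axis=1)
    if center:
        # Pad half a window at the start so window i is centred at i * hop_size
        audio = np.concatenate([np.zeros(frame_len // 2), audio])
    # Number of windows needed so that every sample falls inside one
    if len(audio) < frame_len:
        n_frames = 1
    else:
        n_frames = 1 + int(np.ceil((len(audio) - frame_len) / hop_len))
    # Zero-pad the end so the last window is complete
    total = frame_len + (n_frames - 1) * hop_len
    audio = np.pad(audio, (0, total - len(audio)))
    frames = np.stack([audio[i * hop_len:i * hop_len + frame_len]
                       for i in range(n_frames)])
    # One timestamp (in seconds) per window
    timestamps = np.arange(n_frames) * hop_size
    return frames, timestamps
```

For example, 2 seconds of audio at 1 kHz with center=True yields 16 windows with timestamps 0.0, 0.1, ..., 1.5; each window would then be passed through the embedding network.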

You can even process video files directly to obtain both image and audio embeddings:

import openl3
video_filepath = '/path/to/file.mp4'  # placeholder path
openl3.process_video_file(video_filepath, output_dir='/path/to/output/folder')
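Because a video yields image embeddings for its frames and audio embeddings over time, you may want to pair them. The hypothetical helper align_to_audio below (not part of OpenL3) maps each image timestamp to the nearest audio timestamp. It assumes you already have both sets of timestamps as arrays in seconds, with the audio timestamps sorted. How they are stored on disk is not covered here.

```python
import numpy as np

def align_to_audio(image_ts, audio_ts):
    """Return, for each image timestamp, the index of the nearest audio timestamp.

    Hypothetical helper: inputs are in seconds and audio_ts is sorted ascending.
    """
    image_ts = np.asarray(image_ts, dtype=float)
    audio_ts = np.asarray(audio_ts, dtype=float)
    if audio_ts.size == 0:
        raise ValueError("audio_ts must contain at least one timestamp")
    if audio_ts.size == 1:
        # Only one audio frame: every image frame maps to it
        return np.zeros(image_ts.shape, dtype=int)
    # Position where each image time would be inserted into audio_ts
    idx = np.searchsorted(audio_ts, image_ts)
    # Keep a valid left/right neighbour pair for every image time
    idx = np.clip(idx, 1, audio_ts.size - 1)
    left = audio_ts[idx - 1]
    right = audio_ts[idx]
    # Step back one position wherever the left neighbour is strictly closer
    idx = idx - ((image_ts - left) < (right - image_ts)).astype(int)
    return idx
```

Using np.searchsorted keeps the lookup at O(log n) per frame, so even long videos with many audio windows align quickly.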

Full instructions and all available options are described in the OpenL3 Tutorial. There's also an OpenL3 command line interface (CLI) if you want to work outside of Python.

Full details about the embedding models and how they were trained can be found in:

Look, Listen and Learn More: Design Choices for Deep Audio Embeddings
J. Cramer, H.-H. Wu, J. Salamon, and J. P. Bello.
IEEE Int. Conf. on Acoustics, Speech and Signal Proc. (ICASSP), pp. 3852-3856, Brighton, UK, May 2019.

We hope the machine learning community, including both computer vision and machine listening researchers, finds OpenL3 useful in their work!
