Back in May 2019 we announced the release of OpenL3, an open-source Python library for computing deep audio embeddings using an improved L3-Net architecture trained on AudioSet. Today we're happy to announce the release of OpenL3 v0.3.0, which can also compute deep image embeddings!
Installing OpenL3 is easy (requires TensorFlow):
pip install openl3
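import openl3
import soundfile
from skimage.io import imread  # assumption: imread comes from scikit-image

image = imread('/path/to/file.png')  # load the image as an array
emb = openl3.get_image_embedding(image, embedding_size=512)

audio, sr = soundfile.read('/path/to/file.wav')  # audio samples and sample rate
embedding, timestamps = openl3.get_audio_embedding(audio, sr)

# Compute and save embeddings for a video file
openl3.process_video_file(video_filepath, output_dir='/path/to/output/folder')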
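As a rough sketch of a typical next step: the audio call above returns one embedding per analysis frame, so it is common to pool the frames into a single clip-level vector and compare clips by cosine similarity. The snippet below uses random stand-in arrays rather than real OpenL3 output. The helper names `clip_embedding` and `cosine_similarity` are hypothetical, and the (frames, dimensions) shape is an assumption made for illustration.

```python
import numpy as np


def clip_embedding(frame_embeddings):
    """Average frame-level embeddings (frames x dims) into one clip-level vector."""
    frame_embeddings = np.asarray(frame_embeddings, dtype=float)
    return frame_embeddings.mean(axis=0)


def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Stand-in data, NOT real OpenL3 output: two clips, 20 frames each, 512 dims.
rng = np.random.default_rng(0)
clip_a = rng.random((20, 512))
clip_b = rng.random((20, 512))

vec_a = clip_embedding(clip_a)
vec_b = clip_embedding(clip_b)

print(vec_a.shape)  # (512,)
print(cosine_similarity(vec_a, vec_b))
```

Mean pooling is only one option; depending on the task, it may be preferable to keep the frame-level embeddings and their timestamps.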
Full details about the embedding models and how they were trained can be found in:
Look, Listen and Learn More: Design Choices for Deep Audio Embeddings
J. Cramer, H.-H. Wu, J. Salamon, and J. P. Bello.
IEEE Int. Conf. on Acoustics, Speech and Signal Proc. (ICASSP), pp. 3852-3856, Brighton, UK, May 2019.
We hope the machine learning community, including both computer vision and machine listening researchers, finds OpenL3 useful in its work!