Research Interests
OVERVIEW
My research focuses on the application of machine learning and signal processing to audio & video, with applications in machine listening, audiovisual and multi-modal understanding, representation learning & self-supervision, audio for video, music information retrieval, bioacoustics, environmental sound analysis and open source software & data.
For a complete list of my publications please go to my publications page.
REPRESENTATION LEARNING & SELF-SUPERVISION FOR AUDIO-VISUAL DATA
I'm interested in how we can leverage large quantities of unlabeled audio and video data to learn robust & discriminative audio, image and video representations.
Sample publication:
- It's Time for Artistic Correspondence in Music and Video
D. Surís, C. Vondrick, B. Russell, J. Salamon
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022
[CVF][arXiv]
MACHINE LISTENING
I'm interested in models for the automatic extraction of high-level semantic information from acoustic environments under real-world constraints such as limited data, mismatched data (domain adaptation), and weak labels.
I was a founding member of the SONYC project, on which I remain a collaborator.
Sample publication:
- Deep Convolutional Neural Networks and Data Augmentation For Environmental Sound Classification
J. Salamon and J. P. Bello
IEEE Signal Processing Letters, 24(3), pages 279 - 283, 2017.
[IEEE][PDF][Copyright]
MUSIC INFORMATION RETRIEVAL
My work in MIR is focused on content-based MIR, including melody extraction and multi-pitch estimation in polyphonic audio, music classification, transcription, query-by-example/humming, music retrieval and indexing, music and melodic similarity.
I am the author of Melodia, an algorithm for melody extraction from polyphonic music signals that is widely used by researchers, teachers and artists.
Sample publication:
- Melody Extraction from Polyphonic Music Signals using Pitch Contour Characteristics
J. Salamon and E. Gómez
IEEE Transactions on Audio, Speech and Language Processing, 20(6):1759-1770, Aug. 2012.
[IEEE][DOI][PDF][BibTeX][Copyright]
BIOACOUSTICS
In the area of bioacoustics my work is focused on the automatic recognition of bird species from their vocalizations.
I was a founding member of the BirdVox project, on which I remain a collaborator.
Sample publication:
- Robust Sound Event Detection in Bioacoustic Sensor Networks
V. Lostanlen, J. Salamon, A. Farnsworth, S. Kelling, and J.P. Bello
PLoS ONE 14(10): e0214168, 2019. DOI: https://doi.org/10.1371/journal.pone.0214168
[PLoS ONE][PDF][BibTeX]
REPRODUCIBLE RESEARCH
I am actively involved in the development of tools and datasets for open, reproducible research, including mir_eval, Scaper, JAMS, Essentia, MIR.EDU, and the MedleyDB and UrbanSound datasets, to name a few. For a full list see my code/data page.
Sample publication:
- Open-source Practices for Music Signal Processing Research: Recommendations for Transparent, Sustainable, and Reproducible Audio Research
B. McFee, J. W. Kim, M. Cartwright, J. Salamon, R. M. Bittner, and J. P. Bello.
IEEE Signal Processing Magazine, 36(1):128–137, Jan. 2019.
[IEEE][PDF][BibTeX][Copyright]
Scientific Service
Mentoring:
- For the past several years I have participated as a mentor in the Women in Music Information Retrieval (WiMIR) mentoring program.
Board member:
- International Society for Music Information Retrieval (ISMIR), member-at-large
Conference TC:
- ISMIR
- ICASSP
- WASPAA
Conference/workshop organization committee:
Journal review:
- ACM Transactions on Intelligent Systems and Technology
- Acta Acustica united with Acustica
- Avian Conservation and Ecology
- Circuits, Systems, and Signal Processing
- Computational Intelligence
- EURASIP Journal on Advances in Signal Processing - Special issue on Digital Audio Effects
- IEEE Journal of Selected Topics in Signal Processing - Special issue on Music Signal Processing
- IEEE Signal Processing Letters
- IEEE Signal Processing Magazine
- IEEE Transactions on Emerging Topics in Computing
- IEEE Transactions on Audio, Speech and Language Processing
- IEEE Transactions on Multimedia
- IEEE Transactions on Neural Networks and Learning Systems
- International Journal of Multimedia Information Retrieval
- Journal of New Music Research
- Journal of the Audio Engineering Society
- Journal of Open Source Software (JOSS)
- PLOS ONE
- Sensors
Conference review:
- ICMC 2009, SMC 2010, SMC 2011, ICME 2011, SMC 2012, ISMIR 2012, EUSIPCO 2013, ISMIR 2013, WASPAA 2013, AES53, ICMC-SMC 2014, ISMIR 2014, ICASSP 2015, WASPAA 2015, ISMIR 2015, EUSIPCO 2015, ICASSP 2016, ISMIR 2016, ICASSP 2017, ISMIR 2018, ISMIR 2019, DCASE 2019, ICASSP 2020, ML4MD 2020, ISMIR 2020, ICASSP 2021, ISMIR 2021, WASPAA 2021, ICASSP 2022, ISMIR 2022, ISMIR 2023, ICASSP 2024
Professional organizations:
- TC member, IEEE AASP
- Member, IEEE SPS
- Member, IEEE
Teaching
- Even a Geek Can Speak: The Do's and Don'ts of Giving a Public Presentation, (2019-current): Adobe, intern program.
- Machine Learning for Sound Classification (2016): NYU, graduate lecture.
- How Not to Lose Your Code, Your Degree, and Your Future Job, (2013-2016): NYU, graduate seminar [slides]
- Even a Geek Can Speak: The Do's and Don'ts of Giving a Public Presentation, (2013-2016): NYU, graduate seminar.
- Sound Creation Lab, (2011-2012, 2012-2013): UPF, undergraduate course.
- Probabilitat i processos estocàstics (Probability & Stochastic Processes), (2010-2011): UPF, undergraduate course.