
Deep Convolutional Neural Networks and Data Augmentation For Environmental Sound Classification
20/1/2017
The ability of deep convolutional neural networks (CNN) to learn discriminative spectro-temporal patterns makes them well suited to environmental sound classification. However, the relative scarcity of labeled data has impeded the exploitation of this family of high-capacity models. This study has two primary contributions: first, we propose a deep convolutional neural network architecture for environmental sound classification. Second, we propose the use of audio data augmentation for overcoming the problem of data scarcity and explore the influence of different augmentations on the performance of the proposed CNN architecture. Combined with data augmentation, the proposed model produces state-of-the-art results for environmental sound classification. We show that the improved performance stems from the combination of a deep, high-capacity model and an augmented training set: this combination outperforms both the proposed CNN without augmentation and a “shallow” dictionary learning model with augmentation. Finally, we examine the influence of each augmentation on the model’s classification accuracy for each class, and observe that the accuracy for each class is influenced differently by each augmentation, suggesting that the performance of the model could be improved further by applying class-conditional data augmentation.
For further details see our paper:

Deep Convolutional Neural Networks and Data Augmentation For Environmental Sound Classification
J. Salamon and J. P. Bello
IEEE Signal Processing Letters, In Press, 2017.
[IEEE][PDF][BibTeX][Copyright]
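To make the idea of waveform-level data augmentation concrete, here is a minimal numpy sketch of two simple augmentations, adding background noise at a target SNR and applying a random time shift. This is an illustration only, not the exact augmentation pipeline evaluated in the paper; the function names, parameters, and the synthetic test clip are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_background_noise(y, snr_db):
    """Mix white noise into the waveform at a target signal-to-noise ratio (dB)."""
    signal_power = np.mean(y ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=y.shape)
    return y + noise

def random_time_shift(y, max_shift):
    """Circularly shift the waveform by a random number of samples."""
    shift = rng.integers(-max_shift, max_shift + 1)
    return np.roll(y, shift)

sr = 22050
t = np.arange(sr) / sr
clip = np.sin(2 * np.pi * 440.0 * t)   # stand-in for a 1-second sound clip
augmented = random_time_shift(add_background_noise(clip, snr_db=20),
                              max_shift=sr // 10)
```

Each augmented copy is added to the training set alongside the original, which is how augmentation increases the effective amount of labeled data without new annotation effort.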
Today I'll be giving an invited talk at the Machine Learning for Music Discovery Workshop as part of the ICML 2016 conference. The talk is about Pitch Analysis for Active Music Discovery:

A significant proportion of commercial music is comprised of pitched content: a melody, a bass line, a famous guitar solo, etc. Consequently, algorithms that are capable of extracting and understanding this type of pitched content open up numerous opportunities for active music discovery, ranging from query-by-humming to musical-feature-based exploration of Indian art music or recommendation based on singing style. In this talk I will describe some of my work on algorithms for pitch content analysis of music audio signals and their application to music discovery, the role of machine learning in these algorithms, and the challenge posed by the scarcity of labeled data and how we may address it.

And here's the extended abstract:

Pitch Analysis for Active Music Discovery
J. Salamon
Machine Learning for Music Discovery workshop, International Conference on Machine Learning (ICML), invited talk, New York City, NY, USA, June 2016.
[PDF]

The workshop has a great program lined up, if you're attending ICML 2016 be sure to drop by!
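As a toy illustration of what "extracting pitched content" means in practice (this is a textbook autocorrelation estimator, not one of the algorithms described in the talk), here is a minimal numpy sketch that recovers the fundamental frequency of a monophonic signal:

```python
import numpy as np

def estimate_pitch(y, sr, fmin=50.0, fmax=2000.0):
    """Estimate fundamental frequency (Hz) via autocorrelation peak picking."""
    y = y - np.mean(y)
    # Autocorrelation for non-negative lags only.
    corr = np.correlate(y, y, mode="full")[len(y) - 1:]
    lo = int(sr / fmax)   # smallest lag (highest frequency) to consider
    hi = int(sr / fmin)   # largest lag (lowest frequency) to consider
    lag = lo + np.argmax(corr[lo:hi])
    return sr / lag

sr = 8000
t = np.arange(2048) / sr
y = np.sin(2 * np.pi * 220.0 * t)   # a pure 220 Hz tone
f0 = estimate_pitch(y, sr)
```

Real music audio (polyphony, vibrato, noise) is far harder than a pure tone, which is where the machine-learning-based approaches discussed in the talk come in.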