Adaptive DCTNet for audio signal classification

Yin Xian, Yunchen Pu, Zhe Gan, Liang Lu, Andrew Thompson

Research output: Chapter in book/report/conference proceeding > Conference proceeding > peer-review

4 Citations (Scopus)


In this paper, we investigate DCTNet for audio signal classification. Its output feature is related to Cohen's class of time-frequency distributions. We introduce the adaptive DCTNet (A-DCTNet) for audio signal feature extraction. The A-DCTNet applies the idea of the constant-Q transform, with the center frequencies of its filterbanks geometrically spaced. The A-DCTNet adapts to different acoustic scales, and it captures low-frequency acoustic information, to which human hearing is sensitive, better than features such as Mel-frequency spectral coefficients (MFSC). We use features extracted by the A-DCTNet as input for classifiers. Experimental results show that the A-DCTNet combined with Recurrent Neural Networks (RNN) achieves a state-of-the-art bird song classification rate and improves artist identification accuracy on music data. These results demonstrate the A-DCTNet's applicability to signal processing problems.
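The geometric spacing of center frequencies mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 55 Hz starting frequency, and the 12 bins per octave are assumptions chosen only for illustration, and the bandwidth definition used for Q is one common convention.

```python
import math


def geometric_center_frequencies(f_min, bins_per_octave, n_bins):
    """Return n_bins center frequencies spaced geometrically from f_min.

    Consecutive frequencies differ by the constant ratio
    2 ** (1 / bins_per_octave), as in a constant-Q filterbank.
    (Hypothetical helper, not taken from the paper.)
    """
    ratio = 2.0 ** (1.0 / bins_per_octave)
    return [f_min * ratio ** k for k in range(n_bins)]


def quality_factor(bins_per_octave):
    """Q = f_k / (f_{k+1} - f_k), which is the same for every bin k."""
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)


# Illustrative values only (assumed, not from the paper).
freqs = geometric_center_frequencies(f_min=55.0, bins_per_octave=12, n_bins=25)
q = quality_factor(12)

if __name__ == "__main__":
    for k in (0, 12, 24):
        print(f"bin {k:2d}: {freqs[k]:.2f} Hz")
    print(f"Q = {q:.3f}")
```

Because the ratio between neighboring frequencies is fixed, low frequencies are sampled more densely in Hz than high frequencies, which is the property the abstract associates with better capture of low-frequency information.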
Original language: English
Title of host publication: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Number of pages: 5
ISBN (Electronic): 9781509041176, 9781509041169
ISBN (Print): 9781509041183
Publication status: Published - Mar 2017
Event: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2017 - New Orleans, LA, United States
Duration: 5 Mar 2017 to 9 Mar 2017

Publication series

Name: International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
ISSN (Electronic): 2379-190X


Conference: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2017
Country/Territory: United States
City: New Orleans, LA

User-Defined Keywords

  • Adaptive DCTNet
  • audio signals
  • time-frequency analysis
  • RNN
  • feature extraction

