Abstract
Most state-of-the-art phone classifiers use the same features and decision criteria for all phones, even though different broad classes are characterized by different manners and places of articulation that result in different acoustic features. This paper uses manifold learning to address structure in the acoustic space. Previous approaches to dimensionality reduction based on manifold learning assumed that the acoustic space can be characterized by a uniform manifold structure. In this paper we relax this assumption by learning different manifold structures for broad phonetic classes. Because all known classifiers make confusions between broad classes, we designed a two-level classifier in which the top level consists of a number of partially overlapping broad classes. Since the resulting classifiers are not statistically independent, we propose a new method for fusing the classifiers. Experimental results show that our two-level classifier obtained slightly better results when broad-class-specific manifolds were learned than when a uniform manifold was used. However, the accuracy is still considerably lower than what could be obtained with oracle knowledge about broad class membership. From this we infer that phones do not form compact clusters in acoustic space.
Original language | English |
---|---|
Pages (from-to) | 28-45 |
Number of pages | 18 |
Journal | Computer Speech and Language |
Volume | 38 |
Publication status | Published - 1 Jul 2016 |
Scopus Subject Areas
- Software
- Theoretical Computer Science
- Human-Computer Interaction
User-Defined Keywords
- Classifier fusion
- Dimensionality reduction
- Manifold learning
- Partial classification
- Phone classification
- TIMIT