This study investigated the effects of two perceptually based training paradigms on both the perception and production of English /e/ and /æ/ by Cantonese ESL learners with high and low listening and oral proficiency levels. Sixty-four subjects participated in the study, of whom 22 (9 with high proficiency, H-HV; 13 with low proficiency, L-HV) were trained under the High Variability Phonetic Training (HVPT) approach, 19 (8 with high proficiency, H-LV; 11 with low proficiency, L-LV) were trained under the Low Variability Phonetic Training (LVPT) approach, and 23 (10 with high proficiency, H-CO; 13 with low proficiency, L-CO) served as control subjects. Both training approaches were effective in improving the subjects' perception of the two vowels, with the HVPT groups showing more robust improvement than the LVPT groups. Perceptual learning also generalized to new words and new speakers and transferred to the production domain, with the HVPT groups again outperforming the LVPT groups. However, subjects with different proficiency levels learned to similar degrees in all tests. The results demonstrated that both approaches offered a type of learning that allows attention to focus on phonetic information, which differs from what is learned in an L2 classroom, and that stimulus variability also plays a role in the learning.