Topic 3 of the Model Inter-Comparison Study for Asia (MICS-Asia) Phase III examines how well online coupled air quality models simulate high aerosol pollution in the North China Plain during wintertime haze events, and evaluates the importance of aerosol radiative and microphysical feedbacks. A comprehensive overview of the MICS-Asia III Topic 3 study design, including descriptions of the participating models and model inputs, the experimental design, and the results of the model evaluation, is presented. Six modeling groups from China, Korea, and the United States submitted results from seven applications of online coupled chemistry–meteorology models. Results are compared with meteorological and air quality measurements, including data from the Campaign on Atmospheric Aerosol Research Network of China (CARE-China) and the Acid Deposition Monitoring Network in East Asia (EANET). For January 2010, the correlation coefficients between the multi-model ensemble mean and the CARE-China observations of near-surface air pollutants range from 0.51 for ozone to 0.94 for PM2.5. However, the aerosol chemical compositions simulated by different models differ substantially. The inter-model coefficient of variation (the standard deviation divided by the mean) can exceed 1.3 for sulfate in Beijing and 1.6 for nitrate and organic aerosols in coastal regions, indicating poor agreement among the models for these components. During clean periods, the aerosol optical depths (AODs) simulated by different models are similar, but their peak values differ during severe haze events. These differences can be explained by differences in simulated inorganic aerosol concentrations and in hygroscopic growth efficiency, which is affected by differences in simulated relative humidity.
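The two agreement statistics quoted above can be illustrated with a short sketch. It assumes that the multi-model ensemble mean is the unweighted arithmetic mean of the members at each time step, that the correlation is the Pearson coefficient between paired time series, and that the coefficient of variation uses the population standard deviation across models; the abstract does not specify these choices. The helper names and all values below are hypothetical toy numbers, not MICS-Asia III data.

```python
import math
from statistics import mean, pstdev


def ensemble_mean(model_series):
    """Unweighted arithmetic mean across models at each time step (assumed)."""
    return [mean(step) for step in zip(*model_series)]


def coefficient_of_variation(values):
    """CV = SD / mean across model values; population SD is an assumption."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero, so the CV is undefined")
    return pstdev(values) / m


def pearson_r(x, y):
    """Pearson correlation coefficient between two paired series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired series of length >= 2")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


# Toy daily PM2.5 values (ug/m3) from three hypothetical models plus
# observations; illustrative numbers only, not study data.
models = [
    [40.0, 120.0, 300.0, 80.0],
    [55.0, 150.0, 260.0, 70.0],
    [35.0, 90.0, 340.0, 95.0],
]
obs = [50.0, 130.0, 320.0, 85.0]

ens = ensemble_mean(models)
print("r(ensemble, obs) =", round(pearson_r(ens, obs), 3))
# Inter-model spread on the most polluted day (index 2)
print("CV on day 3 =", round(coefficient_of_variation([m[2] for m in models]), 3))
```

A CV above 1 means the spread among models exceeds their mean value, which is why the values reported for sulfate, nitrate, and organic aerosols indicate weak inter-model agreement.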
These differences in composition and AOD suggest that future models could be improved by including new heterogeneous or aqueous pathways for sulfate and nitrate formation under hazy conditions; a secondary organic aerosol (SOA) formation mechanism that incorporates new volatile organic compound (VOC) precursors, yield data, and approaches; and a more detailed evaluation of how aerosol optical properties depend on size distribution and mixing state. It was also found that the ensemble mean of the models yielded the best prediction skill. While this has been shown under other conditions, for example in predicting high-ozone events in the US (McKeen et al., 2005), to our knowledge this is the first time it has been shown for heavy haze events.
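A minimal sketch of how an ensemble mean can outperform its individual members follows. It assumes skill is measured by root-mean-square error (RMSE) against observations and that the ensemble mean is unweighted; the abstract does not state which skill metric or weighting was used. The model names and series are hypothetical toy values chosen so that member errors partly cancel.

```python
import math
from statistics import mean


def rmse(pred, obs):
    """Root-mean-square error between paired prediction and observation series."""
    if len(pred) != len(obs) or not obs:
        raise ValueError("series must be non-empty and the same length")
    return math.sqrt(mean((p - o) ** 2 for p, o in zip(pred, obs)))


def ensemble_mean(model_series):
    """Unweighted arithmetic mean across models at each time step (assumed)."""
    return [mean(step) for step in zip(*model_series)]


def compare_skill(model_series, obs):
    """Return per-member RMSE and the RMSE of the ensemble mean."""
    members = {name: rmse(series, obs) for name, series in model_series.items()}
    ens = rmse(ensemble_mean(list(model_series.values())), obs)
    return members, ens


# Hypothetical members with partly cancelling errors; illustrative numbers only.
obs = [60.0, 180.0, 350.0, 90.0]
runs = {
    "model_a": [75.0, 210.0, 300.0, 100.0],
    "model_b": [50.0, 150.0, 390.0, 85.0],
    "model_c": [65.0, 200.0, 330.0, 70.0],
}
members, ens = compare_skill(runs, obs)
for name, score in members.items():
    print(f"{name}: RMSE = {score:.1f}")
print(f"ensemble mean: RMSE = {ens:.1f}")
```

Because squared error is convex, the ensemble mean's squared error at each time step can never exceed the average of the members' squared errors. The ensemble mean is not, however, guaranteed to beat the best single member; that depends on the members' errors partly cancelling, which is why the result for haze events is an empirical finding.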