TY - GEN
T1 - Comparative Study of Classifier-Guided, Conditional and Classifier-Free Diffusion Models for Minecraft Scene Generation
AU - Lu, Xuanchen
N1 - Publisher copyright: © 2025 Copyright held by the owner/author(s).
PY - 2025/12/19
Y1 - 2025/12/19
N2 - Diffusion models have recently emerged as a powerful class of generative models, achieving state-of-the-art performance in many image synthesis tasks. While unconditional denoising diffusion probabilistic models (DDPMs) can generate high-quality images, they cannot produce outputs aligned with specific user-defined semantics. To enable controllable generation, several conditional extensions have been proposed, including Conditional DDPMs, Classifier-Guided DDPMs, and Classifier-Free DDPMs. However, systematic comparisons of these methods under consistent training and evaluation settings remain limited. This paper conducts a comprehensive comparative analysis of the three conditional diffusion frameworks in the context of Minecraft scene generation. Using a curated version of the MineRL dataset, all models are trained under identical architectural and hyperparameter configurations. Their performance is evaluated using quantitative metrics, namely Fréchet Inception Distance (FID), Inception Score (IS), and CLIP-based similarity, together with qualitative visual inspection. The results show that the Conditional DDPM achieves the best overall performance in terms of image quality and semantic alignment, as evidenced by its lowest FID and highest CLIP scores. The Classifier-Free model demonstrates slightly better sample diversity, while the Classifier-Guided model underperforms due to instability in gradient-based guidance. Visual comparisons also confirm that the Conditional DDPM produces the most perceptually coherent and class-consistent outputs. This work provides the first detailed benchmark of conditional diffusion strategies in structured scene generation and offers practical insights into their relative strengths, limitations, and applicability in real-world generation tasks.
AB - Diffusion models have recently emerged as a powerful class of generative models, achieving state-of-the-art performance in many image synthesis tasks. While unconditional denoising diffusion probabilistic models (DDPMs) can generate high-quality images, they cannot produce outputs aligned with specific user-defined semantics. To enable controllable generation, several conditional extensions have been proposed, including Conditional DDPMs, Classifier-Guided DDPMs, and Classifier-Free DDPMs. However, systematic comparisons of these methods under consistent training and evaluation settings remain limited. This paper conducts a comprehensive comparative analysis of the three conditional diffusion frameworks in the context of Minecraft scene generation. Using a curated version of the MineRL dataset, all models are trained under identical architectural and hyperparameter configurations. Their performance is evaluated using quantitative metrics, namely Fréchet Inception Distance (FID), Inception Score (IS), and CLIP-based similarity, together with qualitative visual inspection. The results show that the Conditional DDPM achieves the best overall performance in terms of image quality and semantic alignment, as evidenced by its lowest FID and highest CLIP scores. The Classifier-Free model demonstrates slightly better sample diversity, while the Classifier-Guided model underperforms due to instability in gradient-based guidance. Visual comparisons also confirm that the Conditional DDPM produces the most perceptually coherent and class-consistent outputs. This work provides the first detailed benchmark of conditional diffusion strategies in structured scene generation and offers practical insights into their relative strengths, limitations, and applicability in real-world generation tasks.
KW - Classifier-free DDPM
KW - Classifier-guided DDPM
KW - Conditional DDPM
KW - Controllable Generation
KW - Diffusion Model
KW - Image Synthesis
UR - https://www.scopus.com/pages/publications/105026340448
U2 - 10.1145/3773365.3773566
DO - 10.1145/3773365.3773566
M3 - Conference proceeding
SN - 9798400718748
T3 - CISAI: Computer Information Science and Artificial Intelligence
SP - 1264
EP - 1269
BT - CISAI '25: Proceedings of the 2025 8th International Conference on Computer Information Science and Artificial Intelligence
PB - Association for Computing Machinery (ACM)
CY - New York, NY, USA
ER -