Align 3D Representation and Text Embedding for 3D Content Personalization

Qi Song, Ziyuan Luo, Ka Chun Cheung, Simon See, Renjie Wan*

*Corresponding author for this work

Research output: Chapter in book/report/conference proceeding › Conference proceeding › peer-review

Abstract

Recent advances in NeRF and 3DGS have significantly enhanced the efficiency and quality of 3D content synthesis. However, efficient personalization of generated 3D content remains a critical challenge. Current 3D personalization approaches predominantly rely on knowledge distillation-based methods, which require computationally expensive retraining procedures. To address this challenge, we propose Invert3D, a novel framework for convenient 3D content personalization. Vision-language models such as CLIP already enable direct image personalization through aligned vision-text embedding spaces. However, the inherent structural differences between 3D content and 2D images preclude direct application of these techniques to 3D personalization. Our approach bridges this gap by establishing alignment between 3D representations and text embedding spaces. Specifically, we develop a camera-conditioned 3D-to-text inverse mechanism that projects 3D content into a 3D embedding aligned with text embeddings. This alignment enables efficient manipulation and personalization of 3D content through natural language prompts, eliminating the need for computationally expensive retraining procedures. Extensive experiments demonstrate that Invert3D achieves effective personalization of 3D content.
Original language: English
Title of host publication: MM 2025: Proceedings of the 33rd ACM International Conference on Multimedia
Place of Publication: New York
Publisher: Association for Computing Machinery (ACM)
Pages: 12199-12208
Number of pages: 10
ISBN (Print): 9798400720352
DOIs
Publication status: Published - 27 Oct 2025
Event: 33rd ACM International Conference on Multimedia, ACMMM25 - Dublin Royal Convention Centre, Dublin, Ireland
Duration: 27 Oct 2025 – 31 Oct 2025
https://whova.com/embedded/event/sa54pNCpHUFy1OTIEiEzceQu5kPuSm3dYlEnqAJdV4o%3D/?utc_source=ems (Conference program)
https://acmmm2025.org/ (Conference website)
https://dl.acm.org/doi/proceedings/10.1145/3746027 (Conference proceedings)

Publication series

Name: MM: International Multimedia Conference
Publisher: Association for Computing Machinery

Conference

Conference: 33rd ACM International Conference on Multimedia, ACMMM25
Country/Territory: Ireland
City: Dublin
Period: 27/10/25 – 31/10/25

User-Defined Keywords

  • Embedding Alignment
  • 3D-to-Text
  • Personalized 3D Generation
