GLOSSARY
3D Diffusion Models
3D diffusion models apply the same denoising framework that powers Stable Diffusion to 3D representations — multi-view images, triplanes, voxels, or point clouds — which are then converted to a mesh in a final step.
Definition
Image diffusion models learn to reverse a gradual noising process on 2D images. 3D diffusion adapts the idea to 3D representations. Common variants:
- Multi-view diffusion: denoise several view images jointly to keep them consistent (Zero123++, MVDream)
- Triplane diffusion: denoise three orthogonal feature planes that are decoded into geometry (Rodin, SSDNeRF)
- Point cloud diffusion: denoise positions of a fixed-size set of points (Point-E)
- Latent 3D diffusion: denoise in a learned compressed 3D latent space (Trellis, Hunyuan3D-2)
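All four variants share the same forward/reverse machinery; only the thing being denoised changes. The sketch below runs a DDPM-style forward noising and reverse denoising loop on a toy point cloud. It is illustrative, not any specific model: the linear noise schedule, step count, and the "oracle" noise predictor (which stands in for a trained network epsilon_theta) are all assumptions made for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 50
betas = np.linspace(1e-4, 0.2, T)      # illustrative linear noise schedule
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)         # cumulative product, \bar{alpha}_t

def q_sample(x0, t):
    """Forward process: jump straight to noise level t via the closed form
    x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

def reverse_step(xt, t, eps_hat):
    """One reverse (denoising) step given a noise prediction eps_hat."""
    mean = (xt - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps_hat) / np.sqrt(alphas[t])
    if t > 0:                          # no noise is added at the final step
        mean += np.sqrt(betas[t]) * rng.normal(size=xt.shape)
    return mean

x0 = rng.normal(size=(256, 3)) * 0.5   # stand-in "clean" point cloud
x = q_sample(x0, T - 1)                # fully noised points
for t in range(T - 1, -1, -1):
    # Oracle predictor: the exact noise relating x back to x0. A trained
    # network epsilon_theta(x, t) replaces this line in a real model.
    eps_hat = (x - np.sqrt(alpha_bar[t]) * x0) / np.sqrt(1.0 - alpha_bar[t])
    x = reverse_step(x, t, eps_hat)

print(np.allclose(x, x0))              # prints True
```

Swapping the `(256, 3)` array for view images, triplane features, or a voxel grid changes nothing in this loop — that interchangeability is why the variants above are all "3D diffusion."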
Why it matters
Most current production-quality text-to-3D and image-to-3D systems are diffusion-based. Diffusion models scale with data, generalize beyond their training categories, and condition on text or images cleanly — and a trained model generates an asset in seconds, where per-asset score-distillation (SDS) optimization pipelines like DreamFusion take minutes to hours. The leading open models in 2025–2026 (Hunyuan3D, Trellis, Stable Zero123) all use diffusion in some form.
Common confusion
A 3D diffusion model rarely outputs a finished mesh directly. The diffusion stage produces an intermediate representation — views, points, voxels — and a separate stage extracts a mesh. Quality and printability depend heavily on this second stage.
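To make the two-stage split concrete, here is a sketch of a mesh-extraction stage for one possible intermediate output, a voxel occupancy grid. Real pipelines use marching cubes or a learned decoder; this deliberately simple "blocky" extractor (two triangles per exposed voxel face) is a hypothetical stand-in, and every name in it is an assumption.

```python
import numpy as np

def voxels_to_mesh(occ):
    """Turn a boolean 3D occupancy grid into a blocky triangle mesh:
    emit two triangles for every voxel face that borders empty space."""
    verts, faces = [], []
    # Unit-cube corner offsets for the +x/-x/+y/-y/+z/-z faces.
    FACE = {
        (1, 0, 0): [(1,0,0), (1,1,0), (1,1,1), (1,0,1)],
        (-1, 0, 0): [(0,0,0), (0,0,1), (0,1,1), (0,1,0)],
        (0, 1, 0): [(0,1,0), (0,1,1), (1,1,1), (1,1,0)],
        (0, -1, 0): [(0,0,0), (1,0,0), (1,0,1), (0,0,1)],
        (0, 0, 1): [(0,0,1), (1,0,1), (1,1,1), (0,1,1)],
        (0, 0, -1): [(0,0,0), (0,1,0), (1,1,0), (1,0,0)],
    }
    nx, ny, nz = occ.shape
    for x, y, z in zip(*np.nonzero(occ)):
        for (dx, dy, dz), corners in FACE.items():
            n = (x + dx, y + dy, z + dz)
            inside = 0 <= n[0] < nx and 0 <= n[1] < ny and 0 <= n[2] < nz
            if inside and occ[n]:
                continue                       # interior face, skip it
            i = len(verts)
            verts += [(x + cx, y + cy, z + cz) for cx, cy, cz in corners]
            faces += [(i, i + 1, i + 2), (i, i + 2, i + 3)]
    return np.array(verts, float), np.array(faces, int)

occ = np.zeros((4, 4, 4), bool)
occ[1:3, 1:3, 1:3] = True                      # a 2x2x2 solid block
v, f = voxels_to_mesh(occ)
print(len(f))                                  # prints 48: 24 exposed faces x 2
```

Run marching cubes instead of this face extractor and the same grid yields a smooth surface — which is exactly why two pipelines with identical diffusion stages can differ sharply in mesh quality and printability.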