HyperSDFusion: Bridging Hierarchical Structures in Language and Geometry for Enhanced 3D Text2Shape Generation

1 Technical University of Munich   2 Beihang University
3 Imperial College London   4 Zhongguancun Laboratory
CVPR 2024
HyperSDFusion learns hierarchical representations of text and shape in hyperbolic space, bridging the hierarchical structures in language and geometry for enhanced 3D Text2Shape generation. Given a set of general and detailed texts, HyperSDFusion generates 3D shapes that are faithful to the text descriptions and exhibit a hierarchical structure.

Figure 1: (a) The text-shape hierarchy. (b) The syntactic tree of text. (c) The hierarchical structure of a 3D shape. (d) Visualization of the text-shape hierarchical structure of generated 3D shapes.

Abstract

3D shape generation from text is a fundamental task in 3D representation learning. Text-shape pairs exhibit a hierarchical structure, where a general text like "chair" covers all 3D chair shapes, while more detailed prompts refer to more specific shapes. Furthermore, both text and 3D shapes are inherently hierarchical structures. However, existing Text2Shape methods, such as SDFusion, do not exploit these hierarchies. In this work, we propose HyperSDFusion, a dual-branch diffusion model that generates 3D shapes from a given text. Since hyperbolic space is suitable for handling hierarchical data, we propose to learn the hierarchical representations of text and 3D shapes in hyperbolic space. First, we introduce a hyperbolic text-image encoder to learn the sequential and multi-modal hierarchical features of text in hyperbolic space. In addition, we design a hyperbolic text-graph convolution module to learn the hierarchical features of text in hyperbolic space. To fully utilize these text features, we introduce a dual-branch structure that embeds text features in 3D feature space. Finally, to endow the generated 3D shapes with a hierarchical structure, we devise a hyperbolic hierarchical loss. Our method is the first to explore hyperbolic hierarchical representations for text-to-shape generation. Experiments on the existing text-to-shape paired dataset, Text2Shape, show that our method achieves state-of-the-art results.
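As a rough illustration of why hyperbolic space is often considered suitable for hierarchical data, the sketch below computes geodesic distances in the Poincaré ball. The abstract does not say which hyperbolic model HyperSDFusion uses, so the Poincaré ball, the function names, and the sample points are all assumptions made for this example; it is not the paper's implementation. It shows only that the same Euclidean gap corresponds to a much larger hyperbolic distance near the boundary of the ball than near the origin, which leaves room for many fine-grained items to sit far apart from each other.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit Poincare ball.

    Assumption: curvature -1 and the Poincare ball model; the source text
    does not specify which hyperbolic model is used.
    """
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    arg = 1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v))
    return math.acosh(arg)


def euclidean_distance(u, v):
    """Ordinary Euclidean distance, for comparison."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical 2D embeddings: one pair near the origin, one pair near the boundary.
near_origin = ((0.0, 0.0), (0.1, 0.0))
near_boundary = ((0.85, 0.0), (0.95, 0.0))

if __name__ == "__main__":
    for name, (p, q) in [("near origin", near_origin), ("near boundary", near_boundary)]:
        print(name, "euclidean:", euclidean_distance(p, q),
              "hyperbolic:", poincare_distance(p, q))
```

Both pairs are separated by the same Euclidean gap, yet the pair near the boundary is several times farther apart in hyperbolic distance. A common convention in hyperbolic embeddings, assumed here rather than taken from the text, places general concepts near the origin and specific ones near the boundary.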

The proposed HyperSDFusion

Text-to-shape generation results


Figure 4: (a) Text-to-shape generation results compared to SDFusion. (b) More generation results, especially those generated from long and complex text.


Visualizations of Hierarchical Learning


Figure 5: More visualizations for capturing text-shape hierarchy.

Figure 6: (a) 2D text embeddings learned by SDFusion in Euclidean space. (b) 2D text embeddings learned by our method in hyperbolic space. (c) Magnified view of (b). (d) 3D shape features generated from general and detailed texts in hyperbolic space.


BibTeX


    @InProceedings{HyperSDFusion_2023_CVPR,
    author    = {Leng, Zhiying and Birdal, Tolga and Liang, Xiaohui and Tombari, Federico},
    title     = {HyperSDFusion: Bridging Hierarchical Structures in Language and Geometry for Enhanced 3D Text2Shape Generation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
}
