
Meet SceneTex: A Novel AI Method for High-Quality, Style-Consistent Texture Generation in Indoor Scenes


High-quality 3D content synthesis is a crucial yet challenging problem for many applications, including autonomous driving, robotic simulation, gaming, filmmaking, and future VR/AR scenarios. 3D geometry generation has seen a surge of research interest from the computer vision and graphics communities, driven by the growing availability of 3D content datasets. Yet while geometric modeling has come a long way, authoring object appearances, i.e., textures, still requires considerable manual effort: textures take substantial time to create and edit, and demand extensive experience with 3D modeling tools such as Blender.

As a result, the high demand for human expertise and the associated costs have kept automatic texture design and augmentation from reaching full industrialization. Considerable progress has been made in text-to-3D generation by leveraging recent advances in 2D diffusion models, particularly for texture synthesis over predefined shapes. Two seminal works, Text2Tex and Latent-Paint, produce high-quality object appearances and enable high-fidelity texture synthesis from input prompts. Although these approaches yield impressive results for individual objects, scaling them up to generate textures for an entire scene still presents several difficulties.

On the one hand, autoregressive methods that iteratively project 2D views onto 3D object surfaces commonly suffer from texture seams, accumulated artifacts, and loop-closure problems, and maintaining a uniform style across the scene is difficult when each object is textured independently. On the other hand, score-distillation-based methods optimize the texture in a low-resolution latent space, which frequently leads to inaccurate geometric details and blurry RGB textures. As a result, prior text-driven approaches struggle to produce high-quality 3D scene textures.
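To make the score-distillation idea concrete: such methods render the current texture, perturb the render with noise, and use a frozen 2D diffusion model's denoising prediction as a gradient signal. Below is a minimal sketch of one such update, assuming a generic `diffusion_model` with hypothetical `add_noise` and `predict_noise` helpers; it illustrates the general technique, not SceneTex's implementation.

```python
import torch

def score_distillation_grad(rendered_latent, diffusion_model, text_embed):
    """One score-distillation update: nudge the rendered (latent) image
    toward the text-conditioned diffusion prior (hypothetical API)."""
    # Sample a random diffusion timestep and matching Gaussian noise.
    t = torch.randint(20, 980, (1,), device=rendered_latent.device)
    noise = torch.randn_like(rendered_latent)

    # Noise the current render, then let the frozen diffusion model
    # predict that noise, conditioned on the text embedding.
    noisy = diffusion_model.add_noise(rendered_latent, noise, t)
    with torch.no_grad():
        pred_noise = diffusion_model.predict_noise(noisy, t, text_embed)

    # The gradient is the residual between predicted and injected noise,
    # applied directly to the render (the model's Jacobian is skipped).
    w = 1.0  # timestep-dependent weight, simplified to a constant here
    return w * (pred_noise - noise)
```

Because this optimization typically happens in a low-resolution latent space, the recovered textures tend to lose fine detail, which is precisely the limitation SceneTex targets.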

To overcome these issues, researchers from the Technical University of Munich and Snap Research propose SceneTex, a novel framework that leverages depth-to-image diffusion priors to produce high-quality, style-consistent textures for indoor scene meshes. In contrast to prior techniques that iteratively warp 2D views onto mesh surfaces, SceneTex frames texture generation as a texture optimization problem in RGB space guided by diffusion priors. At its core, the framework introduces a multiresolution texture field that implicitly represents the mesh's appearance: texture features are stored at multiple scales, allowing the model to adaptively learn both low- and high-frequency appearance information. To ensure stylistic consistency of the generated texture, the researchers further employ a cross-attention decoder that mitigates the style incoherence caused by self-occlusion.
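To illustrate what a multiresolution texture field could look like in code, here is a minimal sketch assuming UV-parameterized learnable feature grids at several resolutions and a small MLP that decodes the blended features to RGB. All names, shapes, and resolutions are illustrative assumptions, not the authors' implementation (SceneTex decodes with a cross-attention module rather than a plain MLP).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiResTextureField(nn.Module):
    """Illustrative multiresolution texture field: learnable UV feature
    grids at several resolutions, concatenated and decoded to RGB."""
    def __init__(self, resolutions=(64, 128, 256, 512), feat_dim=8):
        super().__init__()
        self.grids = nn.ParameterList([
            nn.Parameter(torch.randn(1, feat_dim, r, r) * 0.01)
            for r in resolutions
        ])
        self.decoder = nn.Sequential(
            nn.Linear(feat_dim * len(resolutions), 64), nn.ReLU(),
            nn.Linear(64, 3), nn.Sigmoid(),  # RGB in [0, 1]
        )

    def forward(self, uv):
        # uv: (N, 2) in [0, 1]; grid_sample expects coords in [-1, 1].
        coords = uv.view(1, -1, 1, 2) * 2.0 - 1.0
        feats = [
            F.grid_sample(g, coords, align_corners=True)
             .view(g.shape[1], -1).t()              # (N, feat_dim) per scale
            for g in self.grids
        ]
        return self.decoder(torch.cat(feats, dim=-1))  # (N, 3) RGB
```

The coarse grids capture low-frequency appearance while the finer grids add high-frequency detail, which is the intuition behind storing texture features at several scales.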

Concretely, each decoded RGB value is produced by cross-attending to pre-sampled reference surface locations distributed across each object. Because every visible location receives a global reference to the whole instance's appearance, global style uniformity is maintained within each object. The researchers demonstrate that SceneTex enables accurate and flexible texture generation for indoor scenes from given text prompts, and extensive experiments show that it achieves strong style and geometric consistency. In user studies on a subset of the 3D-FRONT dataset, the proposed technique outperforms alternative text-driven texture generation methods on 2D metrics such as CLIP and Inception scores.
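A minimal sketch of such a cross-attention decoding step is shown below, assuming per-point query features for visible surface locations and a fixed set of pre-sampled reference features per instance; shapes and module names are illustrative assumptions rather than the released code.

```python
import torch
import torch.nn as nn

class CrossAttentionTextureDecoder(nn.Module):
    """Illustrative decoder: each visible surface point (query) attends to
    pre-sampled reference surface features of its instance (keys/values),
    so every decoded color sees a global summary of the object's style."""
    def __init__(self, feat_dim=64, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.to_rgb = nn.Sequential(nn.Linear(feat_dim, 3), nn.Sigmoid())

    def forward(self, query_feats, reference_feats):
        # query_feats:     (B, N_visible, feat_dim) features of visible points
        # reference_feats: (B, N_ref, feat_dim) pre-sampled instance features
        attended, _ = self.attn(query_feats, reference_feats, reference_feats)
        return self.to_rgb(attended)  # (B, N_visible, 3) RGB values

# Example: 1,024 visible points attending to 256 reference samples.
decoder = CrossAttentionTextureDecoder()
rgb = decoder(torch.randn(1, 1024, 64), torch.randn(1, 256, 64))  # (1, 1024, 3)
```

Because the keys and values summarize the whole instance, two points on opposite, mutually occluded sides of an object are decoded against the same reference set, which is what discourages style drift across self-occlusions.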
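On the evaluation side, a CLIP score of the kind reported here is commonly computed as the cosine similarity between embeddings of rendered views and the text prompt. Below is a rough sketch using the Hugging Face CLIP model; this is one plausible way to compute such a score, not necessarily the exact protocol used in the paper.

```python
import torch
from transformers import CLIPModel, CLIPProcessor

def clip_score(images, prompt):
    """Illustrative CLIP score: mean cosine similarity between rendered-view
    embeddings and the text-prompt embedding (higher = better alignment)."""
    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
    inputs = processor(text=[prompt], images=images,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return (img @ txt.t()).mean().item()
```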

The research team’s technical contributions can be summarized as follows:

• The research team proposes a novel framework that uses depth-to-image diffusion priors to produce high-quality scene textures at high resolution.

• The research team proposes an implicit texture field that represents an object’s appearance with a multiresolution texture, capturing rich texture details at several scales.

• The research team introduces a cross-attention texture decoder that enforces global style consistency for each instance, producing more visually appealing and style-consistent textures for 3D-FRONT scenes than earlier synthesis techniques.


Check out the Paper, GitHub, and Project page. All credit for this research goes to the researchers of this project.



Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing his undergraduate degree in Data Science and Artificial Intelligence from the Indian Institute of Technology (IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is image processing, and he is passionate about building solutions around it. He loves to connect with people and collaborate on interesting projects.



