Official implementation for the paper: TransNormal: Dense Visual Semantics for Diffusion-based Transparent Object Normal Estimation.
Mingwei Li<sup>1,2</sup>, Hehe Fan<sup>1</sup>, Yi Yang<sup>1</sup>
1Zhejiang University, 2Zhongguancun Academy
- [2026-02-06]: TransNormal-Synthetic dataset released on HuggingFace. [Dataset]
- [2026-02-03]: arXiv paper released. [arXiv]
- [2026-01-30]: Project page updated. Code and dataset will be released soon.
Qualitative comparisons on transparent object normal estimation with multiple baselines.
Overview of TransNormal: dense visual semantics guide diffusion-based single-step normal prediction with wavelet regularization.
- Python >= 3.8
- PyTorch >= 2.0.0
- CUDA >= 11.8 (recommended for GPU inference)
Tested Environment:
- NVIDIA Driver: 580.65.06
- CUDA: 13.0
- PyTorch: 2.4.0+cu121
- Python: 3.10
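To compare a local setup against this configuration, a quick check like the following can be used (the versions in the comments mirror the tested environment above; they are not hard requirements):

```python
# Print local versions for comparison with the tested environment listed above.
import sys
import torch

print("Python :", sys.version.split()[0])      # tested: 3.10
print("PyTorch:", torch.__version__)           # tested: 2.4.0+cu121
print("CUDA   :", torch.version.cuda)          # CUDA version PyTorch was built with
print("GPU    :", torch.cuda.is_available())   # GPU strongly recommended for inference
```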
# Clone the repository
git clone https://github.com/longxiang-ai/TransNormal.git
cd TransNormal
# Create and activate conda environment
conda create -n TransNormal python=3.10 -y
conda activate TransNormal
# Install dependencies
pip install -r requirements.txt
pip install huggingface_hub
# Download TransNormal model
python -c "from huggingface_hub import snapshot_download; snapshot_download('Longxiang-ai/TransNormal', local_dir='./weights/transnormal')"
⚠️ Important: DINOv3 weights require access approval from Meta AI.
- Visit Meta AI DINOv3 Downloads to request access
- After approval, download the ViT-H+/16 distilled model
- Or use HuggingFace Transformers (version >= 4.56.0):
python -c "from huggingface_hub import snapshot_download; snapshot_download('facebook/dinov3-vith16plus-pretrain-lvd1689m', local_dir='./weights/dinov3_vith16plus')"See weights/README.md for detailed instructions.
from transnormal import TransNormalPipeline, create_dino_encoder
import torch
# Create DINO encoder
# Note: Use bfloat16 instead of float16 to avoid NaN issues with DINOv3
dino_encoder = create_dino_encoder(
    model_name="dinov3_vith16plus",
    weights_path="./weights/dinov3_vith16plus",
    projector_path="./weights/transnormal/cross_attention_projector.pt",
    device="cuda",
    dtype=torch.bfloat16,
)
# Load pipeline
pipe = TransNormalPipeline.from_pretrained(
    "./weights/transnormal",
    dino_encoder=dino_encoder,
    torch_dtype=torch.bfloat16,
)
pipe = pipe.to("cuda")
# Run inference
normal_map = pipe(
    image="path/to/image.jpg",
    output_type="np",  # "np", "pil", or "pt"
)
# Save result
from transnormal import save_normal_map
save_normal_map(normal_map, "output_normal.png")
Single Image:
python inference.py \
    --image path/to/image.jpg \
    --output normal.png \
    --model_path ./weights/transnormal \
    --dino_path ./weights/dinov3_vith16plus \
    --projector_path ./weights/cross_attention_projector.pt
Batch Processing:
python inference_batch.py \
    --input_dir ./examples/input \
    --output_dir ./examples/output \
    --model_path ./weights/transnormal \
    --dino_path ./weights/dinov3_vith16plus
Launch an interactive web interface:
python gradio_app.py --port 7860
Then open http://localhost:7860 in your browser. Use --share for a public link.
The output normal map represents surface normals in the camera coordinate system:
- X (Red channel): Left direction (positive = left)
- Y (Green channel): Up direction (positive = up)
- Z (Blue channel): Out of screen (positive = towards viewer)
Output values are in the range [0, 1], where 0.5 corresponds to a zero component along each axis.
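For downstream use, a stored map can be converted back to unit normal vectors with the conventions above (a minimal sketch; it assumes an 8-bit H x W x 3 PNG such as the one written by save_normal_map in the Quick Start):

```python
# Convert a saved normal map (values in [0, 1], 0.5 == zero) back to unit vectors.
# Assumes an 8-bit H x W x 3 PNG; the exact file layout is an assumption for illustration.
import numpy as np
from PIL import Image

img = np.asarray(Image.open("output_normal.png"), dtype=np.float32) / 255.0  # [0, 1]
normals = img * 2.0 - 1.0                                  # map [0, 1] -> [-1, 1] per axis
normals /= np.linalg.norm(normals, axis=-1, keepdims=True) + 1e-8  # re-normalize to unit length
print(normals.shape, normals.min(), normals.max())
```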
Benchmark results on a single GPU (averaged over multiple runs):
| Precision | Time (ms) | FPS | Peak Mem (MB) | Model Load (MB) |
|---|---|---|---|---|
| BF16 | 248 | 4.0 | 11098 | 7447 |
| FP16 | 248 | 4.0 | 11098 | 7447 |
| FP32 | 615 | 1.6 | 10468 | 8256 |
Note: BF16 is recommended over FP16 to avoid potential NaN issues with DINOv3.
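To obtain comparable measurements on another GPU, a rough timing loop like the following can be used (a sketch; it assumes `pipe` is loaded on CUDA as in the Quick Start, and warm-up/averaging details may differ from the original benchmark):

```python
# Rough latency / peak-memory measurement; assumes `pipe` is loaded on CUDA
# as in the Quick Start and that an example image path is available.
import time
import torch

_ = pipe(image="path/to/image.jpg", output_type="np")  # warm-up run
torch.cuda.reset_peak_memory_stats()

runs = 10
torch.cuda.synchronize()
start = time.perf_counter()
for _ in range(runs):
    _ = pipe(image="path/to/image.jpg", output_type="np")
torch.cuda.synchronize()
elapsed = (time.perf_counter() - start) / runs

print(f"avg time: {elapsed * 1000:.1f} ms  ({1.0 / elapsed:.1f} FPS)")
print(f"peak mem: {torch.cuda.max_memory_allocated() / 1024**2:.0f} MB")
```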
We introduce TransNormal-Synthetic, a physics-based dataset of transparent labware with rich annotations.
Download: HuggingFace
| Property | Value |
|---|---|
| Total views | 4,000 |
| Scenes | 10 |
| Image resolution | 800 x 800 |
| Format | WebDataset (.tar shards) |
| Total size | ~7.5 GB |
| License | CC BY-NC 4.0 |
Each sample contains paired RGB images (with/without transparent objects), surface normal maps, depth maps, object masks (all / transparent-only), material-changed RGB, and camera metadata (intrinsics).
import webdataset as wds
dataset = wds.WebDataset(
    "hf://datasets/Longxiang-ai/TransNormal-Synthetic/transnormal-{000000..000007}.tar"
).decode("pil")
for sample in dataset:
    rgb = sample["with_rgb.png"]
    normal = sample["with_normal.png"]
    mask = sample["with_mask_transparent.png"]
    break
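For repeated training runs it may be preferable to download the shards once instead of streaming them; a minimal sketch using huggingface_hub (the target directory ./data/transnormal-synthetic is just an example path):

```python
# Download all shards locally once, then point WebDataset at the local files.
# The target directory ./data/transnormal-synthetic is an arbitrary example path.
from huggingface_hub import snapshot_download
import webdataset as wds

local_dir = snapshot_download(
    "Longxiang-ai/TransNormal-Synthetic",
    repo_type="dataset",
    local_dir="./data/transnormal-synthetic",
)
dataset = wds.WebDataset(
    f"{local_dir}/transnormal-{{000000..000007}}.tar"
).decode("pil")
```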
If you find our work useful, please consider citing:
@misc{li2026transnormal,
    title={TransNormal: Dense Visual Semantics for Diffusion-based Transparent Object Normal Estimation},
    author={Mingwei Li and Hehe Fan and Yi Yang},
    year={2026},
    eprint={2602.00839},
    archivePrefix={arXiv},
    primaryClass={cs.CV},
    url={https://arxiv.org/abs/2602.00839},
}
This work builds upon:
- Lotus - Diffusion-based depth and normal estimation
- DINOv3 - Self-supervised vision transformer from Meta AI
- Stable Diffusion 2 - Base diffusion model
This project is licensed under CC BY-NC 4.0 (Creative Commons Attribution-NonCommercial 4.0). See the LICENSE file for details.
For commercial licensing inquiries, please contact the authors.
For questions or issues, please open a GitHub issue or contact the authors.
