Chen Zhang1,* | Wencheng Han2,* | Yang Zhou1 | Jianbing Shen2,† | Cheng-zhong Xu2 | Wentao Liu1,†
1 SenseTime Research and Tetras.AI, 2 SKL-IOTSC, CIS, University of Macau
* Equal Contribution. † Corresponding Authors.
- We propose a new architecture for RAW video de-rendering. It can efficiently de-render a RAW video sequence using only a single RAW frame and the sRGB video as input, which significantly improves both the storage and computation efficiency of RAW video capture.
- We propose a new benchmark to comprehensively evaluate methods for RAW video de-rendering. To our knowledge, this is the first benchmark specifically designed for this task.
The framework consists of two main stages (a schematic sketch follows this list):
- Temporal Affinity Prior Extraction: This stage generates a reference RAW image by leveraging motion information between adjacent frames.
- Spatial Feature Fusion and Mapping: Using the reference RAW as the initial state, a pixel-level mapping function is learned to refine inaccurately predicted pixels from the first stage. This process incorporates guidance from the sRGB image and preceding frames.
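To make the two-stage design concrete, here is a minimal PyTorch-style sketch. The function and module names (`warp_by_flow`, `SpatialFusionMapping`), the channel counts, the tiny conv network, and the backward-flow convention are all illustrative assumptions, not the released implementation:

```python
# Schematic sketch of the two-stage de-rendering pipeline (illustrative only;
# names, shapes, and the flow convention are assumptions, not the released code).
import torch
import torch.nn as nn
import torch.nn.functional as F


def warp_by_flow(raw_prev, flow):
    """Stage 1 (temporal affinity prior): warp the stored previous RAW frame to
    the current time step using a backward optical flow of shape (B, 2, H, W)."""
    b, _, h, w = raw_prev.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(raw_prev.device)   # (2, H, W)
    coords = grid.unsqueeze(0) + flow                                 # (B, 2, H, W)
    # normalize sampling coordinates to [-1, 1] for grid_sample
    coords_x = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    grid_n = torch.stack((coords_x, coords_y), dim=-1)                # (B, H, W, 2)
    return F.grid_sample(raw_prev, grid_n, align_corners=True)


class SpatialFusionMapping(nn.Module):
    """Stage 2: refine the warped reference RAW with guidance from the current
    sRGB frame (a tiny conv net standing in for the learned mapping function)."""
    def __init__(self, ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + 3, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 3, 3, padding=1),
        )

    def forward(self, reference_raw, srgb):
        residual = self.net(torch.cat([reference_raw, srgb], dim=1))
        return reference_raw + residual


if __name__ == "__main__":
    raw_prev = torch.rand(1, 3, 256, 256)   # previously stored RAW frame (placeholder channels)
    srgb_cur = torch.rand(1, 3, 256, 256)   # current sRGB frame
    flow = torch.zeros(1, 2, 256, 256)      # precomputed optical flow
    reference_raw = warp_by_flow(raw_prev, flow)                 # stage 1
    raw_pred = SpatialFusionMapping()(reference_raw, srgb_cur)   # stage 2
    print(raw_pred.shape)
```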
- Dataset Release
- Model Release
- Code Release
Our model does not rely on any hard-to-configure packages; you only need PyTorch and a few lightweight dependencies (such as numpy and opencv-python). You can set up the environment with the following steps:
# git clone this repository
git clone https://github.com/zhangchen98/RAW_CVPR24.git
cd RAW_CVPR24
# create an environment
conda create -n videoRaw python=3.8
conda activate videoRaw
pip install -r requirements.txt
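After installation, you can run a quick sanity check for the dependencies mentioned above (torch, numpy, opencv-python):

```python
# Quick sanity check for the core dependencies.
import torch
import numpy as np
import cv2

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("numpy:", np.__version__)
print("opencv:", cv2.__version__)
```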
You can download our RVD dataset from here (code: dk5h). Then put the dataset into the folder ./RVD.
🚩 If you have trouble unzipping, you can use the following command:
sudo apt update
sudo apt install p7zip-full
7z x RVD.zip

The folder structure is as follows:
RVD
├── Part1
│ ├── test
│ │ ├── data.json
│ │ ├── DNG
│ │ ├── flow
│ │ ├── RAW
│ │ ├── sRGB
│ │ └── tags.json
│ └── train
│ ├── data.json
│ ├── DNG
│ ├── flow
│ ├── RAW
│ └── sRGB
└── Part2
├── test
│ ├── data.json
│ ├── flow
│ ├── RAW
│ └── sRGB
└── train
├── data.json
├── flow
├── RAW
└── sRGB
For both subsets, we provide optical flow computed with the unimatch method. In addition, we also provide the original '.DNG' files for RVD-Part1.
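If you want to inspect a split before training, the sketch below assumes only the folder layout shown above; the internal structure of `data.json` and the file naming inside the subfolders are not specified here, so it simply enumerates what is present:

```python
# Minimal sketch for inspecting one RVD split; assumes only the folder layout above.
import json
from pathlib import Path

split = Path("./RVD/Part1/train")

with open(split / "data.json") as f:
    meta = json.load(f)  # exact structure depends on the release
print("data.json loaded:", type(meta).__name__)

for sub in ("sRGB", "RAW", "flow", "DNG"):  # DNG is provided for Part1 only
    folder = split / sub
    if folder.is_dir():
        n_files = sum(f.is_file() for f in folder.rglob("*"))
        print(f"{sub}: {n_files} files")
```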
Since each camera's ISP pipeline is specific, we train the de-rendering model on each sub-dataset separately.
Train the model on the RVD-Part1 dataset:
python3 -u main.py \
--trainset_root='./RVD/Part1/train' \
--testset_root='./RVD/Part1/test' \
--input_size="900,1600" \
--save_dir='./checkpoints/RVD_Part1' \
--batch_size=2 \
--test_freq=20 \
--patch_size=256 \
--load_from='' \
--port=12355 \
--max_epoch=60 \
--num_worker=8 \
--init_lr=0.002 \
--lr_decay_epoch=20 \
--aux_loss_weight=0.5 \
--ssim_loss_weight=1.0 \
--local

Train the model on the RVD-Part2 dataset:
python3 -u main.py \
--trainset_root='./RVD/Part2/train' \
--testset_root='./RVD/Part2/test' \
--input_size="640,1440" \
--save_dir='./checkpoints/RVD_Part2' \
--batch_size=2 \
--test_freq=20 \
--patch_size=256 \
--load_from='' \
--port=12347 \
--max_epoch=60 \
--num_worker=8 \
--init_lr=0.002 \
--lr_decay_epoch=20 \
--aux_loss_weight=0.5 \
--ssim_loss_weight=1.0 \
--local

You can also modify the startup scripts in the 'scripts' folder to use multi-GPU training.
- Download the pretrained models (RVD_Part1.pth, RVD_Part2.pth) from BaiduYun (code: axh6).
- Put the pretrained models in the './pretrain' folder.
- Run the test script:
# test on RVD-Part1
python3 -u main.py \
--trainset_root='./RVD/Part1/train' \
--testset_root='./RVD/Part1/test' \
--input_size="900,1600" \
--save_dir='./checkpoints/RVD_Part1' \
--batch_size=8 \
--test_freq=20 \
--patch_size=256 \
--load_from='./pretrain/RVD_Part1.pth' \
--port=12355 \
--max_epoch=60 \
--num_worker=8 \
--init_lr=0.002 \
--lr_decay_epoch=20 \
--aux_loss_weight=0.5 \
--ssim_loss_weight=1.0 \
--local \
--test_only \
# --save_predict_raw # add this option to save the predicted raw images

# test on RVD-Part2
python3 -u main.py \
--trainset_root='./RVD/Part2/train' \
--testset_root='./RVD/Part2/test' \
--input_size="640,1440" \
--save_dir='./checkpoints/RVD_Part2' \
--batch_size=8 \
--test_freq=20 \
--patch_size=256 \
--load_from='./pretrain/RVD_Part2.pth' \
--port=12347 \
--max_epoch=60 \
--num_worker=8 \
--init_lr=0.002 \
--lr_decay_epoch=20 \
--aux_loss_weight=0.5 \
--ssim_loss_weight=1.0 \
--local \
--test_only \
# --save_predict_raw # add this option to save the predicted raw images

You can find the testing results in the ./checkpoints/RVD_Part1 and ./checkpoints/RVD_Part2 directories.
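If you enable `--save_predict_raw`, you can compare the saved predictions against the ground-truth RAW frames offline. The sketch below is only an illustration: the serialization format of the saved predictions is not documented here, so loading is left to you, and both arrays are assumed to share the same shape and a [0, 1] value range:

```python
# Illustrative PSNR helper for predicted vs. ground-truth RAW frames.
# (Loading of the saved predictions is left to the reader; values are
# assumed to be normalized to [0, 1].)
import numpy as np

def raw_psnr(pred: np.ndarray, gt: np.ndarray, peak: float = 1.0) -> float:
    mse = np.mean((pred.astype(np.float64) - gt.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

# random placeholders standing in for loaded RAW frames
pred = np.random.rand(900, 1600)
gt = np.random.rand(900, 1600)
print(f"PSNR: {raw_psnr(pred, gt):.2f} dB")
```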
Our dataset contains part of the data from the Real-RawVSR Dataset (https://github.com/zmzhang1998/Real-RawVSR); thanks to Yue et al. for their excellent work.
@inproceedings{zhang2024leveraging,
title={Leveraging Frame Affinity for sRGB-to-RAW Video De-rendering},
author={Zhang, Chen and Han, Wencheng and Zhou, Yang and Shen, Jianbing and Xu, Cheng-zhong and Liu, Wentao},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={25659--25668},
year={2024}
}
If you have any questions, please contact: zhangchen2@tetras.ai


