This paper proposes a Quantization-Aware Training (QAT) framework enhanced with object-scale-aware regularization to mitigate accuracy degradation in small-object detection caused by quantization noise, specifically targeting resource-constrained FPGA deployments for geoscience and remote sensing applications.
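As a rough illustration (not the paper's exact formulation), object-scale-aware regularization can be thought of as re-weighting per-object loss terms by object size, so that small objects are not drowned out by quantization noise. A toy PyTorch sketch:

```python
import torch

def scale_aware_weights(boxes, img_size=640.0, alpha=2.0):
    # Toy illustration only: weight each ground-truth box by its relative
    # size so that small objects contribute more to the training objective.
    # `boxes` is an (N, 4) tensor in [x, y, w, h] format; all names and
    # values here are assumptions, not the paper's exact formulation.
    rel_scale = torch.sqrt(boxes[:, 2] * boxes[:, 3]) / img_size  # in (0, 1]
    return (1.0 - rel_scale).clamp(min=0.0) ** alpha

# Usage: scale per-object losses before reduction.
boxes = torch.tensor([[10., 10., 20., 15.], [50., 60., 300., 280.]])
per_object_loss = torch.tensor([0.8, 0.5])
reg_loss = (scale_aware_weights(boxes) * per_object_loss).mean()
```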
Visualization of the building detection task with ground-truth labels (above) and detected building footprints (below). Several small cabins (white rectangles) are newly detected and are absent from the dataset (BBD).
A 3D-printed case is designed to accommodate the Xilinx Kria KV260 FPGA with waterproof sealing, cable management, and air cooling. The board connects to an IP camera for video streaming through the RTSP protocol.
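A client can read such an RTSP stream with standard tools; a minimal OpenCV sketch (the URL is a placeholder for your camera's actual endpoint):

```python
import cv2

# Placeholder URL: substitute your IP camera's actual RTSP endpoint.
cap = cv2.VideoCapture("rtsp://<CAMERA_IP>:554/stream")
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # ... feed `frame` to the detector ...
cap.release()
```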
Some birds detected at Oberpfaffenhofen Airport, Weßling, Germany.
- Docker env

  docker run -e UID=$(id -u) -e GID=$(id -g) --name qatdet --gpus device=0 -d -it --shm-size 32G --mount source=$(pwd),target=/workspace,type=bind tumbgd/vai-pt-cuda
  docker exec -it qatdet bash

- Inside the docker container

  python -m pip install --user -r requirements.txt
  cd code
  python -m pip install --user -v -e .
  cd ..
In this way, the yolox library is installed. The installation is successful if you see the following output:

Installed /workspace/code
Successfully installed yolox

All following steps should be executed in this docker environment until we obtain a compiled .xmodel.
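Optionally, you can sanity-check the installation from inside the container:

python -c "import yolox; print(yolox.__file__)"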
Download the datasets and transform them into COCO format.
Download the dataset from here. We use bbd2k5-images-image.tar.bz2 and bbd2k5-images-umring.tar.bz2 in this project. Unzip them and put them into ./bbd/data. The resulting directory tree should look like this:
.
|-- LICENSE
|-- README.md
`-- bbd
    `-- data
        |-- bbd2k5-images-image
        `-- bbd2k5-images-umring
Then we can generate COCO-format JSON files by

python ./bbd/Mask2COCO.py

For a detailed dataset description, please check here.
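For orientation, the generated file follows the standard COCO detection layout; a minimal sketch of assembling such a file (file and category names here are illustrative, and Mask2COCO.py may differ in detail):

```python
import json

# Skeleton of a COCO-format detection file; the actual output of
# Mask2COCO.py may differ in details. Names and values are illustrative.
coco = {
    "images": [
        {"id": 1, "file_name": "tile_0001.png", "width": 500, "height": 500},
    ],
    "annotations": [
        {
            "id": 1, "image_id": 1, "category_id": 1,
            "bbox": [120.0, 80.0, 40.0, 35.0],  # [x, y, width, height]
            "area": 40.0 * 35.0,
            "iscrowd": 0,
        },
    ],
    "categories": [{"id": 1, "name": "building"}],
}

with open("annotations.json", "w") as f:
    json.dump(coco, f)
```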
Citation:
@inproceedings{10.1145/3589132.3625658,
author = {Werner, Martin and Li, Hao and Zollner, Johann Maximilian and Teuscher, Balthasar and Deuser, Fabian},
title = {Bavaria Buildings - A Novel Dataset for Building Footprint Extraction, Instance Segmentation, and Data Quality Estimation},
year = {2023},
isbn = {9798400701689},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3589132.3625658},
doi = {10.1145/3589132.3625658},
booktitle = {Proceedings of the 31st ACM International Conference on Advances in Geographic Information Systems},
articleno = {108},
numpages = {4},
location = {Hamburg, Germany},
series = {SIGSPATIAL '23}
}
You can find the download link in the dataset repository here. We only use the drone2021 (~63 GB) part of this dataset. Put the unzipped files into ./bird/data. The resulting directory tree should look like this:
.
|-- LICENSE
|-- README.md
`-- bird
    |-- data
    |   |-- annotations
    |   |   `-- *.json
    |   `-- images
    |       |-- 1
    |       |   `-- *.jpg
    |       |-- 2
    |       `-- ...
    `-- usable_images_updater.py
The annotations provided in the original dataset are already in COCO format. We remove images that contain no birds, or only birds that are too small (less than 40 px wide for 4K images), using bird/Filter.py (invoked below).
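For reference, the core of this size-based filtering might look like the following sketch (paths and field handling are assumptions; see bird/Filter.py for the actual logic):

```python
import json

MIN_WIDTH = 40  # minimum bbox width in pixels for 4K images

# The annotation path is an assumption; see bird/Filter.py for the real one.
with open("bird/data/annotations/train.json") as f:
    coco = json.load(f)

# Keep only bird annotations whose boxes are wide enough.
anns = [a for a in coco["annotations"] if a["bbox"][2] >= MIN_WIDTH]
kept = {a["image_id"] for a in anns}

# Drop images that no longer contain any usable bird.
coco["annotations"] = anns
coco["images"] = [im for im in coco["images"] if im["id"] in kept]

with open("bird/data/annotations/train_filtered.json", "w") as f:
    json.dump(coco, f)
```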
python bird/Filter.py

Citation:
@inproceedings{mva2023_sod_challenge,
title={{MVA2023 Small Object Detection Challenge for Spotting Birds: Dataset, Methods, and Results}},
author={Yuki Kondo and Norimichi Ukita and Takayuki Yamaguchi and Hao-Yu Hou and Mu-Yi Shen and Chia-Chi Hsu and En-Ming Huang and Yu-Chen Huang and Yu-Cheng Xia and Chien-Yao Wang and Chun-Yi Lee and Da Huo and Marc A. Kastner and Tingwei Liu and Yasutomo Kawanishi and Takatsugu Hirayama and Takahiro Komamizu and Ichiro Ide and Yosuke Shinya and Xinyao Liu and Guang Liang and Syusuke Yasui},
booktitle={2023 18th International Conference on Machine Vision and Applications (MVA)},
note={\url{https://www.mva-org.jp/mva2023/challenge}},
year={2023}
}
Perform QAT and get .xmodel.
- For BBD:

  bash code/bbd.sh

  After training, you should find YOLOX_0_int.xmodel at ./YOLOX_outputs/bbd/convert_qat_results.

- For MVA2023:

  bash code/bird.sh

  After training, you should find YOLOX_0_int.xmodel at ./YOLOX_outputs/bird/convert_qat_results.
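Under the hood, these scripts follow the Vitis AI PyTorch QAT workflow. A condensed sketch of that flow (a stand-in network replaces the full YOLOX model here, and exact pytorch_nndct arguments may vary across Vitis AI versions):

```python
import torch
import torch.nn as nn
from pytorch_nndct import QatProcessor  # Vitis AI PyTorch QAT API

# Stand-in network; the scripts use the full YOLOX model instead.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(8, 1, 1))
dummy_input = torch.randn(1, 3, 640, 640)  # input resolution is an assumption

# Wrap the model with fake-quantization ops for 8-bit QAT.
qat_processor = QatProcessor(model, dummy_input, bitwidth=8)
quantized_model = qat_processor.trainable_model()

# ... run the usual training loop on `quantized_model` here ...

# Convert to a deployable model, then export the quantized xmodel that
# vai_c_xir consumes in the next step.
qat_processor.to_deployable(quantized_model, "convert_qat_results")
qat_processor.export_xmodel("convert_qat_results")
```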
The target deployment platform in this project is the AMD (Xilinx) Kria KV260 FPGA. After QAT, we compile the model obtained in the previous step for the KV260.
vai_c_xir -x <PATH_TO/YOUR.xmodel> -a /opt/vitis_ai/compiler/arch/DPUCZDX8G/KV260/arch.json -o <EXPORT_PATH> -n <NEWNAME>

If you are working with another kind of board, you need to change the value of the -a option. Moreover, you can export the computation graph of an xmodel to default.svg by

xdputil xmodel <PATH_TO_COMPILED.xmodel> -s

You can find more information in the official documentation here.
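For example, to compile the BBD model from the previous step and inspect it (the output directory and name below are arbitrary choices):

vai_c_xir -x ./YOLOX_outputs/bbd/convert_qat_results/YOLOX_0_int.xmodel -a /opt/vitis_ai/compiler/arch/DPUCZDX8G/KV260/arch.json -o ./compiled -n yolox_bbd
xdputil xmodel ./compiled/yolox_bbd.xmodel -s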
Switch to the README in onboard.
tbd
This repository is built on pt_yolox-nano_3.5 from the Vitis AI Model Zoo. Keep an eye on the COCO path specifications in ./code/yolox/exp/yolox_base.py.
Copyright 2022-2023 Advanced Micro Devices Inc.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
See also YOLOX.
Licensed under the MPL-2.0 license (LICENSE or https://opensource.org/license/mpl-2-0).


