This repository contains my projects for the RAS 598 class at Arizona State University.
- Midterm Project
I implemented semantic segmentation using an encoder-decoder neural network based on the U-Net architecture with VGG16 as the encoder. The model was developed specifically for rover-based lunar terrain segmentation. Midterm Report PDF
- Final Project
I integrated semantic segmentation with ORB-SLAM3 to address the complementary limitations of the two: 2D semantic segmentation provides no 3D structure, and the 3D point cloud generated by ORB-SLAM3 carries no semantic information. Final Report PDF
FinalProject_video.2.1.mp4
Click here to watch in higher resolution.
ORBSLAM_Semantic_Mapping is based on ros2_orb_slam3. ORB-SLAM3 is a strong SLAM method that has been widely applied in robotics; however, it cannot provide semantic information in environmental mapping. In this project, I present a method to build a 3D semantic map that combines 2D semantic images from a segmentation model (U-Net with a VGG16 encoder) with the 3D point cloud map from ORB-SLAM3.
- Stereo images are processed by ORB-SLAM3 to estimate camera pose and generate a 3D point cloud.
- The left stereo image is segmented by the U-Net model to produce a semantic mask.
- 3D points are projected onto the semantic mask using coordinate transformations.
- Semantic labels (RGB colors) are assigned to each 3D point based on the corresponding pixel's class (a geometric sketch of the last two steps follows this list).
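As a rough illustration of the projection and labeling steps, here is a minimal Python sketch of projecting world-frame points into the left camera and coloring them by class. The pinhole model, the function name, and the class-color table are all assumptions made for this sketch; the repository's actual implementation (in the C++ `stereo_node_cpp` publisher started in Terminal 5 below) may differ.

```python
import numpy as np

# Hypothetical class-id -> RGB table; the real classes and colors come
# from the U-Net training setup.
CLASS_COLORS = {0: (128, 128, 128), 1: (255, 200, 0), 2: (60, 60, 255)}

def label_points(points_w, T_cw, K, mask):
    """Color world-frame 3D points by the semantic class at their pixel.

    points_w : (N, 3) map points from ORB-SLAM3, world frame
    T_cw     : (4, 4) world-to-camera transform for the current frame
    K        : (3, 3) intrinsics of the left camera
    mask     : (H, W) per-pixel class ids from the U-Net model
    """
    # World frame -> camera frame.
    pts_h = np.hstack([points_w, np.ones((len(points_w), 1))])
    pts_c = (T_cw @ pts_h.T).T[:, :3]

    in_front = pts_c[:, 2] > 0          # keep points in front of the camera
    pts_c = pts_c[in_front]

    # Pinhole projection to pixel coordinates.
    uvw = (K @ pts_c.T).T
    u = (uvw[:, 0] / uvw[:, 2]).astype(int)
    v = (uvw[:, 1] / uvw[:, 2]).astype(int)

    h, w = mask.shape
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)

    # Look up each projected pixel's class and map it to an RGB color.
    classes = mask[v[inside], u[inside]]
    colors = np.array([CLASS_COLORS.get(int(c), (0, 0, 0)) for c in classes])
    return points_w[in_front][inside], colors
```

Encoding labels as RGB colors keeps the cloud compatible with a standard colored PointCloud2 message, which is presumably why RViz2 can render the semantic cloud without custom plugins.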
Figure: (a) raw image; (b) corresponding semantic mask; (c) 3D point cloud produced by ORB-SLAM3; (d) resulting semantic 3D point cloud.
- The U-Net architecture with VGG16 as the encoder is relatively slow, producing 2D semantic images at only 4 FPS. This could be addressed by replacing the encoder with a more efficient backbone, such as EfficientNet or MobileNet (see the backbone sketch after this list).
- The generated point cloud contains some noise. This could be addressed by filtering along each ray: sampling points along the ray and removing outliers should yield a cleaner and more accurate 3D point cloud (a simpler neighborhood-based filter is also sketched below).
- In this approach, I assign class labels to 3D points based on the semantic segmentation of each frame and visualize the resulting semantic point cloud in RViz2. However, the semantic point cloud is not updated when loop closure occurs in ORB-SLAM3, so it does not reflect the resulting pose corrections.
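As a rough illustration of the encoder swap suggested above (and of how the `--model vgg16` / `--model mobilenet` options in the run instructions might map to model construction), here is a minimal Keras sketch of a U-Net-style model with a selectable pretrained backbone. The skip-connection layer names, input shape, decoder depth, and class count are assumptions, not this repository's actual model definition.

```python
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import VGG16, MobileNetV2

def build_unet(backbone="vgg16", input_shape=(480, 640, 3), n_classes=4):
    """U-Net-style segmentation model with a swappable encoder.

    backbone    : "vgg16" (used in this project) or "mobilenet"
                  (a lighter candidate for faster inference).
    input_shape : assumed camera resolution; must be divisible by 32.
    n_classes   : assumed number of terrain classes.
    """
    inputs = layers.Input(shape=input_shape)
    if backbone == "vgg16":
        encoder = VGG16(include_top=False, weights="imagenet", input_tensor=inputs)
        skips = ["block1_conv2", "block2_conv2", "block3_conv3", "block4_conv3"]
        x = encoder.get_layer("block5_conv3").output
    else:
        encoder = MobileNetV2(include_top=False, weights="imagenet", input_tensor=inputs)
        skips = ["block_1_expand_relu", "block_3_expand_relu",
                 "block_6_expand_relu", "block_13_expand_relu"]
        x = encoder.get_layer("block_16_project").output

    # Decoder: upsample, then fuse with the matching encoder feature map.
    for name in reversed(skips):
        skip = encoder.get_layer(name).output
        x = layers.Conv2DTranspose(skip.shape[-1], 3, strides=2, padding="same")(x)
        x = layers.Concatenate()([x, skip])
        x = layers.Conv2D(skip.shape[-1], 3, padding="same", activation="relu")(x)

    # MobileNetV2's shallowest skip sits at half resolution, so upsample once more.
    if x.shape[1] != input_shape[0]:
        x = layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu")(x)

    return Model(inputs, layers.Conv2D(n_classes, 1, activation="softmax")(x))
```

Only the encoder branch changes between the two variants, so timing inference with each backbone gives a direct comparison against the 4 FPS VGG16 baseline without touching the decoder.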
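The per-ray filtering described above is future work; as a starting point, a simpler neighborhood-based variant can be prototyped with Open3D's statistical outlier removal. Note that Open3D is not among this project's listed dependencies, and the parameter values below are guesses that would need tuning on the lunar scenes.

```python
import numpy as np
import open3d as o3d

# Placeholder arrays standing in for the semantic cloud published by the
# SLAM node: positions (N, 3) and RGB colors in [0, 1] (N, 3).
points = np.random.rand(1000, 3)
colors = np.random.rand(1000, 3)

pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(points)
pcd.colors = o3d.utility.Vector3dVector(colors)

# Drop points whose mean distance to their 20 nearest neighbors exceeds
# the cloud-wide average by more than 2 standard deviations.
filtered, kept_idx = pcd.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)
print(f"kept {len(kept_idx)} of {len(points)} points")
```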
- Tested on Ubuntu 22.04 / ROS 2 Humble / TensorFlow 2.12.0 / Keras 2.12.0 / CUDA 11.8
- Install all dependencies before running the commands below.
```bash
# Terminal 1: Lunar simulator
cd ~/ORBSLAM_Semantic_Mapping/LunarAutonomyChallenge
./RunLunarSimulator.sh
```

```bash
# Terminal 2: Publishing stereo camera images
cd ~/ORBSLAM_Semantic_Mapping/LunarAutonomyChallenge
./RunLeaderboard.sh
```

```bash
# Terminal 3: Publishing the semantic mask generated by the U-Net model (VGG16 encoder)
cd ~/ORBSLAM_Semantic_Mapping/scripts
python3 semantic_id_pub.py --model vgg16
```

```bash
# Terminal 3 (alternative): Publishing the semantic mask generated by the U-Net model (MobileNetV2 encoder)
cd ~/ORBSLAM_Semantic_Mapping/scripts
python3 semantic_id_pub.py --model mobilenet
```

```bash
# Terminal 4: ORB-SLAM3 driver
cd ~/ros2_ws
colcon build --packages-select ros2_orb_slam3
source install/setup.bash
ros2 run ros2_orb_slam3 stereo_driver_node.py --ros-args -p settings_name:=Lunar_Stereo_Cam -p left_image_topic:=/frontleft_camera/image -p right_image_topic:=/frontright_camera/image
```

```bash
# Terminal 5: Publishing the semantic 3D point cloud
cd ~/ros2_ws
source install/setup.bash
ros2 run ros2_orb_slam3 stereo_node_cpp --ros-args -p node_name_arg:=stereo_node_cpp -p voc_file:=~/ros2_ws/src/ros2_orb_slam3/orb_slam3/Vocabulary/ORBvoc.txt.bin -p settings_file_path:=~/ros2_ws/src/ros2_orb_slam3/orb_slam3/config/Stereo/
```

```bash
# Visualizing published images
rqt
```

```bash
# Visualizing the published semantic 3D point cloud
rviz2
```
