The Trajectron++ paper suggests that one could "add further additional information (e.g., raw LIDAR data, camera images, pedestrian skeleton or gaze direction estimates) in this framework by encoding it as a vector and adding it to this backbone of representation vectors, e_x".
Has anyone tried this? If so, where did you add it in the code and did the results improve?
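
To make the question concrete, this is roughly how I read that sentence. It is only a minimal sketch in plain Python, not taken from the Trajectron++ code: the helper names `linear_encode` and `augment_representation`, the gaze input, and all sizes are made up for illustration, and I am assuming that "adding" means concatenating the new embedding to the existing representation vector.

```python
import random


def linear_encode(features, weights, bias):
    """Project a raw feature vector to a fixed-size embedding (linear layer + ReLU)."""
    return [
        max(0.0, sum(w * f for w, f in zip(row, features)) + b)
        for row, b in zip(weights, bias)
    ]


def augment_representation(e_x, extra_embedding):
    """Concatenate the extra embedding onto the node representation vector."""
    return list(e_x) + list(extra_embedding)


rng = random.Random(0)

# Stand-in for the existing representation vector (hypothetical size 8).
e_x = [rng.uniform(-1, 1) for _ in range(8)]

# Hypothetical extra input: a 2D gaze direction estimate.
gaze = [0.3, -0.7]

# Random weights stand in for what would be a learned encoder (hypothetical size 4).
W = [[rng.uniform(-1, 1) for _ in gaze] for _ in range(4)]
b = [0.0] * 4

gaze_embedding = linear_encode(gaze, W, b)
e_x_aug = augment_representation(e_x, gaze_embedding)

print(len(e_x_aug))  # 12
```

If this reading is right, the encoder would presumably be a learned layer in the real model, and every layer that consumes the representation vector would need its input size increased to match.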
Thanks and Best Regards