I find the forward method of DepthPredModel quite strange. In forward, self.backbone returns a dictionary, but SceneUnderstandingModule expects a tensor as input, not a dictionary.
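
To make the mismatch concrete, here is a minimal PyTorch sketch of what I would have expected. It is not the repository's actual code: ToyBackbone, ToyHead, ToyDepthModel, and the "out" key are hypothetical stand-ins I made up for illustration, and the real backbone may use a different key or several keys.

```python
import torch
import torch.nn as nn


class ToyBackbone(nn.Module):
    """Hypothetical backbone that returns a dict of feature maps."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)

    def forward(self, x):
        # Returns a dict rather than a tensor, as described in the question.
        return {"out": self.conv(x)}


class ToyHead(nn.Module):
    """Hypothetical stand-in for SceneUnderstandingModule; expects a tensor."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(8, 1, kernel_size=1)

    def forward(self, feats):
        return self.conv(feats)


class ToyDepthModel(nn.Module):
    """Hypothetical stand-in for DepthPredModel."""

    def __init__(self, feature_key="out"):
        super().__init__()
        self.backbone = ToyBackbone()
        self.head = ToyHead()
        self.feature_key = feature_key  # assumed key name, not from the repo

    def forward(self, x):
        feats = self.backbone(x)
        # Unpack the dict before calling the head; tensors pass through as-is.
        if isinstance(feats, dict):
            feats = feats[self.feature_key]
        return self.head(feats)


model = ToyDepthModel()
depth = model(torch.randn(2, 3, 16, 16))
print(depth.shape)  # torch.Size([2, 1, 16, 16])
```

If the real forward passes the dictionary straight into the module, I would expect the first convolution to fail on it, so either the dictionary is unpacked somewhere I have missed or the backbone is supposed to return a tensor.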