In your paper, you state that you sample 6 random frames from each clip, in ascending order, as training data. However, in your code you set GLOBAL_MAX_LEN = 1492, which seems to mean that the whole clip is fed to the model as input. I think this would lead to different results from the ones reported, so I am unsure which setting is the intended one. I look forward to your reply. Thank you!
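
For reference, this is how I understood the sampling described in the paper. It is only a minimal sketch of my reading: the function name `sample_frame_indices` and its parameters are my own and are not taken from your code, and I am using 1492 as a clip length purely for illustration.

```python
import random


def sample_frame_indices(num_frames, num_samples=6, rng=None):
    """Pick `num_samples` distinct frame indices at random, in ascending order.

    Hypothetical helper: the name and signature are assumptions made for
    illustration, not the repository's actual API.
    """
    if rng is None:
        rng = random.Random()
    # If the clip has no more frames than requested, keep every frame.
    if num_frames <= num_samples:
        return list(range(num_frames))
    # Sample without replacement, then sort so temporal order is preserved.
    return sorted(rng.sample(range(num_frames), num_samples))


# Example: a clip with 1492 frames (assumed length) yields 6 sorted indices.
indices = sample_frame_indices(1492, rng=random.Random(0))
print(indices)
```

With this reading, the model would see 6 frames per clip during training rather than up to 1492, which is why I expect the two settings to behave differently.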