[Suggestion] Support/Provide global video features

@eric-xw @zzxslp 
Currently, each video is represented by a NumPy array of shape (1, num_of_segments, 1024).
Since many of the original videos are no longer available, we cannot extract our own features from them. Would it be possible for you to provide a pooled/global feature for each video, i.e. an array of shape (1, D)?

Such a pooled representation is widely used in image-guided NMT on datasets such as Multi30K, and I believe it would also benefit research in video-guided machine translation (VMT).
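As a stopgap, one can approximate a global feature by pooling the released segment features over the segment axis. Below is a minimal sketch that assumes each feature file loads (e.g. via `np.load`) into an array of shape (1, num_of_segments, 1024). The function name `pool_video_features` is hypothetical, and mean/max pooling here are generic choices, not necessarily how the authors would compute an official global feature.

```python
import numpy as np


def pool_video_features(segment_feats: np.ndarray, mode: str = "mean") -> np.ndarray:
    """Pool segment features of shape (1, num_segments, D) into shape (1, D).

    Hypothetical helper: mean or max pooling over the segment axis (axis=1).
    """
    if segment_feats.ndim != 3 or segment_feats.shape[0] != 1:
        raise ValueError(f"expected shape (1, num_segments, D), got {segment_feats.shape}")
    if mode == "mean":
        # Average over segments; keeps the leading batch dimension of size 1.
        return segment_feats.mean(axis=1)
    if mode == "max":
        # Element-wise maximum over segments.
        return segment_feats.max(axis=1)
    raise ValueError(f"unknown pooling mode: {mode!r}")


# Usage with a random stand-in for one video's features (32 segments, 1024 dims).
feats = np.random.rand(1, 32, 1024).astype(np.float32)
pooled = pool_video_features(feats)
print(pooled.shape)  # (1, 1024)
```

Such pooling discards temporal order, which is one reason features computed directly from the original videos would still be preferable.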