Dear Author:
I am interested in your work and have a question about parallelism. When I run your code with the GPT-NeoX-20B model, the model is loaded across multiple GPUs, but only one GPU appears to be doing any computation. How can I make the computation use all of the GPUs so that prediction runs faster?

