Hi,
I have two questions regarding your code:
- What is the batch size for pretraining? Table 5 of the original paper lists 512, but in the code the batch size is 64 and the micro batch size is 16.
- How many pretraining steps are used for Table 1, 10k or 50k? If more pretraining steps are used, e.g. 50k, how many are spent in the first pretraining stage on the randomly selected data, and how many in the final pretraining stage on the selected data?
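For the first question, a minimal sketch of how these numbers could be reconciled, assuming the code uses gradient accumulation and data parallelism (the helper function and the GPU count are hypothetical illustrations, not taken from the repository):

```python
# Hypothetical sketch: how micro batch size, gradient accumulation, and
# data-parallel world size combine into the effective (global) batch size.
# The numeric values come from the question above; the accumulation/GPU
# split is an assumption for illustration only.

def effective_batch(micro_batch: int, accum_steps: int, num_gpus: int) -> int:
    """Global batch = micro batch * accumulation steps * data-parallel GPUs."""
    return micro_batch * accum_steps * num_gpus

# Code defaults as described: micro batch 16, batch 64
# -> consistent with 4 accumulation steps on a single GPU.
assert effective_batch(16, 4, 1) == 64

# The paper's Table 5 value of 512 could be reached, e.g., by running
# the same per-GPU config across 8 GPUs.
assert effective_batch(16, 4, 8) == 512
```

So the two numbers need not conflict if the paper reports the global batch across devices while the code's 64 is a per-run setting; whether that is actually the case is what the question asks the authors to confirm.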
I hope you can help me figure these out. Thank you very much.