This repository is used to record your score after you complete the assignment 1 leaderboard. Following the original course's process, submit a pull request to this repository to update your score. Note: you must include your wandb link, and the pull request description must state your final validation loss and what changes you made to achieve that result.
The following is the README content from the original course:
> [!NOTE]
> If you're a non-Stanford student and interested in submitting to the leaderboard, please create a pull request adding your result to the second table. To remain in the top 5, your submission must be verified: invite `marcelroed` to a minimal repo containing a uv project with `pyproject.toml`, `uv.lock`, and `main.py`. Your script must be reproducible on a single H100 by running `uv run main.py`.
To submit to the leaderboard, submit a pull request that adds your results to the Markdown table below. The table should be sorted by increasing loss.
Note that your submission can run for at most 1.5 hours on an H100, and that you may only use the OpenWebText training dataset that we provide. The code must clearly be your own work, and you can't use external implementations for systems-critical aspects of your model.
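As a concrete illustration of the 1.5-hour constraint, here is a minimal sketch of a wallclock-budget guard around a training loop. The `train_step` and `evaluate` functions are hypothetical placeholders for your own code, not part of the provided assignment materials.

```python
import time

def train_step(step: int) -> None:
    """Placeholder: one optimizer step on the provided OpenWebText data."""

def evaluate() -> float:
    """Placeholder: a full validation pass at context length 512."""
    return 0.0

BUDGET_SECONDS = 1.5 * 3600   # hard wallclock limit on a single H100
SAFETY_MARGIN = 120           # stop early to leave room for the final eval

start = time.monotonic()
step = 0
while time.monotonic() - start < BUDGET_SECONDS - SAFETY_MARGIN:
    train_step(step)
    step += 1

elapsed = time.monotonic() - start
print(f"trained {step} steps in {elapsed:.0f}s, final val loss {evaluate():.4f}")
```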
The top 3 submissions will receive a prize at the end of the quarter, and the external top 3 submissions will receive a T-shirt. To make this fair, we will reorder the top 5 scoring students based on our reproduced training runs. Make sure you save a snapshot of your best code so it can be reproduced by us! We will reach out to the top few students after results have stabilized. Leading submissions that cannot be verified will be removed.
In your pull request description, you should include:
- The final validation loss that was recorded
- A link to an associated learning curve that clearly shows a wallclock-time x-axis of less than 1.5 hours. You may either upload an image directly to the repo (use the `./images` folder) or link to a publicly viewable plot from a service like Weights & Biases (see the logging sketch after this list).
- A description of what you did
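One way to produce such a curve, sketched below under the assumption that you log with Weights & Biases (the project name and metric keys are hypothetical), is to log validation loss together with elapsed wallclock time, then set the loss panel's x-axis to the custom time field in the W&B UI. W&B also records relative wallclock time automatically, which can likewise be selected as the x-axis.

```python
import time
import wandb

run = wandb.init(project="assignment1-leaderboard")  # hypothetical project name
start = time.monotonic()

def log_val_loss(val_loss: float) -> None:
    # Log elapsed hours alongside the loss; in the W&B UI, set the x-axis
    # of the loss panel to "wallclock_hours" so the curve clearly shows a
    # wallclock-time x-axis under 1.5 hours.
    wandb.log({
        "val/loss": val_loss,
        "wallclock_hours": (time.monotonic() - start) / 3600,
    })
```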
We are considering adding an automated validation-loss check, since it's easy to measure your metrics incorrectly in a way that places you higher on the leaderboard than you should be. If your loss seems too good to be true, verify that your training and validation datasets are correct by checking decoded samples, and confirm that your vocabulary is correct with 32k tokens. It should not be easy to get a validation loss better than 3.3. We validate at context length 512, so your reported validation loss should also be computed with this setting.
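A minimal sketch of these sanity checks, assuming your tokenized validation set is stored as a flat `uint16` array (the file path is hypothetical; adjust to your own data layout and tokenizer):

```python
import numpy as np

VOCAB_SIZE = 32_000     # the vocab should contain exactly 32k tokens
CONTEXT_LENGTH = 512    # the leaderboard validates at this context length

val_tokens = np.memmap("data/owt_valid.bin", dtype=np.uint16, mode="r")  # hypothetical path

# 1. Vocab check: no token id may fall outside the 32k vocabulary.
assert int(val_tokens.max()) < VOCAB_SIZE, "token id exceeds vocab size"

# 2. Decoded-sample check: print a snippet and read it -- it should look
#    like coherent web text, not garbage. (`tokenizer` is your own object.)
# print(tokenizer.decode(val_tokens[:256].tolist()))

# 3. Loss check: evaluate on non-overlapping windows of length 512 so the
#    reported number matches the leaderboard's validation setting.
def iter_val_batches(tokens, batch_size: int = 32):
    window = CONTEXT_LENGTH + 1  # inputs plus shifted next-token targets
    n = (len(tokens) // window) * window
    chunks = np.asarray(tokens[:n]).reshape(-1, window).astype(np.int64)
    for i in range(0, len(chunks), batch_size):
        batch = chunks[i : i + batch_size]
        yield batch[:, :-1], batch[:, 1:]  # (inputs, targets)
```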
> [!NOTE]
> Please add your result to the three tables below.
| Name | Validation Loss | Link | Verification status (leave empty) |
|---|---|---|---|