Blue-No1/evaluation-metrics-v2

Practical eval: accuracy, perplexity, simple task probes; harness cmds.
Evaluation Metrics

Semi-hands-on evaluation notes and commands (perplexity, exact-match, task probes).

Plan

  • Basic lm-eval-harness runs.
  • Tiny custom probes (math/code snippets).
  • Compare base vs. LoRA checkpoints (later).
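A "tiny custom probe" in the exact-match style can be sketched as below. This is a minimal illustration, not lm-eval-harness internals: `run_model`, `PROBES`, and the normalization rules are all hypothetical placeholders for whatever generation call and answer format you actually use.

```python
# Hypothetical exact-match probe sketch. `run_model` stands in for any
# prompt -> text callable (a local model, an API client, etc.).
PROBES = [
    {"prompt": "What is 17 + 25?", "answer": "42"},
    {"prompt": "What is the capital of France?", "answer": "Paris"},
]

def normalize(text: str) -> str:
    """Lowercase and strip whitespace and trailing periods,
    so that 'Paris.' still matches 'paris'."""
    return text.strip().strip(".").lower()

def exact_match_score(run_model, probes=PROBES) -> float:
    """Fraction of probes whose normalized output equals the gold answer."""
    hits = sum(
        normalize(run_model(p["prompt"])) == normalize(p["answer"])
        for p in probes
    )
    return hits / len(probes)

if __name__ == "__main__":
    # Dummy "model" with canned replies, for illustration only:
    canned = {
        "What is 17 + 25?": "42.",
        "What is the capital of France?": "Berlin",
    }
    print(exact_match_score(lambda prompt: canned[prompt]))  # one of two correct -> 0.5
```

The same scoring loop works unchanged when `run_model` is swapped for a real base or LoRA checkpoint, which is what makes before/after comparisons cheap.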

Quick Start (conceptual)

pip install lm-eval
lm-eval --model hf \
  --model_args pretrained=meta-llama/Llama-3-8B-Instruct \
  --tasks hellaswag,boolq \
  --device cuda:0 --batch_size 4
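For the perplexity side of the plan, the metric itself is just exp of the mean negative log-likelihood per token. A minimal sketch, assuming you already have per-token natural-log probabilities from some model:

```python
import math

def perplexity(token_logprobs: list[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood per token).

    `token_logprobs` holds one natural-log probability per token,
    e.g. as returned by a model's scoring pass.
    """
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Sanity check: a model that assigns probability 0.25 to every token
# behaves like a uniform 4-way choice, so perplexity is about 4.
print(perplexity([math.log(0.25)] * 10))  # ~= 4.0
```

Lower is better; comparing this number for base vs. LoRA checkpoints on the same held-out text is the simplest "did fine-tuning help" signal.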
