37 changes: 12 additions & 25 deletions README.md
@@ -6,7 +6,7 @@ A wrapper around [lm-eval-harness](https://github.com/EleutherAI/lm-evaluation-h

### To Install

- Python 3.9 or newer
- Python 3.10 or newer

### To Run

@@ -15,7 +15,7 @@ A wrapper around [lm-eval-harness](https://github.com/EleutherAI/lm-evaluation-h

### To Develop

- [poetry](https://python-poetry.org/docs/#installation)
- [PDM](https://pdm-project.org/en/latest/)

## Getting Started

@@ -34,14 +34,6 @@ llm-eval-test run --help
## Download Usage

``` sh

# Create dataset directory
DATASETS_DIR=$(pwd)/datasets
mkdir $DATASETS_DIR

#To download the datasets:

llm-eval-test download -h
usage: llm-eval-test download [-h] [--catalog-path PATH] [--tasks-path PATH] [--offline | --no-offline] [-v | -q] -t TASKS [-d DATASETS] [-f | --force-download | --no-force-download]

download datasets for open-llm-v1 tasks
@@ -50,16 +42,11 @@ options:
-h, --help show this help message and exit

-t TASKS, --tasks TASKS
comma separated tasks to download for example: arc_challenge,hellaswag (default: None)
comma separated tasks to download for example: arc_challenge,hellaswag
-d DATASETS, --datasets DATASETS
Dataset directory (default: ./datasets)
Dataset directory
-f, --force-download, --no-force-download
Force download datasets even if they already exist (default: False)


llm-eval-test download --tasks arc_challenge,GSM8K,HellaSwag
llm-eval-test download --tasks leaderboard
llm-eval-test download --tasks arc_challenge,GSM8K,HellaSwag -f (to overwrite the previously downloaded datasets)
Force download datasets even if they already exist
```
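The `--tasks` flag above takes a comma-separated list, and passing the same task twice is redundant. Splitting and de-duplicating such a list while preserving order can be sketched as follows (`parse_tasks` is a hypothetical helper for illustration, not this project's actual parser):

```python
def parse_tasks(spec: str) -> list[str]:
    """Split a comma-separated task spec, trimming whitespace and
    dropping empties and duplicates while keeping first-seen order."""
    seen: set[str] = set()
    tasks: list[str] = []
    for name in spec.split(","):
        name = name.strip()
        if name and name not in seen:
            seen.add(name)
            tasks.append(name)
    return tasks

print(parse_tasks("arc_challenge, hellaswag,arc_challenge"))  # ['arc_challenge', 'hellaswag']
```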

## Run Usage
@@ -104,17 +91,17 @@ prompt parameters:
### Example: MMLU-Pro Benchmark

``` sh
# Create dataset directory
DATASETS_DIR=$(pwd)/datasets
mkdir $DATASETS_DIR

# Download the MMLU-Pro dataset
DATASET=TIGER-Lab/MMLU-Pro
llm-eval-test download --datasets $DATASETS_DIR --tasks mmlu_pro

# Run the benchmark
ENDPOINT=http://127.0.0.1:8080/v1/completions # An OpenAI API-compatible completions endpoint
MODEL_NAME=meta-llama/Llama-3.1-8B # Name of the model hosted on the inference server
TOKENIZER=ibm-granite/granite-3.1-8b-instruct
llm-eval-test run --endpoint $ENDPOINT --model $MODEL_NAME --datasets $DATASETS_DIR --tasks mmlu_pro

Examples:
llm-eval-test run -H ENDPOINT --model /mnt/models/ --tokenizer TOKENIZER --datasets ./datasets --tasks arc_challenge
llm-eval-test run -H ENDPOINT --model /mnt/models/ --tokenizer TOKENIZER --datasets ./datasets --tasks arc_challenge,gsm8k,hellaswag,mmlu_pro,truthfulqa,winogrande
llm-eval-test run -H ENDPOINT --model /mnt/models/ --tokenizer TOKENIZER --datasets ./datasets --tasks leaderboard

```
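The run example above targets an OpenAI API-compatible `/v1/completions` endpoint. As a sketch of what such an endpoint expects, the request body can be built like this (field names follow the public OpenAI completions API; nothing here is taken from this repository's code):

```python
import json

def build_completion_request(model: str, prompt: str, max_tokens: int = 16) -> str:
    """Build the JSON body for an OpenAI-compatible /v1/completions call."""
    body = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return json.dumps(body)

req = build_completion_request("meta-llama/Llama-3.1-8B", "Q: 2+2=\nA:")
print(json.loads(req)["model"])  # meta-llama/Llama-3.1-8B
```

Any inference server that accepts this request shape (vLLM's OpenAI-compatible server, for instance) should work as the `--endpoint` target.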

22 changes: 13 additions & 9 deletions src/llm_eval_test/downloader.py
@@ -1,5 +1,6 @@
import logging
import os
import tempfile

from lm_eval.api.group import ConfigurableGroup
from lm_eval.api.task import ConfigurableTask
@@ -14,7 +15,7 @@ def download_datasets(datasets_dir: str, tasks: list[str], tasks_path: str, forc
# TaskManager
tm = TaskManager(
include_path=tasks_path,
include_defaults=True,
include_defaults=False,
verbosity=logging.getLevelName(logger.level),
)
# Load tasks and groups
@@ -50,14 +51,17 @@ def download_datasets(datasets_dir: str, tasks: list[str], tasks_path: str, forc
if force_download or not os.path.exists(target_dir):
from huggingface_hub import snapshot_download

snapshot_download(
repo_id=dataset_repo,
repo_type="dataset",
local_dir=target_dir,
local_dir_use_symlinks=False,
use_auth_token=os.getenv("HF_TOKEN"),
)
local_paths[task_name] = target_dir
with tempfile.TemporaryDirectory() as tmpdir:
snapshot_download(
repo_id=dataset_repo,
repo_type="dataset",
cache_dir=tmpdir,
local_dir=target_dir,
local_dir_use_symlinks=False, # TODO: Remove as deprecated
force_download=True,
token=os.getenv("HF_TOKEN", True), # Str or True
)
local_paths[task_name] = target_dir
except Exception as e:
logger.error(f"Failed to download '{task_name}' from {dataset_repo}: {e}")
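The `tempfile.TemporaryDirectory` context manager added in this diff guarantees the scratch `cache_dir` is removed whether the download returns or raises. A minimal stdlib-only sketch of that pattern (the `download` callables here are stand-ins, not this project's code):

```python
import os
import tempfile

def download_with_scratch_cache(download):
    """Call `download` with a throwaway cache dir that is always cleaned up."""
    with tempfile.TemporaryDirectory() as tmpdir:
        # tmpdir exists only inside this block; it is removed on exit
        # whether `download` returns normally or raises.
        return download(cache_dir=tmpdir)

# Success path: the cache dir exists during the call, gone afterwards.
seen = {}
def fake_download(cache_dir):
    seen["dir"] = cache_dir
    seen["existed"] = os.path.isdir(cache_dir)
    return "ok"

print(download_with_scratch_cache(fake_download))  # ok
print(seen["existed"])                             # True
print(os.path.isdir(seen["dir"]))                  # False

# Failure path: cleanup still happens when the download raises.
failed = {}
def failing_download(cache_dir):
    failed["dir"] = cache_dir
    raise RuntimeError("network error")

try:
    download_with_scratch_cache(failing_download)
except RuntimeError:
    pass
print(os.path.isdir(failed["dir"]))  # False
```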

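The `os.getenv("HF_TOKEN", True)` fallback in the diff relies on `huggingface_hub` accepting either a token string or `True`, where `True` means "use the token saved locally by `huggingface-cli login`". A small sketch of just that fallback logic (`resolve_token` is a hypothetical helper name):

```python
import os

def resolve_token():
    # huggingface_hub treats token=True as "use the locally saved login
    # token", so when HF_TOKEN is unset we fall back to True rather than
    # None (behavior assumed from the diff's "Str or True" comment).
    return os.getenv("HF_TOKEN", True)

os.environ.pop("HF_TOKEN", None)
print(resolve_token())  # True

os.environ["HF_TOKEN"] = "hf_example"
print(resolve_token())  # hf_example
```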