This repository provides a system to submit, run, and evaluate machine learning models on a private testset. It integrates a Discord bot user interface, a Flask API for leaderboards, a Celery task queue for asynchronous evaluation, Docker sandboxing for secure model execution, and a small database to store submissions and team scores.
The bot was developed for the Haick Datathon, a competition organized by the School of AI scientific club, where I currently serve as technical manager. This README replaces the previous brief description with a full developer-focused guide: development setup, technical architecture, Celery and Docker usage, deployment notes, testing, and troubleshooting.
- Project overview
- Quick start (local, without Docker)
- Docker & docker-compose (recommended)
- Celery (workers, scheduling, monitoring)
- Database and migrations
- Running inference and scoring
- API & Discord bot
- Testing
- Deployment notes
- Troubleshooting and tips
- File layout and responsibilities
- Security and sandboxing
- Contributing
Core responsibilities:
- Accept user submissions (model files + optional inference script) via Discord commands
- Run inference securely on a private testset inside containers
- Score outputs and update public/private leaderboards
- Persist submissions, teams, and results to the database
- Provide a REST API for leaderboard consumption
Key files and modules:
- `main.py`: Discord bot entrypoint; defines slash commands and the submission flow.
- `api_server.py`: Flask app exposing leaderboard endpoints.
- `celery_app.py`: Celery application configuration.
- `celery_tasks.py`: Celery tasks for scheduling and running evaluations.
- `run_inference_job.py`: entrypoint used to perform inference inside a sandbox/container.
- `scoring.py`: scoring logic for predictions vs. ground truth.
- `inference.py`: helpers used by inference scripts to run a model on the testset.
- `utils.py`: misc utilities.
- `create_teams.py`: DB bootstrap utility for inserting initial teams/participants.
- `database/`: SQLAlchemy models, operations, and session wiring.
- `models/`: directory where user-provided model artifacts (`.pth`, `.onnx`) are stored.
- `testset/`: private test dataset (not distributed).
Prereqs:
- Python 3.8+
- A virtual environment (venv/conda)
- Redis (for Celery) — local or containerized
Dataset
- Place your dataset under the `testset/` directory at the project root before running evaluations. The repository does not include the dataset by default.
- You can use the provided helper script to split your `testset/` into public and private subsets (the script creates a `data_split/` folder and saves index files):

```bash
python helpers/create_data_split.py
```

The script performs stratified sampling, preserves class directories, and saves `public_idx_...npy` and `private_idx_...npy` files in `data_split/` for reproducibility.
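To illustrate what such a stratified, index-based split looks like, here is a minimal sketch. It is not the repository's script: the per-class directory layout under `testset/`, the 20% public fraction, and the output file names are assumptions for illustration.

```python
# Illustrative stratified public/private split; helpers/create_data_split.py is authoritative.
import os
import numpy as np

def stratified_split(testset_dir="testset", out_dir="data_split",
                     public_fraction=0.2, seed=42):
    rng = np.random.default_rng(seed)
    files, labels = [], []
    # Assumes one subdirectory per class: testset/<class_name>/<sample>
    for cls in sorted(os.listdir(testset_dir)):
        cls_dir = os.path.join(testset_dir, cls)
        if not os.path.isdir(cls_dir):
            continue
        for name in sorted(os.listdir(cls_dir)):
            files.append(os.path.join(cls, name))
            labels.append(cls)

    labels = np.array(labels)
    public_idx, private_idx = [], []
    # Sample a fixed fraction per class so both splits keep the class balance.
    for cls in np.unique(labels):
        cls_idx = np.flatnonzero(labels == cls)
        rng.shuffle(cls_idx)
        cut = int(len(cls_idx) * public_fraction)
        public_idx.extend(cls_idx[:cut])
        private_idx.extend(cls_idx[cut:])

    os.makedirs(out_dir, exist_ok=True)
    np.save(os.path.join(out_dir, "public_idx.npy"), np.array(sorted(public_idx)))
    np.save(os.path.join(out_dir, "private_idx.npy"), np.array(sorted(private_idx)))
    return files  # the saved indices refer to positions in this sorted file list
```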
- Create and activate a virtualenv and install dependencies:

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

- Create a `.env` file in the project root with at least the following keys (example):

```env
DISCORD_TOKEN=your_discord_token_here
REDIS_URL=redis://localhost:6379/0
DATABASE_URL=sqlite:///./evaluation.db
```

- Initialize the SQLite DB and seed teams/participants:

```bash
python create_teams.py
```

- Start Redis if you don't already have it running (for example with Docker):

```bash
docker run -d --name redis -p 6379:6379 redis:7
```

- Start a Celery worker (see the Celery section below for recommended flags):

```bash
# from project root
celery -A celery_app.celery worker --loglevel=info -Q default
```

- Run the Discord bot locally:

```bash
python main.py
```

- (Optional) Run the Flask API server for leaderboards:

```bash
python api_server.py
```

Note: when running locally without Dockerized sandboxing, do not run untrusted user code. The repository's Docker-based sandbox is designed for secure execution of third-party models.
There is a docker-compose.yml included to wire up services (Redis, optional db, API) and to help build worker images for sandboxing model runs.
Common commands:
```bash
# Build and start redis + api + any defined services
docker-compose up --build

# Start detached
docker-compose up -d --build

# Stop
docker-compose down
```

Container recommendations:
- Use the provided Dockerfiles to build the evaluation runner image (see `Dockerfile` and `Dockerfile.celery`). The runner image includes the required Python packages and a minimal runtime to execute `run_inference_job.py` inside an isolated container.
Security note: The Docker container used for running inference should mount only the model and the testset artifacts required to produce predictions, never the host root or secrets.
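As a concrete illustration of that constraint, here is a hedged docker-py sketch of how a worker might launch the runner with read-only mounts and resource limits. The image name `evaluation-runner`, the mount paths, and the limit values are assumptions, not the repository's actual configuration.

```python
# Sketch: launching the runner in a locked-down container with docker-py.
# Image name, paths, and limits are illustrative assumptions.
import docker

def run_sandboxed_inference(model_path: str, testset_path: str, output_dir: str) -> bytes:
    client = docker.from_env()
    logs = client.containers.run(
        image="evaluation-runner",            # assumed image built from the provided Dockerfile
        command=["python", "run_inference_job.py",
                 "--model", "/job/model.pth",
                 "--split", "private",
                 "--output", "/job/out"],
        volumes={
            model_path:   {"bind": "/job/model.pth", "mode": "ro"},  # model only, read-only
            testset_path: {"bind": "/job/testset",   "mode": "ro"},  # private testset, read-only
            output_dir:   {"bind": "/job/out",       "mode": "rw"},  # predictions come back here
        },
        network_disabled=True,    # no network access for untrusted code
        mem_limit="4g",           # cap memory
        nano_cpus=2_000_000_000,  # roughly 2 CPUs
        user="1000:1000",         # never run as root inside the container
        remove=True,              # clean up the container afterwards
    )
    return logs
```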
This project uses Celery to manage asynchronous evaluation jobs. Celery configuration is located in celery_app.py. Worker tasks are defined in celery_tasks.py and call the job runner (run_inference_job.py) inside a sandbox.
How tasks flow:
- A user submits a model via Discord. `main.py` uploads/places the model into `models/` and enqueues a Celery task to evaluate it.
- The Celery worker receives the evaluation task and launches a Docker sandbox (or runs a local runner) that executes `run_inference_job.py` with the submission context.
- The runner writes predictions to disk. `scoring.py` is invoked to compute metrics, and the results are saved to the DB and the leaderboard is updated.
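A compressed sketch of what such a task might look like follows; the real tasks live in `celery_tasks.py`, and the task name, output file names, and the commented-out DB helper are hypothetical.

```python
# Hypothetical shape of an evaluation task; see celery_tasks.py for the real one.
import json
import subprocess
from pathlib import Path

from celery_app import celery  # assumes celery_app.py exposes `celery`

@celery.task(bind=True)
def evaluate_submission_task(self, submission_id: int, model_path: str):
    out_dir = Path(f"tmp/job-{self.request.id}")
    out_dir.mkdir(parents=True, exist_ok=True)

    # 1. Run inference (shown here with the local runner for brevity;
    #    in production this step launches the Docker sandbox instead).
    subprocess.run(
        ["python", "run_inference_job.py",
         "--model", model_path, "--split", "private", "--output", str(out_dir)],
        check=True, timeout=1800,
    )

    # 2. Score the predictions.
    subprocess.run(
        ["python", "scoring.py",
         "--predictions", str(out_dir / "preds.json"), "--split", "private"],
        check=True,
    )

    # 3. Persist results (hypothetical helper; assumes the scorer wrote metrics.json here).
    metrics = json.loads((out_dir / "metrics.json").read_text())
    # database.operations.save_result(submission_id, metrics)
    return metrics
```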
Starting workers

```bash
# Start a worker
celery -A celery_app.celery worker --loglevel=info -Q default

# Start multiple workers (or use --concurrency=N). Example:
celery -A celery_app.celery worker --loglevel=info -Q default -c 4
```

Scheduling and periodic tasks

If you use periodic scheduling (beat), run:

```bash
celery -A celery_app.celery beat --loglevel=info
# Or run worker + beat in separate terminals/containers
```

Monitoring
- Use Flower (optional) for monitoring tasks: `pip install flower`, then run:

```bash
celery -A celery_app.celery flower --port=5555
```

Redis
Celery requires a broker. The project expects a `REDIS_URL` (or similar) environment variable. The default docker-compose provides Redis; for local development you can run Redis via Docker as shown above.
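For reference, a minimal `celery_app.py` along these lines might look like the sketch below. The actual file may differ; the queue name and serializer settings are assumptions.

```python
# Minimal sketch of a Celery app wired to Redis via REDIS_URL (illustrative).
import os
from celery import Celery

redis_url = os.getenv("REDIS_URL", "redis://localhost:6379/0")

celery = Celery(
    "evaluation",
    broker=redis_url,
    backend=redis_url,  # store task results in Redis as well
)

celery.conf.update(
    task_default_queue="default",  # matches the -Q default flag above
    task_serializer="json",
    accept_content=["json"],
    task_track_started=True,
)
```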
The project uses SQLAlchemy in `database/` and a small operations layer in `database/operations.py` to interact with submissions, teams, and scores. By default the database is SQLite, controlled via the `DATABASE_URL` environment variable.
Bootstrap the DB and seed data:
```bash
python create_teams.py
```

If you move to Postgres for production, update `DATABASE_URL` and adjust `docker-compose.yml` to include a Postgres service, then run normal SQLAlchemy migrations (if you integrate Alembic).
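As a hedged sketch of the kind of session wiring that lives in `database/`: the real models and operations differ, and the `Team` columns below are assumptions made only to keep the example self-contained.

```python
# Illustrative SQLAlchemy wiring driven by DATABASE_URL; see database/ for the real code.
import os
from sqlalchemy import Column, Float, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

DATABASE_URL = os.getenv("DATABASE_URL", "sqlite:///./evaluation.db")

engine = create_engine(DATABASE_URL)
SessionLocal = sessionmaker(bind=engine)
Base = declarative_base()

class Team(Base):  # hypothetical model, for illustration only
    __tablename__ = "teams"
    id = Column(Integer, primary_key=True)
    name = Column(String, unique=True, nullable=False)
    best_score = Column(Float, default=0.0)

def init_db():
    """Create tables if they do not exist (create_teams.py seeds them afterwards)."""
    Base.metadata.create_all(engine)
```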
The actual per-submission evaluation happens in run_inference_job.py. It expects arguments describing:
- which model file to load (path under `models/`)
- which dataset split to evaluate (private/public)
- an output directory for predictions

High-level runner contract (inputs/outputs):
- Inputs: model path, dataset split id, device (cpu/cuda, optional), inference timeout
- Outputs: prediction file(s) in a structured format (CSV/JSON/NPY), plus a metrics JSON written by `scoring.py` or the task wrapper
Scoring is implemented in scoring.py. The runner should produce outputs compatible with the scorer.
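The sketch below shows the general shape of such a CLI. The argument names mirror the example invocation further down, while the `--device`/`--timeout` flags and the inline prediction-writing stub are assumptions; the repository's `run_inference_job.py` is authoritative.

```python
# Sketch of the runner's CLI contract (not the repository's actual implementation).
import argparse
import json
from pathlib import Path

def main():
    parser = argparse.ArgumentParser(description="Run inference for one submission")
    parser.add_argument("--model", required=True, help="path to the model artifact under models/")
    parser.add_argument("--split", choices=["public", "private"], default="private")
    parser.add_argument("--output", required=True, help="directory for prediction files")
    parser.add_argument("--device", default="cpu", help="cpu or cuda (assumed optional flag)")
    parser.add_argument("--timeout", type=int, default=1800, help="per-run timeout in seconds")
    args = parser.parse_args()

    out_dir = Path(args.output)
    out_dir.mkdir(parents=True, exist_ok=True)

    # In the real runner, inference.py helpers load the model and iterate the testset split.
    predictions = {}  # e.g. {sample_id: predicted_label}
    (out_dir / "preds.json").write_text(json.dumps(predictions))

if __name__ == "__main__":
    main()
```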
Example (local run for testing):
```bash
python run_inference_job.py --model models/example.pth --split private --output tmp/outdir
python scoring.py --predictions tmp/outdir/preds.json --split private
```

Notes on timeouts and resource limits
- The Celery task wrapper should set a hard timeout for evaluation tasks to avoid long-running or hung jobs.
- In Docker-based sandboxing, enforce CPU/memory limits (via `docker run` flags or compose `deploy.resources`) so that user models cannot exhaust host resources.
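One way to express the hard timeout on the Celery side is through per-task time limits; the sketch below is illustrative, and the limit values and task name are assumptions.

```python
# Illustrative hard/soft time limits on an evaluation task (values are assumptions).
from celery.exceptions import SoftTimeLimitExceeded

from celery_app import celery  # assumes celery_app.py exposes `celery`

@celery.task(time_limit=2100, soft_time_limit=1800)
def evaluate_with_timeout(submission_id: int):
    try:
        # ... run the sandboxed inference and scoring here ...
        pass
    except SoftTimeLimitExceeded:
        # Soft limit hit before the hard kill: record the failure so the
        # submission is not left stuck in a "running" state.
        raise
```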
API:
- `api_server.py` exposes endpoints under `/api/leaderboard/*` to fetch public/private leaderboards and team stats.
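A minimal sketch of what one such endpoint could look like is below. The route follows the `/api/leaderboard/*` convention above, but the handler body is a placeholder, not the actual implementation in `api_server.py`.

```python
# Illustrative Flask endpoint in the style of api_server.py (placeholder data).
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/api/leaderboard/public")
def public_leaderboard():
    # In the real server this reads teams/scores via database/operations.py.
    rows = [
        {"team": "example-team", "score": 0.91, "submissions": 3},  # placeholder row
    ]
    return jsonify(sorted(rows, key=lambda r: r["score"], reverse=True))

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```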
Discord bot:
- `main.py` contains the Discord bot implementation. It registers slash commands such as `/register_participant`, `/evaluate_submission`, and `/leaderboard`.
- The bot should be run with `DISCORD_TOKEN` set in the environment.
Submitting a model via Discord (high-level):
- The user uploads a `.pth` or `.onnx` file and triggers `/evaluate_submission`.
- The bot stores the artifact in `models/` under a unique name and enqueues a Celery evaluation task.
- The user receives updates when the evaluation finishes, along with a link to the leaderboard entry.
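For orientation, here is a hedged discord.py 2.x sketch of that flow. The command signature, the commented-out task call, and the response text are assumptions; `main.py` is the real implementation.

```python
# Illustrative slash command for model submission (main.py is authoritative).
import uuid
from pathlib import Path

import discord
from discord.ext import commands

bot = commands.Bot(command_prefix="!", intents=discord.Intents.default())

@bot.tree.command(name="evaluate_submission", description="Submit a model for evaluation")
async def evaluate_submission(interaction: discord.Interaction, model_file: discord.Attachment):
    await interaction.response.defer(ephemeral=True)

    # Store the artifact under models/ with a unique name.
    dest = Path("models") / f"{uuid.uuid4().hex}_{model_file.filename}"
    dest.parent.mkdir(exist_ok=True)
    await model_file.save(dest)

    # Enqueue the Celery evaluation task (hypothetical task name):
    # evaluate_submission_task.delay(submission_id, str(dest))

    await interaction.followup.send(f"Submission received; evaluation queued for `{dest.name}`.")

# bot.run(os.environ["DISCORD_TOKEN"])  # started from main.py with DISCORD_TOKEN set
```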
There are a few tests under tests/ (for example tests/test_parser.py). Run them with pytest:
```bash
pytest -q
```

Add tests for new functionality, especially for scoring and runner I/O, and for the API endpoints.
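As a template for the kind of unit test worth adding around scoring logic, here is a small, self-contained example. It deliberately does not import the repository's `scoring.py`; the accuracy helper is defined inline for illustration.

```python
# tests/test_scoring_example.py (illustrative; not tied to the real scoring.py API)
def accuracy(predictions: dict, ground_truth: dict) -> float:
    """Fraction of IDs whose predicted label matches the ground-truth label."""
    correct = sum(1 for k, v in ground_truth.items() if predictions.get(k) == v)
    return correct / len(ground_truth)

def test_accuracy_matches_expected_fraction():
    preds = {"img_1": "cat", "img_2": "dog", "img_3": "cat"}
    truth = {"img_1": "cat", "img_2": "cat", "img_3": "cat"}
    assert accuracy(preds, truth) == 2 / 3
```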
Small-scale production deployment suggestions:
- Use Docker Compose or Kubernetes. For higher scale prefer k8s.
- Use a managed Redis (or a resilient cluster) as Celery broker.
- Use Postgres for DB in production and run migrations with Alembic.
- Run Celery workers with autoscaling (K8s HPA or a horizontal worker autoscaler) depending on queue depth.
- Run the Discord bot in a separate deployment and add health checks.
- Secure the API (authentication/authorization) if exposing beyond internal usage.
Example docker-compose production snippet (conceptual):
```yaml
services:
  redis:
    image: redis:7
  api:
    build: .
    command: python api_server.py
    environment:
      - DATABASE_URL=${DATABASE_URL}
  worker:
    build: .
    command: celery -A celery_app.celery worker --loglevel=info
    environment:
      - REDIS_URL=${REDIS_URL}
```

Troubleshooting

- If Celery tasks are not running: confirm Redis is reachable and that the worker was started with the correct app (`-A` flag) and queue.
- If the Docker runner cannot access the model/testset: ensure proper mounts in `docker run` or compose volumes, and avoid mounting any sensitive host path.
- If scoring results don't match expectations: check that the predictions format matches what `scoring.py` expects (IDs, ordering, label format).
Logs
- Worker logs: stdout where Celery worker runs.
- API logs: stdout of `api_server.py`.
- Runner logs: captured by the task wrapper; ensure each run writes a run-specific log file under `tmp/job-<uuid>/`.
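A minimal sketch of per-job logging of this kind is shown below. The `tmp/job-<uuid>/` layout follows the convention above; the logger name and format are assumptions rather than the repository's actual wrapper.

```python
# Illustrative per-job logger writing under tmp/job-<uuid>/ (not the repo's actual wrapper).
import logging
import uuid
from pathlib import Path
from typing import Optional

def make_job_logger(job_id: Optional[str] = None) -> logging.Logger:
    job_id = job_id or uuid.uuid4().hex
    job_dir = Path("tmp") / f"job-{job_id}"
    job_dir.mkdir(parents=True, exist_ok=True)

    logger = logging.getLogger(f"runner.{job_id}")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(job_dir / "run.log")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```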
- `main.py`: Discord bot
- `api_server.py`: Flask API
- `celery_app.py`, `celery_tasks.py`: Celery config + tasks
- `run_inference_job.py`: runner invoked by tasks
- `scoring.py`: evaluation metrics
- `inference.py`: inference utilities
- `create_teams.py`: DB seeding
- `database/`: models, operations
- `models/`: uploaded model artifacts
- `testset/`: private dataset (not to be shared)
- `tmp/`: job-specific temporary outputs and logs
- Always run user-submitted code in an isolated environment (Docker container with strict resource limits and no host mounts except necessary inputs/outputs).
- Never run untrusted code as root inside containers.
- Validate model artifacts and uploaded files (size limits, file type checks) before enqueuing tasks.
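A hedged sketch of such a pre-enqueue check follows; the allowed extensions come from the submission flow above, while the size cap is an assumption to adjust to your competition rules.

```python
# Illustrative validation of an uploaded artifact before enqueuing (size cap is an assumption).
from pathlib import Path

ALLOWED_EXTENSIONS = {".pth", ".onnx"}
MAX_SIZE_BYTES = 500 * 1024 * 1024  # 500 MB cap; tune to your rules

def validate_artifact(path: str) -> None:
    p = Path(path)
    if p.suffix.lower() not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Unsupported file type: {p.suffix}")
    if p.stat().st_size > MAX_SIZE_BYTES:
        raise ValueError("Model artifact exceeds the size limit")
    # Only enqueue the Celery evaluation task once these checks pass.
```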
- Fork, create a feature branch, add tests for new behavior, and open a pull request.
- Keep changes small, and add documentation updates in `README.md` or the `docs/` directory if one is created.
- Add CI (GitHub Actions) to run tests and flake/lint on PRs.
- Add Alembic migrations if you move to an RDBMS other than SQLite.
- Add a minimal integration test that runs a small model inside a container to validate the runner + scoring pipeline.
If you need help or want to extend the project, open an issue on the repo with details about your use-case and environment.