Neural Networks Compression Evaluation Bot

This repository provides a system to submit, run, and evaluate machine learning models on a private testset. It integrates a Discord bot user interface, a Flask API for leaderboards, a Celery task queue for asynchronous evaluation, Docker sandboxing for secure model execution, and a small database to store submissions and team scores.

The bot was developed for the Haick Datathon, a competition organized by the School of AI scientific club, where I currently serve as technical manager. This README replaces the previous brief description with a full developer-focused guide covering development setup, technical architecture, Celery and Docker usage, deployment notes, testing, and troubleshooting.

Table of Contents

  • Project overview
  • Quick start (local, without Docker)
  • Docker & docker-compose (recommended)
  • Celery (workers, scheduling, monitoring)
  • Database and migrations
  • Running inference and scoring
  • API & Discord bot
  • Testing
  • Deployment notes
  • Troubleshooting and tips
  • File layout and responsibilities
  • Security and sandboxing
  • Contributing

Project overview

Core responsibilities:

  • Accept user submissions (model files + optional inference script) via Discord commands
  • Run inference securely on a private testset inside containers
  • Score outputs and update public/private leaderboards
  • Persist submissions, teams, and results to the database
  • Provide a REST API for leaderboard consumption

Key files and modules

  • main.py — Discord bot entrypoint, defines slash commands and submission flow.
  • api_server.py — Flask app exposing leaderboard endpoints.
  • celery_app.py — Celery application configuration.
  • celery_tasks.py — Celery tasks for scheduling and running evaluations.
  • run_inference_job.py — Entrypoint used to perform inference inside a sandbox/container.
  • scoring.py — Scoring logic for predictions vs. ground truth.
  • inference.py — Helpers used by inference scripts to run a model on the testset.
  • utils.py — Misc utilities.
  • create_teams.py — DB bootstrap utility for inserting initial teams/participants.
  • database/ — SQLAlchemy models, operations and session wiring.
  • models/ — Directory where user-provided model artifacts (.pth, .onnx) are stored.
  • testset/ — Private test dataset (not distributed).

Quick start — development (local, without Docker)

Prereqs:

  • Python 3.8+
  • A virtual environment (venv/conda)
  • Redis (for Celery) — local or containerized

Dataset

  • Place your dataset under the testset/ directory at the project root before running evaluations. The repository does not include the dataset by default.
  • You can use the provided helper script to split your testset/ into public and private subsets (the script creates a data_split/ folder and saves index files):
python helpers/create_data_split.py

The script performs stratified sampling, preserves class directories, and saves public_idx_...npy and private_idx_...npy files in data_split/ for reproducibility.
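
For intuition, the split can be approximated with scikit-learn roughly as follows. This is only a sketch assuming a per-sample label array; the helper script is the source of truth, and all file names here are illustrative:

# Illustrative sketch of a stratified public/private split (not the exact helper).
import os
import numpy as np
from sklearn.model_selection import train_test_split

labels = np.load("testset/labels.npy")     # assumed: one label per test sample
indices = np.arange(len(labels))

# Stratified 50/50 split so both subsets preserve the class distribution.
public_idx, private_idx = train_test_split(
    indices, test_size=0.5, stratify=labels, random_state=42
)

os.makedirs("data_split", exist_ok=True)
np.save("data_split/public_idx_example.npy", public_idx)    # names illustrative
np.save("data_split/private_idx_example.npy", private_idx)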

  1. Create and activate a virtualenv:
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
  2. Create .env in the project root with at least the following keys (example):
DISCORD_TOKEN=your_discord_token_here
REDIS_URL=redis://localhost:6379/0
DATABASE_URL=sqlite:///./evaluation.db
  3. Initialize the SQLite DB and seed teams/participants:
python create_teams.py
  4. Start Redis if you don't already have it running (for example with Docker):
docker run -d --name redis -p 6379:6379 redis:7
  5. Start a Celery worker (see Celery details below for recommended flags):
# from project root
celery -A celery_app.celery worker --loglevel=info -Q default
  6. Run the Discord bot locally:
python main.py
  7. (Optional) Run the Flask API server for leaderboards:
python api_server.py

Note: when running locally without Dockerized sandboxing, do not run untrusted user code. The repository's Docker-based sandbox is designed for secure execution of third-party models.

Docker & docker-compose (recommended for isolation)

There is a docker-compose.yml included to wire up services (Redis, optional db, API) and to help build worker images for sandboxing model runs.

Common commands:

# Build and start redis + api + any defined services
docker-compose up --build

# Start detached
docker-compose up -d --build

# Stop
docker-compose down

Container recommendations:

  • Use the provided Dockerfiles to build the evaluation runner image (see Dockerfile and Dockerfile.celery). The runner image includes required Python packages and a minimal runtime to execute run_inference_job.py inside an isolated container.

Security note: The Docker container used for running inference should mount only the model and the testset artifacts required to produce predictions, never the host root or secrets.

Celery (task queue)

This project uses Celery to manage asynchronous evaluation jobs. Celery configuration is located in celery_app.py. Worker tasks are defined in celery_tasks.py and call the job runner (run_inference_job.py) inside a sandbox.

How tasks flow:

  1. A user submits a model via Discord. main.py uploads/places the model into models/ and enqueues a Celery task to evaluate it.
  2. The Celery worker receives the evaluation task, launches a Docker sandbox (or runs a local runner) that executes run_inference_job.py with the submission context.
  3. The runner writes predictions to disk. scoring.py is invoked to compute metrics, the results are saved to the DB, and the leaderboard is updated.
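
Concretely, the task in step 2 might look roughly like this. A sketch only; run_in_sandbox, score_predictions, and save_result are illustrative helper names, not the exact code in celery_tasks.py:

# Illustrative task shape; see celery_tasks.py for the real definitions.
from celery_app import celery                       # the configured Celery app

@celery.task(bind=True, time_limit=1800)            # illustrative 30-minute hard cap
def evaluate_submission_task(self, submission_id, model_path, split):
    output_dir = f"tmp/job-{self.request.id}"       # per-job working directory
    run_in_sandbox(model_path, split, output_dir)   # hypothetical sandbox launcher
    metrics = score_predictions(f"{output_dir}/preds.json", split)  # hypothetical scorer call
    save_result(submission_id, metrics)             # hypothetical DB helper
    return metrics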

Starting workers

# Start a worker
celery -A celery_app.celery worker --loglevel=info -Q default

# Start multiple workers (or use --concurrency=N). Example:
celery -A celery_app.celery worker --loglevel=info -Q default -c 4

Scheduling and periodic tasks

If you use periodic scheduling (beat), run:

celery -A celery_app.celery beat --loglevel=info
# Or run worker + beat in separate terminals/containers

Monitoring

  • Use Flower (optional) to monitor tasks: pip install flower, then run:
celery -A celery_app.celery flower --port=5555

Redis

Celery requires a broker. The project reads the broker address from the REDIS_URL (or similar) environment variable. The default docker-compose provides Redis; for local development you can run Redis via Docker as shown above.
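
A minimal broker wiring might look like this (a sketch; the actual celery_app.py is authoritative):

# Sketch of a Celery app wired to Redis via REDIS_URL.
import os
from celery import Celery

redis_url = os.getenv("REDIS_URL", "redis://localhost:6379/0")
celery = Celery("evaluation_bot", broker=redis_url, backend=redis_url)
celery.conf.task_default_queue = "default"   # matches the -Q default used above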

Database and migrations

The project uses SQLAlchemy in database/ and a small operations layer in database/operations.py to interact with submissions, teams and scores. By default the database is SQLite, controlled via DATABASE_URL environment variable.
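
For orientation, a submission record in such a schema might look roughly like this (a sketch; the real models live in database/ and table/column names here are illustrative):

# Illustrative SQLAlchemy model; the real definitions live in database/.
import datetime
import os

from sqlalchemy import Column, DateTime, Float, Integer, String, create_engine
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Submission(Base):
    __tablename__ = "submissions"
    id = Column(Integer, primary_key=True)
    team_id = Column(Integer, nullable=False)
    model_path = Column(String, nullable=False)
    public_score = Column(Float)
    private_score = Column(Float)
    created_at = Column(DateTime, default=datetime.datetime.utcnow)

engine = create_engine(os.getenv("DATABASE_URL", "sqlite:///./evaluation.db"))
Base.metadata.create_all(engine)    # create tables if they don't exist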

Bootstrap the DB and seed data:

python create_teams.py

If you move to Postgres for production, update DATABASE_URL, adjust docker-compose.yml to include a Postgres service, and run schema migrations with Alembic (if you integrate it).

Running inference and scoring

The actual per-submission evaluation happens in run_inference_job.py. It expects arguments describing:

  • which model file to load (path under models/)
  • which dataset split to evaluate (private/public)
  • an output directory for predictions

High-level runner contract (inputs/outputs):

  • Inputs: model path, dataset split id, device (cpu/cuda optional), inference timeout
  • Outputs: predictions file(s) in a structured format (CSV/JSON/NPY), a metrics JSON written by scoring.py or the task wrapper

Scoring is implemented in scoring.py. The runner should produce outputs compatible with the scorer.
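
The command-line contract implied above looks roughly like this. A sketch: --model, --split, and --output match the example below, while the --device and --timeout flag names are assumptions:

# Sketch of the runner's argument contract; check run_inference_job.py for the actual flags.
import argparse

parser = argparse.ArgumentParser(description="Run inference for one submission")
parser.add_argument("--model", required=True, help="model artifact path under models/")
parser.add_argument("--split", choices=["public", "private"], required=True)
parser.add_argument("--output", required=True, help="directory for prediction files")
parser.add_argument("--device", default="cpu", help="cpu or cuda (flag name assumed)")
parser.add_argument("--timeout", type=int, default=1800, help="seconds (flag name assumed)")
args = parser.parse_args()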

Example (local run for testing):

python run_inference_job.py --model models/example.pth --split private --output tmp/outdir
python scoring.py --predictions tmp/outdir/preds.json --split private

Notes on timeouts and resource limits

  • The Celery task wrapper should set a hard timeout for evaluation tasks to avoid long-running or hung jobs.
  • In Docker-based sandboxing, enforce CPU/memory limits (docker run flags or compose deploy.resources) so that user models cannot exhaust host resources; see the sketch below.
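
A sketch of such a launch, with read-only input mounts and explicit limits (the image name, container paths, and limit values are all illustrative):

# Illustrative sandbox launch with resource limits and read-only input mounts.
import os
import subprocess

def run_in_sandbox(model_path, split, output_dir, timeout=1800):
    cmd = [
        "docker", "run", "--rm",
        "--memory=4g", "--cpus=2",                     # cap memory and CPU per job
        "--network=none",                              # no network for untrusted code
        "-v", f"{os.path.abspath(model_path)}:/job/model:ro",    # read-only model
        "-v", f"{os.path.abspath('testset')}:/job/testset:ro",   # read-only dataset
        "-v", f"{os.path.abspath(output_dir)}:/job/out",         # writable outputs only
        "evaluation-runner",                           # hypothetical image name
        "python", "run_inference_job.py",
        "--model", "/job/model", "--split", split, "--output", "/job/out",
    ]
    subprocess.run(cmd, check=True, timeout=timeout)   # hard wall-clock limit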

API & Discord bot

API:

  • api_server.py exposes endpoints under /api/leaderboard/* to fetch public/private leaderboards and team stats.
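
A minimal endpoint in that style might look like this (a sketch; fetch_leaderboard is an illustrative helper, and the real routes live in api_server.py):

# Illustrative leaderboard endpoint.
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/api/leaderboard/public")
def public_leaderboard():
    rows = fetch_leaderboard(split="public")   # hypothetical DB helper
    return jsonify(rows)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)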

Discord bot:

  • main.py contains the Discord bot implementation. It registers slash commands such as /register_participant, /evaluate_submission, and /leaderboard.
  • The bot should be run with DISCORD_TOKEN in the environment.

Submitting a model via Discord (high-level):

  1. The user uploads a .pth or .onnx file and triggers /evaluate_submission.
  2. The bot stores the artifact in models/ under a unique name and enqueues a Celery evaluation task.
  3. The user receives updates when the evaluation finishes and a link to the leaderboard entry.
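
In discord.py terms, that flow might look roughly like this. A sketch: create_submission and evaluate_submission_task are illustrative names, and command registration details such as tree.sync() are omitted:

# Illustrative slash command; the real flow is implemented in main.py.
import os
import uuid

import discord
from discord import app_commands

intents = discord.Intents.default()
client = discord.Client(intents=intents)
tree = app_commands.CommandTree(client)

@tree.command(name="evaluate_submission", description="Submit a model for evaluation")
async def evaluate_submission(interaction: discord.Interaction, model: discord.Attachment):
    await interaction.response.defer()                    # evaluation takes a while
    path = os.path.join("models", f"{uuid.uuid4()}_{model.filename}")
    await model.save(path)                                # store the artifact under a unique name
    submission_id = create_submission(interaction.user.id, path)    # hypothetical DB helper
    evaluate_submission_task.delay(submission_id, path, "private")  # enqueue the evaluation
    await interaction.followup.send("Submission queued; you'll be notified when scoring finishes.")

client.run(os.environ["DISCORD_TOKEN"])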

Testing

There are a few tests under tests/ (for example tests/test_parser.py). Run them with pytest:

pytest -q

Add tests for new functionality — especially for scoring and runner I/O, and for the API endpoints.

Deployment notes

Small-scale production deployment suggestions:

  • Use Docker Compose for small deployments; prefer Kubernetes at higher scale.
  • Use a managed Redis (or a resilient cluster) as Celery broker.
  • Use Postgres for DB in production and run migrations with Alembic.
  • Run Celery workers with autoscaling (K8s HPA or a horizontal worker autoscaler) depending on queue depth.
  • Run the Discord bot in a separate deployment and add health checks.
  • Secure the API (authentication/authorization) if exposing beyond internal usage.

Example docker-compose production snippet (conceptual):

services:
  redis:
    image: redis:7
  api:
    build: .
    command: python api_server.py
    environment:
      - DATABASE_URL=${DATABASE_URL}
  worker:
    build: .
    command: celery -A celery_app.celery worker --loglevel=info
    environment:
      - REDIS_URL=${REDIS_URL}

Troubleshooting & tips

  • If Celery tasks are not running: confirm Redis is reachable and that the worker was started with the correct app (-A flag) and queue.
  • If the Docker runner cannot access the model/testset: check the mounts in docker run or compose volumes, and avoid mounting any sensitive host path.
  • If scoring results don't match expectations: check that the predictions format matches what scoring.py expects (IDs, ordering, label format).

Logs

  • Worker logs: stdout where Celery worker runs.
  • API logs: stdout of api_server.py.
  • Runner logs: captured by the task wrapper — ensure each run writes a run-specific log file under tmp/job-<uuid>/.
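
One way to get run-specific log files (a sketch):

# Sketch: attach a per-job file handler so each run logs under tmp/job-<uuid>/.
import logging
import os
import uuid

job_id = uuid.uuid4()
job_dir = f"tmp/job-{job_id}"
os.makedirs(job_dir, exist_ok=True)

logger = logging.getLogger(f"runner.{job_id}")
logger.setLevel(logging.INFO)
logger.addHandler(logging.FileHandler(os.path.join(job_dir, "run.log")))
logger.info("starting evaluation job %s", job_id)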

File layout (high level)

  • main.py — Discord bot
  • api_server.py — Flask API
  • celery_app.py, celery_tasks.py — Celery config + tasks
  • run_inference_job.py — Runner invoked by tasks
  • scoring.py — Evaluation metrics
  • inference.py — Inference utilities
  • create_teams.py — DB seeding
  • database/ — models, operations
  • models/ — uploaded model artifacts
  • testset/ — private dataset (not to be shared)
  • tmp/ — job-specific temporary outputs and logs

Security & sandboxing

  • Always run user-submitted code in an isolated environment (Docker container with strict resource limits and no host mounts except necessary inputs/outputs).
  • Never run untrusted code as root inside containers.
  • Validate model artifacts and uploaded files (size limits, file type checks) before enqueuing tasks.
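
A small pre-enqueue check could look like this (the extensions and size cap are illustrative):

# Illustrative upload validation; adjust limits to your needs.
import os

ALLOWED_EXTENSIONS = {".pth", ".onnx"}
MAX_SIZE_BYTES = 500 * 1024 * 1024   # illustrative 500 MB cap

def validate_upload(filename: str, size_bytes: int) -> bool:
    # Reject unexpected file types and oversized artifacts before enqueuing.
    ext = os.path.splitext(filename)[1].lower()
    return ext in ALLOWED_EXTENSIONS and size_bytes <= MAX_SIZE_BYTES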

Contributing

  • Fork, create a feature branch, add tests for new behavior, and open a pull request.
  • Keep changes small, and update documentation in README.md or the docs/ directory if one is created.

Suggested follow-ups

  1. Add CI (GitHub Actions) to run tests and flake/lint on PRs.
  2. Add Alembic migrations if you move to a database other than SQLite.
  3. Add a minimal integration test that runs a small model inside a container to validate the runner + scoring pipeline.

Contact

If you need help or want to extend the project, open an issue on the repo with details about your use-case and environment.
