Welcome to the runtime repository for the On Top of Pasketti: Children's Speech Recognition Challenge on DrivenData! This repository contains a few things:
- Runtime environment specification (`runtime/`) – the definition of the environment where your code will run.
- Example submissions (`examples/`) – example submissions for both the Word and Phonetic tracks. Each will run successfully in the code execution runtime and output a valid submission.
- Metric code (`metrics/`) – code to compute the evaluation metrics for both tracks.
You can use this repository to:
🧪 Test your submission: Test your submission using a locally running version of the competition runtime to discover errors before submitting to the competition website.
📦 Request new packages in the official runtime: Since your submission will not have general access to the internet, all dependencies must be pre-installed. If you want to use a package that is not in the runtime environment, make a pull request to this repository. Make sure to test out adding the new package to both official environments, CPU and GPU.
Changes to the repository are documented in CHANGELOG.md.
3. Testing a submission locally
- Prerequisites
- Data directory
- Code submission format
- Running your submission locally
- Metric scoring code
- Running one of the example submissions
- Smoke tests
- Runtime network access
4. Updating runtime dependencies
The runtime specification can be found in the runtime/ directory.
- Abstract Python dependencies are declared in `pyproject.toml`.
- The uv lockfile specifying the Python environment is at `uv.lock`.
- The test harness script is `entrypoint.sh`. This is what the container runs to call your submitted code.
- The Docker image specification is given by the `Dockerfile`.
This repository contains the following example submissions:
- Word Track
  - `word/minimal` – a minimal example submission with a fake model (reads a hard-coded value from file and predicts it). You can use this as a template for your own submission; a rough sketch of the general structure is shown after this list.
  - `word/parakeet` – a more substantial example using NVIDIA's Parakeet TDT 0.6B V2. See the README for more details.
- Phonetic Track
  - `phonetic/minimal` – a minimal example submission with a fake model (reads a hard-coded value from file and predicts it). You can use this as a template for your own submission.
  - `phonetic/parakeet-cmudict` – a more substantial example using NVIDIA's Parakeet TDT 0.6B V2 for automatic speech recognition and CMUdict to transform the output to IPA. See the README for more details.
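To give a feel for what the minimal examples do before you open them, here is a rough sketch of a bare-bones `main.py`. It is not the official template: the data location, `.wav` extension, output path, and JSONL field names below are placeholder assumptions, so copy the real structure from `examples/word/minimal` or `examples/phonetic/minimal` and the challenge documentation.

```python
"""Rough sketch of a minimal main.py (placeholder paths and schema).

The data directory, file extension, output path, and field names here are
assumptions for illustration only -- mirror examples/word/minimal or
examples/phonetic/minimal for the real structure.
"""
import json
from pathlib import Path

DATA_DIR = Path("/code_execution/data")  # test (or demo) data mounted here
OUTPUT_PATH = Path("submission.jsonl")   # placeholder output location


def predict(audio_path: Path) -> str:
    """Stand-in for real model inference; returns a hard-coded value."""
    return "hello world"


def main() -> None:
    # Placeholder assumption: audio files are .wav files somewhere under DATA_DIR.
    with OUTPUT_PATH.open("w") as f:
        for audio_path in sorted(DATA_DIR.rglob("*.wav")):
            record = {"file_id": audio_path.stem, "transcription": predict(audio_path)}
            f.write(json.dumps(record) + "\n")


if __name__ == "__main__":
    main()
```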
When you make a submission on the DrivenData competition site, we run your submission inside a Docker container, a virtual operating system that allows for a consistent software environment across machines. The best way to make sure your submission to the site will run is to first run it successfully in the container on your local machine.
This is what a typical solution development flow looks like: you do all your work in `submission_src/`, package it up, test it locally, submit a smoke test, and then make a full submission.
flowchart TB
%% --- Style tokens (DrivenData palette) ---
classDef action fill:#C5D8F3,stroke:#17344A,stroke-width:1.5px,color:#17344A;
classDef artifact fill:#F2F2F2,stroke:#617182,stroke-width:1.5px,color:#17344A;
develop(Develop inference script<br/><code>submission_src/main.py</code>):::action
pack(Package submission<br/><code>just pack-submission</code>):::action
zip[[Creates<br/><code>submission/submission.zip</code>]]:::artifact
word(Word track run<br/><code>just track=word run</code>):::action
phon(Phonetic track run<br/><code>just track=phonetic run</code>):::action
smoke(Recommended: submit smoke test):::action
submit(Submit normal submission):::action
develop --> pack --> zip
zip --> word --> smoke --> submit
zip --> phon --> smoke --> submit
Using this repository requires the following:
- A local clone of the repository
- Just (>=1.40.0) – a command runner to run various predefined tasks
- Docker
- At least 12 GB of free space for the Docker image
Tip
This repository uses the Just task runner. You use it on the command line with commands like just pull. You can run just by itself to see documentation for all available commands.
Note
If you are installing Just on Ubuntu 24.04 LTS, we recommend installing with snap or with the pre-built binaries. apt install just does not provide a recent enough version (>=1.40.0) for Ubuntu 24.04.
Additional requirements to test submissions with the GPU:
- NVIDIA drivers with CUDA 12
- NVIDIA container toolkit
In the official code execution platform, `/code_execution/data` will contain data provided for the test set. When testing your submission locally, we've provided a small demo sample of data formatted like the test data, which will be used by default. You can find the demo data in `data-demo` – one of the `data-demo/word` or `data-demo/phonetic` directories will be mounted into the container as `/code_execution/data`, depending on the active track. If you want to mount a different data directory, you can override this by setting the `KIDSASR_DATA_DIR` environment variable to an arbitrary directory path. This can be useful if, for example, you want to test your submission on a validation set that you've created from the training data. Be sure to match the expected directory structure documented on the challenge website.
Tip
You can set the KIDSASR_DATA_DIR environment variable either with the export command or using a .env file. See .env.example.
Your final submission should be a ZIP archive named with the extension `.zip` (for example, `submission.zip`). The root level of the `submission.zip` file must contain a `main.py`, which will be run by the container. For local testing, the `justfile` commands expect this file to be located at `submission/submission.zip` within this repository.
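If you prefer to build the archive yourself instead of using the `just pack-submission` recipe described below, here is a minimal sketch using Python's standard-library `zipfile`. It assumes your code lives in `submission_src/` and writes `submission/submission.zip`, keeping `main.py` at the archive root rather than nested inside a folder:

```python
# Sketch: zip the contents of submission_src/ so that main.py ends up at the
# root of the archive (not nested inside a submission_src/ folder).
import zipfile
from pathlib import Path

SRC_DIR = Path("submission_src")
ZIP_PATH = Path("submission") / "submission.zip"
ZIP_PATH.parent.mkdir(parents=True, exist_ok=True)

with zipfile.ZipFile(ZIP_PATH, "w", zipfile.ZIP_DEFLATED) as zf:
    for path in sorted(SRC_DIR.rglob("*")):
        if path.is_file():
            # arcname is relative to submission_src/, so main.py lands at the root
            zf.write(path, arcname=path.relative_to(SRC_DIR))

print(f"Wrote {ZIP_PATH}")
```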
This section provides instructions on how to run your submission in the code execution container from your local machine. Key steps in the process have been defined as Just recipes in the justfile. Commands are run with `just {command_name}`. Some commands need you to specify a track, like `just track={track_name} {command_name}`. For example, to test a submission for the Word track:
# Pull latest tag of official image
just pull
# Make a `submission.zip` file and put it at `submission/submission.zip`. Remember it must contain a `main.py` at the root.
# [Optional] Put your files in submission_src/ and use our convenience just recipe
just pack-submission
# Run the code execution test harness using your submission.
just track=word run
# OR
just track=phonetic run

Tip
Some commands require specifying a track. If you want to avoid typing track=... every time, you can set an environment variable KIDSASR_TRACK instead. You can do that either with the export command or using a .env file. See .env.example.
Run just help for more information about the available commands as well as information on the official and built images that are available locally.
Here's the process in a bit more detail:
1. First, make sure you have set up the prerequisites.

2. Download the official competition Docker image:

    just pull

3. Create a `submission.zip` file.

4. [Optional] You can move the files (code, model weights) you want to include in your submission into the `submission_src` folder of the runtime repository. Then run:

    just pack-submission

    [!IMPORTANT]
    Note that the required `main.py` file should be in the archive root. If you're zipping up a folder, be careful that the contents are not nested inside that folder in the ZIP archive.
    - ✅ CORRECT – `main.py`
    - ❌ INCORRECT – `my_submission/main.py`

5. Launch the test harness in a Docker container, and run the same inference process that will take place in the official runtime:

    just track=word run
    # OR
    just track=phonetic run
This runs the container entrypoint script. First, it unzips `submission/submission.zip` into `/code_execution/src/` in the container. Then, it runs the `main.py` script you've provided. In the local testing setting, the final submission is saved to `submission/submission.jsonl` on your local machine. Logs will be printed to the console and saved to `submission/log.txt`.
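After a local run, it can be worth a quick sanity check of the generated `submission/submission.jsonl` before submitting. The sketch below only verifies that every line parses as JSON and reports the record count; it does not validate field names, which are defined by the challenge documentation.

```python
# Sketch: sanity-check the locally produced predictions file.
# This only confirms each line is valid JSON; it does not check the schema.
import json
from pathlib import Path

path = Path("submission") / "submission.jsonl"
records = [json.loads(line) for line in path.read_text().splitlines() if line.strip()]
print(f"{path} contains {len(records)} records")
if records:
    print("example record:", records[0])
```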
Note
If you're trying to test a local version of your runtime image that you've built with just build, you can use the dev-prefixed command just dev-run instead.
We also provide a way to run the metric scoring code locally to evaluate your models. This implementation uses the same normalization as the competition, so it should produce similar scores. The code is in the `metric` directory: `metric/score.py`.
You can run the scoring script with the path to your predictions and the path to the ground truth data. The script will automatically determine whether to calculate Word Error Rate (WER) or IPA Character Error Rate (CER) based on the contents of the ground truth file.
It can be run from the command line with uv, which will automatically set up the environment with the required dependencies:

uv run ./metric/score.py <path_to_predictions.jsonl> <path_to_ground_truth.jsonl>

You can also import the contents of `score.py` and use them in your own code. The methods and data we provide are:
| Name | Type | Description |
|---|---|---|
| `score_wer` | Function | Calculates Word Error Rate (WER) between predicted and actual sequences. Uses `english_spelling_normalizer`. |
| `score_ipa_cer` | Function | Calculates IPA Character Error Rate (CER) between predicted and actual IPA sequences. Normalizes IPA strings first. |
| `score_jsonl` | Function | Calculates WER or IPA-CER between predicted and actual transcriptions stored in JSONL files. |
| `normalize_ipa` | Function | Normalizes IPA strings: NFC normalization, removing tie bars/stress, decomposing nasals, etc. |
| `validate_ipa_characters` | Function | Checks if an IPA string contains only characters in `VALID_IPA_CHARS`. |
| `english_spelling_normalizer` | Dictionary | Mapping for normalizing English spelling (e.g., British to American), used in WER calculation. |
| `VALID_IPA_CHARS` | List | Set of valid IPA characters allowed in the Phonetic track. |
In general, you can just use `score_jsonl` to evaluate your predictions against ground truth data, and it will handle the rest for you. You can use `score_wer` and `score_ipa_cer` if you want to calculate the metrics directly, and they will apply the same normalization that the competition uses.
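For example, you might call the scoring code from a script or notebook rather than the command line. The sketch below assumes `score_jsonl` accepts the predictions path followed by the ground-truth path and returns the metric value; the paths shown are placeholders, and the actual signature should be checked in `metric/score.py`.

```python
# Sketch: calling the provided scoring code from Python.
# Assumes score_jsonl(predictions_path, ground_truth_path) returns the metric
# value -- check metric/score.py for the actual signature. Paths are placeholders.
import sys

sys.path.append("metric")  # make score.py importable from the repository root
from score import score_jsonl

result = score_jsonl("submission/submission.jsonl", "path/to/ground_truth.jsonl")
print("score:", result)
```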
The example submissions listed above can also be packed into a submission.zip that can be tested locally or submitted. You can use the just command:
just pack-example {example relative path}

e.g., `just pack-example word/minimal` for the example located at `examples/word/minimal`.
When submitting on the platform, you will have the ability to submit "smoke tests". Smoke tests run on a small portion of the training set that is set up to emulate the test set in order to run quickly. They will not be considered for prize evaluation and are intended to let you test your code for correctness.
In the real competition runtime, all internet access is blocked. The justfile commands similarly disable internet access from the container. This is controlled by the `block_internet` variable. To run a submission with internet access, you can, for example, use `just block_internet=false run`.
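Because internet access is blocked in the official runtime, it can help to confirm early that nothing in your inference code quietly reaches out to the network (for example, a library downloading model weights on first use). Below is a rough sketch of a check you could temporarily drop into `main.py` while testing; the host, port, and timeout are arbitrary choices.

```python
# Sketch: detect whether outbound network access is available.
# The host/port/timeout are arbitrary; this is only a local testing aid.
import socket


def internet_available(host: str = "8.8.8.8", port: int = 53, timeout: float = 3.0) -> bool:
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if internet_available():
    print("WARNING: internet is reachable -- it will not be in the official runtime")
else:
    print("No internet access detected (matches the official runtime)")
```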
If you want to use a package that is not in the environment, you are welcome to make a pull request to this repository. If you're new to the GitHub contribution workflow, check out this guide by GitHub.
The runtime manages dependencies using uv.
To submit a pull request for a new package:
1. Fork this repository.

2. Install uv. See here for installation options.

3. Edit the `dependencies` array in `runtime/pyproject.toml`.

4. Run `just lock` to update the lockfile.

5. Locally test that the Docker image builds and passes tests:

    just test-build
    just test-run

6. Locally test that your code runs as you expect in the container:

    just dev-build
    just dev-run       # run inference on a submission in container
    # OR
    just dev-interact  # interactive bash shell in container

7. Commit the changes to your forked repository. Ensure that your branch includes updated versions of both of the following:
    - `runtime/pyproject.toml`
    - `runtime/uv.lock`

8. Open a pull request from your branch to the `main` branch of this repository. Navigate to the Pull requests tab in this repository, and click the "New pull request" button. For more detailed instructions, check out GitHub's help page.

9. Once you open the pull request, we will use GitHub Actions to build the Docker images with your changes and run the tests in `runtime/tests`. For security reasons, administrators may need to approve the workflow run before it happens. Once it starts, the process can take up to 10 minutes, and may take longer if your build is queued behind others. You will see a section on the pull request page that shows the status of the tests and links to the logs.

10. You may be asked to submit revisions to your pull request if the tests fail or if a DrivenData staff member has feedback. Pull requests won't be merged until all tests pass and the team has reviewed and approved the changes.