🎮 NitroGen BizHawk Dataset Generator

This project provides tools to create training datasets for NitroGen using the BizHawk emulator.

It consists of two parts:

Lua Script (export_dataset.lua): Runs inside BizHawk to export gameplay frames and controller input.
Python Script (convert_dataset.py): Converts the exported data into a Parquet file compatible with NitroGen training and pre-processes the images, storing them as the Hugging Face datasets Image type.

📋 Prerequisites

BizHawk Emulator (Version 2.9+ recommended)
Python 3.8+
Git (optional, for cloning)

📦 Installation

Clone this repository or download the files.
Install Python dependencies:

```
pip install -r requirements.txt
```

🚀 Usage

Phase 1: 🎞️ Exporting from BizHawk

Open BizHawk.
Load your ROM (NES or SNES recommended).
Load a Movie file (.bk2) that you want to convert to a dataset.
- Tip: Ensure the movie mode is set to "Play".
Open the Lua Console (Tools > Lua Console).
Click Script > Open Script and select export_dataset.lua.
The script will automatically create a nitrogen_dataset/ folder and start exporting.
The script will automatically stop when the movie finishes.

Note: The script creates three items in your output directory:

frames/: Folder containing raw frame_XXXXXX.png images.

actions.csv: Raw CSV file with input data.

dataset_config.json: Configuration file containing the detected settings (e.g., the resize mode chosen for the console).
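
Before converting, it can help to confirm that the export finished cleanly. The following is a minimal sketch, not part of the project: `summarize_export` is a hypothetical helper, and it assumes that actions.csv starts with a header row and holds one row per exported frame, which this README does not state. It makes no assumption about the CSV column names or the keys inside dataset_config.json.

```python
import csv
import json
from pathlib import Path


def summarize_export(export_dir):
    """Collect basic facts about an export folder (hypothetical helper).

    Assumes actions.csv has a header row followed by one row per frame.
    """
    root = Path(export_dir)

    # Frames are documented as frames/frame_XXXXXX.png
    frames = sorted((root / "frames").glob("frame_*.png"))

    # Read the header and count the remaining rows without interpreting them
    with open(root / "actions.csv", newline="") as f:
        reader = csv.reader(f)
        header = next(reader, [])
        action_rows = sum(1 for _ in reader)

    # Load the config as-is; its keys are not documented here
    config = json.loads((root / "dataset_config.json").read_text())

    return {
        "frame_count": len(frames),
        "action_rows": action_rows,
        "csv_columns": header,
        "config": config,
    }
```

Calling `summarize_export("nitrogen_dataset")` and comparing `frame_count` with `action_rows` gives a quick hint as to whether the export was interrupted before the movie ended.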

Phase 2: 🖼️ Converting and Processing

Once the Lua export is complete, use the Python script to package the data and process the images.

Open a terminal in the project directory.
Run the converter:

```
# Default usage
# Reads from 'nitrogen_dataset/'
# Saves parquet to 'nitrogen_dataset/train.parquet' (images embedded)
python convert_dataset.py

# Specify custom input directory
python convert_dataset.py --input /path/to/my_export

# Skip image processing (only convert CSV)
python convert_dataset.py --skip-images
```

The output will contain:
- train.parquet: The single-file dataset containing both actions and embedded images (Hugging Face datasets compatible format).
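
If you have several export folders, the flags shown above can be driven from a small Python wrapper. This is a sketch under assumptions rather than a project feature: `build_command` and `convert_all` are hypothetical names, and the sketch assumes each export folder sits directly under one parent directory and is recognizable by its actions.csv. It only uses the `--input` and `--skip-images` flags documented above; where the Parquet file is written when `--input` is given is not stated here, so check the converter's output or pass `--output` as in the Docker example.

```python
import subprocess
import sys
from pathlib import Path


def build_command(export_dir, skip_images=False, script="convert_dataset.py"):
    """Build the converter command line for one export folder (hypothetical helper)."""
    cmd = [sys.executable, script, "--input", str(export_dir)]
    if skip_images:
        cmd.append("--skip-images")
    return cmd


def convert_all(parent_dir, skip_images=False):
    """Run the converter on every subfolder of parent_dir that contains actions.csv."""
    for export_dir in sorted(Path(parent_dir).iterdir()):
        if (export_dir / "actions.csv").is_file():
            # check=True stops the batch at the first failing conversion
            subprocess.run(build_command(export_dir, skip_images), check=True)
```

Run it from the project directory, for example `convert_all("exports")`, so that convert_dataset.py resolves relative to the current working directory.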

🐳 Running via Docker

You can also run the converter using Docker, which handles all dependencies (including OpenCV) for you.

Build the Image:
```
docker build -t nitrogen-converter .
```

Run the Container: You need to mount your local dataset folder into the container.

```
# Run against the 'nitrogen_dataset' folder in your current directory
docker run --rm -v "$(pwd)/nitrogen_dataset:/app/dataset" nitrogen-converter --input /app/dataset --output /app/dataset/train.parquet
```

🧩 Image Processing Logic

The scripts automatically detect the best resize mode based on the console:

NES: Uses Crop mode (centers and crops to 256x256) to remove overscan borders.
SNES: Uses Pad mode (adds black borders) to maintain aspect ratio within 256x256.

This configuration is saved in dataset_config.json by the Lua script and applied by the Python script.
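
To make the difference between the two modes concrete, here is a geometry-only sketch. It is an illustration, not the code in convert_dataset.py: it assumes Crop means "scale uniformly until the shorter side is 256, then center-crop" and Pad means "scale uniformly until the longer side is 256, then center on a black canvas". The function name `resize_geometry` and the mode strings `"crop"` and `"pad"` are hypothetical; the real script may differ, for example by cropping without scaling first.

```python
from typing import NamedTuple

TARGET = 256  # output side length stated above


class Geometry(NamedTuple):
    scaled_w: int  # frame width after uniform scaling
    scaled_h: int  # frame height after uniform scaling
    offset_x: int  # crop: columns trimmed on the left; pad: black columns added on the left
    offset_y: int  # crop: rows trimmed at the top; pad: black rows added at the top


def resize_geometry(width, height, mode, target=TARGET):
    """Compute scaled size and centering offsets for the assumed crop/pad modes."""
    if mode == "crop":
        reference = min(width, height)  # shorter side fills the target
    elif mode == "pad":
        reference = max(width, height)  # longer side fills the target
    else:
        raise ValueError(f"unknown mode: {mode!r}")

    scaled_w = round(width * target / reference)
    scaled_h = round(height * target / reference)

    # Center the scaled frame relative to the target square
    offset_x = abs(scaled_w - target) // 2
    offset_y = abs(scaled_h - target) // 2
    return Geometry(scaled_w, scaled_h, offset_x, offset_y)


# A 256x240 NES frame in crop mode: scaled to 273x256, 8 columns trimmed per side
print(resize_geometry(256, 240, "crop"))
# A 256x224 SNES frame in pad mode: kept at 256x224, 16 black rows added top and bottom
print(resize_geometry(256, 224, "pad"))
```

Under these assumptions, cropping discards a thin strip on the left and right of an NES frame, while padding keeps the whole SNES frame at the cost of black bars.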

🧪 Testing

This project includes tests for both the Python and Lua components.

🐍 Python Tests

The Python tests cover image preprocessing and dataset conversion logic.

The project dependencies (installed via requirements.txt) are required, plus pytest.
```
pip install pytest
```
Run the tests:
```
pytest tests/
```

🌙 Lua Tests

The Lua tests validate the input mapping logic and ensure the script structure is correct.

Requires a standard Lua 5.4 interpreter.
Run the tests:
```
lua tests/test_export_dataset.lua
```

🌍 Generated Datasets

Check out a real-world example of a dataset created with this tool:

🎮 Felix the Cat (NES) - World 1

A complete gameplay dataset of World 1, formatted for training vision-to-action models like NitroGen.

Game: Felix the Cat (NES)
Format: Parquet (images + controller inputs)
Size: ~25,000 frames
Source: Recorded via BizHawk, processed with this generator.

```
from datasets import load_dataset

# Load the dataset directly from Hugging Face
dataset = load_dataset("artryazanov/nitrogen-bizhawk-nes-felix-the-cat-world-1", split="train")
```

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
