Example Python scripts showing how to connect to and use DuckLake Container.
DuckLake Container is a data lakehouse in a box. It combines PostgreSQL (for catalog storage) with DuckDB's DuckLake extension to give you a fully functional lakehouse that stores data as Parquet files in S3.
- 5 minute setup
- Your data stays in your cloud
- Query millions of records in seconds
Available on AWS Marketplace - search "DuckLake" or "Lokryn".
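Every script in this repo follows the same core pattern: open a DuckDB connection, load the DuckLake extension, and attach a catalog stored in PostgreSQL with an S3 data path. The sketch below only illustrates that pattern; it is not one of the repo scripts, and the host, credentials, and bucket mirror the sample configuration shown further down.

```python
import duckdb

con = duckdb.connect()

# One-time install, then load, of the required extensions.
con.execute("INSTALL ducklake; LOAD ducklake;")
con.execute("INSTALL postgres; LOAD postgres;")

# Attach the lakehouse: the catalog lives in PostgreSQL, the data lives as Parquet in S3.
# S3 credentials also need to be configured (see the configuration sketch further down).
con.execute("""
    ATTACH 'ducklake:postgres:dbname=lakehouse host=localhost user=admin password=your-password'
    AS lake (DATA_PATH 's3://your-bucket/');
""")
con.execute("USE lake;")

# Any table created here is backed by Parquet files in the bucket.
con.execute("CREATE TABLE IF NOT EXISTS demo (id INTEGER, name TEXT);")
print(con.execute("SELECT count(*) FROM demo").fetchone())
```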
- Python 3.10+
- uv (recommended) or pip
- A running DuckLake Container (local via Docker or on AWS)
- AWS credentials configured (for S3 access)
git clone https://github.com/lokryn-suite/ducklake-examples.git
cd ducklake-examples
# Using uv (recommended)
uv sync
# Or using pip
pip install -r requirements.txt

cp .env.example .env

Edit .env with your DuckLake connection details:
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_USER=admin
POSTGRES_PASSWORD=your-password
POSTGRES_DB=lakehouse
S3_BUCKET_URL=s3://your-bucket
AWS_REGION=us-east-1
AWS_PROFILE=your-profile # Optional, for local dev with SSO
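The scripts read these values at startup. The sketch below shows one plausible way to wire them into a DuckDB/DuckLake session; it assumes python-dotenv for loading `.env` and DuckDB's credential_chain secret provider for S3, which may differ from what the actual scripts do.

```python
import os
import duckdb
from dotenv import load_dotenv  # assumed helper; matches the .env workflow above

load_dotenv()  # pull the values above from .env into the process environment

con = duckdb.connect()
con.execute("INSTALL ducklake; LOAD ducklake;")
con.execute("INSTALL postgres; LOAD postgres;")
con.execute("INSTALL httpfs; LOAD httpfs;")
con.execute("INSTALL aws; LOAD aws;")

# S3 access: the credential_chain provider resolves credentials from the
# usual AWS sources (env vars, AWS_PROFILE / SSO, instance role).
region = os.environ["AWS_REGION"]
con.execute(
    f"CREATE OR REPLACE SECRET s3_secret (TYPE s3, PROVIDER credential_chain, REGION '{region}');"
)

# Build the DuckLake catalog connection string from the .env values.
catalog = (
    "ducklake:postgres:"
    f"dbname={os.environ['POSTGRES_DB']} "
    f"host={os.environ['POSTGRES_HOST']} "
    f"port={os.environ['POSTGRES_PORT']} "
    f"user={os.environ['POSTGRES_USER']} "
    f"password={os.environ['POSTGRES_PASSWORD']}"
)
bucket = os.environ["S3_BUCKET_URL"]
con.execute(f"ATTACH '{catalog}' AS lake (DATA_PATH '{bucket}/');")
con.execute("USE lake;")
```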
# Basic connection test
uv run python basic_example.py
# Load and query large dataset (1.4 million rows)
uv run python large_dataset_example.py

We provide free sample datasets for testing:
| Dataset | Rows | URL |
|---|---|---|
| people.csv | 1.4 million | https://sample.ondoriya.com/people.csv |
The large_dataset_example.py script uses this by default. You can also use your own CSV:
uv run python large_dataset_example.py /path/to/your/data.csv

basic_example.py: Simple connection test - creates a table, inserts rows, and queries them back.
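As a rough illustration of that round trip (assuming a connection `con` already attached to the lakehouse, as in the configuration sketch above; this is not the script's actual code):

```python
# Create a table, insert a couple of rows, and read them back.
con.execute("CREATE TABLE IF NOT EXISTS connection_test (id INTEGER, note TEXT);")
con.execute("INSERT INTO connection_test VALUES (1, 'hello'), (2, 'lakehouse');")
rows = con.execute("SELECT id, note FROM connection_test ORDER BY id").fetchall()
print(rows)  # [(1, 'hello'), (2, 'lakehouse')]
```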
large_dataset_example.py: Loads a CSV (local file or URL) into DuckLake and runs aggregation queries with timing. Uses the 1.4 million row sample dataset by default.
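A sketch of that pattern, again assuming an attached connection `con` with httpfs loaded; the real script's table names, queries, and timing output may differ:

```python
import sys
import time

# CSV source: command-line argument, or the public sample dataset by default.
source = sys.argv[1] if len(sys.argv) > 1 else "https://sample.ondoriya.com/people.csv"

start = time.perf_counter()
# Load the CSV straight into a DuckLake-managed table (Parquet files land in S3).
con.execute(f"CREATE OR REPLACE TABLE people AS SELECT * FROM read_csv_auto('{source}');")
print(f"load: {time.perf_counter() - start:.2f}s")

start = time.perf_counter()
# Example aggregation with timing.
total = con.execute("SELECT count(*) FROM people").fetchone()[0]
print(f"count: {total} rows in {time.perf_counter() - start:.2f}s")
```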
MIT License - see LICENSE
Built by Lokryn LLC