"Example Python scripts for connecting to DuckLake Container - a data lakehouse in a box"

lokryn-llc/ducklake-examples

DuckLake Container - Python Examples

Example Python scripts showing how to connect to and use DuckLake Container.

What is DuckLake?

DuckLake Container is a data lakehouse in a box. It combines PostgreSQL (for catalog storage) with DuckDB's DuckLake extension to give you a fully functional lakehouse that stores data as Parquet files in S3.

  • 5 minute setup
  • Your data stays in your cloud
  • Query millions of records in seconds

Available on AWS Marketplace - search "DuckLake" or "Lokryn".

Prerequisites

  • Python 3.10+
  • uv (recommended) or pip
  • A running DuckLake Container (local via Docker or on AWS)
  • AWS credentials configured (for S3 access)

Quick Start

1. Clone and install dependencies

git clone https://github.com/lokryn-suite/ducklake-examples.git
cd ducklake-examples

# Using uv (recommended)
uv sync

# Or using pip
pip install -r requirements.txt

2. Configure environment

cp .env.example .env

Edit .env with your DuckLake connection details:

POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_USER=admin
POSTGRES_PASSWORD=your-password
POSTGRES_DB=lakehouse

S3_BUCKET_URL=s3://your-bucket
AWS_REGION=us-east-1
AWS_PROFILE=your-profile  # Optional, for local dev with SSO
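
As a rough sketch of how these settings might be consumed, the snippet below reads the variables above from the environment and assembles the DuckDB statements that attach a DuckLake catalog held in PostgreSQL with its data in S3. The function names (`load_settings`, `s3_secret_sql`, `attach_sql`), the secret name `lake_s3`, and the catalog alias `lake` are illustrative and not taken from the example scripts. It assumes the variables are already exported into the process environment, and the `ATTACH 'ducklake:postgres:...'` and `credential_chain` forms should be checked against the DuckDB version you run.

```python
import os


def load_settings(env=os.environ):
    """Collect the settings named in .env; defaults mirror the sample file."""
    return {
        "host": env.get("POSTGRES_HOST", "localhost"),
        "port": env.get("POSTGRES_PORT", "5432"),
        "user": env.get("POSTGRES_USER", "admin"),
        "password": env.get("POSTGRES_PASSWORD", ""),
        "dbname": env.get("POSTGRES_DB", "lakehouse"),
        "bucket": env.get("S3_BUCKET_URL", "s3://your-bucket"),
        "region": env.get("AWS_REGION", "us-east-1"),
        "profile": env.get("AWS_PROFILE"),  # optional, may be None
    }


def _quote(value):
    """Escape single quotes so a value can sit inside a SQL string literal."""
    return str(value).replace("'", "''")


def s3_secret_sql(s):
    """Build a CREATE SECRET statement that defers to the AWS credential chain."""
    parts = [
        "TYPE s3",
        "PROVIDER credential_chain",
        f"REGION '{_quote(s['region'])}'",
    ]
    if s["profile"]:
        # Assumption: a named profile is passed through the PROFILE option.
        parts.append(f"PROFILE '{_quote(s['profile'])}'")
    return f"CREATE OR REPLACE SECRET lake_s3 ({', '.join(parts)})"


def attach_sql(s, alias="lake"):
    """Build the ATTACH statement; assumes no setting contains spaces."""
    dsn = (
        f"dbname={s['dbname']} host={s['host']} port={s['port']} "
        f"user={s['user']} password={s['password']}"
    )
    return (
        f"ATTACH 'ducklake:postgres:{_quote(dsn)}' AS {alias} "
        f"(DATA_PATH '{_quote(s['bucket'])}')"
    )


# Possible usage, assuming the duckdb package is installed:
#   import duckdb
#   con = duckdb.connect()
#   for ext in ("ducklake", "postgres"):
#       con.execute(f"INSTALL {ext}")
#       con.execute(f"LOAD {ext}")
#   settings = load_settings()
#   con.execute(s3_secret_sql(settings))
#   con.execute(attach_sql(settings))
```

Keeping the SQL construction in plain functions makes it easy to print and inspect the statements before running them against a real catalog.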

3. Run the examples

# Basic connection test
uv run python basic_example.py

# Load and query large dataset (1.4 million rows)
uv run python large_dataset_example.py

Sample Data

We provide a free sample dataset for testing:

Dataset       Rows          URL
people.csv    1.4 million   https://sample.ondoriya.com/people.csv

The large_dataset_example.py script uses this by default. You can also use your own CSV:

uv run python large_dataset_example.py /path/to/your/data.csv
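
The choice between the default dataset and your own CSV can be sketched as below. This is a guess at the general shape, not the script's actual code: `resolve_source` and `is_remote` are hypothetical helper names, and the only values taken from this README are the sample URL and the convention that the first argument is the CSV location.

```python
import sys

# Sample dataset URL from the table above.
DEFAULT_SOURCE = "https://sample.ondoriya.com/people.csv"


def resolve_source(argv):
    """Return the first command-line argument if given, else the sample URL."""
    return argv[1] if len(argv) > 1 else DEFAULT_SOURCE


def is_remote(source):
    """True when the source looks like a URL rather than a local path."""
    return source.startswith(("http://", "https://", "s3://"))


# Possible usage inside a script:
#   source = resolve_source(sys.argv)
#   print("remote" if is_remote(source) else "local", source)
```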

Examples

basic_example.py

Simple connection test - creates a table, inserts rows, queries them back.
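
A minimal sketch of that create, insert, and query round trip follows. It is not the contents of basic_example.py: the function name `round_trip`, the table name `demo_items`, and its two columns are made up for illustration. The function only relies on a connection object exposing `execute(...)` and `fetchall()`, which the DuckDB Python connection provides.

```python
def round_trip(con, table="demo_items"):
    """Create a small table, insert two rows, and read them back in order."""
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")

    con.execute(f"CREATE TABLE IF NOT EXISTS {table} (id INTEGER, name VARCHAR)")

    # One multi-row INSERT, so the write happens in a single statement.
    con.execute(
        f"INSERT INTO {table} VALUES (?, ?), (?, ?)",
        [1, "alpha", 2, "beta"],
    )

    # Re-running against an existing table appends, so rows may repeat.
    return con.execute(f"SELECT id, name FROM {table} ORDER BY id").fetchall()


# Possible usage, assuming the duckdb package and an attached catalog "lake":
#   import duckdb
#   con = duckdb.connect()
#   ... attach the DuckLake catalog, then ...
#   con.execute("USE lake")
#   print(round_trip(con))
```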

large_dataset_example.py

Loads a CSV (local file or URL) into DuckLake and runs aggregation queries with timing. Uses the 1.4 million row sample dataset by default.
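
The load-then-time pattern described here might look roughly like the sketch below. The helper names `load_csv_sql` and `timed`, and the table name `people`, are assumptions for illustration. The query uses only `count(*)` because the columns of the sample file are not listed in this README. `read_csv_auto` is DuckDB's CSV reader with automatic schema detection.

```python
import time


def load_csv_sql(table, source):
    """Build a statement that materializes a CSV (path or URL) as a table."""
    src = source.replace("'", "''")  # escape quotes for the SQL literal
    return f"CREATE TABLE {table} AS SELECT * FROM read_csv_auto('{src}')"


def timed(con, sql):
    """Run a query and return (rows, elapsed_seconds)."""
    start = time.perf_counter()
    rows = con.execute(sql).fetchall()
    return rows, time.perf_counter() - start


# Possible usage, assuming a DuckDB connection with the catalog in use:
#   con.execute(load_csv_sql("people", "https://sample.ondoriya.com/people.csv"))
#   rows, seconds = timed(con, "SELECT count(*) FROM people")
#   print(rows[0][0], "rows counted in", round(seconds, 3), "s")
```

`time.perf_counter()` is used instead of `time.time()` because it is monotonic and meant for measuring short intervals.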

License

MIT License - see LICENSE


Built by Lokryn LLC
