Tool for point-in-time snapshot of postgres table (plain, partitioned, hypertable) into S3 and parquet format


postgres2parquet-go

Efficiently snapshot PostgreSQL tables to Parquet files in S3 — written in Go.
postgres2parquet-go is a lightweight, high-performance tool designed to export entire PostgreSQL tables into columnar Parquet format and store them directly in S3. It was originally built to support reliable pre-replication snapshots for CDC workflows, especially for large tables where traditional snapshot mechanisms can be slow or resource-intensive.

Unlike heavier frameworks that require complex cluster setups, this tool runs as a simple binary with minimal dependencies—making it ideal for containerized environments, CI/CD pipelines, or one-off data exports.

Why This Tool?
While solutions like Apache Spark (via JDBC) or Apache Flink CDC can achieve similar goals, they often come with operational overhead. postgres2parquet-go offers a focused, zero-friction alternative:

  • Native Parquet output using efficient Go libraries (parquet-go)
  • Direct streaming to S3 without intermediate disk storage
  • Optimized for large tables with configurable chunking and concurrency (pgx)
  • Minimal runtime footprint — no JVM, no cluster coordination
  • Produces consistent, point-in-time snapshots suitable for initializing CDC pipelines

Whether you're bootstrapping a change data capture pipeline, archiving historical data, or preparing datasets for analytics, postgres2parquet-go delivers speed, simplicity, and reliability.

Usage

| Mode | Description |
| --- | --- |
| `default` | Snapshots a plain Postgres table. If `partition-column` is set, the table is split into chunks and each chunk is fetched concurrently; otherwise the whole table is fetched in a single select. |
| `partitioned` | Fetches each partition of a partitioned table concurrently. If `partition-column` is specified, each partition is additionally split into chunks, giving more granular control over parallelism. |
| `hypertable` | Special mode for TimescaleDB hypertables. The logic is very similar to the partitioned mode; the only difference is how partitions are represented. Each partition can also be split into chunks if `partition-column` is specified. |

General invocation:

```shell
ENVS ./postgres2parquet-go {mode} {flags}
```

Important

Keep in mind that partition-column should be of a numeric or temporal type. The column should also be indexed, or be part of a composite index (in the case of partitioned and hypertable tables). Otherwise each query will cause a full table scan, which degrades performance and negatively affects the database.
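As a concrete illustration of that advice, the snippet below prints a B-tree index statement for a hypothetical table `public.events` chunked on `created_at` (the table and column names are not from this project; adapt them to your schema before running the SQL via `psql`):

```shell
# Illustrative only: an index on the column you plan to pass as --partition-column.
# Run the printed statement against the real database, e.g.:
#   psql -h localhost -U postgres -d postgres -c "$sql"
sql='CREATE INDEX IF NOT EXISTS events_created_at_idx ON public.events (created_at);'
echo "$sql"
```

With the index in place, each chunk query becomes an index range scan instead of a full table scan.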

Example:

```shell
PG2P_DB_PASSWORD=postgres PG2P_DB_USER=postgres PG2P_S3_ACCESS_KEY=admin PG2P_S3_SECRET_ACCESS_KEY=password ./postgres2parquet-go snapshot default \
  --schema public \
  --table all_simple_types \
  --partition-column primary_key \
  --parallelism 200 \
  --bucket warehouse \
  --fetch-size 10000 \
  --file-path /schema/table/ \
  --file-size-threshold 52428800 \
  --db-host localhost \
  --db-port 5432 \
  --db-name postgres \
  --db-pool-size 20 \
  --s3-endpoint localhost:9000 \
  --s3-region none \
  --s3-secure false
```
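The `partitioned` and `hypertable` modes take the same flags. As a sketch, a hypothetical partitioned-mode run over an illustrative table `public.events` (table name, partition column, and parallelism values are assumptions, not from the project docs) might look like this; the command is assembled into a variable and printed so it can be reviewed before running it against a real database and S3 endpoint:

```shell
# Hypothetical partitioned-mode invocation. Credentials go through the same
# environment variables as above (PG2P_DB_PASSWORD, PG2P_DB_USER,
# PG2P_S3_ACCESS_KEY, PG2P_S3_SECRET_ACCESS_KEY).
cmd="./postgres2parquet-go snapshot partitioned \
  --schema public \
  --table events \
  --partition-column created_at \
  --parallelism 8 \
  --fetch-size 10000 \
  --bucket warehouse \
  --file-path /public/events/ \
  --db-host localhost \
  --db-port 5432 \
  --db-name postgres \
  --db-pool-size 8 \
  --s3-endpoint localhost:9000 \
  --s3-region none \
  --s3-secure false"
echo "$cmd"
```

Because `--partition-column` is set here, each partition is additionally split into chunks, so total concurrency is governed by both `--parallelism` and `--db-pool-size`.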

Options

| Option | Required | Default | Description |
| --- | --- | --- | --- |
| db-host | false | localhost | DB host |
| db-port | false | 5432 | DB port |
| db-name | false | postgres | DB name |
| db-pool-size | false | 1 | DB connection pool size |
| s3-endpoint | false | localhost:9000 | S3-like storage endpoint |
| s3-region | false | none | S3 region |
| s3-secure | false | false | If true, https is used to connect to storage; otherwise http |
| schema | true | | Schema of the snapshotted table |
| table | true | | Name of the snapshotted table |
| partition-column | false | | Column used to split the table into chunks; must be of a numeric or timestamp type. If specified, the table is split into chunks that are fetched concurrently. If not specified, the whole table (`default`) or each partition (`partitioned`, `hypertable`) is fetched without additional splitting |
| parallelism | false | 1 | Number of chunks |
| fetch-size | false | 10000 | Rows per single fetch from each chunk |
| file-size-threshold | false | 1 MB | Soft limit on Parquet file size |
| file-path | false | / | Target file path location in S3 |
| bucket | true | | Target bucket |
| (env) PG2P_DB_PASSWORD | true | | DB password |
| (env) PG2P_DB_USER | true | | DB user |
| (env) PG2P_S3_ACCESS_KEY | true | | S3 access key |
| (env) PG2P_S3_SECRET_ACCESS_KEY | true | | S3 secret access key |

Types mappings

| Postgres | Parquet |
| --- | --- |
| smallint (int2) | Integer |
| integer, serial (int4) | Integer |
| bigint, bigserial (int8) | Bigint |
| text | Varchar |
| char | Varchar |
| varchar | Varchar |
| xml | Varchar |
| json | Varchar |
| jsonb | Varchar |
| inet | Varchar |
| macaddr | Varchar |
| decimal(p, s) | Decimal(p, s). Defaults if not specified: precision=18, scale=4 |
| numeric(p, s) | Decimal(p, s). Defaults if not specified: precision=18, scale=4 |
| real (float4) | Float |
| double precision (float8) | Double |
| bool | Boolean |
| timestamp without time zone | Timestamp (millis) |
| timestamp with time zone | TimestampTZ (millis) |
| time without time zone | Time (millis) |
| date | Date |
| uuid | UUID |
| custom type | Varchar |
| []T (except custom types) | []T |

All arrays are represented as:

```
<repeat type> group <column> (LIST) {
    repeated group list {
        required <type> element;
    }
}
```

Limitations

  • TIMETZ is not supported
  • HASH partition type is not supported
  • Arrays of custom types are not supported
  • If an array contains NULL values, a default value is used instead (the zero value for primitives and nil for pointers)

Examples

Some usage examples can be found in the `examples` directory.
