Commit 84c8986
Author: DuckGuard
feat: enterprise-first positioning + platform docs + community outreach

- README: lead with S3/Snowflake/Databricks rather than CSV; "Any data source" tagline.
- Docs: new Platforms section with Snowflake, Databricks, and Kaggle/Colab guides.
- Notebooks: updated getting_started.ipynb; added kaggle_data_quality.ipynb.
- Launch: community outreach strategy (Kaggle, HF, Streamlit, dbt, Reddit).
- mkdocs.yml: added Platforms nav section.
1 parent db91933 commit 84c8986

File tree: 9 files changed (+1909, -1792 lines)

README.md

Lines changed: 41 additions & 41 deletions
@@ -2,9 +2,9 @@
  <img src="docs/assets/duckguard-logo.svg" alt="DuckGuard" width="420">

  <h3>Data Quality That Just Works</h3>
- <p><strong>3 lines of code</strong> &bull; <strong>10x faster</strong> &bull; <strong>20x less memory</strong></p>
+ <p><strong>3 lines of code</strong> &bull; <strong>Any data source</strong> &bull; <strong>10x faster</strong></p>

- <p><em>Stop wrestling with 50+ lines of boilerplate. Start validating data in seconds.</em></p>
+ <p><em>One API for CSV, Parquet, Snowflake, Databricks, BigQuery, and 15+ sources. No boilerplate.</em></p>

  [![PyPI version](https://img.shields.io/pypi/v/duckguard.svg)](https://pypi.org/project/duckguard/)
  [![Downloads](https://static.pepy.tech/badge/duckguard)](https://pepy.tech/project/duckguard)
@@ -15,7 +15,7 @@
  [![Docs](https://img.shields.io/badge/docs-GitHub%20Pages-blue)](https://xdatahubai.github.io/duckguard/)

  [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/XDataHubAI/duckguard/blob/main/examples/getting_started.ipynb)
- [![Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://kaggle.com/kernels/welcome?src=https://github.com/XDataHubAI/duckguard/blob/main/examples/getting_started.ipynb)
+ [![Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://kaggle.com/kernels/welcome?src=https://github.com/XDataHubAI/duckguard/blob/main/examples/kaggle_data_quality.ipynb)
  </div>

  ---
@@ -29,16 +29,47 @@ pip install duckguard
  ```python
  from duckguard import connect

- orders = connect("orders.csv")                     # CSV, Parquet, JSON, S3, databases...
+ orders = connect("s3://warehouse/orders.parquet")  # Cloud, local, or warehouse
  assert orders.customer_id.is_not_null()            # Just like pytest!
- assert orders.total_amount.between(0, 10000)  # Readable validations
+ assert orders.total_amount.between(0, 10000)       # Readable validations
  assert orders.status.isin(["pending", "shipped", "delivered"])

  quality = orders.score()
  print(f"Grade: {quality.grade}")  # A, B, C, D, or F
  ```

- **That's it.** No context. No datasource. No validator. No expectation suite. Just data quality.
+ **That's it.** Same 3 lines whether your data lives in S3, Snowflake, Databricks, or a local CSV. No context. No datasource. No validator. No expectation suite. Just data quality.
+
+ ### Works with Your Data Stack
+
+ ```python
+ from duckguard import connect
+
+ # Data Lakes
+ orders = connect("s3://bucket/orders.parquet")      # AWS S3
+ orders = connect("gs://bucket/orders.parquet")      # Google Cloud
+ orders = connect("az://container/orders.parquet")   # Azure Blob
+
+ # Data Warehouses
+ orders = connect("snowflake://account/db", table="orders")      # Snowflake
+ orders = connect("databricks://host/catalog", table="orders")   # Databricks
+ orders = connect("bigquery://project", table="orders")          # BigQuery
+ orders = connect("redshift://cluster/db", table="orders")       # Redshift
+
+ # Modern Table Formats
+ orders = connect("delta://path/to/delta_table")      # Delta Lake
+ orders = connect("iceberg://path/to/iceberg_table")  # Apache Iceberg
+
+ # Databases
+ orders = connect("postgres://localhost/db", table="orders")  # PostgreSQL
+ orders = connect("mysql://localhost/db", table="orders")     # MySQL
+
+ # Files & DataFrames
+ orders = connect("orders.parquet")   # Parquet, CSV, JSON, Excel
+ orders = connect(pandas_dataframe)   # pandas DataFrame
+ ```
+
+ > **15+ connectors.** Install what you need: `pip install duckguard[snowflake]`, `duckguard[databricks]`, or `duckguard[all]`

  ---
@@ -93,7 +124,10 @@ validator.expect_column_values_to_be_between(
  ```python
  from duckguard import connect

- orders = connect("orders.csv")
+ orders = connect(
+     "snowflake://account/db",
+     table="orders"
+ )

  assert orders.customer_id.is_not_null()
  assert orders.total_amount.between(0, 10000)
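
For reviewers unfamiliar with the library, here is a minimal sketch of how the "just like pytest" claim from the hunk above might look in an actual test module. It uses only calls shown in this diff (connect, is_not_null, between, isin, score); the connection URI, table name, and grade threshold are illustrative assumptions, not part of the commit.

```python
# Illustrative sketch only: the API calls are the ones shown in the README diff,
# but the connection URI, table name, and grade cut-off are assumptions.
from duckguard import connect


def test_orders_quality():
    orders = connect("snowflake://account/db", table="orders")  # hypothetical account/db

    # Plain asserts, so pytest reports failures like any other test.
    assert orders.customer_id.is_not_null()
    assert orders.total_amount.between(0, 10000)
    assert orders.status.isin(["pending", "shipped", "delivered"])

    # score() returns an object with a letter grade (A-F) per the quick-start.
    assert orders.score().grade in {"A", "B"}
```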
@@ -247,40 +281,6 @@ pip install duckguard[all]  # Everything

  ---

- ## Connect to Anything
-
- ```python
- from duckguard import connect
-
- # Files
- orders = connect("orders.csv")
- orders = connect("orders.parquet")
- orders = connect("orders.json")
-
- # Cloud Storage
- orders = connect("s3://bucket/orders.parquet")
- orders = connect("gs://bucket/orders.parquet")
- orders = connect("az://container/orders.parquet")
-
- # Databases
- orders = connect("postgres://localhost/db", table="orders")
- orders = connect("mysql://localhost/db", table="orders")
- orders = connect("snowflake://account/db", table="orders")
- orders = connect("bigquery://project/dataset", table="orders")
- orders = connect("databricks://workspace/catalog/schema", table="orders")
- orders = connect("redshift://cluster/db", table="orders")
-
- # Modern Formats
- orders = connect("delta://path/to/delta_table")
- orders = connect("iceberg://path/to/iceberg_table")
-
- # pandas DataFrame
- import pandas as pd
- orders = connect(pd.read_csv("orders.csv"))
- ```
-
- **Supported:** CSV, Parquet, JSON, Excel | S3, GCS, Azure Blob | PostgreSQL, MySQL, SQLite, Snowflake, BigQuery, Redshift, Databricks, SQL Server, Oracle, MongoDB | Delta Lake, Apache Iceberg | pandas DataFrames

  ---

  ## Cookbook

docs/index.md

Lines changed: 9 additions & 5 deletions
@@ -7,9 +7,9 @@ hide:

  ## Data Quality That Just Works

- **3 lines of code** · **10x faster** · **20x less memory**
+ **3 lines of code** · **Any data source** · **10x faster**

- Stop wrestling with 50+ lines of boilerplate. Start validating data in seconds.
+ One API for CSV, Parquet, Snowflake, Databricks, BigQuery, and 15+ sources. No boilerplate.

  ```bash
  pip install duckguard
@@ -18,16 +18,16 @@ pip install duckguard
  ```python
  from duckguard import connect

- orders = connect("orders.csv")
- assert orders.customer_id.is_not_null()
+ orders = connect("s3://warehouse/orders.parquet")  # or Snowflake, Databricks, CSV...
+ assert orders.customer_id.is_not_null()            # Just like pytest!
  assert orders.total_amount.between(0, 10000)
  assert orders.status.isin(["pending", "shipped", "delivered"])

  quality = orders.score()
  print(f"Grade: {quality.grade}")  # A, B, C, D, or F
  ```

- That's it. No context. No datasource. No validator. No expectation suite. Just data quality.
+ Same 3 lines whether your data lives in S3, Snowflake, Databricks, or a local CSV.

  ---

@@ -65,6 +65,10 @@ Every data quality tool asks you to write **50+ lines of boilerplate** before yo

      CSV, Parquet, S3, PostgreSQL, Snowflake, BigQuery, and more

+ - :material-snowflake: **[Snowflake](platforms/snowflake.md)** · :material-fire: **[Databricks](platforms/databricks.md)** · :material-notebook: **[Kaggle](platforms/kaggle.md)**
+
+     Platform-specific guides for your data stack
+
  - :material-puzzle: **[Integrations](integrations/pytest.md)**

      pytest, dbt, Airflow, GitHub Actions, Slack, Teams