This project demonstrates a general workflow for provenance-aware metadata on digital assets:
- Model metadata in JSON-LD (Dublin Core, PROV-O, Schema.org)
- Validate with SHACL (custom policy rules: EDM, PREMIS)
- Sign assets with C2PA (actions + rights)
- Serve via FastAPI and a dynamic IIIF Presentation 3.0 manifest
- Fetch from Wikimedia Commons and auto-populate metadata/source.yml → python src/cli.py build-from-commons --title "File:Leibniz_University_Hannover.jpg"
- Richer standards & policies: extended SHACL shapes to cover EDM and PREMIS (e.g., event typing & dateTime checks)
- Verification endpoint: /verify returns c2patool --detailed output for the signed image
- Publish container to GHCR: GitHub Actions builds & pushes on v* tags → ghcr.io/<owner>/<repo>:v0.3
- Tests in CI: basic pytest to ensure build + validation keep passing
- Builder normalizations: clean creator (no HTML), ISO date, CC license URL with trailing /, clearer prov:wasDerivedFrom (binary)
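
The builder normalizations listed above can be illustrated with a minimal sketch. The function names and the accepted date formats are assumptions made for this example; the actual logic in src/build_metadata.py may differ.

```python
import html
import re
from datetime import datetime


def clean_creator(raw: str) -> str:
    """Drop HTML tags, decode entities, and collapse whitespace (hypothetical helper)."""
    no_tags = re.sub(r"<[^>]+>", "", raw)
    return " ".join(html.unescape(no_tags).split())


def iso_date(raw: str) -> str:
    """Return YYYY-MM-DD; the accepted input formats are an assumption."""
    for fmt in ("%Y-%m-%d %H:%M:%S", "%Y-%m-%dT%H:%M:%S", "%Y-%m-%d"):
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {raw!r}")


def license_url(raw: str) -> str:
    """Ensure the license URL ends with exactly one trailing slash."""
    return raw.strip().rstrip("/") + "/"
```

For example, license_url("https://creativecommons.org/licenses/by-sa/3.0") yields the same URL with a trailing slash, so SHACL checks that compare license IRIs see one consistent form.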
data/                            # Input/output assets
  image.jpg                      # Original image (fetched or provided)
  image.c2pa.jpg                 # Signed image (generated)
metadata/
  source.yml                     # Source fields (auto from Commons or manual)
  record.jsonld                  # Built JSON-LD (do not hand-edit)
  shacl.ttl                      # SHACL shapes (EDM + PREMIS rules)
  claim.json                     # C2PA claim (actions + rights)
src/
  fetch_commons.py               # Fetch & map Commons metadata + download image
  build_metadata.py              # Build JSON-LD from source.yml (normalizes fields)
  validate_metadata.py           # SHACL validation
  sign_c2pa.sh                   # Signing (env-driven with dev fallback)
  api.py                         # FastAPI app (record, image, dynamic IIIF, /verify)
  cli.py                         # CLI: build/validate/sign/serve/info/build-from-commons
.github/
  workflows/validate.yml         # CI: SHACL + tests on push/PR
  workflows/docker-publish.yml   # CI: publish Docker image to GHCR on tags
tests/
  test_build_and_validate.py     # Basic build+validate test
Makefile                         # make build | validate | sign | serve | info | all
requirements.txt                 # Runtime deps (incl. pytest)
environment.yml                  # Conda environment (local dev)
Dockerfile                       # Containerized API service
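
As a rough illustration of the source.yml → record.jsonld step in the layout above, the sketch below maps a few source fields onto Dublin Core, PROV-O, and Schema.org terms. The input keys (id, title, creator, date, license, derived_from), the chosen properties, and the example.org identifiers are all assumptions for this example, not the project's actual schema.

```python
import json

# Prefixes for the three vocabularies named in this README.
CONTEXT = {
    "dcterms": "http://purl.org/dc/terms/",
    "prov": "http://www.w3.org/ns/prov#",
    "schema": "https://schema.org/",
}


def build_record(source: dict) -> dict:
    """Map already-normalized source fields to a JSON-LD record (hypothetical mapping)."""
    record = {
        "@context": CONTEXT,
        "@id": source["id"],
        "@type": "schema:ImageObject",
        "dcterms:title": source["title"],
        "dcterms:creator": source["creator"],
        "dcterms:created": source["date"],
        "dcterms:license": {"@id": source["license"]},
    }
    # Provenance link to the original binary, if one is known.
    if source.get("derived_from"):
        record["prov:wasDerivedFrom"] = {"@id": source["derived_from"]}
    return record


example = build_record({
    "id": "https://example.org/record/1",
    "title": "Leibniz University Hannover",
    "creator": "Example Creator",
    "date": "2012-05-01",
    "license": "https://creativecommons.org/licenses/by-sa/3.0/",
    "derived_from": "https://example.org/original.jpg",
})
print(json.dumps(example, indent=2))
```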
conda env create -f environment.yml
conda activate Provenance-Aware-Metadata

Choose one:
Prebuilt (recommended)
# download release for your OS (e.g., v0.9.12), then:
sudo mv c2patool /usr/local/bin/
c2patool -V

Cargo (pin version)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source "$HOME/.cargo/env"
cargo install --locked c2patool --version 0.9.12
~/.cargo/bin/c2patool -V

A) Build from Commons (Phase 3)
# fetch & normalize source.yml and image.jpg, then build & validate
python src/cli.py build-from-commons --title "File:Leibniz_University_Hannover.jpg"

If your network blocks the image CDN, you can skip the binary fetch:
SKIP_DOWNLOAD=1 python src/cli.py build-from-commons --title "File:Leibniz_University_Hannover.jpg"
# or prefetch with IPv4:
wget -4 -O data/image.jpg "https://upload.wikimedia.org/wikipedia/commons/e/ea/Leibniz_University_Hannover.jpg"

B) Manual workflow (Phase 2 style)
python src/cli.py build
python src/cli.py validate # expect: Conforms: True
python src/cli.py sign # or: bash src/sign_c2pa.sh
python src/cli.py serve # open http://127.0.0.1:8000

Endpoints:
- /record → JSON-LD metadata
- /image → signed image (falls back to unsigned if missing)
- /iiif/manifest → dynamic IIIF Presentation 3.0
- /verify → C2PA verification report (c2patool --detailed)
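
To make the /iiif/manifest endpoint more concrete, here is a minimal sketch of a single-canvas IIIF Presentation 3.0 manifest built as a plain dictionary. The function name, the canvas and annotation URL patterns, and the choice of /image as the painted resource are assumptions; the real handler in src/api.py may structure this differently.

```python
def build_manifest(base_url: str, label: str, width: int, height: int) -> dict:
    """Return a one-canvas IIIF Presentation 3.0 manifest (hypothetical helper)."""
    base = base_url.rstrip("/")
    canvas_id = f"{base}/iiif/canvas/1"  # assumed URL pattern
    return {
        "@context": "http://iiif.io/api/presentation/3/context.json",
        "id": f"{base}/iiif/manifest",
        "type": "Manifest",
        "label": {"en": [label]},
        "items": [{
            "id": canvas_id,
            "type": "Canvas",
            "width": width,
            "height": height,
            "items": [{
                "id": f"{canvas_id}/page",
                "type": "AnnotationPage",
                "items": [{
                    "id": f"{canvas_id}/annotation",
                    "type": "Annotation",
                    "motivation": "painting",
                    # The image served by /image is painted onto the canvas.
                    "body": {
                        "id": f"{base}/image",
                        "type": "Image",
                        "format": "image/jpeg",
                        "width": width,
                        "height": height,
                    },
                    "target": canvas_id,
                }],
            }],
        }],
    }


manifest = build_manifest("http://127.0.0.1:8000", "Leibniz University Hannover", 1200, 800)
```

Deriving every id from the request's base URL is what makes the manifest "dynamic": the same code should yield valid identifiers whether the API runs locally or in a container behind another host name.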
Makefile shortcuts
make build
make validate
make sign
make serve
# or everything (pipeline except docker):
make all

Build
docker build -t provenance-metadata:dev .

Run
# image baked in container
docker run --rm -p 8000:8000 provenance-metadata:dev
# OR mount host data folder (to use local signed file)
docker run --rm -p 8000:8000 -v "$(pwd)/data:/app/data" provenance-metadata:dev

Run from GHCR (after tagging v0.3)
docker run --rm -p 8000:8000 ghcr.io/<owner>/<repo>:v0.3

- Validation workflow: .github/workflows/validate.yml runs SHACL + tests on push/PR to main/dev.
- Publish to GHCR: .github/workflows/docker-publish.yml builds & pushes on tags v*.
- Phase 1 (done)
  - Manual JSON-LD modeling
  - SHACL validation
  - C2PA (dev key)
  - Static IIIF
- Phase 2 (done)
  - CLI workflow (build/validate/sign/serve/info)
  - YAML→JSON-LD
  - Dynamic IIIF
  - CI validation
  - Env-driven signing (fallback)
  - Dockerized API
- Phase 3 (this release)
  - Fetch from Commons
  - Extended EDM/PREMIS SHACL
  - /verify endpoint
  - GHCR publish
  - Tests in CI
  - Builder normalizations (creator/date/license/provenance)
- Phase 4 (planned)
  - Integrations (Europeana/Zenodo)
  - Secure key mgmt (remote signer/KMS)
  - More datasets & examples
  - Packaging for production
- Code: MIT
- Image: Leibniz University Hannover (CC BY-SA 3.0), from Wikimedia Commons.
- v0.1: Manual prototype (metadata, SHACL, C2PA, static IIIF).
- v0.2: Automation & containerization (CLI, YAML→JSON-LD, dynamic IIIF, API fallback, CI, Docker, env-driven signing).
- v0.3: Integrations & policies (Commons fetch, PREMIS/EDM, /verify, GHCR, tests).