Think of this as a production line for accessibility: you feed it a sprawling PDF archive, and it spits back a compliant set without wrecking your folder structure. Under the hood it wires veraPDF and PDFix together to validate and remediate thousands of files fast, with the original layout intact.
- Install uv
  - macOS/Linux:
    curl -LsSf https://astral.sh/uv/install.sh | sh
  - Windows PowerShell:
    powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
    (On Windows, you may need to install the VC++ Redistributable: https://learn.microsoft.com/en-us/cpp/windows/latest-supported-vc-redist?view=msvc-170#latest-supported-redistributable-version)
- Install Java (required for veraPDF validation).
- Set the PDFix license in .env:
  PDFIX_LICENSE_NAME="your-name"
  PDFIX_LICENSE_KEY="your-key"
  Check if the license is valid:
  uv run -m pdf_remediation.license
- Install Docker Desktop (required for Callas/PDFix Docker font-fix steps). The tool now attempts to launch Docker Desktop automatically if it is not running.
- Save the Callas license in resources/font/.env
uv run -m pdf_remediation.go delnorte
go.py orchestrates fix, font_fix, font_fix_pdfix, and a final
validate --full --skip-page-count run. It also initializes missing
projects automatically.
If source/ is empty, go.py can automatically download and extract the
live files backup from Pantheon into source/.
Requirement: Terminus must be installed and already configured/authenticated.
uv run -m pdf_remediation.readyset delnorte alameda sonoma
readyset.py runs go.py once per project in the order provided, prints a
high-visibility banner for each project, and stops on the first non-zero
exit code.
Here's an example walkthrough of remediating the PDFs for the Del Norte trial court.
- Initialize a project:
  uv run -m pdf_remediation.init delnorte
- Copy PDFs into resources/projects/delnorte/source.
- Validate the PDFs to establish a baseline:
  uv run -m pdf_remediation.validate delnorte
- Remediate PDFs:
  uv run -m pdf_remediation.fix delnorte
- If font issues are flagged, run Callas font remediation:
  uv run -m pdf_remediation.font_fix delnorte
- After Callas, run the PDFix missing-unicode font fix on any remaining font issues:
  uv run -m pdf_remediation.font_fix_pdfix delnorte
- Run the fallback remediation on PDFs that were not remediated in the "Remediate PDFs" step.
  a. Queue the files for re-processing (by default, this scans all workspace subfolders that contain a processed/ directory):
     uv run -m pdf_remediation.reprocess delnorte
  b. Remediate with the fallback configuration:
     uv run -m pdf_remediation.fix delnorte --config-file=default-fallback.json
- Run the fallback remediation on the files with remaining font issues.
  a. Queue the files for re-processing:
     uv run -m pdf_remediation.reprocess delnorte default font-issues
     Use font-issues-missing-unicode instead of font-issues if you are reprocessing the PDFix font pass.
  b. Remediate with the fallback configuration (reprocess returns files to active/files, so run fix on active, which is the default):
     uv run -m pdf_remediation.fix delnorte --config-file=default-fallback.json
- Check the workspace status:
  uv run -m pdf_remediation.status delnorte
- Review the reports:
  - Standard validation/remediation runs:
    resources/projects/<project>/workspace/<workspace>/<folder>/reports/<timestamp>-<directory>
  - Full workspace validation runs (validate --full):
    resources/projects/<project>/workspace/<workspace>/reports/<timestamp>-full
- Remediated files will be located in:
  resources/projects/<project>/workspace/<workspace>/remediated/files
High-level view of the end-to-end pipeline: initialize, validate, remediate, re-validate, and route results into the right workspace folders.
Bootstrap a project and get a clean baseline before remediation begins.
uv run -m pdf_remediation.init <project_name>
Copy PDFs into the printed resources/projects/<project>/source directory.
uv run -m pdf_remediation.validate <project_name> [workspace] [folder] [directory] [--full] [--skip-page-count]
Defaults:
- workspace=default
- folder=active
- directory=files
You can target any workspace/subfolder/directory by passing these arguments. By default, validation runs against <workspace>/<folder>/<directory>.
Use --full to validate all PDFs in every workspace subfolder's files/ and
processed/ directories in one pass. This mode writes reports to
workspace/<workspace>/reports/<timestamp>-full and prints the scanned folders.
--full skips these operational subfolders: pdfix-cannot-process,
secured-cannot-process, secured-needs-approval, reports,
pdfix-unable-to-open, unable-to-validate, and unable-to-process.
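The folder selection that --full describes can be sketched roughly as follows. This is only an illustration of the rule stated above, not the code in validate.py; the function name folders_to_scan and the constant SKIPPED_SUBFOLDERS are hypothetical.

```python
from pathlib import Path

# Operational subfolders that --full skips, as listed in the text above.
SKIPPED_SUBFOLDERS = {
    "pdfix-cannot-process",
    "secured-cannot-process",
    "secured-needs-approval",
    "reports",
    "pdfix-unable-to-open",
    "unable-to-validate",
    "unable-to-process",
}


def folders_to_scan(workspace_dir: Path) -> list[Path]:
    """Return every <subfolder>/files and <subfolder>/processed directory
    in a workspace, skipping the operational subfolders (hypothetical helper)."""
    found = []
    for subfolder in sorted(p for p in workspace_dir.iterdir() if p.is_dir()):
        if subfolder.name in SKIPPED_SUBFOLDERS:
            continue
        for leaf in ("files", "processed"):
            candidate = subfolder / leaf
            if candidate.is_dir():
                found.append(candidate)
    return found
```

Printing the returned list would give something comparable to the scanned-folders output that --full reports.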
Use --skip-page-count to skip the PDFix page count pass and run only veraPDF.
Validation runs both PDF/UA (veraPDF ua1) and WCAG 2.2 profiles by default.
Results include ua1 and wcag columns in vera_validation_results.csv, and
per-profile report folders under reports/<timestamp>-<directory> (for example,
xml/ua1, xml/wcag, summary/ua1, summary/wcag). To change profiles, edit
the profiles list in src/pdf_remediation/utilities/verapdf.py.
If the active/files folder is empty, the system copies PDFs from source/
into active/files once and creates .remediation.lock.
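A minimal sketch of this copy-once behavior, assuming the lock file lives directly under active/ (as in the workspace layout) and that only files ending in .pdf are copied. The function name seed_active_files is hypothetical, and the real logic may check different conditions.

```python
import shutil
from pathlib import Path

LOCK_NAME = ".remediation.lock"


def seed_active_files(source: Path, active: Path) -> int:
    """Copy PDFs from source/ into active/files once; return the number copied."""
    files_dir = active / "files"
    lock = active / LOCK_NAME
    files_dir.mkdir(parents=True, exist_ok=True)
    # Skip the copy if it already happened or the working set is not empty.
    if lock.exists() or any(files_dir.rglob("*.pdf")):
        return 0
    copied = 0
    for pdf in sorted(source.rglob("*.pdf")):
        # Preserve the folder structure relative to source/.
        target = files_dir / pdf.relative_to(source)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(pdf, target)
        copied += 1
    lock.touch()  # semaphore: later runs will not copy again
    return copied
```

Because the lock is written after the first copy, a second call returns 0 even if active/files has been emptied by remediation.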
Run remediation, then loop back for another pass when you have a better config.
uv run -m pdf_remediation.fix <project_name> [workspace] [folder]
Use workspace and folder to remediate a specific subfolder in the project.
For verbose progress and file-level visibility (useful for spotting blocking files), run:
uv run -m pdf_remediation.fix <project_name> [workspace] [folder] --verbose
Tune processing with:
- --chunk-size <n> to control batch size (default: 500)
- --n-cpu <n> to control parallel workers (default: 4)
- --debug to set --verbose and --chunk-size 1 so you can spot a slow file
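To see why --debug pairs --verbose with --chunk-size 1, here is a small sketch of fixed-size batching. It assumes batches are simple consecutive slices of the file list; the helper name chunked is hypothetical and the batching in fix.py may differ.

```python
from typing import Iterator, Sequence


def chunked(files: Sequence[str], chunk_size: int = 500) -> Iterator[list[str]]:
    """Yield consecutive batches of at most chunk_size files."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(files), chunk_size):
        yield list(files[start:start + chunk_size])
```

With a chunk size of 1, each batch holds a single file, so a stalled batch points directly at the file that is blocking the run.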
Steps executed:
- Apply the skip lists (skipped_files.txt and pdfix-cannot-process-files.csv) to exclude problematic files.
- Count pages for each PDF (PDFix).
- Check for secured PDFs; classify and route them, then exclude them from remediation.
  - secured-cannot-process/files: secured PDFs with font violations that cannot be remediated.
  - secured-needs-approval/files: secured PDFs without blocking font violations (manual approval needed).
  - pdfix-unable-to-open/files: PDFs that PDFix cannot open.
- Split files into size buckets for parallel remediation.
- Remediate with PDFix, write to active/processed/.
- Validate all processed files with veraPDF.
- Move compliant files into remediated/files.
- Move validation-error files into unable-to-validate/files and log them to unable-to-validate.csv in the project root.
- Move font-violation failures into font-issues/files.
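The last three routing steps can be pictured as a small decision function. This is a sketch under assumptions: the function name route, its boolean inputs, the order of the checks, and the choice to leave other non-compliant files in place are all hypothetical and not taken from fix.py.

```python
from typing import Optional


def route(compliant: bool, validation_error: bool, font_violation: bool) -> Optional[str]:
    """Return the workspace destination for one processed file,
    or None to leave it where it is (assumed behavior)."""
    if validation_error:
        # veraPDF could not validate the file at all.
        return "unable-to-validate/files"
    if compliant:
        return "remediated/files"
    if font_violation:
        return "font-issues/files"
    # Assumption: other failures stay in active/processed for reprocessing.
    return None
```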
If remediation is interrupted, rerunning Fix resumes from the remaining files.
Runs end with a workspace summary showing totals plus files/processed
breakdowns.
uv run -m pdf_remediation.font_fix <project_name> [workspace] [folder]
FontFix targets the font-issues folder by default, runs Callas pdfToolbox
inside Docker on those files, then re-validates and routes results into
remediated/ or unable-to-validate/.
Runs end with a workspace summary showing totals plus files/processed
breakdowns.
Missing-unicode violations detected after validation are moved to
font-issues-missing-unicode/ for the PDFix pass.
Callas file-level failures (error codes 104-107) are logged to
callas_font_fix_errors.csv in the project root.
Options:
- --chunk-size <n> to control batch size (default: 500)
- --verbose to list files in each chunk
- --debug to set --verbose and --chunk-size 1 so you can spot a slow file
uv run -m pdf_remediation.font_fix_pdfix <project_name> [workspace] [folder]
Run this after FontFix to process files moved into font-issues-missing-unicode.
It uses PDFix font remediation via Docker, re-validates, and routes results into
remediated/ or unable-to-validate/.
PDFix file-level failures are logged to pdfix-font-errors.csv in the project root.
Options:
- --chunk-size <n> to control batch size (default: 500)
- --n-cpu <n> to control parallel workers (default: all cores)
- --verbose to list files in each chunk
- --debug to set --verbose and --chunk-size 1 so you can spot a slow file
uv run -m pdf_remediation.reprocess <project_name> [workspace] [folder]
Defaults:
- workspace=default
- folder=all
reprocess scans <workspace>/<folder>/processed and moves any PDFs back to
active/files. When folder is all, it scans every workspace subfolder with
a processed/ directory.
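As a rough illustration of that scan-and-move behavior, the sketch below moves PDFs from processed/ directories back into active/files. The function name return_to_active is hypothetical, and it assumes relative subfolders are preserved and only .pdf files are moved; reprocess.py may handle name collisions and other details differently.

```python
import shutil
from pathlib import Path


def return_to_active(workspace_dir: Path, folder: str = "all") -> int:
    """Move PDFs from <folder>/processed back to active/files; return the count moved."""
    active_files = workspace_dir / "active" / "files"
    active_files.mkdir(parents=True, exist_ok=True)
    if folder == "all":
        # Every workspace subfolder that has a processed/ directory.
        processed_dirs = sorted(
            p / "processed"
            for p in workspace_dir.iterdir()
            if p.is_dir() and (p / "processed").is_dir()
        )
    else:
        processed_dirs = [workspace_dir / folder / "processed"]
    moved = 0
    for processed in processed_dirs:
        if not processed.is_dir():
            continue
        for pdf in sorted(processed.rglob("*.pdf")):
            target = active_files / pdf.relative_to(processed)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(pdf), str(target))
            moved += 1
    return moved
```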
Update resources/configuration/default.json (or swap in a new config), then re-run Fix.
uv run -m pdf_remediation.fix <project_name> [workspace] active --config-file [new-config.json]
new-config.json is located in resources/configuration.
For font-issue retries, run reprocess with font-issues as the folder,
update the config, then re-run Fix on active (default folder). Run FontFix to
attempt automatic font remediation with Callas pdfToolbox, then follow with
font_fix_pdfix on font-issues-missing-unicode.
To skip a blocking file before reprocessing, run:
uv run -m pdf_remediation.skip <project_name> <relative_file_path>
Use these controls to reset or fork clean workspaces without touching your originals.
uv run -m pdf_remediation.reset <project_name> [workspace] [folder]
Clears active/files and active/processed, then re-copies files from source/
and resets .remediation.lock.
Use a new workspace name here to create a fresh workspace seeded from
source/ without affecting existing workspaces.
- Python package targeting >=3.14 (see pyproject.toml).
- Java runtime is required for veraPDF validation (used by the JAR in lib/).
- PDFix SDK (pdfix-sdk) provides remediation and license operations.
- parallelbar is used for multiprocessing progress and job dispatch.
- pandas is used to summarize validation results and write CSV reports.
- Callas pdfToolbox runs in Docker for FontFix font remediation.

- lib/greenfield-apps-1.28.0.jar: veraPDF validation tool invoked by src/pdf_remediation/utilities/verapdf.py.
- resources/configuration/default.json: PDFix command profile applied during remediation.
- resources/configuration/WCAG-2-2-Complete.xml: veraPDF WCAG 2.2 profile used alongside ua1 by default (adjust the profiles list in src/pdf_remediation/utilities/verapdf.py to change this).
- resources/configuration/UA1-Font.xml: optional narrowed veraPDF profile for font-only checks.
- resources/font/.env: Callas pdfToolbox license config for FontFix.

- src/pdf_remediation/: CLI entry points and orchestration scripts.
- src/pdf_remediation/utilities/: shared functions for remediation, validation, project paths, and report generation.
- resources/projects/: per-project workspace root (default, can be overridden with PROJECT_BASE_PATH).
To store projects on a different disk, set PROJECT_BASE_PATH in .env:
PROJECT_BASE_PATH="/Volumes/ExternalDrive/pdf-remediation-projects"
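One way such an override could be resolved is sketched below. It assumes the variable has already been loaded from .env into the environment; the function name project_root and the DEFAULT_BASE constant are hypothetical, and the project's own path utilities may work differently.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Default per-project root named in the text above.
DEFAULT_BASE = Path("resources/projects")


def project_root(project: str, env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the project directory, honoring PROJECT_BASE_PATH when it is set."""
    env = os.environ if env is None else env
    base = env.get("PROJECT_BASE_PATH") or DEFAULT_BASE
    return Path(base) / project
```

Passing a plain dictionary as env makes the lookup easy to exercise without touching the real environment.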
The workspace structure is created on demand by resources.py:
resources/projects/<project>/
source/ # user-provided original PDFs
workspace/<workspace>/ # defaults to "default"
reports/<ts>-full # optional consolidated reports from "validate --full"
active/
files/ # working set copied from source
processed/ # remediation output
reports/<ts>-<directory> # validation reports for a run
.remediation.lock # semaphore to avoid repeated copy
remediated/
files/ # validated, compliant PDFs
font-issues/
files/ # font-related validation failures
font-issues-missing-unicode/
files/ # missing-unicode font issues after Callas validation
unable-to-validate/
files/ # PDFs that failed validation after remediation
debug/
<clause>/... # copies of failed active/files PDFs grouped by clause
secured-cannot-process/
files/ # secured PDFs with blocking font violations
secured-needs-approval/
files/ # secured PDFs without blocking font violations
pdfix-unable-to-open/
files/ # PDFs that PDFix cannot open
Subfolder names are not fixed. Fix and Validate accept a workspace_folder
argument so you can run separate workflows in different subfolders (for example,
active, remediated, or a custom name).
- go.py runs the remediation pipeline in sequence:
  - pre-fix validate (--skip-page-count, init-only)
  - fix on active
  - font_fix on font-issues
  - font_fix_pdfix on font-issues-missing-unicode
  - final validate --full --skip-page-count
- Syntax: uv run -m pdf_remediation.go <project_name> [workspace] [--config-file <file>] [--chunk-size <n>] [--n-cpu <n>] [--verbose] [--debug]
- If the project does not exist, go.py runs init automatically.
- If source/ is empty and Terminus is installed/configured, go.py can download and extract the live files backup into source/.
- readyset.py runs go.py sequentially across multiple projects.
- Syntax: uv run -m pdf_remediation.readyset <project_name> [project_name ...]
- readyset.py exits immediately if any project run fails and returns that same exit code.
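The stop-on-first-failure behavior described for readyset.py can be sketched as a loop over projects. This is an illustration, not the script itself: run_go and run_projects are hypothetical names, the banner format is invented, and the runner is injectable only so the loop can be exercised without launching real jobs.

```python
import subprocess
from typing import Callable, Sequence


def run_go(project: str) -> int:
    """Run the go module for one project and return its exit code."""
    cmd = ["uv", "run", "-m", "pdf_remediation.go", project]
    return subprocess.run(cmd).returncode


def run_projects(projects: Sequence[str], runner: Callable[[str], int] = run_go) -> int:
    """Run projects in the order given; stop at the first non-zero exit code."""
    for project in projects:
        print(f"{'=' * 20} {project} {'=' * 20}")  # per-project banner
        code = runner(project)
        if code != 0:
            return code  # propagate the failing project's exit code
    return 0
```

A caller would typically pass the result to sys.exit so the shell sees the same exit code as the failing project.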
- init.py bootstraps a project workspace and prints the source path for ingest.

- validate.py runs page counting (PDFix) and veraPDF validation for PDF/UA (ua1) plus WCAG 2.2.
  - Default mode validates one directory (<workspace>/<folder>/<directory>).
  - --full mode validates every <subfolder>/files and <subfolder>/processed directory in the workspace and writes a consolidated report under workspace/<workspace>/reports/<timestamp>-full.
  - In --full mode, these subfolders are ignored: pdfix-cannot-process, secured-cannot-process, secured-needs-approval, reports, pdfix-unable-to-open, unable-to-validate, and unable-to-process.
  - --full prints a FOLDERS SCANNED list before validation starts.
  - --skip-page-count skips PDFix page counting and runs only veraPDF validation.
  - Results feed the reporting pipeline in reports/<timestamp>-<directory>.
- debug.py validates active/files, then copies every non-compliant file into clause-specific folders under workspace/<workspace>/debug/<clause>/.
  - Debug copies are flattened by filename (source relative folders are not preserved).
  - Files with multiple failing clauses are copied into each matching clause folder.
  - Files with validation errors but no clause metadata are copied into workspace/<workspace>/debug/unknown/.
  - Existing contents of workspace/<workspace>/debug/ are cleared before each run.
  - Syntax: uv run -m pdf_remediation.debug <project_name> [workspace]
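The copy rules listed for debug.py can be sketched as below. The function name copy_by_clause and its input shape (a mapping from file path to failing clause IDs) are hypothetical; how debug.py actually extracts clauses from the veraPDF output is not shown here.

```python
import shutil
from pathlib import Path


def copy_by_clause(failures: dict[Path, list[str]], debug_dir: Path) -> None:
    """Copy each non-compliant file into one folder per failing clause.

    failures maps a file to its failing clause IDs; an empty list means
    the file failed validation without clause metadata.
    """
    # Clear previous debug output before each run.
    if debug_dir.exists():
        shutil.rmtree(debug_dir)
    for pdf, clauses in failures.items():
        for clause in (clauses or ["unknown"]):
            target_dir = debug_dir / clause
            target_dir.mkdir(parents=True, exist_ok=True)
            # Flatten by filename: the source subfolder is not preserved.
            shutil.copy2(pdf, target_dir / pdf.name)
```

Because copies are flattened by filename, two source files with the same name in different subfolders would overwrite each other in this sketch.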
- fix.py runs the PDFix remediation profile (e.g., default.json) with multiprocessing and preserves folder structure.
  - Post-validation routes outputs to remediated/ and moves font-issue files to font-issues/.

- font_fix.py runs Callas pdfToolbox via Docker on font-issues/, re-validates, then moves results to remediated/ or unable-to-validate/. Missing-unicode files move to font-issues-missing-unicode/.
- font_fix_pdfix.py runs PDFix font remediation via Docker on font-issues-missing-unicode/, re-validates, then moves results to remediated/ or unable-to-validate/.
- utilities/report.py generates CSV/TXT/HTML report artifacts from veraPDF XML output.
  - Every Validate and Fix run generates reports under reports/<timestamp>-<directory> (or workspace/<workspace>/reports/<timestamp>-full for validate --full).
  - Report outputs include:
    - vera_validation_results.csv: per-file ua1/wcag pass/fail status and rule counts.
    - xml/<profile>/: raw veraPDF XML reports per file (for example, xml/ua1).
    - summary/<profile>/verapdf-compliance-report.txt: compliant vs non-compliant file list.
    - summary/<profile>/verapdf-clause-summary.csv: clause-level rollup across the run.
    - summary/<profile>/verapdf-file-summary.csv: per-file summary of violations.
    - summary/<profile>/output.txt: synthetic log used by HTML report generation.
    - summary/<profile>/*.html: human-readable compliance report.
- reprocess.py returns processed PDFs to active/files so you can iterate with a revised configuration file.
  - Defaults: workspace=default, folder=all.
  - You can target one source subfolder (for example, font-issues) or scan all subfolders with processed/ and return them to active/files.
- skip.py appends a problematic file to skipped_files.txt so it is ignored during processing.
  - Syntax: uv run -m pdf_remediation.skip <project_name> <relative_file_path>
  - Files that PDFix cannot open/process are recorded in pdfix-cannot-process-files.csv at the project root and are skipped on subsequent runs.
- Secured PDFs are logged to secured-files.csv with a status column (secured-cannot-process or secured-needs-approval).
- PDFs that PDFix cannot open are logged to pdfix-unable-to-open.csv.
- PDFs that cannot be validated after remediation are logged to unable-to-validate.csv and moved to unable-to-validate/files.
- Secured classification runs an in-memory veraPDF pass using the WCAG 2.2 profile and treats font violations (7.21.4.1, 7.21.3.2, 7.21.4.2) as blocking.
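The secured-PDF classification rule can be expressed in a few lines. This sketch only encodes the rule as stated above; the function name classify_secured is hypothetical, and it assumes the failing clause IDs have already been extracted from the veraPDF result.

```python
from typing import Iterable

# Font clauses that the text above treats as blocking for secured PDFs.
BLOCKING_FONT_CLAUSES = {"7.21.4.1", "7.21.3.2", "7.21.4.2"}


def classify_secured(failed_clauses: Iterable[str]) -> str:
    """Return the status recorded for a secured PDF."""
    if BLOCKING_FONT_CLAUSES & set(failed_clauses):
        return "secured-cannot-process"
    return "secured-needs-approval"
```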
- status.py prints a summary of the source PDF count and per-workspace file counts, including totals plus files/processed breakdowns.
  - Workspace totals and summaries skip the workspace-level reports/ folder.
  - Syntax: uv run -m pdf_remediation.status <project_name>
- scripts/check_pdf_headers.py recursively checks file headers for %PDF-.
  - It prints total valid/invalid/unreadable counts plus up to 3 sample valid and 3 sample invalid files.
  - Invalid samples include the first 32 bytes (printable + hex) to aid triage.
  - Syntax: python3 scripts/check_pdf_headers.py <folder_path>
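A header check of this kind might look like the sketch below. It is not the script itself: check_header and scan are hypothetical names, the sketch inspects every regular file under the folder (the real script may filter by extension), and it omits the sample listing and hex preview.

```python
from pathlib import Path


def check_header(path: Path) -> str:
    """Classify one file as 'valid', 'invalid', or 'unreadable' by its first bytes."""
    try:
        with path.open("rb") as fh:
            head = fh.read(32)
    except OSError:
        return "unreadable"
    return "valid" if head.startswith(b"%PDF-") else "invalid"


def scan(folder: Path) -> dict[str, int]:
    """Recursively count valid, invalid, and unreadable files under folder."""
    counts = {"valid": 0, "invalid": 0, "unreadable": 0}
    for path in folder.rglob("*"):
        if path.is_file():
            counts[check_header(path)] += 1
    return counts
```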
- reset.py refreshes a workspace from source/ and resets the copy semaphore.

- license.py reads license state from PDFix.
- license_activate.py activates a license key.
- license_deactivate.py deactivates an active license.
- .env supports PDFIX_LICENSE_NAME and PDFIX_LICENSE_KEY for remediation.
- Press Ctrl+C to stop the current run.
- Re-run the fix command with --debug (or -d).
- When a file hangs, copy the file path and press Ctrl+C again.
- Skip the file: uv run -m pdf_remediation.skip <project_name> <file_path>
- Run fix again without --debug/-d.
- Remediation deletes the original file in active/files after a successful save (see PDFix.fix), so Reset is the canonical way to restore originals.
- Validation and remediation use multiprocessing; fix.py sets spawn mode for compatibility.




