Placement-based spatial verification for generated slides
Problem
When Clean Slides extends beyond tables to complex element types (waterfall charts, icon layouts, rich formatting), the agent can't reliably verify its own visual output. Vision models lose 40-60% of spatial structure during encoding (QVLM, Jan 2026), so they can't spot a label 0.1" off-center or detect uneven bar spacing.
The existing LayoutVerifier in verification.py handles tables (overlap, boundary, text-fit on the cell grid), but isn't extensible to other element types.
Solution: renderer-emitted placements
The renderer already computes every shape's position, size, role, and text. Instead of reconstructing relationships from the output file, capture them at creation time as list[Placement] and run geometric checks.
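A minimal sketch of a placement record and a renderer step that emits it. Only the captured attributes (position, size, role, text, and the shared group key described under Design principles) come from this issue. The field names, the inch units, and the helper render_bar_with_label are hypothetical stand-ins, not the actual placement.py API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """One shape as the renderer placed it (field names are illustrative)."""

    role: str                  # e.g. "bar", "label", "divider"
    left: float                # inches from the slide's left edge (assumed unit)
    top: float                 # inches from the slide's top edge
    width: float
    height: float
    text: str = ""
    group: str | None = None   # shared key linking related shapes, e.g. "Revenue"

    @property
    def right(self) -> float:
        return self.left + self.width

    @property
    def bottom(self) -> float:
        return self.top + self.height

    @property
    def center_x(self) -> float:
        return self.left + self.width / 2


def render_bar_with_label(name: str, value: float, left: float, baseline: float,
                          bar_width: float, scale: float,
                          label_width: float = 1.0,
                          label_height: float = 0.3) -> list[Placement]:
    """Hypothetical renderer step: emit a Placement for each shape it creates."""
    bar_height = value * scale
    # The bar grows upward from the baseline.
    bar = Placement("bar", left, baseline - bar_height, bar_width, bar_height,
                    group=name)
    # The label is centered under the bar by construction; the shared group
    # key writes that intent down for the verifier.
    label = Placement("label", left + (bar_width - label_width) / 2,
                      baseline + 0.05, label_width, label_height,
                      text=name, group=name)
    return [bar, label]


# Example: one waterfall-style bar with its label.
placements = render_bar_with_label("Revenue", value=3.0, left=1.0, baseline=5.0,
                                   bar_width=0.6, scale=0.5)
```

Because the label is positioned from the bar's own coordinates at creation time, the relationship (same group, shared center) is recorded explicitly instead of being inferred from the output file later.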
See docs/VISUAL-VERIFICATION-RESEARCH.md for full research (VLM limitations, DRC/Galen analogies, 6 approaches ranked).
Current state
Done
- placement.py: Placement dataclass + standalone check functions:
  - check_overlaps: pairwise bbox intersection (skips dividers/background/connectors)
  - check_bounds: shapes within content area
  - check_group_alignment: labels centered over anchors by shared group key
  - check_uniform_spacing: even gaps between shapes of a given role
  - check_all: convenience runner
- PlacementIssue dataclass with severity/category/message/details
- 22 tests in test_placement.py, pyright clean
- Research doc: docs/VISUAL-VERIFICATION-RESEARCH.md
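The two purely geometric checks listed above can be sketched as plain functions over list[Placement]. This is an illustrative reconstruction, not the code in placement.py. The PlacementIssue fields follow the list above, but the severity strings, the (left, top, right, bottom) content-area tuple, the tol parameter, and the reduced check_all (which runs only these two checks) are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import combinations


@dataclass(frozen=True)
class Placement:
    role: str
    left: float
    top: float
    width: float
    height: float
    text: str = ""
    group: str | None = None


@dataclass
class PlacementIssue:
    severity: str          # assumed values: "error" / "warning"
    category: str          # e.g. "overlap", "bounds"
    message: str
    details: dict[str, object] = field(default_factory=dict)


# Roles allowed to sit on top of or underneath other shapes.
OVERLAP_EXEMPT_ROLES = frozenset({"divider", "background", "connector"})


def check_overlaps(placements: list[Placement],
                   tol: float = 0.0) -> list[PlacementIssue]:
    """Report every pair of non-exempt shapes whose bounding boxes intersect."""
    candidates = [p for p in placements if p.role not in OVERLAP_EXEMPT_ROLES]
    issues: list[PlacementIssue] = []
    for a, b in combinations(candidates, 2):  # O(n^2) pairs; fine for one slide
        overlap_w = min(a.left + a.width, b.left + b.width) - max(a.left, b.left)
        overlap_h = min(a.top + a.height, b.top + b.height) - max(a.top, b.top)
        if overlap_w > tol and overlap_h > tol:  # touching edges are not overlaps
            issues.append(PlacementIssue(
                "error", "overlap",
                f"{a.role} {a.text!r} overlaps {b.role} {b.text!r}",
                {"overlap_w": overlap_w, "overlap_h": overlap_h},
            ))
    return issues


def check_bounds(placements: list[Placement],
                 area: tuple[float, float, float, float]) -> list[PlacementIssue]:
    """Report shapes extending outside the content area (left, top, right, bottom)."""
    left, top, right, bottom = area
    issues: list[PlacementIssue] = []
    for p in placements:
        if (p.left < left or p.top < top
                or p.left + p.width > right or p.top + p.height > bottom):
            issues.append(PlacementIssue(
                "error", "bounds",
                f"{p.role} {p.text!r} extends outside the content area",
                {"placement": p},
            ))
    return issues


def check_all(placements: list[Placement],
              area: tuple[float, float, float, float]) -> list[PlacementIssue]:
    """Convenience runner over the two checks sketched here."""
    return check_overlaps(placements) + check_bounds(placements, area)
```

Edges that merely touch (zero-width intersection) are not reported. The pairwise loop is quadratic, which is negligible at the scale of a single slide.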
TODO: integration
- Tables (Option 2): a placements_from_layout() bridge that converts TableLayout cells → list[Placement], merged into LayoutReport. Low priority: the existing LayoutVerifier already covers tables.
- New element types (Option 1): when building waterfall charts / icon layouts, have the renderer return list[Placement]. The CLI runs check_all() and reports issues.
- check_text_fits: use TextMetrics to verify that text fits within its bounding box. Not yet implemented; the table solver already handles this via font auto-reduction.
- CLI output: wire placement issues into pptx generate output (with --detail flag).
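One possible shape for check_text_fits is to take the text-measuring function as a parameter, so the real TextMetrics can be plugged in later without changing the check. In this sketch, the font_pt field, the padding default, and rough_measure (a crude estimate of about 0.5 em per character that ignores word wrapping and real glyph widths) are hypothetical placeholders, not TextMetrics' API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

# (text, font size in points) -> (width, height) in inches
Measure = Callable[[str, float], tuple[float, float]]


@dataclass(frozen=True)
class Placement:
    role: str
    left: float
    top: float
    width: float
    height: float
    text: str = ""
    group: str | None = None
    font_pt: float = 12.0   # hypothetical: font size the renderer chose, in points


@dataclass
class PlacementIssue:
    severity: str
    category: str
    message: str
    details: dict[str, object] = field(default_factory=dict)


def rough_measure(text: str, font_pt: float) -> tuple[float, float]:
    """Crude placeholder for real text metrics; assumes no word wrapping."""
    lines = text.splitlines() or [""]
    em = font_pt / 72.0                                    # 72 points per inch
    width = max(len(line) for line in lines) * 0.5 * em    # ~0.5 em per character
    height = len(lines) * 1.2 * em                         # 1.2x line spacing
    return width, height


def check_text_fits(placements: list[Placement],
                    measure: Measure = rough_measure,
                    padding: float = 0.05) -> list[PlacementIssue]:
    """Report text whose measured extent exceeds its box minus internal padding."""
    issues: list[PlacementIssue] = []
    for p in placements:
        if not p.text:
            continue  # shapes without text have nothing to fit
        text_w, text_h = measure(p.text, p.font_pt)
        avail_w = p.width - 2 * padding
        avail_h = p.height - 2 * padding
        if text_w > avail_w or text_h > avail_h:
            issues.append(PlacementIssue(
                "error", "text_fit",
                f"{p.role} {p.text!r} needs {text_w:.2f}x{text_h:.2f} in, "
                f"box allows {avail_w:.2f}x{avail_h:.2f} in",
                {"needed": (text_w, text_h), "available": (avail_w, avail_h)},
            ))
    return issues
```

Keeping the measurement pluggable also makes the check easy to unit-test with a fixed fake measure.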
Design principles
- No heuristics: Renderer knows intent → writes it down. No proximity-based guessing.
- Standalone functions, not a framework: each check is list[Placement] → list[PlacementIssue]. Adding a check = writing a function.
- group for relationships: a shared string key (e.g. "Revenue") links related shapes (label + bar). Only needed for relationship checks; generic checks ignore it.
- Generation first, edit later: the edit workflow (from inspect_slide() output) is a separate problem.
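Following the "adding a check = writing a function" principle, the relationship and spacing checks might look like the sketch below. It assumes anchors have role "bar" and labels role "label", tolerances in inches, and horizontal-only comparison; none of these defaults come from the issue. Shapes without a group key are skipped by check_group_alignment, so only relationship-bearing shapes take part.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Placement:
    role: str
    left: float
    top: float
    width: float
    height: float
    text: str = ""
    group: str | None = None

    @property
    def center_x(self) -> float:
        return self.left + self.width / 2


@dataclass
class PlacementIssue:
    severity: str
    category: str
    message: str
    details: dict[str, object] = field(default_factory=dict)


def check_group_alignment(placements: list[Placement], anchor_role: str = "bar",
                          label_role: str = "label",
                          tol: float = 0.02) -> list[PlacementIssue]:
    """Labels must be horizontally centered on the anchor sharing their group key."""
    groups: dict[str, list[Placement]] = defaultdict(list)
    for p in placements:
        if p.group is not None:  # ungrouped shapes carry no relationship to check
            groups[p.group].append(p)
    issues: list[PlacementIssue] = []
    for key, members in groups.items():
        anchors = [p for p in members if p.role == anchor_role]
        if len(anchors) != 1:
            continue  # no unambiguous anchor to align against
        anchor = anchors[0]
        for label in (p for p in members if p.role == label_role):
            offset = label.center_x - anchor.center_x
            if abs(offset) > tol:
                issues.append(PlacementIssue(
                    "warning", "alignment",
                    f"label {label.text!r} is {offset:+.3f} in off-center "
                    f"in group {key!r}",
                    {"group": key, "offset": offset},
                ))
    return issues


def check_uniform_spacing(placements: list[Placement], role: str,
                          tol: float = 0.01) -> list[PlacementIssue]:
    """Horizontal gaps between consecutive shapes of `role` should be equal."""
    shapes = sorted((p for p in placements if p.role == role), key=lambda p: p.left)
    gaps = [b.left - (a.left + a.width) for a, b in zip(shapes, shapes[1:])]
    if len(gaps) < 2 or max(gaps) - min(gaps) <= tol:
        return []  # fewer than three shapes, or gaps equal within tolerance
    return [PlacementIssue("warning", "spacing",
                           f"uneven gaps between {role!r} shapes",
                           {"gaps": gaps})]
```

Applied to the failure modes from the problem statement, a label 0.1" off its bar's center exceeds the 0.02" tolerance and is reported, as is a row of bars whose gaps differ by more than 0.01".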