Placement-based spatial verification for generated slides #13

@tmustier

Problem

When Clean Slides extends beyond tables to more complex element types (waterfall charts, icon layouts, rich formatting), the agent can't reliably verify its own visual output. Vision models lose 40-60% of spatial structure during encoding (QVLM, Jan 2026), so they can't spot a label 0.1" off-center or detect uneven bar spacing.

The existing LayoutVerifier in verification.py handles tables (overlap, boundary, text-fit on the cell grid), but isn't extensible to other element types.

Solution: renderer-emitted placements

The renderer already computes every shape's position, size, role, and text. Instead of reconstructing relationships from the output file, capture them at creation time as list[Placement] and run geometric checks.
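
As a rough illustration of capturing placements at creation time, the sketch below shows what a Placement record and a renderer step might look like. The field names (left, top, width, height), the inch units, and the render_bar_with_label helper are assumptions made for this example; only the ideas of position, size, role, text, and a group key come from the description above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """One rendered shape, recorded when it is created (field names are assumed)."""
    left: float
    top: float
    width: float
    height: float
    role: str                  # e.g. "bar", "label", "divider"
    text: str = ""
    group: str | None = None   # shared key linking related shapes

    @property
    def right(self) -> float:
        return self.left + self.width

    @property
    def bottom(self) -> float:
        return self.top + self.height

    @property
    def center_x(self) -> float:
        return self.left + self.width / 2


def render_bar_with_label(name: str, left: float, top: float,
                          width: float, height: float) -> list[Placement]:
    """Hypothetical renderer step: place a bar and its label, return what was placed."""
    label_height = 0.3  # assumed label height, in inches
    bar = Placement(left, top, width, height, role="bar", group=name)
    label = Placement(left, top - label_height, width, label_height,
                      role="label", text=name, group=name)
    # A real renderer would also add both shapes to the slide at this point.
    return [bar, label]


placements = render_bar_with_label("Revenue", left=1.0, top=2.0, width=1.5, height=3.0)
```

Because the renderer writes down role and group itself, later checks never have to infer which label belongs to which bar.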

See docs/VISUAL-VERIFICATION-RESEARCH.md for full research (VLM limitations, DRC/Galen analogies, 6 approaches ranked).

Current state

Done

  • placement.py: Placement dataclass + standalone check functions:
    • check_overlaps — pairwise bbox intersection (skips dividers/background/connectors)
    • check_bounds — shapes within content area
    • check_group_alignment — labels centered over anchors by shared group key
    • check_uniform_spacing — even gaps between shapes of a given role
    • check_all — convenience runner
  • PlacementIssue dataclass with severity/category/message/details
  • 22 tests in test_placement.py, pyright clean
  • Research doc: docs/VISUAL-VERIFICATION-RESEARCH.md
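
To make two of the checks listed above concrete, here is a minimal sketch of a pairwise overlap check and a uniform-spacing check. It is not the code in placement.py: the Placement and PlacementIssue field names, the skipped role names, the tolerances, and the severity strings are all assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import combinations


@dataclass(frozen=True)
class Placement:  # assumed field names, units in inches
    left: float
    top: float
    width: float
    height: float
    role: str
    group: str | None = None


@dataclass(frozen=True)
class PlacementIssue:
    severity: str
    category: str
    message: str
    details: dict = field(default_factory=dict)


SKIP_ROLES = {"divider", "background", "connector"}  # assumed role names


def check_overlaps(placements: list[Placement], tol: float = 0.001) -> list[PlacementIssue]:
    """Flag each pair of shapes whose bounding boxes intersect by more than tol."""
    issues = []
    shapes = [p for p in placements if p.role not in SKIP_ROLES]
    for a, b in combinations(shapes, 2):
        x_overlap = min(a.left + a.width, b.left + b.width) - max(a.left, b.left)
        y_overlap = min(a.top + a.height, b.top + b.height) - max(a.top, b.top)
        if x_overlap > tol and y_overlap > tol:
            issues.append(PlacementIssue(
                "error", "overlap", f"{a.role} overlaps {b.role}",
                {"x_overlap": x_overlap, "y_overlap": y_overlap}))
    return issues


def check_uniform_spacing(placements: list[Placement], role: str,
                          tol: float = 0.02) -> list[PlacementIssue]:
    """Flag uneven horizontal gaps between consecutive shapes of one role."""
    shapes = sorted((p for p in placements if p.role == role), key=lambda p: p.left)
    gaps = [b.left - (a.left + a.width) for a, b in zip(shapes, shapes[1:])]
    if len(gaps) < 2 or max(gaps) - min(gaps) <= tol:
        return []
    return [PlacementIssue("warning", "spacing",
                           f"uneven gaps between {role} shapes", {"gaps": gaps})]


uneven_bars = [
    Placement(1.0, 2.0, 0.8, 1.0, "bar"),
    Placement(2.0, 2.0, 0.8, 1.0, "bar"),
    Placement(3.2, 2.0, 0.8, 1.0, "bar"),  # gap before this bar is twice as wide
]
spacing_issues = check_uniform_spacing(uneven_bars, role="bar")
```

Shapes that merely share an edge produce zero overlap on one axis and are not flagged, which matters when labels sit directly on top of bars.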

TODO: integration

  • Tables (Option 2): placements_from_layout() bridge that converts TableLayout cells → list[Placement], merged into LayoutReport. Low priority — existing LayoutVerifier already covers tables.
  • New element types (Option 1): When building waterfall charts / icon layouts, have the renderer return list[Placement]. The CLI runs check_all() and reports issues.
  • check_text_fits: Would use TextMetrics to verify that text fits within its bounding box. Not yet implemented; for tables, the solver already handles this via font auto-reduction.
  • CLI output: Wire placement issues into pptx generate output (with --detail flag)
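
One possible shape for the unimplemented check_text_fits is sketched below. Since the TextMetrics API is not shown here, the sketch takes a measurement function as a parameter and uses a crude character-count estimate as a stand-in. The font_size_pt field, the padding value, and the single-line assumption are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Placement:  # assumed field names, units in inches
    left: float
    top: float
    width: float
    height: float
    role: str
    text: str = ""
    font_size_pt: float = 12.0  # assumed field, not confirmed by the source


@dataclass(frozen=True)
class PlacementIssue:
    severity: str
    category: str
    message: str
    details: dict = field(default_factory=dict)


def estimate_width(text: str, font_size_pt: float) -> float:
    """Crude stand-in for real text metrics: about 0.5 em per character, in inches."""
    return len(text) * font_size_pt * 0.5 / 72.0


def check_text_fits(placements: list[Placement],
                    measure: Callable[[str, float], float] = estimate_width,
                    padding: float = 0.05) -> list[PlacementIssue]:
    """Flag single-line text whose measured width exceeds its box width."""
    issues = []
    for p in placements:
        if not p.text:
            continue
        needed = measure(p.text, p.font_size_pt) + 2 * padding
        if needed > p.width:
            issues.append(PlacementIssue(
                "warning", "text_fit",
                f"text {p.text!r} needs {needed:.2f} in, box is {p.width:.2f} in",
                {"needed": needed, "available": p.width}))
    return issues


labels = [
    Placement(1.0, 1.0, 1.0, 0.3, "label", text="Revenue"),
    Placement(2.2, 1.0, 1.0, 0.3, "label", text="Net operating income"),
]
text_issues = check_text_fits(labels)
```

Passing the measurement in as a function keeps the check testable without fonts installed; a TextMetrics-backed function could be swapped in later.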

Design principles

  • No heuristics: Renderer knows intent → writes it down. No proximity-based guessing.
  • Standalone functions, not a framework: Each check is list[Placement] → list[PlacementIssue]. Adding a check = write a function.
  • group for relationships: Shared string key (e.g. "Revenue") links related shapes (label + bar). Only needed for relationship checks; generic checks ignore it.
  • Generation first, edit later: Edit workflow (from inspect_slide() output) is a separate problem.
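
The principles above can be sketched as follows: each check is a plain function, relationship checks look up partners by group key rather than by proximity, and a runner just concatenates results. This is an illustrative sketch, not the project's code. The run_checks name, field names, role names, content-area tuple, and tolerances are assumptions (the real check_all signature is not shown here).

```python
from __future__ import annotations

from dataclasses import dataclass, field
from functools import partial
from typing import Callable


@dataclass(frozen=True)
class Placement:  # assumed field names, units in inches
    left: float
    top: float
    width: float
    height: float
    role: str
    group: str | None = None


@dataclass(frozen=True)
class PlacementIssue:
    severity: str
    category: str
    message: str
    details: dict = field(default_factory=dict)


Check = Callable[[list[Placement]], list[PlacementIssue]]


def check_bounds(placements: list[Placement],
                 area: tuple[float, float, float, float]) -> list[PlacementIssue]:
    """Flag shapes extending outside area, given as (left, top, right, bottom)."""
    a_left, a_top, a_right, a_bottom = area
    return [
        PlacementIssue("error", "bounds", f"{p.role} extends outside content area")
        for p in placements
        if p.left < a_left or p.top < a_top
        or p.left + p.width > a_right or p.top + p.height > a_bottom
    ]


def check_group_alignment(placements: list[Placement],
                          tol: float = 0.05) -> list[PlacementIssue]:
    """Flag labels not horizontally centered over the bar sharing their group key."""
    anchors = {p.group: p for p in placements if p.role == "bar" and p.group}
    issues = []
    for p in placements:
        if p.role != "label" or p.group not in anchors:
            continue
        anchor = anchors[p.group]
        offset = (p.left + p.width / 2) - (anchor.left + anchor.width / 2)
        if abs(offset) > tol:
            issues.append(PlacementIssue(
                "warning", "alignment",
                f"label for {p.group!r} is {abs(offset):.2f} in off-center",
                {"offset": offset}))
    return issues


def run_checks(placements: list[Placement], checks: list[Check]) -> list[PlacementIssue]:
    """Hypothetical runner: apply each check in order and collect all issues."""
    issues: list[PlacementIssue] = []
    for check in checks:
        issues.extend(check(placements))
    return issues


placements = [
    Placement(1.0, 2.0, 1.0, 3.0, "bar", group="Revenue"),
    Placement(1.1, 1.7, 1.0, 0.3, "label", group="Revenue"),  # 0.1 in off-center
    Placement(9.5, 2.0, 1.0, 3.0, "bar", group="Costs"),      # runs past the right edge
]
checks: list[Check] = [
    partial(check_bounds, area=(0.0, 0.0, 10.0, 7.5)),
    check_group_alignment,
]
issues = run_checks(placements, checks)
for issue in issues:
    print(f"[{issue.severity}] {issue.category}: {issue.message}")
```

Checks that need extra parameters are bound with functools.partial, so adding a new check still amounts to writing one function and appending it to the list.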
