Description
Orphaned lines: detect and mitigate typographic widows/orphans across slide elements
Problem
Generated slides frequently produce "orphaned" lines — a single short word or fragment dangling on the last line of a text block. This is a well-known typographic problem (widows and orphans) that makes slides look unpolished and wastes vertical space.
Examples from a real deck build:
- A title wrapping to 3 lines where the third line is a single word ("visual.")
- A cell where the last line contains only "slide" or "needed"
- A subtitle that wraps with just a preposition on the second line
Context from other domains
Publishing / LaTeX: Penalises bad breaks with \widowpenalty, \clubpenalty, \looseness. The typesetter adjusts spacing or reflows to avoid orphans. LaTeX also has \parfillskip control for minimum last-line length.
CSS / web: text-wrap: balance redistributes text evenly across lines (useful for headings). text-wrap: pretty specifically targets orphans on the last line. Both are increasingly supported and address the same visual problem.
Slide style guides (McKinsey, BCG): Typically say "reword to avoid" — which is what a human would do, and what an agent could be prompted to do if the tool detects the issue.
Proposed direction
This likely needs different treatment per element type:
| Element | Tolerance | Approach |
|---|---|---|
| Titles | Zero — should never orphan | Reflow detection + agent guidance to reword |
| Subtitles | Low | Same as titles |
| Row headers | Low — visually prominent | Detect, suggest rewording |
| Column headers | Medium — usually short | Detect only |
| Body cells | Higher — more text, more tolerance | Detect when < N chars on final line |
| Sidebar paragraphs | Medium | Detect |
Detection (lint/verify step)
A new check in pptx verify (or a dedicated pptx lint) that:
- Estimates line breaks for each text element given its container width and font size
- Flags elements where the last line is below a threshold (e.g. < 15% of available width, or < N characters)
- Reports severity based on element type (error for titles, warning for body cells)
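The detection steps above can be sketched as follows. This is a minimal illustration, not the tool's implementation: it approximates text width by character count (assuming an average glyph is about half the font size wide) and uses the standard library's greedy `textwrap.wrap` in place of a real layout engine. The names `POLICY`, `OrphanFinding`, `estimate_lines`, and `check_orphan`, and all threshold values, are hypothetical placeholders.

```python
import textwrap
from dataclasses import dataclass

# Hypothetical policy: element type -> (minimum last-line fill ratio, severity).
# The ratios are untuned placeholders chosen only for illustration.
POLICY = {
    "title": (0.25, "error"),
    "subtitle": (0.25, "error"),
    "row_header": (0.20, "warning"),
    "body_cell": (0.15, "warning"),
}

# Assumption: an average glyph is about half the font size wide (in points).
AVG_CHAR_WIDTH_EM = 0.5


@dataclass
class OrphanFinding:
    element_type: str
    severity: str
    line_count: int
    last_line: str
    fill_ratio: float


def estimate_lines(text, width_pt, font_size_pt):
    """Greedy word wrap against an estimated characters-per-line budget."""
    chars_per_line = max(1, int(width_pt / (font_size_pt * AVG_CHAR_WIDTH_EM)))
    return textwrap.wrap(text, width=chars_per_line), chars_per_line


def check_orphan(text, element_type, width_pt, font_size_pt):
    """Return an OrphanFinding if the last line is too short, else None."""
    lines, chars_per_line = estimate_lines(text, width_pt, font_size_pt)
    if len(lines) < 2:
        return None  # a single line cannot be orphaned
    min_ratio, severity = POLICY.get(element_type, (0.15, "warning"))
    fill = len(lines[-1]) / chars_per_line
    if fill < min_ratio:
        return OrphanFinding(element_type, severity, len(lines), lines[-1], fill)
    return None


title = ("Each slide should carry one message and support it "
         "with a single compelling visual.")
print(check_orphan(title, "title", width_pt=560, font_size_pt=28))
```

With these placeholder thresholds the same text is flagged as an error in a title but passes in a body cell, which is the element-type-aware behaviour described above. A real check would replace the character-count estimate with measured glyph widths for the actual font.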
Mitigation options (research needed)
Several approaches worth investigating — not mutually exclusive:
- Balanced reflow: Like CSS text-wrap: balance, redistribute text so lines are roughly equal length. Could be done at the text-box level by the layout engine. Most impactful for titles and short text.
- Agent guidance: When orphans are detected, emit a structured suggestion ("Title wraps to 3 lines with 1 word on line 3; consider rewording to fit 2 lines"). The agent can then reword.
- Minimum last-line threshold: During layout solving, if the last line of a paragraph is below a threshold, slightly adjust the available width or font size to trigger a reflow. This is closer to the LaTeX approach.
- Soft hyphenation / break hints: Less relevant for slides (hyphenation looks bad at large font sizes), but break hints could help the engine make better choices.
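Two of the options above, balanced reflow and agent guidance, can be sketched in a few lines. This is an illustration under stated assumptions, not a description of how CSS engines or this tool implement them: width is measured in characters rather than rendered glyph widths, and the function names `balance_lines` and `reword_suggestion` and the suggestion schema are hypothetical. The balancing idea is to find the narrowest wrap width that still produces the same number of lines as a greedy wrap at full width.

```python
import textwrap


def balance_lines(text, max_chars):
    """Rewrap text at the narrowest width that keeps the greedy line count."""
    baseline = textwrap.wrap(text, width=max_chars)
    if len(baseline) < 2:
        return baseline
    target = len(baseline)
    # Never search below the longest word, so no word is split mid-way.
    longest_word = max(len(word) for word in text.split())
    lo, hi = min(longest_word, max_chars), max_chars
    # Binary search: narrowing the width never reduces the line count, so
    # the smallest width that still yields `target` lines is well defined.
    while lo < hi:
        mid = (lo + hi) // 2
        if len(textwrap.wrap(text, width=mid)) <= target:
            hi = mid
        else:
            lo = mid + 1
    return textwrap.wrap(text, width=lo)


def reword_suggestion(element_type, lines):
    """Build a structured hint an agent could act on (hypothetical schema)."""
    last_words = len(lines[-1].split())
    return {
        "element_type": element_type,
        "line_count": len(lines),
        "last_line_words": last_words,
        "message": (
            f"{element_type.capitalize()} wraps to {len(lines)} lines with "
            f"{last_words} word(s) on line {len(lines)}; "
            f"consider rewording to fit {len(lines) - 1} lines"
        ),
    }


title = ("Each slide should carry one message and support it "
         "with a single compelling visual.")
for line in balance_lines(title, max_chars=40):
    print(line)
print(reword_suggestion("title", textwrap.wrap(title, width=40))["message"])
```

Balancing keeps the line count and only evens out line lengths, so it removes the dangling word without changing the text. The suggestion path leaves layout alone and asks the agent to shorten the copy instead, which is what the style guides cited earlier recommend.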
Research needed
- Survey how CSS text-wrap: balance is implemented (the algorithm). Is it applicable to fixed-width text boxes?
- Check if python-pptx exposes any text-fit or autofit controls that could help
- Look at how professional slide tools (think-cell, etc.) handle this
- Determine the right thresholds per element type — what "feels" orphaned at 14pt vs 24pt?
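One reason the threshold question depends on font size: a fixed fill ratio corresponds to a different number of characters at each size. The sketch below converts a ratio into an approximate character count, assuming an average glyph width of half the font size; the helper name `min_last_line_chars` and the 560pt box width are illustrative only.

```python
# Assumption: an average glyph is about half the font size wide (in points).
AVG_CHAR_WIDTH_EM = 0.5


def min_last_line_chars(width_pt, font_size_pt, min_fill_ratio):
    """Convert a fill-ratio threshold into an approximate character count."""
    chars_per_line = width_pt / (font_size_pt * AVG_CHAR_WIDTH_EM)
    return round(chars_per_line * min_fill_ratio)


for size in (14, 24):
    print(size, min_last_line_chars(560, size, 0.15))
```

Under these assumptions, a 15% threshold on a 560pt-wide box works out to roughly 12 characters at 14pt but only about 7 at 24pt, so a ratio-based rule and a fixed "N characters" rule will disagree as font size changes.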
Acceptance criteria
- pptx verify (or similar) detects orphaned lines across all text elements
- Severity is element-type-aware (stricter for titles, lenient for body)
- At least one mitigation approach implemented for titles (balanced reflow or agent guidance)
- Body cell orphan detection available as a warning