Improve citations and quote selection #119

Draft
shabrf wants to merge 12 commits into dev from citation_quotes

shabrf commented Feb 13, 2026

Description

Improves citations and the selection of quotes from source documents. Adds a new frontend component that shows a cited quote in the context of its source chunk when a citation is clicked, and changes the validator to perform more robust, quote-based grounding.

Changes

Backend: grounding, retrieval, and citation data model

  • Reworked the generation validator step from soft verification to claim-level grounding with retries:
    • Extracts claim/citation pairs from generated sections.
    • Grounds each claim against source chunks and extraction quotes.
    • Stores per-claim supporting_quote, chunk_id, and attribution.
    • Uses targeted retrieve_evidence calls when unsupported claims are found.
  • Normalises and parses compound citations (e.g. [1, 3; 5]) when they are generated.
  • Added renumbering of citations by order of first appearance, so numbering matches the frontend render order.
  • Improved interventions table generation:
    • Row-scoped evidence gathering and structured row generation.
    • Deterministic rendering and cleaner markdown handling.
  • Retrieval improvements:
    • Stopped truncating key chunk content in paths where full text is needed for grounding.
  • Added helper utilities:
    • fetch_chunk_texts for batch chunk-content retrieval.
  • Logging/observability improvements:
    • Better RCS input/result diagnostics.
    • Langfuse metadata/session handling updates.
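
A minimal sketch of the compound-citation normalisation and renumbering-by-appearance steps, assuming citations appear inline as bracketed integers separated by commas or semicolons. The function names, the regex, and the choice to split a compound group into adjacent single citations ([1][2][3]) are illustrative assumptions, not the code in this PR.

```python
from __future__ import annotations

import re

# Assumed format: a bracketed group of integers separated by commas and/or
# semicolons, e.g. "[1, 3; 5]". Ranges such as "[2-4]" are not handled here.
CITATION_GROUP = re.compile(r"\[(\d+(?:\s*[,;]\s*\d+)*)\]")


def parse_citation_group(inner: str) -> list[int]:
    """Split the inside of a compound citation into individual numbers."""
    return [int(n) for n in re.split(r"\s*[,;]\s*", inner.strip())]


def renumber_by_appearance(text: str) -> tuple[str, dict[int, int]]:
    """Normalise compound citations and renumber them by first appearance.

    Returns the rewritten text and a mapping from old to new numbers.
    """
    mapping: dict[int, int] = {}

    def replace(match: re.Match) -> str:
        new_numbers: list[int] = []
        for old in parse_citation_group(match.group(1)):
            # The first time a source is cited it gets the next free number.
            if old not in mapping:
                mapping[old] = len(mapping) + 1
            # Drop duplicates inside a single group, keeping order.
            if mapping[old] not in new_numbers:
                new_numbers.append(mapping[old])
        # Hypothetical output form: one bracket per citation.
        return "".join(f"[{n}]" for n in new_numbers)

    return CITATION_GROUP.sub(replace, text), mapping
```

Renumbering on first appearance means the first citation a reader sees is always [1], which is what keeps numbering aligned with render order; the returned mapping would also need to be applied to the stored citation map.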

Backend: persistence + API

  • Added DB migration:
    • synthesis_citations.attribution column with documented meaning.
  • Updated synthesis schemas:
    • CitationInfo now includes document_type, evidence_score, impact_score, and claim_quotes.
    • New ClaimQuote schema.
  • Updated logbook read/write:
    • Persists both document-level fallback citation rows and per-claim citation rows.
    • Backward-compatible attribution derivation from legacy confidence values.
    • Enriches citation map with document metadata and evidence/impact scores.
  • Added new endpoint for citation context inspection:
    • GET /api/analysis-projects/{project_id}/chunks/{chunk_id}/context
    • Returns target chunk, adjacent chunks, and document metadata for sidebar inspection.
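
The context endpoint returns the target chunk together with its neighbours. A minimal sketch of that windowing step, assuming each chunk stores its position within its document (chunk_index, a hypothetical field name) and that the caller has already loaded the document's chunks; attaching document metadata is left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Chunk:
    chunk_id: str
    document_id: str
    chunk_index: int  # hypothetical: position of the chunk within its document
    content: str


@dataclass
class ChunkContext:
    target: Chunk
    previous: list[Chunk] = field(default_factory=list)
    following: list[Chunk] = field(default_factory=list)


def build_chunk_context(chunks: list[Chunk], chunk_id: str, window: int = 1) -> ChunkContext:
    """Return the target chunk plus up to `window` neighbours on each side.

    `chunks` is assumed to contain every chunk of the target's document.
    """
    ordered = sorted(chunks, key=lambda c: c.chunk_index)
    position = next((i for i, c in enumerate(ordered) if c.chunk_id == chunk_id), None)
    if position is None:
        raise KeyError(f"chunk {chunk_id!r} not found")
    return ChunkContext(
        target=ordered[position],
        # Slicing clamps at the document boundaries.
        previous=ordered[max(0, position - window):position],
        following=ordered[position + 1:position + 1 + window],
    )
```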

Backend: guardrails for tool calls

  • Tightened tool argument validation in the orchestrator to avoid unsafe/ambiguous execution (for example, skipping calls when required params like citation_number/query are missing rather than guessing).
  • Enforced clearer phase boundaries so verification/grounding tools are not used during evidence-gathering loops.
  • Added safer fallback behaviour when tool outputs are partial or missing:
    • preserves section generation with soft-grounding warnings instead of hard-failing the run,
    • captures retry feedback for unsupported claims and regathers targeted evidence (retrieve_evidence) before regeneration.
  • Added defensive handling around grounding/tool exceptions so failures degrade gracefully (issues logged, claims flagged, content still returned for review).
  • Extended observability (including Langfuse warning emission and richer logs) to make tool-call failure modes easier to diagnose.
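
A rough sketch of the argument and phase checks described above. The phase names, the verify_citation tool, and the REQUIRED_PARAMS / ALLOWED_PHASES tables are hypothetical; only retrieve_evidence and the citation_number / query parameters come from this PR.

```python
from __future__ import annotations

from enum import Enum
from typing import Any, Optional


class Phase(Enum):
    GATHER = "gather"  # evidence-gathering loop
    GROUND = "ground"  # verification / grounding step


# Hypothetical registry of required parameters per tool.
REQUIRED_PARAMS: dict[str, set[str]] = {
    "retrieve_evidence": {"query"},
    "verify_citation": {"citation_number"},
}

# Hypothetical phase boundaries: verification tools only run while grounding,
# while retrieve_evidence may also be used to regather evidence on retries.
ALLOWED_PHASES: dict[str, set[Phase]] = {
    "retrieve_evidence": {Phase.GATHER, Phase.GROUND},
    "verify_citation": {Phase.GROUND},
}


def check_tool_call(name: str, args: dict[str, Any], phase: Phase) -> Optional[str]:
    """Return a reason to skip the call, or None if it is safe to run."""
    if name not in REQUIRED_PARAMS:
        return f"unknown tool {name!r}"
    if phase not in ALLOWED_PHASES[name]:
        return f"{name} is not allowed during the {phase.value} phase"
    # Treat absent or empty values as missing instead of guessing a default.
    missing = sorted(p for p in REQUIRED_PARAMS[name] if args.get(p) in (None, ""))
    if missing:
        return f"{name} missing required params: {', '.join(missing)}"
    return None
```

The caller would log the returned reason and skip the call rather than invent missing arguments.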

Frontend: citation UX and context panel

  • ExecutiveBriefing now accepts projectId and supports citation inspection state.
  • Added a new CitationContextPanel component:
    • Right-hand panel showing chunk context, metadata, and quote highlighting.
    • Includes fuzzy matching for quote-to-chunk span highlighting.
    • Provides quick actions to open the original source or view all evidence.
  • Citation rendering enhancements:
    • Better claim-context matching to choose the most relevant claim-level quote.
    • Visual attribution indicators in citation tooltips.
  • Made the main sidebar collapsible in the main layout.
  • Type updates in frontend/types/search.ts to support new citation payload shape.
  • Added fuzzball dependency to support fuzzy quote matching.
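
The panel uses fuzzball in the frontend; the sketch below shows the same quote-to-span idea in Python with the standard library's difflib.SequenceMatcher instead, so its scores will not match fuzzball's. It slides word-aligned windows of roughly the quote's length over the chunk and returns the character span with the highest similarity, or None below a hypothetical min_ratio threshold.

```python
from __future__ import annotations

import re
from difflib import SequenceMatcher
from typing import Optional


def _normalise(text: str) -> str:
    # Compare case- and whitespace-insensitively.
    return " ".join(text.lower().split())


def find_quote_span(chunk: str, quote: str, min_ratio: float = 0.8) -> Optional[tuple[int, int]]:
    """Return (start, end) offsets of the chunk span that best matches `quote`.

    Returns None when no window reaches `min_ratio` (a hypothetical threshold).
    """
    words = list(re.finditer(r"\S+", chunk))
    n = len(quote.split())
    if n == 0 or not words:
        return None
    target = _normalise(quote)
    best_ratio, best_span = 0.0, None
    # Try windows slightly shorter and longer than the quote to tolerate
    # dropped or inserted words.
    for size in range(max(1, n - 2), n + 3):
        for i in range(len(words) - size + 1):
            start, end = words[i].start(), words[i + size - 1].end()
            ratio = SequenceMatcher(None, _normalise(chunk[start:end]), target).ratio()
            if ratio > best_ratio:
                best_ratio, best_span = ratio, (start, end)
    return best_span if best_ratio >= min_ratio else None
```

Returning character offsets rather than the matched text lets the UI highlight the original chunk text even when the LLM's quote differs slightly in case, whitespace, or spelling.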

Documentation

  • Updated synthesis documentation to reflect grounding and citation flow changes.

Database Migration

  • Added: backend/supabase/migrations/20260213000000_add_synthesis_citation_attribution.sql
  • Change: synthesis_citations.attribution (varchar(20))
  • Allowed values (semantic): direct, synthesised, inferred

Notes / Risks

  • Grounding now depends more heavily on chunk-text availability and on LLM grounding outputs; this improves fidelity but can increase latency and cost for some sections. On some of my test searches, the synthesis step took ~45 minutes.
  • Legacy rows without explicit attribution are mapped via confidence thresholds for compatibility.
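
A sketch of the legacy mapping onto the three attribution values, assuming confidence is stored as a float in [0, 1]. The thresholds below are placeholders; the actual cut-offs used when reading the logbook are not stated in this PR.

```python
from __future__ import annotations

from typing import Optional

# Allowed values for synthesis_citations.attribution.
ATTRIBUTIONS = ("direct", "synthesised", "inferred")

# Hypothetical cut-offs, not taken from the PR.
DIRECT_MIN = 0.8
SYNTHESISED_MIN = 0.5


def derive_attribution(attribution: Optional[str], confidence: Optional[float]) -> Optional[str]:
    """Prefer an explicit attribution; otherwise map a legacy confidence value."""
    if attribution in ATTRIBUTIONS:
        return attribution
    if confidence is None:
        return None  # nothing to derive from
    if confidence >= DIRECT_MIN:
        return "direct"
    if confidence >= SYNTHESISED_MIN:
        return "synthesised"
    return "inferred"
```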

@shabrf shabrf self-assigned this Feb 18, 2026