Skip to content

feat: improve errors diagnostics#3537

Open
astandrik wants to merge 7 commits intomainfrom
3487-2
Open

feat: improve errors diagnostics#3537
astandrik wants to merge 7 commits intomainfrom
3487-2

Conversation

@astandrik
Copy link
Collaborator

@astandrik astandrik commented Mar 2, 2026

Stand

Closes #3487


Note

Medium Risk
Touches shared error rendering and streaming query error propagation; mistakes could hide errors or break auth/forbidden routing across multiple pages.

Overview
Improves error diagnostics and presentation across the UI. ResponseError is redesigned to render a structured alert with extracted metadata (URL/method, status/code, trace/request IDs, proxy/worker headers, network context) and optional expandable sections for response body and issues.

Refactors full-page error handling. PageError now distinguishes 401 vs 403 (new Unauthenticated state), supports a "Report a problem" action via uiFactory.getReportProblemUrl, and reuses the new useErrorInfo/ResponseErrorMessage flow.

Hardens streaming query errors. StreamingAPI now converts non-OK responses and mid-stream/connection failures into enriched Error objects (including useful headers and an errorPhase) so they display consistently in the new diagnostics UI.

Unifies illustrations and removes custom SVG loading. Deletes the dynamic Illustration component and bundled SVG assets, replacing usage with getIllustration() backed by @gravity-ui/illustrations with optional overrides via uiFactory.illustrations.

Adds extensive unit tests for error parsing/extraction and a new Playwright suite covering inline/full-page error scenarios; also tweaks EmptyState layout options and Playwright screenshot expectations.

Written by Cursor Bugbot for commit e1f1f29. This will update automatically on new commits. Configure here.

CI Results

Test Status: ❌ FAILED

📊 Full Report

Total Passed Failed Flaky Skipped
456 452 2 0 2
Test Changes Summary ✨21

✨ New Tests (21)

  1. Cluster — 400 plain text with traceresponse and x-request-id (errorDisplay/errorDisplay.test.ts)
  2. Node — network error shows ERR_NETWORK code (errorDisplay/errorDisplay.test.ts)
  3. PDisk — 503 empty body with x-worker-name (errorDisplay/errorDisplay.test.ts)
  4. VDisk — 429 JSON with expandable issues (errorDisplay/errorDisplay.test.ts)
  5. StorageGroup — 429 HTML body with x-proxy-name and x-trace-id (errorDisplay/errorDisplay.test.ts)
  6. Tablet — 400 JSON with only code field shows code as message (errorDisplay/errorDisplay.test.ts)
  7. StorageGroup — toggle button says "Response" and expands/collapses response body (errorDisplay/errorDisplay.test.ts)
  8. VDisk — toggle button says "Issues" and expands/collapses issues (errorDisplay/errorDisplay.test.ts)
  9. Full-page — whoami 500 blocks entire page with trace headers (errorDisplay/errorDisplay.test.ts)
  10. Full-page — whoami 503 text body with x-worker-name (errorDisplay/errorDisplay.test.ts)
  11. Full-page — whoami network error blocks entire page (errorDisplay/errorDisplay.test.ts)
  12. Full-page — whoami 502 HTML body with x-proxy-name (errorDisplay/errorDisplay.test.ts)
  13. Tenant — 403 describe shows AccessDenied (errorDisplay/errorDisplay.test.ts)
  14. Cluster — 401 capabilities shows Unauthenticated (errorDisplay/errorDisplay.test.ts)
  15. Query result — 500 with trace headers shows ResponseError with details (errorDisplay/errorDisplay.test.ts)
  16. Query result — network error shows ResponseError (errorDisplay/errorDisplay.test.ts)
  17. Streaming query — 500 with trace headers shows ResponseError with details (errorDisplay/errorDisplay.test.ts)
  18. Streaming query — mid-stream network error preserves headers from initial response (errorDisplay/errorDisplay.test.ts)
  19. Streaming query — pre-connection network error shows URL from enriched error (errorDisplay/errorDisplay.test.ts)
  20. Error on non-Result tab shows Query Failed message for failed run (tenant/queryEditor/queryEditor.test.ts)
  21. Streaming non-JSON HTTP error shows actual error body, not empty object (tenant/queryEditor/queryEditor.test.ts)

Bundle Size: 🔺

Current: 62.98 MB | Main: 62.93 MB
Diff: +0.05 MB (0.08%)

⚠️ Bundle size increased. Please review.

ℹ️ CI Information
  • Test recordings for failed tests are available in the full report.
  • Bundle size is measured for the entire 'dist' directory.
  • 📊 indicates links to detailed reports.
  • 🔺 indicates increase, 🔽 decrease, and ✅ no change in bundle size.

Greptile Summary

This PR is a substantial overhaul of error diagnostics across the UI: ResponseError and PageError are redesigned to surface structured title/subtitle/expandable-details panels; a new extractErrorDetails pipeline extracts trace IDs, proxy headers, request URL, network context, and issues from heterogeneous error shapes; and the streaming fetch path is hardened to preserve non-JSON bodies, enriched errors with phase metadata, and real Error instances with stack traces.

Key changes:

  • New error pipelineextractErrorDetails (+ extractDataMessage, extractHeaders, extractConfig) converts any error shape into a typed ErrorDetails bag; prepareCommonErrorMessage and useErrorInfo consume it for consistent title/subtitle rendering across all error surfaces.
  • Auth error splitisAccessError is replaced by isUnauthenticatedError (401) and isForbiddenError (403); a new Unauthenticated empty-state is added; isRedirectToAuth correctly short-circuits before both.
  • Illustration abstraction — local SVG imports and the custom Illustration component are removed; getIllustration(name) resolves from @gravity-ui/illustrations defaults with uiFactory.illustrations overrides.
  • Streaming improvements — pre-connection fetch failures, non-OK responses, and mid-stream parse errors are all now enriched with config, headers, and errorPhase; text()JSON.parse chain replaces the previous response.json() call to preserve plain-text bodies.

Two correctness issues found:

  1. lib.ts missing exportGetReportProblemUrl is added to UIFactory but not re-exported from lib.ts, preventing library consumers from typing their getReportProblemUrl implementation without reaching into internal paths.
  2. pageTitle prop override in PageErrorpageTitle={errorPageTitle} is placed after {...restProps} in all three branches (Unauthenticated, AccessDenied, PageErrorContent), so any pageTitle a caller passes is silently dropped when errorPageTitle is undefined.

Confidence Score: 2/5

  • Two confirmed correctness bugs (missing type export, prop override) prevent safe merge without fixes. Core error pipeline is sound but library API and component prop handling need fixes.
  • Score reduced from 3 to 2 because the two bugs are confirmed and concrete: (1) library consumers cannot properly type their getReportProblemUrl implementation without internal imports, and (2) the pageTitle prop override affects all three error state branches, silently breaking caller-provided page titles. Both are straightforward to fix but must be addressed before merge. The streaming and error-extraction logic itself is well-tested and sound.
  • src/lib.ts (missing type export) and src/components/Errors/PageError/PageError.tsx (prop override in all three branches)

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Error received] --> B{isRedirectToAuth?}
    B -- yes --> C[return null\nredirect happens elsewhere]
    B -- no --> D{isUnauthenticatedError\nstatus 401?}
    D -- yes --> E[Unauthenticated\nempty state]
    D -- no --> F{isForbiddenError\nstatus 403?}
    F -- yes --> G[AccessDenied\nempty state]
    F -- no --> H[PageErrorContent /\nResponseError]

    H --> I[useErrorInfo]
    I --> J[prepareCommonErrorMessage\nfallbackMessage]
    I --> K[extractErrorDetails]
    K --> L{Error shape?}
    L -- HTTP response --> M[extractBasicProperties\nstatus, statusText, dataMessage, responseBody]
    L -- streaming error --> N[extractDiagnosticFields\nerrorPhase, message]
    L -- any --> O[extractHeaders\ntraceId, requestId, proxyName, workerName]
    L -- any --> P[extractConfig\nrequestUrl, method]
    L -- issues error --> Q[extractIssues\nhasIssues, issues array]

    M & N & O & P & Q --> R[ErrorDetails object]
    R --> S[enrichWithNetworkContext\nadd networkOnline/effectiveType\nfor network errors]
    S --> T[Render title + subtitle +\nErrorDetailsContent]

    T --> U{hasVisibleFields?}
    U -- yes --> V[ErrorFieldsList\nURL, traceId, requestId, etc.]
    U -- no --> W[skip fields]
    T --> X{has response body\nor issues?}
    X -- yes --> Y[Toggle button\nResponse / Issues]
    Y -- expanded --> Z[ResponseBodySection\nor IssuesSection]
Loading

Comments Outside Diff (2)

  1. src/lib.ts, line 36 (link)

    GetReportProblemUrl type not exported

    GetReportProblemUrl was added to UIFactory as the type for the new getReportProblemUrl field, but it is not re-exported from lib.ts. Library consumers who want to implement uiFactory.getReportProblemUrl will not be able to correctly type their implementation without reaching into the internal uiFactory/types path.

    IllustrationComponent and IllustrationName are already exported here — GetReportProblemUrl should follow the same pattern:

  2. src/components/Errors/PageError/PageError.tsx, line 40-85 (link)

    Caller-provided pageTitle silently overridden when errorPageTitle is undefined

    pageTitle is part of EmptyStateProps and flows into restProps when callers provide it. However, pageTitle={errorPageTitle} is specified after the spread, so when errorPageTitle is undefined (the default), it overwrites the caller's value.

    This occurs in three branches:

    1. Unauthenticated (line 50): pageTitle={errorPageTitle} after {...restProps}
    2. AccessDenied (line 62): pageTitle={errorPageTitle} after {...restProps}
    3. PageErrorContent (line 170): pageTitle={errorPageTitle} after {...restProps}

    Example: <PageError error={err} pageTitle="Cluster" /> — the "Cluster" title is lost.

    Fix: Use the caller's pageTitle as a fallback when errorPageTitle is not provided:

    Or, for cleaner code, explicitly destructure pageTitle:

    const { pageTitle: callerPageTitle, ...otherProps } = restProps;
    const resolvedPageTitle = errorPageTitle ?? callerPageTitle;
    // Pass pageTitle={resolvedPageTitle}

Last reviewed commit: e1f1f29

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

46 files reviewed, 13 comments

Edit Code Review Agent Settings | Greptile

@astandrik astandrik marked this pull request as ready for review March 3, 2026 14:48
Copilot AI review requested due to automatic review settings March 3, 2026 14:48
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves how API/network errors are parsed and rendered across the UI by extracting structured error metadata (status/title, request/trace IDs, URL/method, response body, issues) and displaying it consistently in ResponseError and page-level PageError, with added unit + Playwright coverage for common error scenarios.

Changes:

  • Add extractErrorDetails + refactor prepareCommonErrorMessage to derive consistent primary messages and structured metadata from errors.
  • Refactor ResponseError/PageError UI to show titles, optional subtitles, and expandable details (fields/issues/response).
  • Add a new Playwright errorDisplay suite (+ snapshots) and expand unit tests for response/query error guards and error parsing.

Reviewed changes

Copilot reviewed 22 out of 45 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
tests/suites/tenant/diagnostics/tabs/OperationsModel.ts Updates E2E locator to new .response-error markup.
tests/suites/errorDisplay/errorDisplayMocks.ts Adds Playwright route mocks to simulate varied error shapes (HTTP, network, HTML, issues).
tests/suites/errorDisplay/errorDisplay.test.ts Adds E2E coverage for inline/full-page/access-denied error rendering + screenshots.
tests/suites/errorDisplay/ErrorDisplayModel.ts Adds a page-object model for interacting with the new error UI.
tests/suites/errorDisplay/errorDisplay.test.ts-snapshots/error-tablet-400-code-only-chromium-linux.png New snapshot for tablet 400 (code-only) error case.
tests/suites/errorDisplay/errorDisplay.test.ts-snapshots/error-pdisk-503-empty-safari-linux.png New snapshot for PDisk 503 empty-body error case (Safari).
tests/suites/errorDisplay/errorDisplay.test.ts-snapshots/error-pdisk-503-empty-chromium-linux.png New snapshot for PDisk 503 empty-body error case (Chromium).
tests/suites/errorDisplay/errorDisplay.test.ts-snapshots/error-node-network-safari-linux.png New snapshot for node network error case (Safari).
tests/suites/errorDisplay/errorDisplay.test.ts-snapshots/error-node-network-chromium-linux.png New snapshot for node network error case (Chromium).
tests/suites/errorDisplay/errorDisplay.test.ts-snapshots/error-cluster-400-plain-text-chromium-linux.png New snapshot for cluster 400 plain-text error case.
src/utils/errors/index.ts Exposes extractErrorDetails and updates common error message preparation logic.
src/utils/errors/extractErrorDetails.ts Introduces structured metadata extraction from error objects (headers/config/body/issues).
src/utils/errors/test/prepareCommonErrorMessage.test.ts Adds unit coverage for message selection behavior across error types.
src/utils/errors/test/extractErrorDetails.test.ts Adds unit coverage for header/config/body/issues extraction behavior.
src/utils/test/response.test.ts Adds tests for additional response/network/access type-guards.
src/utils/test/query.test.ts Adds tests for query error response type-guards.
src/components/Errors/shared/IssuesSection.tsx Adds a reusable disclosure section for rendering issues lists.
src/components/Errors/shared/ErrorFieldsList.tsx Adds a reusable definition-list for rendering extracted error metadata fields.
src/components/Errors/i18n/ru.json Removes Russian keyset file for Errors component (ru disabled globally).
src/components/Errors/i18n/index.ts Registers Errors i18n keyset with en only.
src/components/Errors/i18n/en.json Extends English Errors keyset with new error-details labels.
src/components/Errors/ResponseError/ResponseError.tsx Refactors ResponseError to use common message + extracted details UI.
src/components/Errors/ResponseError/ResponseError.scss Adds styles for new ResponseError layout, fields, disclosures, and body block.
src/components/Errors/ResponseError/ErrorDetails.tsx Adds expandable error details rendering (fields/response body/issues).
src/components/Errors/PageError/PageError.tsx Forwards optional defaultMessage and fixes className composition for PageError.
src/components/Errors/PageError/PageError.scss Adjusts PageError layout to better fit detailed error content.
src/components/EmptyState/EmptyState.scss Adds size_l and layout tweaks to accommodate detailed error content.
scripts/playwright-docker.sh Minor whitespace/formatting tweak.
.gitignore Adds plans to ignored files and normalizes formatting.

@astandrik astandrik mentioned this pull request Mar 3, 2026
@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 3, 2026

Additional Comments (1)

tests/suites/errorDisplay/errorDisplayMocks.ts
VDisk mock intercepts storage groups endpoint, not a VDisk-specific route

setupVDisk429WithIssuesMock intercepts ROUTES.storageGroups (/storage/groups*), yet the test navigates to the VDisk page (vDisk?nodeId=1&vDiskId=1-0-0-0-0). If the VDisk page loads data from a dedicated endpoint (e.g. /vdisk/info*), that call goes unmocked and may resolve or fail silently.

The test currently passes (likely because VDisk internally fetches storage group data), but this is fragile — a future routing change in the VDisk page could silently break this test's intent while the mock continues to intercept the wrong call. Consider adding a vdiskInfo route to the ROUTES map and mocking it explicitly for VDisk-specific tests.

Prompt To Fix With AI
This is a comment left during a code review.
Path: tests/suites/errorDisplay/errorDisplayMocks.ts
Line: 844-853

Comment:
**VDisk mock intercepts storage groups endpoint, not a VDisk-specific route**

`setupVDisk429WithIssuesMock` intercepts `ROUTES.storageGroups` (`/storage/groups*`), yet the test navigates to the VDisk page (`vDisk?nodeId=1&vDiskId=1-0-0-0-0`). If the VDisk page loads data from a dedicated endpoint (e.g. `/vdisk/info*`), that call goes unmocked and may resolve or fail silently.

The test currently passes (likely because VDisk internally fetches storage group data), but this is fragile — a future routing change in the VDisk page could silently break this test's intent while the mock continues to intercept the wrong call. Consider adding a `vdiskInfo` route to the `ROUTES` map and mocking it explicitly for VDisk-specific tests.

How can I resolve this? If you propose a fix, please make it concise.

@astandrik astandrik marked this pull request as draft March 6, 2026 10:59
@astandrik
Copy link
Collaborator Author

@greptile review

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 51 out of 89 changed files in this pull request and generated 8 comments.

@astandrik
Copy link
Collaborator Author

@greptile review

@astandrik
Copy link
Collaborator Author

@greptile review

@astandrik astandrik marked this pull request as ready for review March 6, 2026 12:42
Copy link
Contributor

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Copy link
Contributor

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Copy link
Contributor

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

The isUnauthenticatedError check was catching all 401 errors including
those with an authUrl that indicate a redirect-to-auth flow. Previously,
PageError handled this by checking isRedirectToAuth first and returning
null to allow the browser redirect to proceed.

Without that guard, a 401 response carrying an authUrl incorrectly
rendered the Unauthenticated component instead of letting the redirect
happen.

Add the isRedirectToAuth check before the 401 handling in both
Operations.tsx and renderPaginatedTableErrorMessage.tsx, matching the
pattern already used in PageError and Content components.
… on non-Result tabs

For execute queries with errors, non-Result tabs (Schema, Explain, Stats) should
show a simple 'Query Failed' message with a redirect to the Result tab, instead
of rendering the full QueryResultError with ResponseError inline.

The previous commit (13fa015) simplified the error branching and lost the
isExecute check, causing all non-cancelled errors on non-Result tabs to render
QueryResultError regardless of query type. This was confusing because the Result
tab already shows the detailed error — seeing a full error panel on Schema/Stats
tabs instead of the 'see the Result tab' message lost the helpful redirect.

Changes:
- Restore renderCommonErrorView with isExecute/isStopped branching
- Restore i18n keys error.title and error.description
- Update E2E test name to reflect correct behavior
…Message

A 403 response with data containing a code field (e.g. {code: 'FORBIDDEN'})
would return the raw code string instead of the localized 'Access forbidden'
message because extractDataMessage ran before the 403 check.

Move the 403 check above extractDataMessage so the localized message always
takes priority for 403 responses.
Copy link
Contributor

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Improve errors diagnostics

3 participants