fix: dataset schema validation and display in Studio #13175
abhiaiyer91 merged 14 commits into main
Conversation
🦋 Changeset detected — latest commit: 0384c37. The changes in this PR will be included in the next version bump; this PR includes changesets to release 22 packages.
Note: Reviews paused. It looks like this branch is under active development; to avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
Walkthrough

Adds a SuperJSON-aware parser for workflow schemas; surfaces and preserves dataset input/groundTruth schemas in the Edit Dataset dialog and prevents external edits from overwriting local JSON edits; allows explicit null for dataset schemas end-to-end; normalizes schema null/undefined across storage layers; improves Hono error responses; and makes several UI/SDK adjustments.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes. Pre-merge checks: ✅ 2 passed, ❌ 1 failed (1 warning).
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.changeset/quiet-masks-glow.md:
- Line 7: Update the changeset text in .changeset/quiet-masks-glow.md to replace
the terse line "Fix dataset schema validation and display in Studio" with a
user-facing summary that lists concrete outcomes (what was broken and what now
works), e.g., mention that dataset schemas now appear in the Edit Dataset
dialog, schema edits no longer cause the editor cursor to jump, and validation
errors are surfaced in the dialog instead of being dropped; ensure the changeset
body uses present-tense user-impact sentences and includes at least one short
example of the visible behavior change so reviewers and release notes can
clearly communicate the user-facing fix.
In `@client-sdks/client-js/src/utils/index.ts`:
- Around line 122-128: The function parseSuperJsonString currently claims to
return Record<string, unknown> but may return null, arrays, or primitives and it
also can throw a SyntaxError from JSON.parse; update parseSuperJsonString to
return a safe, accurate type (e.g., unknown) and stop casting parsed or
parsed.json to Record<string, unknown> — simply return the parsed value (which
may be null/array/primitive/object) and let callers validate/cast to
Record<string, unknown> where appropriate; also add JSDoc documenting that
JSON.parse may throw (e.g., `@throws` {SyntaxError}) so callers are aware to
handle malformed JSON, and keep references to the function name
parseSuperJsonString when making these changes.
In
`@packages/playground-ui/src/domains/datasets/components/edit-dataset-dialog.tsx`:
- Around line 81-88: The current check treats an empty failingItems array as
truthy and can show "0 existing item(s) fail validation"; update the guard in
the error handling for body (the variable `body` derived from `err`) to verify
failingItems is a non-empty array (e.g., check
Array.isArray(body.cause.failingItems) && body.cause.failingItems.length > 0)
before calling setValidationError, otherwise fall back to the toast.error path
(using `toast.error` and the `error?.message` fallback).
In `@server-adapters/hono/src/index.ts`:
- Around line 441-445: The current code forwards the arbitrary error "cause" to
clients via the "cause" variable in the c.json response; restrict this to only
expose a known-safe shape (e.g., an object with a "failingItems" array) and omit
any other cause values. Change the logic around the "cause" variable and the
c.json call so you detect if cause is a plain object with a "failingItems"
property that is an Array (and optionally validate items are
primitives/serializable), and only then include { failingItems } in the
response; for all other cause types (including non-plain objects, circular
structures, or anything unexpected) omit the cause entirely to avoid leakage and
serialization errors. Also wrap any serialization/structuredClone attempts in
try/catch so a serialization failure never prevents sending the intended HTTP
status and message. Use the existing "cause" variable and the c.json response
site as the change points.
```ts
const body = (err as { body?: { cause?: { failingItems?: unknown[] } } })?.body;
if (body?.cause?.failingItems) {
  const count = body.cause.failingItems.length;
  setValidationError(`${count} existing item(s) fail validation. Fix items or adjust schema.`);
} else {
  const error = err as { message?: string };
  toast.error(`Failed to update dataset: ${error?.message || 'Unknown error'}`);
}
```
Defensive guard missing for empty `failingItems` array

`body?.cause?.failingItems` evaluates to truthy even when it is an empty array (`[]`), which would display the misleading message "0 existing item(s) fail validation. Fix items or adjust schema." Although the server is unlikely to send an empty array here, a length guard prevents the confusion entirely.
🛡️ Proposed defensive fix

```diff
- const body = (err as { body?: { cause?: { failingItems?: unknown[] } } })?.body;
- if (body?.cause?.failingItems) {
-   const count = body.cause.failingItems.length;
+ const body = (err as { body?: { cause?: { failingItems?: unknown[] } } })?.body;
+ const failingItems = body?.cause?.failingItems;
+ if (Array.isArray(failingItems) && failingItems.length > 0) {
+   const count = failingItems.length;
    setValidationError(`${count} existing item(s) fail validation. Fix items or adjust schema.`);
  } else {
```
server-adapters/hono/src/index.ts
```ts
const cause = error instanceof Error && error.cause ? error.cause : undefined;
return c.json(
  { error: error instanceof Error ? error.message : 'Unknown error', ...(cause ? { cause } : {}) },
  status,
);
```
`error.cause` is forwarded to clients without sanitization — security risk and potential serialization crash

`HTTPException.cause` accepts `cause?: unknown`, meaning it can hold any arbitrary data — stack traces, database connection errors, JWT payloads, etc. This change forwards whatever `cause` value happens to be attached to any `HTTPException` thrown anywhere in the system to the API consumer.
Two concrete risks:
- Information disclosure: any `HTTPException` thrown with a sensitive `cause` (e.g., a database error wrapping the original `Error`, or internal state) now leaks that detail to clients.
- Serialization crash: if `cause` contains a circular reference or a non-serializable value, `c.json()` throws internally, producing an unhandled error instead of the intended HTTP status response.
The intent is narrow (propagating `failingItems` for dataset validation), but the implementation is broad. Consider restricting the exposure to a known-safe shape:
🛡️ Proposed narrower fix

```diff
- const cause = error instanceof Error && error.cause ? error.cause : undefined;
- return c.json(
-   { error: error instanceof Error ? error.message : 'Unknown error', ...(cause ? { cause } : {}) },
-   status,
- );
+ // Only propagate cause if it's a plain serializable object (e.g. { failingItems: [...] })
+ let cause: unknown;
+ if (error instanceof Error && error.cause && typeof error.cause === 'object' && !Array.isArray(error.cause)) {
+   try {
+     // Verify serializability before including
+     JSON.stringify(error.cause);
+     cause = error.cause;
+   } catch {
+     // Swallow non-serializable causes
+   }
+ }
+ return c.json(
+   { error: error instanceof Error ? error.message : 'Unknown error', ...(cause ? { cause } : {}) },
+   status,
+ );
```
- Pass inputSchema/groundTruthSchema to EditDatasetDialog
- Unwrap superjson envelope in client workflow schema parsing
- Fix SchemaField cursor reset during editing
- Include HTTPException cause in Hono error responses
- Read failingItems from MastraClientError.body in edit dialog

Co-Authored-By: Mastra Code <noreply@mastra.ai>
Force-pushed from 9098c33 to 7cc9aab
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@client-sdks/client-js/src/utils/index.ts`:
- Around line 122-128: parseSuperJsonString currently declares return type
Record<string, unknown> but may return any JSON value and can throw an
undocumented SyntaxError; change the signature to return unknown (export
function parseSuperJsonString(value: string): unknown) and either wrap
JSON.parse in a try/catch that rethrows a new SyntaxError with contextual
message (including the incoming value or a hint) or add a JSDoc `@throws`
SyntaxError explaining JSON.parse can throw — update uses of
parseSuperJsonString accordingly to handle unknown/handle the thrown error;
reference: parseSuperJsonString, JSON.parse.
In `@server-adapters/hono/src/index.ts`:
- Around line 441-445: The code currently forwards the raw "cause" into the
response (see the local variable cause and the c.json call), which risks leaking
sensitive data and can crash on non-serializable/circular values; change this to
sanitize the cause before including it in the JSON by extracting a safe string
only (e.g., if cause instanceof Error use cause.message, else if typeof cause
=== 'string' use it, otherwise omit or replace with a generic marker), and then
pass that sanitized cause string (not the raw object) into the c.json payload so
c.json only ever receives primitives/strings for "cause".
The server handler converted null (disable schema) to undefined (don't change) via `inputSchema ?? undefined`, so toggling off a schema in Studio had no effect. Now null flows through: server handler → Dataset.update → DatasetsStorage → DB.

Co-Authored-By: Mastra Code <noreply@mastra.ai>
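A minimal sketch of the write-path bug, assuming a hypothetical `UpdateBody` shape; the real handler and storage types live in the Mastra packages:

```ts
// Hypothetical body type for illustration: null means "remove the schema",
// undefined means "leave it unchanged".
type UpdateBody = { inputSchema?: Record<string, unknown> | null };

function buildUpdateArgs(body: UpdateBody) {
  return {
    // Buggy version: `body.inputSchema ?? undefined` collapsed an explicit
    // null into undefined, turning "remove the schema" into "no change".
    inputSchema: body.inputSchema, // fixed: pass null through untouched
  };
}
```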
🧹 Nitpick comments (1)
packages/core/src/datasets/dataset.ts (1)
132-133: Consider a lightweight runtime guard for the passthrough path.

The type cast `as Record<string, unknown> | null | undefined` is safe for the expected call patterns (Zod schema → converted, plain `Record` → passthrough, `null`/`undefined` → preserved), but a non-Zod, non-object value (e.g., a string literal passed by a JS caller) would silently slip through to storage without a compile-time error.

♻️ Optional: narrow the passthrough with a runtime guard

```diff
- return store.updateDataset({
-   id: this.id,
-   ...rest,
-   inputSchema: inputSchema as Record<string, unknown> | null | undefined,
-   groundTruthSchema: groundTruthSchema as Record<string, unknown> | null | undefined,
- });
+ const toSchemaRecord = (v: unknown): Record<string, unknown> | null | undefined => {
+   if (v === null || v === undefined) return v;
+   if (typeof v === 'object') return v as Record<string, unknown>;
+   throw new MastraError({
+     id: 'INVALID_SCHEMA_VALUE',
+     text: `Schema must be a plain object or null, got: ${typeof v}`,
+     domain: 'STORAGE',
+     category: 'USER',
+   });
+ };
+ return store.updateDataset({
+   id: this.id,
+   ...rest,
+   inputSchema: toSchemaRecord(inputSchema),
+   groundTruthSchema: toSchemaRecord(groundTruthSchema),
+ });
```
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@packages/core/src/datasets/dataset.ts`:
- Around line 132-133: The current casting of inputSchema and groundTruthSchema
to Record<string, unknown> | null | undefined can let non-object values slip
through; add a lightweight runtime guard in the data path that sets the
passthrough only if the value is an object (e.g., typeof val === "object" && val
!== null && !Array.isArray(val)) or preserves null/undefined otherwise, and
throw or coerce for other types; update wherever inputSchema and
groundTruthSchema are assigned/consumed (references: inputSchema,
groundTruthSchema in dataset.ts and any factory/constructor that accepts them)
to perform this check before the cast so non-object primitives from JS callers
are rejected instead of silently stored.
Server handler was converting null (disable schema) to undefined (no change). Now null flows through to DB on write, and stores normalize null → undefined on the read path only.

- Remove `?? undefined` coercion in UPDATE_DATASET_ROUTE handler
- Add `| null` to CreateDatasetInput and UpdateDatasetInput (not DatasetRecord)
- Normalize null → undefined in inmemory, PG, and LibSQL store read paths
- Update response schema and client type to match
- Skip Zod conversion for null schemas in Dataset.update

Co-Authored-By: Mastra Code <noreply@mastra.ai>
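A sketch of the read-path normalization this commit describes, assuming a stored row whose schema column may be null (type names here are hypothetical):

```ts
// The DB keeps the explicit null that records "schema removed"; the public
// record type exposes schemas only as object | undefined.
type StoredRow = { inputSchema: Record<string, unknown> | null };

function toPublicRecord(row: StoredRow): { inputSchema?: Record<string, unknown> } {
  return {
    // Normalize null → undefined on the read path only, so API consumers
    // never see null while the write path still accepts it.
    inputSchema: row.inputSchema ?? undefined,
  };
}
```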
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/core/src/storage/domains/datasets/inmemory.ts (1)
70-83: 🛠️ Refactor suggestion | 🟠 Major

Remove `as DatasetRecord` casts and annotate with `InternalDatasetRecord` type.

Both `CreateDatasetInput` and `UpdateDatasetInput` explicitly allow `inputSchema` and `groundTruthSchema` to be `null`, but the `as DatasetRecord` cast suppresses this. The `InternalDatasetRecord` type was designed precisely to capture the nullable distinction but isn't being used. Additionally, `InMemoryDB.datasets` is typed as `Map<string, DatasetRecord>` but stores `InternalDatasetRecord` values, creating a type mismatch that could lead future callers to omit null checks.

Suggested fix

For `createDataset` (lines 70–80):

```diff
- const dataset = {
+ const dataset: InternalDatasetRecord = {
    id,
    name: input.name,
    description: input.description,
    metadata: input.metadata,
    inputSchema: input.inputSchema,
    groundTruthSchema: input.groundTruthSchema,
    version: 0,
    createdAt: now,
    updatedAt: now,
- } as DatasetRecord;
+ };
```

For `_doUpdateDataset` (lines 96–104):

```diff
- const updated = {
+ const updated: InternalDatasetRecord = {
    ...existing,
    name: args.name ?? existing.name,
    description: args.description ?? existing.description,
    metadata: args.metadata ?? existing.metadata,
    inputSchema: args.inputSchema !== undefined ? args.inputSchema : existing.inputSchema,
    groundTruthSchema: args.groundTruthSchema !== undefined ? args.groundTruthSchema : existing.groundTruthSchema,
    updatedAt: new Date(),
- } as DatasetRecord;
+ };
```

Also re-type `InMemoryDB.datasets` from `Map<string, DatasetRecord>` to `Map<string, InternalDatasetRecord>` so all internal accesses carry the correct nullable signature.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@packages/core/src/storage/domains/datasets/inmemory.ts`:
- Around line 70-83: The current code casts objects to DatasetRecord which hides
nullable schema fields; instead construct and annotate the stored objects as
InternalDatasetRecord in createDataset and _doUpdateDataset (use the
InternalDatasetRecord type for the created/updated dataset objects), remove the
"as DatasetRecord" casts, and change the InMemoryDB.datasets declaration from
Map<string, DatasetRecord> to Map<string, InternalDatasetRecord> so internal
storage preserves nullable inputSchema/groundTruthSchema and callers must handle
nulls.
Only forward `cause` when it contains a `failingItems` array. All other `cause` values are omitted to prevent leaking internals. Wrapped in try/catch so serialization failures never break the response.

Co-Authored-By: Mastra Code <noreply@mastra.ai>
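A hedged sketch of the narrowed guard this commit describes; the helper name and exact checks are assumptions rather than the adapter's actual code:

```ts
// Only a known-safe shape ({ failingItems: [...] }) is ever forwarded to clients.
function safeCause(error: unknown): { failingItems: unknown[] } | undefined {
  if (!(error instanceof Error) || !error.cause) return undefined;
  const raw = error.cause;
  try {
    if (
      typeof raw === 'object' &&
      !Array.isArray(raw) &&
      Array.isArray((raw as { failingItems?: unknown }).failingItems)
    ) {
      JSON.stringify(raw); // throws on circular structures, keeping c.json() safe
      return { failingItems: (raw as { failingItems: unknown[] }).failingItems };
    }
  } catch {
    // A serialization failure must never block the error response.
  }
  return undefined;
}
```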
Co-Authored-By: Mastra Code <noreply@mastra.ai>
Callers cast at their own call site. Adds `@throws` JSDoc and array guard to avoid unwrapping non-superjson arrays.

Co-Authored-By: Mastra Code <noreply@mastra.ai>
…ataset-workflow-schema
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Dataset schemas weren't being validated or displayed properly in Studio. A few things were broken:
Workflow schema deserialization — `getWorkflowInfo` serializes schemas with `superjson.stringify`, but the client SDK used plain `JSON.parse`, yielding a `{ json: {...} }` wrapper instead of the actual JSON Schema. Added a `parseSuperJsonString` helper to unwrap the envelope (a sketch of the unwrapping follows this list).

Edit dialog missing schemas — `EditDatasetDialog` wasn't receiving `inputSchema`/`groundTruthSchema` props from the dataset page. Schemas now get passed through.

Schema editor cursor reset — Typing in the schema JSON editor caused the cursor to jump to the top. The `SchemaField` component had a `useEffect` that re-serialized the value on every keystroke. Added an `isLocalEditRef` to skip external sync during local edits (see the editor sketch after this list).

Validation error propagation — The Hono adapter dropped `HTTPException.cause` when serializing error responses, so `failingItems` from schema update validation never reached the UI. Now includes `cause` in the response. Also fixed the edit dialog to read from `MastraClientError.body` (where the parsed response lives) instead of `error.cause`.
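A minimal sketch of the envelope unwrapping from the first item above; per the later commit, the shipped `parseSuperJsonString` returns `unknown`, documents the `SyntaxError`, and guards against arrays, which this sketch mirrors:

```ts
/**
 * Unwraps a superjson envelope ({ json, meta? }) as produced by superjson.stringify.
 * @throws {SyntaxError} if `value` is not valid JSON.
 */
export function parseSuperJsonString(value: string): unknown {
  const parsed: unknown = JSON.parse(value);
  // Unwrap only when the value looks like a superjson envelope, not a plain
  // array or primitive that happens to be valid JSON.
  if (parsed !== null && typeof parsed === 'object' && !Array.isArray(parsed) && 'json' in parsed) {
    return (parsed as { json: unknown }).json;
  }
  return parsed;
}
```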
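And a sketch of the cursor-reset fix from the third item, assuming a component roughly shaped like `SchemaField`; the ref name mirrors the description above, but the surrounding component is hypothetical:

```tsx
import { useEffect, useRef, useState } from 'react';

function SchemaField({ value, onChange }: { value: object; onChange: (next: string) => void }) {
  const [text, setText] = useState(() => JSON.stringify(value, null, 2));
  const isLocalEditRef = useRef(false);

  useEffect(() => {
    // Skip external sync while the user is typing: re-serializing the prop
    // value on every keystroke is what reset the editor cursor to the top.
    if (isLocalEditRef.current) return;
    setText(JSON.stringify(value, null, 2));
  }, [value]);

  return (
    <textarea
      value={text}
      onChange={e => {
        isLocalEditRef.current = true; // mark the edit as local
        setText(e.target.value);
        onChange(e.target.value);
      }}
      onBlur={() => (isLocalEditRef.current = false)} // allow external sync to resume
    />
  );
}
```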
New Features
Bug Fixes
Chores