📝 Walkthrough
The PR replaces the old guardrails implementation with a new guardrails API.
Changes
Sequence Diagram(s)
sequenceDiagram
participant App as Application
participant Guardrails as Guardrails
participant Evaluator as Evaluator Service
participant TraceloopAPI as Traceloop API
App->>Guardrails: invoke run(func, input_mapper)
Guardrails->>App: call guarded function -> result
Guardrails->>Evaluator: build evaluator request(s) (condition_field, input mapping)
Evaluator->>TraceloopAPI: POST /v2/evaluators/.../execute or /execute-single
TraceloopAPI-->>Evaluator: execution_id / stream_url
Evaluator->>TraceloopAPI: GET events (poll)
TraceloopAPI-->>Evaluator: evaluator_result (wrapped evaluator_result)
Evaluator-->>Guardrails: evaluation outcome (pass/fail, fields)
Guardrails->>Guardrails: apply Condition(s) and OnFailure handler if needed
Guardrails-->>App: return original result or failure result
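For orientation, here is a minimal usage sketch of the flow in the diagram, pieced together from the sample-app examples reviewed below (Guards, OnFailure, and guardrails.run with an input_mapper). The import paths and the generate_response helper are assumptions, not verified against the PR.

```python
# Hedged sketch of the guardrail flow, assembled from the examples in this PR.
# Import paths, exact signatures, and generate_response are assumptions.
from traceloop.sdk import Traceloop
from traceloop.sdk.guardrail.guards import Guards
from traceloop.sdk.guardrail.on_failure import OnFailure


async def guarded_generation(user_prompt: str) -> str:
    client = Traceloop.get()
    guardrail = client.guardrails.create(
        guards=[Guards.toxicity_detector(threshold=0.7)],
        on_failure=OnFailure.return_value(value="Sorry, I can't help with that."),
    )
    # run() calls the guarded function, maps its output to evaluator inputs,
    # executes the evaluator(s) via the Traceloop API, and applies on_failure
    # when a guard condition fails.
    return await guardrail.run(
        lambda: generate_response(user_prompt),  # assumed async LLM call
        input_mapper=lambda text: [{"text": text}],
    )
```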
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~65 minutes
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
✨ Finishing touches
🧪 Generate unit tests (beta)
Actionable comments posted: 8
🤖 Fix all issues with AI agents
In `@packages/sample-app/sample_app/guardrails/custom_evaluator_guard.py`:
- Around line 7-10: The guard references the evaluator slug "medicaladvice" but
docs/prints use "medical-advice-detector", causing misconfiguration; update all
occurrences of the string "medical-advice-detector" in this file (console
prints, docstrings, example outputs and any helper text) to the slug
"medicaladvice" so they match the guard's usage, and verify the evaluator lookup
code (the usage in CustomEvaluatorGuard or similarly named class/function) still
resolves correctly to "medicaladvice".
In `@packages/sample-app/sample_app/guardrails/custom_function_guard.py`:
- Around line 50-51: The code returns completion.choices[0].message.content
directly which can be None (especially for tool calls); in each generate_*
function replace that direct access with a safe fallback: read content =
completion.choices[0].message.content and if content is None set content = ""
(or raise a clear error) before further processing and before returning; update
all occurrences that reference completion.choices[0].message.content in the
generate_* functions so downstream code (e.g., accesses like z["word_count"] or
slicing result[:100]) never receives None.
In `@packages/sample-app/sample_app/guardrails/validate_example.py`:
- Around line 96-109: The code prints "Output validation passed."
unconditionally after calling output_guardrail.run, which can return a failure
result depending on the guard's on_failure handler; remove the extra blank line
and change the logic after calling output_guardrail.run (the call to
output_guardrail.run that wraps generate_response) to check the returned result
and only print the success message when the result indicates success (or handle
the failure branch accordingly), referencing the result variable and the
output_guardrail.run call (and consider OnFailure.raise_exception()/return_value
behaviors) so failure paths are properly handled instead of always logging
success.
In `@packages/traceloop-sdk/traceloop/sdk/evaluator/evaluator.py`:
- Around line 198-244: The input values passed into InputExtractor are untyped
Any but InputExtractor.source expects a str; in run() convert each input value
to a string when building InputSchemaMapping (i.e. replace {k:
InputExtractor(source=v) for k,v in input.items()} with a mapping that uses
InputExtractor(source=str(v))) so non-string inputs won't break validation, and
keep or update the run() signature as needed (alternatively, change the type
hint to Dict[str, str] if you want to forbid non-string values); ensure this
change is applied where InputSchemaMapping and InputExtractor are constructed in
run().
In `@packages/traceloop-sdk/traceloop/sdk/guardrail/default_mapper.py`:
- Around line 6-49: The default_input_mapper function currently returns repeated
references via [input_dict] * num_guards and [enriched] * num_guards which lets
one guard's mutation affect all others; change both return paths to produce
distinct copies per guard (e.g., construct a new dict per iteration or use
copy.deepcopy) so each guard gets an independent dict, and ensure any nested
mutable fields like the "context" list are also copied for each guard; update
references to input_dict and enriched accordingly.
- Around line 47-48: The code uses enriched.setdefault("context",
[output["context"]]) which won't convert an existing string to a list; change
the logic to normalize output["context"] into a list and assign it to enriched.
Specifically, in the block where "context" in output is checked, read val =
output["context"], if not isinstance(val, list) wrap it as [val], then set
enriched["context"] = that list (or use enriched.setdefault only after
converting); update the code around the existing enriched/output handling so
"context" is always a list.
In `@packages/traceloop-sdk/traceloop/sdk/guardrail/guards.py`:
- Around line 130-141: The evaluator factory calls
(EvaluatorMadeByTraceloop.toxicity_detector, .sexism_detector, .pii_detector,
and .prompt_injection) currently pass threshold/probability_threshold even when
the argument is explicitly None, which overwrites factory defaults; change each
guard factory call to only include the threshold/probability_threshold kwarg
when the caller provided a non-None value (e.g., build a small kwargs dict or
use an if/else to call EvaluatorMadeByTraceloop.toxicity_detector(...) without
the threshold param when threshold is None) so the evaluator factory can use its
default 0.7 behavior (a minimal sketch of this pattern follows after this list).
In `@packages/traceloop-sdk/traceloop/sdk/guardrail/model.py`:
- Around line 140-144: The __str__ method can raise AttributeError when
self.actual_type is None or not a class; update TraceloopGuardError.__str__ (the
__str__ method referencing self.actual_type, self.expected_type,
self.guard_index and self.args) to defensively obtain the type name using
getattr(self.actual_type, "__name__", None) and fall back to
str(self.actual_type) or "None" when __name__ is missing, and format the message
using that safe value instead of directly accessing actual_type.__name__.
🧹 Nitpick comments (16)
packages/traceloop-sdk/traceloop/sdk/evaluator/config.py (1)
6-25: Keep the docstring in sync with the new fields. Add condition_field and output_schema to the Args section and update the example so the public API docs stay accurate.
Proposed docstring update
  class EvaluatorDetails(BaseModel):
      """
      Details for configuring an evaluator.

      Args:
          slug: The evaluator slug/identifier
+         condition_field: Optional field name used to gate conditional evaluation
          version: Optional version of the evaluator
          config: Optional configuration dictionary for the evaluator
+         output_schema: Optional pydantic model class describing the evaluator output
          required_input_fields: Optional list of required fields to the evaluator input.
              These fields must be present in the task output.

      Example:
          >>> EvaluatorDetails(slug="pii-detector", config={"probability_threshold": 0.8}, required_input_fields=["text"])
+         >>> EvaluatorDetails(slug="format-check", condition_field="needs_check", output_schema=MyOutputSchema)
          >>> EvaluatorDetails(slug="my-custom-evaluator", version="v2")
      """
packages/sample-app/sample_app/guardrails/multiple_guards_example.py (2)
174-177: Consider removing unnecessary async from a static-return function.
generate_problematic_content() doesn't perform any async operations—it just returns a static string. Since this is an example file meant to teach the API, keeping it async may be intentional to show the guardrail can handle async functions, but a comment clarifying this intent would help.
219-230: Rename to avoid shadowing the module-level function.
generate_content() defined here shadows the module-level generate_content() at line 40. While it works correctly, this can be confusing in an example file meant to demonstrate patterns.
Suggested rename
- async def generate_content() -> str:
-     """Generate content for sequential validation."""
+ async def generate_tokyo_tip() -> str:
+     """Generate a Tokyo travel tip for sequential validation."""
And update the call at line 262:
- result = await guardrail.run(generate_content, input_mapper=create_sequential_inputs)
+ result = await guardrail.run(generate_tokyo_tip, input_mapper=create_sequential_inputs)
packages/sample-app/sample_app/guardrails/traceloop_evaluator_guard.py (2)
90-93: Minor style inconsistency with the raise_exception argument.
Line 61 uses message="PII detected in response" (keyword), while line 92 uses a positional argument. Consider using the keyword form consistently for clarity.
🔧 Suggested change for consistency
  guardrail = client.guardrails.create(
      guards=[Guards.toxicity_detector(threshold=0.7)],
-     on_failure=OnFailure.raise_exception("Content too toxic for family audience"),
+     on_failure=OnFailure.raise_exception(message="Content too toxic for family audience"),
  )
103-108: Consider adding type hints for clarity.
Adding type hints to the state class would improve readability and IDE support, especially since this example serves as documentation for users learning the guardrails API.
📝 Suggested type hints
  class TravelAgentState:
      """Track agent prompts and completions for trajectory evaluation."""

      def __init__(self):
-         self.prompts = []
-         self.completions = []
+         self.prompts: list[str] = []
+         self.completions: list[str] = []
packages/sample-app/sample_app/guardrails/custom_evaluator_guard.py (1)
40-46: Make guard input mapping explicit (use MedicalAdviceInput).
This avoids reliance on the default mapper and demonstrates the recommended Pydantic model usage.
♻️ Suggested refactor
  class MedicalAdviceInput(BaseModel):
      """Input model for medical advice evaluator."""
      text: str
+
+
+ def map_to_medical_input(text: str) -> list[MedicalAdviceInput]:
+     return [MedicalAdviceInput(text=text)]
@@
  @guardrail(
      guards=[Guards.custom_evaluator_guard(evaluator_slug="medicaladvice")],
+     input_mapper=map_to_medical_input,
      on_failure=OnFailure.return_value(value="Sorry, I can't help you with that."),
      name="medical_advice_quality_check",
  )
@@
  result = await guardrail.run(
      attempt_diagnosis_request,
+     input_mapper=map_to_medical_input,
  )
Also applies to: 56-60, 132-134
packages/traceloop-sdk/traceloop/sdk/evaluator/evaluator.py (1)
70-112: Consider extracting common HTTP execution logic to reduce duplication.
Both _execute_evaluator_in_experiment_request and _execute_evaluator_request share identical error handling and response parsing. A helper method could reduce this duplication.
Suggested refactor
+ async def _post_evaluator_request(
+     self,
+     url: str,
+     body: dict,
+     evaluator_slug: str,
+     timeout_in_sec: int,
+ ) -> ExecuteEvaluatorResponse:
+     """Common HTTP POST logic for evaluator requests."""
+     response = await self._async_http_client.post(
+         url, json=body, timeout=httpx.Timeout(timeout_in_sec)
+     )
+     if response.status_code != 200:
+         error_detail = _extract_error_from_response(response)
+         raise Exception(
+             f"Failed to execute evaluator '{evaluator_slug}': "
+             f"{response.status_code} - {error_detail}"
+         )
+     return ExecuteEvaluatorResponse(**response.json())
+
  async def _execute_evaluator_in_experiment_request(
      self,
      evaluator_slug: str,
      request: ExecuteEvaluatorInExperimentRequest,
      timeout_in_sec: int = 120,
  ) -> ExecuteEvaluatorResponse:
-     """Execute evaluator request and return response"""
-     body = request.model_dump()
-     client = self._async_http_client
-     full_url = f"/v2/evaluators/slug/{evaluator_slug}/execute"
-     response = await client.post(
-         full_url, json=body, timeout=httpx.Timeout(timeout_in_sec)
-     )
-     if response.status_code != 200:
-         error_detail = _extract_error_from_response(response)
-         raise Exception(
-             f"Failed to execute evaluator '{evaluator_slug}': "
-             f"{response.status_code} - {error_detail}"
-         )
-     result_data = response.json()
-     return ExecuteEvaluatorResponse(**result_data)
+     return await self._post_evaluator_request(
+         f"/v2/evaluators/slug/{evaluator_slug}/execute",
+         request.model_dump(),
+         evaluator_slug,
+         timeout_in_sec,
+     )
packages/sample-app/sample_app/guardrails/decorator_example.py (1)
23-23: Consider using the standard OPENAI_API_KEY environment variable name.
The OpenAI SDK conventionally uses OPENAI_API_KEY. Using OPENAI_KEY may cause confusion or require additional documentation for users who expect the standard variable name.
💡 Suggested change
- openai_client = AsyncOpenAI(api_key=os.getenv("OPENAI_KEY"))
+ openai_client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
packages/sample-app/sample_app/guardrails/validate_example.py (2)
67-105: Guardrail instances recreated on every call - consider moving to module level.
Creating prompt_guardrail and output_guardrail inside secure_chat() means they're recreated on every function call. For a sample app this is fine for demonstrating the API, but for production code, consider initializing guardrails once at module level or in a setup function.
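A minimal sketch of the module-level pattern, assuming the client accessor and factory names used elsewhere in this PR; exact import paths may differ.

```python
# Hedged sketch: build the guardrail once at import time and reuse it per call.
# Traceloop.get(), Guards, and OnFailure are taken from other examples in this
# PR; the import paths and generate_response helper are assumptions.
from traceloop.sdk import Traceloop
from traceloop.sdk.guardrail.guards import Guards
from traceloop.sdk.guardrail.on_failure import OnFailure

_client = Traceloop.get()

# Created once at module import, not inside secure_chat().
OUTPUT_GUARDRAIL = _client.guardrails.create(
    guards=[Guards.toxicity_detector(threshold=0.7)],
    on_failure=OnFailure.raise_exception(message="Unsafe output"),
)


async def secure_chat(user_prompt: str) -> str:
    # Reuse the module-level instance on every call.
    return await OUTPUT_GUARDRAIL.run(
        lambda: generate_response(user_prompt),  # assumed async LLM helper
        input_mapper=lambda text: [{"text": text}],
    )
```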
76-76: String slicing may fail on short inputs.
user_prompt[:50] is safe (Python handles short strings gracefully), but response[:200] at line 123 assumes the response is at least that long. This is minor since it's example code, but adding a "..." suffix unconditionally could be misleading for short responses.
Optional: Handle short strings
- print(f"Validating user input: '{user_prompt[:50]}...'")
+ preview = user_prompt[:50] + "..." if len(user_prompt) > 50 else user_prompt
+ print(f"Validating user input: '{preview}'")
packages/traceloop-sdk/tests/guardrails/test_validate.py (1)
14-21: Helper function duplicated across test files.
This create_guardrails_with_guards helper is nearly identical to the one in test_validate_inputs.py (lines 32-38). Consider extracting to a shared conftest.py fixture to avoid duplication.
Optional: Move to conftest.py
# In tests/guardrails/conftest.py
import pytest
from unittest.mock import MagicMock

from traceloop.sdk.guardrail.guardrail import Guardrails
from traceloop.sdk.guardrail.on_failure import OnFailure


@pytest.fixture
def guardrails_factory():
    """Factory fixture to create Guardrails with specified guards."""
    def _create(guards: list, on_failure=None) -> Guardrails:
        mock_client = MagicMock()
        guardrails = Guardrails(mock_client)
        guardrails._guards = guards
        guardrails._on_failure = on_failure or OnFailure.noop()
        return guardrails
    return _create
packages/traceloop-sdk/tests/guardrails/test_validate_inputs.py (2)
26-29: Unused AnotherInput class defined but never referenced in tests.
The AnotherInput model is defined but not used anywhere in this file. Consider removing it or adding tests that utilize it.
Remove unused class
- class AnotherInput(BaseModel):
-     """Different Pydantic model for testing type mismatches."""
-     name: str
-     count: int
-
-
32-38: Duplicate helper function - same as test_validate.py.
This helper is identical to the one in test_validate.py except for the default on_failure (lambda vs OnFailure.noop()). Consider consolidating in conftest.py.
packages/traceloop-sdk/traceloop/sdk/guardrail/guards.py (1)
36-90: Consider handling a missing condition_field key gracefully.
At line 83, if condition_field is set but the key doesn't exist in evaluator_result, this will raise a KeyError. Since guard execution errors are caught and wrapped in GuardExecutionError by the caller, this may be acceptable, but a more informative error message would help debugging.
💡 Optional improvement for better error messages
  if condition_field:
-     result_to_validate = eval_response.result.evaluator_result[condition_field]
+     evaluator_result = eval_response.result.evaluator_result
+     if condition_field not in evaluator_result:
+         raise KeyError(
+             f"condition_field '{condition_field}' not found in evaluator result. "
+             f"Available fields: {list(evaluator_result.keys())}"
+         )
+     result_to_validate = evaluator_result[condition_field]
  else:
      result_to_validate = eval_response.result.evaluator_result
packages/traceloop-sdk/traceloop/sdk/guardrail/guardrail.py (2)
29-29: Unused import: Evaluator.
Evaluator is imported at line 29 but not used in this file. It appears to be used in guards.py instead.
🧹 Proposed fix
- from traceloop.sdk.evaluator.evaluator import Evaluator
69-76: The _evaluator instance is created but never used.
The Evaluator instance is created at line 71 and assigned to self._evaluator, but it's never referenced elsewhere in this class. Consider removing it if unused.
🧹 Proposed fix
  def __init__(self, async_http_client: httpx.AsyncClient):
      self._async_http = async_http_client
-     self._evaluator = Evaluator(async_http_client)
      self._guards = []
      self._on_failure = None
      self._run_all = False
      self._parallel = True
      self._name = ""
Also remove the class attribute declaration at line 61:
- _evaluator: Evaluator
1. PASS Case: General health information that should be allowed
   - Educational content about hypertension and blood pressure
   - Uses medical-advice-detector evaluator
   - Demonstrates safe general health information
Align evaluator slug references to prevent misconfiguration.
Docs/console output mention "medical-advice-detector" while the guard uses "medicaladvice", which can lead users to configure the wrong evaluator and make the example fail. Please pick one slug and update all references consistently.
🛠️ Example alignment (update docs/prints to match the code)
- - Uses medical-advice-detector evaluator
+ - Uses medicaladvice evaluator
@@
- Custom Evaluator Required: 'medicaladvice'
+ Custom Evaluator Required: 'medicaladvice'
@@
- print("Note: Requires custom evaluator 'medical-advice-detector' in Traceloop")
+ print("Note: Requires custom evaluator 'medicaladvice' in Traceloop")Also applies to: 57-58, 68-69, 144-145
🤖 Prompt for AI Agents
In `@packages/sample-app/sample_app/guardrails/custom_evaluator_guard.py` around
lines 7 - 10, The guard references the evaluator slug "medicaladvice" but
docs/prints use "medical-advice-detector", causing misconfiguration; update all
occurrences of the string "medical-advice-detector" in this file (console
prints, docstrings, example outputs and any helper text) to the slug
"medicaladvice" so they match the guard's usage, and verify the evaluator lookup
code (the usage in CustomEvaluatorGuard or similarly named class/function) still
resolves correctly to "medicaladvice".
return completion.choices[0].message.content
Potential None access when LLM returns no content.
completion.choices[0].message.content can be None when the API returns an empty response or when the model uses tool calls. This would cause issues downstream when the guard tries to process the result (e.g., accessing z["word_count"] or slicing with result[:100]).
Consider adding a fallback:
🛡️ Proposed fix
- return completion.choices[0].message.content
+ return completion.choices[0].message.content or ""
This pattern appears in all generate_* functions (lines 50, 109, 140, 177).
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
return completion.choices[0].message.content or ""
🤖 Prompt for AI Agents
In `@packages/sample-app/sample_app/guardrails/custom_function_guard.py` around
lines 50 - 51, The code returns completion.choices[0].message.content directly
which can be None (especially for tool calls); in each generate_* function
replace that direct access with a safe fallback: read content =
completion.choices[0].message.content and if content is None set content = ""
(or raise a clear error) before further processing and before returning; update
all occurrences that reference completion.choices[0].message.content in the
generate_* functions so downstream code (e.g., accesses like z["word_count"] or
slicing result[:100]) never receives None.
result = await output_guardrail.run(
    lambda: generate_response(user_prompt),
    input_mapper=lambda response_text: [
        AnswerRelevancyInput(answer=response_text, question=user_prompt),
        SexismDetectorInput(text=response_text),
        ToxicityDetectorInput(text=response_text),
    ],
)


print("Output validation passed.")

return result
Missing failure handling after run() - the result may be a failure response.
Line 107 prints "Output validation passed" unconditionally, but run() can return the on_failure handler's result when guards fail (the default is OnFailure.raise_exception() which would raise, but custom handlers like return_value would return a fallback). If you're using the default handler, this is fine since it raises on failure.
Also, there's an extra blank line at line 97.
Suggested fix
)
-
result = await output_guardrail.run(
lambda: generate_response(user_prompt),
input_mapper=lambda response_text: [
AnswerRelevancyInput(answer=response_text, question=user_prompt),
SexismDetectorInput(text=response_text),
ToxicityDetectorInput(text=response_text),
],
)
- print("Output validation passed.")
+ # Note: With default on_failure=OnFailure.raise_exception(),
+ # reaching here means all guards passed
+ print("Output guards passed.")
  return result
🤖 Prompt for AI Agents
In `@packages/sample-app/sample_app/guardrails/validate_example.py` around lines
96 - 109, The code prints "Output validation passed." unconditionally after
calling output_guardrail.run, which can return a failure result depending on the
guard's on_failure handler; remove the extra blank line and change the logic
after calling output_guardrail.run (the call to output_guardrail.run that wraps
generate_response) to check the returned result and only print the success
message when the result indicates success (or handle the failure branch
accordingly), referencing the result variable and the output_guardrail.run call
(and consider OnFailure.raise_exception()/return_value behaviors) so failure
paths are properly handled instead of always logging success.
async def run(
    self,
    evaluator_slug: str,
    input: Dict[str, Any],
    timeout_in_sec: int = 120,
    evaluator_version: Optional[str] = None,
    evaluator_config: Optional[Dict[str, Any]] = None,
) -> ExecutionResponse:
    """
    Execute an evaluator without experiment context.

    This is a simpler interface for running evaluators standalone,
    without associating results with experiments.

    Args:
        evaluator_slug: Slug of the evaluator to execute
        input: Dict mapping evaluator input field names to their values.
            Values can be any type (str, int, dict, etc.)
        timeout_in_sec: Timeout in seconds for execution
        evaluator_version: Version of the evaluator to execute (optional)
        evaluator_config: Configuration for the evaluator (optional)

    Returns:
        ExecutionResponse: The evaluation result
    """
    _validate_evaluator_input(evaluator_slug, input)

    schema_mapping = InputSchemaMapping(
        root={k: InputExtractor(source=v) for k, v in input.items()}
    )

    request = ExecuteEvaluatorRequest(
        input_schema_mapping=schema_mapping,
        evaluator_version=evaluator_version,
        evaluator_config=evaluator_config,
    )

    execute_response = await self._execute_evaluator_request(
        evaluator_slug, request, timeout_in_sec
    )

    sse_client = SSEClient(shared_client=self._async_http_client)
    return await sse_client.wait_for_result(
        execute_response.execution_id,
        execute_response.stream_url,
        timeout_in_sec,
    )
Potential type mismatch: InputExtractor.source expects str but input values are Any.
The run() method accepts input: Dict[str, Any] (line 201), but line 226 passes values directly to InputExtractor(source=v). According to the model definition, InputExtractor.source is typed as str. Non-string values may cause validation errors or unexpected serialization behavior.
Consider either:
- Restricting the input type to Dict[str, str] to match experiment methods
- Converting values to strings explicitly
Option 1: Align type signature with other methods
async def run(
self,
evaluator_slug: str,
- input: Dict[str, Any],
+ input: Dict[str, str],
timeout_in_sec: int = 120,
evaluator_version: Optional[str] = None,
evaluator_config: Optional[Dict[str, Any]] = None,
) -> ExecutionResponse:
Option 2: Convert values to strings
schema_mapping = InputSchemaMapping(
- root={k: InputExtractor(source=v) for k, v in input.items()}
+ root={k: InputExtractor(source=str(v) if not isinstance(v, str) else v) for k, v in input.items()}
)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
(The committable suggestion repeats the full run() method with each option applied; see Option 1 and Option 2 above.)
🤖 Prompt for AI Agents
In `@packages/traceloop-sdk/traceloop/sdk/evaluator/evaluator.py` around lines 198
- 244, The input values passed into InputExtractor are untyped Any but
InputExtractor.source expects a str; in run() convert each input value to a
string when building InputSchemaMapping (i.e. replace {k:
InputExtractor(source=v) for k,v in input.items()} with a mapping that uses
InputExtractor(source=str(v))) so non-string inputs won't break validation, and
keep or update the run() signature as needed (alternatively, change the type
hint to Dict[str, str] if you want to forbid non-string values); ensure this
change is applied where InputSchemaMapping and InputExtractor are constructed in
run().
from typing import Any


def default_input_mapper(output: Any, num_guards: int) -> list[dict]:
    """
    Default mapper for common response types.

    Handles:
    - str: Creates dict with common text field names for each guard
    - dict with {question, answer, context}: Passes through with field aliases

    Args:
        output: The return value from the guarded function
        num_guards: Number of guards to create inputs for

    Returns:
        List of dicts, one per guard

    Raises:
        ValueError: If output type cannot be handled
    """
    if isinstance(output, str):
        # Map string to common field names used by evaluators
        input_dict = {
            "text": output,
            "prompt": output,
            "completion": output
        }
        return [input_dict] * num_guards

    if isinstance(output, dict):
        # Enrich dict with aliases for compatibility with various evaluators
        enriched = {**output}
        if "text" in output:
            enriched.setdefault("prompt", output["text"])
            enriched.setdefault("completion", output["text"])
        if "question" in output:
            enriched.setdefault("query", output["question"])
        if "answer" in output:
            enriched.setdefault("answer", output["answer"])
            enriched.setdefault("completion", output["answer"])
        if "context" in output:
            enriched.setdefault("context", [output["context"]])
        return [enriched] * num_guards
Avoid sharing the same input dict across guards.
[input_dict] * num_guards and [enriched] * num_guards duplicate references, so any mutation by one guard can contaminate others. Create distinct copies per guard.
🛠️ Proposed fix
-from typing import Any
+from typing import Any
+import copy
@@
- return [input_dict] * num_guards
+ return [input_dict.copy() for _ in range(num_guards)]
@@
- return [enriched] * num_guards
+ return [copy.deepcopy(enriched) for _ in range(num_guards)]🤖 Prompt for AI Agents
In `@packages/traceloop-sdk/traceloop/sdk/guardrail/default_mapper.py` around
lines 6 - 49, The default_input_mapper function currently returns repeated
references via [input_dict] * num_guards and [enriched] * num_guards which lets
one guard's mutation affect all others; change both return paths to produce
distinct copies per guard (e.g., construct a new dict per iteration or use
copy.deepcopy) so each guard gets an independent dict, and ensure any nested
mutable fields like the "context" list are also copied for each guard; update
references to input_dict and enriched accordingly.
if "context" in output:
    enriched.setdefault("context", [output["context"]])
Ensure context is normalized to a list when provided.
setdefault doesn’t update existing values, so a string context remains a string. If the intent is to always provide a list, coerce when needed.
🧩 Proposed fix
- if "context" in output:
- enriched.setdefault("context", [output["context"]])
+ if "context" in output and not isinstance(output["context"], list):
+ enriched["context"] = [output["context"]]🤖 Prompt for AI Agents
In `@packages/traceloop-sdk/traceloop/sdk/guardrail/default_mapper.py` around
lines 47 - 48, The code uses enriched.setdefault("context", [output["context"]])
which won't convert an existing string to a list; change the logic to normalize
output["context"] into a list and assign it to enriched. Specifically, in the
block where "context" in output is checked, read val = output["context"], if not
isinstance(val, list) wrap it as [val], then set enriched["context"] = that list
(or use enriched.setdefault only after converting); update the code around the
existing enriched/output handling so "context" is always a list.
def __str__(self) -> str:
    return (
        f"{self.args[0]} [guard {self.guard_index}]: "
        f"expected {self.expected_type}, got {self.actual_type.__name__}"
    )
Potential AttributeError if actual_type is None or a non-class type.
Line 143 accesses actual_type.__name__, but if actual_type is None or a special typing construct, this will raise AttributeError.
Suggested defensive fix
def __str__(self) -> str:
+ type_name = getattr(self.actual_type, '__name__', str(self.actual_type))
return (
f"{self.args[0]} [guard {self.guard_index}]: "
- f"expected {self.expected_type}, got {self.actual_type.__name__}"
+ f"expected {self.expected_type}, got {type_name}"
)
🤖 Prompt for AI Agents
In `@packages/traceloop-sdk/traceloop/sdk/guardrail/model.py` around lines 140 -
144, The __str__ method can raise AttributeError when self.actual_type is None
or not a class; update TraceloopGuardError.__str__ (the __str__ method
referencing self.actual_type, self.expected_type, self.guard_index and
self.args) to defensively obtain the type name using getattr(self.actual_type,
"__name__", None) and fall back to str(self.actual_type) or "None" when __name__
is missing, and format the message using that safe value instead of directly
accessing actual_type.__name__.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@packages/sample-app/sample_app/guardrails/custom_evaluator_guard.py`:
- Around line 109-122: The function attempt_diagnosis_request returns
completion.choices[0].message.content without a null fallback, which can cause a
TypeError if content is None; update attempt_diagnosis_request to mirror the
earlier handler by returning completion.choices[0].message.content or "" (or an
explicit empty string fallback) so downstream slicing won't fail, and ensure you
reference the same openai_client.chat.completions.create result handling used
elsewhere.
In `@packages/sample-app/sample_app/guardrails/decorator_example.py`:
- Around line 30-39: The function generate_response should never return None
even if completion.choices[0].message.content is None; update the return
behavior in generate_response (after the await
openai_client.chat.completions.create call) to safely coalesce
completion.choices[0].message.content to a string fallback (e.g., empty string
or a descriptive fallback) so the declared return type -> str is honored and
downstream slicing won't raise a TypeError.
In `@packages/traceloop-sdk/traceloop/sdk/decorators/__init__.py`:
- Around line 244-250: The sync wrapper uses asyncio.run(...) around g.run(...)
which raises RuntimeError if an event loop is already running; detect whether an
event loop is running (asyncio.get_event_loop().is_running() or
asyncio.get_running_loop() with try/except) and, if it is, execute the coroutine
by running a new event loop in a separate thread (submit a callable that calls
asyncio.run(g.run(...))) so the current loop isn't touched; otherwise keep using
asyncio.run directly. Update the call site that currently wraps g.run(lambda:
asyncio.to_thread(func, ...), input_mapper=...) to branch: when no loop is
running use asyncio.run(...), when a loop is running run the same coroutine
inside a new thread via Thread/Executor and wait for that thread/future result.
Ensure you reference the same g.run invocation and the lambda that calls
asyncio.to_thread(func, *args, **kwargs).
🧹 Nitpick comments (5)
packages/sample-app/sample_app/guardrails/decorator_example.py (1)
22-22: Use a consistent environment variable name for the OpenAI API key.
This file uses OPENAI_KEY while custom_evaluator_guard.py uses OPENAI_API_KEY. The standard convention (and OpenAI SDK default) is OPENAI_API_KEY.
🔧 Proposed fix
- openai_client = AsyncOpenAI(api_key=os.getenv("OPENAI_KEY"))
+ openai_client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
packages/sample-app/sample_app/guardrails/custom_evaluator_guard.py (1)
42-44: Consider removing or using MedicalAdviceInput.
The MedicalAdviceInput model is defined but never instantiated or used. If it's intended as documentation for the evaluator's expected schema, consider adding a comment clarifying this, or remove it to avoid confusion.
packages/traceloop-sdk/traceloop/sdk/decorators/__init__.py (1)
167-168: Inconsistent union type syntax.
Line 167 uses the Python 3.10+ | syntax (InputMapper | None) while line 168 uses Union[...]. Consider using consistent syntax throughout.
♻️ Use consistent typing syntax
  def guardrail(
      *guards: Guard,
-     input_mapper: InputMapper | None = None,
-     on_failure: Union[OnFailureHandler, str, None] = None,
+     input_mapper: Optional[InputMapper] = None,
+     on_failure: Optional[Union[OnFailureHandler, str]] = None,
      name: str = "",
  ) -> Callable[[Callable[_P, _R]], Callable[_P, _R]]:
Or use Python 3.10+ syntax consistently:
  def guardrail(
      *guards: Guard,
-     input_mapper: InputMapper | None = None,
-     on_failure: Union[OnFailureHandler, str, None] = None,
+     input_mapper: InputMapper | None = None,
+     on_failure: OnFailureHandler | str | None = None,
      name: str = "",
  ) -> Callable[[Callable[_P, _R]], Callable[_P, _R]]:
packages/traceloop-sdk/tests/guardrails/test_guardrail_decorator.py (2)
214-260: Consider verifying on_failure handler behavior more precisely.
The tests verify that on_failure is callable but don't verify the actual behavior (e.g., that the default handler raises an exception or that the string handler returns the correct value). While this is acceptable for unit testing the decorator wiring, consider adding an integration test that verifies the handler behavior when guards actually fail.
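A hedged sketch of such a test, reusing the private-attribute setup from the existing guardrail tests in this package; the OnFailure.return_value semantics and pytest-asyncio usage are assumptions.

```python
# Hedged sketch: a failing guard should surface the on_failure handler's result.
# Setup mirrors the create_guardrails_with_guards helpers in these test files;
# pytest-asyncio availability and OnFailure.return_value semantics are assumed.
import pytest
from unittest.mock import MagicMock

from traceloop.sdk.guardrail.guardrail import Guardrails
from traceloop.sdk.guardrail.on_failure import OnFailure


@pytest.mark.asyncio
async def test_return_value_handler_used_when_guard_fails():
    async def always_fail(_inputs) -> bool:
        return False  # guard that always fails

    guardrails = Guardrails(MagicMock())
    guardrails._guards = [always_fail]
    guardrails._on_failure = OnFailure.return_value(value="fallback")

    async def produce_output() -> str:
        return "unsafe output"

    result = await guardrails.run(
        produce_output,
        input_mapper=lambda output: [{"text": output}],
    )

    assert result == "fallback"
```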
309-331: Missing test for sync function execution.
The TestGuardrailDecoratorSyncSupport class only tests metadata preservation but doesn't include a test that actually executes a sync function through the guardrail decorator. Consider adding a test similar to test_decorator_passes_through_result_when_guards_pass but for sync functions.
💡 Suggested test for sync function execution
def test_sync_decorator_passes_through_result_when_guards_pass(self):
    """Decorator returns sync function result when all guards pass."""
    mock_guardrails = MagicMock()
    mock_guardrails.create.return_value = mock_guardrails
    mock_guardrails.run = AsyncMock(return_value="guarded result")

    mock_client = MagicMock()
    mock_client.guardrails = mock_guardrails

    with patch("traceloop.sdk.Traceloop") as mock_traceloop:
        mock_traceloop.get.return_value = mock_client

        @guardrail(lambda z: True, on_failure=OnFailure.raise_exception())
        def my_sync_function(prompt: str) -> str:
            return f"Response to: {prompt}"

        result = my_sync_function("Hello")

        assert result == "guarded result"
        mock_guardrails.create.assert_called_once()
        mock_guardrails.run.assert_awaited_once()
async def attempt_diagnosis_request() -> str:
    """Generate response to diagnosis request (will be blocked)."""
    user_question = "I have chest pain, shortness of breath, and dizziness. Do I have a heart attack?"

    completion = await openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "user",
                "content": user_question,
            }
        ],
    )
    return completion.choices[0].message.content
Add null fallback for message content.
Line 122 returns message.content without a fallback, unlike line 85 which uses or "". If content is None, line 134's slicing will raise a TypeError.
🛡️ Proposed fix
- return completion.choices[0].message.content
+ return completion.choices[0].message.content or ""
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
async def attempt_diagnosis_request() -> str:
    """Generate response to diagnosis request (will be blocked)."""
    user_question = "I have chest pain, shortness of breath, and dizziness. Do I have a heart attack?"

    completion = await openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "user",
                "content": user_question,
            }
        ],
    )
    return completion.choices[0].message.content or ""
🤖 Prompt for AI Agents
In `@packages/sample-app/sample_app/guardrails/custom_evaluator_guard.py` around
lines 109 - 122, The function attempt_diagnosis_request returns
completion.choices[0].message.content without a null fallback, which can cause a
TypeError if content is None; update attempt_diagnosis_request to mirror the
earlier handler by returning completion.choices[0].message.content or "" (or an
explicit empty string fallback) so downstream slicing won't fail, and ensure you
reference the same openai_client.chat.completions.create result handling used
elsewhere.
async def generate_response(user_prompt: str) -> str:
    """Generate LLM response - automatically guarded by decorator."""
    completion = await openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_prompt},
        ],
    )
    return completion.choices[0].message.content
Add null fallback for message content.
completion.choices[0].message.content can be None (e.g., when the model uses tool calls). The function signature declares -> str but could return None, causing a TypeError at line 58 when slicing.
🛡️ Proposed fix
- return completion.choices[0].message.content
+ return completion.choices[0].message.content or ""
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
async def generate_response(user_prompt: str) -> str:
    """Generate LLM response - automatically guarded by decorator."""
    completion = await openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_prompt},
        ],
    )
    return completion.choices[0].message.content or ""
🤖 Prompt for AI Agents
In `@packages/sample-app/sample_app/guardrails/decorator_example.py` around lines
30 - 39, The function generate_response should never return None even if
completion.choices[0].message.content is None; update the return behavior in
generate_response (after the await openai_client.chat.completions.create call)
to safely coalesce completion.choices[0].message.content to a string fallback
(e.g., empty string or a descriptive fallback) so the declared return type ->
str is honored and downstream slicing won't raise a TypeError.
# Run async guardrail in event loop for sync functions
return asyncio.run(
    g.run(
        lambda: asyncio.to_thread(func, *args, **kwargs),
        input_mapper=input_mapper,
    )
)
asyncio.run() will fail if called from a running event loop.
The sync wrapper uses asyncio.run() which raises RuntimeError when called from within an existing event loop (common in Jupyter notebooks, GUI applications, or nested async contexts).
🛠️ Proposed fix to handle existing event loops
`@wraps`(func)
def sync_wrapper(*args: _P.args, **kwargs: _P.kwargs) -> _R:
from traceloop.sdk import Traceloop
client = Traceloop.get()
g = client.guardrails.create(
guards=guards_list,
on_failure=failure_handler,
name=name or func.__name__,
)
- # Run async guardrail in event loop for sync functions
- return asyncio.run(
- g.run(
- lambda: asyncio.to_thread(func, *args, **kwargs),
- input_mapper=input_mapper,
- )
- )
+ coro = g.run(
+ lambda: asyncio.to_thread(func, *args, **kwargs),
+ input_mapper=input_mapper,
+ )
+ # Handle case when called from existing event loop
+ try:
+ loop = asyncio.get_running_loop()
+ except RuntimeError:
+ loop = None
+
+ if loop is not None:
+ # Already in an event loop - use thread to run new loop
+ import concurrent.futures
+ with concurrent.futures.ThreadPoolExecutor() as executor:
+ future = executor.submit(asyncio.run, coro)
+ return future.result()
+ else:
+ return asyncio.run(coro)
  return sync_wrapper  # type: ignore[return-value]
🤖 Prompt for AI Agents
In `@packages/traceloop-sdk/traceloop/sdk/decorators/__init__.py` around lines 244
- 250, The sync wrapper uses asyncio.run(...) around g.run(...) which raises
RuntimeError if an event loop is already running; detect whether an event loop
is running (asyncio.get_event_loop().is_running() or asyncio.get_running_loop()
with try/except) and, if it is, execute the coroutine by running a new event
loop in a separate thread (submit a callable that calls asyncio.run(g.run(...)))
so the current loop isn't touched; otherwise keep using asyncio.run directly.
Update the call site that currently wraps g.run(lambda: asyncio.to_thread(func,
...), input_mapper=...) to branch: when no loop is running use asyncio.run(...),
when a loop is running run the same coroutine inside a new thread via
Thread/Executor and wait for that thread/future result. Ensure you reference the
same g.run invocation and the lambda that calls asyncio.to_thread(func, *args,
**kwargs).
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@packages/traceloop-sdk/traceloop/sdk/guardrail/guards.py`:
- Around line 36-90: The guard_fn currently uses dict(input_data) as a
last-resort which raises TypeError for non-mapping/non-iterable types; update
the input conversion in _create_guard -> guard_fn to explicitly handle mappings
and pydantic models and raise a clear TypeError for unsupported types: use
isinstance(input_data, Mapping) to assign input_dict, elif hasattr(input_data,
"model_dump") to call model_dump(), else raise TypeError(f"Unsupported input
type for guard '{evaluator_slug}': {type(input_data).__name__}") (remove the
fallback dict(input_data)).
🧹 Nitpick comments (1)
packages/traceloop-sdk/traceloop/sdk/decorators/__init__.py (1)
165-170: Inconsistent type annotation syntax.
Line 167 uses | syntax while line 168 uses Union[...]. Consider using a consistent style throughout.
♻️ Suggested fix for consistency
  def guardrail(
      *guards: Guard,
-     input_mapper: InputMapper | None = None,
-     on_failure: Union[OnFailureHandler, str, None] = None,
+     input_mapper: Optional[InputMapper] = None,
+     on_failure: Optional[Union[OnFailureHandler, str]] = None,
      name: str = "",
  ) -> Callable[[Callable[_P, _R]], Callable[_P, _R]]:
def _create_guard(
    evaluator_details: EvaluatorDetails,
    condition: Callable[[Any], bool],
    timeout_in_sec: int = 60,
) -> Guard:
    """
    Convert an EvaluatorDetails to a guard function.

    Args:
        evaluator_details: The evaluator configuration
        condition: Function that receives evaluator result and returns bool.
            True = pass, False = fail.
        timeout_in_sec: Maximum time to wait for evaluator execution

    Returns:
        Async function suitable for client.guardrails.create(guards=[...])
    """

    evaluator_slug = evaluator_details.slug
    evaluator_version = evaluator_details.version
    evaluator_config = evaluator_details.config
    condition_field = evaluator_details.condition_field

    async def guard_fn(input_data: Any) -> bool:
        from traceloop.sdk import Traceloop
        from traceloop.sdk.evaluator.evaluator import Evaluator

        # Convert Pydantic model to dict, or use dict directly
        if isinstance(input_data, dict):
            input_dict = input_data
        elif hasattr(input_data, "model_dump"):
            input_dict = input_data.model_dump()
        else:
            input_dict = dict(input_data)

        client = Traceloop.get()
        evaluator = Evaluator(client._async_http)

        eval_response = await evaluator.run(
            evaluator_slug=evaluator_slug,
            input=input_dict,
            evaluator_version=evaluator_version,
            evaluator_config=evaluator_config,
            timeout_in_sec=timeout_in_sec,
        )

        if condition_field:
            result_to_validate = eval_response.result.evaluator_result[condition_field]
        else:
            result_to_validate = eval_response.result.evaluator_result

        return condition(result_to_validate)

    guard_fn.__name__ = evaluator_slug
    return guard_fn
Potential TypeError on line 69 for non-mapping input types.
The fallback dict(input_data) will raise TypeError if input_data is not iterable as key-value pairs (e.g., a string, number, or custom object without __iter__).
Proposed fix with explicit error handling
# Convert Pydantic model to dict, or use dict directly
if isinstance(input_data, dict):
input_dict = input_data
elif hasattr(input_data, "model_dump"):
input_dict = input_data.model_dump()
else:
- input_dict = dict(input_data)
+ try:
+ input_dict = dict(input_data)
+ except (TypeError, ValueError) as e:
+ raise TypeError(
+ f"Guard input must be a dict, Pydantic model, or dict-convertible type, "
+ f"got {type(input_data).__name__}"
+ ) from e🤖 Prompt for AI Agents
In `@packages/traceloop-sdk/traceloop/sdk/guardrail/guards.py` around lines 36 -
90, The guard_fn currently uses dict(input_data) as a last-resort which raises
TypeError for non-mapping/non-iterable types; update the input conversion in
_create_guard -> guard_fn to explicitly handle mappings and pydantic models and
raise a clear TypeError for unsupported types: use isinstance(input_data,
Mapping) to assign input_dict, elif hasattr(input_data, "model_dump") to call
model_dump(), else raise TypeError(f"Unsupported input type for guard
'{evaluator_slug}': {type(input_data).__name__}") (remove the fallback
dict(input_data)).
feat(instrumentation): ... or fix(instrumentation): ....
Summary by CodeRabbit
New Features
New Examples
Tests
Chores
✏️ Tip: You can customize this high-level summary in your review settings.