fix: Change attributes datasets and experiment on client to lazyloading and mark datasets as deprecated#3646
Conversation
There was a problem hiding this comment.
Important
Looks good to me! 👍
Reviewed everything up to 2f53649 in 10 seconds. Click for details.
- Reviewed
125lines of code in2files - Skipped
0files when reviewing. - Skipped posting
0draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
Workflow ID: wflow_PbsjYa6y5uOLuSYK
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
📝 WalkthroughWalkthroughClient class attributes Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@packages/traceloop-sdk/traceloop/sdk/client/client.py`:
- Around line 81-90: Update the deprecation messages that incorrectly say
"Dataclass traceloop.sdk.datasets" to correctly reference the Datasets
class/module by changing the reason text in the `@deprecated` decorators for the
datasets accessor and the two other deprecated items in this file (the datasets
method and the other two decorators around lines 93-99 and 101-104); ensure all
three reasons consistently say something like "Datasets traceloop.sdk.datasets"
or "the Datasets class in traceloop.sdk.datasets" so they uniformly point to the
Datasets class/module.
- Around line 1-3: Ruff F821 arises because the name Datasets is used in a
setter type annotation but not defined at import time; add "from typing import
TYPE_CHECKING" to the imports and inside an if TYPE_CHECKING: block import the
Datasets symbol (e.g., "from .datasets import Datasets" or the correct module
path) so the type name is available for type checking only and doesn't trigger
runtime imports—update the setter/property annotation to remain unchanged,
relying on the forward-import under TYPE_CHECKING to satisfy the linter.
Changes ------- - Import and load attribute datasets only once when requested - Mark attribute datasets on Client as deprecated Reasoning --------- Calling Traceloop.init() allocates ~35 MB of memory when pandas is installed, even if the datasets attribute is not used. People interested in the Datasets class should IMO just create an instance on their own. Since it's already released I propose a deprecation process.
Changes
-------
- Import and load attribute datasets only once when requested
2f53649 to
14172f2
Compare
| reason="datasets as client attribute is deprecated. Use a dedicated instance " | ||
| "of Datasets from traceloop.sdk.datasets" | ||
| ) | ||
| def datasets(self): |
There was a problem hiding this comment.
Can you pls explain why you need this change ?
There was a problem hiding this comment.
As laid out already in my initial PR description (see section Reasoning)
You let everybody who has pandas installed pay a 35 MB tax, when loading this library, where it is not clear why the experimental and datasets should be loaded alongside the main client.
If it is experimental code, it should not be loaded automatically into the environment of consumers of the library.
There was a problem hiding this comment.
Thanks for the detailed explanation @tassadarius , this is a valid concern.
We're currently working on some architectural changes to the SDK, and as part of that effort, we're planning to introduce package extras to properly address this.
This is on our roadmap but will take some time to implement properly without breaking existing users.
In the meantime, if this is blocking you, you could apply a local patch to defer the import — for example, by setting an environment variable check before the pandas-dependent imports in your local installation.
If adding a new instrumentation or changing an existing one, I've added screenshots from some observability platform showing the change.feat(instrumentation): ...orfix(instrumentation): ....(If applicable) I have updated the documentation accordingly.Changes
datasetsandexperimentalonly once when actually requesteddatasetson Client as deprecatedReasoning
Calling Traceloop.init() allocates ~35 MB of memory when pandas is
installed, either by the attributes
datasetsorexperiment. Both of these attributes seemnot relevant for production
People interested in the Datasets class should IMO just create an
instance on their own.
Since it's already released I propose a deprecation process.
In general I think it is a bad idea to expose and initialize all these internal attributes directly on the public main class.
Here you can see the memory usages done with memray of this simple execution:
TRACELOOP_API_KEY=asdf uv run --active memray run -c "from traceloop.sdk import Traceloop;Traceloop.init(app_name='joke_generation_service')Important
Optimize
Clientmemory usage by lazy-loadingdatasetsandexperimentattributes, markingdatasetsas deprecated.datasetsandexperimentattributes inClientare now lazy-loaded, reducing initial memory usage.datasetsattribute is marked as deprecated with a warning, suggesting users create their own instances.test_client_lazy_loads_datasets()andtest_client_lazy_loads_experiment()intest_client.pyto verify lazy loading.test_datasets_deprecation_warnings()to check for deprecation warnings.DatasetsandExperimentinclient.pyand moved them inside the properties.This description was created by
for 2f53649. You can customize this summary. It will automatically update as commits are pushed.
Summary by CodeRabbit
Deprecations
datasetsproperty on the client is deprecated, including its setter and deleter; it will be removed in a future release.Changes
experimentproperty now initializes on first access instead of during client creation.Tests
experimentanddatasets, caching/idempotence, and that deprecation warnings are emitted fordatasets.✏️ Tip: You can customize this high-level summary in your review settings.