refactor(streaming): remove stream_usage and fix streaming metadata capture (#1624)
**Greptile Summary**

This PR refactors streaming metadata handling: it removes the redundant `stream_usage` kwarg and fixes `_stream_llm_call` so that metadata is actually captured from streaming chunks. The key changes are listed per file in the table below.

The refactor is well designed, with proper backward compatibility through deprecation warnings, comprehensive test coverage (33 passing streaming tests, 173 passing runnable rails tests), and clear documentation updates.
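The deprecation path itself isn't shown in this view. A minimal sketch of how such a shim commonly looks, assuming a keyword-only alias on `StreamingHandler` (the parameter names and the 0.22.0 removal target come from the PR; the body here is an assumption, not the actual implementation):

```python
import warnings
from typing import Optional


class StreamingHandler:
    def __init__(
        self,
        include_metadata: bool = False,
        # Deprecated alias kept for backward compatibility; per the PR
        # description it is slated for removal in 0.22.0.
        include_generation_metadata: Optional[bool] = None,
    ):
        if include_generation_metadata is not None:
            warnings.warn(
                "'include_generation_metadata' is deprecated and will be "
                "removed in 0.22.0; use 'include_metadata' instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            include_metadata = include_generation_metadata
        self.include_metadata = include_metadata
```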
| Filename | Overview |
|---|---|
| nemoguardrails/actions/llm/utils.py | Fixed `_stream_llm_call` to properly extract and accumulate both `response_metadata` and `usage_metadata` from chunks using the new `_extract_chunk_metadata` helper (sketched below) |
| nemoguardrails/streaming.py | Renamed `include_generation_metadata` to `include_metadata` with a deprecation warning, renamed `generation_info` to `metadata`, and fixed END_OF_STREAM handling to set the text to an empty string |
| nemoguardrails/rails/llm/llmrails.py | Removed the redundant `stream_usage=True` from `_prepare_model_kwargs`; updated `stream_async` to use the `include_metadata` parameter, with a deprecation warning for the old parameter |
| nemoguardrails/integrations/langchain/runnable_rails.py | Updated to use `include_metadata` instead of `include_generation_metadata`, renamed `generation_info` to `metadata` in chunk processing, and added empty-string filtering |
| tests/test_streaming_handler.py | Updated tests to use `metadata` instead of `generation_info`; added `test_metadata_accumulation_across_chunks` to verify metadata accumulation behavior |
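The body of `_extract_chunk_metadata` is not shown in this view. A plausible minimal sketch, assuming LangChain-style chunks that expose `response_metadata` and `usage_metadata` attributes (the nesting of usage under its own key is an assumption):

```python
from typing import Any, Dict


def _extract_chunk_metadata(chunk: Any) -> Dict[str, Any]:
    """Collect whatever metadata a single streaming chunk carries.

    During streaming, content chunks usually arrive with an empty
    response_metadata, and token usage arrives via usage_metadata on a
    final chunk, so both attributes are checked on every chunk.
    """
    metadata: Dict[str, Any] = {}

    response_metadata = getattr(chunk, "response_metadata", None)
    if response_metadata:
        metadata.update(response_metadata)

    usage_metadata = getattr(chunk, "usage_metadata", None)
    if usage_metadata:
        # Nest usage under its own key so it cannot collide with
        # provider-level response metadata keys.
        metadata["usage_metadata"] = usage_metadata

    return metadata
```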
**Sequence Diagram**
```mermaid
sequenceDiagram
    participant User
    participant LLMRails
    participant StreamingHandler
    participant _stream_llm_call
    participant LLM

    User->>LLMRails: stream_async(messages, include_metadata=True)
    LLMRails->>StreamingHandler: new StreamingHandler(include_metadata=True)
    LLMRails->>LLMRails: generate_async(streaming_handler)
    LLMRails->>_stream_llm_call: call with handler

    loop For each chunk from LLM
        _stream_llm_call->>LLM: llm.astream(messages)
        LLM-->>_stream_llm_call: chunk (with content)
        _stream_llm_call->>_stream_llm_call: _extract_chunk_metadata(chunk)
        Note over _stream_llm_call: Extract response_metadata<br/>and usage_metadata if present
        _stream_llm_call->>_stream_llm_call: accumulated_metadata.update(chunk_metadata)
        _stream_llm_call->>StreamingHandler: push_chunk(content, chunk_metadata)
        StreamingHandler->>StreamingHandler: current_metadata.update(metadata)
        StreamingHandler->>StreamingHandler: queue.put({"text": content, "metadata": current_metadata})
        StreamingHandler-->>User: yield {"text": content, "metadata": {...}}
    end

    _stream_llm_call->>_stream_llm_call: llm_response_metadata_var.set(accumulated_metadata)
    _stream_llm_call->>StreamingHandler: finish()
    StreamingHandler->>StreamingHandler: push_chunk(END_OF_STREAM)
    Note over StreamingHandler: END_OF_STREAM converted to<br/>{"text": "", "metadata": {...}}
    StreamingHandler-->>User: yield final chunk with all metadata
```
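For orientation, driving the flow above from user code would look roughly like this. This is a sketch: the config path and prompt are placeholders, and the dict-chunk shape is the one shown in the diagram, not an independently verified contract:

```python
import asyncio

from nemoguardrails import LLMRails, RailsConfig


async def main():
    # Placeholder config path; any streaming-enabled rails config works.
    config = RailsConfig.from_path("./config")
    rails = LLMRails(config)

    final_metadata = {}
    async for chunk in rails.stream_async(
        messages=[{"role": "user", "content": "Hello!"}],
        include_metadata=True,
    ):
        # With include_metadata=True, chunks are dicts, not plain strings.
        print(chunk["text"], end="", flush=True)
        final_metadata = chunk["metadata"]

    # Per the diagram, the END_OF_STREAM chunk has text == "" and carries
    # the fully accumulated metadata, including token usage when available.
    print(final_metadata)


asyncio.run(main())
```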
## Summary
- Removed the redundant `stream_usage=True` from `_prepare_model_kwargs` (OpenAI auto-enables it; ChatNVIDIA ignores it) and the defensive `kwargs.pop("stream_usage", None)` in `langchain_initializer.py`
- Fixed `_stream_llm_call` to properly capture metadata from streaming chunks using both `response_metadata` and `usage_metadata` (previously it only read `response_metadata`, which is empty during streaming)
- Accumulated metadata across chunks, since some providers send `response_metadata` and `usage_metadata` on separate final chunks (see the sketch after this list)
- Renamed `include_generation_metadata` to `include_metadata` with graceful deprecation (warning now, removal in 0.22.0)
- Renamed `generation_info` to `metadata` across the streaming handler, runnable rails, and tests
- Renamed the test constant `_TEST_PROVIDERS_WITH_TOKEN_USAGE_SUPPORT` → `_TEST_PROVIDERS_WITH_TOKEN_USAGE`
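Because of the separate-final-chunks behavior noted above, per-chunk metadata has to be merged rather than replaced. A sketch of that accumulation step, assuming the inner-loop shape shown in the sequence diagram and reusing the helper sketched earlier (the function name here is hypothetical):

```python
from typing import Any, Dict


async def _accumulate_stream_metadata(llm, messages, streaming_handler) -> Dict[str, Any]:
    """Sketch of the merge loop inside _stream_llm_call."""
    accumulated_metadata: Dict[str, Any] = {}

    async for chunk in llm.astream(messages):
        chunk_metadata = _extract_chunk_metadata(chunk)
        # dict.update merges keys, so response_metadata arriving on one
        # final chunk and usage_metadata on another both survive.
        accumulated_metadata.update(chunk_metadata)
        await streaming_handler.push_chunk(chunk.content, chunk_metadata)

    return accumulated_metadata
```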
## Test plan

- `poetry run pytest tests/test_streaming_handler.py` — 33 passed
- `poetry run pytest tests/runnable_rails/` — 173 passed
- `poetry run pytest tests/ -k "stream"` — 134 passed
- Manual verification script (`poetry run python verify.py`)
- `poetry run nemoguardrails chat --config=./examples/bots/hello_world --streaming` (requires `NVIDIA_API_KEY`)