
refactor(streaming): remove stream_usage and fix streaming metadata capture#1624

Open
Pouyanpi wants to merge 1 commit into develop from refactor/streaming-metadata-capture

Conversation

@Pouyanpi Pouyanpi commented Feb 6, 2026

Summary

  • Remove the redundant stream_usage=True from _prepare_model_kwargs (OpenAI auto-enables it; ChatNVIDIA ignores it) and the defensive kwargs.pop("stream_usage", None) in langchain_initializer.py
  • Fix _stream_llm_call to properly capture metadata from streaming chunks using both response_metadata and usage_metadata (previously it read only response_metadata, which is empty during streaming)
  • Accumulate metadata across chunks, since OpenAI sends response_metadata and usage_metadata on separate final chunks (see the sketch after this list)
  • Rename include_generation_metadata to include_metadata with graceful deprecation (warning now, removal in 0.22.0)
  • Rename the internal field generation_info to metadata across the streaming handler, runnable rails, and tests
  • Rename test utility constants for clarity (_TEST_PROVIDERS_WITH_TOKEN_USAGE_SUPPORT → _TEST_PROVIDERS_WITH_TOKEN_USAGE)
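
A minimal sketch of that accumulation pattern, assuming LangChain-style chunk attributes. The _extract_chunk_metadata name matches the helper this PR adds; the surrounding function and handler API are illustrative, not the exact implementation:

# Illustrative sketch only -- the actual implementation lives in
# nemoguardrails/actions/llm/utils.py. Attribute names follow
# LangChain's AIMessageChunk.

def _extract_chunk_metadata(chunk) -> dict:
    """Collect whatever metadata this particular chunk carries."""
    metadata = {}
    # response_metadata (model name, finish reason, ...) typically
    # arrives only on a final chunk during streaming.
    if getattr(chunk, "response_metadata", None):
        metadata["response_metadata"] = chunk.response_metadata
    # usage_metadata (token counts) can arrive on a *different* final
    # chunk, so it must be checked independently.
    if getattr(chunk, "usage_metadata", None):
        metadata["usage_metadata"] = chunk.usage_metadata
    return metadata


async def stream_with_accumulation(llm, messages, handler):
    """Merge metadata across chunks instead of overwriting it."""
    accumulated_metadata = {}
    async for chunk in llm.astream(messages):
        chunk_metadata = _extract_chunk_metadata(chunk)
        # update() merges, so metadata arriving on separate final
        # chunks (OpenAI's pattern) ends up in a single dict.
        accumulated_metadata.update(chunk_metadata)
        await handler.push_chunk(chunk.content, chunk_metadata)
    return accumulated_metadata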

Test plan

  • poetry run pytest tests/test_streaming_handler.py — 33 passed
  • poetry run pytest tests/runnable_rails/ — 173 passed
  • poetry run pytest tests/ -k "stream" — 134 passed
  • E2E verification (save the script below as verify.py and run with poetry run python verify.py)
  • Manual streaming chat: poetry run nemoguardrails chat --config=./examples/bots/hello_world --streaming
  • Manual streaming with output rails (requires NVIDIA_API_KEY):
    sed -i '' '/check output/a\
        streaming:\
          enabled: True' ./examples/configs/nemoguards/config.yml
    poetry run nemoguardrails chat --config=./examples/configs/nemoguards --streaming
    git checkout ./examples/configs/nemoguards/config.yml
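
# verify.py -- the E2E verification script referenced in the test plan above.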
import asyncio
import os
import sys
import warnings

from nemoguardrails import LLMRails, RailsConfig


async def main():
    if not os.environ.get("OPENAI_API_KEY"):
        sys.exit("Set OPENAI_API_KEY first")

    config = RailsConfig.from_content(
        config={
            "models": [{"type": "main", "engine": "openai", "model": "gpt-4o"}],
            "streaming": True,
        },
    )

    rails = LLMRails(config)
    messages = [{"role": "user", "content": "Say hello in one word."}]

    # 1. Basic streaming returns plain strings
    chunks = []
    async for chunk in rails.stream_async(messages=messages):
        assert isinstance(chunk, str), f"Expected str, got {type(chunk)}"
        chunks.append(chunk)
    assert chunks and "".join(chunks), "No chunks received"

    # 2. Metadata streaming returns dicts with response_metadata + usage_metadata
    chunks = []
    async for chunk in rails.stream_async(messages=messages, include_metadata=True):
        assert isinstance(chunk, dict) and "text" in chunk
        chunks.append(chunk)
    metadata_chunks = [c for c in chunks if "metadata" in c]
    assert metadata_chunks, "No metadata in any chunk"
    meta = metadata_chunks[-1]["metadata"]
    assert "response_metadata" in meta, "Missing response_metadata"
    assert "usage_metadata" in meta, "Missing usage_metadata"

    # 3. Deprecated param still works and warns
    with warnings.catch_warnings(record=True) as w:
        warnings.simplefilter("always")
        async for _ in rails.stream_async(messages=messages, include_generation_metadata=True):
            pass
        assert any(issubclass(x.category, DeprecationWarning) for x in w)

    # 4. Non-streaming still works
    result = await rails.generate_async(messages=messages)
    assert result is not None

    print("All tests passed")


if __name__ == "__main__":
    asyncio.run(main())


github-actions bot commented Feb 6, 2026

Documentation preview

https://nvidia-nemo.github.io/Guardrails/review/pr-1624


codecov bot commented Feb 6, 2026

Codecov Report

❌ Patch coverage is 82.97872% with 8 lines in your changes missing coverage. Please review.

File                                    Patch %   Missing lines
nemoguardrails/actions/llm/utils.py     73.33%    4
nemoguardrails/rails/llm/llmrails.py    60.00%    2
nemoguardrails/streaming.py             90.90%    2


@Pouyanpi Pouyanpi force-pushed the refactor/streaming-metadata-capture branch from 7565f67 to 12411c8 on February 6, 2026 at 12:54
@Pouyanpi Pouyanpi marked this pull request as ready for review February 6, 2026 12:58

greptile-apps bot commented Feb 6, 2026

Greptile Summary

This PR refactors streaming metadata handling by removing redundant stream_usage=True parameter (OpenAI enables it automatically, ChatNVIDIA ignores it) and fixing metadata capture during streaming.

Key Changes:

  • Fixed _stream_llm_call to properly extract both response_metadata and usage_metadata from streaming chunks (previously only response_metadata was captured, and it is empty during streaming)
  • Added metadata accumulation across chunks to handle OpenAI's pattern of sending response_metadata and usage_metadata in separate final chunks
  • Renamed include_generation_metadata → include_metadata with graceful deprecation (warning now, removal planned for 0.22.0); a sketch of this shim pattern follows this list
  • Removed the redundant stream_usage=True from model initialization and the defensive kwargs.pop("stream_usage", None) cleanup code
  • Updated all internal references from generation_info → metadata across the streaming handler, runnable rails, and tests
  • Fixed END_OF_STREAM handling to set text to an empty string instead of keeping the sentinel value
  • Added comprehensive test coverage for the metadata accumulation behavior
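
A minimal sketch of that deprecation shim, assuming the parameter names described above; the function body and exact signature are illustrative (the real one lives on LLMRails.stream_async):

import warnings
from typing import Optional

# Illustrative shim only -- not the actual LLMRails method.
async def stream_async(
    messages,
    include_metadata: bool = False,
    include_generation_metadata: Optional[bool] = None,
):
    if include_generation_metadata is not None:
        # Old callers keep working until 0.22.0, but see a warning.
        warnings.warn(
            "include_generation_metadata is deprecated and will be removed "
            "in 0.22.0; use include_metadata instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        include_metadata = include_generation_metadata
    # ... streaming logic uses include_metadata from here on ...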

The refactor is well-designed with proper backward compatibility through deprecation warnings, comprehensive test coverage (33 passing streaming tests, 173 runnable rails tests), and clear documentation updates.

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The refactoring is well-executed with comprehensive test coverage (33 streaming tests and 173 runnable rails tests, all passing), proper backward compatibility through deprecation warnings, clear documentation, and focused changes that address a specific bug (missing usage_metadata during streaming). The removal of the redundant stream_usage parameter is justified and safe.
  • No files require special attention

Important Files Changed

  • nemoguardrails/actions/llm/utils.py: fixed _stream_llm_call to properly extract and accumulate both response_metadata and usage_metadata from chunks using the new _extract_chunk_metadata helper
  • nemoguardrails/streaming.py: renamed include_generation_metadata to include_metadata with a deprecation warning, renamed generation_info to metadata, fixed END_OF_STREAM handling to set text to an empty string
  • nemoguardrails/rails/llm/llmrails.py: removed the redundant stream_usage=True from _prepare_model_kwargs, updated stream_async to use the include_metadata parameter with a deprecation warning for the old one
  • nemoguardrails/integrations/langchain/runnable_rails.py: switched to include_metadata, renamed generation_info to metadata in chunk processing, added empty-string filtering
  • tests/test_streaming_handler.py: updated tests to use metadata instead of generation_info, added test_metadata_accumulation_across_chunks to verify accumulation behavior

Sequence Diagram

sequenceDiagram
    participant User
    participant LLMRails
    participant StreamingHandler
    participant _stream_llm_call
    participant LLM
    
    User->>LLMRails: stream_async(messages, include_metadata=True)
    LLMRails->>StreamingHandler: new StreamingHandler(include_metadata=True)
    LLMRails->>LLMRails: generate_async(streaming_handler)
    LLMRails->>_stream_llm_call: call with handler
    
    loop For each chunk from LLM
        _stream_llm_call->>LLM: llm.astream(messages)
        LLM-->>_stream_llm_call: chunk (with content)
        _stream_llm_call->>_stream_llm_call: _extract_chunk_metadata(chunk)
        Note over _stream_llm_call: Extract response_metadata<br/>and usage_metadata if present
        _stream_llm_call->>_stream_llm_call: accumulated_metadata.update(chunk_metadata)
        _stream_llm_call->>StreamingHandler: push_chunk(content, chunk_metadata)
        StreamingHandler->>StreamingHandler: current_metadata.update(metadata)
        StreamingHandler->>StreamingHandler: queue.put({"text": content, "metadata": current_metadata})
        StreamingHandler-->>User: yield {"text": content, "metadata": {...}}
    end
    
    _stream_llm_call->>_stream_llm_call: llm_response_metadata_var.set(accumulated_metadata)
    _stream_llm_call->>StreamingHandler: finish()
    StreamingHandler->>StreamingHandler: push_chunk(END_OF_STREAM)
    Note over StreamingHandler: END_OF_STREAM converted to<br/>{"text": "", "metadata": {...}}
    StreamingHandler-->>User: yield final chunk with all metadata
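
A minimal sketch of the END_OF_STREAM conversion noted in the diagram, assuming a module-level sentinel; the function name and signature are illustrative (the real handling lives in nemoguardrails/streaming.py):

# Illustrative sketch; the actual conversion happens in
# nemoguardrails/streaming.py when include_metadata is enabled.
END_OF_STREAM = object()  # sentinel marking stream completion

def format_chunk(chunk, current_metadata: dict) -> dict:
    # Never leak the sentinel to consumers as chunk text: the final
    # chunk carries an empty string plus all accumulated metadata.
    text = "" if chunk is END_OF_STREAM else chunk
    return {"text": text, "metadata": current_metadata}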

@greptile-apps greptile-apps bot left a comment

5 files reviewed, no comments

@Pouyanpi Pouyanpi self-assigned this Feb 6, 2026
@Pouyanpi Pouyanpi added the enhancement New feature or request label Feb 6, 2026
@Pouyanpi Pouyanpi added this to the v0.21 milestone Feb 6, 2026