feat(trainer): add structured and configurable logging support by anencore94 · Pull Request #110 · kubeflow/sdk

anencore94 · 2025-09-27T15:21:05Z

What this PR does / why we need it:

This PR implements comprehensive structured and configurable logging support for the Kubeflow SDK as requested in Issue #85. The implementation provides:

NullHandler Pattern: Prevents logging noise by default while allowing users to override with their own configuration
Configurable Logging: Support for console, detailed, and JSON output formats
Structured Logging: JSON formatter for log aggregation systems (ELK stack, Fluentd, etc.)
Environment-based Configuration: Configure logging via environment variables (now exposed in public API)
SDK Integration: Added debug logging to TrainerClient for better observability

The logging system ensures that the SDK is quiet by default (using NullHandler) but provides rich logging capabilities when users explicitly configure logging.

Which issue(s) this PR fixes (optional, in Fixes #<issue number>, #<issue number>, ... format, will close the issue(s) when PR gets merged):

Fixes #85

Recent Updates

Latest Improvements (Commit: `4711bf1`)

Exported configure_from_env function: Now available in the public API for easier environment-based configuration
Enhanced test isolation: Added cleanup_logging pytest fixture with autouse to prevent test interference
Improved test reliability: Added propagation settings in SDK integration tests to ensure consistent behavior

Features Implemented

Core Logging Infrastructure

NullHandler Pattern: Prevents logging noise by default, users can override with their own configuration
Configurable Logging: Support for console, detailed, and JSON output formats
Structured Logging: JSON formatter for log aggregation systems (ELK stack, Fluentd, etc.)
Environment-based Configuration: Configure logging via environment variables with configure_from_env()

Key Components

kubeflow/trainer/logging/config.py: Centralized logging configuration
- get_logger(): Get properly named logger instances
- setup_logging(): Configure logging with various options
- configure_from_env(): Configure from environment variables (now exported)
kubeflow/trainer/logging/formatters.py: Custom formatters including JSON structured logging
kubeflow/trainer/logging/logging_test.py: Comprehensive test suite with proper isolation

Integration Points

SDK Integration: Added debug logging to TrainerClient for better observability
Package Integration: Exposed logging utilities through kubeflow.trainer.logging
NullHandler Setup: Configured at package level to prevent logging noise

Issue #85 Requirements Fulfilled

✅ Consistent use of Python's logging library instead of print statements

All SDK operations now use proper Python logging
NullHandler pattern prevents unwanted output by default

✅ Support for different logging levels (DEBUG, INFO, WARNING, ERROR)

Full support for all standard Python logging levels
Configurable via setup_logging() function or environment variables

✅ Ability for users to configure log formatting and destinations

Console, detailed, and JSON output formats
File output support
Custom formatter support

✅ Clear and actionable log messages for key SDK operations

Debug messages in TrainerClient initialization
Backend selection logging
Job creation and ID logging

Testing

Comprehensive unit tests covering all logging functionality
Test isolation: Auto-cleanup fixture ensures tests don't interfere with each other
NullHandler pattern verification with proper isolation
SDK integration testing with real TrainerClient usage
Application integration examples with file and console logging
All tests pass with proper linting compliance

Usage Examples

Basic Usage

from kubeflow.trainer import TrainerClient, setup_logging

# Setup logging (optional - NullHandler prevents noise by default)
setup_logging(level="DEBUG", format_type="console")

# Use SDK - debug messages will appear if logging is configured
client = TrainerClient()

JSON Logging for Production

from kubeflow.trainer import setup_logging

setup_logging(level="INFO", format_type="json")
# Logs will be in JSON format suitable for log aggregation

Environment Configuration

from kubeflow.trainer import configure_from_env

# Configure from environment variables
configure_from_env()

export KUBEFLOW_LOG_LEVEL=DEBUG
export KUBEFLOW_LOG_FORMAT=json
export KUBEFLOW_LOG_FILE=/var/log/kubeflow.log

Code Review

A comprehensive code review was performed with the following assessment:

Overall Grade: A- (Excellent)
No critical issues identified
Well-architected following Python logging best practices
Production-ready with proper test coverage
Minor enhancements identified for future iterations

Breaking Changes

None - this is a pure addition with backward compatibility.

Migration Guide

No migration required. Existing code will continue to work without changes.
Users can optionally configure logging for better observability.

Checklist:

Docs included if any changes are user facing

google-oss-prow · 2025-09-27T15:21:10Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign electronic-waste for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coveralls · 2025-09-27T16:43:36Z

Pull Request Test Coverage Report for Build 20360630788

Details

218 of 232 (93.97%) changed or added relevant lines in 7 files are covered.
1 unchanged line in 1 file lost coverage.
Overall coverage increased (+1.6%) to 68.186%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
kubeflow/trainer/logging/config.py	18	22	81.82%
kubeflow/trainer/logging/logging_test.py	180	184	97.83%
kubeflow/trainer/api/trainer_client.py	4	10	40.0%

Files with Coverage Reduction	New Missed Lines	%
kubeflow/trainer/api/trainer_client.py	1	65.31%

Totals
Change from base Build 20310674450:	1.6%
Covered Lines:	2722
Relevant Lines:	3992

💛 - Coveralls

andreyvelich

Thank you for this @anencore94!
/assign @kubeflow/kubeflow-sdk-team

andreyvelich · 2025-10-16T19:18:09Z

kubeflow/trainer/__init__.py

+    "get_logger",
+    "setup_logging",


shall we move this to a separate package (e.g. kubeflow/logging)

I agree with it will be better for the long-term purpose, how about migrating in another PR ?
However, I could apply that feature in this PR too. Feel free to give your feedback!

andreyvelich · 2025-10-16T19:20:34Z

kubeflow/trainer/api/trainer_client.py

+        )
+
+        job_id = self.backend.train(runtime=runtime, initializer=initializer, trainer=trainer)
+        logger.debug("Successfully created TrainJob with ID: %s", job_id)


Can you move all of the debug statements out of backend, and place them into trainer_client.py

sdk/kubeflow/trainer/backends/kubernetes/backend.py

Lines 254 to 256 in 9642b41

logger.debug(

f"{constants.TRAINJOB_KIND} {self.namespace}/{train_job_name} has been created"

)

@szaher @kramaranya Does it sound good ?

I agree with you!

andreyvelich · 2025-10-16T19:21:22Z

kubeflow/trainer/logging/config.py

+
+
+def setup_logging(
+    level: Union[str, int] = "INFO",


As we discussed here, do we need to have INFO level for our logger ? #85 (comment)

Let me double-check your word!
Did you mean that SDK will only have debug() logs, so there could be meaningless to introduce logger level, right ?

kubeflow/trainer/logging/config.py

andreyvelich · 2025-10-16T19:26:35Z

kubeflow/trainer/logging/logging_test.py

+class TestLoggingConfig:
+    """Test logging configuration functionality."""
+
+    def test_get_logger(self):


Please refactor these test cases, similar to this:

sdk/kubeflow/trainer/backends/kubernetes/backend_test.py

Line 573 in 9642b41

@pytest.mark.parametrize(

andreyvelich · 2025-10-16T19:28:13Z

kubeflow/trainer/api/trainer_client.py

            ValueError: Invalid backend configuration.

        """
+        logger.debug("Initializing TrainerClient with backend_config=%s", backend_config)


How you propagate your loggers to the clients ? Should it be part of the TrainerClient() property ?

I just thought that users to use logger like this pattern in the description.

from kubeflow.trainer import TrainerClient, setup_logging # Setup logging (optional - NullHandler prevents noise by default) setup_logging(level="DEBUG", format_type="console") # Use SDK - debug messages will appear if logging is configured client = TrainerClient()

Log records created by module-level loggers (e.g. logging.getLogger(name)) are attached to a named logger such as "kubeflow.trainer.api.trainer_client".

By default, these records propagate up through the logger name hierarchy ("kubeflow.trainer.api.trainer_client" -> "kubeflow.trainer.api" -> "kubeflow.trainer" -> "kubeflow" -> root) until they reach a logger that has handlers attached.
The handlers and levels configured on the "kubeflow" and root loggers here, therefore control how all Kubeflow SDK log records are ultimately emitted.

anencore94 · 2025-12-08T15:12:22Z

Thanks for the review! Sorry for the delay, I'll follow up this on this week 🙏

This PR implements comprehensive logging support for the Kubeflow SDK as requested in Issue kubeflow#85. - **NullHandler Pattern**: Prevents logging noise by default, users can override with their own configuration - **Configurable Logging**: Support for console, detailed, and JSON output formats - **Structured Logging**: JSON formatter for log aggregation systems (ELK stack, Fluentd, etc.) - **Environment-based Configuration**: Configure logging via environment variables - `kubeflow/trainer/logging/config.py`: Centralized logging configuration - `kubeflow/trainer/logging/formatters.py`: Custom formatters including JSON structured logging - `kubeflow/trainer/logging/logging_test.py`: Comprehensive test suite (14 tests) - **SDK Integration**: Added debug logging to TrainerClient for better observability - **Package Integration**: Exposed logging utilities through kubeflow.trainer.logging - **NullHandler Setup**: Configured at package level to prevent logging noise ✅ **Consistent use of Python's logging library instead of print statements** - All SDK operations now use proper Python logging - NullHandler pattern prevents unwanted output by default ✅ **Support for different logging levels (DEBUG, INFO, WARNING, ERROR)** - Full support for all standard Python logging levels - Configurable via setup_logging() function or environment variables ✅ **Ability for users to configure log formatting and destinations** - Console, detailed, and JSON output formats - File output support - Custom formatter support ✅ **Clear and actionable log messages for key SDK operations** - Debug messages in TrainerClient initialization - Backend selection logging - Job creation and ID logging - **14 comprehensive unit tests** covering all logging functionality - **NullHandler pattern verification** with proper isolation - **SDK integration testing** with real TrainerClient usage - **Application integration examples** with file and console logging - **All tests pass** with proper linting compliance ```python from kubeflow.trainer import TrainerClient, setup_logging setup_logging(level="DEBUG", format_type="console") client = TrainerClient() ``` ```python setup_logging(level="INFO", format_type="json") ``` ```bash export KUBEFLOW_LOG_LEVEL=DEBUG export KUBEFLOW_LOG_FORMAT=json ``` None - this is a pure addition with backward compatibility. No migration required. Existing code will continue to work without changes. Users can optionally configure logging for better observability. Resolves kubeflow#85 Signed-off-by: Jaeyeon Kim <anencore94@gmail.com>

- Export configure_from_env function in logging module's public API to enable environment-based logging configuration - Add cleanup_logging fixture to ensure proper test isolation by storing and restoring logging handlers between tests - Add propagation settings in test_sdk_integration_with_nullhandler to ensure SDK logging works correctly when user configures logging Signed-off-by: Jaeyeon Kim <anencore94@gmail.com>

google-oss-prow bot requested review from kramaranya and szaher September 27, 2025 15:21

google-oss-prow bot added the size/XL label Sep 27, 2025

anencore94 force-pushed the feat/logging-system-v2 branch from 149c8bc to 56896c3 Compare September 27, 2025 15:23

anencore94 changed the title ~~feat(logging): Implement structured and configurable logging support~~ feat: add structured and configurable logging support Sep 27, 2025

anencore94 marked this pull request as draft September 27, 2025 15:24

google-oss-prow bot added the do-not-merge/work-in-progress label Sep 27, 2025

anencore94 changed the title ~~feat: add structured and configurable logging support~~ feat(trainer): add structured and configurable logging support Sep 27, 2025

anencore94 marked this pull request as ready for review September 27, 2025 15:55

google-oss-prow bot removed the do-not-merge/work-in-progress label Sep 27, 2025

anencore94 force-pushed the feat/logging-system-v2 branch 3 times, most recently from 6f99044 to 9642b41 Compare September 27, 2025 16:41

andreyvelich reviewed Oct 16, 2025

View reviewed changes

andreyvelich mentioned this pull request Oct 20, 2025

feat: Add ContainerBackend with Docker and Podman #119

Merged

1 task

anencore94 force-pushed the feat/logging-system-v2 branch from 9642b41 to 38a15b7 Compare December 18, 2025 13:29

anencore94 force-pushed the feat/logging-system-v2 branch from 268ec58 to 564aac4 Compare December 18, 2025 13:48

anencore94 force-pushed the feat/logging-system-v2 branch from 4711bf1 to 2fab08c Compare December 19, 2025 05:16

	logger.debug(
	f"{constants.TRAINJOB_KIND} {self.namespace}/{train_job_name} has been created"
	)

Conversation

anencore94 commented Sep 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Recent Updates

Latest Improvements (Commit: 4711bf1)

Features Implemented

Core Logging Infrastructure

Key Components

Integration Points

Issue #85 Requirements Fulfilled

Testing

Usage Examples

Basic Usage

JSON Logging for Production

Environment Configuration

Code Review

Breaking Changes

Migration Guide

Uh oh!

google-oss-prow bot commented Sep 27, 2025

Uh oh!

coveralls commented Sep 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Pull Request Test Coverage Report for Build 20360630788

Details

💛 - Coveralls

Uh oh!

andreyvelich left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

anencore94 commented Dec 8, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

anencore94 commented Sep 27, 2025 •

edited

Loading

Latest Improvements (Commit: `4711bf1`)

coveralls commented Sep 27, 2025 •

edited

Loading