
docs: enhance Model Serving Management Guide with comprehensive documentation#1414

Draft
ChenYi015 wants to merge 1 commit into kubeflow:master from ChenYi015:doc/update-serving-docs

@ChenYi015
Member

Purpose of this PR

This PR significantly enhances the Model Serving Management Guide (docs/serving/index.md) to provide users with a comprehensive reference for deploying, managing, and monitoring inference services using the Arena CLI.

Proposed changes:

  • Added an overview section explaining Arena's model serving capabilities
  • Documented all supported serving frameworks: TensorFlow Serving, NVIDIA Triton, KServe, KFServing, TensorRT, Custom Serving, Seldon Core, and Distributed Serving
  • Included a quick start guide with example deployment commands
  • Added detailed workflow examples for common patterns (simple serving, multi-version deployment with traffic splitting, GPU serving)
  • Added troubleshooting section covering model loading issues, out-of-memory errors, and inference timeouts
  • Added "Next Steps" section linking to related guides
  • Added "See Also" section with references to CLI, Training, Model Management, and Monitoring guides
  • Improved navigation with comprehensive links to all serving-related documentation

Change Category

  • Bugfix (non-breaking change which fixes an issue)
  • Feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that could affect existing functionality)
  • Documentation update

Rationale

The original serving documentation was minimal (~50 lines) and did not provide users with adequate guidance on model serving operations. This enhancement makes the documentation more user-friendly and helps users understand:

  • What model serving frameworks Arena supports
  • How to deploy models for inference
  • How to manage model versions and traffic routing
  • How to troubleshoot common serving issues

- Add overview section explaining Arena model serving capabilities
- Document all serving frameworks (TensorFlow, Triton, KServe, KFServing, etc.)
- Include workflow examples for common serving patterns
- Add troubleshooting section with common issues and solutions
- Add next steps and related resources sections

Signed-off-by: Yi Chen <github@chenyicn.net>
@google-oss-prow

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from chenyi015. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
