
Add LLM-D Dashboard and improve filtering UX #5

Merged
Harshith-umesh merged 6 commits into openshift-psap:main from Harshith-umesh:llm-d on Nov 20, 2025

Conversation

@Harshith-umesh (Member) commented on Nov 20, 2025

Overview

This PR introduces a new LLM-D Dashboard for analyzing disaggregated LLM inference performance, along with key improvements to existing dashboards.


Key Changes

New LLM-D Dashboard

  • Complete dashboard for LLM-D benchmark results with disaggregated prefill/decode architecture
  • Smart cascading filters (Accelerator → ISL/OSL → Version → Model → TP → Replicas → Pod Counts); see the sketch after this list
  • Performance plots, runtime configs, and filtered data sections
  • Compare with RHAIIS section: Side-by-side comparison of LLM-D vs RHAIIS performance
    • Throughput, latency, and detailed metrics comparison
    • Auto-filters by matching accelerators
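
A rough sketch of the cascading-filter idea follows (not the actual llmd_dashboard.py code; the widget labels and column names are assumptions): each filter draws its options from the data already narrowed by the earlier selections, so downstream choices can never reference rows that were filtered out.

import pandas as pd
import streamlit as st

# Illustrative cascade over the LLM-D results; column names are assumed.
df = pd.read_csv("llmd-dashboard.csv")

accelerators = st.multiselect("1. Accelerator", sorted(df["accelerator"].unique()))
subset = df[df["accelerator"].isin(accelerators)] if accelerators else df

models = st.multiselect("2. Model", sorted(subset["model"].unique()))
subset = subset[subset["model"].isin(models)] if models else subset

tps = st.multiselect("3. TP", sorted(subset["TP"].unique()))
subset = subset[subset["TP"].isin(tps)] if tps else subset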

RHAIIS Dashboard Improvements

  • Reorganized filters with number emojis (1️⃣-5️⃣) for better UX
  • Smart TP filtering: selecting models auto-selects all their TP sizes
  • Added TPOT P95 metric to performance plots

Bug Fixes

  • Fixed cost analysis bug when multiple TP values selected (now groups by model/accelerator/TP)

Files Modified

  • dashboard.py: Filter improvements, cost analysis fix
  • llmd_dashboard.py: New file - Complete LLM-D dashboard implementation

Testing: All make ci-local checks pass

Summary by CodeRabbit

  • New Features
    • Added LLM-D Dashboard view for exploring LLMD benchmark results with interactive filtering
    • Added support for Llama 3.1 405B and Mixtral 8x7B models
    • Added TP (Tensor Parallelism) configuration tracking and smart filtering
    • Added RHAIIS data comparison capabilities with multi-tab performance analysis
    • Added logo rendering in dashboard headers


@coderabbitai bot (Contributor) commented on Nov 20, 2025

Walkthrough

The changes introduce support for an LLM-D (Large Language Model Disaggregated) benchmark dashboard alongside existing dashboards. A new Streamlit-based dashboard module is added with comprehensive filtering and performance visualization capabilities. The main dashboard is updated to conditionally render the LLMD dashboard with logo support and TP-aware filtering enhancements. Docker configuration and dataset processing are extended to support new models.

Changes

  • Docker & Deployment (Dockerfile.openshift): Copies llmd_dashboard.py and llmd-dashboard.csv into the image, adds the assets/ directory, and adds a chmod -R g=u /app permission-adjustment step alongside the existing configuration.
  • Main Dashboard Integration (dashboard.py): Adds optional LLMD dashboard rendering behind a conditional import guarded by the LLMD_AVAILABLE flag; introduces get_logo_base64() to load the Red Hat logo from assets; extends the view options to include the LLM-D Dashboard; adds TP (tensor parallelism) grouping to analyses with column reordering; enhances metric labels, including tpot_p95; and implements smart TP filtering with model-selection tracking to preserve user preferences.
  • LLM-D Dashboard Implementation (llmd_dashboard.py): New module with data ingestion (load_llmd_data, load_rhaiis_data), profile utilities (assign_profile, clean_profile_name), a cascading filter UI (render_llmd_filters), performance visualization (render_performance_plots_section), an RHAIIS comparison section with a three-tab performance view, runtime configuration display, detailed data table rendering, and a main orchestration function (render_llmd_dashboard).
  • Dataset Processing (mlperf-data/original/generate_dataset_summaries.py): Adds process_llama31_405b() and process_mixtral_8x7b() to process model pickle data, extract token metrics, and write CSV summaries; updates main() to invoke both new processors.
  • Model Mappings (mlperf_datacenter.py): Extends the dataset path mappings in load_dataset_for_model and the alternate mapping block to include llama3-1-405b and mixtral-8x7b; removes a minor UI caption.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Dashboard as dashboard.py
    participant LLMD as llmd_dashboard.py
    participant Data as Data<br/>(CSV/Assets)
    
    User->>Dashboard: Open dashboard
    Dashboard->>Dashboard: Check LLMD_AVAILABLE flag
    
    alt LLMD available
        Dashboard->>Dashboard: Load logo via get_logo_base64()
        Data-->>Dashboard: Logo (base64)
        Dashboard->>Dashboard: Display view options including LLM-D
        
        User->>Dashboard: Select LLM-D Dashboard view
        Dashboard->>LLMD: render_llmd_dashboard()
        LLMD->>Data: load_llmd_data()
        Data-->>LLMD: LLMD benchmark DataFrame
        
        LLMD->>LLMD: render_llmd_filters()
        User->>LLMD: Select filters (accelerator, model, TP, etc.)
        LLMD->>LLMD: Apply cascading filters
        
        par Parallel Rendering
            LLMD->>LLMD: render_performance_plots_section()
            LLMD->>LLMD: render_rhaiis_comparison_section()
            LLMD->>LLMD: render_runtime_configs_section()
            LLMD->>LLMD: render_filtered_data_section()
        end
        
        LLMD-->>User: Display comprehensive dashboard
    else LLMD not available
        Dashboard->>Dashboard: Render standard RHAIIS dashboard
        Dashboard-->>User: Display existing dashboard
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

  • dashboard.py: Complex state management for TP-aware filtering, logo rendering fallback logic, and conditional dashboard selection require careful validation of control flow and filter preservation across user interactions.
  • llmd_dashboard.py: Comprehensive new module with dense filtering logic, cascading UI state management, RHAIIS data alignment, and multi-tab visualization warrant thorough review of data transformation pipelines and edge case handling.
  • Integration points: Verify that LLMD_AVAILABLE flag handling gracefully falls back to standard dashboard when LLMD import fails; confirm TP propagation through result dictionaries and label generation is consistent.
  • Data pipeline: Ensure new dataset processing functions (process_llama31_405b, process_mixtral_8x7b) properly handle column name variations and missing input/output fields per existing error-handling patterns.

Poem

🐰✨ A dashboard blooms with disaggregated might,
LLM-D benchmarks rendered pixel-bright,
With logos, filters, TP configurations flow,
Performance plots and RHAIIS to know,
From Dockerfile to Streamlit's delight! 🎨📊

Pre-merge checks

✅ Passed checks (3 passed)
  • Description Check: Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: Passed. The title accurately summarizes the main changes (adding a new LLM-D Dashboard and improving filtering UX), which are the primary features introduced in this PR.
  • Docstring Coverage: Passed. Docstring coverage is 100.00%, which meets the required threshold of 80.00%.


@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (10)
mlperf-data/original/generate_dataset_summaries.py (1)

165-224: Significant code duplication between dataset processing functions.

The process_llama31_405b() and process_mixtral_8x7b() functions are nearly identical, differing only in file paths and print messages. This duplication also exists with the earlier process_deepseek_r1() and process_llama2_70b() functions.

Consider extracting a common function to reduce duplication:

def _process_pickle_dataset(
    dataset_name: str,
    input_path: str,
    output_path: str,
    column_variants: list[tuple[str, str]] | None = None,
):
    """Generic function to process pickle-based datasets.
    
    Args:
        dataset_name: Display name for logging
        input_path: Path to input pickle file
        output_path: Path to output CSV file
        column_variants: List of (input_col, output_col) tuples to try
    """
    print(f"Processing {dataset_name}...")
    
    if not os.path.exists(input_path):
        print(f"  ⚠️  Input file not found: {input_path}")
        return
    
    if column_variants is None:
        column_variants = [
            ("tok_input_len", "tok_ref_output_len"),
            ("input_length", "output_length"),
            ("tok_input_length", "tok_output_length"),
        ]
    
    try:
        import pandas as pd
        
        with open(input_path, "rb") as f:
            data = pickle.load(f)  # nosec B301 - Loading trusted MLCommons dataset
        
        if not isinstance(data, pd.DataFrame):
            print(f"  ⚠️  Unexpected data type: {type(data)}")
            return
        
        # Try each column variant
        for input_col, output_col in column_variants:
            if input_col in data.columns and output_col in data.columns:
                summary = pd.DataFrame({
                    "input_length": data[input_col],
                    "output_length": data[output_col],
                })
                break
        else:
            print(f"  ⚠️  Expected columns not found. Available: {list(data.columns)}")
            return
        
        # Save to CSV
        os.makedirs(os.path.dirname(output_path), exist_ok=True)
        summary.to_csv(output_path, index=False)
        print(f"  ✅ Generated: {output_path} ({len(summary):,} samples)")
    
    except ImportError:
        print("  ⚠️  Requires pandas. Please run with: python3 -m pip install pandas")
    except Exception as e:
        print(f"  ❌ Error: {e}")


def process_llama31_405b():
    """Process llama3.1-405b dataset (pickle format)."""
    _process_pickle_dataset(
        dataset_name="llama3.1-405b",
        input_path="mlperf-data/original/mlperf_llama3.1_405b_dataset_8313_processed_fp16_eval.pkl",
        output_path="mlperf-data/summaries/llama3-1-405b.csv",
    )


def process_mixtral_8x7b():
    """Process mixtral-8x7b dataset (pickle format)."""
    _process_pickle_dataset(
        dataset_name="mixtral-8x7b",
        input_path="mlperf-data/original/09292024_mixtral_15k_mintoken2_v1.pkl",
        output_path="mlperf-data/summaries/mixtral-8x7b.csv",
    )

This refactoring would:

  • Eliminate ~100 lines of duplicated code
  • Make it easier to add new pickle-based datasets
  • Improve maintainability and consistency

Also applies to: 226-283

dashboard.py (4)

34-42: LLM-D import/availability handling is correct; consider logging over print

The conditional import with LLMD_AVAILABLE mirrors the MLPerf pattern and safely degrades when llmd_dashboard is missing. You might optionally swap the print warning for Streamlit logging or st.warning so the message surfaces in the UI or logs rather than stdout.
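
A minimal sketch of that pattern with a UI-visible warning (the flag and function names follow the PR; the message text is illustrative):

import streamlit as st

try:
    from llmd_dashboard import render_llmd_dashboard
    LLMD_AVAILABLE = True
except ImportError:
    LLMD_AVAILABLE = False
    # Surfaces in the app instead of stdout; in practice you might defer this until after page setup.
    st.warning("llmd_dashboard not found; falling back to the standard RHAIIS view.")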


44-55: Logo loading and header rendering look good; minor caching improvement possible

get_logo_base64() correctly uses a relative assets/RedHat-logo.png path and falls back gracefully when missing, and the header wiring renders the logo without impacting non-asset deployments. If you want to avoid rereading the file on every rerun, you could wrap get_logo_base64() in @st.cache_data or cache the value in st.session_state, but it’s not critical given the tiny asset size.

Also applies to: 3626-3637
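
A minimal sketch of the caching suggestion, assuming the loader keeps its current role (the body here is illustrative, not the dashboard.py implementation):

import base64
import os

import streamlit as st

@st.cache_data
def get_logo_base64(path: str = "assets/RedHat-logo.png") -> str | None:
    # Cached per path, so the file is read once per session instead of on every rerun.
    if not os.path.exists(path):
        return None  # graceful fallback when the asset is missing
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")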


3595-3624: View selection + LLMD routing are wired correctly; consider centralizing allowed views

The radio-based view selector, URL param sync, and early-exit blocks for MLPerf and LLM-D follow a clear pattern and won’t interfere with the RHAIIS flow (st.stop() is correctly used). The URL restore logic also accepts LLM-D Dashboard even when MLPerf isn’t available, and the header safely falls back to RHAIIS when a stored view isn’t in view_options. To reduce drift, you could centralize the allowed-view list (used in both the URL guard and the radio options) in one place, but functionally this is solid.

Also applies to: 3711-3713, 3747-3750
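
A sketch of centralizing the allowed views so the URL guard and the radio widget share one list (view labels and the query-param key are assumptions):

import streamlit as st

VIEW_OPTIONS = ["RHAIIS Dashboard", "MLPerf Dashboard", "LLM-D Dashboard"]  # single source of truth

requested = st.query_params.get("view", VIEW_OPTIONS[0])
current = requested if requested in VIEW_OPTIONS else VIEW_OPTIONS[0]  # fall back to RHAIIS
selected = st.radio("View", VIEW_OPTIONS, index=VIEW_OPTIONS.index(current))
st.query_params["view"] = selected  # keep the URL in sync with the selection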


4058-4135: Cascading profile/accelerator/version/model filters and smart TP auto-selection look correct

The new ordering (1️⃣ Accelerator → 2️⃣ Profile → 3️⃣ Version → 4️⃣ Model → 5️⃣ TP) with profile-aware subsetting and preservation of prior selections via filter_change_key is coherent and avoids stale values. The “smart TP filtering” logic—tracking previous_models_for_tp_tracking and auto-selecting all TP sizes when the selected model set changes or “Select All Models” is enabled, while otherwise preserving manual TP choices—implements the desired UX. Note that list equality is order-sensitive, so reordering the same models can trigger a “models_changed” refresh and reselect all TPs, which is acceptable but could be relaxed by comparing sets if you want fewer auto-resets. Reset/Clear correctly reset the tracking key.

Also applies to: 4173-4176, 4185-4188, 4235-4302, 4310-4333
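
A sketch of the order-insensitive comparison mentioned above (the session-state key follows the review; the rest is illustrative):

import streamlit as st

def models_changed(selected_models: list[str]) -> bool:
    # Compare as sets so reordering the same models does not retrigger TP auto-selection.
    prev = set(st.session_state.get("previous_models_for_tp_tracking", []))
    changed = set(selected_models) != prev
    st.session_state["previous_models_for_tp_tracking"] = list(selected_models)
    return changed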

llmd_dashboard.py (5)

14-42: Profile helpers are consistent with the main dashboard; consider de-duplicating

assign_profile and clean_profile_name mirror the implementations in dashboard.py, which keeps behavior consistent between RHAIIS and LLM-D views. To avoid future drift, you could move these helpers into a shared utility module and import them in both places.


107-723: LLM-D cascading filters are coherent; baseline-state and flags could be slightly tightened

The multi-column filter cascade (Accelerator → Profile → Version → Model → TP → Replicas → Prefill/Decode pods) is consistent with the RHAIIS dashboard and correctly scopes each subsequent filter to prior selections. The “Select All Models” checkbox shared between Filters 4–8 and the logic that auto-selects all TP/replica/pod values when models are selected provide a good UX for quickly broadening slices.

A few small cleanups you may consider:

  • llmd_filters_were_cleared and llmd_reset_to_defaults are set but never read; you can drop them, or wire them into default-selection logic if you want semantics identical to the RHAIIS filters.
  • The baseline block uses if not st.session_state.get("llmd_baseline_accelerators"):; if you ever have a non-empty dataset with no accelerators (unlikely), this will keep reinitializing baselines on every run. Using a presence check (if "llmd_baseline_accelerators" not in st.session_state:) would avoid that edge case.

Otherwise the state management (via llmd_filter_change_key and distinct widget keys) looks correct.
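
A sketch of the presence-based initialization suggested above (df stands for the loaded LLM-D DataFrame; the column name is an assumption):

import streamlit as st

def init_accelerator_baseline(df) -> None:
    # Runs exactly once per session, even if the computed list happens to be empty.
    if "llmd_baseline_accelerators" not in st.session_state:
        st.session_state["llmd_baseline_accelerators"] = sorted(df["accelerator"].unique())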


726-749: RHAIIS data loader for comparison is straightforward; consider mirroring robustness from main loader

load_rhaiis_data mirrors the minimal logic needed for comparison (CSV read, string trim, profile assignment). For consistency with the main dashboard you could optionally:

  • Add FileNotFoundError handling with a clearer message, and
  • Reuse the same assign_profile/error handling patterns as in load_data in dashboard.py.

Not required, but it would keep behavior aligned.
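
A sketch of the suggested robustness (load_rhaiis_data exists in llmd_dashboard.py; the CSV path, return type, and error message here are assumptions):

import pandas as pd
import streamlit as st

def load_rhaiis_data(path: str = "rhaiis-dashboard.csv") -> pd.DataFrame | None:
    try:
        df = pd.read_csv(path)
    except FileNotFoundError:
        st.error(f"RHAIIS data not found at {path}; the comparison section will be skipped.")
        return None
    df.columns = df.columns.str.strip()  # same string-trim step as the current loader
    return df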


1184-1285: LLM-D performance plots mirror RHAIIS behavior; minor lint/UX improvements possible

The LLM-D performance plots section correctly:

  • Builds a concise run_identifier including accelerator, short model, version, TP, replicas, and pod counts, and
  • Shares the same x/y metric set and unit logic (ttft_p95_s in seconds, ITL/TPOT in ms, request latency in seconds) as the main dashboard.

Two optional tweaks:

  • caption_col1, caption_col2 = st.columns([3, 1]) only uses caption_col2; rename the first to _ to satisfy linters.
  • If you want behavior identical to the RHAIIS plots, you could use the keep_expander_open pattern from dashboard.py (via on_change) so the expander stays open after axis changes.

Neither affects correctness.


1669-1695: Top-level LLM-D dashboard orchestration is correct; unused filter_selections can be dropped

render_llmd_dashboard cleanly wires together loading, filtering, and the four sections, with early exits when data is missing or filters yield no runs. The captured filter_selections dict is currently unused; you can either prefix it with _ or surface it in a small “Current filters” summary if you want it to be user-visible.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a9ba8e0 and 739352b.

⛔ Files ignored due to path filters (3)
  • assets/RedHat-logo.png is excluded by !**/*.png
  • mlperf-data/summaries/llama3-1-405b.csv is excluded by !**/*.csv
  • mlperf-data/summaries/mixtral-8x7b.csv is excluded by !**/*.csv
📒 Files selected for processing (5)
  • Dockerfile.openshift (1 hunks)
  • dashboard.py (22 hunks)
  • llmd_dashboard.py (1 hunks)
  • mlperf-data/original/generate_dataset_summaries.py (2 hunks)
  • mlperf_datacenter.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
dashboard.py (1)
llmd_dashboard.py (2)
  • render_llmd_dashboard (1669-1695)
  • clean_profile_name (34-41)
🪛 Ruff (0.14.5)
mlperf-data/original/generate_dataset_summaries.py

182-182: pickle and modules that wrap it can be unsafe when used to deserialize untrusted data, possible security issue

(S301)


222-222: Do not catch blind exception: Exception

(BLE001)


241-241: pickle and modules that wrap it can be unsafe when used to deserialize untrusted data, possible security issue

(S301)


281-281: Do not catch blind exception: Exception

(BLE001)

llmd_dashboard.py

98-98: Consider moving this statement to an else block

(TRY300)


102-102: Do not catch blind exception: Exception

(BLE001)


103-103: Use explicit conversion flag

Replace with conversion flag

(RUF010)


745-745: Consider moving this statement to an else block

(TRY300)


746-746: Do not catch blind exception: Exception

(BLE001)


747-747: Use explicit conversion flag

Replace with conversion flag

(RUF010)


816-816: String contains ambiguous ℹ (INFORMATION SOURCE). Did you mean i (LATIN SMALL LETTER I)?

(RUF001)


1287-1287: Unpacked variable caption_col1 is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


1401-1401: String contains ambiguous – (EN DASH). Did you mean - (HYPHEN-MINUS)?

(RUF001)


1683-1683: Unpacked variable filter_selections is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)

🔇 Additional comments (11)
Dockerfile.openshift (2)

19-21: LGTM - New LLMD dashboard assets staged correctly.

The new files and assets directory are properly staged for the LLMD dashboard feature.

Also applies to: 26-27


31-31: chmod command is appropriate for OpenShift—no excessive permissions issues found.

Analysis of the Dockerfile confirms that all files in /app (Python scripts, CSV data files, mlperf-data/, and assets/) are application-level code and data meant to be readable by the container process. No sensitive files, configs with restricted permissions, or credentials are present in /app. The chmod -R g=u /app combined with chgrp -R 0 /app is standard OpenShift practice: it ensures the group (GID 0) has permissions matching the user, which is necessary because OpenShift runs containers with arbitrary UIDs (1001 here) but fixed GIDs. This is required for Streamlit framework access and OpenShift health probe functionality. The permission grant is not excessive—it's necessary and appropriate.

mlperf-data/original/generate_dataset_summaries.py (1)

295-297: LGTM - New dataset processors properly integrated.

The calls to process_llama31_405b() and process_mixtral_8x7b() are correctly added to the main processing pipeline.

mlperf_datacenter.py (1)

1288-1291: Both dataset summary files are present and contain data (llama3-1-405b.csv: 8314 lines, mixtral-8x7b.csv: 15001 lines). The new model mappings are valid.

dashboard.py (4)

181-209: Adding TPOT P95 metric and unit handling is consistent with existing metrics

The new "Time Per Output Token P95 (Token generation time)" → "tpot_p95" option and the (ms) suffix for both itl_p95 and tpot_p95 keep axis semantics clear and aligned with how these columns are treated elsewhere (filtered-data table). This is a clean extension of the performance plots.
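
An illustrative sketch of the label/unit mapping described above (the actual structure in dashboard.py may differ):

# Maps display names to DataFrame columns; ITL and TPOT P95 get a millisecond suffix.
METRIC_OPTIONS = {
    "Inter-Token Latency P95": "itl_p95",
    "Time Per Output Token P95 (Token generation time)": "tpot_p95",
}

def axis_label(display_name: str) -> str:
    column = METRIC_OPTIONS[display_name]
    suffix = " (ms)" if column in ("itl_p95", "tpot_p95") else ""
    return display_name + suffix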


3358-3365: Grafana link column reordering is safe and improves usability

Reordering grafana_metrics_link to sit immediately after TP by rebuilding the column list is safe (guarded on both columns existing) and makes the runtime table easier to scan. No behavioral risks here.


2675-2712: Cost analysis now groups by (model, accelerator, TP); behavior matches intended fix

The “first pass” SLO analysis groups by ["model", "accelerator", "TP"] and carries TP into the result dict, so subsequent cost calculations and rankings operate per model/accelerator/TP combination as desired. The new model_tp_label = f"{model} (TP={tp})" and its use as the y-axis in TTMT/CPMT bar charts ensures multiple TP configs for the same model don’t collapse into a single bar. This cleanly addresses the previous grouping ambiguity when multiple TP values were selected.

Also applies to: 2741-2815, 2817-2820, 2827-2837, 2855-2865
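
A sketch of the per-(model, accelerator, TP) grouping and labeling described above (the metric column and aggregation are assumptions; model_tp_label follows the review):

import pandas as pd

def summarize_by_config(df: pd.DataFrame) -> pd.DataFrame:
    grouped = df.groupby(["model", "accelerator", "TP"], as_index=False).agg(
        best_throughput=("output_tokens_per_sec", "max"),  # assumed metric column
    )
    # One bar per TP config, so multiple TP values for the same model no longer collapse.
    grouped["model_tp_label"] = grouped["model"] + " (TP=" + grouped["TP"].astype(str) + ")"
    return grouped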


4515-4517: Section separator before plots is harmless and improves layout

The added horizontal rule before the main analysis sections (render_performance_plots_section, Pareto, etc.) is purely cosmetic and doesn’t affect logic; it gives a clearer visual separation from the filters.

llmd_dashboard.py (3)

751-1182: LLM-D vs RHAIIS comparison logic is sound and scoped to matching configurations

The comparison section:

  • Restricts LLM-D to single-replica runs (replicas == 1),
  • Auto-aligns RHAIIS data to the same accelerators as the filtered LLM-D slice,
  • Filters on a common (model, RHAIIS version, profile) triple, and
  • Builds clear config_labels (including TP and pod layout for LLM-D, TP for RHAIIS).

The three-tab view (throughput vs concurrency, TTFT/TPOT latency, and detailed stats/table) uses those labels consistently and won’t accidentally cross-mix configurations. Assumptions about columns (replicas, prefill_pod_count, decode_pod_count, tpot_median, etc.) are reasonable given the datasets you’re targeting.
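
A sketch of the alignment rules described above (column names follow the review; the function itself is illustrative):

import pandas as pd

def align_rhaiis_to_llmd(rhaiis_df: pd.DataFrame, llmd_df: pd.DataFrame) -> pd.DataFrame:
    # Restrict LLM-D to single-replica runs, then keep only RHAIIS rows on the same accelerators.
    llmd_single = llmd_df[llmd_df["replicas"] == 1]
    return rhaiis_df[rhaiis_df["accelerator"].isin(llmd_single["accelerator"].unique())]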


1292-1414: Runtime configs section for LLM-D is consistent and correctly de-duplicates

Dropping duplicates on (model, accelerator, version, replicas, prefill_pod_count, decode_pod_count) and then rendering a sized st.dataframe plus a selector to show raw runtime_args is a good match to the RHAIIS view while exposing the additional disaggregation dimensions. Logic and column renaming look correct.


1425-1666: Filtered data table for LLM-D is comprehensive and aligned with metric semantics

The filtered-data table:

  • Inserts a Row # index,
  • Exposes all LLM-D–specific topology columns (TP/DP/EP, replicas, prefill/decode pods, router_config), and
  • Reuses the same latency/throughput column semantics and help text as the RHAIIS dashboard.

No functional issues here; the configuration should make the LLM-D dataset as explorable as the consolidated RHAIIS data.

Harshith-umesh merged commit 67033ba into openshift-psap:main on Nov 20, 2025; 5 checks passed.