Add LLM-D Dashboard and improve filtering UX #5
Harshith-umesh merged 6 commits into openshift-psap:main
Conversation
Walkthrough

The changes introduce support for an LLM-D (Large Language Model Disaggregated) benchmark dashboard alongside the existing dashboards. A new Streamlit-based dashboard module is added with comprehensive filtering and performance visualization capabilities. The main dashboard is updated to conditionally render the LLM-D dashboard with logo support and TP-aware filtering enhancements. Docker configuration and dataset processing are extended to support new models.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Dashboard as dashboard.py
    participant LLMD as llmd_dashboard.py
    participant Data as Data<br/>(CSV/Assets)

    User->>Dashboard: Open dashboard
    Dashboard->>Dashboard: Check LLMD_AVAILABLE flag

    alt LLMD available
        Dashboard->>Dashboard: Load logo via get_logo_base64()
        Data-->>Dashboard: Logo (base64)
        Dashboard->>Dashboard: Display view options including LLM-D
        User->>Dashboard: Select LLM-D Dashboard view
        Dashboard->>LLMD: render_llmd_dashboard()
        LLMD->>Data: load_llmd_data()
        Data-->>LLMD: LLMD benchmark DataFrame
        LLMD->>LLMD: render_llmd_filters()
        User->>LLMD: Select filters (accelerator, model, TP, etc.)
        LLMD->>LLMD: Apply cascading filters
        par Parallel Rendering
            LLMD->>LLMD: render_performance_plots_section()
            LLMD->>LLMD: render_rhaiis_comparison_section()
            LLMD->>LLMD: render_runtime_configs_section()
            LLMD->>LLMD: render_filtered_data_section()
        end
        LLMD-->>User: Display comprehensive dashboard
    else LLMD not available
        Dashboard->>Dashboard: Render standard RHAIIS dashboard
        Dashboard-->>User: Display existing dashboard
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Actionable comments posted: 1
🧹 Nitpick comments (10)
mlperf-data/original/generate_dataset_summaries.py (1)
165-224: Significant code duplication between dataset processing functions.

The `process_llama31_405b()` and `process_mixtral_8x7b()` functions are nearly identical, differing only in file paths and print messages. This duplication also exists with the earlier `process_deepseek_r1()` and `process_llama2_70b()` functions. Consider extracting a common function to reduce duplication:
```python
def _process_pickle_dataset(
    dataset_name: str,
    input_path: str,
    output_path: str,
    column_variants: list[tuple[str, str]] = None,
):
    """Generic function to process pickle-based datasets.

    Args:
        dataset_name: Display name for logging
        input_path: Path to input pickle file
        output_path: Path to output CSV file
        column_variants: List of (input_col, output_col) tuples to try
    """
    print(f"Processing {dataset_name}...")

    if not os.path.exists(input_path):
        print(f"  ⚠️ Input file not found: {input_path}")
        return

    if column_variants is None:
        column_variants = [
            ("tok_input_len", "tok_ref_output_len"),
            ("input_length", "output_length"),
            ("tok_input_length", "tok_output_length"),
        ]

    try:
        import pandas as pd

        with open(input_path, "rb") as f:
            data = pickle.load(f)  # nosec B301 - Loading trusted MLCommons dataset

        if not isinstance(data, pd.DataFrame):
            print(f"  ⚠️ Unexpected data type: {type(data)}")
            return

        # Try each column variant
        for input_col, output_col in column_variants:
            if input_col in data.columns and output_col in data.columns:
                summary = pd.DataFrame({
                    "input_length": data[input_col],
                    "output_length": data[output_col],
                })
                break
        else:
            print(f"  ⚠️ Expected columns not found. Available: {list(data.columns)}")
            return

        # Save to CSV
        os.makedirs(os.path.dirname(output_path), exist_ok=True)
        summary.to_csv(output_path, index=False)
        print(f"  ✅ Generated: {output_path} ({len(summary):,} samples)")

    except ImportError:
        print("  ⚠️ Requires pandas. Please run with: python3 -m pip install pandas")
    except Exception as e:
        print(f"  ❌ Error: {e}")


def process_llama31_405b():
    """Process llama3.1-405b dataset (pickle format)."""
    _process_pickle_dataset(
        dataset_name="llama3.1-405b",
        input_path="mlperf-data/original/mlperf_llama3.1_405b_dataset_8313_processed_fp16_eval.pkl",
        output_path="mlperf-data/summaries/llama3-1-405b.csv",
    )


def process_mixtral_8x7b():
    """Process mixtral-8x7b dataset (pickle format)."""
    _process_pickle_dataset(
        dataset_name="mixtral-8x7b",
        input_path="mlperf-data/original/09292024_mixtral_15k_mintoken2_v1.pkl",
        output_path="mlperf-data/summaries/mixtral-8x7b.csv",
    )
```

This refactoring would:
- Eliminate ~100 lines of duplicated code
- Make it easier to add new pickle-based datasets
- Improve maintainability and consistency
Also applies to: 226-283
dashboard.py (4)
34-42: LLM-D import/availability handling is correct; consider logging over print

The conditional import with `LLMD_AVAILABLE` mirrors the MLPerf pattern and safely degrades when `llmd_dashboard` is missing. You might optionally swap the print for `st.warning` so the message surfaces in the UI or logs rather than stdout.
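A minimal sketch of that variant, assuming the guarded import keeps the `LLMD_AVAILABLE` / `render_llmd_dashboard` names used elsewhere in the PR:

```python
import streamlit as st

# Guarded import: degrade gracefully when llmd_dashboard.py is absent.
# Using st.warning (rather than print) surfaces the message in the UI.
try:
    from llmd_dashboard import render_llmd_dashboard  # noqa: F401

    LLMD_AVAILABLE = True
except ImportError as exc:
    LLMD_AVAILABLE = False
    st.warning(f"LLM-D dashboard unavailable: {exc}")
```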
44-55: Logo loading and header rendering look good; minor caching improvement possible

`get_logo_base64()` correctly uses a relative `assets/RedHat-logo.png` path and falls back gracefully when missing, and the header wiring renders the logo without impacting non-asset deployments. If you want to avoid rereading the file on every rerun, you could wrap `get_logo_base64()` in `@st.cache_data` or cache the value in `st.session_state`, but it's not critical given the tiny asset size.

Also applies to: 3626-3637
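If you do go the caching route, a sketch of the cached helper, assuming it reads the PNG and returns it as a base64 string as described above:

```python
import base64
import os

import streamlit as st


@st.cache_data  # cache so the PNG is read once per session, not on every rerun
def get_logo_base64(path: str = "assets/RedHat-logo.png") -> str | None:
    """Return the logo as a base64 string, or None when the asset is missing."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")
```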
3595-3624: View selection + LLMD routing are wired correctly; consider centralizing allowed views

The radio-based view selector, URL param sync, and early-exit blocks for MLPerf and LLM-D follow a clear pattern and won't interfere with the RHAIIS flow (`st.stop()` is correctly used). The URL restore logic also accepts `LLM-D Dashboard` even when MLPerf isn't available, and the header safely falls back to RHAIIS when a stored view isn't in `view_options`. To reduce drift, you could centralize the allowed-view list (used in both the URL guard and the radio options) in one place, but functionally this is solid.

Also applies to: 3711-3713, 3747-3750
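One way to centralize the allowed views, sketched under the assumption that availability flags and a `view` query parameter are used as described (flag values, widget labels, and keys below are illustrative):

```python
import streamlit as st

# Flags come from the guarded imports elsewhere in dashboard.py (placeholders here).
MLPERF_AVAILABLE = True
LLMD_AVAILABLE = True

# Single source of truth for the allowed views.
VIEW_OPTIONS = ["RHAIIS Dashboard"]
if MLPERF_AVAILABLE:
    VIEW_OPTIONS.append("MLPerf Dashboard")
if LLMD_AVAILABLE:
    VIEW_OPTIONS.append("LLM-D Dashboard")

# URL restore: fall back to RHAIIS for unknown or stale stored views.
requested = st.query_params.get("view", VIEW_OPTIONS[0])
if requested not in VIEW_OPTIONS:
    requested = VIEW_OPTIONS[0]

selected_view = st.sidebar.radio(
    "View", VIEW_OPTIONS, index=VIEW_OPTIONS.index(requested)
)
st.query_params["view"] = selected_view
```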
4058-4135: Cascading profile/accelerator/version/model filters and smart TP auto-selection look correct

The new ordering (1️⃣ Accelerator → 2️⃣ Profile → 3️⃣ Version → 4️⃣ Model → 5️⃣ TP) with profile-aware subsetting and preservation of prior selections via `filter_change_key` is coherent and avoids stale values. The "smart TP filtering" logic (tracking `previous_models_for_tp_tracking` and auto-selecting all TP sizes when the selected model set changes or "Select All Models" is enabled, while otherwise preserving manual TP choices) implements the desired UX. Note that list equality is order-sensitive, so reordering the same models can trigger a "models_changed" refresh and reselect all TPs, which is acceptable but could be relaxed by comparing sets if you want fewer auto-resets. Reset/Clear correctly reset the tracking key.

Also applies to: 4173-4176, 4185-4188, 4235-4302, 4310-4333
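A sketch of the set-based comparison, reusing the `previous_models_for_tp_tracking` key mentioned above; the other names and placeholder values are illustrative:

```python
import streamlit as st

# Illustrative inputs; in the dashboard these come from the model multiselect,
# the "Select All Models" checkbox, and the TP values present in the filtered data.
selected_models = ["llama-3.1-8b", "mixtral-8x7b"]
select_all_models = False
available_tp_sizes = [1, 2, 4, 8]

# Treat the selection as a set so reordering the same models is not a "change".
previous = set(st.session_state.get("previous_models_for_tp_tracking", []))
models_changed = set(selected_models) != previous

if models_changed or select_all_models:
    # Auto-select every TP size available for the new model selection.
    st.session_state["selected_tp_sizes"] = sorted(available_tp_sizes)

st.session_state["previous_models_for_tp_tracking"] = list(selected_models)
```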
llmd_dashboard.py (5)
14-42: Profile helpers are consistent with the main dashboard; consider de-duplicating

`assign_profile` and `clean_profile_name` mirror the implementations in `dashboard.py`, which keeps behavior consistent between RHAIIS and LLM-D views. To avoid future drift, you could move these helpers into a shared utility module and import them in both places.
107-723: LLM-D cascading filters are coherent; baseline-state and flags could be slightly tightened

The multi-column filter cascade (Accelerator → Profile → Version → Model → TP → Replicas → Prefill/Decode pods) is consistent with the RHAIIS dashboard and correctly scopes each subsequent filter to prior selections. The "Select All Models" checkbox shared between Filters 4–8 and the logic that auto-selects all TP/replica/pod values when models are selected provide a good UX for quickly broadening slices.

A few small cleanups you may consider:

- `llmd_filters_were_cleared` and `llmd_reset_to_defaults` are set but never read; you can drop them, or wire them into default-selection logic if you want semantics identical to the RHAIIS filters.
- The baseline block uses `if not st.session_state.get("llmd_baseline_accelerators"):`; if you ever have a non-empty dataset with no accelerators (unlikely), this will keep reinitializing baselines on every run. Using a presence check (`if "llmd_baseline_accelerators" not in st.session_state:`) would avoid that edge case.

Otherwise the state management (via `llmd_filter_change_key` and distinct widget keys) looks correct.
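A sketch of the presence-check variant; the `accelerator` column name and placeholder data are assumptions for illustration:

```python
import pandas as pd
import streamlit as st

df = pd.DataFrame({"accelerator": ["H100", "MI300X", None]})  # placeholder data

# Initialize the baseline exactly once; an empty result no longer re-triggers setup.
if "llmd_baseline_accelerators" not in st.session_state:
    st.session_state["llmd_baseline_accelerators"] = sorted(
        df["accelerator"].dropna().unique()
    )
```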
726-749: RHAIIS data loader for comparison is straightforward; consider mirroring robustness from main loader

`load_rhaiis_data` mirrors the minimal logic needed for comparison (CSV read, string trim, profile assignment). For consistency with the main dashboard you could optionally:

- Add `FileNotFoundError` handling with a clearer message, and
- Reuse the same `assign_profile`/error handling patterns as in `load_data` in `dashboard.py`.

Not required, but it would keep behavior aligned.
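A sketch of the hardened loader, assuming an illustrative CSV path (the real path, and the shared `assign_profile` step, live in the PR):

```python
import pandas as pd
import streamlit as st


def load_rhaiis_data(path: str = "rhaiis_consolidated.csv") -> pd.DataFrame | None:
    """Load RHAIIS comparison data, or return None with a clear message."""
    try:
        df = pd.read_csv(path)
    except FileNotFoundError:
        st.info(f"RHAIIS comparison data not found at {path}; skipping comparison.")
        return None

    # Trim whitespace on string columns, as the main loader does.
    for col in df.select_dtypes(include="object").columns:
        df[col] = df[col].str.strip()
    return df
```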
1184-1285: LLM-D performance plots mirror RHAIIS behavior; minor lint/UX improvements possible

The LLM-D performance plots section correctly:

- Builds a concise `run_identifier` including accelerator, short model, version, TP, replicas, and pod counts, and
- Shares the same x/y metric set and unit logic (`ttft_p95_s` in seconds, ITL/TPOT in ms, request latency in seconds) as the main dashboard.

Two optional tweaks:

- `caption_col1, caption_col2 = st.columns([3, 1])` only uses `caption_col2`; rename the first to `_` to satisfy linters.
- If you want behavior identical to the RHAIIS plots, you could use the `keep_expander_open` pattern from `dashboard.py` (via `on_change`) so the expander stays open after axis changes.

Neither affects correctness.
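Both tweaks sketched together; the expander title, selectbox options, and session-state keys are illustrative rather than the PR's exact names:

```python
import streamlit as st

# 1) Discard the unused first column to satisfy the RUF059 lint.
_, caption_col2 = st.columns([3, 1])


# 2) Keep the expander open after a widget interaction triggers a rerun.
def _keep_expander_open() -> None:
    st.session_state["llmd_plots_expander_open"] = True


with st.expander(
    "Performance plots",
    expanded=st.session_state.get("llmd_plots_expander_open", False),
):
    x_metric = st.selectbox(
        "X-axis metric",
        ["Concurrency", "Output tokens/sec"],  # illustrative options
        on_change=_keep_expander_open,
        key="llmd_plots_x_metric",
    )
```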
1669-1695: Top-level LLM-D dashboard orchestration is correct; unused filter_selections can be dropped

`render_llmd_dashboard` cleanly wires together loading, filtering, and the four sections, with early exits when data is missing or filters yield no runs. The captured `filter_selections` dict is currently unused; you can either prefix it with `_` or surface it in a small "Current filters" summary if you want it to be user-visible.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (3)
- `assets/RedHat-logo.png` is excluded by `!**/*.png`
- `mlperf-data/summaries/llama3-1-405b.csv` is excluded by `!**/*.csv`
- `mlperf-data/summaries/mixtral-8x7b.csv` is excluded by `!**/*.csv`
📒 Files selected for processing (5)
- `Dockerfile.openshift` (1 hunks)
- `dashboard.py` (22 hunks)
- `llmd_dashboard.py` (1 hunks)
- `mlperf-data/original/generate_dataset_summaries.py` (2 hunks)
- `mlperf_datacenter.py` (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
dashboard.py (1)
llmd_dashboard.py (2)
- `render_llmd_dashboard` (1669-1695)
- `clean_profile_name` (34-41)
🪛 Ruff (0.14.5)
mlperf-data/original/generate_dataset_summaries.py
182-182: pickle and modules that wrap it can be unsafe when used to deserialize untrusted data, possible security issue
(S301)
222-222: Do not catch blind exception: Exception
(BLE001)
241-241: pickle and modules that wrap it can be unsafe when used to deserialize untrusted data, possible security issue
(S301)
281-281: Do not catch blind exception: Exception
(BLE001)
llmd_dashboard.py
98-98: Consider moving this statement to an else block
(TRY300)
102-102: Do not catch blind exception: Exception
(BLE001)
103-103: Use explicit conversion flag
Replace with conversion flag
(RUF010)
745-745: Consider moving this statement to an else block
(TRY300)
746-746: Do not catch blind exception: Exception
(BLE001)
747-747: Use explicit conversion flag
Replace with conversion flag
(RUF010)
816-816: String contains ambiguous ℹ (INFORMATION SOURCE). Did you mean i (LATIN SMALL LETTER I)?
(RUF001)
1287-1287: Unpacked variable caption_col1 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
1401-1401: String contains ambiguous – (EN DASH). Did you mean - (HYPHEN-MINUS)?
(RUF001)
1683-1683: Unpacked variable filter_selections is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
🔇 Additional comments (11)
Dockerfile.openshift (2)
19-21: LGTM - New LLMD dashboard assets staged correctly.

The new files and assets directory are properly staged for the LLMD dashboard feature.
Also applies to: 26-27
31-31: chmod command is appropriate for OpenShift; no excessive permissions issues found.

Analysis of the Dockerfile confirms that all files in `/app` (Python scripts, CSV data files, `mlperf-data/`, and `assets/`) are application-level code and data meant to be readable by the container process. No sensitive files, configs with restricted permissions, or credentials are present in `/app`. The `chmod -R g=u /app` combined with `chgrp -R 0 /app` is standard OpenShift practice: it ensures the group (GID 0) has permissions matching the user, which is necessary because OpenShift runs containers with arbitrary UIDs (1001 here) but fixed GIDs. This is required for Streamlit framework access and OpenShift health probe functionality. The permission grant is not excessive; it's necessary and appropriate.

mlperf-data/original/generate_dataset_summaries.py (1)
295-297: LGTM - New dataset processors properly integrated.

The calls to `process_llama31_405b()` and `process_mixtral_8x7b()` are correctly added to the main processing pipeline.

mlperf_datacenter.py (1)
1288-1291: Both dataset summary files are present and contain data (llama3-1-405b.csv: 8314 lines, mixtral-8x7b.csv: 15001 lines). The new model mappings are valid.

dashboard.py (4)
181-209: Adding TPOT P95 metric and unit handling is consistent with existing metrics

The new "Time Per Output Token P95 (Token generation time)" → `tpot_p95` option and the `(ms)` suffix for both `itl_p95` and `tpot_p95` keep axis semantics clear and aligned with how these columns are treated elsewhere (filtered-data table). This is a clean extension of the performance plots.
3358-3365: Grafana link column reordering is safe and improves usability

Reordering `grafana_metrics_link` to sit immediately after `TP` by rebuilding the column list is safe (guarded on both columns existing) and makes the runtime table easier to scan. No behavioral risks here.
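The reorder amounts to rebuilding the column list, roughly like this sketch (guarded on both columns existing; not the PR's exact code):

```python
import pandas as pd


def move_grafana_link_after_tp(df: pd.DataFrame) -> pd.DataFrame:
    """Place grafana_metrics_link immediately after the TP column, if both exist."""
    if "grafana_metrics_link" not in df.columns or "TP" not in df.columns:
        return df
    cols = [c for c in df.columns if c != "grafana_metrics_link"]
    cols.insert(cols.index("TP") + 1, "grafana_metrics_link")
    return df[cols]
```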
2675-2712: Cost analysis now groups by (model, accelerator, TP); behavior matches intended fix

The "first pass" SLO analysis groups by `["model", "accelerator", "TP"]` and carries `TP` into the result dict, so subsequent cost calculations and rankings operate per model/accelerator/TP combination as desired. The new `model_tp_label = f"{model} (TP={tp})"` and its use as the y-axis in TTMT/CPMT bar charts ensures multiple TP configs for the same model don't collapse into a single bar. This cleanly addresses the previous grouping ambiguity when multiple TP values were selected.

Also applies to: 2741-2815, 2817-2820, 2827-2837, 2855-2865
4515-4517: Section separator before plots is harmless and improves layout

The added horizontal rule before the main analysis sections (`render_performance_plots_section`, Pareto, etc.) is purely cosmetic and doesn't affect logic; it gives a clearer visual separation from the filters.

llmd_dashboard.py (3)
751-1182: LLM-D vs RHAIIS comparison logic is sound and scoped to matching configurations

The comparison section:

- Restricts LLM-D to single-replica runs (`replicas == 1`),
- Auto-aligns RHAIIS data to the same accelerators as the filtered LLM-D slice,
- Filters on a common `(model, RHAIIS version, profile)` triple, and
- Builds clear `config_label`s (including TP and pod layout for LLM-D, TP for RHAIIS).

The three-tab view (throughput vs concurrency, TTFT/TPOT latency, and detailed stats/table) uses those labels consistently and won't accidentally cross-mix configurations. Assumptions about columns (`replicas`, `prefill_pod_count`, `decode_pod_count`, `tpot_median`, etc.) are reasonable given the datasets you're targeting.
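The scoping described above boils down to a few pandas operations; a sketch assuming `replicas`, `accelerator`, `model`, `version`, and `profile` columns exist in both frames (column names per the review, not verified against the PR):

```python
import pandas as pd


def scope_comparison(llmd_df: pd.DataFrame, rhaiis_df: pd.DataFrame):
    """Return LLM-D and RHAIIS slices restricted to comparable configurations."""
    # Only single-replica LLM-D runs are comparable to a single RHAIIS deployment.
    llmd = llmd_df[llmd_df["replicas"] == 1]

    # Align RHAIIS to the accelerators present in the filtered LLM-D slice.
    rhaiis = rhaiis_df[rhaiis_df["accelerator"].isin(llmd["accelerator"].unique())]

    # Keep only (model, version, profile) triples that exist on both sides.
    keys = ["model", "version", "profile"]
    common = pd.merge(
        llmd[keys].drop_duplicates(), rhaiis[keys].drop_duplicates(), on=keys
    )
    llmd = llmd.merge(common, on=keys)
    rhaiis = rhaiis.merge(common, on=keys)
    return llmd, rhaiis
```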
1292-1414: Runtime configs section for LLM-D is consistent and correctly de-duplicates

Dropping duplicates on `(model, accelerator, version, replicas, prefill_pod_count, decode_pod_count)` and then rendering a sized `st.dataframe` plus a selector to show raw `runtime_args` is a good match to the RHAIIS view while exposing the additional disaggregation dimensions. Logic and column renaming look correct.
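A sketch of the de-duplication plus raw-args selector; column names follow the review, while the widget labels and rendering details are illustrative:

```python
import pandas as pd
import streamlit as st

CONFIG_COLS = [
    "model", "accelerator", "version",
    "replicas", "prefill_pod_count", "decode_pod_count",
]


def render_runtime_configs(df: pd.DataFrame) -> None:
    # One row per unique disaggregated configuration.
    configs = df.drop_duplicates(subset=CONFIG_COLS).reset_index(drop=True)
    st.dataframe(configs[CONFIG_COLS], use_container_width=True)

    # Let the user pick a row to inspect the raw runtime_args string.
    choice = st.selectbox("Show runtime_args for row", configs.index.tolist())
    st.code(str(configs.loc[choice, "runtime_args"]))
```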
1425-1666: Filtered data table for LLM-D is comprehensive and aligned with metric semantics

The filtered-data table:

- Inserts a `Row #` index,
- Exposes all LLM-D-specific topology columns (TP/DP/EP, replicas, prefill/decode pods, router_config), and
- Reuses the same latency/throughput column semantics and help text as the RHAIIS dashboard.

No functional issues here; the configuration should make the LLM-D dataset as explorable as the consolidated RHAIIS data.
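The `Row #` column is just an explicit 1-based index inserted before display; a sketch (function name and dataframe options are illustrative):

```python
import pandas as pd
import streamlit as st


def render_filtered_table(filtered_df: pd.DataFrame) -> None:
    display_df = filtered_df.reset_index(drop=True)
    # 1-based row numbers shown as a real column, with the default index hidden.
    display_df.insert(0, "Row #", range(1, len(display_df) + 1))
    st.dataframe(display_df, hide_index=True, use_container_width=True)
```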
Overview
This PR introduces a new LLM-D Dashboard for analyzing disaggregated LLM inference performance, along with key improvements to existing dashboards.
Key Changes
New LLM-D Dashboard
RHAIIS Dashboard Improvements
Bug Fixes
- Fixed cost analysis grouping (per model/accelerator/TP)

Files Modified
- dashboard.py: Filter improvements, cost analysis fix
- llmd_dashboard.py: New file - Complete LLM-D dashboard implementation

Testing: All `make ci-local` checks pass