Skip to content

Comments

Add Case-Phenotype Grid visualization for disease pages#1247

Open
kevinschaper wants to merge 13 commits intomainfrom
case-x-phenotype-visualization
Open

Add Case-Phenotype Grid visualization for disease pages#1247
kevinschaper wants to merge 13 commits intomainfrom
case-x-phenotype-visualization

Conversation

@kevinschaper
Copy link
Member

Summary

This PR adds a new visualization section to disease pages showing a grid of cases (columns) × phenotypes (rows), with phenotypes grouped by histopheno bins (body systems) that expand to show individual phenotypes. This enables researchers to quickly see which phenotypes are present or absent across cases associated with a disease.

  • Adds Case-Phenotype Grid section to disease pages (between HistoPheno and Associations)
  • Fetches cases via CaseToDiseaseAssociation and phenotypes via CaseToPhenotypicFeatureAssociation
  • Groups phenotypes by body system using object_closure matching against histopheno bin IDs
  • Supports expandable bin rows, cell click details modal, and horizontal scrolling for many cases

Architecture

Data Flow

Disease Page
    │
    ├─ 1. GET /association?category=CaseToDiseaseAssociation&object={diseaseId}
    │      → Returns all cases linked to this disease
    │
    └─ 2. GET /association?category=CaseToPhenotypicFeatureAssociation&subject={case1}&subject={case2}...
           → Returns all phenotypes for those cases with object_closure for bin grouping

Client-side processing:
    → Group phenotypes by histopheno bins using object_closure
    → Build case × phenotype matrix
    → Render expandable grid

Files Added

File Purpose
frontend/src/api/case-phenotype-types.ts TypeScript interfaces for matrix data
frontend/src/api/case-phenotype.ts API functions to fetch case-phenotype data
frontend/src/util/case-phenotype-matrix.ts Matrix building and bin grouping logic
frontend/src/components/CasePhenotypeGrid.vue Main grid component with expand/collapse
frontend/src/components/CasePhenotypeModal.vue Detail popup modal for cell clicks
frontend/src/pages/node/SectionCasePhenotypeGrid.vue Page section wrapper

Files Modified

File Change
frontend/src/pages/node/PageNode.vue Added SectionCasePhenotypeGrid import and usage

Implementation Details

1. Type Definitions (case-phenotype-types.ts)

Defines the core data structures for the visualization:

export interface CasePhenotypeMatrix {
  diseaseId: string;
  diseaseName?: string;
  cases: CaseEntity[];           // columns
  bins: HistoPhenoBin[];         // collapsed rows (body systems)
  phenotypes: CasePhenotype[];   // expanded rows (HP terms)
  cells: Map<string, CasePhenotypeCellData>; // key: "caseId:phenotypeId"
  totalCases: number;
  totalPhenotypes: number;
}

export interface CasePhenotypeCellData {
  present: boolean;
  negated?: boolean;
  onset?: string;
  frequency?: string;
  publications?: string[];
  publicationLinks?: ExpandedCurie[];
}

Includes the HISTOPHENO_BINS constant mapping UPHENO/HP IDs to human-readable labels (Nervous System, Eye, Skeletal System, etc.).

2. API Layer (case-phenotype.ts)

Orchestrates data fetching with batched requests to handle URL length limits:

export const getCasePhenotypeMatrix = async (
  diseaseId: string,
  diseaseName?: string,
): Promise<CasePhenotypeMatrix | null> => {
  // Fetch cases associated with the disease
  const casesResponse = await getCasesForDisease(diseaseId);
  if (casesResponse.items.length === 0) return null;

  // Extract case IDs and fetch phenotypes (batched in groups of 5)
  const caseIds = casesResponse.items.map((assoc) => assoc.subject);
  const phenotypeAssociations = await getCasePhenotypes(caseIds);

  // Build and return the matrix
  return buildMatrix(diseaseId, diseaseName, casesResponse.items, phenotypeAssociations);
};

Case phenotype requests are batched in groups of 5 due to URL length limits with unicode characters in case IDs.

3. Matrix Building (case-phenotype-matrix.ts)

Transforms flat associations into a structured matrix:

export function findBinForPhenotype(objectClosure: string[] | undefined): string {
  if (!objectClosure) return "OTHER";
  for (const id of objectClosure) {
    if (HISTOPHENO_BIN_IDS.has(id)) return id;
  }
  return "OTHER";
}

export function buildMatrix(
  diseaseId: string,
  diseaseName: string | undefined,
  caseAssociations: Association[],
  phenotypeAssociations: Association[],
): CasePhenotypeMatrix {
  // Build case map, phenotype map, cells map, and bin groupings
  // Returns structured matrix for visualization
}

Uses object_closure from phenotype associations to match against histopheno bin IDs.

4. Grid Component (CasePhenotypeGrid.vue)

Renders the interactive grid with expand/collapse functionality:

+----------------------------------------------------+
|  [Phenotype / System]    | 1 | 2 | 3 | 4 | ... | n |  <- Case numbers (hover for ID)
+------------------------+---+---+---+---+-----+-----+
| ▶ Nervous System (11)  | ● | ● | ○ | ● |     |     |  <- Collapsed bin row
| ▶ Eye (4)              | ○ | ● | ● | ○ |     |     |
| ▼ Skeletal System (2)  | ● | ○ | ○ | ● |     |     |  <- Expanded bin
|   Cerebral palsy       | ● |   |   | ● |     |     |  <- Individual phenotypes
|   Ataxia               | ● | ● |   |   |     |     |
+----------------------------------------------------+

Legend: ● = present (green), ○ = excluded/negated (red), ▶/▼ = expand/collapse

Features:

  • Sticky first column (phenotype names stay visible during horizontal scroll)
  • Tooltips on case headers showing full case ID/label
  • Tooltips on cells showing presence status, onset, frequency
  • Click handlers emitting events for modal display

5. Section Wrapper (SectionCasePhenotypeGrid.vue)

Handles loading states and conditionally renders the section:

const hideSection = computed(() => isSuccess.value && !matrix.value);
  • Only renders for disease nodes (node.category === 'biolink:Disease')
  • Hides automatically if no cases are found (returns null from API)
  • Shows loading/error states appropriately

Test Plan

  • Navigate to a disease page with phenopacket cases (e.g., MONDO:0007298 - spinocerebellar ataxia type 29)
  • Verify the "Case Phenotypes" section appears between HistoPheno and Associations
  • Verify bin rows show with phenotype counts and expand/collapse on click
  • Verify cell indicators show presence (●) and negated (○) correctly
  • Hover over case number headers to verify tooltip shows case ID
  • Hover over cells to verify tooltip shows phenotype details
  • Click on a cell to verify modal opens with full details
  • Navigate to a disease without cases (e.g., MONDO:0000001) and verify section doesn't render
  • Test horizontal scroll with diseases that have many cases

🤖 Generated with Claude Code

Adds a new grid visualization on disease pages showing phenotypes
observed in cases (from phenopackets) associated with that disease.

Features:
- Grid with cases as columns, phenotypes grouped by body system (bins) as rows
- Expandable bin rows to show individual phenotypes
- Tooltips showing case/phenotype details on hover (using vue-tippy)
- Click cells to open modal with full details including onset, frequency, publications
- Handles unicode characters in case IDs with batched API requests

New files:
- case-phenotype-types.ts: TypeScript interfaces and HISTOPHENO_BINS constant
- case-phenotype.ts: API functions for fetching case/phenotype data
- case-phenotype-matrix.ts: Utilities for transforming data into matrix format
- CasePhenotypeGrid.vue: Main grid component with expandable rows
- CasePhenotypeModal.vue: Detail modal for cell clicks
- SectionCasePhenotypeGrid.vue: Page section wrapper

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@netlify
Copy link

netlify bot commented Jan 17, 2026

Deploy Preview for monarch-app ready!

Name Link
🔨 Latest commit cd1f74b
🔍 Latest deploy log https://app.netlify.com/projects/monarch-app/deploys/698b7f6f3a99910008472cbc
😎 Deploy Preview https://deploy-preview-1247--monarch-app.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@claude
Copy link

claude bot commented Jan 17, 2026

Code Review: Case-Phenotype Grid Visualization

Thank you for this comprehensive PR! This is a well-structured feature that adds valuable functionality to disease pages. Below is my detailed review covering code quality, potential issues, performance, security, and test coverage.

Overall Assessment

Summary: This is high-quality code with good architectural decisions. The implementation is clean, well-documented, and follows Vue 3 best practices. A few minor improvements are suggested below.


✅ Code Quality & Best Practices

Strengths

  1. Excellent Code Organization

    • Clean separation of concerns: types, API layer, utilities, and components
    • Well-structured data flow from API → matrix builder → visualization
    • Good use of TypeScript interfaces for type safety
  2. Documentation

    • Comprehensive PR description with architecture diagrams
    • JSDoc comments on functions (following repo conventions)
    • Clear inline comments explaining complex logic
  3. Vue 3 Best Practices

    • Proper use of Composition API with <script setup>
    • Scoped SCSS styles
    • Reactive state management with ref and computed
    • Proper event emission patterns
  4. Code Style Consistency

    • Matches existing codebase patterns (compared with SectionVisualization.vue and associations.ts)
    • Follows the repository's style guidelines from CONTRIBUTING.md

Minor Suggestions

  1. Console.log Statements (frontend/src/components/CasePhenotypeGrid.vue:368, 430, 432)

    According to the style guide: "Use console.log strictly for temporary debugging during local development that should be removed before merging PRs."

    // Remove or convert to appropriate logging level:
    console.log("toggleBin clicked:", binId);  // Line 368
    console.log("handleCellClick:", caseId, phenotypeId);  // Line 430
    console.log("cellData:", cellData);  // Line 432

    Recommendation: Remove these or convert to console.debug if needed for production troubleshooting.

  2. Typo in Documentation (frontend/src/api/case-phenotype.ts:122)

    Comment says "Batches requests in groups of 50 case IDs" but the constant is set to 5.

    // Line 122: Update comment to match CASE_BATCH_SIZE = 5
    * Fetch CaseToPhenotypicFeatureAssociations for multiple cases Batches requests
    * in groups of 50 case IDs to avoid URL length limits
  3. Comment Formatting (frontend/src/api/case-phenotype-types.ts:7-9)

    Missing period at end of sentence in JSDoc comment.

    /**
     * Types for Case-Phenotype Grid visualization These are UI-specific types,
     * separate from the LinkML-generated model.ts
     */

    Should be: ...visualization. These are UI-specific types...


🐛 Potential Bugs & Issues

1. Map Serialization Issue (frontend/src/api/case-phenotype-types.ts:51)

The CasePhenotypeMatrix interface uses a Map<string, CasePhenotypeCellData> for the cells property. While this works in memory, it could cause issues if you ever need to serialize this data (e.g., for caching, state management, or debugging).

export interface CasePhenotypeMatrix {
  // ...
  cells: Map<string, CasePhenotypeCellData>; // Maps don't serialize to JSON well
}

Impact: Low (no immediate issue, but could cause problems later)

Recommendation: Document this limitation or consider using Record<string, CasePhenotypeCellData> instead. Maps are fine for now since the data isn't serialized.

2. Missing Null Check (frontend/src/util/case-phenotype-matrix.ts:1021)

While unlikely, there's a non-null assertion that could theoretically fail:

binPhenotypes.get(binId)!.add(phenotypeId);  // Line 1021

Impact: Very Low (the check on line 1019 ensures this exists)

Recommendation: Code is actually safe due to the preceding if block, but for extra defensive programming, you could use optional chaining: binPhenotypes.get(binId)?.add(phenotypeId).

3. Hardcoded Limit (frontend/src/api/case-phenotype.ts:113)

The limit is hardcoded to 500 for case associations. If a disease has more than 500 cases, some would be silently excluded.

const params = {
  category: "biolink:CaseToDiseaseAssociation",
  object: diseaseId,
  limit: 500,  // Hardcoded limit
};

Impact: Low to Medium (depends on whether diseases with >500 cases exist in your data)

Recommendation:

  • Add pagination logic similar to getCasePhenotypes (lines 144-160)
  • Or add a comment explaining why 500 is sufficient
  • Or add logging when cases are truncated

⚡ Performance Considerations

Strengths

  1. Good Batching Strategy

    • Case phenotype requests are batched (5 per request) to avoid URL length issues
    • Pagination implemented for phenotype associations to handle large datasets
  2. Efficient Data Structures

    • Uses Maps for O(1) cell lookups by key
    • Uses Sets for unique phenotype tracking
  3. Vue Performance Best Practices

    • Sticky column using CSS position: sticky (no JS scroll listeners)
    • Proper use of :key in v-for loops
    • Conditional rendering with v-if to hide empty sections

Recommendations

  1. Consider Virtual Scrolling for Large Datasets

    If some diseases have hundreds of cases or phenotypes, the DOM could get quite large. The current implementation renders all rows.

    Impact: Medium (only affects edge cases with many phenotypes/cases)

    Suggestion: Monitor performance in production. If needed, consider:

    • Virtual scrolling for the table body
    • Pagination for cases (showing 10-20 at a time)
    • Progressive disclosure (expand one bin at a time)
  2. Batching API Calls Could Be Optimized (frontend/src/api/case-phenotype.ts:138-160)

    Currently, batches are processed sequentially. For better performance, you could process batches in parallel:

    // Current: sequential batching
    for (let i = 0; i < caseIds.length; i += CASE_BATCH_SIZE) {
      // ... await request
    }
    
    // Alternative: parallel batching
    const batchPromises = [];
    for (let i = 0; i < caseIds.length; i += CASE_BATCH_SIZE) {
      batchPromises.push(fetchBatch(caseIds.slice(i, i + CASE_BATCH_SIZE)));
    }
    const results = await Promise.all(batchPromises);

    Impact: Medium (could significantly speed up loading for diseases with many cases)

    Trade-off: More concurrent requests to the backend. Assess backend capacity first.

  3. Sorting Bins (frontend/src/util/case-phenotype-matrix.ts:1055)

    Good decision to sort bins by count descending! This puts the most relevant systems at the top.


🔒 Security Considerations

Assessment: No Major Security Issues Found ✅

  1. XSS Protection

    • All user-generated content is properly escaped by Vue's template system
    • Tooltips use v-tooltip directive (verify this uses safe HTML rendering)
    • Modal uses AppNodeBadge component for IDs (good encapsulation)
  2. API Security

    • No direct user input passed to API calls without going through existing API layer
    • Follows existing patterns from other API files (good consistency)
    • URL construction uses existing apiUrl and request utilities
  3. Data Validation

    • Good null checks throughout
    • Proper handling of optional fields

Minor Notes

  1. HTML in Tooltips (frontend/src/components/CasePhenotypeGrid.vue:336-426)

    The tooltips contain HTML with inline styles. Verify that your v-tooltip directive properly sanitizes HTML to prevent XSS if tooltip content could ever come from untrusted sources.

    html += `<br><span style="color: ${statusColor}">${statusText}</span>`;

    Current Risk: Very Low (all content is from backend API, not user input)

    Recommendation: Ensure v-tooltip uses a safe HTML renderer (DOMPurify or similar).


🧪 Test Coverage

Concerns

No tests included in this PR. The repository uses Vitest for unit tests and Playwright for e2e tests (per CONTRIBUTING.md), but this PR adds:

  • 7 new files
  • 1,060 lines of code
  • 0 test files

Recommendations

  1. Unit Tests Needed

    Critical functions that should have unit tests:

    • findBinForPhenotype() - Test with various object_closure arrays
    • buildMatrix() - Test matrix building logic with mock data
    • getBinCellSummary() - Test summary calculations

    Example test structure:

    // frontend/src/util/case-phenotype-matrix.test.ts
    import { describe, it, expect } from 'vitest';
    import { findBinForPhenotype, buildMatrix } from './case-phenotype-matrix';
    
    describe('findBinForPhenotype', () => {
      it('should return correct bin ID when found in closure', () => {
        const closure = ['HP:0000001', 'UPHENO:0004523', 'HP:0000002'];
        expect(findBinForPhenotype(closure)).toBe('UPHENO:0004523');
      });
      
      it('should return OTHER when no bin found', () => {
        const closure = ['HP:0000001', 'HP:0000002'];
        expect(findBinForPhenotype(closure)).toBe('OTHER');
      });
      
      it('should return OTHER when closure is undefined', () => {
        expect(findBinForPhenotype(undefined)).toBe('OTHER');
      });
    });
  2. Component Tests

    Consider testing:

    • CasePhenotypeGrid.vue - Test expand/collapse, cell rendering, event emission
    • CasePhenotypeModal.vue - Test data display, modal open/close
  3. E2E Tests

    The test plan in the PR description is good, but should be automated:

    • Navigate to disease page with cases
    • Verify grid section appears
    • Test expand/collapse functionality
    • Test modal opening on cell click
    • Test that section is hidden for diseases without cases
  4. API Tests

    Mock API responses and test:

    • getCasePhenotypeMatrix() with various response scenarios
    • Batching logic in getCasePhenotypes()
    • Error handling

Priority

High Priority: At minimum, add unit tests for the utility functions in case-phenotype-matrix.ts before merging. These are pure functions that are easy to test and critical to the feature.

Medium Priority: Component tests for the grid and modal.

Lower Priority: E2E tests (can be added in a follow-up PR if needed).


📋 Additional Observations

Positive Notes

  1. Excellent PR Description - The architecture documentation with ASCII art diagrams is exceptional. This makes the feature easy to understand and maintain.

  2. Progressive Enhancement - The feature degrades gracefully (hides section) when no data is available.

  3. Accessibility Considerations - Tooltips, proper semantic HTML (table), and keyboard interaction support.

  4. Responsive Design - Mobile breakpoint with adjusted padding (line 600).

  5. Clean Integration - The feature integrates seamlessly into the existing page structure without modifying much existing code.

Questions

  1. Histopheno Bin Maintenance - The HISTOPHENO_BINS constant is hardcoded. How will this be maintained if new bins are added upstream? Consider:

    • Loading from API
    • Documenting update process
    • Adding a fallback for unknown bin IDs
  2. Performance Testing - Has this been tested with a disease that has many cases (50+) and many phenotypes (100+)? What's the practical upper limit?

  3. Empty Bins - What happens if a bin has no phenotypes for any case? Should it be hidden?


📝 Summary & Recommendations

Must Fix Before Merge

  1. ❗ Remove console.log statements (lines 368, 430, 432)
  2. ❗ Fix comment typo about batch size (line 122)

Should Fix Before Merge

  1. ⚠️ Add unit tests for utility functions (especially buildMatrix and findBinForPhenotype)
  2. ⚠️ Add pagination logic for cases or document the 500-case limit

Consider for Follow-up

  1. 💡 Implement parallel batching for API calls (performance improvement)
  2. 💡 Add component tests for the grid and modal
  3. 💡 Monitor performance with large datasets and consider virtual scrolling if needed
  4. 💡 Consider Record<> instead of Map<> for cells if serialization is ever needed

Verdict

Recommendation: Approve with minor changes

This is excellent work! The code is well-structured, follows best practices, and adds valuable functionality. The main concerns are:

  1. Remove console.log statements
  2. Add basic unit tests

Once those are addressed, this is ready to merge. The performance and test coverage items can be addressed in follow-up PRs if needed.

Great job on the comprehensive PR description and clean implementation! 🎉


🤖 Review generated with Claude Code

@codecov
Copy link

codecov bot commented Jan 17, 2026

Codecov Report

❌ Patch coverage is 68.82883% with 173 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.06%. Comparing base (6b15e4c) to head (eb4756f).

Files with missing lines Patch % Lines
...ckend/tests/integration/test_case_phenotype_api.py 22.72% 68 Missing ⚠️
...rch_py/implementations/solr/solr_implementation.py 11.42% 31 Missing ⚠️
backend/src/monarch_py/api/case_phenotype.py 25.00% 30 Missing ⚠️
...kend/tests/integration/test_case_phenotype_solr.py 35.89% 25 Missing ⚠️
...sts/integration/test_case_phenotype_performance.py 48.00% 13 Missing ⚠️
...ckend/src/monarch_py/utils/case_phenotype_utils.py 91.42% 6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1247      +/-   ##
==========================================
- Coverage   72.62%   72.06%   -0.57%     
==========================================
  Files          93      102       +9     
  Lines        3306     3859     +553     
==========================================
+ Hits         2401     2781     +380     
- Misses        905     1078     +173     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@kevinschaper
Copy link
Member Author

Closes #1248

- Add check for Mondo subset markers (disease_grouping, ordo_group_of_disorders,
  rare_grouping) to identify organizational grouping classes vs specific diseases
- Skip API calls and hide grid section for grouping classes like "Ehlers-Danlos
  syndrome" while still showing for specific subtypes
- Update grid cell styling from text circles to styled squares

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@claude
Copy link

claude bot commented Jan 20, 2026

Code Review: Case-Phenotype Grid Visualization

This PR adds a well-architected feature for visualizing case-phenotype relationships on disease pages. The implementation is solid overall, with clear separation of concerns and good code organization.

Strengths

Architecture & Code Quality

  • Excellent separation of concerns: Type definitions, API layer, business logic, and UI components are cleanly separated
  • Well-documented with clear JSDoc comments and comprehensive PR description
  • Consistent with codebase patterns: Vue 3 Composition API, TypeScript, SCSS scoped styles
  • Smart optimization: Batching API requests (5 cases at a time) for URL length limits
  • Progressive disclosure: Expandable bin rows provide good UX

Type Safety

  • Strong TypeScript usage with well-defined interfaces
  • Clear type definitions in case-phenotype-types.ts

Issues Found

1. Debug Console Logging (HIGH PRIORITY)

Location: frontend/src/components/CasePhenotypeGrid.vue:364-365, 428-430

console.log("toggleBin clicked:", binId);
console.log("handleCellClick:", caseId, phenotypeId);
console.log("cellData:", cellData);

Issue: Per CONTRIBUTING.md lines 207-208, console.log should be removed before merging PRs.

Fix: Remove these statements or convert to console.debug.

2. Missing Pagination for Disease-to-Case Associations

Location: frontend/src/api/case-phenotype.ts:107-118

The getCasePhenotypes function implements pagination, but getCasesForDisease does not. If a disease has >500 cases, they'll be silently truncated.

Impact: Medium - rare but could cause incomplete data display.

Recommendation: Add pagination or document the 500-case limit.

3. Potential Race Condition in Data Fetching

Location: frontend/src/pages/node/SectionCasePhenotypeGrid.vue:942-950

The useQuery composable has race condition protection, but rapid navigation could cause brief UI flashes.

Impact: Low - acceptable but could be improved with debouncing.

4. Edge Case: OTHER Bin Handling

Location: frontend/src/util/case-phenotype-matrix.ts:994-1008

The OTHER bin handling works but could be clearer by adding "OTHER": "Other" to HISTOPHENO_BINS constant.

Performance Considerations

Positive:

  • Batched requests effectively manage API constraints
  • Efficient Vue reactivity with computed properties
  • CSS position: sticky is performant

Considerations:

  • Large matrices (500 cases x 200 phenotypes = 100k cells) should be tested
  • Bin sorting is O(n log n) but negligible with typical data volumes

Verdict: Performance looks solid.

Security Concerns

No critical issues found:

  • XSS Protection: Using Vue template syntax correctly
  • No code injection risks
  • Tooltip content constructed safely
  • API calls use parameterized requests
  • Publication links use rel="noopener" correctly

Test Coverage

Current State: No test files found for new functionality

Recommendations:

  1. Unit tests for matrix building logic (buildMatrix, findBinForPhenotype, getBinCellSummary)
  2. Unit tests for API batching
  3. Component tests for CasePhenotypeGrid and CasePhenotypeModal
  4. E2E integration test for the full user flow

Minor Suggestions

  1. Convert inline comments to JSDoc format per style guidelines
  2. Extract magic number 500 to named constant MAX_RESULTS_PER_REQUEST
  3. Create typed key function for cells Map to prevent typos
  4. Add aria-expanded to bin rows for accessibility

Summary

Must Fix Before Merge:

  1. Remove debug console.log statements

Should Consider:
2. Add pagination to getCasesForDisease or document limit
3. Add unit tests for critical business logic

Nice to Have:
4. JSDoc comment style consistency
5. Named constants for magic numbers
6. aria-expanded for accessibility

Overall Assessment

This is high-quality work with excellent architecture and thoughtful design. With debug logging removed and consideration for pagination, this will be a valuable addition.

Recommendation: Approve pending removal of debug logs. Testing and pagination can follow in subsequent PRs.

Great job on this implementation! 🎉


Review conducted using Claude Code CLI

When viewing a disease with descendants, cases from descendant diseases
are now properly identified and displayed:
- Added tabs to toggle between direct cases only vs all cases (including descendants)
- Show source disease in tooltip for cases from descendant diseases
- Auto-select "All" tab when no direct cases exist

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@claude
Copy link

claude bot commented Jan 22, 2026

Code Review - PR #1247: Add Case-Phenotype Grid visualization

Summary

This is a well-structured PR that adds a sophisticated case-phenotype grid visualization to disease pages. The code quality is generally very good with clear separation of concerns, comprehensive documentation, and thoughtful user experience design.


✅ Strengths

Architecture & Code Organization

  • Excellent separation of concerns: Clear layering between API, data transformation, types, and UI components
  • Well-documented: Comprehensive JSDoc comments explaining complex logic
  • Consistent with existing patterns: Follows Vue 3 Composition API conventions
  • Proper TypeScript usage: Strong type safety throughout

User Experience

  • Progressive disclosure: Expandable bins provide clean, organized views
  • Informative tooltips: Rich contextual information on hover
  • Accessibility considerations: Proper ARIA attributes
  • Responsive design: Sticky columns and horizontal scrolling

🔍 Code Quality & Best Practices

Issues to Address

1. Console.log statements in production code

Location: frontend/src/components/CasePhenotypeGrid.vue:381, 446, 448

Remove debug console.log statements before merging.

2. Hardcoded color values

Location: Multiple locations in tooltip generation and CSS

Extract color values to SCSS variables for consistency and easier theming.

3. Magic numbers

Location: frontend/src/api/case-phenotype.ts:105

The CASE_BATCH_SIZE of 5 could use more detailed documentation explaining why this specific value.


🐛 Potential Bugs & Edge Cases

1. Race condition with tab switching

Location: frontend/src/pages/node/SectionCasePhenotypeGrid.vue:1038-1041

Rapid tab switching could cause multiple API requests with responses arriving out of order. Consider adding request cancellation.

2. Missing null checks

Location: frontend/src/util/case-phenotype-matrix.ts:1158

Replace non-null assertion operator (!) with proper null checks.


⚡ Performance Considerations

Sequential API batching

Location: frontend/src/api/case-phenotype.ts:148-170

Consider using Promise.all() for parallel batching to significantly reduce load time when there are many cases.

O(n) lookups in rendering

Location: frontend/src/components/CasePhenotypeGrid.vue

The getPhenotypeLabel function does O(n) array.find() on every render. Memoize with a computed Map for better performance.


🧪 Test Coverage - HIGH PRIORITY

This PR adds 1,197 lines of code across 7 files with 0 test files.

Tests Needed:

  1. Unit tests for case-phenotype-matrix.ts utility functions
  2. API batching logic tests
  3. Component tests for grid expansion, cell clicks, tooltips
  4. E2E tests for disease page integration

Example:

// frontend/unit/case-phenotype-matrix.test.ts
test('findBinForPhenotype returns OTHER for unknown closure', () => {
  expect(findBinForPhenotype(['HP:0000001'])).toBe('OTHER');
  expect(findBinForPhenotype(undefined)).toBe('OTHER');
});

🎯 Priority Action Items

Must Fix Before Merge

  1. ❗ Remove console.log statements
  2. ❗ Add unit tests for case-phenotype-matrix.ts

Should Fix Before Merge

  1. ⚠️ Extract hardcoded colors to SCSS variables
  2. ⚠️ Add request cancellation for tab switching
  3. ⚠️ Replace non-null assertions with proper null checks
  4. ⚠️ Add basic E2E test for disease page integration

Consider for Future PR

  1. Optimize batching with parallel requests
  2. Memoize phenotype label lookups
  3. Add keyboard navigation for accessibility

Overall Assessment

Rating: 8/10 - Very Good

Substantial, well-architected feature with strong TypeScript/Vue 3 implementation. Main concerns are test coverage and minor code quality issues. Once critical items are addressed, this will be an excellent addition.

Great work on this feature! 🎉

…optimization

Implements the case-phenotype matrix backend per the TDD implementation plan:

- Add /v3/api/case-phenotype-matrix/{disease_id} endpoint
- Use Solr JOIN query to fetch all phenotype associations in single request
- Add facet queries for HistoPheno bin counts
- Support direct_only parameter for filtering to exact disease matches
- Add limit enforcement with clear error messages
- Include data models: CasePhenotypeMatrixResponse, CaseEntity, CasePhenotype,
  HistoPhenoBin, CasePhenotypeCellData
- Add HISTOPHENO_BIN_LABELS mapping for human-readable bin names
- Add comprehensive unit and integration tests

This reduces frontend API calls from 40+ to 1 for building the case-phenotype grid.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@claude
Copy link

claude bot commented Jan 23, 2026

Code Review: PR 1247 - Case-Phenotype Grid Visualization

Thank you for this comprehensive PR! The implementation is well-structured with good separation of concerns.

OVERALL ASSESSMENT

Strengths:

  • Excellent test coverage with 60+ tests
  • Clean architecture with clear separation between API, business logic, and presentation
  • Well-documented code
  • Good use of modern Python and TypeScript patterns

Critical Issues to Address:

  • XSS vulnerabilities in frontend tooltip/link rendering
  • Unreachable code/potential runtime error in bin assignment logic
  • Missing HTTP request timeout (DoS risk)
  • Race conditions in frontend data fetching

CRITICAL ISSUES (MUST FIX)

  1. Backend: Unreachable Code in _find_bin_for_phenotype
    File: backend/src/monarch_py/utils/case_phenotype_utils.py:875-899
    Line 899 will raise KeyError when intersection is empty. The return intersection.pop() contradicts the comment.
    Fix: Return None instead of intersection.pop()

  2. Backend: Missing HTTP Timeout
    File: backend/src/monarch_py/implementations/solr/solr_implementation.py:620
    Request could hang indefinitely on slow/unresponsive Solr.
    Fix: Add timeout parameter: requests.get(url, timeout=30)

  3. Frontend: XSS Vulnerability in Tooltip HTML
    File: frontend/src/components/CasePhenotypeGrid.vue:128-162
    Case labels/IDs are not HTML-escaped in tooltip strings.
    Fix: Escape all user-provided data in getCaseTooltip(), getBinTooltip(), and getCellTooltip()

  4. Frontend: XSS in External Links
    File: frontend/src/components/CasePhenotypeModal.vue:54-55
    Publication URLs are not validated and could use javascript: protocol.
    Fix: Validate URLs to only allow http/https protocols and add rel=noopener noreferrer

HIGH PRIORITY ISSUES

  1. Backend: Overly Restrictive MONDO ID Pattern
    File: backend/src/monarch_py/api/case_phenotype.py:23
    Requires exactly 7 digits, will reject valid IDs with different lengths.

  2. Frontend: Performance - O(n) Lookups in Render Loop
    File: frontend/src/components/CasePhenotypeGrid.vue:180-183
    getPhenotypeLabel() performs find() on entire array for every cell. With 100 cases × 500 phenotypes = 50,000 iterations.
    Fix: Use computed Map for O(1) lookups

  3. Frontend: Race Condition in Data Fetching
    File: frontend/src/pages/node/SectionCasePhenotypeGrid.vue:204-230
    No cancellation of in-flight requests when component unmounts.
    Fix: Use AbortController

  4. Frontend: Inefficient Count Fetching
    File: frontend/src/api/case-phenotype.ts:210-216
    Makes two full API calls with limit=1000 just to get counts.

MEDIUM PRIORITY ISSUES

  • Backend: Silent exception swallowing without logging (solr_implementation.py:633-637)
  • Backend: Unused import urlencode (solr_implementation.py:604)
  • Frontend: Debug console.log statements (CasePhenotypeGrid.vue:166, 231, 233)
  • Frontend: Fragile error parsing with regex (case-phenotype.ts:105, 228, 240)
  • Frontend: Missing ARIA attributes for accessibility

TEST COVERAGE: 7/10

Good unit test coverage but integration tests need strengthening. Missing tests for: Solr connection failures/timeouts, concurrent requests, Unicode/special characters, security (injection, XSS), and boundary values.

SUMMARY CHECKLIST

Before merging:

  • Fix unreachable code in _find_bin_for_phenotype
  • Add HTTP timeout to Solr requests
  • Fix XSS vulnerabilities in tooltips and links
  • Fix race condition in data fetching
  • Optimize phenotype label lookups
  • Fix MONDO ID regex pattern
  • Remove debug console.log statements
  • Add ARIA attributes for accessibility

Great work on this feature! Once the critical security and bug fixes are applied, this will be ready to merge.

- Use node.association_counts to check case count before API calls
- Skip expensive matrix API for diseases with 0 or >100 cases
- Move grid to render after Cases section for better layout
- Remove unused getCaseCounts() function
- Reduce limit from 1000 to 100 cases for performance

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@claude
Copy link

claude bot commented Jan 24, 2026

Code Review: PR #1247 - Case-Phenotype Grid Visualization

Thank you for this comprehensive feature addition! I've completed a thorough review. Overall, the implementation is well-structured with good separation of concerns.

Summary

  • High Severity: 7 issues (Security, data consistency, validation)
  • Medium Severity: 16 issues (Error handling, performance, logic)
  • Low Severity: 6 issues (Code quality, documentation)

🔴 High Priority - Must Fix Before Merge

1. Solr Injection Risk

File: backend/src/monarch_py/implementations/solr/solr_query_utils.py:374, 425

The disease_id is not escaped before insertion into Solr query. Apply escape() function.

2. XSS via HTML Injection

File: frontend/src/components/CasePhenotypeGrid.vue:128-139

User labels concatenated directly into HTML. Use proper escaping or DOM construction.

3. Dead Code

File: frontend/src/util/case-phenotype-matrix.ts:52-148

The buildMatrix() function is never used - remove it and findBinForPhenotype().

4. Race Condition in Tab Switching

File: frontend/src/pages/node/SectionCasePhenotypeGrid.vue:210-215

Setting selectedTab triggers watcher which calls fetchMatrix() again. Add guard flag.

5. Missing Response Validation

File: frontend/src/api/case-phenotype.ts:118

Validate backend response structure before transforming to prevent crashes.

6. Console Logs in Production

File: frontend/src/components/CasePhenotypeGrid.vue:166, 231, 233

Remove debug console.log statements.

7. Silent Phenotype Filtering

File: backend/src/monarch_py/utils/case_phenotype_utils.py:103-104

Phenotypes without HistoPhenoBin matches are silently dropped. Log warnings or include in OTHER bin.

🟡 Medium Priority

  • Overly broad exception handling (catches all Exception types)
  • Python 3.8 type compatibility (use typing.Tuple not tuple)
  • Fragile error message regex parsing
  • Missing frequency field display
  • Publication links missing noreferrer
  • Test coverage gaps for edge cases
  • Integration test fragility

🟢 Low Priority

  • Unused variables
  • Documentation typos
  • Hardcoded field names
  • UI flicker during loading

✅ Strengths

  1. Clean architecture with good separation of concerns
  2. Excellent PR documentation with diagrams
  3. Comprehensive testing (with noted gaps)
  4. Backend query optimization is excellent
  5. Good UX with expandable bins and tooltips
  6. Strong type safety

Great work! Please address the 7 high-priority issues before merging. Happy to discuss any feedback.

@claude
Copy link

claude bot commented Jan 28, 2026

Pull Request Review: Case-Phenotype Grid Visualization

Overview

This PR adds a sophisticated visualization feature for disease pages that displays a matrix of cases (phenopacket patients) and their phenotypes, grouped by body systems. The implementation includes both backend and frontend components with comprehensive test coverage.

Code Quality & Best Practices

✅ Strengths

  1. Well-Documented Code: Excellent inline documentation throughout, especially in query building functions (e.g., build_case_phenotype_query with detailed explanation of JOIN logic)

  2. Strong Type Safety: Comprehensive TypeScript interfaces and Python datamodels with proper typing throughout

  3. Separation of Concerns: Clear architectural layering:

    • API endpoint validation (case_phenotype.py:20-48)
    • Query building (solr_query_utils.py:331-429)
    • Data transformation (case_phenotype_utils.py)
    • UI components (Grid, Modal, Section)
  4. Comprehensive Test Coverage: Multiple test levels:

    • Unit tests for matrix building, query construction, bin assignment
    • Integration tests for API and Solr queries
    • Performance tests
  5. Performance Optimization:

    • Uses association_counts to gate expensive API calls (SectionCasePhenotypeGrid.vue:116-121)
    • Single JOIN query replaces 40+ sequential requests
    • Configurable limits with clear error messages
  6. User Experience:

    • Progressive enhancement (loading states, error handling)
    • Automatic tab switching when no direct cases exist (SectionCasePhenotypeGrid.vue:211-223)
    • Informative tooltips and modal details

Potential Issues & Concerns

🔴 Critical

1. Hardcoded Row Limit May Cause Data Loss

Location: backend/src/monarch_py/implementations/solr/solr_query_utils.py:334

def build_case_phenotype_query(
    disease_id: str,
    direct_only: bool,
    rows: int = 50000,  # ⚠️ Hard limit
)

Issue: If a disease has many cases with many phenotypes each, the phenotype query could return >50,000 rows and silently truncate results. For example, 100 cases × 500 phenotypes each = 50,000 associations (already at the limit).

Impact: Grid would show incomplete data without warning to the user.

Recommendation:

  • Add pagination or streaming for large result sets
  • Validate that numFound matches returned rows and warn if truncated
  • Calculate reasonable limit based on case count (e.g., rows = min(50000, cases * 1000))

2. Missing Input Sanitization in ID Normalization

Location: backend/src/monarch_py/api/case_phenotype.py:38-40

normalized = disease_id.strip().upper()
if "_" in normalized and ":" not in normalized:
    normalized = normalized.replace("_", ":", 1)  # Only replaces first occurrence

Issue: The replacement logic replace("_", ":", 1) only replaces the first underscore. This could create invalid IDs like:

  • Input: "MONDO_0007_078" → Output: "MONDO:0007_078" (invalid)
  • Input: "mondo__0007078" → Output: "MONDO:_0007078" (invalid)

Recommendation: Either:

  1. Remove the normalization entirely (rely on regex validation)
  2. Or use: normalized.replace("_", ":", 1).replace("_", "") to strip remaining underscores

🟡 High Priority

3. Potential Race Condition in Tab Switching

Location: frontend/src/pages/node/SectionCasePhenotypeGrid.vue:211-223

if (isDirect && result && result.totalCases === 0) {
  if (caseCountFromAssociations.value > 0) {
    selectedTab.value = "all";
    matrix.value = await getCasePhenotypeMatrix(props.node.id || "", {
      direct: false,
      limit: MAX_CASES_LIMIT,
    });
    return;
  }
}

Issue: User could click tabs rapidly while auto-switching is happening, causing state inconsistency.

Recommendation: Add loading lock:

const isFetching = ref(false);

async function fetchMatrix() {
  if (isFetching.value) return;
  isFetching.value = true;
  try {
    // ... existing code
  } finally {
    isFetching.value = false;
  }
}

4. Unclear Error Messages for Bin Assignment Failure

Location: backend/src/monarch_py/utils/case_phenotype_utils.py:101-104

bin_id = _find_bin_for_phenotype(object_closure, bin_ids)
if bin_id is None:
    continue  # Silently skips phenotype

Issue: Phenotypes without valid bins are silently dropped. Users won't know why expected phenotypes are missing from the grid.

Recommendation: Log a warning or track count of skipped phenotypes and include in response metadata.


5. Inconsistent Limit Defaults

The limit default varies across the codebase:

  • API endpoint: limit: int = 1000 (case_phenotype.py:99)
  • Solr implementation: limit: int = 200 (solr_implementation.py:579)
  • Frontend section: MAX_CASES_LIMIT = 100 (SectionCasePhenotypeGrid.vue:110)

Issue: This creates confusion about the actual enforced limit.

Recommendation: Define a single constant shared across backend/frontend or document why different layers have different defaults.


🟢 Low Priority / Nice-to-Have

6. Magic Numbers in Vue Component

Location: frontend/src/components/CasePhenotypeGrid.vue:56-68

The tooltip formatter hardcodes HTML structure and styling. Consider extracting to a separate formatter function for testability.


7. Potential Memory Issue with Large Maps

Location: frontend/src/api/case-phenotype.ts:168-177

const cells = new Map<string, CasePhenotypeCellData>();
for (const [key, cell] of Object.entries(data.cells)) {
  cells.set(key, {...});
}

For 100 cases × 100 phenotypes = 10,000 Map entries. This is fine for current limits, but document maximum expected size.


8. Missing Accessibility Features

  • Grid cells and bin rows need aria-label attributes for screen readers
  • Keyboard navigation not implemented for grid traversal
  • Modal doesn't trap focus when open

9. Test Coverage Gaps

While test coverage is generally good, I noticed:

  • No integration test for the auto-tab-switching behavior
  • Missing test for case limit exceeded error in frontend
  • No test for the MONDO ID normalization edge cases mentioned above

Performance Considerations

Good:

  • JOIN query optimization reduces network overhead significantly
  • Facet queries run in parallel with main query
  • Frontend uses association_counts to gate expensive calls
  • Computed properties minimize re-renders

⚠️ Watch:

  • Large Maps (10k+ entries) could impact memory on older devices
  • Re-fetching entire matrix on tab switch (could cache both tabs)
  • No debouncing on rapid navigation

Security Concerns

Low Risk: This feature is read-only and doesn't handle user-generated content.

Minor Concerns:

  1. ID Injection: The MONDO ID regex (^MONDO:\d{7}$) prevents injection, but the normalization logic (issue Host from GCP #2 above) happens before validation
  2. No Rate Limiting: Documented on API endpoint but consider adding client-side throttling for navigation events

Summary & Recommendations

This is a high-quality, well-architected feature with excellent documentation and test coverage. The main concerns are:

Must Fix Before Merge:

  1. Row limit truncation (Critical) - Add validation/pagination
  2. ID normalization logic (Critical) - Fix or remove

Should Fix Before Merge:

  1. Tab switching race condition (High) - Add loading lock
  2. Standardize limit defaults (High) - Document or unify

Consider for Follow-up PR:

  1. Accessibility improvements
  2. Error logging for skipped phenotypes
  3. Client-side caching of both tabs

Overall assessment: Approve with requested changes 🟢

The architectural decisions are sound, performance optimizations are effective, and the feature provides clear value. Addressing the critical issues will make this production-ready.

Great work on the comprehensive test coverage and detailed documentation! 🎉


🤖 Generated via Claude Code PR review

kevinschaper and others added 2 commits January 28, 2026 19:23
The API returns null when totalCases === 0, but the auto-switch logic
was checking for `result && result.totalCases === 0` which never matched.
Changed to `!result` so diseases with only indirect cases now correctly
auto-switch to the All tab on load.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Solr creates multiValued fields by default when the schema isn't pre-defined.
This causes validation errors when new scalar fields (like has_biological_sex)
are added to the Pydantic model but Solr returns them as lists.

Added normalize_solr_doc_for_model() utility that inspects model type hints
and converts single-element lists to scalars for fields expecting scalar types.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@claude
Copy link

claude bot commented Jan 29, 2026

PR Review: Case-Phenotype Grid Visualization

Thank you for this comprehensive feature addition! I've conducted a thorough review covering code quality, security, performance, and test coverage.

🔴 CRITICAL ISSUES (Must Fix Before Merge)

1. Solr Injection Vulnerability

Location: backend/src/monarch_py/implementations/solr/solr_query_utils.py:373-374, 425

The code directly injects disease_id into Solr queries without proper escaping. While the API validates the disease_id format with regex, defense-in-depth principles require sanitization at the query construction level.

Fix: Use the existing escape() utility from monarch_py.utils.utils

2. Missing Dependencies - Broken Imports

Location: Multiple frontend files

The following imports reference files that do not exist in this PR:

  • frontend/src/api/case-phenotype-types.ts imports from "./entity-grid/types"
  • frontend/src/components/CasePhenotypeGrid.vue imports EntityGrid components
  • frontend/src/pages/PageTestbed.vue imports EntityGrid components

Required files missing:

  • /frontend/src/api/entity-grid/types.ts
  • /frontend/src/components/EntityGrid/EntityGrid.vue
  • /frontend/src/components/EntityGrid/EntityGridModal.vue
  • /frontend/src/components/EntityGrid/entity-grid-utils.ts

TypeScript compilation will fail without these files.

🟡 HIGH PRIORITY ISSUES

3. Network Error Handling

Location: backend/src/monarch_py/implementations/solr/solr_implementation.py:782-784

Network failures propagate as generic exceptions. Wrap requests in try-except to provide meaningful error messages (503 for network errors, 502 for JSON decode errors).

4. TypeScript Type Safety

Location: frontend/src/components/CasePhenotypeGrid.vue:76

Using any type defeats TypeScript's purpose. Should use CellData | null

5. Uncaught Promise Rejection

Location: frontend/src/pages/node/SectionCasePhenotypeGrid.vue:218

The nested getCasePhenotypeMatrix call in auto-switch logic could throw but isn't properly wrapped in error handling.

🟢 MEDIUM PRIORITY ISSUES

6. Missing JSDoc Comments

Per frontend/CONTRIBUTING.md:193, use JSDoc comments instead of regular comments. Missing on computed properties and functions in multiple files.

7. Accessibility Issues

Location: frontend/src/components/CasePhenotypeModal.vue

  • Should use semantic HTML (dl, dt, dd) instead of divs
  • Publication links don't indicate they open in new tab
  • Status colors rely only on color (add icons or text indicators)

8. Vague Exception Messages

Location: backend/src/monarch_py/api/case_phenotype.py:136-142

Broad exception catching masks real errors (network, Solr) as "not found". Add logging and be more specific.

9. Unreachable Code

Location: backend/src/monarch_py/utils/case_phenotype_utils.py:117-141

Fallback return statement is unreachable because the loop iterates all possible bin IDs.

📊 TEST COVERAGE ANALYSIS

Backend Tests: B+ Grade

✅ Strengths: Excellent unit test coverage (889 lines), good integration tests, parametrized tests
⚠️ Gaps:

  • No tests for 100 case limit mentioned in PR (tests use 200)
  • No tests for disease grouping/class filtering
  • No tests for phenotypes without object_closure (OTHER bin fallback)
  • Performance tests only cover small/empty diseases

Frontend Tests: F Grade

❌ CRITICAL GAP: No frontend tests exist for:

  • Utility functions (case-phenotype-matrix.ts)
  • API layer (case-phenotype.ts)
  • Vue components
  • No E2E tests for grid interactions

Recommendation: Add unit tests for core functions and E2E test for the complete user flow.

✅ POSITIVE ASPECTS

  1. Excellent architecture - Clear separation between API, query builders, and utilities
  2. Good type safety - TypeScript interfaces are well-defined
  3. Smart gating - Using association_counts to avoid expensive calls
  4. Performance optimization - Backend consolidates 40+ API calls into 1 Solr query
  5. Comprehensive PR description - Clear architecture diagrams
  6. Proper Vue 3 patterns - Using Composition API correctly

📝 RECOMMENDATIONS

Before Merge:

  1. Fix Solr injection vulnerability (add escape() calls)
  2. Add missing EntityGrid component files or refactor imports
  3. Fix TypeScript any type usage
  4. Improve error handling in async auto-switch logic
  5. Add JSDoc comments per style guide
  6. Improve accessibility (semantic HTML, ARIA labels)

Follow-up PRs:

  1. Add frontend unit and E2E tests
  2. Test edge cases (100+ cases, missing closure, disease grouping)
  3. Consider virtual scrolling for large grids
  4. Extract magic numbers to configuration
  5. Add performance monitoring

🎯 SUMMARY

This is a valuable feature with solid backend implementation. Main concerns:

  1. Security: Solr injection must be fixed
  2. Missing files: EntityGrid components must be added
  3. Frontend testing: No tests exist for frontend code
  4. Error handling: Several improvements needed

Once the critical issues are addressed, this will be a great addition to the Monarch App!

Phenotypes can belong to multiple body system bins (e.g., an eye
phenotype is also a head/neck phenotype). Previously, each phenotype
was assigned to only ONE bin based on priority order, causing bins
like Eye, Ear, and Connective Tissue to show counts but have no
phenotypes to display.

Changes:
- Add phenotype_ids field to HistoPhenoBin model
- Build phenotype_ids by checking each phenotype's closure against
  ALL bins, not just the first matching one
- Frontend now uses phenotype_ids directly from backend response

This allows the same phenotype to correctly appear in multiple bin
expansions (e.g., "Supranuclear gaze palsy" appears in both Eye and
Head/Neck bins).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@claude
Copy link

claude bot commented Jan 29, 2026

Code Review: PR #1247 - Case-Phenotype Grid Visualization

Thank you for this comprehensive feature addition! The case-phenotype matrix visualization is well-architected with excellent separation of concerns. The PR includes 889 lines of new tests across 7 test files, demonstrating strong testing discipline.

Overview

Architecture: ✅ Excellent

  • Clean separation: API → Solr implementation → utility functions
  • Well-structured frontend with proper Vue 3 Composition API usage
  • Comprehensive data models with LinkML schema definitions

Test Coverage: ✅ Strong

  • 7 test files (5 unit, 3 integration) covering core functionality
  • Performance benchmarks included
  • Total additions: 3,306 lines with 889 test lines (~27% test coverage)

Critical Issues 🔴

1. Security: Potential Solr Query Injection

Location: backend/src/monarch_py/implementations/solr/solr_query_utils.py:371-375, 425

The disease_id parameter is directly interpolated into Solr queries without proper escaping. While the API layer validates with regex MONDO:\d{7}, the utility functions are reusable and could be called without validation.

Recommendation: Apply the escape() function consistently (as done in build_multi_entity_association_query() at line 156)

2. Frontend: Missing Component Dependencies

Location: frontend/src/components/CasePhenotypeGrid.vue:24-25

References components that don't exist in the codebase. This will prevent compilation.

Action Required: Verify these dependencies exist or provide implementations before merge.

3. XSS Risk: Unsanitized HTML in Tooltips

Location: frontend/src/components/CasePhenotypeGrid.vue:56-64

HTML is constructed from user data without sanitization. If backend returns malicious content, it could execute JavaScript when rendered.

Recommendation: Use text interpolation with structured DOM elements instead of HTML strings, or sanitize with DOMPurify.

High Priority Issues 🟠

4. Overly Broad Exception Handling

Location: backend/src/monarch_py/api/case_phenotype.py:138-142

Catches ALL exceptions and masks unexpected errors. Makes debugging difficult.

Fix: Replace with specific exception types and add logging.

5. Silent Data Loss: Phenotypes Without Bins

Location: backend/src/monarch_py/utils/case_phenotype_utils.py:101-104

Phenotypes that don't match HistoPheno bins are silently excluded without logging.

Recommendation: Add logging and consider an "Uncategorized" bin.

6. Fragile Regex Error Parsing

Location: frontend/src/api/case-phenotype.ts:105-113

Regex assumes exact backend error format. Missing radix parameter on parseInt() and no NaN validation.

Fix: Add radix parameter, validate extracted numbers, and add logging for parse failures.

Medium Priority Issues 🟡

7. Performance: Redundant Entity Name Fetch

Location: backend/src/monarch_py/implementations/solr/solr_implementation.py:742-799

Makes an additional network call after already fetching case and phenotype data.

Solution: Extract disease name from case association docs.

8. Input Validation: Incomplete Normalization

Location: backend/src/monarch_py/api/case_phenotype.py:39-40

Only replaces FIRST underscore. Edge case: MONDO_0007_078 becomes MONDO:0007_078 (invalid).

9. Type Safety: Unsafe any Type

Location: frontend/src/components/CasePhenotypeGrid.vue:75-76

Using any type disables type checking. Should use proper type or unknown with narrowing.

10. Missing Logging Throughout

No logging for debugging or performance monitoring. Would be valuable for production troubleshooting.

Positive Highlights ✨

  1. Excellent Test Coverage: 889 lines across 7 test files with unit, integration, and performance tests
  2. Smart Query Optimization: Uses Solr JOIN queries to avoid N+1 pattern
  3. Proper Vue 3 Patterns: Clean Composition API usage with TypeScript
  4. Performance Guardrails: 1-100 case limit prevents expensive queries
  5. Comprehensive Documentation: LinkML schema definitions and detailed docstrings
  6. Custom Error Classes: CaseLimitExceededError provides clear UX feedback
  7. Defense in Depth: Multiple validation layers (API, Solr, client)

Test Coverage Summary

Type Files Lines Coverage
Unit Tests 5 602 Models, builders, queries, bins
Integration Tests 3 287 API endpoints, Solr, performance
Total 8 889 27% of PR additions

Strong areas: Core matrix building, model validation, API contract

Gap: Frontend components lack unit tests (Vue Test Utils recommended)

Recommendations for Merge

Must Fix Before Merge:

  1. ✅ Resolve missing EntityGrid component references (build blocker)
  2. ✅ Apply escape() to disease_id in Solr queries (security)
  3. ✅ Sanitize HTML in tooltip formatters (XSS prevention)
  4. ✅ Replace broad exception handlers with specific types

Should Fix Before Merge:

  1. Add logging for silent data loss scenarios
  2. Fix regex parsing fragility with proper validation
  3. Remove redundant entity name fetch

Can Address in Follow-up:

  1. Add frontend component unit tests
  2. Improve type safety (remove any types)
  3. Add performance monitoring instrumentation

Overall Assessment

This is a well-engineered feature with strong architectural foundations and comprehensive backend testing. The critical issues are fixable and primarily involve applying existing security patterns consistently and resolving missing dependencies.

Once the 4 critical issues are addressed, this PR will be in excellent shape for merge. Great work on the test coverage and documentation!


Reviewed by: Claude Sonnet 4.5

Files Reviewed: 26 files (15 backend, 9 frontend, 8 tests)

Review Date: 2026-01-29

Backend (Phase 1):
- New /entity-grid/{context_id} API endpoint with configurable traversals
- Grid configs for association categories and row grouping
- Grid groupings for histopheno row organization
- Entity grid utilities for column building and cell aggregation
- Extend model with GridColumnEntity and GridResponse types

Frontend (Phase 2):
- EntityGrid.vue: Generic configurable grid component with diagonal headers
- EntityGridModal.vue: Detail modal for cell inspection
- Entity grid API client with type-safe queries
- Grid utilities for column grouping and cell formatting

Playground (Phase 3):
- PageGridPlayground.vue for interactive grid testing
- Entity search, association category selection, row grouping options

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@claude
Copy link

claude bot commented Jan 30, 2026

Code Review: PR #1247 - Case-Phenotype Grid Visualization

Thank you for this comprehensive feature addition! This is a well-structured PR with excellent documentation and test coverage. However, I've identified several critical security and code quality issues that should be addressed before merging.

Critical Issues (Must Fix)

1. Solr Injection Vulnerability (Backend)

Files: backend/src/monarch_py/implementations/solr/solr_query_utils.py
Lines: 369-375, 422-428, 453-456

User-provided IDs are interpolated directly into Solr queries without escaping. Apply the escape() utility function to all user-provided IDs.

2. XSS Vulnerability via Tooltip HTML (Frontend)

Files: frontend/src/components/EntityGrid/EntityGrid.vue, CasePhenotypeGrid.vue
Lines: EntityGrid.vue: 184-312, CasePhenotypeGrid.vue: 58-66

Tooltip HTML is constructed from unsanitized API data. Use DOMPurify or replace with Vue template rendering.

3. Missing Pagination (Backend API)

Current API returns ALL results up to limit in single response. Add pagination with offset/limit parameters.

4. Broad Exception Catching (Backend)

File: backend/src/monarch_py/api/case_phenotype.py
Lines: 136-148

ALL exceptions converted to 404 errors. Catch specific exceptions instead.

5. TypeScript any Type (Frontend)

File: frontend/src/components/CasePhenotypeGrid.vue
Lines: 77-78

Replace any type with proper type definition.

High Priority Issues

6. No Accessibility Support

Missing keyboard navigation, ARIA attributes, and focus indicators throughout grid components.

7. Memory/Performance Issues

Backend default 50,000 rows, frontend no virtual scrolling for large grids.

8. No Rate Limiting

Public endpoints with no throttling for expensive JOIN queries.

Medium Priority

9. Code Duplication

Entity validation logic duplicated across files.

10. Inefficient Nested Loop

case_phenotype_utils.py lines 195-200: Use set intersection instead.

11. MONDO ID Validation

Regex only accepts 7 digits, should accept 1-7.

Strengths

  • Excellent test coverage (15 test files)
  • Comprehensive documentation
  • Well-structured code
  • Performance-conscious (Solr JOIN queries)
  • Good error handling

Risk Assessment

  • Security: HIGH (Injection, XSS)
  • Performance: MEDIUM (Large result sets)
  • Accessibility: MEDIUM (No keyboard nav)
  • Maintainability: LOW (Well-structured)

Action Plan

  1. Fix critical security issues (1-5)
  2. Add security tests
  3. Follow-up PR for accessibility/performance (6-8)

Overall solid implementation. Address security vulnerabilities before merge.

Uses a reverse JOIN query to filter column entities that have no
second-hop (row) associations. For ortholog-phenotype grids, this
excludes orthologs without phenotype data from the grid columns.

The JOIN query finds column associations where the column entity
appears as the context in at least one row association, reducing
network transfer and eliminating empty columns from the response.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@claude
Copy link

claude bot commented Jan 30, 2026

PR Review: Case-Phenotype Grid Visualization

Comprehensive review completed for PR #1247. This is impressive, well-architected work with excellent test coverage.


Strengths

Architecture & Design

  • Clean separation of concerns across API, data models, business logic, and UI
  • Generic grid framework supporting multiple types (case-phenotype, disease-phenotype, ortholog-phenotype)
  • Configuration-driven approach makes adding new grid types easy
  • Reusable EntityGrid.vue component

Code Quality

  • Strong type safety with TypeScript interfaces and Python dataclasses
  • Excellent documentation with clear docstrings and comments
  • Well-structured, focused utility functions
  • Proper input validation with regex patterns

Test Coverage

  • Comprehensive unit, integration, and performance tests
  • Edge cases covered: empty inputs, limits exceeded, invalid IDs, wrong entity types

Issues Found

Medium Priority

  1. Solr Injection Risk (backend/src/monarch_py/implementations/solr/solr_query_utils.py)

    • escape() function only handles colons, not other Solr special characters
    • Risk mitigated by regex validation, but consider comprehensive escaping
  2. Large Result Sets (solr_query_utils.py:405)

    • Default 50,000 max rows could cause memory issues
    • Consider pagination or document safety
  3. Error Handling (solr_implementation.py:637)

    • Silently returns empty list on Solr errors
    • Add validation for expected response structure
  4. Missing Type Annotation (case_phenotype.py:51)

    • _is_disease_entity() parameter lacks type hint

Low Priority

  • Consider batch loading for multiple grids
  • Extract magic numbers to constants
  • Document test data choices

Assessments

Security: Low Risk - Input validation, no user-generated content in queries, proper error handling

Performance: Good - Limit parameters, efficient JOIN queries, faceted queries, performance tests included


Recommendations

Should Fix Soon:

  1. Enhance Solr query escaping
  2. Add explicit Solr response validation
  3. Add type hints to _is_disease_entity()

Nice to Have:

  1. Response caching
  2. Extract frontend constants
  3. Query performance monitoring

Summary

Category Rating
Code Quality 5/5
Security 4/5
Performance 4/5
Test Coverage 5/5
Maintainability 5/5

Overall: High-quality work demonstrating strong engineering practices. The generic grid framework is particularly well-designed. Issues identified are minor and preventative.

Great job!

- Left-align phenotype/system column content in EntityGrid
- Default to showing column labels and category grid lines in playground
- Replace checkbox/radio controls with styled toggle switches showing on/off state

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@claude
Copy link

claude bot commented Jan 30, 2026

Code Review: PR #1247 - Case-Phenotype Grid Visualization

Thank you for this comprehensive PR! This is a well-architected feature with excellent separation of concerns and solid documentation.

Overall Assessment

Strengths:

  • Clear architecture with good separation between API layer, utilities, and data models
  • Strong typing throughout (Pydantic models + TypeScript)
  • Comprehensive test coverage including unit, integration, and performance tests
  • Excellent documentation in the PR description

Areas Requiring Attention:

  • Security: Input validation and escaping needs strengthening
  • Bug Risk: Edge cases in cell key generation and race conditions
  • Performance: Opportunities for caching and optimization

CRITICAL Issues (Please Address Before Merge)

1. Solr Injection Risk - Input Escaping Insufficient

Location: backend/src/monarch_py/implementations/solr/solr_query_utils.py:371-375

While case_phenotype.py has strict MONDO ID validation, these query-building functions are reusable and could be called from contexts with looser validation. Direct string interpolation creates injection risk. The escape function in utils.py only escapes colons, not all Solr special characters.

2. Generic Entity Grid Lacks Input Validation

Location: backend/src/monarch_py/implementations/solr/solr_implementation.py:742-820

The context_id parameter is not validated before being used in Solr queries.

3. Cell Key Collision Risk

Location: backend/src/monarch_py/utils/case_phenotype_utils.py:220

If a case ID contains a colon (e.g., PMID:12345), parsing will fail. Use a safer delimiter.

HIGH Priority Issues

4. Race Condition in Tab Switching

Location: frontend/src/pages/node/SectionCasePhenotypeGrid.vue:186-232

Multiple rapid tab switches could result in stale data. Implement request cancellation using AbortController.

5. Automatic Tab Switching Without User Notification

Location: frontend/src/pages/node/SectionCasePhenotypeGrid.vue:213-221

The code silently switches tabs which could confuse users. Add a notification.

6. Missing Security Test Coverage

No tests for injection attempts, malformed IDs, or special characters.

MEDIUM Priority Issues

7. Bin Assignment Fallback is Non-Deterministic

Location: backend/src/monarch_py/utils/case_phenotype_utils.py:117-141

8. Silent Filtering of Unbinned Phenotypes

Location: backend/src/monarch_py/utils/case_phenotype_utils.py:100-104

9. Error Message Parsing is Brittle

Location: frontend/src/api/case-phenotype.ts:100-116

10. Inefficient Bin Building Loop

Location: backend/src/monarch_py/utils/case_phenotype_utils.py:188-200

Recommendations

Before Merge:

  1. Implement comprehensive Solr escaping
  2. Add input validation to generic entity grid
  3. Fix cell key delimiter collision
  4. Add security test cases

Short-term:
5. Implement request cancellation
6. Add user notifications
7. Expand test coverage
8. Fix error message parsing

Long-term:
9. Add caching layer
10. Implement pagination
11. Add rate limiting

Conclusion

This is a well-designed feature with solid architecture. However, the security concerns around input validation and Solr injection should be addressed before merging to production.

Great work overall!

…lection

Adds subject/object category metadata to association type mappings, a new
API endpoint to query traversable associations by entity type, and frontend
logic to dynamically populate column/row options based on entity category.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude
Copy link

claude bot commented Feb 10, 2026

Code Review: PR #1247 - Case-Phenotype Grid Visualization

Executive Summary

This PR delivers excellent architectural improvements, reducing API calls from 40+ to 1 through efficient Solr JOIN queries. The implementation is well-designed, type-safe, and includes comprehensive test coverage. However, several security, performance, and code quality issues should be addressed before merging.

Recommendation: ✅ Approve with Changes (address high-priority items first)


🎯 Positive Findings

Architecture & Design Excellence

  1. Brilliant Query Optimization: The Solr JOIN query approach replacing 40+ sequential API calls with a single 2-hop query is a significant performance win
  2. Generic & Reusable: The entity grid infrastructure is well-designed for extensibility with configuration-based patterns
  3. Clear Separation of Concerns: Clean layering in both backend (API → Implementation → Utils) and frontend (API Client → Components → Types)
  4. Strong Type Safety: Comprehensive TypeScript typing with proper transformations between snake_case/camelCase
  5. Comprehensive Testing: 7 backend test files covering unit, integration, and performance tests with realistic thresholds

🚨 Issues Found

CRITICAL - Must Fix Before Merge

1. Solr Injection Vulnerability (Security)

Files: backend/src/monarch_py/implementations/solr/solr_query_utils.py:374, 425

Disease IDs are inserted into Solr queries with inadequate escaping. The escape() function only handles colons, missing special Solr characters like ", *, (, ), OR, AND.

Risk: While the API validates MONDO IDs with regex, if these query builders are reused elsewhere, this creates injection vulnerabilities.

Fix: Enhance escape function to handle all Solr special characters


MAJOR - Should Fix Before Merge

2. Race Condition in Frontend Data Fetching (Bugs)

File: frontend/src/pages/node/SectionCasePhenotypeGrid.vue:261-263

The watch on route changes can cause race conditions where stale data displays if route changes during a fetch.

Fix: Add AbortController to cancel in-flight requests when route changes.

3. Performance: Inefficient Bin Summary Calculation (Performance)

File: frontend/src/components/EntityGrid/entity-grid-utils.ts

getBinCellSummary recalculates on every render. With 20 bins × 100 columns = 2,000 calls per render.

Fix: Memoize summaries in a computed property with Map cache.

4. No Upper Limit Validation on Solr Results (Performance)

File: backend/src/monarch_py/implementations/solr/solr_query_utils.py:63

rows=50000 is hardcoded. A disease with 100 cases × 500 phenotypes = 50,000 associations could hit this limit silently.

Fix: Add explicit limit checking and return warnings when approaching limits.

5. Missing Null Checks in Cell Data Access (Bugs)

File: frontend/src/components/EntityGrid/EntityGrid.vue:255-263

The isCellPresent function doesn't validate cell data structure.

Fix: Add validation to warn when cell data is missing expected fields.


MINOR - Can Address in Follow-up

6. XSS Risk in Tooltip HTML (Security)

Tooltip content constructed with raw HTML from user data. Fix with DOMPurify or component slots.

7. Expanded Bin State Not Reset (Bugs)

expandedBin ref persists across matrix changes if component is reused.

8. Inconsistent Limit Values (Code Quality)

Frontend MAX_CASES_LIMIT = 100 but backend allows up to 1000. Document why they differ.

9. Missing Edge Case Tests (Testing)

No tests for Unicode characters in case IDs, empty bins, negated-only phenotypes, or partial Solr results.

10. Hardcoded Bin Definitions (Maintainability)

HistoPheno bins defined separately in backend and frontend. Consider single source of truth.


📊 Analysis by Category

Code Quality: ⭐⭐⭐⭐ (Very Good)

  • ✅ Excellent type hints and dataclasses
  • ✅ Strong TypeScript typing
  • ✅ Proper Vue 3 Composition API patterns
  • ✅ Good inline documentation

Performance: ⭐⭐⭐⭐ (Great)

  • ✅ Single JOIN query replaces 40+ calls
  • ✅ Facet queries for aggregations
  • ✅ Smart filtering with association_counts
  • ✅ Performance tests with 2-3s thresholds
  • ⚠️ Missing memoization for bin summaries

Security: ⭐⭐⭐ (Good)

  • ✅ Strict input validation with regex
  • ✅ Proper HTTP status codes
  • ⚠️ Solr escaping insufficient (critical)
  • ⚠️ No HTML sanitization in tooltips

Testing: ⭐⭐⭐⭐ (Comprehensive)

  • ✅ Unit tests for query builders
  • ✅ Integration tests for API
  • ✅ Performance benchmarks
  • ⚠️ Missing edge case coverage

🎯 Recommendations

High Priority (Before Merge)

  1. Fix Solr injection escaping
  2. Add fetch abort controller
  3. Memoize bin summaries
  4. Add row limit validation
  5. Add null checks for cell data

Medium Priority (Near-Term)

  1. Add edge case tests
  2. Sanitize tooltip HTML
  3. Reset expanded state on matrix change
  4. Document limit differences

Low Priority (Future)

  1. Centralize bin definitions
  2. Add retry logic for network failures
  3. Add grid virtualization for large datasets
  4. Add E2E tests

🏁 Overall Assessment

Summary Ratings:

  • Architecture: ⭐⭐⭐⭐⭐ (Excellent design, highly reusable)
  • Code Quality: ⭐⭐⭐⭐ (Very good, minor improvements needed)
  • Performance: ⭐⭐⭐⭐ (Great optimization, some edge cases)
  • Security: ⭐⭐⭐ (Good validation, injection risks to address)
  • Testing: ⭐⭐⭐⭐ (Comprehensive, needs edge cases)
  • Documentation: ⭐⭐⭐⭐⭐ (Excellent PR description)

This is high-quality work delivering significant value. The architecture is sound and sets a great foundation for future grid visualizations. The security issues and race condition should be addressed before production deployment.

Estimated time for high-priority fixes: 4-8 hours

Great work overall! 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant