Add dimensional explainability to KNN detector#652

Open
Powerscore wants to merge 1 commit into yzhao062:master from Powerscore:feature/knn-explainability

Conversation

@Powerscore

Summary

This PR adds dimensional explainability to the KNN detector, providing both visualization and programmatic access to per-sample, per-dimension outlier contributions. The implementation includes an explain_outlier() method for visualization and a get_outlier_explainability_scores() method for programmatic access to dimensional scores.

Motivation

KNN is one of the most widely used outlier detection algorithms in PyOD, but it has lacked interpretability features. While the algorithm can identify outliers, it doesn't explain why a sample is anomalous. This PR addresses that gap by:

  1. Visualizing per-feature contributions via horizontal bar charts
  2. Providing statistical context through percentile cutoff bands
  3. Enabling programmatic access to dimensional scores (completing a TODO from COPOD's implementation)

Changes Made

Core Implementation (pyod/models/knn.py)

  1. Store Training Data (Line ~194)

    • Added self.X_train_ = X for explainability
    • Follows COPOD's pattern
    • Stores a reference to training data (O(N×D) memory) to enable distance-based dimensional scoring. While this increases memory usage, it aligns with COPOD's design for consistency and is necessary for lazy feature-wise distance computation.
  2. Helper Method: _compute_dimensional_scores() (Lines ~283-321)

    • Calculates average absolute distance to k-neighbors for each feature
    • Supports feature subset selection via columns parameter
    • Returns dimensional score vector
  3. Main Method: explain_outlier() (Lines ~323-475)

    • Horizontal bar chart visualization
    • Color-coded bars (Blue: normal, Orange: warning, Red: extreme)
    • Cutoff bands for statistical context
    • Flexible parameters (feature selection, custom cutoffs, file export)
    • Comprehensive docstring
  4. Score Access Method: get_outlier_explainability_scores() (Lines ~277-282)

    • Returns per-dimension explainability scores as a numpy array
    • Enables programmatic access to dimensional contributions
    • Completes the explainability interface (addresses COPOD's TODO)
    • Supports feature subset selection via columns parameter
  5. Added Import (Line 10)

    • import matplotlib.pyplot as plt

Example (examples/knn_interpretability.py)

Created a clean, simple example following COPOD's interpretability example pattern:

  • Uses cardio.mat (21 features) - demonstrates value for high-dimensional data
  • Shows basic usage with default parameters
  • Demonstrates custom cutoffs
  • ~68 lines (consistent with other PyOD examples)

API Design

The API mirrors COPOD's explain_outlier() for consistency:

Feature                      | COPOD                                                      | KNN (New)
Method name                  | explain_outlier()                                          | explain_outlier()
Parameters                   | ind, columns, cutoffs, feature_names, file_name, file_type | Same ✓
Visualization                | Scatter plot                                               | Horizontal bars
Returns programmatic scores  | TODO                                                       | get_outlier_explainability_scores()
Pragma                       | # pragma: no cover                                         | # pragma: no cover

Usage Example:

Visualization:

from pyod.models.knn import KNN
from pyod.utils.data import generate_data

# Fit KNN detector
X_train, _, _, _ = generate_data(n_train=200, n_features=5)
knn = KNN(n_neighbors=10, method='mean', contamination=0.1)
knn.fit(X_train)

# Visualize outlier explanation
knn.explain_outlier(
    ind=42,
    feature_names=['Age', 'Income', 'Credit', 'Debt', 'Savings'],
    cutoffs=[0.90, 0.99],
    file_name='outlier_42',
    file_type='png'
)

Programmatic access to scores:

# Get dimensional explainability scores as numpy array
scores = knn.get_outlier_explainability_scores(ind=42)
print(f"Per-dimension scores: {scores}")
# Can be used for further analysis, custom visualizations, or integration with other tools
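Because the scores come back as a plain numpy array, downstream analysis such as ranking features by their anomaly contribution is straightforward. A small sketch (the score values and feature names here are illustrative, not taken from a real run):

```python
import numpy as np

# Hypothetical per-dimension scores, as returned by
# get_outlier_explainability_scores() for one sample
scores = np.array([0.12, 0.95, 0.07, 0.44, 0.31])
feature_names = ['Age', 'Income', 'Credit', 'Debt', 'Savings']

# Rank features from most to least anomalous contribution
order = np.argsort(scores)[::-1]
for rank, i in enumerate(order, start=1):
    print(f"{rank}. {feature_names[i]}: {scores[i]:.2f}")
```

This kind of ranking is also what the referenced ICAART paper uses to present per-feature evidence.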

Technical Details

Algorithm:

For sample at index ind:
  1. Query k-nearest neighbors from training data
  2. For each dimension d:
     - Compute |X[neighbors, d] - X[ind, d]|
     - Average across k neighbors
     → dim_score[d]
  3. Compute cutoff bands (percentiles across all samples)
  4. Create horizontal bar chart with color coding
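Steps 1-2 above can be sketched as a standalone function with scikit-learn (an approximation for illustration; the PR's actual `_compute_dimensional_scores()` lives inside the KNN class and may differ in detail):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def dimensional_scores(X_train, ind, k=5):
    """Average absolute per-feature distance from sample `ind`
    to its k nearest neighbors in the training data."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_train)
    # Query k+1 neighbors: the sample itself comes back first (distance 0)
    _, idx = nn.kneighbors(X_train[ind:ind + 1])
    neighbors = idx[0][1:]  # drop the sample itself
    # |X[neighbors, d] - X[ind, d]|, averaged over the k neighbors
    return np.abs(X_train[neighbors] - X_train[ind]).mean(axis=0)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[7, 1] = 10.0  # make sample 7 an outlier in dimension 1 only
scores = dimensional_scores(X, ind=7, k=5)
print(scores)  # dimension 1 dominates
```

On this toy data the score vector isolates the anomaly to dimension 1, mirroring the 2D validation figures below.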

Complexity:

  • Space: O(N×D) for storing X_train_
  • Time:
    • First call (with cutoffs): O(N×k×D) to compute statistical bands across the full training set.
    • Subsequent calls: O(k×D) per explanation. Results are cached (self._cached_dimensional_scores), making interactive exploration nearly instant after the initial computation.
  • Memory trade-off: Storing training data (self.X_train_) enables explainability but increases memory footprint (O(N×D)). This aligns with COPOD's design and allows for lazy feature-wise distance computation.
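The caching behavior described above can follow a simple memoization pattern; this is a minimal sketch using the attribute name mentioned in the PR, with a stand-in for the actual per-sample computation:

```python
import numpy as np

class CachedScores:
    """Minimal memoization sketch for per-sample dimensional scores."""

    def __init__(self):
        self._cached_dimensional_scores = {}

    def _compute(self, ind):
        # Stand-in for the O(k*D) per-sample score computation
        return np.full(3, float(ind))

    def scores_for(self, ind):
        if ind not in self._cached_dimensional_scores:
            self._cached_dimensional_scores[ind] = self._compute(ind)
        return self._cached_dimensional_scores[ind]

c = CachedScores()
first = c.scores_for(5)
second = c.scores_for(5)  # served from the cache, no recomputation
print(first is second)    # True: the same cached array object
```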

Design Decisions:

  1. On-demand computation - Don't pre-compute/store all dimensional scores
    • Reason: Explainability is used sparingly, saves memory
  2. Store X_train_ - Following COPOD's pattern
    • Reason: Required for dimensional analysis, consistent with PyOD
  3. Horizontal bars - Instead of COPOD's scatter plot
    • Reason: More intuitive for distance-based outliers
  4. # pragma: no cover - Exclude visualization from test coverage
    • Reason: Consistent with COPOD's approach
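The cutoff-band and color-coding behavior described above (steps 3-4 of the algorithm) could look roughly like the following; the thresholds and colors match the description, while the helper name and data are hypothetical:

```python
import numpy as np

def color_for(score, warn_cut, extreme_cut):
    """Map a dimensional score to a bar color: blue = normal,
    orange = above the warning percentile, red = above the extreme one."""
    if score >= extreme_cut:
        return 'red'
    if score >= warn_cut:
        return 'orange'
    return 'blue'

# Hypothetical dimensional scores for all N samples (rows) x D features (cols)
rng = np.random.default_rng(1)
all_scores = rng.random((200, 4))

# Percentile cutoff bands across all samples, per feature (cutoffs=[0.90, 0.99])
warn_band = np.quantile(all_scores, 0.90, axis=0)
extreme_band = np.quantile(all_scores, 0.99, axis=0)

sample_scores = all_scores[42]
colors = [color_for(s, w, e)
          for s, w, e in zip(sample_scores, warn_band, extreme_band)]
print(colors)
```

In the actual plot these colors would be passed to matplotlib's horizontal bar chart, with the two bands drawn as reference lines.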

Testing

Following PyOD conventions (see COPOD's test_copod.py lines 147-149), visualization methods use # pragma: no cover and are demonstrated via examples rather than unit tests.

Manual Validation:
Extensively tested with:

  • Multiple datasets (generated data, cardio.mat, Pima Indians Diabetes Dataset)
  • Various parameters (cutoffs, columns, feature_names)
  • 2D visualizable data for correctness verification (see screenshots below)

Test Results:

  • All 38 existing KNN tests pass (pytest pyod/test/test_knn.py -v)
  • Example scripts run successfully (python examples/knn_interpretability.py)

Backwards Compatibility

No breaking changes to existing API:

  • New attributes (X_train_) only created when needed
  • Optional feature (doesn't affect core functionality)
  • All existing tests pass

Use Cases

This feature enables:

  1. Fraud Detection - Identify which transaction features are suspicious
  2. Network Security - Understand which traffic patterns trigger alerts
  3. Quality Control - Pinpoint which product measurements are defective
  4. Healthcare - Understand patient outlier profiles
  5. IoT Monitoring - Detect which sensor readings are anomalous

Research Foundation

This implementation is based on the method described in:

Krenmayr, Lucas and Goldstein, Markus (2023). "Explainable Outlier Detection Using Feature Ranking for k-Nearest Neighbors, Gaussian Mixture Model and Autoencoders." In 15th International Conference on Agents and Artificial Intelligence (ICAART), pp. 245-253. https://doi.org/10.5220/0011631900003411

BibTeX:

@inproceedings{Lucas2023xodknn,
  author    = {Krenmayr, Lucas and Goldstein, Markus},
  title     = {Explainable Outlier Detection Using Feature Ranking for k-Nearest Neighbors, Gaussian Mixture Model and Autoencoders},
  booktitle = {Proceedings of the 15th International Conference on Agents and Artificial Intelligence (ICAART)},
  year      = {2023},
  month     = {02},
  pages     = {245--253},
  doi       = {10.5220/0011631900003411}
}

This PR implements dimensional feature-ranking for KNN outlier interpretation per the method described in the paper above, and extends PyOD with both visualization and a returned explainability score vector (per-dimension evidence), addressing a gap noted in COPOD's implementation.


Screenshots/Examples

2D Validation Examples

These examples demonstrate the correctness of the dimensional explainability approach on 2D data where the results can be visually verified.

Figure 7.3: 2D k-NN Inlier
2D k-NN Inlier

Demonstrates how an inlier point has low k-NN scores for both dimensions. The overall k-NN score is low, and both individual dimensions show low anomaly scores, correctly identifying this as a normal sample.

Figure 7.4: 2D k-NN X-Dimension Outlier
2D k-NN X-Dimension Outlier

Demonstrates how a point outlying only in the X-dimension has a high k-NN score in the X-dimension and a low score in the Y-dimension. This shows the method's ability to isolate anomalies to specific dimensions, which the overall k-NN score alone cannot indicate.

Figure 7.5: 2D k-NN Y-Dimension Outlier
2D k-NN Y-Dimension Outlier

Demonstrates how a point outlying only in the Y-dimension has a high k-NN score in the Y-dimension and a low score in the X-dimension. As in the previous figure, the overall k-NN score flags the point as an outlier but cannot by itself indicate in which dimension the anomaly lies.

Figure 7.6: 2D k-NN Outlier (Both Dimensions)
2D k-NN Outlier Both Dimensions

Demonstrates how an outlier point has high k-NN scores for both dimensions. This shows the method correctly identifies multi-dimensional anomalies.

Real-World Dataset Examples

However, most real datasets have more than 2 features. We therefore demonstrate the technique on the real-world Pima Indians Diabetes Dataset (Smith et al., 1988) after Min-Max scaling, which compresses both the per-dimension and the overall k-NN score values.

Figure 7.7: Pima k-NN Outlier 1
Pima k-NN Outlier 1

Demonstrates how the most outlying point is an anomaly mainly because of the Diabetes Pedigree Function and Insulin features.

Figure 7.8: Pima k-NN Outlier 2
Pima k-NN Outlier 2

Demonstrates how the second most outlying point is an anomaly mainly because of the Age and the Skin Thickness, which is a very different reason from the previous outlier. This shows how different outliers can have different dimensional contributions.

Figure 7.9: Pima k-NN Inlier
Pima k-NN Inlier

Demonstrates how a point that is an inlier overall also has low k-NN scores in all individual dimensions, confirming the method's consistency.


Related Work

This PR enhances COPOD's API pattern for dimensional interpretability, and is directly inspired by:

  • Krenmayr & Goldstein, 2023, ICAART (see Research Foundation above): Paper describing feature ranking based explainability for KNN, GMM, and Autoencoders.
  • COPOD's explain_outlier() — API consistency for explainability in PyOD; this PR now completes the TODO of returning programmatic scores.
  • Modern explainability tools (SHAP, LIME, EBM) — Visualization style.
  • PyOD's emphasis on interpretable outlier detection — Library philosophy.

Impact

Benefits:

  • Adds interpretability to KNN outlier detection
  • Provides both visualization and programmatic access to scores
  • Completes explainability interface (addresses COPOD's TODO)
  • Maintains consistency with existing PyOD patterns

Backward Compatibility:

  • No API changes to existing functionality
  • No breaking changes
  • Optional feature (doesn't affect core functionality)
  • All existing tests pass

Checklist

All Submissions Basics:

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?
  • Have you checked all Issues to tie the PR to a specific one?

All Submissions Cores:

  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your core changes, as applicable?
    • Added unit test for get_outlier_explainability_scores() method in test_knn.py (tests numerical logic)
    • Following COPOD's pattern, visualization methods use # pragma: no cover and are demonstrated via examples (see test_copod.py lines 147-149)
  • Have you successfully ran tests with your changes locally?
    • All 38 KNN tests pass (pytest pyod/test/test_knn.py -v)
    • Example scripts run successfully (python examples/knn_interpretability.py)
  • Does your submission pass tests, including CircleCI, Travis CI, and AppVeyor?
  • Does your submission have appropriate code coverage? The cutoff threshold is 95% by Coveralls.
    • Core functionality coverage unchanged
    • Visualization excluded with # pragma: no cover (following COPOD pattern)
    • Overall coverage remains ≥95%

Files Changed

  • pyod/models/knn.py - Added explainability methods
  • examples/knn_interpretability.py - New example file
  • pyod/test/test_knn.py - Added unit test for get_outlier_explainability_scores() method
