Using CISA Known Exploited Vulnerabilities for Data-Driven Security Operations
This repository contains the complete implementation of our research on reinforcement learning-based vulnerability prioritization using the CISA KEV catalog, NVD CVSS scores, and EPSS data.
Modern organizations face thousands of published vulnerabilities with limited resources for remediation. This work demonstrates that reinforcement learning can learn optimal prioritization policies from real-world exploitation data, achieving:
- 98.4% classification accuracy (DQN)
- 3,587.50 average reward (a 3,663-point improvement over the random baseline)
- 10-minute training time (production-ready)
- Balanced prioritization (52% medium, 48% immediate)
Results Summary:

| Method | Accuracy | Avg Reward | F1 (Macro) | Training Time |
|---|---|---|---|---|
| Random Baseline | N/A | -75.50 | N/A | N/A |
| XGBoost | 100.0% | N/A | 100.0% | <1 min |
| DQN (Ours) | 98.4% | 3,587.50 | 65.7% | ~10 min |
| PPO | 46.9% | 2,822.00 | 21.3% | ~15 min |
DQN Performance Highlights:
- ✅ 3,663-point reward improvement over the random baseline
- ✅ Balanced strategy: 52% medium priority, 48% immediate
- ✅ 100% recall on high-urgency vulnerabilities
- ✅ Production-ready: 10-minute training time
- ✅ Reproducible: fixed random seeds, public code
Feature Importance:
- 🔴 Ransomware flag: 77.9%
- 🟠 CVSS score: 19.9%
- 🟡 Days since added: 2.2%
- 🟢 EPSS: <1%
Quick Start:

```bash
# Clone repository
git clone https://github.com/GitSene/RL-KEV-Vulnerability-Prioritization.git
cd RL-KEV-Vulnerability-Prioritization

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

```bash
# Download CISA KEV catalog
python data/download_kev.py

# Enrich with NVD CVSS and EPSS (requires API key)
python src/data_processing/kev_enrichment.py --api-key YOUR_NVD_API_KEY
```

```bash
# Train DQN agent
python scripts/train_dqn.py --episodes 200

# Train PPO agent
python scripts/train_ppo.py --episodes 200

# Train XGBoost baseline
python scripts/train_xgboost.py
```

```bash
# Run complete evaluation
python scripts/evaluate_all.py

# Quick demo with trained model
python scripts/demo.py
```
Project Structure:

```
RL-KEV-Vulnerability-Prioritization/
├── data/                        # Data download and storage
│   ├── download_kev.py          # KEV catalog downloader
│   ├── kev_enriched.csv         # Enriched dataset (1,464 CVEs)
│   └── .gitkeep
├── src/                         # Source code
│   ├── data_processing/         # KEV enrichment pipeline
│   │   └── kev_enrichment.py
│   ├── environment/             # RL environment (Gymnasium)
│   │   └── vuln_env.py
│   ├── agents/                  # DQN and PPO implementations
│   │   ├── dqn_agent.py
│   │   └── ppo_agent.py
│   ├── baselines/               # XGBoost baseline
│   │   └── xgboost_baseline.py
│   └── evaluation/              # Evaluation scripts
│       └── evaluate.py
├── models/                      # Trained model weights
│   ├── dqn_model.pth
│   ├── ppo_model.pth
│   └── .gitkeep
├── notebooks/                   # Jupyter notebooks for analysis
├── results/                     # Output figures and tables
│   ├── figures/                 # PNG visualizations
│   └── tables/                  # CSV results
├── scripts/                     # Executable training scripts
│   ├── train_dqn.py
│   ├── train_ppo.py
│   ├── train_xgboost.py
│   ├── evaluate_all.py
│   ├── reproduce_results.py
│   └── demo.py
├── tests/                       # Unit tests
├── docs/                        # Documentation
│   └── INSTALL.md
├── requirements.txt             # Python dependencies
├── .gitignore
├── LICENSE                      # MIT License
├── CITATION.cff                 # Citation metadata
├── CONTRIBUTING.md              # Contribution guidelines
└── README.md                    # This file
```
Dataset:
- CISA KEV Catalog: 1,464 confirmed exploited vulnerabilities (November 2025)
- NVD CVSS v3.1: Technical severity scores (0–10)
- EPSS: Exploitation probability predictions (0–1)
- Features: `cvss_score`, `epss`, `epss_percentile`, `days_since_added`, `ransomware_flag`
Urgency Distribution:
- Low urgency: 7 (0.5%)
- Medium urgency: 771 (52.7%)
- High urgency: 686 (46.8%)
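For orientation, here is a minimal loading sketch using the feature columns named above. The `urgency` label column is an assumption (the README only names the five feature columns), so adjust it to match the released CSV:

```python
import pandas as pd

# Load the enriched dataset produced by the pipeline above.
df = pd.read_csv("data/kev_enriched.csv")

# The five model features named above.
features = ["cvss_score", "epss", "epss_percentile",
            "days_since_added", "ransomware_flag"]
print(df[features].describe())

# 'urgency' as the label column name is an assumption; adjust to the CSV.
# Expected split, per the counts above: ~0.5% low, ~52.7% medium, ~46.8% high.
print(df["urgency"].value_counts(normalize=True))
```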
Markov Decision Process (MDP):
- State Space: 5-dimensional continuous feature vector
- Action Space: 4 discrete actions
  - a₀: Monitor only
  - a₁: Patch within 30 days
  - a₂: Patch within 7 days
  - a₃: Patch immediately
- Reward Function: Urgency alignment + SLA compliance penalties
- Environment: Custom Gymnasium-compatible implementation
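As a concrete illustration of this MDP, the sketch below implements a Gymnasium-compatible environment with the state and action spaces above. The urgency-to-action mapping and reward constants are illustrative assumptions; the actual reward shaping lives in `src/environment/vuln_env.py`.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class VulnPrioritizationEnv(gym.Env):
    """One vulnerability per step: observe 5 features, choose a patch SLA."""

    def __init__(self, features: np.ndarray, urgency: np.ndarray):
        super().__init__()
        self.features = features.astype(np.float32)   # shape (N, 5)
        self.urgency = urgency                        # 0=low, 1=medium, 2=high
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf,
                                            shape=(5,), dtype=np.float32)
        self.action_space = spaces.Discrete(4)        # a0..a3 as listed above
        self.idx = 0

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.idx = 0
        return self.features[self.idx], {}

    def step(self, action):
        # Hypothetical urgency alignment: low -> monitor (a0),
        # medium -> 30-day patch (a1), high -> immediate patch (a3),
        # with an SLA-mismatch penalty; the real reward is in vuln_env.py.
        target = (0, 1, 3)[int(self.urgency[self.idx])]
        reward = 25.0 if action == target else -10.0 * abs(action - target)
        self.idx += 1
        terminated = self.idx >= len(self.features)
        obs = self.features[min(self.idx, len(self.features) - 1)]
        return obs, float(reward), terminated, False, {}
```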
- DQN: Value-based learning with experience replay (50k buffer)
- PPO: Policy-gradient learning with actor-critic architecture
- XGBoost: Traditional ML baseline for comparison
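The DQN component can be pictured with the condensed sketch below: a small Q-network, a 50k-transition replay buffer, and a frozen target network. Layer sizes, learning rate, and the MSE loss are illustrative assumptions; `src/agents/dqn_agent.py` holds the actual implementation.

```python
import random
from collections import deque

import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a 5-dim vulnerability state to Q-values over the 4 actions."""
    def __init__(self, obs_dim=5, n_actions=4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))

    def forward(self, x):
        return self.net(x)

buffer = deque(maxlen=50_000)  # 50k replay buffer, matching the text
q_net, target_net = QNetwork(), QNetwork()
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

# During rollout, append transitions as plain Python values, e.g.:
# buffer.append((obs.tolist(), action, reward, next_obs.tolist(), float(done)))

def train_step(batch_size=64, gamma=0.99):
    """One TD(0) update on a random minibatch from the replay buffer."""
    if len(buffer) < batch_size:
        return
    s, a, r, s2, done = map(torch.tensor, zip(*random.sample(buffer, batch_size)))
    q = q_net(s.float()).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():  # bootstrap from the frozen target network
        target = r.float() + gamma * (1 - done.float()) * target_net(s2.float()).max(1).values
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```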
DQN Results:
- Classification: 98.4% accuracy with balanced prioritization
- Recall: 100% on high-urgency vulnerabilities (no critical CVEs missed)
- Action Distribution:
- Patch within 30 days: 52.0%
- Patch immediately: 47.9%
- Patch within 7 days: 0.1%
- Monitor only: 0.0%
- Convergence: ~175 episodes (~10 minutes on CPU)
PPO Results:
- Strategy: Aggressive safety-first (100% immediate patching)
- Convergence: 8× faster than DQN (~20 episodes)
- Trade-off: Lower reward but faster learning
XGBoost feature importance reveals:
- Ransomware flag: 77.9% (dominant predictor)
- CVSS score: 19.9%
- Days since added: 2.2%
- EPSS probability: <1%
- EPSS percentile: <1%
Insight: Binary exploitation evidence (KEV membership, ransomware campaigns) dominates over probabilistic predictions (EPSS) in the KEV context.
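A hypothetical sketch of how such importances can be read from a trained classifier is shown below; hyperparameters and the `urgency` label column are assumptions, and the actual training code is `src/baselines/xgboost_baseline.py`.

```python
import pandas as pd
from xgboost import XGBClassifier

df = pd.read_csv("data/kev_enriched.csv")
features = ["cvss_score", "epss", "epss_percentile",
            "days_since_added", "ransomware_flag"]

# 'urgency' as the label column is an assumption; encode it to integer codes.
y = df["urgency"].astype("category").cat.codes

model = XGBClassifier(n_estimators=100, random_state=42)
model.fit(df[features], y)

# Gain-based importances; ransomware_flag should dominate per the results above.
importances = pd.Series(model.feature_importances_, index=features)
print(importances.sort_values(ascending=False))
```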
If you use this code or data in your research, please cite:
```bibtex
@article{habibi2025rl,
  title={Reinforcement Learning-Based Vulnerability Prioritization Using {CISA} Known Exploited Vulnerabilities},
  author={Habibi, Babek},
  journal={IEEE Transactions on Information Forensics and Security},
  year={2025},
  note={Under Review}
}
```

This project is licensed under the MIT License; see the LICENSE file for details.
Acknowledgments:
- CISA for maintaining the Known Exploited Vulnerabilities catalog
- NIST for the National Vulnerability Database
- FIRST for the Exploit Prediction Scoring System (EPSS)

Contact:
- Author: Babek Habibi
- Email: bnorouzlou19519@ucumberlands.edu
- Institution: University of the Cumberlands, Department of Computer Science
- GitHub: @GitSene
This work builds upon and complements recent advances in vulnerability prioritization:
- Shimizu & Hashimoto (2025): Vulnerability Management Chaining - Decision tree integration of CVSS, EPSS, and KEV
- NIST LEV (2025): Likely Exploited Vulnerabilities metric
Our RL-based approach is complementary: while decision trees provide static filtering rules, reinforcement learning optimizes sequential decisions under operational constraints.
⭐ If you find this work useful, please consider starring the repository!
📄 Project Status: Under review for IEEE TIFS
📅 Last Updated: December 2025
💻 Maintained by: @GitSene