
Reinforcement Learning for Vulnerability Prioritization using CISA KEV: DQN achieves 98.4% accuracy with an average reward of 3,587.50


GitSene/RL-KEV-Vulnerability-Prioritization


Reinforcement Learning-Based Vulnerability Prioritization

License: MIT · Python 3.11+ · PyTorch · Paper

Using CISA Known Exploited Vulnerabilities for Data-Driven Security Operations

This repository contains the complete implementation of our research on reinforcement learning-based vulnerability prioritization using the CISA KEV catalog, NVD CVSS scores, and EPSS data.

🎯 Overview

Organizations face thousands of published vulnerabilities but have limited resources for remediation. This work demonstrates that reinforcement learning can learn effective prioritization policies from real-world exploitation data, achieving:

  • 98.4% classification accuracy (DQN)
  • 3,587.50 average reward (a 3,663-point improvement over the random baseline's -75.50)
  • ~10-minute training time on CPU (production-ready)
  • Balanced prioritization (52% patch within 30 days, 48% patch immediately)

📊 Key Results

| Method | Accuracy | Avg Reward | F1 (Macro) | Training Time |
|---|---|---|---|---|
| Random Baseline | N/A | -75.50 | N/A | N/A |
| XGBoost | 100.0% | N/A | 100.0% | <1 min |
| DQN (Ours) | 98.4% | 3,587.50 | 65.7% | ~10 min |
| PPO | 46.9% | 2,822.00 | 21.3% | ~15 min |
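The gap between DQN's 98.4% accuracy and its 65.7% macro F1 is consistent with how macro F1 is defined: it is the unweighted mean of per-class F1 scores, so a rare class that is never predicted correctly contributes an F1 of 0 no matter how few samples it has. The stdlib sketch below illustrates this on made-up toy labels (not the paper's data); `accuracy` and `macro_f1` are hypothetical helpers, not functions from this repository.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy data (illustrative only): 100 items, one "low" item the model labels "medium".
y_true = ["low"] + ["medium"] * 52 + ["high"] * 47
y_pred = ["medium"] + ["medium"] * 52 + ["high"] * 47

print(f"accuracy = {accuracy(y_true, y_pred):.2f}")  # accuracy = 0.99
print(f"macro F1 = {macro_f1(y_true, y_pred):.2f}")  # macro F1 = 0.66
```

Accuracy is dominated by the two large classes, while macro F1 loses a third of its value because one of the three classes scores 0.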

DQN Performance Highlights:

  • ✅ 3,663-point reward improvement over the random baseline
  • ✅ Balanced strategy: 52% patch within 30 days, 48% patch immediately
  • ✅ 100% recall on high-urgency vulnerabilities
  • ✅ Production-ready: ~10-minute training time on CPU
  • ✅ Reproducible: fixed random seeds, public code

Feature Importance (XGBoost):

  • 🔴 Ransomware flag: 77.9%
  • 🟠 CVSS score: 19.9%
  • 🟡 Days since added: 2.2%
  • 🟢 EPSS: <1%

🚀 Quick Start

Installation

# Clone repository
git clone https://github.com/GitSene/RL-KEV-Vulnerability-Prioritization.git
cd RL-KEV-Vulnerability-Prioritization

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Download and Enrich Data

# Download CISA KEV catalog
python data/download_kev.py

# Enrich with NVD CVSS and EPSS (requires API key)
python src/data_processing/kev_enrichment.py --api-key YOUR_NVD_API_KEY
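The enrichment step combines two public APIs. Below is a minimal offline sketch of the parsing side, assuming the response layouts of the NVD CVE API 2.0 (`vulnerabilities[].cve.metrics.cvssMetricV31[].cvssData.baseScore`) and the FIRST EPSS API (`data[].epss` and `data[].percentile`, returned as strings). The function names and sample payloads are hypothetical, and `kev_enrichment.py` may select scores differently, for example when NVD lists several v3.1 entries for one CVE.

```python
from __future__ import annotations


def parse_nvd_cvss31(nvd_response: dict) -> float | None:
    """CVSS v3.1 base score from an NVD CVE API 2.0 response (Primary source preferred)."""
    vulns = nvd_response.get("vulnerabilities", [])
    if not vulns:
        return None
    metrics = vulns[0]["cve"].get("metrics", {}).get("cvssMetricV31", [])
    if not metrics:
        return None  # e.g. older CVEs that only carry CVSS v2 metrics
    primary = [m for m in metrics if m.get("type") == "Primary"]
    return float((primary or metrics)[0]["cvssData"]["baseScore"])


def parse_epss(epss_response: dict) -> tuple[float, float] | None:
    """(epss, percentile) from a FIRST EPSS API response; both arrive as strings."""
    data = epss_response.get("data", [])
    if not data:
        return None
    return float(data[0]["epss"]), float(data[0]["percentile"])


# Trimmed, illustrative payloads (placeholder CVE ID and made-up scores).
nvd_sample = {"vulnerabilities": [{"cve": {"id": "CVE-0000-0001", "metrics": {
    "cvssMetricV31": [
        {"type": "Secondary", "cvssData": {"baseScore": 9.1}},
        {"type": "Primary", "cvssData": {"baseScore": 9.8}},
    ]}}}]}
epss_sample = {"data": [{"cve": "CVE-0000-0001", "epss": "0.500000000",
                         "percentile": "0.900000000"}]}

print(parse_nvd_cvss31(nvd_sample))  # 9.8
print(parse_epss(epss_sample))       # (0.5, 0.9)
```

Preferring the `Primary` entry keeps the NVD's own assessment when a CNA has also published a score; falling back to the first entry avoids dropping CVEs that only have a secondary score.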

Train Models

# Train DQN agent
python scripts/train_dqn.py --episodes 200

# Train PPO agent
python scripts/train_ppo.py --episodes 200

# Train XGBoost baseline
python scripts/train_xgboost.py

Evaluate

# Run complete evaluation
python scripts/evaluate_all.py

# Quick demo with trained model
python scripts/demo.py

📁 Repository Structure

RL-KEV-Vulnerability-Prioritization/
├── data/                    # Data download and storage
│   ├── download_kev.py     # KEV catalog downloader
│   ├── kev_enriched.csv    # Enriched dataset (1,464 CVEs)
│   └── .gitkeep
├── src/                     # Source code
│   ├── data_processing/    # KEV enrichment pipeline
│   │   └── kev_enrichment.py
│   ├── environment/        # RL environment (Gymnasium)
│   │   └── vuln_env.py
│   ├── agents/             # DQN and PPO implementations
│   │   ├── dqn_agent.py
│   │   └── ppo_agent.py
│   ├── baselines/          # XGBoost baseline
│   │   └── xgboost_baseline.py
│   └── evaluation/         # Evaluation scripts
│       └── evaluate.py
├── models/                  # Trained model weights
│   ├── dqn_model.pth
│   ├── ppo_model.pth
│   └── .gitkeep
├── notebooks/              # Jupyter notebooks for analysis
├── results/                # Output figures and tables
│   ├── figures/           # PNG visualizations
│   └── tables/            # CSV results
├── scripts/                # Executable training and evaluation scripts
│   ├── train_dqn.py
│   ├── train_ppo.py
│   ├── train_xgboost.py
│   ├── evaluate_all.py
│   ├── reproduce_results.py
│   └── demo.py
├── tests/                  # Unit tests
├── docs/                   # Documentation
│   └── INSTALL.md
├── requirements.txt        # Python dependencies
├── .gitignore
├── LICENSE                # MIT License
├── CITATION.cff           # Citation metadata
├── CONTRIBUTING.md        # Contribution guidelines
└── README.md              # This file

🔬 Methodology

Dataset Construction

  1. CISA KEV Catalog: 1,464 confirmed exploited vulnerabilities (November 2025)
  2. NVD CVSS v3.1: Technical severity scores (0–10)
  3. EPSS: Exploitation probability predictions (0–1)
  4. Features: cvss_score, epss, epss_percentile, days_since_added, ransomware_flag
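A minimal sketch of assembling the five listed features into one state vector. It assumes the KEV JSON fields `dateAdded` (an ISO date) and `knownRansomwareCampaignUse` (`"Known"` or `"Unknown"`), and that `days_since_added` is measured against a fixed reference date. The helper name `build_state`, the reference date, and the absence of feature scaling are assumptions, not the repository's code.

```python
from __future__ import annotations

from datetime import date

# Feature order as listed in this README.
FEATURES = ["cvss_score", "epss", "epss_percentile", "days_since_added", "ransomware_flag"]


def build_state(kev_record: dict, cvss_score: float, epss: float,
                epss_percentile: float, as_of: date) -> list[float]:
    """Return the 5-dimensional feature vector for one KEV entry (unscaled)."""
    days_since_added = (as_of - date.fromisoformat(kev_record["dateAdded"])).days
    # Binary flag: 1.0 only when CISA lists known ransomware campaign use.
    ransomware_flag = 1.0 if kev_record.get("knownRansomwareCampaignUse") == "Known" else 0.0
    return [cvss_score, epss, epss_percentile, float(days_since_added), ransomware_flag]


# Illustrative record with a placeholder CVE ID and made-up scores.
record = {"cveID": "CVE-0000-0001", "dateAdded": "2025-10-02",
          "knownRansomwareCampaignUse": "Known"}
state = build_state(record, cvss_score=9.8, epss=0.5, epss_percentile=0.9,
                    as_of=date(2025, 11, 1))
print(dict(zip(FEATURES, state)))  # days_since_added = 30.0, ransomware_flag = 1.0
```

In practice the features span very different ranges (days can run into the thousands while EPSS stays in [0, 1]), so a real pipeline would likely normalize them before feeding a neural network.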

Urgency Distribution:

  • Low urgency: 7 (0.5%)
  • Medium urgency: 771 (52.7%)
  • High urgency: 686 (46.9%)
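The shares follow directly from the counts, which sum to the 1,464 CVEs in the dataset. The sketch also prints the accuracy of trivial constant policies, assuming each urgency level maps one-to-one to a single action (for example, high urgency to "patch immediately"). Under that assumption, always patching immediately scores exactly the high-urgency share, which coincides with PPO's reported 46.9% accuracy. `class_shares` is a hypothetical helper.

```python
from __future__ import annotations

URGENCY_COUNTS = {"low": 7, "medium": 771, "high": 686}


def class_shares(counts: dict[str, int]) -> dict[str, float]:
    """Percentage of the dataset in each urgency class, rounded to one decimal."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


total = sum(URGENCY_COUNTS.values())
shares = class_shares(URGENCY_COUNTS)
print(total)   # 1464
print(shares)  # {'low': 0.5, 'medium': 52.7, 'high': 46.9}

# A constant policy is correct on exactly one class, so its accuracy equals that
# class's share (assuming a one-to-one urgency-to-action mapping).
for label in ("medium", "high"):
    print(f"always predict {label!r}: {shares[label]}% accuracy")
```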

RL Formulation

Markov Decision Process (MDP):

  • State Space: 5-dimensional feature vector (four continuous features plus the binary ransomware flag)
  • Action Space: 4 discrete actions
    • a₀: Monitor only
    • a₁: Patch within 30 days
    • a₂: Patch within 7 days
    • a₃: Patch immediately
  • Reward Function: Urgency alignment plus SLA-compliance penalties
  • Environment: Custom Gymnasium-compatible implementation
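To make the MDP concrete, here is a minimal, dependency-free sketch that mirrors the Gymnasium `reset()`/`step()` return conventions: `(obs, info)` and `(obs, reward, terminated, truncated, info)`. The class name, the urgency-to-action mapping, and the reward values are hypothetical stand-ins for "urgency alignment plus SLA-compliance penalties"; they are not taken from `vuln_env.py`. A real Gymnasium environment would also subclass `gymnasium.Env` and declare `observation_space` and `action_space`.

```python
import random

# Hypothetical mapping from urgency label to the best-aligned action:
# a0 = monitor, a1 = patch within 30 days, a2 = within 7 days, a3 = immediately.
TARGET_ACTION = {"low": 0, "medium": 1, "high": 3}


class VulnPrioritizationEnv:
    """Gymnasium-style sketch: one episode is one shuffled pass over the CVE list."""

    def __init__(self, states, urgencies, seed=None):
        if len(states) != len(urgencies):
            raise ValueError("states and urgencies must have the same length")
        self.states, self.urgencies = states, urgencies
        self.rng = random.Random(seed)
        self.order, self.t = [], 0

    def reset(self, seed=None, options=None):
        if seed is not None:
            self.rng.seed(seed)
        self.order = list(range(len(self.states)))
        self.rng.shuffle(self.order)
        self.t = 0
        return self.states[self.order[0]], {}

    def step(self, action):
        idx = self.order[self.t]
        target = TARGET_ACTION[self.urgencies[idx]]
        reward = self.reward(action, target)
        self.t += 1
        terminated = self.t >= len(self.order)
        # On termination, return the last observation again so obs is never None.
        obs = self.states[idx] if terminated else self.states[self.order[self.t]]
        return obs, reward, terminated, False, {"target_action": target}

    @staticmethod
    def reward(action, target):
        """Hypothetical reward: +10 for alignment, -5 per level of under-prioritization
        (an SLA-breach proxy), -1 per level of over-prioritization (wasted effort)."""
        if action == target:
            return 10.0
        if action < target:
            return -5.0 * (target - action)
        return -1.0 * (action - target)


# Usage: an "always patch immediately" policy on two toy CVEs.
env = VulnPrioritizationEnv(
    states=[[9.8, 0.5, 0.9, 30.0, 1.0], [5.0, 0.01, 0.2, 400.0, 0.0]],
    urgencies=["high", "medium"],
    seed=0,
)
obs, info = env.reset(seed=0)
episode_return, terminated = 0.0, False
while not terminated:
    obs, reward, terminated, truncated, info = env.step(3)
    episode_return += reward
print(episode_return)  # 8.0
```

The asymmetric penalties encode the intuition that under-patching an exploited CVE is costlier than over-patching it; the actual weights in the paper's reward function may differ.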

Algorithms

  • DQN: Value-based learning with experience replay (50,000-transition buffer)
  • PPO: Policy-gradient learning with an actor-critic architecture
  • XGBoost: Supervised gradient-boosted-tree baseline for comparison
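The core update rules behind the two RL agents can be sketched without a deep-learning framework. Below is a stdlib sketch of a DQN-style replay buffer (capacity 50,000, as listed above), epsilon-greedy action selection, the one-step TD target y = r + γ · max over a' of Q_target(s', a'), and PPO's per-sample clipped surrogate objective. These are standard textbook forms, not code from `dqn_agent.py` or `ppo_agent.py`; `gamma=0.99` and `clip_eps=0.2` are common defaults assumed here, not the paper's hyperparameters.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity experience replay; the oldest transitions are evicted first."""

    def __init__(self, capacity=50_000, seed=None):
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks temporal correlation between updates.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def epsilon_greedy(q_values, epsilon, rng):
    """Explore with probability epsilon, otherwise pick the highest-Q action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step DQN target; next_q_values would come from the target network."""
    return reward if done else reward + gamma * max(next_q_values)


def ppo_clip_objective(ratio, advantage, clip_eps=0.2):
    """Per-sample PPO clipped surrogate (maximized); ratio = pi_new(a|s) / pi_old(a|s)."""
    clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped_ratio * advantage)


# Usage with toy numbers.
buf = ReplayBuffer(capacity=3, seed=0)
for step in range(5):
    buf.push([0.0] * 5, step % 4, 1.0, [0.0] * 5, False)
print(len(buf))                                           # 3 (two oldest evicted)
print(td_target(1.0, [0.0, 2.0], done=False, gamma=0.5))  # 2.0
print(ppo_clip_objective(1.5, 2.0))                       # 2.4 (ratio clipped to 1.2)
```

Clipping removes the incentive to push the probability ratio above 1 + ε when the advantage is positive (or below 1 − ε when it is negative), which keeps each PPO policy update conservative.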

📈 Results Summary

DQN Performance

  • Classification: 98.4% accuracy with balanced prioritization
  • Recall: 100% on high-urgency vulnerabilities (no high-urgency CVE missed)
  • Action Distribution:
    • Patch within 30 days: 52.0%
    • Patch immediately: 47.9%
    • Patch within 7 days: 0.1%
    • Monitor only: 0.0%
  • Convergence: ~175 episodes (~10 minutes on CPU)

PPO Performance

  • Strategy: Aggressive, safety-first (patches 100% of vulnerabilities immediately)
  • Convergence: ~20 episodes, roughly 8× fewer than DQN (~175)
  • Trade-off: Converges in fewer episodes but earns a lower reward (2,822.00 vs. 3,587.50)

Feature Importance Analysis

XGBoost feature importance reveals:

  1. Ransomware flag: 77.9% (dominant predictor)
  2. CVSS score: 19.9%
  3. Days since added: 2.2%
  4. EPSS probability: <1%
  5. EPSS percentile: <1%

Insight: In the KEV context, where every CVE is already confirmed as exploited, binary exploitation evidence (KEV membership and known ransomware-campaign use) dominates over probabilistic predictions (EPSS).

📄 Citation

If you use this code or data in your research, please cite:

@article{habibi2025rl,
  title={Reinforcement Learning-Based Vulnerability Prioritization Using {CISA} Known Exploited Vulnerabilities},
  author={Habibi, Babek},
  journal={IEEE Transactions on Information Forensics and Security},
  year={2025},
  note={Under Review}
}

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • CISA for maintaining the Known Exploited Vulnerabilities catalog
  • NIST for the National Vulnerability Database
  • FIRST for the Exploit Prediction Scoring System (EPSS)

📧 Contact

🔗 Resources

🌟 Related Work

This work builds upon and complements recent advances in vulnerability prioritization:

Our RL-based approach is complementary: while decision trees provide static filtering rules, reinforcement learning optimizes sequential decisions under operational constraints.


⭐ If you find this work useful, please consider starring the repository!


📊 Project Status: Under review for IEEE TIFS
🔄 Last Updated: December 2025
💻 Maintained by: @GitSene