Advanced Portfolio Hedging & Risk Analytics

📌 Project Overview

This project implements a sophisticated risk management system designed for Ultra High Net Worth (UHNW) clients holding concentrated single-stock positions. The core constraint is tax efficiency: reducing portfolio risk without triggering capital gains taxes by selling the underlying asset.

We compare two distinct hedging approaches:

1. Quantitative Factor Model: A traditional Barra-style risk model using Bloomberg risk factors (e.g., Size, Value, Momentum) and robust regression.
2. NLP Semantic Model: A novel approach using Large Language Model (LLM) embeddings (Nomic v1.5) to identify "fundamental peers" based on semantic business similarity.
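
As a rough illustration of what "fundamental peers" means here, the sketch below ranks companies by cosine similarity between embedding vectors. The function names, the `EMS_PEER` ticker, and the 3-dimensional vectors are all made up for illustration; real Nomic v1.5 embeddings have hundreds of dimensions, and the project's own peer-selection code may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_peers(target, embeddings, k=3):
    """Return the k tickers most similar to `target`, best match first."""
    target_vec = embeddings[target]
    scores = [
        (ticker, cosine_similarity(target_vec, vec))
        for ticker, vec in embeddings.items()
        if ticker != target
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Toy vectors (invented numbers, not real embeddings).
# "EMS_PEER" is a hypothetical electronics-manufacturing company.
toy_embeddings = {
    "FLEX": [0.9, 0.1, 0.0],
    "EMS_PEER": [0.8, 0.2, 0.1],
    "AAPL": [0.3, 0.9, 0.1],
    "MOS": [0.0, 0.1, 0.95],
}

peers = nearest_peers("FLEX", toy_embeddings, k=2)
for ticker, score in peers:
    print(f"{ticker}: {score:.3f}")
```

In this toy setup the hypothetical manufacturing peer ranks ahead of AAPL even though both might share a broad "Tech" label, which is the kind of distinction the semantic model is meant to capture.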

🚀 Key Results

The backtest results (Notebook 03) reveal that NLP-based hedging outperforms traditional factor models for idiosyncratic companies where sector labels are insufficient.

| Strategy | Win Case Example | Rationale |
| --- | --- | --- |
| NLP Hedge | Flextronics (FLEX) | +402 bps risk reduction. NLP correctly identified niche electronics manufacturing peers that generic "Tech" factors missed. |
| NLP Hedge | Mosaic (MOS) | +392 bps risk reduction. Semantic search captured the specific fertilizer/commodity risk better than broad "Materials" sector factors. |
| Factor Hedge | Apple (AAPL) | -200 bps. For mega-cap stocks driven by broad market flows, the systematic factor model proved superior to semantic matching. |

📂 Repository Structure

```
.
├── data/                   # Raw and Processed data (Bloomberg, Wikipedia, Embeddings)
├── notebooks/              # Jupyter Notebooks (Sequential Logic)
│   ├── 00_exploratory_data_analysis.ipynb   # Data Cleaning & Veralto/UMB Fixes
│   ├── 01_factor_model_construction.ipynb   # Huber Robust Regression & Factor Returns
│   ├── 02_nlp_embedding_generation.ipynb    # Nomic v1.5 Embeddings & Context-Aware Chunking
│   ├── 03_hedging_strategy_comparison.ipynb # The Backtest: Factor Optimization vs. NLP
│   └── 04_ai_revolution_clustering.ipynb    # Extra Credit: Unsupervised AI Clustering
├── scripts/                # Production scripts for batch jobs
├── src/                    # Source code package (adv_hedging)
│   ├── hedging/            # Optimization & Metrics
│   ├── nlp/                # Text Processing & Embeddings
│   └── risk_model/         # Factor Engine
├── environment.yml         # Conda environment definition
└── pyproject.toml          # Python dependencies
```

🛠 Installation & Setup

This project uses a custom Conda environment (hedging_clean) with Python 3.10.

Clone the Repository:

```
git clone https://github.com/your-username/advanced-portfolio-hedging.git
cd advanced-portfolio-hedging
```

Create Environment:

```
conda env create -f environment.yml
conda activate hedging_clean
```

Install Local Package:
```
pip install -e .
```

🧠 Methodology Details

Factor Risk Model
- Data: 7 Bloomberg Risk Factors (Size, Value, Momentum, Volatility, Profitability, Leverage, Trading Activity).
- Estimation: Uses Huber Robust Regression (epsilon=1.35) to estimate daily factor returns, which limits the influence of outliers (e.g., meme stocks) on the estimates.
- Covariance: Factor covariance matrix estimated on a 2-year rolling window.
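
The estimation step above can be sketched as a cross-sectional regression of one day's stock returns on factor exposures, with Huber weights that shrink the contribution of large residuals. The version below is a NumPy-only approximation using iteratively reweighted least squares and a MAD scale estimate; it is not necessarily the solver used in Notebook 01, and the function name, synthetic data, and outlier setup are assumptions made for illustration.

```python
import numpy as np


def huber_factor_returns(exposures, returns, epsilon=1.35, n_iter=50, tol=1e-10):
    """Estimate one day's factor returns with Huber-weighted least squares (IRLS)."""
    X = np.asarray(exposures, dtype=float)
    y = np.asarray(returns, dtype=float)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # start from plain OLS
    for _ in range(n_iter):
        resid = y - X @ beta
        # Robust residual scale: median absolute deviation, rescaled for normal data
        scale = np.median(np.abs(resid - np.median(resid))) / 0.6745
        scale = max(scale, 1e-12)
        z = np.abs(resid) / scale
        # Huber weights: 1 inside the threshold, epsilon/|z| outside it
        weights = np.where(z <= epsilon, 1.0, epsilon / np.maximum(z, 1e-12))
        sw = np.sqrt(weights)
        new_beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta


# Synthetic cross-section: 500 stocks, 3 factors, 20 extreme "meme stock" returns
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3))
true_beta = np.array([0.010, -0.005, 0.002])
y = X @ true_beta + 0.01 * rng.standard_normal(500)
y[:20] += 0.5  # contaminate a few observations

ols_beta = np.linalg.lstsq(X, y, rcond=None)[0]
huber_beta = huber_factor_returns(X, y)
print("OLS error:  ", np.linalg.norm(ols_beta - true_beta))
print("Huber error:", np.linalg.norm(huber_beta - true_beta))
```

Repeating this for each trading day would give a time series of factor returns, from which a rolling covariance matrix could be computed over the trailing window.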

NLP Engine
- Model: nomic-ai/nomic-embed-text-v1.5 (Matryoshka embeddings).
- Innovation: Implements Context-Aware Chunking. Every text chunk includes the company metadata header ("Title: Apple Inc...") to prevent context loss in long documents.
- Evaluation: Validated using Silhouette Scores on GICS sectors, outperforming standard MPNet and BGE models.
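
A minimal sketch of the context-aware chunking idea: split a long document into overlapping chunks and prepend the same metadata header to each one. The function name, the word-based splitting, and the chunk sizes are assumptions (a production version would more likely count tokens), and only the title is placed in the header because the source does not show the remaining header fields.

```python
def chunk_with_header(title, text, max_words=200, overlap=20):
    """Split `text` into overlapping word chunks, each prefixed with a title header."""
    step = max_words - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than max_words")
    header = f"Title: {title}"
    words = text.split()
    chunks = []
    for start in range(0, len(words), step):
        body = " ".join(words[start:start + max_words])
        chunks.append(f"{header}\n{body}")
        if start + max_words >= len(words):
            break  # the last chunk already reaches the end of the document
    return chunks


# Placeholder document of 450 words
document = " ".join(f"word{i}" for i in range(450))
chunks = chunk_with_header("Apple Inc.", document, max_words=200, overlap=20)
print(len(chunks))
print(chunks[1][:40])
```

Because every chunk carries the header, a chunk taken from deep inside a long document still states which company it describes when it is embedded on its own.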

Hedging Optimization (Part 3)
- Objective: Minimize Active Risk (Tracking Error) against the target stock.
- Constraints:
  - Max 10 positions (Cardinality constraint for operational simplicity).
  - Max weight 25% per position.
  - Hedge Ratio: 100% (Dollar Neutral).
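
One way to read the objective and constraints above is as a small quadratic program: choose basket weights `w` that minimize the variance of `r_target - w · r_hedge`, with weights summing to 1 (a 100% hedge ratio), each weight capped at 25%, and at most 10 non-zero names. The sketch below uses projected gradient descent plus a simple "keep the 10 largest weights and re-solve" heuristic for the cardinality limit. Non-negative weights, the heuristic itself, all function names, and the synthetic returns are assumptions; the project's actual optimizer may be quite different.

```python
import numpy as np


def project_capped_simplex(v, cap, total=1.0):
    """Euclidean projection onto {w : 0 <= w <= cap, sum(w) = total} via bisection."""
    lo, hi = v.min() - cap, v.max()
    for _ in range(60):
        tau = 0.5 * (lo + hi)
        if np.clip(v - tau, 0.0, cap).sum() > total:
            lo = tau
        else:
            hi = tau
    return np.clip(v - 0.5 * (lo + hi), 0.0, cap)


def tracking_variance(w, cov_hh, cov_ht, var_t):
    """Variance of (target return - hedge basket return)."""
    return float(var_t - 2.0 * w @ cov_ht + w @ cov_hh @ w)


def solve_weights(cov_hh, cov_ht, cap, n_iter=1000):
    """Projected gradient descent on the tracking variance, starting from equal weights."""
    n = len(cov_ht)
    w = np.full(n, 1.0 / n)
    step = 1.0 / (2.0 * np.linalg.eigvalsh(cov_hh)[-1])  # 1 / Lipschitz constant
    for _ in range(n_iter):
        grad = 2.0 * (cov_hh @ w - cov_ht)
        w = project_capped_simplex(w - step * grad, cap)
    return w


def hedge_basket(cov_hh, cov_ht, max_names=10, cap=0.25):
    """Heuristic cardinality handling: solve, keep the largest weights, re-solve."""
    w_full = solve_weights(cov_hh, cov_ht, cap)
    keep = np.sort(np.argsort(w_full)[-max_names:])
    w_sub = solve_weights(cov_hh[np.ix_(keep, keep)], cov_ht[keep], cap)
    w = np.zeros(len(cov_ht))
    w[keep] = w_sub
    return w


# Synthetic daily returns: 30 candidate hedges driven by a common market factor
rng = np.random.default_rng(42)
market = rng.normal(0.0, 0.01, 750)
betas = rng.uniform(0.5, 1.5, 30)
hedges = np.outer(market, betas) + rng.normal(0.0, 0.01, (750, 30))
target = 0.6 * hedges[:, 0] + 0.4 * hedges[:, 1] + rng.normal(0.0, 0.005, 750)

cov = np.cov(np.column_stack([target, hedges]), rowvar=False)
var_t, cov_ht, cov_hh = cov[0, 0], cov[0, 1:], cov[1:, 1:]

w = hedge_basket(cov_hh, cov_ht)
hedged_var = tracking_variance(w, cov_hh, cov_ht, var_t)
print("names used:", int(np.count_nonzero(w)))
print("unhedged vol:", np.sqrt(var_t), "hedged vol:", np.sqrt(hedged_var))
```

The 25% cap implies that any feasible basket holds at least four names, so a target that is best explained by one or two peers will still be hedged with a more diversified basket.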

AI Revolution Clustering (Extra)
- Goal: Challenge expert "Maker vs. User" labels using unsupervised learning.
- Technique: UMAP dimensionality reduction + HDBSCAN density clustering.
- Insight: The model identified "Hybrid" clusters (e.g., Cloud Hyperscalers like AMZN/GOOGL) that act as both Makers and Users, defying binary classification.

📊 Usage

To reproduce the full analysis, run the notebooks in order:

1. 00_exploratory_data_analysis.ipynb: Verifies data integrity.
2. 01_factor_model_construction.ipynb: Builds the risk model.
3. 02_nlp_embedding_generation.ipynb: Generates Nomic embeddings (requires GPU/MPS).
4. 03_hedging_strategy_comparison.ipynb: Runs the 50-stock backtest loop.

Alternatively, use the command-line interface:

```
python scripts/run_hedge_backtest.py
```

-- Last modified Dec 20, 2025.

aengusmartindonaire/advanced-portfolio-hedging

Folders and files

Latest commit

History

Repository files navigation

Advanced Portfolio Hedging & Risk Analytics

📌 Project Overview

🚀 Key Results

📂 Repository Structure

🛠 Installation & Setup

🧠 Methodology Details

📊 Usage

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages