Resilient RAP Framework

A production-grade framework for Reproducible Analytical Pipelines (RAPs) with autonomous schema drift resolution. Built for PhD research in data engineering and trustworthy analytics.

Designed for high-velocity data streams (sports telemetry, clinical data) with built-in semantic reconciliation, tamper-evident audit trails, and human-in-the-loop validation.

Production-Ready Features

Semantic Schema Reconciliation: BERT-based drift detection and field mapping for evolving data schemas
Tamper-Evident Lineage: SHA-256 linked audit records with full provenance tracking
Reproducible Ingestion: Deterministic pipeline execution with run IDs and checkpointing
Multi-Domain Adapters: Pre-built connectors for F1 telemetry, NHL play-by-play, and clinical streams
HITL Analytics: Human-in-the-loop feedback integration with learning curve analysis
Production Logging: Structured audit trails for regulatory compliance and forensic analysis

Quick Start

Prerequisites

Python 3.10 or higher
macOS, Linux, or Windows with WSL2

Installation

# Clone repository
git clone https://github.com/tarek-clarke/resilient-rap-framework
cd resilient-rap-framework

# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Basic Usage

Example 1: OpenF1 Telemetry Pipeline

PYTHONPATH="." python tools/demo_openf1.py --session 9158 --driver 1

Example 2: Clinical Data Stream Processing

from adapters.clinical.ingestion_clinical import ClinicalIngestor

# Initialize ingestor with synthetic stream
ingestor = ClinicalIngestor(
    use_stream_generator=True,
    stream_vendor="GE",
    stream_batch_size=25,
)

# Execute pipeline
ingestor.connect()
df = ingestor.run()

# Export audit trail
ingestor.export_audit_log("data/clinical_audit.json")
print(df.head())

Example 3: Hockey Play-by-Play Analytics

PYTHONPATH="." python tools/demo_nhl.py --game 2024020001

Key Directories

resilient-rap-framework/
├── adapters/           # Domain-specific data ingestion (F1, NHL, Clinical)
├── modules/            # Core framework (ingestion, reconciliation, lineage)
├── src/                # Provenance tracking and analytics utilities
├── tools/              # Production pipelines and utilities
├── tests/              # Test suite (unit and integration)
├── data/               # Audit logs, reports, and synthetic datasets
├── reporting/          # PDF report generation
└── docs/               # Extended documentation

Configuration & Output

Audit & Provenance Logs (Automatic)

data/reproducibility_audit.json - Full execution audit trail
data/provenance_log.jsonl - Lineage records (input → output hashing)
data/reports/ - Generated analysis reports
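
The exact record layout of data/provenance_log.jsonl is not documented in this README. As a rough illustration of input → output hashing in a JSONL log, the sketch below appends one JSON object per pipeline step. The helper names (sha256_of, log_lineage, read_lineage) and the "step" field are assumptions made for this example; only the input_hash and output_hash keys appear elsewhere in this document.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(data: bytes) -> str:
    """Hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def log_lineage(log_path: Path, step: str, input_bytes: bytes, output_bytes: bytes) -> dict:
    """Append one JSON object per line (JSONL) describing a pipeline step."""
    record = {
        "step": step,  # hypothetical field, not taken from the framework
        "input_hash": sha256_of(input_bytes),
        "output_hash": sha256_of(output_bytes),
    }
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
    return record


def read_lineage(log_path: Path) -> list:
    """Load every non-empty line of the log as a dict."""
    with log_path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


# A temporary directory keeps the example from touching the real data/ folder.
with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "provenance_log.jsonl"
    raw = b"lap,speed\n1,312.4\n"
    cleaned = b"lap,speed_kph\n1,312.4\n"
    log_lineage(log_path, "rename_columns", raw, cleaned)
    log_lineage(log_path, "noop", cleaned, cleaned)
    records = read_lineage(log_path)
```

Because each step's output hash is the next step's input hash, a reader can follow a dataset through the log without access to the data itself.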

Environment Setup

No external environment variables are required for baseline operation. Network access is needed for upstream API calls (OpenF1, NHL).

Testing

Run the full test suite:

pytest tests/ -v

Run a specific test module:

pytest tests/test_semantic_reconciliation.py -v

Core Concepts for PhD Research

Schema Drift Resolution

The framework detects and resolves schema changes in real time:

Detection: Field additions, deletions, and type changes are captured via semantic hashing
Reconciliation: BERT embeddings map old schema to new schema
Validation: HITL feedback refines mappings for future runs
Audit: Full lineage maintained for publication and reproduction
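
The detection and reconciliation steps above can be illustrated with a minimal, standard-library-only sketch. This is not the framework's implementation: detect_drift and reconcile are hypothetical names, and a character-trigram cosine similarity stands in for the BERT embeddings so that the example runs without downloading a model.

```python
from collections import Counter
from math import sqrt
from typing import Optional


def detect_drift(old_schema: dict, new_schema: dict) -> dict:
    """Compare two {field: type} schemas and report structural changes."""
    old_fields, new_fields = set(old_schema), set(new_schema)
    return {
        "added": sorted(new_fields - old_fields),
        "removed": sorted(old_fields - new_fields),
        "type_changed": sorted(
            f for f in old_fields & new_fields if old_schema[f] != new_schema[f]
        ),
    }


def _trigrams(name: str) -> Counter:
    """Character trigram counts; a crude stand-in for a learned embedding."""
    padded = f"  {name.lower().replace('_', ' ')} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def reconcile(removed: list, added: list, threshold: float = 0.4) -> dict:
    """Map each removed field to its most similar added field, or None below threshold."""
    mapping: dict = {}
    for old in removed:
        best: Optional[str] = None
        best_score = 0.0
        for new in added:
            score = _cosine(_trigrams(old), _trigrams(new))
            if score > best_score:
                best, best_score = new, score
        mapping[old] = best if best_score >= threshold else None
    return mapping


old = {"driver_number": "int", "speed": "float", "lap": "int"}
new = {"driver_num": "int", "speed": "str", "lap": "int", "gear": "int"}

drift = detect_drift(old, new)
mapping = reconcile(drift["removed"], drift["added"])
```

In this sketch a field that maps to None is the natural candidate for human review, which is where the validation step would come in.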

Reproducibility & Auditability

Every ingestion step is logged:

# Access audit trail programmatically
audit_log = ingestor.export_audit_log()
for record in audit_log:
    print(f"Input: {record['input_hash']} → Output: {record['output_hash']}")
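
This README does not specify the record layout beyond the input_hash and output_hash keys used above. As an illustration of how SHA-256 linked records can make a log tamper-evident, the following sketch chains each record to its predecessor. The names append_record and verify_chain, and the prev_hash and record_hash fields, are assumptions for this example rather than the framework's API.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder predecessor hash for the first record


def _digest(payload: dict) -> str:
    """SHA-256 of a canonical (sorted-key) JSON encoding of the payload."""
    encoded = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_record(chain: list, input_hash: str, output_hash: str) -> dict:
    """Append a record whose hash covers its content and the previous record's hash."""
    body = {
        "input_hash": input_hash,
        "output_hash": output_hash,
        "prev_hash": chain[-1]["record_hash"] if chain else GENESIS,
    }
    record = {**body, "record_hash": _digest(body)}
    chain.append(record)
    return record


def verify_chain(chain: list) -> bool:
    """Recompute every hash and link; editing any earlier record breaks verification."""
    prev = GENESIS
    for record in chain:
        body = {k: v for k, v in record.items() if k != "record_hash"}
        if body["prev_hash"] != prev or _digest(body) != record["record_hash"]:
            return False
        prev = record["record_hash"]
    return True


chain: list = []
raw = hashlib.sha256(b"raw batch").hexdigest()
clean = hashlib.sha256(b"cleaned batch").hexdigest()
append_record(chain, raw, clean)
append_record(chain, clean, hashlib.sha256(b"aggregated batch").hexdigest())

intact = verify_chain(chain)            # chain as written
chain[0]["output_hash"] = "tampered"    # simulate an after-the-fact edit
still_intact = verify_chain(chain)      # the edit is detected
```

The chain only shows that a record was altered; it does not prevent alteration, so the latest record hash would still need to be stored somewhere an attacker cannot rewrite.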

Human-in-the-Loop Integration

Validate semantic mappings interactively:

from modules.hitl_orchestrator import HumanInTheLoopOrchestrator

orchestrator = HumanInTheLoopOrchestrator()
orchestrator.display_feedback_summary()

Running Benchmarks

Evaluate performance against synthetic data with known drift:

PYTHONPATH="." python tools/benchmark_semantic_layer.py

Documentation

LEARN.md - Detailed system architecture and concepts
QUICK_REFERENCE.md - Common operations
HITL_RETRAINING_GUIDE.md - Human feedback integration
IMPLEMENTATION_SUMMARY.md - Implementation details

Publication & Citation

If you use this framework in published research, please cite:

Clarke, T. (2026). Engineering Resilient RAP Frameworks. engrXiv. https://doi.org/10.31224/6466

See CITATION.cff for additional formats.

Licensing & Contact

License: PolyForm Noncommercial 1.0.0 (see LICENSE)

Academic use: Fully permitted
Commercial use: Requires separate licensing agreement
Contact: tclarke91@proton.me

See CONTRIBUTING.md for contribution guidelines.

Maintained for the PhD program in Reproducible Data Engineering

Experimental Results (Auto-Generated)

Method                 Low Drift Accuracy   High Drift Accuracy
Semantic Layer         98%                  >85%
Levenshtein Baseline   95%                  <15%
RegEx Baseline         100%                 0%
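
The baselines are not defined in this README. As one plausible reading of the Levenshtein row, the sketch below matches field names by normalized edit distance. The function names and the 0.7 threshold are assumptions for illustration, and the example makes no claim to reproduce the figures in the table; it only shows why string distance handles cosmetic renames but not semantic ones.

```python
from typing import Optional


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete a character from a
                current[j - 1] + 1,            # insert a character into a
                previous[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        previous = current
    return previous[-1]


def best_match(field: str, candidates: list, threshold: float = 0.7) -> Optional[str]:
    """Return the candidate with the highest normalized similarity, or None."""
    best, best_score = None, 0.0
    for candidate in candidates:
        longest = max(len(field), len(candidate)) or 1
        score = 1 - levenshtein(field, candidate) / longest
        if score > best_score:
            best, best_score = candidate, score
    return best if best_score >= threshold else None


# A cosmetic rename (dropped underscore) is one edit away and is matched.
cosmetic = best_match("heart_rate", ["heartrate", "spo2"])

# A semantic rename shares almost no characters, so no candidate clears the threshold.
semantic = best_match("speed", ["velocity", "gear"])
```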
