Competitive intelligence system for AI training datasets. CLI + MCP ready.
Updated Feb 12, 2026 · Python
Agent trajectory data engineering monorepo — sandbox execution, trajectory recording, process reward scoring & pipeline orchestration. CLI + MCP ready.
Process-level rubric-based reward engine for Code Agent trajectories. CLI + MCP ready.
Reproducible Docker sandbox for Code Agent task execution and trajectory replay. CLI + MCP ready.
Standardized trajectory recording for Code Agent frameworks with adapter pattern. CLI + MCP ready.
Seed-to-scale synthetic data engine for LLM training workflows. CLI + MCP ready.
Pipeline orchestrator for Code Agent trajectory data — sandbox, recording, and reward in one flow. CLI + MCP ready.
LLM distillation detection & model fingerprinting — detect text source, verify model identity, audit distillation. CLI + MCP ready.
Reverse-engineering framework for AI datasets — extract annotation specs, cost models & reproducibility. CLI + MCP ready.
Lightweight, serverless HTML labeling tool for offline annotation teams. CLI + MCP ready.
Automated quality checks, anomaly detection & distribution analysis for LLM datasets. CLI + MCP ready.