Repository: https://github.com/chirindaopensource/youtube_compilation_segmenter
Owner: © 2026 Craig Chirinda (Open Source Projects)
This repository contains an independent, professional-grade Python implementation of the research methodology from the 2026 Private Whitepaper entitled "Automated Temporal Segmentation and Semantic Annotation of Heterogeneous Audio Streams" by:
- Craig Chirinda (Lead Researcher)
The project provides a complete, end-to-end computational framework for replicating the paper's findings. It delivers a modular, auditable, and extensible pipeline that executes the entire research workflow: from the deterministic acquisition of heterogeneous audio streams (e.g., DJ mixes) to the rigorous Digital Signal Processing (DSP) of multi-cue novelty features, culminating in global boundary optimization via dynamic programming and cryptographic semantic annotation.
- Introduction
- Theoretical Background
- Features
- Methodology Implemented
- Core Components (Notebook Structure)
- Key Callable: execute_mir_pipeline
- Prerequisites
- Installation
- Input Data Structure
- Usage
- Output Structure
- Project Structure
- Customization
- Contributing
- Recommended Extensions
- License
- Citation
- Acknowledgments
This project provides a Python implementation of the analytical framework presented in Chirinda (2026). The core of this repository is the Jupyter notebook `youtube_compilation_segmenter_draft.ipynb`, which contains a comprehensive suite of functions to replicate the paper's findings. The pipeline addresses the critical challenge of automated segmentation of non-stationary audio streams, treating the segmentation task not as a heuristic silence-detection problem, but as a global optimization problem over a probabilistic novelty surface.
The paper argues that traditional energy-based segmentation fails in the context of "mix culture" due to crossfades and harmonic mixing. This codebase operationalizes the proposed solution: a Multi-Cue Novelty Fusion Engine that:
- Validates signal integrity using strict IEEE-754 sanitization and normalization (see the sketch after this list).
- Synthesizes a unified novelty curve from temporal (RMS, Onset), spectral (Flux, Centroid), and harmonic (Chroma) features.
- Optimizes segmentation boundaries using a Constrained Shortest Path algorithm on a Directed Acyclic Graph (DAG).
- Annotates segments via cryptographically signed requests to the ACRCloud fingerprinting database.
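To make the first bullet concrete, here is a minimal sketch of NaN/Inf sanitization and robust min-max normalization in NumPy. The function name `sanitize_and_normalize` is illustrative and not necessarily identical to the repository's `_safe_normalize`.

```python
import numpy as np

def sanitize_and_normalize(y: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Replace non-finite samples and rescale to [0, 1].

    Illustrative only: the repository's `_safe_normalize` may differ in detail.
    """
    # IEEE-754 sanitization: NaN -> 0, +/-Inf -> 0
    y = np.nan_to_num(y, nan=0.0, posinf=0.0, neginf=0.0)
    # Robust min-max normalization with a guard against zero dynamic range
    y_min, y_max = float(np.min(y)), float(np.max(y))
    span = y_max - y_min
    if span < eps:
        return np.zeros_like(y)
    return (y - y_min) / span
```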
The implemented methods combine techniques from Digital Signal Processing, Combinatorial Optimization, and Cryptography.
1. Multi-Dimensional Feature Engineering:
The system transforms the raw time-domain signal into complementary temporal, spectral, and harmonic feature streams:
- Spectral Flux: Quantifies the rate of change in the spectral magnitude $|X(m, k)|$.
- Chroma Novelty: Measures shifts in the harmonic/pitch-class distribution, robust to timbral changes.
- Onset Strength: Detects percussive and transient events.
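As a non-authoritative illustration of this feature engineering, the sketch below computes the listed cues with standard `librosa` calls; the function name `extract_cues` and the `n_fft`/`hop_length` defaults are assumptions, not the notebook's exact settings.

```python
import numpy as np
import librosa

def extract_cues(y: np.ndarray, sr: int, n_fft: int = 2048, hop_length: int = 512) -> dict:
    """Compute per-frame cues of the kind named above (illustrative parameter choices)."""
    S = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop_length))           # |X(m, k)|
    flux = np.sqrt(np.sum(np.maximum(np.diff(S, axis=1), 0.0) ** 2, axis=0))  # positive spectral flux
    flux = np.concatenate([[0.0], flux])                                      # align length with other cues
    return {
        "rms": librosa.feature.rms(S=S)[0],
        "centroid": librosa.feature.spectral_centroid(S=S, sr=sr)[0],
        "flux": flux,
        "onset": librosa.onset.onset_strength(y=y, sr=sr, hop_length=hop_length),
        "chroma": librosa.feature.chroma_stft(S=S**2, sr=sr),                 # 12 x T pitch-class energies
    }
```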
2. Novelty Fusion and Detrending:
The individual cues are normalized and fused into a single scalar novelty curve via a weighted combination, which is then detrended so that slow global trends do not obscure local boundary peaks.
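A minimal sketch of this fusion-and-detrending step, assuming the cue dictionary produced by the `extract_cues` sketch above and an illustrative weight mapping; the moving-average detrender and weight values are assumptions, not the repository's exact choices.

```python
import numpy as np
from scipy.ndimage import uniform_filter1d

def fuse_and_detrend(cues: dict, weights: dict, window: int = 257) -> np.ndarray:
    """Weighted fusion of normalized cues followed by moving-average detrending (illustrative)."""
    def norm(x: np.ndarray) -> np.ndarray:
        x = np.nan_to_num(x, nan=0.0, posinf=0.0, neginf=0.0)
        span = x.max() - x.min()
        return (x - x.min()) / span if span > 0 else np.zeros_like(x)

    # Chroma is a 12 x T matrix; reduce it to a scalar novelty per frame first.
    chroma_novelty = np.linalg.norm(
        np.diff(cues["chroma"], axis=1, prepend=cues["chroma"][:, :1]), axis=0
    )
    streams = {
        "rms": cues["rms"], "centroid": cues["centroid"],
        "flux": cues["flux"], "onset": cues["onset"], "chroma": chroma_novelty,
    }
    novelty = sum(weights.get(name, 0.0) * norm(x) for name, x in streams.items())
    # Detrend by subtracting a local moving average, then clip negative excursions.
    trend = uniform_filter1d(novelty, size=window, mode="nearest")
    return np.maximum(novelty - trend, 0.0)
```

For example, weighting `flux` and `chroma` more heavily than `rms` emphasizes spectral and harmonic change over raw loudness, which suits crossfaded material.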
3. Global Optimization via Dynamic Programming:
The system solves the segmentation problem by selecting a subsequence of candidate boundaries that maximizes the total novelty score, subject to hard duration constraints on every resulting segment; the admissible transitions between candidates form a Directed Acyclic Graph, so the optimum is found exactly by dynamic programming.
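The following is a minimal O(n²) dynamic-programming sketch of that constrained path formulation over candidate boundaries; the exact objective, scoring, and tie-breaking in the repository may differ.

```python
import numpy as np

def optimal_boundaries(times: np.ndarray, scores: np.ndarray,
                       min_dur: float, max_dur: float) -> list[int]:
    """Select candidate boundaries maximizing total novelty score, with every
    inter-boundary gap in [min_dur, max_dur]. Illustrative O(n^2) DP.

    `times` are candidate boundary times in seconds (ascending, including 0 and
    the stream end); `scores` are the corresponding novelty values.
    """
    n = len(times)
    best = np.full(n, -np.inf)        # best total score of a valid path ending at node i
    prev = np.full(n, -1, dtype=int)
    best[0] = 0.0                     # the stream start is always a boundary
    for j in range(1, n):
        for i in range(j):
            gap = times[j] - times[i]
            if min_dur <= gap <= max_dur and best[i] + scores[j] > best[j]:
                best[j] = best[i] + scores[j]
                prev[j] = i
    if not np.isfinite(best[n - 1]):
        return []                     # no segmentation satisfies the constraints
    # Backtrack from the final node (the stream end) to recover the boundary set.
    path, j = [], n - 1
    while j != -1:
        path.append(j)
        j = prev[j]
    return path[::-1]
```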
4. Cryptographic Identification: Semantic annotation is secured via HMAC-SHA1 signing of canonical request strings: $$ \sigma = \text{Base64}\left(\text{HMAC}_{\text{SHA1}}\left(K_{\text{secret}},\ \text{Method} \,\|\, \text{URI} \,\|\, K_{\text{access}} \,\|\, \dots \,\|\, t\right)\right) $$
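A sketch of that signing step using only the Python standard library. The field order of the string-to-sign follows ACRCloud's published identification protocol and should be verified against the current API documentation; the URI and data-type defaults shown here are assumptions.

```python
import base64
import hashlib
import hmac
import time

def sign_acrcloud_request(access_key: str, access_secret: str,
                          http_method: str = "POST",
                          http_uri: str = "/v1/identify",
                          data_type: str = "audio",
                          signature_version: str = "1") -> tuple[str, str]:
    """Build the Base64(HMAC-SHA1(...)) signature from the formula above.

    The concatenation operator "||" is realized as newline-joined fields, per
    ACRCloud's identification protocol; confirm against current documentation.
    """
    timestamp = str(int(time.time()))
    string_to_sign = "\n".join(
        [http_method, http_uri, access_key, data_type, signature_version, timestamp]
    )
    digest = hmac.new(access_secret.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii"), timestamp
```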
The provided Jupyter notebook (`youtube_compilation_segmenter_draft.ipynb`) implements the full research pipeline, including:
- Robust DSP Pipeline: Implementation of STFT, Mel-spectrograms, and Chroma feature extraction with windowing and padding control.
- Configuration-Driven Design: All study parameters (API credentials, duration constraints, DSP hyperparameters) are managed in an external `config.yaml` file.
- Idempotent Systems Programming: Safe directory creation, atomic file operations, and rigorous cleanup of temporary resources (see the sketch after this list).
- Deterministic Execution: Enforces reproducibility through fixed random seeds (where applicable) and deterministic sorting of file paths.
- Type Safety: Extensive use of Python `typing` (Protocols, Dataclasses) to enforce interface contracts.
- Reproducible Artifacts: Generates structured `SegmentResult` objects, WAV assets, and JSON sidecars for every detected segment.
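For illustration, here is a minimal sketch of the idempotent-creation and atomic-write pattern referenced above; `ensure_dir` and `atomic_write_json` are illustrative names and not necessarily the repository's `_ensure_dir` implementation.

```python
import json
import os
import tempfile
from pathlib import Path

def ensure_dir(path: Path) -> Path:
    """Create a directory if it does not exist; calling it twice is harmless."""
    path.mkdir(parents=True, exist_ok=True)
    return path

def atomic_write_json(path: Path, payload: dict) -> None:
    """Write a JSON sidecar via a temporary file and an atomic rename,
    so a crash never leaves a half-written artifact behind."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(payload, fh, indent=2)
        os.replace(tmp_name, path)   # atomic on POSIX and Windows
    except BaseException:
        os.unlink(tmp_name)
        raise
```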
The core analytical steps directly implement the methodology from the whitepaper:
- Stream Acquisition (Ingestion): Deterministic download via `yt-dlp` and conversion to 22.05 kHz mono WAV.
- Signal Conditioning: Loading via `librosa`, NaN/Inf sanitization, and robust min-max normalization.
- Feature Extraction: Computation of RMS, Spectral Centroid, Spectral Flux, Chroma, and Onset Strength.
- Fusion & Peak Picking: Weighted combination of cues and adaptive thresholding using local moving statistics (see the sketch after this list).
- Boundary Refinement: Dynamic programming to enforce $\text{min\_duration} \leq \Delta t \leq \text{max\_duration}$.
- Snippet Export: Precision slicing of audio assets with padding.
- Semantic Annotation: Probe extraction and HMAC-signed API requests to ACRCloud.
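As an illustration of the adaptive-threshold peak picking mentioned in the Fusion & Peak Picking step, the sketch below flags frames whose fused novelty exceeds a local moving median; the repository may use a different local statistic and offset.

```python
import numpy as np
from scipy.ndimage import median_filter
from scipy.signal import find_peaks

def pick_peaks(novelty: np.ndarray, window: int = 129, delta: float = 0.05) -> np.ndarray:
    """Adaptive-threshold peak picking: a frame is a candidate boundary when the
    novelty curve exceeds its local moving median by at least `delta` (illustrative)."""
    local = median_filter(novelty, size=window, mode="nearest")
    peaks, _ = find_peaks(novelty, height=local + delta)
    # Returned values are frame indices; convert to seconds with
    # librosa.frames_to_time(peaks, sr=sr, hop_length=hop_length).
    return peaks
```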
The notebook is structured as a logical pipeline with modular classes and functions. All callables are self-contained, fully documented with type hints and docstrings, and designed for professional-grade execution.
- `_ensure_dir`: Idempotent filesystem primitive.
- `_safe_normalize`: Robust numerical normalization.
- `SegmentBoundary` / `SegmentResult`: Immutable data structures for topology and metadata (an illustrative sketch follows).
- `ACRCloudRecognizer`: Cryptographic API client.
- `YouTubeCompilationAnalyzer`: Monolithic DSP and optimization orchestrator.
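The dataclasses below are an illustrative approximation only: `start_s`, `end_s`, and `recognition` match the usage example later in this README, while the remaining fields and the shape of `SegmentBoundary` are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class SegmentBoundary:
    """A candidate cut point on the novelty curve (fields are assumptions)."""
    time_s: float          # position in the stream, seconds
    score: float           # fused novelty value at this point

@dataclass(frozen=True)
class SegmentResult:
    """One detected track. start_s/end_s/recognition match the usage example
    in this README; the other fields are assumptions."""
    start_s: float
    end_s: float
    wav_path: Optional[str] = None       # exported snippet, if written
    recognition: Optional[dict] = None   # ACRCloud payload, if identified
```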
The project is designed around a single, top-level user-facing interface function:
execute_mir_pipeline: This master orchestrator function runs the entire automated research pipeline from end-to-end. A single call to this function reproduces the entire computational portion of the project, managing data flow between ingestion, DSP, optimization, and identification modules.
- Python 3.9+
- FFmpeg must be installed and available on the system PATH.
- Core dependencies:
`numpy`, `scipy`, `librosa`, `pydub`, `yt_dlp`, `requests`, `pyyaml`.
1. Clone the repository:
   ```bash
   git clone https://github.com/chirindaopensource/youtube_compilation_segmenter.git
   cd youtube_compilation_segmenter
   ```
2. Create and activate a virtual environment (recommended):
   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
   ```
3. Install Python dependencies:
   ```bash
   pip install numpy scipy librosa pydub yt-dlp requests pyyaml
   ```
4. Install FFmpeg:
   - Ubuntu/Debian: `sudo apt update && sudo apt install ffmpeg`
   - macOS: `brew install ffmpeg`
   - Windows: Download binaries from ffmpeg.org and add them to PATH.
The pipeline requires:
- Primary URI: A valid YouTube URL pointing to a continuous audio stream (e.g., DJ mix).
- Configuration Manifest (`config.yaml`): A YAML file in the working directory containing:
  - `acr_host`, `acr_key`, `acr_secret`: ACRCloud credentials.
  - `output_dir`: Target path for assets.
  - `min_dur`, `max_dur`: Segmentation constraints.
  - `debug_mode`: Boolean flag.
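A minimal example manifest with placeholder values (the host shown is illustrative; use the host assigned to your ACRCloud project):

```yaml
# Illustrative config.yaml — replace the placeholder values with your own.
acr_host: "identify-eu-west-1.acrcloud.com"   # example host; use your project's host
acr_key: "YOUR_ACCESS_KEY"
acr_secret: "YOUR_ACCESS_SECRET"
output_dir: "./segments"
min_dur: 30.0        # seconds; shortest admissible segment
max_dur: 600.0       # seconds; longest admissible segment
debug_mode: false
```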
The notebook provides a complete, step-by-step guide. The primary workflow is to execute the final cell, which demonstrates how to use the top-level execute_mir_pipeline orchestrator:
```python
# Final cell of the notebook
if __name__ == '__main__':
    # 1. Load the master configuration from the YAML file.
    import yaml
    with open("config.yaml", "r") as f:
        study_config = yaml.safe_load(f)

    # 2. Define the target stream.
    target_uri = "https://www.youtube.com/watch?v=example_video_id"

    # 3. Execute the entire replication study.
    results = execute_mir_pipeline(
        youtube_url=target_uri,
        acr_host=study_config['acr_host'],
        acr_key=study_config['acr_key'],
        acr_secret=study_config['acr_secret'],
        output_dir=study_config['output_dir'],
        min_dur=study_config.get('min_dur', 30.0),
        max_dur=study_config.get('max_dur', 600.0),
        debug_mode=study_config.get('debug_mode', False)
    )

    # 4. Access results.
    for res in results:
        print(f"Segment: {res.start_s:.2f}s - {res.end_s:.2f}s | ID: {res.recognition}")
```

The pipeline produces the following artifacts in the `output_dir`:
- `seg_XXXXX_start_end.wav`: Individual audio snippets for each detected track.
- `seg_XXXXX_start_end.wav.json`: Sidecar metadata files containing timestamps, run IDs, and recognition payloads.
- `SegmentResult` objects: In-memory list of dataclasses returned by the function.
```
youtube_compilation_segmenter/
│
├── youtube_compilation_segmenter_draft.ipynb   # Main implementation notebook
├── config.yaml                                 # Master configuration file
├── requirements.txt                            # Python package dependencies
│
├── LICENSE                                     # MIT project license file
└── README.md                                   # This file
```
The pipeline is highly customizable via the `config.yaml` file and function parameters. Users can modify study parameters such as:
- Temporal Constraints: Adjust `min_dur` and `max_dur` to suit different genres (e.g., fast-paced radio mixes vs. long-form progressive sets).
- DSP Parameters: Modify `hop_length` and `n_fft` within the `analyze` method for different time-frequency resolution trade-offs.
- Weights: Adjust the fusion weights in `_fuse_cues_to_novelty` to prioritize specific features (e.g., prioritize Chroma for harmonic mixing analysis).
Contributions are welcome. Please fork the repository, create a feature branch, and submit a pull request with a clear description of your changes. Adherence to PEP 8, type hinting, and comprehensive docstrings is required.
Future extensions could include:
- Beat-Grid Alignment: Integrating beat-tracking to snap segment boundaries to the nearest downbeat (a minimal sketch follows this list).
- Deep Learning Features: Replacing hand-crafted cues with embeddings from pre-trained models (e.g., VGGish, OpenL3).
- Source Separation: Applying Demucs or Spleeter prior to analysis to isolate vocals or drums for cleaner segmentation.
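A minimal sketch of the beat-grid alignment idea using `librosa`'s beat tracker; note that this snaps boundaries to beats rather than true downbeats, which would require a dedicated downbeat estimator (e.g., madmom).

```python
import numpy as np
import librosa

def snap_to_beats(boundaries_s: np.ndarray, y: np.ndarray, sr: int) -> np.ndarray:
    """Move each detected boundary time (seconds) to the nearest tracked beat
    (illustrative extension, not part of the current pipeline)."""
    _, beat_times = librosa.beat.beat_track(y=y, sr=sr, units="time")
    if len(beat_times) == 0:
        return boundaries_s
    # For each boundary, find the index of the closest beat and substitute it.
    idx = np.abs(beat_times[None, :] - boundaries_s[:, None]).argmin(axis=1)
    return beat_times[idx]
```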
This project is licensed under the MIT License. See the LICENSE file for details.
If you use this code or the methodology in your research, please cite the original whitepaper:
```bibtex
@techreport{chirinda2026automated,
  title       = {Automated Temporal Segmentation and Semantic Annotation of Heterogeneous Audio Streams},
  author      = {Chirinda, CS},
  institution = {Private Whitepaper},
  year        = {2026}
}
```

For the implementation itself, you may cite this repository:
Chirinda, C. (2026). YouTube Compilation Segmenter: An Open Source Implementation.
GitHub repository: https://github.com/chirindaopensource/youtube_compilation_segmenter
- This project is built upon the exceptional tools provided by the open-source community. Sincere thanks to the developers of the scientific Python ecosystem, including Librosa, NumPy, SciPy, Pydub, and yt-dlp.
- Special acknowledgment to ACRCloud for providing the robust audio fingerprinting API used in the semantic annotation layer.
---
This README was generated based on the structure and content of the youtube_compilation_segmenter_draft.ipynb notebook and follows best practices for research software documentation.
