
Automated Temporal Segmentation and Semantic Annotation of Heterogeneous Audio Streams

End-to-end Python Music Information Retrieval system for automated DJ-mix segmentation via multi-cue novelty fusion (RMS, Chroma, Spectral Flux), dynamic-programming boundary optimization, and HMAC-SHA1 cryptographic identification through the ACRCloud API.

License: MIT · Python 3.9+ · Code style: black · Type checking: mypy

Repository: https://github.com/chirindaopensource/youtube_compilation_segmenter

Owner: © 2026 Craig Chirinda (Open Source Projects)

This repository contains an independent, professional-grade Python implementation of the research methodology from the 2026 Private Whitepaper entitled "Automated Temporal Segmentation and Semantic Annotation of Heterogeneous Audio Streams" by:

  • Craig Chirinda (Lead Researcher)

The project provides a complete, end-to-end computational framework for replicating the paper's findings. It delivers a modular, auditable, and extensible pipeline that executes the entire research workflow: from the deterministic acquisition of heterogeneous audio streams (e.g., DJ mixes) to the rigorous Digital Signal Processing (DSP) of multi-cue novelty features, culminating in global boundary optimization via dynamic programming and cryptographic semantic annotation.

Table of Contents

  • Introduction
  • Theoretical Background
  • Features
  • Methodology Implemented
  • Core Components (Notebook Structure)
  • Key Callable: execute_mir_pipeline
  • Prerequisites
  • Installation
  • Input Data Structure
  • Usage
  • Output Structure
  • Project Structure
  • Customization
  • Contributing
  • Recommended Extensions
  • License
  • Citation
  • Acknowledgments

Introduction

This project provides a Python implementation of the analytical framework presented in Chirinda (2026). The core of this repository is the Jupyter notebook youtube_compilation_segmenter_draft.ipynb, which contains a comprehensive suite of functions to replicate the paper's findings. The pipeline addresses the challenge of automatically segmenting non-stationary audio streams, treating segmentation not as a heuristic silence-detection problem but as a global optimization problem over a probabilistic novelty surface.

The paper argues that traditional energy-based segmentation fails in the context of "mix culture" due to crossfades and harmonic mixing. This codebase operationalizes the proposed solution: a Multi-Cue Novelty Fusion Engine that:

  • Validates signal integrity using strict IEEE-754 sanitization and normalization.
  • Synthesizes a unified novelty curve from temporal (RMS, Onset), spectral (Flux, Centroid), and harmonic (Chroma) features.
  • Optimizes segmentation boundaries using a Constrained Shortest Path algorithm on a Directed Acyclic Graph (DAG).
  • Annotates segments via cryptographically signed requests to the ACRCloud fingerprinting database.

Theoretical Background

The implemented methods combine techniques from Digital Signal Processing, Combinatorial Optimization, and Cryptography.

1. Multi-Dimensional Feature Engineering: The system transforms the raw time-domain signal $x[n]$ into a set of normalized feature vectors (a code sketch follows this list):

  • Spectral Flux: Quantifies the rate of change in the spectral magnitude $|X(m, k)|$.
  • Chroma Novelty: Measures shifts in the harmonic/pitch-class distribution, robust to timbral changes.
  • Onset Strength: Detects percussive and transient events.
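
For concreteness, the cues above can be computed with librosa along the lines below. This is a minimal sketch assuming standard STFT parameters; the function name and exact cue definitions are illustrative, not the notebook's verbatim code.

```python
import numpy as np
import librosa

def extract_cues(y: np.ndarray, sr: int, n_fft: int = 2048, hop_length: int = 512):
    """Illustrative per-frame cue extraction (not the notebook's exact code)."""
    S = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop_length))  # |X(m, k)|
    rms = librosa.feature.rms(S=S)[0]
    centroid = librosa.feature.spectral_centroid(S=S, sr=sr)[0]
    # Spectral flux: half-wave-rectified frame-to-frame magnitude change.
    flux = np.concatenate([[0.0], np.sum(np.maximum(0.0, np.diff(S, axis=1)), axis=0)])
    # Chroma novelty: distance between consecutive pitch-class distributions.
    chroma = librosa.feature.chroma_stft(S=S**2, sr=sr)
    chroma_nov = np.concatenate([[0.0], np.linalg.norm(np.diff(chroma, axis=1), axis=0)])
    onset = librosa.onset.onset_strength(y=y, sr=sr, hop_length=hop_length)
    return {"rms": rms, "centroid": centroid, "flux": flux,
            "chroma": chroma_nov, "onset": onset}
```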

2. Novelty Fusion and Detrending: Features are fused into a scalar novelty curve $N[m]$ via weighted linear combination, followed by median-filter baseline subtraction to isolate local events from global trends: $$ N_{\text{detrend}}[m] = \operatorname{ReLU}\big(N_{\text{raw}}[m] - \operatorname{MedianFilter}(N_{\text{raw}}[m])\big) $$
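
The fusion-and-detrend step maps directly onto a few lines of NumPy/SciPy. The kernel length and the dict-of-weights interface below are assumptions for illustration; the notebook manages its actual hyperparameters via config.yaml.

```python
import numpy as np
from scipy.ndimage import median_filter

def detrend_novelty(novelty: np.ndarray, kernel_frames: int = 87) -> np.ndarray:
    """Median-filter baseline subtraction followed by half-wave rectification (ReLU)."""
    baseline = median_filter(novelty, size=kernel_frames, mode="reflect")
    return np.maximum(0.0, novelty - baseline)

def fuse_cues(cues: dict, weights: dict) -> np.ndarray:
    """Weighted linear combination of [0, 1]-normalized cue curves, then detrending."""
    raw = sum(weights[name] * cues[name] for name in weights)
    return detrend_novelty(raw)
```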

3. Global Optimization via Dynamic Programming: The system solves the "segmentation problem" by finding a subsequence of boundaries that maximizes total novelty score subject to hard duration constraints $[\tau_{min}, \tau_{max}]$. This is implemented as a dynamic programming algorithm where the objective function is the sum of boundary confidences along a valid path.
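
A minimal sketch of that recurrence, assuming sorted candidate boundary times (in seconds) with fused confidence scores; handling of the stream's start and end anchors is omitted for brevity.

```python
import numpy as np

def optimal_boundaries(times, scores, min_dur, max_dur):
    """Max-score path through the boundary DAG subject to duration constraints.
    times must be sorted ascending; O(n^2) in the number of candidates."""
    n = len(times)
    best = list(scores)        # best[j]: max total score of a valid path ending at j
    prev = [-1] * n            # predecessor pointers for backtracking
    for j in range(n):
        for i in range(j):
            if min_dur <= times[j] - times[i] <= max_dur and best[i] + scores[j] > best[j]:
                best[j] = best[i] + scores[j]
                prev[j] = i
    # Recover the optimal boundary sequence by backtracking from the best node.
    j = int(np.argmax(best))
    path = []
    while j != -1:
        path.append(times[j])
        j = prev[j]
    return path[::-1]
```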

4. Cryptographic Identification: Semantic annotation is secured via HMAC-SHA1 signing of canonical request strings: $$ \sigma = \text{Base64}\big(\text{HMAC}_{\text{SHA1}}(K_{\text{secret}},\ \text{Method} \,\|\, \text{URI} \,\|\, K_{\text{access}} \,\|\, \dots \,\|\, t)\big) $$
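
In Python's standard library this reduces to hmac, hashlib, and base64. The newline-joined field order below follows ACRCloud's published identification protocol; the default URI and parameter values are illustrative.

```python
import base64
import hashlib
import hmac
import time

def sign_request(access_key: str, access_secret: str,
                 http_method: str = "POST", http_uri: str = "/v1/identify",
                 data_type: str = "audio", signature_version: str = "1"):
    """Return (signature, timestamp) for an ACRCloud identification request."""
    timestamp = str(int(time.time()))
    string_to_sign = "\n".join(
        [http_method, http_uri, access_key, data_type, signature_version, timestamp]
    )
    digest = hmac.new(access_secret.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii"), timestamp
```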

The repository includes an IPO (Input-Process-Output) diagram summarizing the proposed approach: Automated Temporal Segmentation IPO Summary.

Features

The provided Jupyter notebook (youtube_compilation_segmenter_draft.ipynb) implements the full research pipeline, including:

  • Robust DSP Pipeline: Implementation of STFT, Mel-spectrograms, and Chroma feature extraction with windowing and padding control.
  • Configuration-Driven Design: All study parameters (API credentials, duration constraints, DSP hyperparameters) are managed in an external config.yaml file.
  • Idempotent Systems Programming: Safe directory creation, atomic file operations, and rigorous cleanup of temporary resources.
  • Deterministic Execution: Enforces reproducibility through fixed random seeds (where applicable) and deterministic sorting of file paths.
  • Type Safety: Extensive use of Python typing (Protocols, Dataclasses) to enforce interface contracts.
  • Reproducible Artifacts: Generates structured SegmentResult objects, WAV assets, and JSON sidecars for every detected segment.

Methodology Implemented

The core analytical steps directly implement the methodology from the whitepaper:

  1. Stream Acquisition (Ingestion): Deterministic download via yt-dlp and conversion to 22.05 kHz mono WAV.
  2. Signal Conditioning: Loading via librosa, NaN/Inf sanitization, and robust min-max normalization.
  3. Feature Extraction: Computation of RMS, Spectral Centroid, Spectral Flux, Chroma, and Onset Strength.
  4. Fusion & Peak Picking: Weighted combination of cues, with adaptive thresholding based on local moving statistics (see the sketch after this list).
  5. Boundary Refinement: Dynamic programming to enforce $\tau_{min} \leq \Delta t \leq \tau_{max}$ for every segment duration $\Delta t$.
  6. Snippet Export: Precision slicing of audio assets with padding.
  7. Semantic Annotation: Probe extraction and HMAC-signed API requests to ACRCloud.
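
Step 4's adaptive thresholding can be sketched with moving statistics over the novelty curve. The window length and delta multiplier below are illustrative assumptions, not the notebook's calibrated values.

```python
import numpy as np
from scipy.ndimage import uniform_filter1d
from scipy.signal import find_peaks

def pick_peaks(novelty: np.ndarray, window: int = 173, delta: float = 1.5) -> np.ndarray:
    """Return indices of local maxima exceeding a moving mean + delta * moving std."""
    mean = uniform_filter1d(novelty, size=window, mode="reflect")
    # Moving variance via E[x^2] - E[x]^2, clipped to avoid negative round-off.
    var = uniform_filter1d(novelty ** 2, size=window, mode="reflect") - mean ** 2
    threshold = mean + delta * np.sqrt(np.maximum(var, 0.0))
    peaks, _ = find_peaks(novelty)
    return peaks[novelty[peaks] > threshold[peaks]]
```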

Core Components (Notebook Structure)

The notebook is structured as a logical pipeline with modular classes and functions. All callables are self-contained, fully documented with type hints and docstrings, and designed for professional-grade execution.

  • _ensure_dir: Idempotent filesystem primitive.
  • _safe_normalize: Robust numerical normalization (see the sketch after this list).
  • SegmentBoundary / SegmentResult: Immutable data structures for topology and metadata.
  • ACRCloudRecognizer: Cryptographic API client.
  • YouTubeCompilationAnalyzer: Monolithic DSP and optimization orchestrator.
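
As an example of the numerical hygiene these primitives enforce, here is a plausible sketch of _safe_normalize, reconstructed from the description above; the notebook's exact implementation may differ.

```python
import numpy as np

def _safe_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Sanitize non-finite values, then min-max normalize to [0, 1].
    Constant (degenerate) inputs map to zeros rather than dividing by ~0."""
    x = np.nan_to_num(x, nan=0.0, posinf=0.0, neginf=0.0)
    lo, hi = float(np.min(x)), float(np.max(x))
    if hi - lo < eps:
        return np.zeros_like(x)
    return (x - lo) / (hi - lo)
```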

Key Callable: execute_mir_pipeline

The project is designed around a single, top-level user-facing interface function:

  • execute_mir_pipeline: This master orchestrator function runs the entire automated research pipeline from end-to-end. A single call to this function reproduces the entire computational portion of the project, managing data flow between ingestion, DSP, optimization, and identification modules.

Prerequisites

  • Python 3.9+
  • FFmpeg must be installed and available on the system PATH.
  • Core dependencies: numpy, scipy, librosa, pydub, yt-dlp, requests, pyyaml.

Installation

  1. Clone the repository:

    git clone https://github.com/chirindaopensource/youtube_compilation_segmenter.git
    cd youtube_compilation_segmenter
  2. Create and activate a virtual environment (recommended):

    python -m venv venv
    source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
  3. Install Python dependencies:

    pip install -r requirements.txt
  4. Install FFmpeg:

    • Ubuntu/Debian: sudo apt update && sudo apt install ffmpeg
    • macOS: brew install ffmpeg
    • Windows: Download binaries from ffmpeg.org and add to PATH.

Input Data Structure

The pipeline requires:

  1. Primary URI: A valid YouTube URL pointing to a continuous audio stream (e.g., DJ mix).
  2. Configuration Manifest (config.yaml): A YAML file in the working directory containing the following keys (an illustrative example follows this list):
    • acr_host, acr_key, acr_secret: ACRCloud credentials.
    • output_dir: Target path for assets.
    • min_dur, max_dur: Segmentation constraints.
    • debug_mode: Boolean flag.
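
An illustrative config.yaml; the host value and every credential below are placeholders to be replaced with values from your ACRCloud console:

```yaml
acr_host: "identify-eu-west-1.acrcloud.com"   # placeholder; use your project's region host
acr_key: "YOUR_ACCESS_KEY"
acr_secret: "YOUR_ACCESS_SECRET"
output_dir: "./segments"
min_dur: 30.0        # minimum segment duration in seconds
max_dur: 600.0       # maximum segment duration in seconds
debug_mode: false
```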

Usage

The notebook provides a complete, step-by-step guide. The primary workflow is to execute the final cell, which demonstrates how to use the top-level execute_mir_pipeline orchestrator:

# Final cell of the notebook
# (execute_mir_pipeline is defined in the preceding notebook cells)
import yaml

if __name__ == '__main__':
    # 1. Load the master configuration from the YAML file.
    with open("config.yaml", "r") as f:
        study_config = yaml.safe_load(f)
    
    # 2. Define the target stream
    target_uri = "https://www.youtube.com/watch?v=example_video_id"

    # 3. Execute the entire replication study.
    results = execute_mir_pipeline(
        youtube_url=target_uri,
        acr_host=study_config['acr_host'],
        acr_key=study_config['acr_key'],
        acr_secret=study_config['acr_secret'],
        output_dir=study_config['output_dir'],
        min_dur=study_config.get('min_dur', 30.0),
        max_dur=study_config.get('max_dur', 600.0),
        debug_mode=study_config.get('debug_mode', False)
    )
    
    # 4. Access results
    for res in results:
        print(f"Segment: {res.start_s:.2f}s - {res.end_s:.2f}s | ID: {res.recognition}")

Output Structure

The pipeline produces the following artifacts in the output_dir:

  • seg_XXXXX_start_end.wav: Individual audio snippets for each detected track.
  • seg_XXXXX_start_end.wav.json: Sidecar metadata files containing timestamps, run IDs, and recognition payloads (see the example after this list).
  • SegmentResult Objects: In-memory list of dataclasses returned by the function.
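
For orientation, a sidecar might look like the following; apart from start_s, end_s, and recognition (which appear in the usage example above), the field names and values are illustrative:

```json
{
  "run_id": "2026-01-15T12-00-00Z",
  "start_s": 1832.45,
  "end_s": 2146.10,
  "recognition": {
    "title": "Example Track",
    "artists": ["Example Artist"],
    "score": 92
  }
}
```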

Project Structure

youtube_compilation_segmenter/
│
├── youtube_compilation_segmenter_draft.ipynb  # Main implementation notebook
├── config.yaml                                # Master configuration file
├── requirements.txt                           # Python package dependencies
│
├── LICENSE                                    # MIT Project License File
└── README.md                                  # This file

Customization

The pipeline is highly customizable via the config.yaml file and function parameters. Users can modify study parameters such as:

  • Temporal Constraints: Adjust min_dur and max_dur to suit different genres (e.g., fast-paced radio mixes vs. long-form progressive sets).
  • DSP Parameters: Modify hop_length and n_fft within the analyze method for different time-frequency resolution trade-offs.
  • Weights: Adjust the fusion weights in _fuse_cues_to_novelty to prioritize specific features (e.g., prioritize Chroma for harmonic mixing analysis).

Contributing

Contributions are welcome. Please fork the repository, create a feature branch, and submit a pull request with a clear description of your changes. Adherence to PEP 8, type hinting, and comprehensive docstrings is required.

Recommended Extensions

Future extensions could include:

  • Beat-Grid Alignment: Integrating beat-tracking to snap segment boundaries to the nearest downbeat.
  • Deep Learning Features: Replacing hand-crafted cues with embeddings from pre-trained models (e.g., VGGish, OpenL3).
  • Source Separation: Applying Demucs or Spleeter prior to analysis to isolate vocals or drums for cleaner segmentation.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Citation

If you use this code or the methodology in your research, please cite the original whitepaper:

@techreport{chirinda2026automated,
  title={Automated Temporal Segmentation and Semantic Annotation of Heterogeneous Audio Streams},
  author={Chirinda, C. S.},
  institution={Private Whitepaper},
  year={2026}
}

For the implementation itself, you may cite this repository:

Chirinda, C. (2026). YouTube Compilation Segmenter: An Open Source Implementation.
GitHub repository: https://github.com/chirindaopensource/youtube_compilation_segmenter

Acknowledgments

  • This project is built upon the exceptional tools provided by the open-source community. Sincere thanks to the developers of the scientific Python ecosystem, including Librosa, NumPy, SciPy, Pydub, and yt-dlp.
  • Special acknowledgment to ACRCloud for providing the robust audio fingerprinting API used in the semantic annotation layer.

---

This README was generated based on the structure and content of the youtube_compilation_segmenter_draft.ipynb notebook and follows best practices for research software documentation.
