
Baby Cry Classifier

Real-time baby cry detection and reason classification running locally on your laptop. Uses an Android phone as a remote microphone over Wi-Fi and sends Telegram alerts when a cry is identified.

Architecture

Android Phone ──Wi-Fi──▶ Virtual Audio Device ──▶ audio_streamer.py (ring buffer)
                                                        │
                                                   4s waveform
                                                        ▼
                                                 YAMNet (Stage 1)
                                                  cry detected?
                                                        │ yes
                                                        ▼
                                              Custom Dense Net (Stage 2)
                                              MFCC ➜ reason label
                                                        │ confidence ≥ 80%
                                                        ▼
                                               Telegram notification
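The ring-buffer stage of `audio_streamer.py` can be sketched with a `collections.deque` (a minimal sketch: the real module captures via PyAudio callbacks; the 4 s window comes from the diagram above, and 16 kHz is the sample rate YAMNet expects):

```python
from collections import deque

SAMPLE_RATE = 16_000          # YAMNet expects 16 kHz mono audio
WINDOW_SECONDS = 4            # analysis window from the diagram above
WINDOW_SAMPLES = SAMPLE_RATE * WINDOW_SECONDS

class RingBuffer:
    """Fixed-size buffer that always holds the most recent samples."""

    def __init__(self, capacity: int = WINDOW_SAMPLES):
        self._buf = deque(maxlen=capacity)

    def extend(self, samples):
        """Append a chunk of samples, discarding the oldest on overflow."""
        self._buf.extend(samples)

    def is_full(self) -> bool:
        return len(self._buf) == self._buf.maxlen

    def snapshot(self) -> list:
        """Return the current window as a list (oldest first)."""
        return list(self._buf)
```

The `deque(maxlen=…)` handles eviction automatically, so the inference loop can just call `snapshot()` whenever `is_full()` and always get the latest 4 seconds of audio.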

Stage 1 — Cry detection: Pre-trained YAMNet model (trained on Google's AudioSet — 2M+ clips, 521 classes) identifies whether the audio contains a baby cry (class 20).

Stage 2 — Reason classification: A trained dense network takes 40 MFCCs and predicts one of 5 cry reasons: hunger, pain, gas, tiredness, discomfort. Trained on the Donate-a-Cry corpus with data augmentation and class weighting to handle imbalanced data.
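The two-stage decision reduces to simple post-processing over the two models' outputs. A sketch (the 80% gate, class index 20, and the five labels are from the text above; the 0.5 Stage 1 threshold is an assumption):

```python
CRY_CLASS_INDEX = 20                 # YAMNet "baby cry" class (Stage 1)
CRY_SCORE_THRESHOLD = 0.5            # assumed Stage 1 gate
REASON_CONFIDENCE_THRESHOLD = 0.80   # Stage 2 gate from the diagram
REASON_LABELS = ["hunger", "pain", "gas", "tiredness", "discomfort"]

def decide(yamnet_scores, reason_probs):
    """Return a reason label to notify about, or None to stay quiet.

    yamnet_scores: per-class scores from Stage 1 (one frame, 521 classes).
    reason_probs:  softmax output of the Stage 2 dense network (5 classes).
    """
    # Stage 1: is this window a cry at all?
    if yamnet_scores[CRY_CLASS_INDEX] < CRY_SCORE_THRESHOLD:
        return None
    # Stage 2: only notify when the top reason is confident enough.
    best = max(range(len(reason_probs)), key=lambda i: reason_probs[i])
    if reason_probs[best] < REASON_CONFIDENCE_THRESHOLD:
        return None
    return REASON_LABELS[best]
```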

Prerequisites

  • Python 3.10+
  • PortAudio (brew install portaudio on macOS)
  • Android phone running AudioRelay or WO Mic to stream audio over Wi-Fi to a virtual audio device on your laptop

Setup

# Clone and enter the project
git clone https://github.com/devagupt/baby-cry-classifier.git
cd baby-cry-classifier

# Create a virtual environment
python -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Copy and configure environment variables
cp .env.example .env
# Edit .env with your Telegram bot token, chat ID, and virtual mic device name

Find your audio device name

python list_devices.py

Copy the name of your virtual mic device into VIRTUAL_MIC_DEVICE_NAME in .env.

Set up Telegram notifications

  1. Create a bot via @BotFather on Telegram (/newbot).
  2. Send a message to your new bot, then visit https://api.telegram.org/bot<TOKEN>/getUpdates to find your chat ID.
  3. Set TELEGRAM_BOT_TOKEN and TELEGRAM_CHAT_ID in .env.
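With the token and chat ID configured, an alert is a single HTTP POST to the Bot API's `sendMessage` method. A minimal stdlib sketch (the real `notifier.py` also applies a cooldown):

```python
import json
import os
import urllib.request

API_BASE = "https://api.telegram.org"

def build_send_url(token: str) -> str:
    """URL of the Bot API sendMessage method for this bot."""
    return f"{API_BASE}/bot{token}/sendMessage"

def send_telegram_message(text: str) -> None:
    """POST a message to the chat configured in the environment."""
    token = os.environ["TELEGRAM_BOT_TOKEN"]
    chat_id = os.environ["TELEGRAM_CHAT_ID"]
    payload = json.dumps({"chat_id": chat_id, "text": text}).encode()
    req = urllib.request.Request(
        build_send_url(token),
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:  # raises on non-2xx status
        resp.read()
```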

Usage

# Full pipeline with Telegram notifications
python main.py

# Console-only mode (no Telegram)
python main.py --no-notify

# Override the audio device
python main.py --device "MacBook Pro Microphone"

Press Ctrl+C to stop.

Project Structure

baby-cry-classifier/
├── .env.example          # Environment variable template
├── .gitignore
├── requirements.txt
├── README.md
├── config.py             # All constants and settings
├── audio_streamer.py     # PyAudio capture with ring buffer
├── inference_engine.py   # YAMNet + custom classifier (auto-loads trained model)
├── notifier.py           # Telegram Bot API with cooldown
├── main.py               # Entry point
├── list_devices.py       # Audio device enumeration utility
├── train.py              # Stage 2 model training script
├── models/
│   ├── cry_classifier.h5       # Trained Keras model
│   └── cry_classifier.tflite   # Trained TFLite model (used at runtime)
└── data/                 # Dataset (git-ignored)
    └── donateacry-corpus/
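The cooldown mentioned for `notifier.py` keeps a continuous cry from flooding the chat with one alert per 4-second window. A sketch of the idea (the 60 s default and the injectable clock are assumptions for illustration):

```python
import time

class Cooldown:
    """Suppress repeat notifications within a fixed interval."""

    def __init__(self, seconds: float = 60.0, clock=time.monotonic):
        self.seconds = seconds
        self._clock = clock          # injectable for testing
        self._last_sent = None

    def ready(self) -> bool:
        """True if enough time has passed since the last notification."""
        now = self._clock()
        if self._last_sent is not None and now - self._last_sent < self.seconds:
            return False
        self._last_sent = now
        return True
```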

Training the Stage 2 Model

A pre-trained model is included in models/. To retrain or improve it:

# 1. Clone the dataset (if not already present)
git clone https://github.com/gveres/donateacry-corpus.git data/donateacry-corpus

# 2. Run training
python train.py

# 3. Customize training
python train.py --epochs 100 --no-tflite

The training script:

  • Loads audio clips from 5 folders: hungry, belly_pain, burping, tired, discomfort
  • Maps them to labels: hunger, pain, gas, tiredness, discomfort
  • Applies data augmentation (pitch shift, time stretch, noise) to handle class imbalance (382 hungry vs 8 burping)
  • Computes class weights for balanced learning
  • Trains a dense network: Input(40) → Dense(128) → BatchNorm → Dropout → Dense(64) → BatchNorm → Dropout → Dense(5, softmax)
  • Saves both .h5 and .tflite models to models/
  • inference_engine.py auto-loads the TFLite model on startup (no code changes needed)
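The class weighting mentioned above typically follows an inverse-frequency scheme, weight = n_samples / (n_classes × class_count), so rare classes (like the 8 `burping` clips) contribute more to the loss. A sketch of that computation, assuming this scheme is the one `train.py` uses:

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count),
    so under-represented classes count for more during training."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}
```

The resulting dict can be passed directly as `class_weight` to Keras's `model.fit`.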

Future Improvements

  • More data: The Donate-a-Cry dataset is small (457 clips, heavily skewed toward "hungry"). Collecting more clips per class would significantly improve accuracy.
  • Lighter deployment: Convert the TF Hub YAMNet to TFLite and switch to tflite-runtime to drop the ~500 MB tensorflow dependency.
  • Additional notification channels: The notifier.py module can be extended to support push notifications, webhooks, or other messaging services.
