Real-time baby cry detection and reason classification running locally on your laptop. Uses an Android phone as a remote microphone over Wi-Fi and sends Telegram alerts when a cry is identified.
```
Android Phone ──Wi-Fi──▶ Virtual Audio Device ──▶ audio_streamer.py (ring buffer)
                                                          │
                                                     4s waveform
                                                          ▼
                                                   YAMNet (Stage 1)
                                                    cry detected?
                                                          │ yes
                                                          ▼
                                              Custom Dense Net (Stage 2)
                                                 MFCC ➜ reason label
                                                          │ confidence ≥ 80%
                                                          ▼
                                               Telegram notification
```
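The capture side can be sketched as a NumPy ring buffer that always holds the most recent 4 s of audio. This is an illustrative sketch, not the actual `audio_streamer.py` implementation; the class name and parameters are assumptions:

```python
import numpy as np

class RingBuffer:
    """Fixed-size buffer that always holds the most recent audio samples."""

    def __init__(self, seconds: float = 4.0, sample_rate: int = 16000):
        self.size = int(seconds * sample_rate)
        self.buffer = np.zeros(self.size, dtype=np.float32)
        self.write_pos = 0
        self.filled = 0

    def push(self, chunk: np.ndarray) -> None:
        """Append a new audio chunk, overwriting the oldest samples."""
        chunk = chunk[-self.size:]  # a chunk longer than the buffer keeps only its tail
        n = len(chunk)
        end = self.write_pos + n
        if end <= self.size:
            self.buffer[self.write_pos:end] = chunk
        else:
            split = self.size - self.write_pos
            self.buffer[self.write_pos:] = chunk[:split]
            self.buffer[:end % self.size] = chunk[split:]
        self.write_pos = end % self.size
        self.filled = min(self.filled + n, self.size)

    def latest(self) -> np.ndarray:
        """Return the buffered samples in chronological order."""
        if self.filled < self.size:
            return self.buffer[:self.filled].copy()
        return np.roll(self.buffer, -self.write_pos)
```

The point of the ring buffer is that inference can grab the last 4 s at any moment without the capture thread ever blocking or reallocating.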
Stage 1 — Cry detection: Pre-trained YAMNet model (trained on Google's AudioSet — 2M+ clips, 521 classes) identifies whether the audio contains a baby cry (class 20).
Stage 2 — Reason classification: A trained dense network takes 40 MFCCs and predicts one of 5 cry reasons: hunger, pain, gas, tiredness, discomfort. Trained on the Donate-a-Cry corpus with data augmentation and class weighting to handle imbalanced data.
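The two-stage gating above reduces to two thresholds. A minimal sketch of the decision logic — class index 20 and the 80% confidence gate come from the pipeline description, while the stage-1 score threshold of 0.5 is an assumption, not the project's actual value:

```python
from typing import Optional

import numpy as np

CRY_CLASS_INDEX = 20          # YAMNet's baby-cry class, per the pipeline description
CRY_SCORE_THRESHOLD = 0.5     # assumed stage-1 threshold (not from the source)
REASON_CONFIDENCE = 0.80      # stage-2 gate from the pipeline description
REASON_LABELS = ["hunger", "pain", "gas", "tiredness", "discomfort"]

def classify(yamnet_scores: np.ndarray, reason_probs: np.ndarray) -> Optional[str]:
    """Two-stage decision: gate on cry detection, then on reason confidence."""
    if yamnet_scores[CRY_CLASS_INDEX] < CRY_SCORE_THRESHOLD:
        return None  # stage 1: no cry detected, stage 2 never runs
    best = int(np.argmax(reason_probs))
    if reason_probs[best] < REASON_CONFIDENCE:
        return None  # stage 2: too uncertain to send a notification
    return REASON_LABELS[best]
```

The cascade means the cheap, general detector filters silence and household noise, so the small reason classifier only ever sees audio that already looks like a cry.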
- Python 3.10+
- PortAudio (`brew install portaudio` on macOS)
- Android phone running AudioRelay or WO Mic to stream audio over Wi-Fi to a virtual audio device on your laptop
```bash
# Clone and enter the project
cd baby-cry-classifier

# Create a virtual environment
python -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Copy and configure environment variables
cp .env.example .env
# Edit .env with your Telegram bot token, chat ID, and virtual mic device name
```

To find your virtual mic's device name:

```bash
python list_devices.py
```

Copy the name of your virtual mic device into `VIRTUAL_MIC_DEVICE_NAME` in `.env`.
- Create a bot via @BotFather on Telegram (`/newbot`).
- Send a message to your new bot, then visit `https://api.telegram.org/bot<TOKEN>/getUpdates` to find your chat ID.
- Set `TELEGRAM_BOT_TOKEN` and `TELEGRAM_CHAT_ID` in `.env`.
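The notifier's cooldown behaviour can be sketched as follows. This is an illustrative sketch, not the actual `notifier.py` implementation — the class name, the 5-minute default, and the injectable `send_fn` are all assumptions made for testability:

```python
import time
import urllib.parse
import urllib.request

class TelegramNotifier:
    """Send a Telegram message, suppressing repeats within a cooldown window."""

    def __init__(self, token: str, chat_id: str, cooldown_s: float = 300.0,
                 send_fn=None):
        self.url = f"https://api.telegram.org/bot{token}/sendMessage"
        self.chat_id = chat_id
        self.cooldown_s = cooldown_s
        self.last_sent = float("-inf")
        # Injectable sender so the cooldown logic can be tested without a network
        self.send_fn = send_fn if send_fn is not None else self._send_http

    def _send_http(self, text: str) -> None:
        """POST to the Bot API's sendMessage endpoint."""
        data = urllib.parse.urlencode(
            {"chat_id": self.chat_id, "text": text}).encode()
        urllib.request.urlopen(self.url, data=data, timeout=10)

    def notify(self, text: str, now: float = None) -> bool:
        """Send unless still inside the cooldown; return True if actually sent."""
        now = time.monotonic() if now is None else now
        if now - self.last_sent < self.cooldown_s:
            return False
        self.send_fn(text)
        self.last_sent = now
        return True
```

A cooldown matters here because a single crying episode produces many consecutive 4 s detections; without it, one cry would flood the chat with dozens of alerts.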
```bash
# Full pipeline with Telegram notifications
python main.py

# Console-only mode (no Telegram)
python main.py --no-notify

# Override the audio device
python main.py --device "MacBook Pro Microphone"
```

Press `Ctrl+C` to stop.
```
baby-cry-classifier/
├── .env.example              # Environment variable template
├── .gitignore
├── requirements.txt
├── README.md
├── config.py                 # All constants and settings
├── audio_streamer.py         # PyAudio capture with ring buffer
├── inference_engine.py       # YAMNet + custom classifier (auto-loads trained model)
├── notifier.py               # Telegram Bot API with cooldown
├── main.py                   # Entry point
├── list_devices.py           # Audio device enumeration utility
├── train.py                  # Stage 2 model training script
├── models/
│   ├── cry_classifier.h5     # Trained Keras model
│   └── cry_classifier.tflite # Trained TFLite model (used at runtime)
└── data/                     # Dataset (git-ignored)
    └── donateacry-corpus/
```
A pre-trained model is included in `models/`. To retrain or improve it:
```bash
# 1. Clone the dataset (if not already present)
git clone https://github.com/gveres/donateacry-corpus.git data/donateacry-corpus

# 2. Run training
python train.py

# 3. Customize training
python train.py --epochs 100 --no-tflite
```

The training script:
- Loads audio clips from 5 folders: `hungry`, `belly_pain`, `burping`, `tired`, `discomfort`
- Maps them to labels: hunger, pain, gas, tiredness, discomfort
- Applies data augmentation (pitch shift, time stretch, noise) to handle class imbalance (382 hungry vs 8 burping)
- Computes class weights for balanced learning
- Trains a dense network: Input(40) → Dense(128) → BatchNorm → Dropout → Dense(64) → BatchNorm → Dropout → Dense(5, softmax)
- Saves both `.h5` and `.tflite` models to `models/`

`inference_engine.py` auto-loads the TFLite model on startup (no code changes needed).
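The class-weighting step can be illustrated with the standard inverse-frequency ("balanced") scheme — an assumption; `train.py` may use `sklearn.utils.class_weight.compute_class_weight` or a different formula. Only the hungry (382) and burping (8) counts below come from the source:

```python
import numpy as np

def class_weights(counts: dict) -> dict:
    """Inverse-frequency class weights: total / (n_classes * count),
    so rare classes contribute more to the loss during training."""
    labels = sorted(counts)
    n = np.array([counts[label] for label in labels], dtype=float)
    w = n.sum() / (len(n) * n)
    return {i: float(w[i]) for i in range(len(labels))}
```

With 382 hungry clips against 8 burping clips, each burping example ends up weighted roughly 48x more than a hungry one, which is what lets the network learn the minority classes at all.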
- More data: The Donate-a-Cry dataset is small (457 clips, heavily skewed toward "hungry"). Collecting more clips per class would significantly improve accuracy.
- Lighter deployment: Convert the TF Hub YAMNet to TFLite and switch to `tflite-runtime` to drop the ~500 MB `tensorflow` dependency.
- Additional notification channels: The `notifier.py` module can be extended to support push notifications, webhooks, or other messaging services.