Real-time baby cry detection and reason classification running locally on your laptop. Uses an Android phone as a remote microphone over Wi-Fi and sends Telegram alerts when a cry is identified.
```
Android Phone ──Wi-Fi──▶ Virtual Audio Device ──▶ audio_streamer.py (ring buffer)
                                                          │
                                                     4s waveform
                                                          ▼
                                                   YAMNet (Stage 1)
                                                    cry detected?
                                                          │ yes
                                                          ▼
                                              Custom Dense Net (Stage 2)
                                                 MFCC ➜ reason label
                                                          │ confidence ≥ 80%
                                                          ▼
                                               Telegram notification
```
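The capture side can be sketched as a NumPy ring buffer that always holds the most recent 4 s of audio. This is an illustrative sketch, not the actual `audio_streamer.py` implementation; the class name and parameters are assumptions:

```python
import numpy as np

class RingBuffer:
    """Fixed-size buffer that always holds the most recent audio samples."""

    def __init__(self, seconds: float = 4.0, sample_rate: int = 16000):
        self.size = int(seconds * sample_rate)
        self.buffer = np.zeros(self.size, dtype=np.float32)
        self.write_pos = 0
        self.filled = 0

    def push(self, chunk: np.ndarray) -> None:
        """Append a new audio chunk, overwriting the oldest samples."""
        chunk = chunk[-self.size:]  # a chunk longer than the buffer keeps only its tail
        n = len(chunk)
        end = self.write_pos + n
        if end <= self.size:
            self.buffer[self.write_pos:end] = chunk
        else:
            split = self.size - self.write_pos
            self.buffer[self.write_pos:] = chunk[:split]
            self.buffer[:end % self.size] = chunk[split:]
        self.write_pos = end % self.size
        self.filled = min(self.filled + n, self.size)

    def latest(self) -> np.ndarray:
        """Return the buffered samples in chronological order."""
        if self.filled < self.size:
            return self.buffer[:self.filled].copy()
        return np.roll(self.buffer, -self.write_pos)
```

The point of the ring buffer is that inference can grab the last 4 s at any moment without the capture thread ever blocking or reallocating.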
Stage 1 — Cry detection: Pre-trained YAMNet model (trained on Google's AudioSet — 2M+ clips, 521 classes) identifies whether the audio contains a baby cry (class 20).
Stage 2 — Reason classification: A trained dense network takes 40 MFCCs and predicts one of 5 cry reasons: hunger, pain, gas, tiredness, discomfort. Trained on the Donate-a-Cry corpus with data augmentation and class weighting to handle imbalanced data.
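The two-stage gating above reduces to two thresholds. A minimal sketch of the decision logic — class index 20 and the 80% confidence gate come from the pipeline description, while the stage-1 score threshold of 0.5 is an assumption, not the project's actual value:

```python
from typing import Optional

import numpy as np

CRY_CLASS_INDEX = 20          # YAMNet's baby-cry class, per the pipeline description
CRY_SCORE_THRESHOLD = 0.5     # assumed stage-1 threshold (not from the source)
REASON_CONFIDENCE = 0.80      # stage-2 gate from the pipeline description
REASON_LABELS = ["hunger", "pain", "gas", "tiredness", "discomfort"]

def classify(yamnet_scores: np.ndarray, reason_probs: np.ndarray) -> Optional[str]:
    """Two-stage decision: gate on cry detection, then on reason confidence."""
    if yamnet_scores[CRY_CLASS_INDEX] < CRY_SCORE_THRESHOLD:
        return None  # stage 1: no cry detected, stage 2 never runs
    best = int(np.argmax(reason_probs))
    if reason_probs[best] < REASON_CONFIDENCE:
        return None  # stage 2: too uncertain to send a notification
    return REASON_LABELS[best]
```

The cascade means the cheap, general detector filters silence and household noise, so the small reason classifier only ever sees audio that already looks like a cry.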
- Python 3.10+
- PortAudio (`brew install portaudio` on macOS)
- Android phone running AudioRelay or WO Mic to stream audio over Wi-Fi to a virtual audio device on your laptop
```bash
# Clone and enter the project
cd baby-cry-classifier

# Create a virtual environment
python -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Copy and configure environment variables
cp .env.example .env
# Edit .env with your Telegram bot token, chat ID, and virtual mic device name
```

To find your virtual mic's device name:

```bash
python list_devices.py
```

Copy the name of your virtual mic device into `VIRTUAL_MIC_DEVICE_NAME` in `.env`.
- Create a bot via @BotFather on Telegram (`/newbot`).
- Send a message to your new bot, then visit `https://api.telegram.org/bot<TOKEN>/getUpdates` to find your chat ID.
- Set `TELEGRAM_BOT_TOKEN` and `TELEGRAM_CHAT_ID` in `.env`.
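The notifier's cooldown behaviour can be sketched as follows. This is an illustrative sketch, not the actual `notifier.py` implementation — the class name, the 5-minute default, and the injectable `send_fn` are all assumptions made for testability:

```python
import time
import urllib.parse
import urllib.request

class TelegramNotifier:
    """Send a Telegram message, suppressing repeats within a cooldown window."""

    def __init__(self, token: str, chat_id: str, cooldown_s: float = 300.0,
                 send_fn=None):
        self.url = f"https://api.telegram.org/bot{token}/sendMessage"
        self.chat_id = chat_id
        self.cooldown_s = cooldown_s
        self.last_sent = float("-inf")
        # Injectable sender so the cooldown logic can be tested without a network
        self.send_fn = send_fn if send_fn is not None else self._send_http

    def _send_http(self, text: str) -> None:
        """POST to the Bot API's sendMessage endpoint."""
        data = urllib.parse.urlencode(
            {"chat_id": self.chat_id, "text": text}).encode()
        urllib.request.urlopen(self.url, data=data, timeout=10)

    def notify(self, text: str, now: float = None) -> bool:
        """Send unless still inside the cooldown; return True if actually sent."""
        now = time.monotonic() if now is None else now
        if now - self.last_sent < self.cooldown_s:
            return False
        self.send_fn(text)
        self.last_sent = now
        return True
```

A cooldown matters here because a single crying episode produces many consecutive 4 s detections; without it, one cry would flood the chat with dozens of alerts.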
```bash
# Full pipeline with Telegram notifications
python main.py

# Console-only mode (no Telegram)
python main.py --no-notify

# Override the audio device
python main.py --device "MacBook Pro Microphone"
```

Press `Ctrl+C` to stop.
```
baby-cry-classifier/
├── .env.example              # Environment variable template
├── .gitignore
├── requirements.txt
├── README.md
├── config.py                 # All constants and settings
├── audio_streamer.py         # PyAudio capture with ring buffer
├── inference_engine.py       # YAMNet + custom classifier (auto-loads trained model)
├── notifier.py               # Telegram Bot API with cooldown
├── main.py                   # Entry point
├── list_devices.py           # Audio device enumeration utility
├── train.py                  # Stage 2 model training script
├── models/
│   ├── cry_classifier.h5     # Trained Keras model
│   └── cry_classifier.tflite # Trained TFLite model (used at runtime)
└── data/                     # Dataset (git-ignored)
    └── donateacry-corpus/
```
A pre-trained model is included in `models/`. To retrain or improve it:
```bash
# 1. Clone the dataset (if not already present)
git clone https://github.com/gveres/donateacry-corpus.git data/donateacry-corpus

# 2. Run training
python train.py

# 3. Customize training
python train.py --epochs 100 --no-tflite
```

The training script:
- Loads audio clips from 5 folders: `hungry`, `belly_pain`, `burping`, `tired`, `discomfort`
- Maps them to labels: hunger, pain, gas, tiredness, discomfort
- Applies data augmentation (pitch shift, time stretch, noise) to handle class imbalance (382 hungry vs 8 burping)
- Computes class weights for balanced learning
- Trains a dense network: Input(40) → Dense(128) → BatchNorm → Dropout → Dense(64) → BatchNorm → Dropout → Dense(5, softmax)
- Saves both `.h5` and `.tflite` models to `models/`

`inference_engine.py` auto-loads the TFLite model on startup (no code changes needed).
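The class-weighting step can be illustrated with the standard inverse-frequency ("balanced") scheme — an assumption; `train.py` may use `sklearn.utils.class_weight.compute_class_weight` or a different formula. Only the hungry (382) and burping (8) counts below come from the source:

```python
import numpy as np

def class_weights(counts: dict) -> dict:
    """Inverse-frequency class weights: total / (n_classes * count),
    so rare classes contribute more to the loss during training."""
    labels = sorted(counts)
    n = np.array([counts[label] for label in labels], dtype=float)
    w = n.sum() / (len(n) * n)
    return {i: float(w[i]) for i in range(len(labels))}
```

With 382 hungry clips against 8 burping clips, each burping example ends up weighted roughly 48x more than a hungry one, which is what lets the network learn the minority classes at all.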
- More data: The Donate-a-Cry dataset is small (457 clips, heavily skewed toward "hungry"). Collecting more clips per class would significantly improve accuracy.
- Lighter deployment: Convert the TF Hub YAMNet to TFLite and switch to `tflite-runtime` to drop the ~500 MB `tensorflow` dependency.
- Additional notification channels: The `notifier.py` module can be extended to support push notifications, webhooks, or other messaging services.