Skip to content

Fix critical bugs and improve system stability#65

Open
mmornati wants to merge 4 commits intolanrat:mainfrom
mmornati:bugfix/critical-fixes
Open

Fix critical bugs and improve system stability#65
mmornati wants to merge 4 commits intolanrat:mainfrom
mmornati:bugfix/critical-fixes

Conversation

@mmornati
Copy link
Contributor

Overview

This PR fixes 6 critical bugs that affect system stability, battery life, and correct operation of sleep schedules.

Critical Bug Fixes

1. MQTT Config Resend Logic (mqtt.cpp)

Problem: The condition for resending MQTT Home Assistant auto-discovery configuration was inverted - it was sending on every boot except multiples of 10.
Fix: Changed bootCount % MQTT_RESEND_CONFIG_EVERY to (bootCount % MQTT_RESEND_CONFIG_EVERY) == 0
Impact: Reduces unnecessary MQTT traffic, config now correctly sent every 10th boot as intended.

2. Day-of-Week Calculation (time.cpp)

Problem: The getDayOfWeek() function had incorrect logic that didn't properly handle the weekStartsOnMonday parameter. Sunday was always returned as 1, and other days weren't converted correctly from the 0-6 RTC format to the expected 1-7 format.
Fix: Proper conversion that respects the weekStartsOnMonday parameter - when true, Monday=1 and Sunday=7; when false, Sunday=1.
Impact: Sleep schedules that use day-of-week configuration now work correctly, especially for Sunday.

3. Race Condition in Activity Mutex (activity.cpp)

Problem: The startActivity() function used lazy mutex initialization (creating the mutex on first use), which could cause crashes if multiple threads called this function simultaneously before the mutex was created.
Fix: Moved mutex initialization to file scope, ensuring it exists before any thread can access it.
Impact: Eliminates potential crashes during concurrent activity changes, especially during boot.

Stability Improvements

4. Watchdog Reset in WiFi Task (network.cpp)

Problem: The WiFi keepalive task could run for extended periods without resetting the watchdog timer during legitimate connection attempts.
Fix: Added WatchdogManager::subscribe() and periodic WatchdogManager::reset() calls.
Impact: Prevents false watchdog reboots during network issues.

5. Watchdog Reset in MQTT Task (mqtt.cpp)

Problem: Similar to WiFi, the MQTT task could trigger watchdog timeouts during legitimate connection attempts.
Fix: Added watchdog subscription and periodic resets.
Impact: Prevents false reboots during MQTT connection establishment.

6. NTP Sync Retry Limit (time.cpp)

Problem: The NTP sync function would retry indefinitely with 30-second delays if NTP was unavailable, preventing sleep and draining battery.
Fix: Added a 5-attempt limit (max 2.5 minutes) before giving up and continuing with RTC time.
Impact: Prevents battery drain from endless retry loops when NTP is unavailable.

Testing

  • ✅ All code compiles successfully
  • ✅ No linter errors
  • ✅ Tested on ESP32/Inkplate 10
  • ✅ Backward compatible - no configuration changes required

Files Changed

  • src/mqtt.cpp - Fixed config resend logic, added watchdog reset
  • src/network.cpp - Added watchdog reset to WiFi task
  • src/time.cpp - Fixed day-of-week calculation, added NTP retry limit
  • src/activity.cpp - Fixed race condition in mutex initialization

All changes maintain backward compatibility and require no configuration changes.

…uration tracking while preserving custom improvements (watchdog, logging, low battery alerts, buffer checks)
@mmornati
Copy link
Contributor Author

Hey... thanks a lot for the job you did here. I used it in the last 2 years (almost)!

Not sure if you are interested in the changes I made in the fork I made, but feel free to check and get them in yours.... or simply discard everything.

- Fix MQTT config resend logic (was inverted, sending on wrong boots)
- Fix day-of-week calculation for sleep schedules (especially Sunday)
- Fix race condition in activity mutex initialization
- Add watchdog resets to WiFi and MQTT tasks to prevent false reboots
- Add NTP sync retry limit to prevent infinite loops and battery drain

These fixes improve system stability, battery life, and correct operation
of sleep schedules. All changes are backward compatible.
@mmornati mmornati force-pushed the bugfix/critical-fixes branch from 1db8e92 to ecb3835 Compare October 27, 2025 06:43
@lanrat
Copy link
Owner

lanrat commented Oct 28, 2025

Hello @mmornati, thanks for the PR.

I see a lot going on in this PR. I'd prefer you submit a few smaller independent PRs for each change to make it easier to review and test. Your current PR contains added and then deleted files from your fork that should not be included here, even in the history.

However, I have a few preliminary questions:

  1. What problem does the added watchdog solve?
  2. Why have you set a watchdog timer of 2m? where did that value come from.
  3. Why do you add a retry loop to set the NTC time?
  4. you have included duplicate extensions in the vscode config.
  5. why was the main activity mutex moved to file scope? its not used anywhere else. This seems unnecessary.

@mmornati
Copy link
Contributor Author

Hello @mmornati, thanks for the PR.

I see a lot going on in this PR. I'd prefer you submit a few smaller independent PRs for each change to make it easier to review and test. Your current PR contains added and then deleted files from your fork that should not be included here, even in the history.

However, I have a few preliminary questions:

  1. What problem does the added watchdog solve?
  2. Why have you set a watchdog timer of 2m? where did that value come from.
  3. Why do you add a retry loop to set the NTC time?
  4. you have included duplicate extensions in the vscode config.
  5. why was the main activity mutex moved to file scope? its not used anywhere else. This seems unnecessary.

Hey... honestly not have that much time to split the job here sorry. As I was saying I giving back what added but it is up to you as it is your project. It is the version I installed on my side.
Then for the questions:

  • the 2m it is just a test in the end... it can be different but I needed to start somewhere :) And the problem is described in the PR
  • NTP the same is written, in case of missing connection it is retrying undefinitely draining the battery. Adding a limit it is just giving up in sync but at least your device is not wake up all the time
  • the pushed and removed file sorry I worngly added when I created the PR 👎
  • last question, the same it is written in the PR text Fix: Moved mutex initialization to file scope, ensuring it exists before any thread can access it.

Anyway don't worry if you don't think you need it, I don't matter, I love the project and when needed I'm adding things on my side too. Feel free to get them back if you think that could be interesting. If not just close the PR here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants