|
| 1 | +# ModeManager Safe Mode Finite State Machine |
| 2 | + |
| 3 | +This document describes the finite state machine (FSM) for the ModeManager component's Safe Mode transitions. The ModeManager controls system operational modes and orchestrates transitions between NORMAL and SAFE_MODE based on various triggers. |
| 4 | + |
| 5 | +## State Machine Diagram |
| 6 | + |
| 7 | +```mermaid |
| 8 | +stateDiagram-v2 |
| 9 | + [*] --> NORMAL: System Boot (clean shutdown) |
| 10 | + [*] --> SAFE_MODE: System Boot (unintended reboot) |
| 11 | + |
| 12 | + NORMAL --> SAFE_MODE: Entry Trigger |
| 13 | + SAFE_MODE --> NORMAL: Exit Trigger |
| 14 | + |
| 15 | + note right of NORMAL |
| 16 | + Active Components: |
| 17 | + - All 6 face load switches ON |
| 18 | + - Payload switches OFF (manual control) |
| 19 | + - Voltage monitoring active |
| 20 | + - SafeModeReason = NONE |
| 21 | + end note |
| 22 | + |
| 23 | + note right of SAFE_MODE |
| 24 | + Protected State: |
| 25 | + - All 8 load switches OFF |
| 26 | + - Power consumption minimized |
| 27 | + - SafeModeReason tracks cause |
| 28 | + - Safe mode sequence executed |
| 29 | + end note |
| 30 | + |
| 31 | + state NORMAL { |
| 32 | + [*] --> VoltageMonitoring |
| 33 | + VoltageMonitoring --> VoltageMonitoring: Voltage OK<br/>(Reset counter) |
| 34 | + VoltageMonitoring --> LowVoltageDebounce: Voltage < 6.7V<br/>(Increment counter) |
| 35 | + LowVoltageDebounce --> VoltageMonitoring: Voltage OK<br/>(Reset counter) |
| 36 | + LowVoltageDebounce --> [*]: Counter >= 10s<br/>(Trigger Safe Mode) |
| 37 | + } |
| 38 | + |
| 39 | + state SAFE_MODE { |
| 40 | + [*] --> ReasonCheck |
| 41 | + ReasonCheck --> LOW_BATTERY_Mode: Reason = LOW_BATTERY |
| 42 | + ReasonCheck --> MANUAL_EXIT_Mode: Reason != LOW_BATTERY |
| 43 | + |
| 44 | + LOW_BATTERY_Mode --> RecoveryMonitoring |
| 45 | + RecoveryMonitoring --> RecoveryMonitoring: Voltage <= 8.0V<br/>(Reset counter) |
| 46 | + RecoveryMonitoring --> VoltageRecoveryDebounce: Voltage > 8.0V<br/>(Increment counter) |
| 47 | + VoltageRecoveryDebounce --> RecoveryMonitoring: Voltage <= 8.0V<br/>(Reset counter) |
| 48 | + VoltageRecoveryDebounce --> [*]: Counter >= 10s<br/>(Auto-Exit) |
| 49 | + |
| 50 | + MANUAL_EXIT_Mode --> MANUAL_EXIT_Mode: Waiting for<br/>EXIT_SAFE_MODE<br/>command |
| 51 | + MANUAL_EXIT_Mode --> [*]: EXIT_SAFE_MODE<br/>command received |
| 52 | + } |
| 53 | +``` |
| 54 | + |
| 55 | +## State Transitions |
| 56 | + |
| 57 | +### Entry Transitions: NORMAL → SAFE_MODE |
| 58 | + |
| 59 | +The ModeManager can enter SAFE_MODE from NORMAL mode through multiple triggers, each assigned a specific `SafeModeReason`: |
| 60 | + |
| 61 | +| Trigger | SafeModeReason | Description | Debounce | |
| 62 | +|---------|----------------|-------------|----------| |
| 63 | +| **Auto: Low Voltage** | `LOW_BATTERY` (1) | Voltage drops below `SafeModeEntryVoltage` parameter (default: 6.7V) | 10 seconds (configurable) | |
| 64 | +| **Command: FORCE_SAFE_MODE** | `GROUND_COMMAND` (3) | Ground operator issues FORCE_SAFE_MODE command | Immediate | |
| 65 | +| **Port: forceSafeMode** | `EXTERNAL_REQUEST` (4) or custom | External component calls forceSafeMode port (e.g., watchdog timeout) | Immediate | |
| 66 | +| **Auto: Unintended Reboot** | `SYSTEM_FAULT` (2) | System boots with cleanShutdown flag = 0 in NORMAL mode | At boot only | |
| 67 | +| **Port: forceSafeMode (LoRa)** | `LORA` (5) | LoRa driver detects communication timeout/fault | Immediate | |
| 68 | + |
| 69 | +#### Entry Actions |
| 70 | +When entering SAFE_MODE, the ModeManager: |
| 71 | +1. Executes the safe mode radio sequence (`/seq/enter_safe.bin`) |
| 72 | +2. Sets `m_mode = SAFE_MODE` |
| 73 | +3. Increments `m_safeModeEntryCount` (persisted) |
| 74 | +4. Sets `m_safeModeReason` to the trigger reason |
| 75 | +5. Emits one or more events: |
| 76 | + - `EnteringSafeMode(reason: string)` - Severity: WARNING_HI (always emitted with reason string) |
| 77 | + - `AutoSafeModeEntry(reason: SafeModeReason, voltage: F32)` - Severity: WARNING_HI (for LOW_BATTERY trigger) |
| 78 | + - `UnintendedRebootDetected()` - Severity: WARNING_HI (for SYSTEM_FAULT at boot) |
| 79 | + - `ManualSafeModeEntry()` - Severity: ACTIVITY_HI (for FORCE_SAFE_MODE command) |
| 80 | + - `ExternalFaultDetected()` - Severity: WARNING_HI (for forceSafeMode port call) |
| 81 | +6. Turns OFF all 8 load switches via `loadSwitchTurnOff` ports |
| 82 | +7. Notifies other components via `modeChanged` port with SAFE_MODE value |
| 83 | +8. Executes safe mode sequence via `runSequence` port (may emit `SafeModeSequenceCompleted` or `SafeModeSequenceFailed`) |
| 84 | +9. Saves state to persistent storage (`/mode_state.bin`) |
| 85 | + |
| 86 | +### Exit Transitions: SAFE_MODE → NORMAL |
| 87 | + |
| 88 | +Exit from SAFE_MODE depends on the `SafeModeReason`: |
| 89 | + |
| 90 | +| Exit Method | Conditions | Applicable Reasons | |
| 91 | +|-------------|------------|-------------------| |
| 92 | +| **Auto-Recovery** | Voltage > `SafeModeRecoveryVoltage` (default: 8.0V) for 10+ seconds | `LOW_BATTERY` only | |
| 93 | +| **Manual Command** | Ground operator issues EXIT_SAFE_MODE command | All reasons | |
| 94 | + |
| 95 | +#### Exit Actions |
| 96 | +When exiting SAFE_MODE, the ModeManager: |
| 97 | +1. Sets `m_mode = NORMAL` |
| 98 | +2. Clears `m_safeModeReason = NONE` |
| 99 | +3. Emits exit event: |
| 100 | + - `ExitingSafeMode()` - Severity: ACTIVITY_HI (manual EXIT_SAFE_MODE command) |
| 101 | + - `AutoSafeModeExit(voltage: F32)` - Severity: ACTIVITY_HI (auto-recovery for LOW_BATTERY) |
| 102 | +4. Turns ON face load switches (0-5) via `loadSwitchTurnOn` ports |
| 103 | + - **Note**: Payload switches (6-7) remain OFF, requiring separate commands |
| 104 | +5. Notifies other components via `modeChanged` port with NORMAL value |
| 105 | +6. Saves state to persistent storage |
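
The load switch handling in the entry and exit actions can be sketched as follows. This is a minimal illustration under stated assumptions, not the component's code: `SwitchStates`, `applySafeModeEntry`, and `applySafeModeExit` are hypothetical names, and the real component drives hardware through the `loadSwitchTurnOff` and `loadSwitchTurnOn` ports rather than a boolean array.

```cpp
#include <array>
#include <cstddef>

// Switch counts taken from the document: indices 0-5 are face switches,
// indices 6-7 are payload switches.
constexpr std::size_t kNumSwitches = 8;
constexpr std::size_t kNumFaceSwitches = 6;

// Hypothetical stand-in for the hardware state; true = ON.
using SwitchStates = std::array<bool, kNumSwitches>;

// Entry action: every load switch is turned OFF.
inline void applySafeModeEntry(SwitchStates& sw) {
    sw.fill(false);
}

// Exit action: only the face switches are turned back ON.
// Payload switches are left untouched and need separate commands.
inline void applySafeModeExit(SwitchStates& sw) {
    for (std::size_t i = 0; i < kNumFaceSwitches; ++i) {
        sw[i] = true;
    }
}
```

After an entry followed by an exit, switches 0-5 are ON and switches 6-7 stay OFF, which matches the NORMAL mode invariants.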
| 106 | + |
| 107 | +### Boot-Time State Restoration |
| 108 | + |
| 109 | +On system initialization, the ModeManager: |
| 110 | +1. Reads persistent state from `/mode_state.bin` |
| 111 | +2. Restores `m_mode`, `m_safeModeEntryCount`, and `m_safeModeReason` |
| 112 | +3. Checks `cleanShutdown` flag: |
| 113 | + - If `cleanShutdown = 1`: Clean boot, restore physical hardware to match saved mode |
| 114 | + - If `cleanShutdown = 0` AND `m_mode = NORMAL`: Unintended reboot detected → Enter SAFE_MODE with reason `SYSTEM_FAULT` |
| 115 | +4. Clears `cleanShutdown` flag (sets to 0) for next boot detection |
| 116 | + |
| 117 | +**Clean Shutdown Protocol**: |
| 118 | +- The `prepareForReboot` port handler sets `cleanShutdown = 1` before intentional reboots |
| 119 | +- Because the flag is cleared on every boot, any reboot that bypasses this handler (crash, watchdog reset, or power loss) leaves it at 0 and is detected as unintended |
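
The boot-time decision above can be sketched as a pure function over the loaded state. This is a minimal sketch, not the component's `loadState()` implementation: `BootState`, `BootDecision`, and `resolveBoot` are hypothetical names. Keeping the saved mode and reason after an unclean reboot that happened while already in SAFE_MODE is also an assumption, since the document only specifies the NORMAL case.

```cpp
#include <cstdint>

// Numeric values follow the document (1 = SAFE_MODE, 2 = NORMAL).
enum class Mode : std::uint8_t { SAFE_MODE = 1, NORMAL = 2 };
enum class Reason : std::uint8_t { NONE = 0, LOW_BATTERY = 1, SYSTEM_FAULT = 2 };

// Hypothetical view of the state read from /mode_state.bin.
struct BootState {
    Mode mode;
    Reason reason;
    std::uint8_t cleanShutdown;  // 1 = clean, 0 = unclean
};

// Hypothetical result of the boot check.
struct BootDecision {
    Mode mode;
    Reason reason;
    bool enteredSafeMode;  // true when this boot itself triggers SAFE_MODE
};

// Applies steps 3 and 4 to a state that has already been loaded.
inline BootDecision resolveBoot(BootState& saved) {
    BootDecision d{saved.mode, saved.reason, false};
    if (saved.cleanShutdown == 0 && saved.mode == Mode::NORMAL) {
        // Unintended reboot while in NORMAL: enter SAFE_MODE with SYSTEM_FAULT.
        d.mode = Mode::SAFE_MODE;
        d.reason = Reason::SYSTEM_FAULT;
        d.enteredSafeMode = true;
    }
    // Step 4: clear the flag so the next unplanned reboot is detectable.
    saved.cleanShutdown = 0;
    return d;
}
```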
| 120 | + |
| 121 | +## State Invariants |
| 122 | + |
| 123 | +### NORMAL Mode |
| 124 | +- `m_mode = NORMAL (2)` |
| 125 | +- `m_safeModeReason = NONE (0)` |
| 126 | +- Face load switches (0-5) are ON |
| 127 | +- Payload switches (6-7) are OFF (default) |
| 128 | +- Voltage monitoring active (1Hz via `run` handler) |
| 129 | +- Low voltage counter (`m_safeModeVoltageCounter`) tracks consecutive low readings |
| 130 | +- Recovery counter (`m_recoveryVoltageCounter`) is reset to 0 |
| 131 | + |
| 132 | +### SAFE_MODE Mode |
| 133 | +- `m_mode = SAFE_MODE (1)` |
| 134 | +- `m_safeModeReason` = {`LOW_BATTERY`, `SYSTEM_FAULT`, `GROUND_COMMAND`, `EXTERNAL_REQUEST`, `LORA`} |
| 135 | +- All 8 load switches are OFF |
| 136 | +- Voltage recovery monitoring active only if `reason = LOW_BATTERY` |
| 137 | +- Recovery counter (`m_recoveryVoltageCounter`) tracks consecutive recovery readings (if LOW_BATTERY) |
| 138 | +- Low voltage counter (`m_safeModeVoltageCounter`) is reset to 0 |
| 139 | + |
| 140 | +## Voltage Hysteresis |
| 141 | + |
| 142 | +The ModeManager implements voltage hysteresis to prevent oscillation between modes: |
| 143 | + |
| 144 | +- **Entry Threshold**: 6.7V (configurable via `SafeModeEntryVoltage` parameter) |
| 145 | +- **Recovery Threshold**: 8.0V (configurable via `SafeModeRecoveryVoltage` parameter) |
| 146 | +- **Gap**: 1.3V hysteresis prevents rapid mode switching |
| 147 | + |
| 148 | +**Rationale**: Battery voltage may fluctuate under load. The higher recovery threshold ensures the system has sufficient margin before resuming normal operations. |
| 149 | + |
| 150 | +## Debouncing Logic |
| 151 | + |
| 152 | +All voltage-triggered transitions use a configurable debounce period (default: 10 seconds): |
| 153 | + |
| 154 | +``` |
| 155 | +NORMAL → SAFE_MODE (Low Voltage): |
| 156 | + - Counter increments each second while voltage < 6.7V |
| 157 | + - Counter resets to 0 if voltage >= 6.7V |
| 158 | + - Transition occurs when counter >= 10 |
| 159 | + |
| 160 | +SAFE_MODE → NORMAL (Auto-Recovery): |
| 161 | + - Counter increments each second while voltage > 8.0V AND reason = LOW_BATTERY |
| 162 | + - Counter resets to 0 if voltage <= 8.0V |
| 163 | + - Transition occurs when counter >= 10 |
| 164 | +``` |
| 165 | + |
| 166 | +**Rationale**: Debouncing prevents spurious transitions due to transient voltage spikes/dips, sensor noise, or momentary load changes. |
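
The counter rules above, combined with the two hysteresis thresholds, can be sketched as a small helper that is fed one sample per 1Hz tick. This is an illustrative sketch, not the component's code: `DebounceCounter`, `lowVoltageTick`, and `recoveryTick` are hypothetical names, and the constants are the document's defaults rather than values read from the `SafeModeEntryVoltage` and `SafeModeRecoveryVoltage` parameters.

```cpp
#include <cstdint>

// Hypothetical helper: counts consecutive samples for which a condition holds.
class DebounceCounter {
  public:
    explicit DebounceCounter(std::uint32_t requiredSamples) : m_required(requiredSamples) {}

    // Feed one sample. Returns true once the condition has held for the
    // required number of consecutive samples; any failing sample resets the count.
    bool update(bool conditionMet) {
        m_count = conditionMet ? m_count + 1 : 0;
        return m_count >= m_required;
    }

    void reset() { m_count = 0; }
    std::uint32_t count() const { return m_count; }

  private:
    std::uint32_t m_required;
    std::uint32_t m_count = 0;
};

// Default values quoted in the document (assumed fixed here for illustration).
constexpr float kEntryVoltage = 6.7f;
constexpr float kRecoveryVoltage = 8.0f;
constexpr std::uint32_t kDebounceSamples = 10;  // 10 s at 1 Hz

// NORMAL -> SAFE_MODE check: strictly below the entry threshold.
inline bool lowVoltageTick(DebounceCounter& c, float volts) {
    return c.update(volts < kEntryVoltage);
}

// SAFE_MODE -> NORMAL check (LOW_BATTERY only): strictly above the recovery threshold.
inline bool recoveryTick(DebounceCounter& c, float volts) {
    return c.update(volts > kRecoveryVoltage);
}
```

Using one counter type for both directions keeps the comparison against each threshold as the only difference between entry and recovery, so the 1.3V gap is visible directly in the two tick functions.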
| 167 | + |
| 168 | +## Reason-Based Recovery Rules |
| 169 | + |
| 170 | +Only `LOW_BATTERY` reason allows automatic recovery. Other reasons require manual intervention: |
| 171 | + |
| 172 | +| Reason | Auto-Recovery | Rationale | |
| 173 | +|--------|---------------|-----------| |
| 174 | +| `LOW_BATTERY` | ✅ Yes | Condition is measurable and reversible; safe to auto-recover when voltage stabilizes | |
| 175 | +| `SYSTEM_FAULT` | ❌ No | Unintended reboot indicates unknown system issue; requires ground investigation | |
| 176 | +| `GROUND_COMMAND` | ❌ No | Operator explicitly commanded safe mode; requires operator approval to exit | |
| 177 | +| `EXTERNAL_REQUEST` | ❌ No | Another component detected a fault; requires component-specific recovery | |
| 178 | +| `LORA` | ❌ No | Communication fault may indicate antenna deployment issue or ground station unavailability | |
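
The recovery rules in the table reduce to a single predicate on the reason. The sketch below is illustrative only: the enum values follow the document, but this standalone `SafeModeReason` definition is not the generated F Prime type, and `allowsAutoRecovery` and `canExitSafeMode` are hypothetical helpers.

```cpp
#include <cstdint>

// Reason codes as listed in the document (0-5).
enum class SafeModeReason : std::uint8_t {
    NONE = 0,
    LOW_BATTERY = 1,
    SYSTEM_FAULT = 2,
    GROUND_COMMAND = 3,
    EXTERNAL_REQUEST = 4,
    LORA = 5
};

// Only LOW_BATTERY is eligible for automatic recovery.
constexpr bool allowsAutoRecovery(SafeModeReason reason) {
    return reason == SafeModeReason::LOW_BATTERY;
}

// Exit is permitted on an EXIT_SAFE_MODE command for any reason, or on a
// debounced voltage recovery when the reason allows auto-recovery.
constexpr bool canExitSafeMode(SafeModeReason reason, bool exitCommandReceived, bool recoveryDebounced) {
    return exitCommandReceived || (allowsAutoRecovery(reason) && recoveryDebounced);
}
```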
| 179 | + |
| 180 | +## Telemetry Channels |
| 181 | + |
| 182 | +The ModeManager publishes telemetry at 1Hz via the `run` handler: |
| 183 | + |
| 184 | +- `CurrentMode`: U8 (1 = SAFE_MODE, 2 = NORMAL) |
| 185 | +- `CurrentSafeModeReason`: SafeModeReason enum (0-5) |
| 186 | +- `SafeModeEntryCount`: U32 (cumulative count, persisted across reboots) |
| 187 | + |
| 188 | +## Implementation Details |
| 189 | + |
| 190 | +### File Locations |
| 191 | +- **Component Definition**: `FprimeZephyrReference/Components/ModeManager/ModeManager.fpp` |
| 192 | +- **Implementation**: `FprimeZephyrReference/Components/ModeManager/ModeManager.cpp` |
| 193 | +- **Header**: `FprimeZephyrReference/Components/ModeManager/ModeManager.hpp` |
| 194 | +- **Integration Tests**: `FprimeZephyrReference/test/int/safe_mode_test.py` |
| 195 | + |
| 196 | +### Key Methods |
| 197 | +- `run_handler()`: 1Hz periodic handler for voltage monitoring and telemetry |
| 198 | +- `enterSafeMode(reason)`: Transition to SAFE_MODE with specified reason |
| 199 | +- `exitSafeMode()`: Transition to NORMAL (manual command) |
| 200 | +- `exitSafeModeAutomatic(voltage)`: Transition to NORMAL (auto-recovery) |
| 201 | +- `forceSafeMode_handler(reason)`: Port handler for external safe mode requests |
| 202 | +- `loadState()`: Restore state from persistent storage at boot |
| 203 | +- `saveState()`: Persist state to non-volatile storage |
| 204 | +- `prepareForReboot_handler()`: Set clean shutdown flag before intentional reboot |
| 205 | + |
| 206 | +### Persistent State Structure |
| 207 | +```cpp |
| 208 | +struct PersistentState { |
| 209 | + U8 mode; // Current mode (1 = SAFE_MODE, 2 = NORMAL) |
| 210 | + U32 safeModeEntryCount; // Number of times safe mode entered |
| 211 | + U8 safeModeReason; // Reason for safe mode entry (0-5) |
| 212 | + U8 cleanShutdown; // Clean shutdown flag (1 = clean, 0 = unclean) |
| 213 | +}; |
| 214 | +``` |
| 215 | +Stored at: `/mode_state.bin` (size is architecture-dependent due to struct padding, typically 8-12 bytes) |
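
The padding dependence noted above could be avoided by encoding the fields into a fixed byte layout before writing. The sketch below is an alternative technique, not what the component is documented to do: `encodeState`, `decodeState`, and the 7-byte little-endian layout are assumptions made for illustration, and the struct uses `<cstdint>` types in place of the F Prime `U8`/`U32` aliases.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

struct PersistentState {
    std::uint8_t mode;                 // 1 = SAFE_MODE, 2 = NORMAL
    std::uint32_t safeModeEntryCount;  // Number of times safe mode entered
    std::uint8_t safeModeReason;       // 0-5
    std::uint8_t cleanShutdown;        // 1 = clean, 0 = unclean
};

// 1 + 4 + 1 + 1 bytes when no padding is stored (assumed layout).
constexpr std::size_t kEncodedSize = 7;
using EncodedState = std::array<std::uint8_t, kEncodedSize>;

// Writes each field at a fixed offset; the count is stored little-endian.
inline EncodedState encodeState(const PersistentState& s) {
    EncodedState out{};
    out[0] = s.mode;
    for (std::size_t i = 0; i < 4; ++i) {
        out[1 + i] = static_cast<std::uint8_t>((s.safeModeEntryCount >> (8 * i)) & 0xFFu);
    }
    out[5] = s.safeModeReason;
    out[6] = s.cleanShutdown;
    return out;
}

// Inverse of encodeState.
inline PersistentState decodeState(const EncodedState& in) {
    PersistentState s{};
    s.mode = in[0];
    for (std::size_t i = 0; i < 4; ++i) {
        s.safeModeEntryCount |= static_cast<std::uint32_t>(in[1 + i]) << (8 * i);
    }
    s.safeModeReason = in[5];
    s.cleanShutdown = in[6];
    return s;
}
```

With an explicit layout, the file size and field offsets no longer depend on the compiler or target alignment rules, at the cost of a small encode and decode step.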
| 216 | + |
| 217 | +## Testing Validation |
| 218 | + |
| 219 | +The Safe Mode FSM is validated through integration tests in `safe_mode_test.py`: |
| 220 | + |
| 221 | +| Test | Validates | Status | |
| 222 | +|------|-----------|--------| |
| 223 | +| `test_safe_01` | Initial reason = NONE in NORMAL mode | ✅ Automated | |
| 224 | +| `test_safe_02` | FORCE_SAFE_MODE sets reason = GROUND_COMMAND | ✅ Automated | |
| 225 | +| `test_safe_03` | EXIT_SAFE_MODE clears reason to NONE | ✅ Automated | |
| 226 | +| `test_safe_04` | GROUND_COMMAND does not auto-recover | ✅ Automated | |
| 227 | +| `test_safe_05` | Auto-entry on low voltage → reason = LOW_BATTERY | ⏸️ Manual | |
| 228 | +| `test_safe_06` | Auto-recovery on voltage recovery (LOW_BATTERY only) | ⏸️ Manual | |
| 229 | +| `test_safe_07` | Unintended reboot → reason = SYSTEM_FAULT | ⏸️ Manual | |
| 230 | +| `test_safe_08` | Clean reboot → no safe mode entry | ⏸️ Manual | |
| 231 | + |
| 232 | +## Design Rationale |
| 233 | + |
| 234 | +1. **Two-State System**: Simple FSM with clear separation of concerns between operational and protected states |
| 235 | +2. **Reason Tracking**: Enables intelligent recovery decisions and diagnostics |
| 236 | +3. **Voltage Hysteresis**: Prevents mode oscillation under marginal battery conditions |
| 237 | +4. **Debouncing**: Filters out transient faults and sensor noise |
| 238 | +5. **State Persistence**: Allows unintended reboot detection and mode restoration across power cycles |
| 239 | +6. **Selective Auto-Recovery**: Only measurable/reversible conditions (LOW_BATTERY) auto-recover; others require human decision |
| 240 | +7. **Load Switch Control**: Minimizes power consumption in SAFE_MODE while preserving critical functions |
| 241 | + |
| 242 | +## References |
| 243 | + |
| 244 | +- [ModeManager Component Documentation](../../../docs-site/components/ModeManager.md) |
| 245 | +- [ModeManager Software Design Document](./sdd.md) |
| 246 | +- [Safe Mode Integration Tests](../../../test/int/safe_mode_test.py) |