Battery Boot Failure — Root Cause & Firmware Fix
Overview
During routine field testing of Benlycos bonding router devices, our QA team observed a reproducible failure mode: units would not boot successfully when connected to a power source after their internal battery had been fully discharged. In lab conditions the symptom appeared intermittent overall, but it reproduced consistently whenever the state of charge (SoC) was between 0% and 5%.
This article documents the full root cause investigation — covering UART log analysis, hardware-level measurements, firmware code review, and the patch that resolved the issue.
Symptom Description
Field engineers reported that, after a complete battery discharge, devices would do one of the following:
- Power on briefly (1–3 seconds) then immediately shut down
- Show an OLED initialization screen but hang at the splash stage
- Fail to enumerate USB modems despite the host controller initializing correctly
- Occasionally enter a rapid power-cycle loop (boot storm)
Normal operation resumed once sustained charging brought the battery to approximately 8–10% SoC, but that took 15–20 minutes, during which the device gave the end user no visible feedback.
Initial Investigation — UART Logs
The first step was to attach a UART-to-USB probe at 115200 baud to the device's debug header and capture the full boot log during a failure event. The log revealed the following sequence:
[ 0.000] U-Boot 2022.01 — Benlycos BLR-1040
[ 0.012] DRAM: 256 MiB
[ 0.031] PMIC: battery_gauge init OK
[ 0.033] BATTERY SoC: 2%
[ 0.034] BATTERY voltage: 3.21V
[ 0.035] WARNING: VBAT below safe threshold (3.30V)
[ 0.036] Attempting warm boot...
[ 0.450] Kernel loaded
[ 1.211] systemd: starting core services
[ 1.890] modem-manager: enumerating USB bus...
[ 2.041] POWER: Vcap drop detected — 3.18V
[ 2.042] PMIC: emergency shutdown triggered
[ 2.043] System halting.
The key line was POWER: Vcap drop detected — 3.18V. The PMIC (power management IC) registered a voltage collapse on the capacitor rail 151 ms after modem-manager logged the start of USB bus enumeration, and triggered an emergency shutdown 1 ms later.
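Correlating the two events by timestamp is easy to automate when many failure captures need checking. The sketch below is a hypothetical host-side helper (`log_timestamp_ms` and `event_gap_ms` are illustrative names, not part of any Benlycos tooling). It assumes every line starts with a `[ S.mmm]` timestamp carrying exactly three fractional digits, as in the capture above; applied to that capture, it reports a 151 ms gap between the enumeration line and the Vcap drop.

```c
#include <stdio.h>
#include <string.h>

/* Parse a "[ S.mmm] message" prefix into milliseconds.
 * Assumes exactly three fractional digits, as in the U-Boot capture.
 * Returns -1 if the line does not begin with a timestamp. */
static long log_timestamp_ms(const char *line)
{
    unsigned long sec = 0, ms = 0;

    /* " [ " skips optional whitespace around the opening bracket;
     * %3lu reads the fixed-width millisecond field ("041" -> 41). */
    if (sscanf(line, " [ %lu.%3lu]", &sec, &ms) != 2) {
        return -1;
    }
    return (long)(sec * 1000UL + ms);
}

/* Milliseconds between the first line containing `cause` and the first
 * later line containing `effect`; -1 if either event is missing. */
static long event_gap_ms(const char *const lines[], size_t n,
                         const char *cause, const char *effect)
{
    long t_cause = -1;

    for (size_t i = 0; i < n; i++) {
        long t = log_timestamp_ms(lines[i]);
        if (t < 0) {
            continue; /* skip untimestamped lines */
        }
        if (t_cause < 0 && strstr(lines[i], cause) != NULL) {
            t_cause = t;
        } else if (t_cause >= 0 && strstr(lines[i], effect) != NULL) {
            return t - t_cause;
        }
    }
    return -1;
}
```

Matching on substrings rather than exact lines keeps the helper tolerant of the trailing values (voltages, percentages) that change from capture to capture.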
Root Cause Analysis
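USB modem enumeration requires a brief but significant current surge, typically 500–900 mA for 80–120 ms during device negotiation. At low SoC (below ~5%), the cell cannot sustain this surge while also powering the rest of the system: its voltage is already close to the shutdown point (3.21 V at boot in the log above), and the additional draw across the cell's internal resistance pulls the rail down further, to the 3.18 V at which the PMIC triggered its emergency shutdown.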
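A common first-order model makes this mechanism concrete: the voltage under load is the open-circuit voltage minus the drop across the cell's internal resistance, V_load = V_oc − I × R_int, and R_int typically rises as a lithium-ion cell approaches empty (the cell chemistry is an assumption here). The sketch below applies that model; the function names, resistance values and cutoff are illustrative assumptions, not measurements from the BLR-1040.

```c
#include <stdbool.h>
#include <stdint.h>

/* First-order battery model: V_load = V_oc - I * R_int.
 * Units: millivolts, milliamps, milliohms. */
static int32_t loaded_voltage_mv(int32_t voc_mv, int32_t load_ma,
                                 int32_t rint_mohm)
{
    /* mA * mOhm gives microvolts; divide by 1000 for millivolts. */
    return voc_mv - (load_ma * rint_mohm) / 1000;
}

/* True if drawing `load_ma` (system load plus enumeration surge) would
 * pull the rail below a hypothetical PMIC cutoff `cutoff_mv`. */
static bool surge_trips_cutoff(int32_t voc_mv, int32_t load_ma,
                               int32_t rint_mohm, int32_t cutoff_mv)
{
    return loaded_voltage_mv(voc_mv, load_ma, rint_mohm) < cutoff_mv;
}
```

For example, with an assumed 3,210 mV open-circuit voltage, 250 mΩ of internal resistance and 900 mA of total draw, the model predicts 2,985 mV under load, a 225 mV drop. The same surge from a fuller cell with lower resistance barely moves the rail, which is why the failure clusters at low SoC.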
The battery gauge reported SoC as a percentage, but the firmware's boot decision logic checked only whether the gauge had initialised successfully, not whether the reported SoC was above a minimum operational threshold before proceeding with full system init. As the log shows, U-Boot even printed a low-voltage warning at 0.035 s and then attempted to boot anyway.
In the firmware source (drivers/power/battery.c), the relevant boot gate was:
/* Pre-patch boot gate — insufficient */
if (battery_gauge_ready()) {
    boot_proceed(); /* always proceeds regardless of SoC */
}
The fix added a minimum SoC and battery-voltage check before boot is allowed to continue, so that peripheral drivers are never loaded while the cell is too weak to supply them:
/* Post-patch — safe threshold enforced */
#define BOOT_MIN_SOC_PERCENT 8
#define BOOT_MIN_VOLTAGE_MV  3350

if (battery_gauge_ready()) {
    uint8_t soc = battery_get_soc();
    uint32_t vbat = battery_get_voltage_mv();

    if (soc < BOOT_MIN_SOC_PERCENT || vbat < BOOT_MIN_VOLTAGE_MV) {
        oled_show_charging_screen();       /* show user feedback */
        pmic_set_charge_mode(CHARGE_FAST);
        halt_and_wait_for_threshold();     /* poll every 30s */
    }
    boot_proceed();
}
The Fix
The firmware patch introduced three behavioural changes:
1. Added a dual-condition boot gate that checks both SoC percentage and raw battery voltage before proceeding past the U-Boot stage.
2. Enabled the OLED display in a low-power mode to show a "Charging — please wait" message, eliminating the blank-screen UX issue.
3. Set the PMIC to its maximum charge rate (1.5 A input) during the wait period to minimise time-to-boot for the end user.
The threshold values (BOOT_MIN_SOC_PERCENT = 8 and BOOT_MIN_VOLTAGE_MV = 3350) were selected based on oscilloscope measurements of USB enumeration current peaks at various SoC levels, with a 15% safety margin added above the measured failure boundary.
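The dual-condition gate and its wait loop can be exercised on a host by injecting battery readings instead of calling the fuel gauge. In this sketch, `battery_sample_t`, `battery_ok_to_boot`, `wait_for_boot_threshold` and the replay sampler are hypothetical stand-ins for the firmware's `battery_get_soc()`, `battery_get_voltage_mv()` and `halt_and_wait_for_threshold()`. The 30 s poll delay is omitted, and a `max_polls` limit (not present in the firmware, which waits indefinitely) is added so the loop terminates under test.

```c
#include <stdbool.h>
#include <stdint.h>

#define BOOT_MIN_SOC_PERCENT 8
#define BOOT_MIN_VOLTAGE_MV  3350

/* One reading from a (hypothetical) injected battery sampler. */
typedef struct {
    uint8_t  soc_percent;
    uint32_t vbat_mv;
} battery_sample_t;

typedef battery_sample_t (*battery_sampler_fn)(void *ctx);

/* Both conditions must hold; failing either one holds the boot. */
static bool battery_ok_to_boot(battery_sample_t s)
{
    return s.soc_percent >= BOOT_MIN_SOC_PERCENT &&
           s.vbat_mv >= BOOT_MIN_VOLTAGE_MV;
}

/* Poll until both thresholds are met. On hardware each poll would be
 * 30 s apart. Returns the number of polls taken, or -1 if max_polls
 * was exhausted first. */
static int wait_for_boot_threshold(battery_sampler_fn sample, void *ctx,
                                   int max_polls)
{
    for (int poll = 1; poll <= max_polls; poll++) {
        if (battery_ok_to_boot(sample(ctx))) {
            return poll;
        }
    }
    return -1;
}

/* Host-side stand-in for the fuel gauge: replays a recorded charge curve
 * and keeps returning the last sample once the curve is exhausted. */
typedef struct {
    const battery_sample_t *samples;
    int count;
    int next;
} replay_ctx_t;

static battery_sample_t replay_sampler(void *ctx)
{
    replay_ctx_t *r = ctx;
    battery_sample_t s = r->samples[r->next];

    if (r->next < r->count - 1) {
        r->next++;
    }
    return s;
}
```

Replaying the curve {2%, 3210 mV}, {5%, 3300 mV}, {8%, 3340 mV}, {9%, 3360 mV} releases the gate on the fourth poll: the third sample meets the SoC threshold but not the voltage threshold, a case an SoC-only gate would have let through.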
Validation & Results
After deploying the patched firmware, the QA team ran a 72-hour soak test across 20 units cycled through full discharge and recharge sequences. Results:
- Zero boot failures observed across all 20 units over 72 hours
- Average time-to-boot from an empty battery fell from 18 minutes pre-patch (users had no indication they should wait) to 4–6 minutes (users see the charging screen while the PMIC fast-charges)
- PMIC thermal readings remained within safe limits during fast-charge hold
- No regression in normal boot time (units already at or above both the SoC and voltage thresholds boot identically to pre-patch)
The fix was merged into the main firmware branch and shipped as OTA update v3.4.1 to all field units. No field reports of the failure mode have been received since the update rollout.
Takeaways
This failure illustrates a common embedded systems pitfall: treating peripheral initialisation success as a binary ready/not-ready signal without validating the underlying power budget. Hardware peripherals that require current bursts — USB hosts, cellular modems, display controllers — must be sequenced after confirming sufficient power delivery margin, not simply after confirming rail voltage is present.
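One way to express such a power-budget gate is to bring peripherals up in a fixed order and admit each one only if its start-up surge, added to the steady draw of everything already running, fits within the current the battery can deliver at that moment. The `peripheral_t` descriptor, the function name and every current figure below are hypothetical; estimating the available current from SoC and voltage is left to the platform.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-peripheral power profile. */
typedef struct {
    const char *name;
    uint32_t    surge_ma;  /* peak draw while initialising */
    uint32_t    steady_ma; /* draw once running */
} peripheral_t;

/* Start peripherals in array order while each surge fits the budget.
 * Stops at the first one that does not fit, so later peripherals are
 * never started ahead of an earlier, deferred one.
 * Returns how many peripherals were admitted. */
static size_t sequence_within_budget(const peripheral_t *periph, size_t n,
                                     uint32_t available_ma,
                                     uint32_t base_load_ma)
{
    uint32_t load_ma = base_load_ma;

    for (size_t i = 0; i < n; i++) {
        if (load_ma + periph[i].surge_ma > available_ma) {
            return i; /* defer this and all later peripherals */
        }
        load_ma += periph[i].steady_ma; /* surge has passed; keep steady draw */
    }
    return n;
}
```

With an assumed 1,500 mA available and 300 mA of base load, a display (60 mA surge, 20 mA steady) and one modem (900 mA surge, 300 mA steady) are admitted, but a second modem is deferred because 620 + 900 = 1,520 mA exceeds the budget.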
We've updated our device bringup checklist and firmware review guidelines to include an explicit power-budget gate for all devices with battery-backed operation going forward.