Integration Test Worklist
Date: 2026-02-12
Scope: Shared progress tracker for integration and hardware-boundary tests in squeezelite-esp32.
Related Active Goal
`documentation/short-term/active/GOAL-002-hut-surface-first-test.md` drives `HW-BOOT-001` across all available HUT slots on the target system.
How To Use
- Update this file in every integration-test PR that changes status.
- Keep entries ordered by priority (`P0`, then `P1`, then `P2`).
- Status values: `todo`, `in_progress`, `blocked`, `done`.
- Add `Owner`, `Last Update`, and `Evidence` (PR, CI run, or log path) when status changes.
- Do not remove completed rows; keep history visible.
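The ordering and status rules above can be checked mechanically before a PR is opened. The following is a minimal sketch, assuming single-line rows in the `| ID | Priority | Status | ...` column order used by the tables in this file; `check_worklist_rows` is a hypothetical helper name, not an existing script in the repo.

```shell
# Hypothetical helper (not part of the repo): validate worklist rows read from stdin.
# Assumes single-line rows shaped like "| ID | Priority | Status | ...".
check_worklist_rows() {
  awk -F'|' '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    # A separator row starts a new table, so the ordering check restarts.
    substr($0, 1, 4) == "|---" { last = 0; next }
    {
      id = trim($2); prio = trim($3); status = trim($4)
      # Skip header rows and rows from tables without a priority column.
      if (prio !~ /^P[0-2]$/) next
      if (status !~ /^(todo|in_progress|blocked|done)$/) {
        printf "%s: invalid status \"%s\"\n", id, status
        bad = 1
      }
      rank = substr(prio, 2) + 0
      if (rank < last) {
        printf "%s: %s listed after P%d\n", id, prio, last
        bad = 1
      } else {
        last = rank
      }
    }
    END { exit bad + 0 }
  '
}
```

Running `check_worklist_rows < documentation/agents/integration_test_worklist.md` would print one line per violation and return non-zero if any row breaks the rules; a clean file produces no output.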
Agent Contract
Use this contract at the start of any new conversation so execution is consistent.
MODE: guided+freeform
GOAL: implement test roadmap execution using documentation/agents/integration_test_worklist.md
START_ITEM: <ID or auto> # e.g. UT-CHUNK-001 or auto
CONTROL: stepwise # one step at a time
VALIDATION: fast|full # default validation level
CONSTRAINTS:
- short answers
- precise control points
- update worklist status/evidence/handoff on every step
FIRST_ACTION: propose next step with A/B/C + freeform option
Minimal kickoff:
Use documentation/agents/integration_test_worklist.md as orchestrator.
Run guided+freeform, short responses, one step at a time.
Start with UT-CHUNK-001, validation=fast.
Give A/B/C plus freeform each step.
Short-Hand Hints
- `kickoff auto fast` -> start from highest-priority unclaimed item with fast checks
- `kickoff <ID> full` -> start from specific item with full checks
- `pick A|B|C` -> choose one proposed option
- `do: <plain instruction>` -> freeform instruction instead of multiple choice
- `switch <ID>` -> change active item
- `pause` -> stop changes and wait
- `continue` -> proceed with current plan
- `tighten` -> stricter done criteria and evidence bar
- `status` -> one-screen summary of active item, blockers, next action
- `handoff` -> force handoff update now (status, evidence, next)
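The hints are meant to be read by an agent, but their grammar is regular enough to illustrate in code. The sketch below shows one possible interpretation of each hint as a normalized action line; `parse_hint` and its `action=...` output format are assumptions made for illustration only and do not exist in the repo.

```shell
# Hypothetical parser for the shorthand hints; prints a normalized action line.
parse_hint() {
  cmd=${1%% *}          # first word
  rest=${1#"$cmd"}      # everything after the first word
  rest=${rest# }        # drop the separating space
  case "$cmd" in
    kickoff)
      item=${rest%% *}  # <ID> or auto
      level=${rest##* } # fast or full
      case "$level" in
        fast|full) ;;
        *) echo "error: kickoff needs <ID|auto> fast|full" >&2; return 1 ;;
      esac
      # A single word after kickoff means the item is missing.
      if [ -z "$item" ] || [ "$item" = "$rest" ]; then
        echo "error: kickoff needs <ID|auto> fast|full" >&2; return 1
      fi
      echo "action=kickoff item=$item validation=$level" ;;
    pick)
      case "$rest" in
        A|B|C) echo "action=pick option=$rest" ;;
        *) echo "error: pick needs A, B, or C" >&2; return 1 ;;
      esac ;;
    "do:")
      [ -n "$rest" ] || { echo "error: do: needs an instruction" >&2; return 1; }
      echo "action=freeform text=$rest" ;;
    switch)
      [ -n "$rest" ] || { echo "error: switch needs an ID" >&2; return 1; }
      echo "action=switch item=$rest" ;;
    pause|continue|tighten|status|handoff)
      echo "action=$cmd" ;;
    *)
      echo "error: unknown hint: $1" >&2; return 1 ;;
  esac
}
```

For instance, `parse_hint 'kickoff auto fast'` prints `action=kickoff item=auto validation=fast`, while an incomplete hint such as `kickoff fast` is rejected with a non-zero status.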
idf.py Usage Hints
- Baseline test-build invocation in this repo:
source /opt/esp/idf/export.sh >/tmp/idf_export.log 2>&1 && idf.py -C test build
- Why this is appropriate:
  - `test/CMakeLists.txt` defines a standalone ESP-IDF test project, so `-C test` is the expected entry point.
For long/chatty builds, redirect to a temporary log to avoid context overload:
build_log="$(mktemp /tmp/idf_test_build.XXXXXX.log)"
source /opt/esp/idf/export.sh >/tmp/idf_export.log 2>&1 && idf.py -C test build >"$build_log" 2>&1
tail -n 200 "$build_log"
Log retention and cleanup rule:
- Keep temp log files only while actively analyzing a failure.
- Remove when no longer needed:
rm -f "$build_log" /tmp/idf_export.log
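The redirect-then-clean-up pattern above can be folded into one wrapper so the retention rule is applied automatically. This is a minimal sketch, assuming that a passing build needs no log and a failing build should keep its log for analysis; `run_logged` is a hypothetical name, and the template ends in `XXXXXX` because not every `mktemp` implementation accepts a suffix after the placeholder.

```shell
# Hypothetical wrapper: run a command with all of its output captured in a temp log.
run_logged() {
  log=$(mktemp "${TMPDIR:-/tmp}/idf_test_build.XXXXXX") || return 1
  "$@" >"$log" 2>&1
  status=$?
  if [ "$status" -eq 0 ]; then
    rm -f "$log"              # success: nothing to analyze, clean up immediately
    return 0
  fi
  tail -n 200 "$log"          # failure: show only the end of a chatty build
  echo "log kept at: $log" >&2
  return "$status"
}
```

After sourcing the IDF environment, the baseline build would then be invoked as `run_logged idf.py -C test build`; the kept log still has to be removed by hand once the failure is understood.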
Agent Handoff Protocol
Use this protocol so any agent can continue work with minimal context loading.
Claiming
- Pick one `todo` item with highest priority and no unresolved dependency.
- Set `Status` to `in_progress`, set `Owner`, set `Last Update` (`YYYY-MM-DD`).
- In `Notes`, add:
  - `Context:` short current state (1 line)
  - `Next:` single next action
  - `Blockers:` `none` or short blocker text
During Work
- Keep updates compact and factual.
- If scope expands, add new IDs instead of rewriting existing IDs.
- If blocked, set `Status` to `blocked` and state unblock condition in `Notes`.
Handoff
Before stopping work on an item, update:
- `Evidence`: latest PR/commit/CI/log reference.
- `Notes`:
  - `Done:` what was completed
  - `Next:` exact next action for the next agent
  - `Risks:` any known regression risk or uncertainty
- Add a one-line entry in `Activity Log`.
Done Criteria For Any Agent-Closed Item
- Contract tested at stable boundary (`documentation/TESTING_CHARTER.md`).
- Regression case included for a realistic failure mode.
- Runnable command listed and passing evidence attached.
- Handoff `Next` is either `none` or a linked follow-up ID.
Dependency Keys
Use these keys in Notes when a task depends on another:
- `DEP:HW-*` for hardware matrix dependencies
- `DEP:UT-*` for unit chunk dependencies
- `DEP:CI-*` for CI/workflow dependencies
- `DEP:DOC-*` for required documentation updates
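Because the keys share a fixed `DEP:` prefix, the dependencies of an item can be pulled out of its `Notes` text with a single pattern. The sketch below is one way to do that; `extract_deps` is a hypothetical helper, and the pattern is deliberately not limited to the four families listed above, since the Priority Work Queue also uses `DEP:GOAL-001`.

```shell
# Hypothetical helper: list the distinct DEP keys mentioned in text on stdin.
extract_deps() {
  # -o prints only the matched key; the pattern accepts any upper-case family
  # (HW, UT, CI, DOC, ...) followed by an ID such as BOOT-001.
  LC_ALL=C grep -oE 'DEP:[A-Z]+-[A-Z0-9][A-Z0-9-]*' | LC_ALL=C sort -u
}
```

Wildcard forms such as `DEP:HW-*` are not matched, so the key definitions themselves are ignored when the whole file is piped through the helper.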
Activity Log
Append-only, newest first.
| Date | Agent | Item ID | Change | Evidence |
|---|---|---|---|---|
| 2026-02-12 | codex | HW-BOOT-001 | GOAL-002 parked by request; execution deferred until GOAL-001 is implemented and LXD backend is available | documentation/short-term/coordination/workstream_board.md |
| 2026-02-12 | codex | HW-BOOT-001 | GOAL-002 WS1 claimed and inventory probe executed; current workspace has no serial devices, so slot mapping remains blocked pending run on LXD HIL host | test/build/log/hut_slot_inventory_20260212.log |
| 2026-02-12 | codex | HW-BOOT-001 | Retried with updated IDF instructions; `idf.py -C test build` passed after sourcing `/opt/esp/idf/export.sh`; blocker narrowed to pending HIL execution | test/build/log/idf_py_stdout_output_20260212_2.log |
| 2026-02-12 | codex | HW-BOOT-001 | Auto-fast kickoff claimed top P0 hardware item; fast validation blocked by missing local `idf.py` toolchain | test/build/log/idf_py_missing_20260212.txt |
| 2026-02-12 | codex | UT-CHUNK-001 | Unblocked test-build path for current IDF and recorded passing fast validation | test/build/log/idf_py_stdout_output_20413 |
| 2026-02-12 | codex | UT-CHUNK-001 | Added bootstate regression tests; fixed test harness recovery path typo; fast build now blocked on missing mdns dependency | components/tools/test/test_bootstate.cpp, test/CMakeLists.txt, test/build/log/idf_py_stderr_output_2477 |
| 2026-02-12 | codex | UT-CHUNK-001 | Claimed item and initiated guided+freeform fast kickoff | documentation/agents/integration_test_worklist.md |
| 2026-02-12 | codex | DOC-TEST-ROADMAP-001 | Added no-prune roadmap and unit-test chunk structure for multi-agent execution | documentation/agents/integration_test_worklist.md |
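Keeping the log append-only and newest-first means every new row goes directly under the table's separator line. A minimal sketch of that insertion, written as a stdin-to-stdout filter; `add_activity_row` is a hypothetical helper and assumes the header row reads exactly as shown above.

```shell
# Hypothetical filter: copy the worklist from stdin to stdout, inserting one
# new row directly under the Activity Log header so the newest entry is first.
add_activity_row() {
  # $1..$5 = date, agent, item ID, change, evidence
  row=$(printf '| %s | %s | %s | %s | %s |' "$1" "$2" "$3" "$4" "$5")
  # Pass the row through the environment so awk does not reinterpret backslashes.
  ROW="$row" awk '
    { print }
    # The separator line right after the Activity Log header marks the insert point.
    index(prev, "| Date | Agent | Item ID |") == 1 && substr($0, 1, 4) == "|---" {
      print ENVIRON["ROW"]
    }
    { prev = $0 }
  '
}
```

Existing rows pass through untouched, which preserves the history; the output would typically be written to a temporary file and moved over the worklist once reviewed.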
Comprehensive Roadmap (No-Prune)
This roadmap is intentionally exhaustive. No subsystem is excluded at this stage.
Layer Definitions
- `U`: unit tests (contract-level logic and error semantics)
- `I`: integration tests (cross-component behavior)
- `H`: hardware/HIL tests (real device and peripheral behavior)
- `S`: soak/endurance and recovery testing
Full Component Coverage Map
| Component | Required Layers | Must-Hold Contracts (Minimum) | Priority Wave |
|---|---|---|---|
| `audio` | U, I, H | init/play/stop lifecycle stability; no panic on format changes | Wave 1 |
| `codecs` | U, I | decode errors are bounded and recoverable; no invalid memory access on malformed frames | Wave 2 |
| `display` | U, I, H | rendering bounds safety; device init/update robustness | Wave 1 |
| `driver_bt` | U, I, H, S | pair/connect/disconnect stability; recoverable stack restart | Wave 2 |
| `esp_http_server` | U, I | route registration/error handling remains stable under malformed requests | Wave 2 |
| `led_strip` | U, I, H | LED state transitions deterministic; invalid config handled safely | Wave 3 |
| `metrics` | U, I | telemetry payload correctness; metrics publication never blocks critical paths | Wave 2 |
| `platform_config` | U, I | config defaulting and schema validation; malformed payload rejection | Wave 1 |
| `platform_console` | U, I, H | command behavior contracts stable; failure paths return deterministic errors | Wave 2 |
| `raop` | U, I, H | session lifecycle and stream control resilience; error recovery on network churn | Wave 3 |
| `services` | U, I, H | queue/event/state contracts deterministic; no deadlock under pressure | Wave 1 |
| `spotify` | U, I, H | connect/playback lifecycle and error handling remain recoverable | Wave 3 |
| `squeezelite` | U, I, H, S | stream/decode/output stability; underrun/rebuffer recovery | Wave 1 |
| `squeezelite-ota` | U, I, H, S | OTA success/failure/rollback safety; never brick | Wave 1 |
| `targets` | U, I, H | target-specific init and mapping correctness (`i2s`, `muse`, `squeezeamp`) | Wave 2 |
| `telnet` | U, I | command channel lifecycle and invalid input handling | Wave 3 |
| `tjpgd` | U, I | image decode bounds and failure safety | Wave 3 |
| `tools` | U, I | utility and storage helper correctness; safe error handling | Wave 1 |
| `wifi-manager` | U, I, H, S | connection/reconnect/credential flow stability; bounded retries | Wave 1 |
| `_override` | I, H | override behavior compatibility with base driver contracts | Wave 3 |
| `esp-dsp` (vendor) | I, H | integration compatibility and runtime stability only | Wave 3 |
| `spotify/cspot` (vendor) | I, H | integration compatibility and runtime stability only | Wave 3 |
| `telnet/libtelnet` (vendor) | I, H | integration compatibility and runtime stability only | Wave 3 |
Execution Waves
| Wave | Scope | Exit Criteria |
|---|---|---|
| Wave 1 | Release-critical contracts (`services`, `wifi-manager`, `squeezelite-ota`, `squeezelite`, `platform_config`, `display`, `tools`, `audio`) | Required P0 chunks complete; no unresolved P0 regressions |
| Wave 2 | Stability amplification (`metrics`, `platform_console`, `driver_bt`, `targets`, `esp_http_server`, `codecs`) | P1 chunks for these modules complete; nightly pass signal stable |
| Wave 3 | Extended and compatibility coverage (`raop`, `spotify`, `telnet`, `tjpgd`, `led_strip`, `_override`, vendor integrations) | P2 chunks and targeted soak coverage complete |
Required Artifacts Per Completed Chunk
- test file(s) and contract statement
- run command(s) and CI job reference
- pass/fail evidence (logs, run link, or artifact path)
- regression linkage (bug/issue/incident id if applicable)
Priority Work Queue
| ID | Priority | Status | Test | Platforms | Owner | Last Update | Evidence | Notes |
|---|---|---|---|---|---|---|---|---|
| HW-BOOT-001 | P0 | blocked | Cold boot to operational state | all | - | 2026-02-12 | documentation/short-term/coordination/workstream_board.md, test/build/log/hut_slot_inventory_20260212.log, test/build/log/idf_py_stdout_output_20260212_2.log | Context: GOAL-002 is intentionally parked after accidental kickoff. Done: preserved prior inventory/build evidence and cleared active owner. Next: resume when GOAL-001 is complete and LXD hardware backend is available; then rerun slot inventory and continue WS2/WS3. Risks: none beyond explicit dependency delay. Blockers: DEP:GOAL-001 backend availability prerequisite. |
| HW-BOOT-002 | P0 | todo | Warm reboot loop x50 | all | - | - | - | |
| HW-BOOT-003 | P0 | todo | Platform profile/GPIO sanity | all | - | - | - | |
| HW-STOR-001 | P0 | todo | NVS read/write/reset cycle | all | - | - | - | |
| HW-STOR-003 | P0 | todo | SPIFFS mount + required defaults | all | - | - | - | |
| HW-NET-001 | P0 | todo | Wi-Fi connect + DHCP + DNS | all | - | - | - | |
| HW-NET-002 | P0 | todo | Wi-Fi AP loss/recovery reconnect | all | - | - | - | |
| HW-AUD-001 | P0 | todo | Playback start/stop lifecycle | all | - | - | - | |
| HW-OTA-001 | P0 | todo | OTA happy path | all | - | - | - | |
| HW-OTA-002 | P0 | todo | OTA interrupted update recovery | all | - | - | - | |
| HW-OTA-003 | P0 | todo | Recovery partition entry/exit | all | - | - | - | |
| HW-PWRF-001 | P0 | todo | Power-cut/brownout recovery | all | - | - | - | |
| HW-STOR-002 | P1 | todo | Corrupt/partial NVS recovery | all | - | - | - | |
| HW-NET-003 | P1 | todo | mDNS announce/discover | all | - | - | - | |
| HW-NET-004 | P1 | todo | Ethernet link up/down + DHCP traffic | ethernet-capable | - | - | - | |
| HW-AUD-002 | P1 | todo | Format/rate transitions | all | - | - | - | |
| HW-AUD-003 | P1 | todo | Underrun/rebuffer recovery | all | - | - | - | |
| HW-AUD-004 | P1 | todo | Volume/mute/jack/speaker controls | platform-specific | - | - | - | |
| HW-UI-001 | P1 | todo | Button/rotary/IR input mapping | platform-specific | - | - | - | |
| HW-UI-002 | P1 | todo | Display init + update loop | display-capable | - | - | - | |
| HW-PWR-001 | P1 | todo | Battery telemetry/status logic | battery-capable | - | - | - | |
| HW-BT-001 | P1 | todo | Bluetooth pair/connect/disconnect cycles | bt-enabled | - | - | - | |
| HW-BT-002 | P2 | todo | Bluetooth stack restart/recovery | bt-enabled | - | - | - | |
| HW-SOAK-001 | P2 | todo | 12h playback + periodic reconnect | all | - | - | - | |
| HW-SOAK-002 | P2 | todo | 24h mixed load soak | all | - | - | - | |
Needed Unit Test Chunks (Short-Lived Backlog)
Purpose: define the minimum unit-test chunks needed now to de-risk integration work. Remove this section once all rows are done.
| Chunk ID | Priority | Status | Required Tests | Target Component(s) | Suggested Command | Owner | Last Update | Evidence | Notes |
|---|---|---|---|---|---|---|---|---|---|
| UT-CHUNK-001 | P0 | in_progress | Boot/partition decision logic: normal boot, forced recovery, invalid state fallback | `services`, bootstate path in `test_main` | `idf.py -C test build` | codex | 2026-02-12 | test/build/log/idf_py_stdout_output_20413 | Contract: never enters non-recoverable boot loop. Context: added components/tools/test/test_bootstate.cpp and updated test-build compatibility for current IDF/CMake tooling. Done: regression tests for normal counter path, forced recovery threshold boundary (5), invalid-state counter normalization (>100), and recovery reset semantics; fast validation build now passes. Next: execute/collect runtime Unity test evidence on target for chunk closure. Risks: current evidence is build-pass in fast mode; runtime execution evidence still pending. Blockers: none. |
| UT-CHUNK-002 | P0 | todo | OTA decision and error mapping: success path, transport failure, invalid image metadata | `squeezelite-ota`, `platform_console/cmd_ota` | `idf.py -C test -T tools build` | - | - | - | Contract: failed OTA remains recoverable |
| UT-CHUNK-003 | P0 | todo | Messaging queue contracts: publish/subscribe ordering, timeout behavior, overflow handling | `services/messaging` | `idf.py -C test -T tools build` | - | - | - | Contract: no crash or deadlock on queue pressure |
| UT-CHUNK-004 | P1 | todo | Wi-Fi manager state transitions: connect, reconnect backoff, credential update, failure exhaustion | `wifi-manager` | `idf.py -C test -T wifi-manager build` | - | - | - | Contract: bounded retries and deterministic state |
| UT-CHUNK-005 | P1 | todo | Display text/render boundaries: clipping, wrapping, out-of-bounds coordinates, null font/data guards | `display/core` (`gds_text`, `gds_draw`, `gds_font`) | `idf.py -C test -T tools build` | - | - | - | Contract: renderer never writes outside target buffer |
| UT-CHUNK-006 | P1 | todo | Platform config schema handling: defaulting, unknown fields, malformed payload rejection | `platform_config` | `idf.py -C test -T platform_config build` | - | - | - | Extend existing components/platform_config/test/ coverage |
| UT-CHUNK-007 | P2 | todo | Input event normalization: button/rotary/IR debounce and duplicate suppression | `services/buttons`, `services/rotary_encoder`, `services/infrared` | `idf.py -C test -T tools build` | - | - | - | Contract: no event storm from bounce/repeat |
| UT-CHUNK-008 | P2 | todo | Battery/telemetry bounds: invalid sensor values, low-battery transitions, status publication | `services/battery`, `metrics` | `idf.py -C test -T tools build` | - | - | - | Contract: invalid telemetry never triggers invalid state loops |
Chunk Completion Rule
- Each chunk must add at least one regression test for a realistic failure mode.
- Each chunk must reference contract text from `documentation/CONTRACT_TEST_TEMPLATE.md` in PR notes.
- Mark chunk `done` only after test pass evidence is attached.
Agent Startup Checklist
- Read only these sections first: `Activity Log`, `Needed Unit Test Chunks`, `Priority Work Queue`.
- Choose one highest-priority unclaimed item.
- Claim it using `Agent Handoff Protocol`.
- Execute targeted tests first; avoid full-matrix runs unless required by the item.
- Leave a complete handoff entry before ending session.
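The "choose one highest-priority unclaimed item" step can be approximated with a small filter over the table rows. The sketch below is illustrative only: `next_todo_item` is a hypothetical helper, it assumes single-line rows in the `| ID | Priority | Status | ...` layout, and it does not evaluate `DEP:` keys, so dependencies still need a manual check.

```shell
# Hypothetical helper: print the ID of the first highest-priority todo row on stdin.
next_todo_item() {
  awk -F'|' '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    trim($4) == "todo" && trim($3) ~ /^P[0-2]$/ {
      rank = substr(trim($3), 2) + 0
      # Keep the first row seen for the best (lowest) priority number.
      if (best == "" || rank < best_rank) { best = trim($2); best_rank = rank }
    }
    END {
      if (best == "") exit 1   # nothing claimable
      print best
    }
  '
}
```

Given the Priority Work Queue as it stands in this file, where `HW-BOOT-001` is `blocked`, the filter would report `HW-BOOT-002`; it returns a non-zero status when no `todo` row exists.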
Definition Of Done
- Test case exists at a stable contract boundary and follows `documentation/TESTING_CHARTER.md`.
- Platform scope is explicit (`all` or constrained target set).
- Execution path is documented (local command or CI job).
- Pass evidence is linked in `Evidence`.
Update Example
| ID | Priority | Status | Test | Platforms | Owner | Last Update | Evidence | Notes |
|---|---|---|---|---|---|---|---|---|
| HW-NET-001 | P0 | done | Wi-Fi connect + DHCP + DNS | all | @agent-name | 2026-02-12 | PR #123, CI run #456 | Added regression for reconnect timeout handling |