# Integration Test Worklist

Date: 2026-02-12
Scope: Shared progress tracker for integration and hardware-boundary tests in `squeezelite-esp32`.

## Related Active Goal

- `documentation/short-term/active/GOAL-002-hut-surface-first-test.md`
  drives `HW-BOOT-001` across all available HUT slots on the target
  system.

## How To Use

- Update this file in every integration-test PR that changes status.
- Keep entries ordered by priority (`P0`, then `P1`, then `P2`).
- Status values: `todo`, `in_progress`, `blocked`, `done`.
- Add `Owner`, `Last Update`, and `Evidence` (PR, CI run, or log path) when status changes.
- Do not remove completed rows; keep history visible.
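
The status rule above can be checked mechanically before a PR is opened. The following is a minimal sketch, not an existing repo tool: the function name `check_status_values` is hypothetical, and it assumes tracked rows start with an `HW-` or `UT-` ID and keep `Status` as the third column, as in the tables in this file.

```bash
# Hypothetical helper: flag table rows whose Status is not an allowed value.
# Assumes rows look like "| ID | Priority | Status | ..." and IDs start with HW- or UT-.
check_status_values() {
  awk -F'|' '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    {
      id = trim($2); status = trim($4)
      if (id !~ /^(HW|UT)-/) next            # skip headers, separators, prose
      if (status !~ /^(todo|in_progress|blocked|done)$/) {
        printf "invalid status \"%s\" for %s\n", status, id
        bad = 1
      }
    }
    END { exit bad }                          # non-zero exit if any row failed
  ' "$@"
}
```

It reads the files given as arguments (or standard input), so `check_status_values documentation/agents/integration_test_worklist.md` would be one way to call it.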

## Agent Contract

Use this contract at the start of any new conversation so execution is consistent.

```text
MODE: guided+freeform
GOAL: implement test roadmap execution using documentation/agents/integration_test_worklist.md
START_ITEM: <ID or auto>      # e.g. UT-CHUNK-001 or auto
CONTROL: stepwise              # one step at a time
VALIDATION: fast|full          # default validation level
CONSTRAINTS:
- short answers
- precise control points
- update worklist status/evidence/handoff on every step
FIRST_ACTION: propose next step with A/B/C + freeform option
```

Minimal kickoff:

```text
Use documentation/agents/integration_test_worklist.md as orchestrator.
Run guided+freeform, short responses, one step at a time.
Start with UT-CHUNK-001, validation=fast.
Give A/B/C plus freeform each step.
```

### Short-Hand Hints

- `kickoff auto fast` -> start from highest-priority unclaimed item with fast checks
- `kickoff <ID> full` -> start from specific item with full checks
- `pick A|B|C` -> choose one proposed option
- `do: <plain instruction>` -> freeform instruction instead of multiple choice
- `switch <ID>` -> change active item
- `pause` -> stop changes and wait
- `continue` -> proceed with current plan
- `tighten` -> stricter done criteria and evidence bar
- `status` -> one-screen summary of active item, blockers, next action
- `handoff` -> force handoff update now (status, evidence, next)

### `idf.py` Usage Hints

- Baseline test-build invocation in this repo:
  - `source /opt/esp/idf/export.sh >/tmp/idf_export.log 2>&1 && idf.py -C test build`
- Why this is appropriate:
  - `test/CMakeLists.txt` defines a standalone ESP-IDF test project, so `-C test` is the expected entry point.

For long or chatty builds, redirect output to a temporary log so the full build output does not flood the agent's context:

```bash
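# Send build output to a temp log and show only the last 200 lines.
build_log="$(mktemp /tmp/idf_test_build.XXXXXX.log)"
source /opt/esp/idf/export.sh >/tmp/idf_export.log 2>&1 && idf.py -C test build >"$build_log" 2>&1
build_status=$?   # non-zero if export or build failed
tail -n 200 "$build_log"
echo "build exit status: $build_status"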
```

Log retention and cleanup rule:

- Keep temp log files only while actively analyzing a failure.
- Remove when no longer needed:
  - `rm -f "$build_log" /tmp/idf_export.log`
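
The retention rule can also be folded into a small wrapper so that logs are removed by default and kept only on request. This is a sketch under assumptions: `run_logged` and the `KEEP_LOG` variable are hypothetical names, not existing repo conventions.

```bash
# Hypothetical wrapper (not part of the repo): capture a command's output in a
# temp log, show the last 200 lines, then delete the log unless KEEP_LOG=1.
run_logged() {
  local log status=0
  log="$(mktemp /tmp/idf_test_build.XXXXXX)"
  "$@" >"$log" 2>&1 || status=$?   # remember the command's exit status
  tail -n 200 "$log"
  if [ "${KEEP_LOG:-0}" = "1" ]; then
    echo "log kept at: $log" >&2   # keep only while analyzing a failure
  else
    rm -f "$log"
  fi
  return "$status"
}
```

After sourcing `/opt/esp/idf/export.sh`, `run_logged idf.py -C test build` would run the baseline build, and `KEEP_LOG=1 run_logged idf.py -C test build` would keep the log for failure analysis.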

## Agent Handoff Protocol

Use this protocol so any agent can continue work with minimal context loading.

### Claiming

1. Pick the highest-priority `todo` item that has no unresolved dependency.
2. Set `Status` to `in_progress`, set `Owner`, set `Last Update` (YYYY-MM-DD).
3. In `Notes`, add:
   - `Context:` short current state (1 line)
   - `Next:` single next action
   - `Blockers:` `none` or short blocker text
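
Step 1 of claiming can be approximated with a short script. This is a sketch, assuming the column order `ID | Priority | Status` used in this file and treating any `DEP:` key in a row as an unresolved dependency, which is a conservative simplification. The name `next_todo` is hypothetical.

```bash
# Hypothetical helper: print the ID of the highest-priority `todo` row.
# Assumes columns "| ID | Priority | Status | ..." and that an unresolved
# dependency is recorded as a DEP: key somewhere in the row.
next_todo() {
  awk -F'|' '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    {
      id = trim($2); prio = trim($3); status = trim($4)
      if (id !~ /^(HW|UT)-/ || status != "todo") next
      if ($0 ~ /DEP:/) next                  # treat any DEP: key as unresolved
      # P0 sorts before P1 before P2 as strings; the first row seen wins ties.
      if (best == "" || prio < bestprio) { best = id; bestprio = prio }
    }
    END { if (best != "") print best; else exit 1 }
  ' "$@"
}
```

The function exits non-zero when no claimable row exists, so a caller can tell "nothing to do" apart from a result.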

### During Work

1. Keep updates compact and factual.
2. If scope expands, add new IDs instead of rewriting existing IDs.
3. If blocked, set `Status` to `blocked` and state unblock condition in `Notes`.

### Handoff

Before stopping work on an item, update:

1. `Evidence`: latest PR/commit/CI/log reference.
2. `Notes`:
   - `Done:` what was completed
   - `Next:` exact next action for the next agent
   - `Risks:` any known regression risk or uncertainty
3. Add a one-line entry in `Activity Log`.
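
To keep `Activity Log` rows uniform, the one-line entry from step 3 can be generated rather than typed by hand. This is a sketch only: `log_row` and the `LOG_DATE` override are hypothetical names, and the row layout simply mirrors the existing `Activity Log` table.

```bash
# Hypothetical helper: emit one Activity Log row for pasting under the table header.
# Arguments: agent, item ID, change text, evidence path.
# The date defaults to today (UTC); set LOG_DATE to override it.
log_row() {
  local date_str="${LOG_DATE:-$(date -u +%Y-%m-%d)}"
  printf '| %s | %s | %s | %s | `%s` |\n' "$date_str" "$1" "$2" "$3" "$4"
}
```

For example, `log_row codex HW-BOOT-002 'Claimed item' 'test/build/log/example.log'` prints a row dated today; the log path in this call is a placeholder, not a real artifact.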

### Done Criteria For Any Agent-Closed Item

- Contract is tested at a stable boundary, per `documentation/TESTING_CHARTER.md`.
- Regression case included for a realistic failure mode.
- Runnable command listed and passing evidence attached.
- Handoff `Next` is either `none` or a linked follow-up ID.

## Dependency Keys

Use these keys in `Notes` when a task depends on another:

- `DEP:HW-*` for hardware matrix dependencies
- `DEP:UT-*` for unit chunk dependencies
- `DEP:CI-*` for CI/workflow dependencies
- `DEP:DOC-*` for required documentation updates
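
Because the keys share the `DEP:` prefix, dependencies can be listed with a simple scan. This is a sketch, assuming the row ID is the first table column and that keys contain no spaces; the name `list_deps` is hypothetical.

```bash
# Hypothetical helper: list each row ID together with the DEP: keys found in that row.
# Assumes the ID is the first table column and keys have no embedded spaces.
list_deps() {
  awk -F'|' '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    {
      id = trim($2)
      if (id !~ /^(HW|UT)-/) next            # only scan tracked rows
      line = $0
      while (match(line, /DEP:[A-Za-z0-9_*-]+/)) {
        print id, substr(line, RSTART, RLENGTH)
        line = substr(line, RSTART + RLENGTH) # continue after this key
      }
    }
  ' "$@"
}
```

Each output line has the form `<row ID> <DEP key>`, which makes it easy to sort or filter by dependency family.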

## Activity Log

Add-only log: insert new rows at the top (newest first) and keep existing rows.

| Date | Agent | Item ID | Change | Evidence |
|---|---|---|---|---|
| 2026-02-12 | codex | HW-BOOT-001 | GOAL-002 parked by request; execution deferred until GOAL-001 is implemented and LXD backend is available | `documentation/short-term/coordination/workstream_board.md` |
| 2026-02-12 | codex | HW-BOOT-001 | GOAL-002 WS1 claimed and inventory probe executed; current workspace has no serial devices, so slot mapping remains blocked pending run on LXD HIL host | `test/build/log/hut_slot_inventory_20260212.log` |
| 2026-02-12 | codex | HW-BOOT-001 | Retried with updated IDF instructions; `idf.py -C test build` passed after sourcing `/opt/esp/idf/export.sh`; blocker narrowed to pending HIL execution | `test/build/log/idf_py_stdout_output_20260212_2.log` |
| 2026-02-12 | codex | HW-BOOT-001 | Auto-fast kickoff claimed top P0 hardware item; fast validation blocked by missing local `idf.py` toolchain | `test/build/log/idf_py_missing_20260212.txt` |
| 2026-02-12 | codex | UT-CHUNK-001 | Unblocked test-build path for current IDF and recorded passing fast validation | `test/build/log/idf_py_stdout_output_20413` |
| 2026-02-12 | codex | UT-CHUNK-001 | Added bootstate regression tests; fixed test harness recovery path typo; fast build now blocked on missing `mdns` dependency | `components/tools/test/test_bootstate.cpp`, `test/CMakeLists.txt`, `test/build/log/idf_py_stderr_output_2477` |
| 2026-02-12 | codex | UT-CHUNK-001 | Claimed item and initiated guided+freeform fast kickoff | `documentation/agents/integration_test_worklist.md` |
| 2026-02-12 | codex | DOC-TEST-ROADMAP-001 | Added no-prune roadmap and unit-test chunk structure for multi-agent execution | `documentation/agents/integration_test_worklist.md` |

## Comprehensive Roadmap (No-Prune)

This roadmap is intentionally exhaustive. No subsystem is excluded at this stage.

### Layer Definitions

- `U`: unit tests (contract-level logic and error semantics)
- `I`: integration tests (cross-component behavior)
- `H`: hardware/HIL tests (real device and peripheral behavior)
- `S`: soak/endurance and recovery testing

### Full Component Coverage Map

| Component | Required Layers | Must-Hold Contracts (Minimum) | Priority Wave |
|---|---|---|---|
| `audio` | U, I, H | init/play/stop lifecycle stability; no panic on format changes | Wave 1 |
| `codecs` | U, I | decode errors are bounded and recoverable; no invalid memory access on malformed frames | Wave 2 |
| `display` | U, I, H | rendering bounds safety; device init/update robustness | Wave 1 |
| `driver_bt` | U, I, H, S | pair/connect/disconnect stability; recoverable stack restart | Wave 2 |
| `esp_http_server` | U, I | route registration/error handling remains stable under malformed requests | Wave 2 |
| `led_strip` | U, I, H | LED state transitions deterministic; invalid config handled safely | Wave 3 |
| `metrics` | U, I | telemetry payload correctness; metrics publication never blocks critical paths | Wave 2 |
| `platform_config` | U, I | config defaulting and schema validation; malformed payload rejection | Wave 1 |
| `platform_console` | U, I, H | command behavior contracts stable; failure paths return deterministic errors | Wave 2 |
| `raop` | U, I, H | session lifecycle and stream control resilience; error recovery on network churn | Wave 3 |
| `services` | U, I, H | queue/event/state contracts deterministic; no deadlock under pressure | Wave 1 |
| `spotify` | U, I, H | connect/playback lifecycle and error handling remain recoverable | Wave 3 |
| `squeezelite` | U, I, H, S | stream/decode/output stability; underrun/rebuffer recovery | Wave 1 |
| `squeezelite-ota` | U, I, H, S | OTA success/failure/rollback safety; never brick | Wave 1 |
| `targets` | U, I, H | target-specific init and mapping correctness (`i2s`, `muse`, `squeezeamp`) | Wave 2 |
| `telnet` | U, I | command channel lifecycle and invalid input handling | Wave 3 |
| `tjpgd` | U, I | image decode bounds and failure safety | Wave 3 |
| `tools` | U, I | utility and storage helper correctness; safe error handling | Wave 1 |
| `wifi-manager` | U, I, H, S | connection/reconnect/credential flow stability; bounded retries | Wave 1 |
| `_override` | I, H | override behavior compatibility with base driver contracts | Wave 3 |
| `esp-dsp` (vendor) | I, H | integration compatibility and runtime stability only | Wave 3 |
| `spotify/cspot` (vendor) | I, H | integration compatibility and runtime stability only | Wave 3 |
| `telnet/libtelnet` (vendor) | I, H | integration compatibility and runtime stability only | Wave 3 |

### Execution Waves

| Wave | Scope | Exit Criteria |
|---|---|---|
| Wave 1 | Release-critical contracts (`services`, `wifi-manager`, `squeezelite-ota`, `squeezelite`, `platform_config`, `display`, `tools`, `audio`) | Required `P0` chunks complete; no unresolved `P0` regressions |
| Wave 2 | Stability amplification (`metrics`, `platform_console`, `driver_bt`, `targets`, `esp_http_server`, `codecs`) | `P1` chunks for these modules complete; nightly pass signal stable |
| Wave 3 | Extended and compatibility coverage (`raop`, `spotify`, `telnet`, `tjpgd`, `led_strip`, `_override`, vendor integrations) | `P2` chunks and targeted soak coverage complete |

### Required Artifacts Per Completed Chunk

- test file(s) and contract statement
- run command(s) and CI job reference
- pass/fail evidence (logs, run link, or artifact path)
- regression linkage (bug/issue/incident id if applicable)

## Priority Work Queue

| ID | Priority | Status | Test | Platforms | Owner | Last Update | Evidence | Notes |
|---|---|---|---|---|---|---|---|---|
| HW-BOOT-001 | P0 | blocked | Cold boot to operational state | all | - | 2026-02-12 | `documentation/short-term/coordination/workstream_board.md`, `test/build/log/hut_slot_inventory_20260212.log`, `test/build/log/idf_py_stdout_output_20260212_2.log` | Context: GOAL-002 is intentionally parked after accidental kickoff. Done: preserved prior inventory/build evidence and cleared active owner. Next: resume when GOAL-001 is complete and LXD hardware backend is available; then rerun slot inventory and continue WS2/WS3. Risks: none beyond explicit dependency delay. Blockers: DEP:GOAL-001 backend availability prerequisite. |
| HW-BOOT-002 | P0 | todo | Warm reboot loop x50 | all | - | - | - | |
| HW-BOOT-003 | P0 | todo | Platform profile/GPIO sanity | all | - | - | - | |
| HW-STOR-001 | P0 | todo | NVS read/write/reset cycle | all | - | - | - | |
| HW-STOR-003 | P0 | todo | SPIFFS mount + required defaults | all | - | - | - | |
| HW-NET-001 | P0 | todo | Wi-Fi connect + DHCP + DNS | all | - | - | - | |
| HW-NET-002 | P0 | todo | Wi-Fi AP loss/recovery reconnect | all | - | - | - | |
| HW-AUD-001 | P0 | todo | Playback start/stop lifecycle | all | - | - | - | |
| HW-OTA-001 | P0 | todo | OTA happy path | all | - | - | - | |
| HW-OTA-002 | P0 | todo | OTA interrupted update recovery | all | - | - | - | |
| HW-OTA-003 | P0 | todo | Recovery partition entry/exit | all | - | - | - | |
| HW-PWRF-001 | P0 | todo | Power-cut/brownout recovery | all | - | - | - | |
| HW-STOR-002 | P1 | todo | Corrupt/partial NVS recovery | all | - | - | - | |
| HW-NET-003 | P1 | todo | mDNS announce/discover | all | - | - | - | |
| HW-NET-004 | P1 | todo | Ethernet link up/down + DHCP traffic | ethernet-capable | - | - | - | |
| HW-AUD-002 | P1 | todo | Format/rate transitions | all | - | - | - | |
| HW-AUD-003 | P1 | todo | Underrun/rebuffer recovery | all | - | - | - | |
| HW-AUD-004 | P1 | todo | Volume/mute/jack/speaker controls | platform-specific | - | - | - | |
| HW-UI-001 | P1 | todo | Button/rotary/IR input mapping | platform-specific | - | - | - | |
| HW-UI-002 | P1 | todo | Display init + update loop | display-capable | - | - | - | |
| HW-PWR-001 | P1 | todo | Battery telemetry/status logic | battery-capable | - | - | - | |
| HW-BT-001 | P1 | todo | Bluetooth pair/connect/disconnect cycles | bt-enabled | - | - | - | |
| HW-BT-002 | P2 | todo | Bluetooth stack restart/recovery | bt-enabled | - | - | - | |
| HW-SOAK-001 | P2 | todo | 12h playback + periodic reconnect | all | - | - | - | |
| HW-SOAK-002 | P2 | todo | 24h mixed load soak | all | - | - | - | |

## Needed Unit Test Chunks (Short-Lived Backlog)

Purpose: define the minimum unit-test chunks needed now to de-risk integration work. Remove this section once all rows are `done`.

| Chunk ID | Priority | Status | Required Tests | Target Component(s) | Suggested Command | Owner | Last Update | Evidence | Notes |
|---|---|---|---|---|---|---|---|---|---|
| UT-CHUNK-001 | P0 | in_progress | Boot/partition decision logic: normal boot, forced recovery, invalid state fallback | `services`, `bootstate` path in `test_main` | `idf.py -C test build` | codex | 2026-02-12 | `test/build/log/idf_py_stdout_output_20413` | Contract: never enters non-recoverable boot loop. Context: added `components/tools/test/test_bootstate.cpp` and updated test-build compatibility for current IDF/CMake tooling. Done: regression tests for normal counter path, forced recovery threshold boundary (`5`), invalid-state counter normalization (`>100`), and recovery reset semantics; fast validation build now passes. Next: execute/collect runtime Unity test evidence on target for chunk closure. Risks: current evidence is build-pass in fast mode; runtime execution evidence still pending. Blockers: none. |
| UT-CHUNK-002 | P0 | todo | OTA decision and error mapping: success path, transport failure, invalid image metadata | `squeezelite-ota`, `platform_console/cmd_ota` | `idf.py -C test -T tools build` | - | - | - | Contract: failed OTA remains recoverable |
| UT-CHUNK-003 | P0 | todo | Messaging queue contracts: publish/subscribe ordering, timeout behavior, overflow handling | `services/messaging` | `idf.py -C test -T tools build` | - | - | - | Contract: no crash or deadlock on queue pressure |
| UT-CHUNK-004 | P1 | todo | Wi-Fi manager state transitions: connect, reconnect backoff, credential update, failure exhaustion | `wifi-manager` | `idf.py -C test -T wifi-manager build` | - | - | - | Contract: bounded retries and deterministic state |
| UT-CHUNK-005 | P1 | todo | Display text/render boundaries: clipping, wrapping, out-of-bounds coordinates, null font/data guards | `display/core` (`gds_text`, `gds_draw`, `gds_font`) | `idf.py -C test -T tools build` | - | - | - | Contract: renderer never writes outside target buffer |
| UT-CHUNK-006 | P1 | todo | Platform config schema handling: defaulting, unknown fields, malformed payload rejection | `platform_config` | `idf.py -C test -T platform_config build` | - | - | - | Extend existing `components/platform_config/test/` coverage |
| UT-CHUNK-007 | P2 | todo | Input event normalization: button/rotary/IR debounce and duplicate suppression | `services/buttons`, `services/rotary_encoder`, `services/infrared` | `idf.py -C test -T tools build` | - | - | - | Contract: no event storm from bounce/repeat |
| UT-CHUNK-008 | P2 | todo | Battery/telemetry bounds: invalid sensor values, low-battery transitions, status publication | `services/battery`, `metrics` | `idf.py -C test -T tools build` | - | - | - | Contract: invalid telemetry never triggers invalid state loops |

### Chunk Completion Rule

- Each chunk must add at least one regression test for a realistic failure mode.
- Each chunk must reference contract text from `documentation/CONTRACT_TEST_TEMPLATE.md` in PR notes.
- Mark chunk `done` only after test pass evidence is attached.
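
The evidence rule can be enforced with a quick scan for `done` rows whose `Evidence` cell is empty or still `-`. This is a sketch under assumptions: `check_done_evidence` is a hypothetical name, and the script locates the `Evidence` column from each table's header row because the column position differs between the tables in this file.

```bash
# Hypothetical check: every `done` row must have a non-placeholder Evidence cell.
# The Evidence column is re-detected at each table header, since tables differ.
check_done_evidence() {
  awk -F'|' '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    {
      # A header row is any row with a cell that reads exactly "Evidence".
      for (i = 2; i < NF; i++) if (trim($i) == "Evidence") { ev = i; next }
      id = trim($2)
      if (id !~ /^(HW|UT)-/ || trim($4) != "done") next
      if (ev == 0 || trim($ev) == "" || trim($ev) == "-") {
        print "done without evidence: " id
        bad = 1
      }
    }
    END { exit bad }                          # non-zero exit if any row failed
  ' "$@"
}
```

The check only confirms that something is recorded; whether the linked evidence actually shows a passing run still needs review.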

## Agent Startup Checklist

1. Read only these sections first: `Activity Log`, `Needed Unit Test Chunks`, `Priority Work Queue`.
2. Choose the highest-priority unclaimed item.
3. Claim it using `Agent Handoff Protocol`.
4. Execute targeted tests first; avoid full-matrix runs unless required by the item.
5. Leave a complete handoff entry before ending session.

## Definition Of Done

- Test case exists at a stable contract boundary and follows `documentation/TESTING_CHARTER.md`.
- Platform scope is explicit (`all` or constrained target set).
- Execution path is documented (local command or CI job).
- Pass evidence is linked in `Evidence`.

## Update Example

| ID | Priority | Status | Test | Platforms | Owner | Last Update | Evidence | Notes |
|---|---|---|---|---|---|---|---|---|
| HW-NET-001 | P0 | done | Wi-Fi connect + DHCP + DNS | all | @agent-name | 2026-02-12 | PR #123, CI run #456 | Added regression for reconnect timeout handling |