gronod/squeezelite-esp32-gronod

Sebastien L 186a019b3b docs: lock remote delegation contract and WS8 tracking

2026-02-13 22:05:50 +00:00

Integration Test Worklist

Date: 2026-02-12
Scope: Shared progress tracker for integration and hardware-boundary tests in squeezelite-esp32.

documentation/short-term/active/GOAL-002-hut-surface-first-test.md drives HW-BOOT-001 across all available HUT slots on the target system.

How To Use

Update this file in every integration-test PR that changes status.
Keep entries ordered by priority (P0, then P1, then P2).
Status values: todo, in_progress, blocked, done.
Add Owner, Last Update, and Evidence (PR, CI run, or log path) when status changes.
Do not remove completed rows; keep history visible.
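
The status and date conventions above can be checked mechanically before a PR is opened. The following is a minimal sketch, not part of the repository: the function names `is_valid_status` and `is_valid_date` are hypothetical, and the date check assumes the `YYYY-MM-DD` shape that the handoff protocol uses for Last Update.

```shell
# Hypothetical helper: succeeds only for the four status values listed above.
is_valid_status() {
  case "$1" in
    todo|in_progress|blocked|done) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: succeeds when the argument has the YYYY-MM-DD shape.
# It checks the shape only, not whether the date exists on the calendar.
is_valid_date() {
  case "$1" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `is_valid_status in_progress && is_valid_date 2026-02-12` succeeds, while `is_valid_status "in progress"` fails because of the space.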

Agent Contract

Use this contract at the start of any new conversation so execution is consistent.

MODE: guided+freeform
GOAL: implement test roadmap execution using documentation/agents/integration_test_worklist.md
START_ITEM: <ID or auto>      # e.g. UT-CHUNK-001 or auto
CONTROL: stepwise              # one step at a time
VALIDATION: fast|full          # default validation level
CONSTRAINTS:
- short answers
- precise control points
- update worklist status/evidence/handoff on every step
FIRST_ACTION: propose next step with A/B/C + freeform option

Minimal kickoff:

Use documentation/agents/integration_test_worklist.md as orchestrator.
Run guided+freeform, short responses, one step at a time.
Start with UT-CHUNK-001, validation=fast.
Give A/B/C plus freeform each step.

Short-Hand Hints

kickoff auto fast -> start from highest-priority unclaimed item with fast checks
kickoff <ID> full -> start from specific item with full checks
pick A|B|C -> choose one proposed option
do: <plain instruction> -> freeform instruction instead of multiple choice
switch <ID> -> change active item
pause -> stop changes and wait
continue -> proceed with current plan
tighten -> stricter done criteria and evidence bar
status -> one-screen summary of active item, blockers, next action
handoff -> force handoff update now (status, evidence, next)

`idf.py` Usage Hints

Baseline test-build invocation in this repo:
- `source /opt/esp/idf/export.sh >/tmp/idf_export.log 2>&1 && idf.py -C test build`
Why this is appropriate:
- `test/CMakeLists.txt` defines a standalone ESP-IDF test project, so `-C test` is the expected entry point.

For long/chatty builds, redirect to a temporary log to avoid context overload:

build_log="$(mktemp /tmp/idf_test_build.XXXXXX.log)"
source /opt/esp/idf/export.sh >/tmp/idf_export.log 2>&1 && idf.py -C test build >"$build_log" 2>&1
tail -n 200 "$build_log"

Log retention and cleanup rule:

Keep temp log files only while actively analyzing a failure.
Remove when no longer needed:
- `rm -f "$build_log" /tmp/idf_export.log`
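
The redirect, tail, and cleanup steps above could be folded into one wrapper so that logs survive only when a run fails. This is a sketch under assumptions: `run_logged` and the `/tmp/idf_run.XXXXXX` template are hypothetical names, and the wrapper expects the ESP-IDF environment to have been sourced in the same shell beforehand.

```shell
# Hypothetical wrapper: run a command with all of its output sent to a temp log.
# On success the log is deleted. On failure the last 200 lines are printed and
# the log is kept for analysis, following the retention rule above.
run_logged() {
  rl_log="$(mktemp /tmp/idf_run.XXXXXX)" || return 1
  rl_status=0
  "$@" >"$rl_log" 2>&1 || rl_status=$?
  if [ "$rl_status" -eq 0 ]; then
    rm -f "$rl_log"
  else
    tail -n 200 "$rl_log"
    echo "log kept at: $rl_log" >&2
  fi
  return "$rl_status"
}
```

A possible invocation is `run_logged idf.py -C test build`. The wrapper returns the wrapped command's exit status, so it can be chained with `&&` like the plain command.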

Agent Handoff Protocol

Use this protocol so any agent can continue work with minimal context loading.

Claiming

Pick one todo item with highest priority and no unresolved dependency.
Set Status to in_progress, set Owner, set Last Update (YYYY-MM-DD).
In Notes, add:

Context: short current state (1 line)
Next: single next action
Blockers: none or short blocker text

During Work

Keep updates compact and factual.
If scope expands, add new IDs instead of rewriting existing IDs.
If blocked, set Status to blocked and state unblock condition in Notes.

Handoff

Before stopping work on an item, update:

Evidence: latest PR/commit/CI/log reference.
Notes:

Done: what was completed
Next: exact next action for the next agent
Risks: any known regression risk or uncertainty

Add a one-line entry in Activity Log.

Done Criteria For Any Agent-Closed Item

Contract tested at stable boundary (documentation/TESTING_CHARTER.md).
Regression case included for a realistic failure mode.
Runnable command listed and passing evidence attached.
Handoff Next is either none or a linked follow-up ID.

Dependency Keys

Use these keys in Notes when a task depends on another:

DEP:HW-* for hardware matrix dependencies
DEP:UT-* for unit chunk dependencies
DEP:CI-* for CI/workflow dependencies
DEP:DOC-* for required documentation updates
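
To list the dependency keys a row declares, the Notes text can be scanned with a pattern. The sketch below is illustrative only: `extract_dep_keys` is a hypothetical name, the pattern covers just the four prefixes listed above, and it relies on `grep -o`, which GNU and BSD grep provide but POSIX does not require.

```shell
# Hypothetical helper: print each dependency key found in a Notes string,
# one per line. Only the HW, UT, CI, and DOC prefixes are recognized.
extract_dep_keys() {
  # grep exits non-zero when nothing matches; "|| true" treats that as "no keys".
  printf '%s\n' "$1" | grep -oE 'DEP:(HW|UT|CI|DOC)-[A-Za-z0-9_-]+' || true
}
```

Note that the Priority Work Queue also uses a `DEP:GOAL-001` key, which this pattern would not match; the prefix list would need extending to cover it.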

Activity Log

Append-only, newest first.

| Date | Agent | Item ID | Change | Evidence |
| --- | --- | --- | --- | --- |
| 2026-02-12 | codex | HW-BOOT-001 | GOAL-002 parked by request; execution deferred until GOAL-001 is implemented and LXD backend is available | `documentation/short-term/coordination/workstream_board.md` |
| 2026-02-12 | codex | HW-BOOT-001 | GOAL-002 WS1 claimed and inventory probe executed; current workspace has no serial devices, so slot mapping remains blocked pending run on LXD HIL host | `test/build/log/hut_slot_inventory_20260212.log` |
| 2026-02-12 | codex | HW-BOOT-001 | Retried with updated IDF instructions; `idf.py -C test build` passed after sourcing `/opt/esp/idf/export.sh`; blocker narrowed to pending HIL execution | `test/build/log/idf_py_stdout_output_20260212_2.log` |
| 2026-02-12 | codex | HW-BOOT-001 | Auto-fast kickoff claimed top P0 hardware item; fast validation blocked by missing local `idf.py` toolchain | `test/build/log/idf_py_missing_20260212.txt` |
| 2026-02-12 | codex | UT-CHUNK-001 | Unblocked test-build path for current IDF and recorded passing fast validation | `test/build/log/idf_py_stdout_output_20413` |
| 2026-02-12 | codex | UT-CHUNK-001 | Added bootstate regression tests; fixed test harness recovery path typo; fast build now blocked on missing `mdns` dependency | `components/tools/test/test_bootstate.cpp`, `test/CMakeLists.txt`, `test/build/log/idf_py_stderr_output_2477` |
| 2026-02-12 | codex | UT-CHUNK-001 | Claimed item and initiated guided+freeform fast kickoff | `documentation/agents/integration_test_worklist.md` |
| 2026-02-12 | codex | DOC-TEST-ROADMAP-001 | Added no-prune roadmap and unit-test chunk structure for multi-agent execution | `documentation/agents/integration_test_worklist.md` |
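
The newest-first rule can be spot-checked from the Date column alone. A minimal sketch, assuming the dates have already been extracted one per line: `is_newest_first` is a hypothetical name, and the check works because `YYYY-MM-DD` dates sort lexicographically in calendar order.

```shell
# Hypothetical helper: read one YYYY-MM-DD date per line on stdin and succeed
# when the lines are in non-increasing (newest-first) order.
# "sort -c" only checks the order; it does not rewrite anything.
is_newest_first() {
  LC_ALL=C sort -r -c >/dev/null 2>&1
}
```

Repeated dates are accepted, which matches a log where several entries share one day.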

Comprehensive Roadmap (No-Prune)

This roadmap is intentionally exhaustive. No subsystem is excluded at this stage.

Layer Definitions

U: unit tests (contract-level logic and error semantics)
I: integration tests (cross-component behavior)
H: hardware/HIL tests (real device and peripheral behavior)
S: soak/endurance and recovery testing

Full Component Coverage Map

| Component | Required Layers | Must-Hold Contracts (Minimum) | Priority Wave |
| --- | --- | --- | --- |
| `audio` | U, I, H | init/play/stop lifecycle stability; no panic on format changes | Wave 1 |
| `codecs` | U, I | decode errors are bounded and recoverable; no invalid memory access on malformed frames | Wave 2 |
| `display` | U, I, H | rendering bounds safety; device init/update robustness | Wave 1 |
| `driver_bt` | U, I, H, S | pair/connect/disconnect stability; recoverable stack restart | Wave 2 |
| `esp_http_server` | U, I | route registration/error handling remains stable under malformed requests | Wave 2 |
| `led_strip` | U, I, H | LED state transitions deterministic; invalid config handled safely | Wave 3 |
| `metrics` | U, I | telemetry payload correctness; metrics publication never blocks critical paths | Wave 2 |
| `platform_config` | U, I | config defaulting and schema validation; malformed payload rejection | Wave 1 |
| `platform_console` | U, I, H | command behavior contracts stable; failure paths return deterministic errors | Wave 2 |
| `raop` | U, I, H | session lifecycle and stream control resilience; error recovery on network churn | Wave 3 |
| `services` | U, I, H | queue/event/state contracts deterministic; no deadlock under pressure | Wave 1 |
| `spotify` | U, I, H | connect/playback lifecycle and error handling remain recoverable | Wave 3 |
| `squeezelite` | U, I, H, S | stream/decode/output stability; underrun/rebuffer recovery | Wave 1 |
| `squeezelite-ota` | U, I, H, S | OTA success/failure/rollback safety; never brick | Wave 1 |
| `targets` | U, I, H | target-specific init and mapping correctness (`i2s`, `muse`, `squeezeamp`) | Wave 2 |
| `telnet` | U, I | command channel lifecycle and invalid input handling | Wave 3 |
| `tjpgd` | U, I | image decode bounds and failure safety | Wave 3 |
| `tools` | U, I | utility and storage helper correctness; safe error handling | Wave 1 |
| `wifi-manager` | U, I, H, S | connection/reconnect/credential flow stability; bounded retries | Wave 1 |
| `_override` | I, H | override behavior compatibility with base driver contracts | Wave 3 |
| `esp-dsp` (vendor) | I, H | integration compatibility and runtime stability only | Wave 3 |
| `spotify/cspot` (vendor) | I, H | integration compatibility and runtime stability only | Wave 3 |
| `telnet/libtelnet` (vendor) | I, H | integration compatibility and runtime stability only | Wave 3 |

Execution Waves

| Wave | Scope | Exit Criteria |
| --- | --- | --- |
| Wave 1 | Release-critical contracts (`services`, `wifi-manager`, `squeezelite-ota`, `squeezelite`, `platform_config`, `display`, `tools`, `audio`) | Required `P0` chunks complete; no unresolved `P0` regressions |
| Wave 2 | Stability amplification (`metrics`, `platform_console`, `driver_bt`, `targets`, `esp_http_server`, `codecs`) | `P1` chunks for these modules complete; nightly pass signal stable |
| Wave 3 | Extended and compatibility coverage (`raop`, `spotify`, `telnet`, `tjpgd`, `led_strip`, `_override`, vendor integrations) | `P2` chunks and targeted soak coverage complete |

Required Artifacts Per Completed Chunk

test file(s) and contract statement
run command(s) and CI job reference
pass/fail evidence (logs, run link, or artifact path)
regression linkage (bug/issue/incident id if applicable)

Priority Work Queue

| ID | Priority | Status | Test | Platforms | Owner | Last Update | Evidence | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| HW-BOOT-001 | P0 | blocked | Cold boot to operational state | all | - | 2026-02-12 | `documentation/short-term/coordination/workstream_board.md`, `test/build/log/hut_slot_inventory_20260212.log`, `test/build/log/idf_py_stdout_output_20260212_2.log` | Context: GOAL-002 is intentionally parked after accidental kickoff. Done: preserved prior inventory/build evidence and cleared active owner. Next: resume when GOAL-001 is complete and LXD hardware backend is available; then rerun slot inventory and continue WS2/WS3. Risks: none beyond explicit dependency delay. Blockers: DEP:GOAL-001 backend availability prerequisite. |
| HW-BOOT-002 | P0 | todo | Warm reboot loop x50 | all | - | - | - | |
| HW-BOOT-003 | P0 | todo | Platform profile/GPIO sanity | all | - | - | - | |
| HW-STOR-001 | P0 | todo | NVS read/write/reset cycle | all | - | - | - | |
| HW-STOR-003 | P0 | todo | SPIFFS mount + required defaults | all | - | - | - | |
| HW-NET-001 | P0 | todo | Wi-Fi connect + DHCP + DNS | all | - | - | - | |
| HW-NET-002 | P0 | todo | Wi-Fi AP loss/recovery reconnect | all | - | - | - | |
| HW-AUD-001 | P0 | todo | Playback start/stop lifecycle | all | - | - | - | |
| HW-OTA-001 | P0 | todo | OTA happy path | all | - | - | - | |
| HW-OTA-002 | P0 | todo | OTA interrupted update recovery | all | - | - | - | |
| HW-OTA-003 | P0 | todo | Recovery partition entry/exit | all | - | - | - | |
| HW-PWRF-001 | P0 | todo | Power-cut/brownout recovery | all | - | - | - | |
| HW-STOR-002 | P1 | todo | Corrupt/partial NVS recovery | all | - | - | - | |
| HW-NET-003 | P1 | todo | mDNS announce/discover | all | - | - | - | |
| HW-NET-004 | P1 | todo | Ethernet link up/down + DHCP traffic | ethernet-capable | - | - | - | |
| HW-AUD-002 | P1 | todo | Format/rate transitions | all | - | - | - | |
| HW-AUD-003 | P1 | todo | Underrun/rebuffer recovery | all | - | - | - | |
| HW-AUD-004 | P1 | todo | Volume/mute/jack/speaker controls | platform-specific | - | - | - | |
| HW-UI-001 | P1 | todo | Button/rotary/IR input mapping | platform-specific | - | - | - | |
| HW-UI-002 | P1 | todo | Display init + update loop | display-capable | - | - | - | |
| HW-PWR-001 | P1 | todo | Battery telemetry/status logic | battery-capable | - | - | - | |
| HW-BT-001 | P1 | todo | Bluetooth pair/connect/disconnect cycles | bt-enabled | - | - | - | |
| HW-BT-002 | P2 | todo | Bluetooth stack restart/recovery | bt-enabled | - | - | - | |
| HW-SOAK-001 | P2 | todo | 12h playback + periodic reconnect | all | - | - | - | |
| HW-SOAK-002 | P2 | todo | 24h mixed load soak | all | - | - | - | |

Needed Unit Test Chunks (Short-Lived Backlog)

Purpose: define the minimum unit-test chunks needed now to de-risk integration work. Remove this section once all rows are done.

| Chunk ID | Priority | Status | Required Tests | Target Component(s) | Suggested Command | Owner | Last Update | Evidence | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| UT-CHUNK-001 | P0 | in_progress | Boot/partition decision logic: normal boot, forced recovery, invalid state fallback | `services`, `bootstate` path in `test_main` | `idf.py -C test build` | codex | 2026-02-12 | `test/build/log/idf_py_stdout_output_20413` | Contract: never enters non-recoverable boot loop. Context: added `components/tools/test/test_bootstate.cpp` and updated test-build compatibility for current IDF/CMake tooling. Done: regression tests for normal counter path, forced recovery threshold boundary (`5`), invalid-state counter normalization (`>100`), and recovery reset semantics; fast validation build now passes. Next: execute/collect runtime Unity test evidence on target for chunk closure. Risks: current evidence is build-pass in fast mode; runtime execution evidence still pending. Blockers: none. |
| UT-CHUNK-002 | P0 | todo | OTA decision and error mapping: success path, transport failure, invalid image metadata | `squeezelite-ota`, `platform_console/cmd_ota` | `idf.py -C test -T tools build` | - | - | - | Contract: failed OTA remains recoverable |
| UT-CHUNK-003 | P0 | todo | Messaging queue contracts: publish/subscribe ordering, timeout behavior, overflow handling | `services/messaging` | `idf.py -C test -T tools build` | - | - | - | Contract: no crash or deadlock on queue pressure |
| UT-CHUNK-004 | P1 | todo | Wi-Fi manager state transitions: connect, reconnect backoff, credential update, failure exhaustion | `wifi-manager` | `idf.py -C test -T wifi-manager build` | - | - | - | Contract: bounded retries and deterministic state |
| UT-CHUNK-005 | P1 | todo | Display text/render boundaries: clipping, wrapping, out-of-bounds coordinates, null font/data guards | `display/core` (`gds_text`, `gds_draw`, `gds_font`) | `idf.py -C test -T tools build` | - | - | - | Contract: renderer never writes outside target buffer |
| UT-CHUNK-006 | P1 | todo | Platform config schema handling: defaulting, unknown fields, malformed payload rejection | `platform_config` | `idf.py -C test -T platform_config build` | - | - | - | Extend existing `components/platform_config/test/` coverage |
| UT-CHUNK-007 | P2 | todo | Input event normalization: button/rotary/IR debounce and duplicate suppression | `services/buttons`, `services/rotary_encoder`, `services/infrared` | `idf.py -C test -T tools build` | - | - | - | Contract: no event storm from bounce/repeat |
| UT-CHUNK-008 | P2 | todo | Battery/telemetry bounds: invalid sensor values, low-battery transitions, status publication | `services/battery`, `metrics` | `idf.py -C test -T tools build` | - | - | - | Contract: invalid telemetry never triggers invalid state loops |

Chunk Completion Rule

Each chunk must add at least one regression test for a realistic failure mode.
Each chunk must reference contract text from documentation/CONTRACT_TEST_TEMPLATE.md in PR notes.
Mark chunk done only after test pass evidence is attached.

Agent Startup Checklist

Read only these sections first: Activity Log, Needed Unit Test Chunks, Priority Work Queue.
Choose one highest-priority unclaimed item.
Claim it using Agent Handoff Protocol.
Execute targeted tests first; avoid full-matrix runs unless required by the item.
Leave a complete handoff entry before ending session.
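
The "choose one highest-priority unclaimed item" step could be approximated with a small filter over the queue rows. This is a sketch under assumptions: `next_todo_item` is a hypothetical name, the input is expected as tab-separated rows with ID, Priority, and Status in the first three fields, and dependency checks are left to the agent.

```shell
# Hypothetical helper: read tab-separated rows (ID, Priority, Status, ...) on
# stdin and print the ID of the first `todo` row with the lowest priority
# number (P0 before P1 before P2). Ties keep file order.
next_todo_item() {
  awk -F '\t' '
    $3 == "todo" && $2 ~ /^P[0-9]+$/ {
      rank = substr($2, 2) + 0
      if (best == "" || rank < best_rank) { best = $1; best_rank = rank }
    }
    END { if (best != "") print best }
  '
}
```

Nothing is printed when no `todo` row exists, so an empty result can be read as "no unclaimed work".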

Definition Of Done

Test case exists at a stable contract boundary and follows documentation/TESTING_CHARTER.md.
Platform scope is explicit (all or constrained target set).
Execution path is documented (local command or CI job).
Pass evidence is linked in Evidence.

Update Example

| ID | Priority | Status | Test | Platforms | Owner | Last Update | Evidence | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| HW-NET-001 | P0 | done | Wi-Fi connect + DHCP + DNS | all | @agent-name | 2026-02-12 | PR #123, CI run #456 | Added regression for reconnect timeout handling |
