FILE 07.04 / SCENARIO
SIMULATED · FOR CANDIDATE EVALUATION · NOT REPRESENTATIVE OF PRODUCTION SYSTEMS
Stop the Regression
A bug that keeps coming back across releases. Design a CI gate and the telemetry that would have caught it earlier.
FILE 07.04.1 / SETUP
Same bug. Three releases. Three customer reports. One internal monitoring catch — eventually.
The bug: when an operator triggers the "mute room" workflow, the orchestration API returns 200 ("success"), but the underlying device state remains unmuted. The operator sees green; the room is still hot. Awkward in a routine meeting. Embarrassing in a board meeting.
It first hit production in 2025.10.4 (one customer, one vendor). The fix shipped. Then 2025.11.2 (different customer, different vendor — but the same root cause class). The fix shipped again. Then 2026.01.1 — three customers, two vendors, same root cause.
Each time the symptom is the same: the API claims the workflow completed but the device-state verification at the end never actually fired (or fired but didn't fail when the state didn't match). The CI suite never caught it. Production telemetry never raised it. The customer raised it.
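The symptom above can be sketched in a few lines. This is an illustration only, not the real orchestration code: `Device`, `mute_room_unverified`, `mute_room_verified`, and `VerificationError` are hypothetical names, and the device is an in-memory stand-in.

```python
from dataclasses import dataclass


class VerificationError(Exception):
    """Raised when the state read back from the device does not match the request."""


@dataclass
class Device:
    """Hypothetical stand-in; honors_commands=False models a device that drops the command."""
    muted: bool = False
    honors_commands: bool = True

    def send_mute(self) -> None:
        if self.honors_commands:
            self.muted = True

    def read_muted(self) -> bool:
        return self.muted


def mute_room_unverified(device: Device) -> int:
    """The buggy shape: report success as soon as the command has been sent."""
    device.send_mute()
    return 200


def mute_room_verified(device: Device) -> int:
    """Fail closed: report success only after the read-back matches the request."""
    device.send_mute()
    if not device.read_muted():
        raise VerificationError("device still unmuted after mute command")
    return 200
```

Against `Device(honors_commands=False)`, the first function returns 200 while the room stays hot; the second raises instead of reporting success.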
The team has deliberately avoided patching only the immediate cause. We want a CI gate that catches this whole class of bug, not just the next instance of this specific one.
Below is the snapshot of the current CI surface, the current production telemetry coverage, and the three incident records. Your job is to design two things:
- The CI gate that would have caught this before release.
- The single missing production telemetry signal that would have cut time-to-detect dramatically.
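For the telemetry item, one candidate signal is a count of workflows where the API reported success but an independent read-back of the device disagreed. The sketch below is an assumption about how such a signal could be shaped, not a description of the existing telemetry: `record_mute_outcome`, `mismatch_rate`, and the label names are hypothetical, and `observed_muted` is assumed to come from a device-state read taken after the workflow finishes.

```python
from collections import Counter


def record_mute_outcome(metrics: Counter, api_status: int, observed_muted: bool) -> str:
    """Classify one completed mute workflow, count it, and return the label."""
    if api_status == 200 and observed_muted:
        label = "verified_success"
    elif api_status == 200:
        label = "success_state_mismatch"  # the silent failure seen in the incidents
    else:
        label = "reported_failure"
    metrics[label] += 1
    return label


def mismatch_rate(metrics: Counter) -> float:
    """Fraction of reported successes whose observed device state disagreed."""
    successes = metrics["verified_success"] + metrics["success_state_mismatch"]
    return metrics["success_state_mismatch"] / successes if successes else 0.0
```

An alert on any non-zero `success_state_mismatch` count would surface the problem without waiting for a customer report, provided the read-back itself is reliable.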
The hard part isn't naming the signals. The hard part is keeping CI fast and trustworthy while you do it.
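One possible shape for a gate that stays fast is a fault-injection check run against in-memory fakes: every workflow is executed against a device that acknowledges commands but never changes state, and any workflow that still reports 200 fails the gate. Everything here is hypothetical (`SilentlyFailingDevice`, `gate_no_unverified_success`, the two example workflows, the 502 status); it is a sketch of the idea under those assumptions, not the team's CI.

```python
from typing import Callable, Dict, List


class SilentlyFailingDevice:
    """Hypothetical in-memory fake: accepts every command but never changes state."""

    def __init__(self) -> None:
        self.state: Dict[str, object] = {"muted": False}

    def apply(self, key: str, value: object) -> None:
        pass  # command acknowledged, state deliberately left unchanged

    def read(self, key: str) -> object:
        return self.state.get(key)


Workflow = Callable[[SilentlyFailingDevice], int]


def gate_no_unverified_success(workflows: Dict[str, Workflow]) -> List[str]:
    """Return the names of workflows that report 200 against a device that ignored them."""
    offenders = []
    for name, run in workflows.items():
        try:
            status = run(SilentlyFailingDevice())
        except Exception:
            continue  # raising is an acceptable way to fail closed
        if status == 200:
            offenders.append(name)
    return offenders


def mute_buggy(device: SilentlyFailingDevice) -> int:
    """Example workflow with the bug: no read-back before reporting success."""
    device.apply("muted", True)
    return 200


def mute_checked(device: SilentlyFailingDevice) -> int:
    """Example workflow that verifies state and reports an error on mismatch."""
    device.apply("muted", True)
    return 200 if device.read("muted") is True else 502
```

Calling `gate_no_unverified_success({"mute_buggy": mute_buggy, "mute_checked": mute_checked})` flags only `mute_buggy`. Because the check targets the "success without matching state" property rather than one vendor's behavior, it applies to any workflow registered with it.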
FILE 07.04.2 / TELEMETRY
What is on the operator's screen.
Real-shaped operational data. Anonymized device IDs, real-shaped timing. The same view an on-call engineer would see in the moment.
Panels: incidents (3), unit_tests, integration_tests, e2e_tests
FILE 07.04.4 / YOUR RESPONSE
Show us how you would design it.
Short and specific beats long and vague. The next step is the application form — we save what you have written here so you do not lose it.