NetSpeek

FILE 05 / OPEN PROBLEMS

ENGINEERING / UNFINISHED

What we are still figuring out.

Sanitized problem classes from inside the platform. Sketch how you would start on one. That is the strongest application we know how to read.

RESEARCH NOTICE / LEGAL
The questions on this page are forward-looking research framings we use to evaluate engineering depth. They are not statements of known production deficiencies, root-cause analyses, or commitments to product behavior.

What this page is

Engineering problems we have not fully solved. We list them publicly for two reasons. We would rather a candidate know what they would actually be working on. And the best applications we receive engage with one of these directly.

All of them are sanitized. No customer references, no internal architecture leakage. The shape of the problem is faithful. The specifics are abstracted.

1. Multi-vendor device normalization

Every physical endpoint we govern speaks a different dialect. Same conceptual operation ("mute the room"). Four different APIs. Four different timing profiles. Four different failure semantics. The naive answer is a per-vendor adapter layer. The real answer involves figuring out which differences are accidental and should be normalized away, and which are essential and need to surface to Lena and to the operator.

What we are still working out: the right abstraction for "device capability" that is expressive enough for vendor-specific quirks but tight enough that the orchestration layer does not fragment into special cases.
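
If we were sketching a first cut ourselves, it might look like this. A minimal Python sketch, every name in it invented for illustration: the essential differences (timing profile, failure semantics, verifiability) are declared on the capability where the orchestration layer can see them, and the accidental ones stay buried in the adapter.

    # Hypothetical sketch, not NetSpeek's design. Essential differences are
    # fields on Capability; accidental vendor quirks live inside the adapter.
    from dataclasses import dataclass
    from enum import Enum, auto


    class FailureMode(Enum):
        FAILS_LOUD = auto()           # device returns an explicit error
        FAILS_SILENT = auto()         # command can be dropped with no signal
        EVENTUALLY_APPLIES = auto()   # success only visible via a later state poll


    @dataclass(frozen=True)
    class Capability:
        """What the orchestration layer is allowed to know about an operation."""
        operation: str             # e.g. "mute"
        expected_latency_ms: int   # timing profile, surfaced rather than hidden
        failure_mode: FailureMode  # failure semantics, surfaced rather than hidden
        verifiable: bool           # can we read state back to confirm?


    class VendorAdapter:
        """Per-vendor quirks live below this line and never leak above it."""

        def capabilities(self) -> list[Capability]:
            raise NotImplementedError

        def execute(self, operation: str) -> None:
            raise NotImplementedError

The test of any design here is whether orchestration code can be written against Capability alone. The moment it needs to check which adapter it is talking to, the abstraction has fragmented.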

2. Evaluating AI recommendations under delayed ground truth

Lena recommends an action. An operator approves or rejects. The operator decision becomes our ground truth. Sometimes ground truth is delayed by hours, because the room is fixed by the next morning, not in the moment. Sometimes it never arrives, because the operator silently changes their mind. How do we run reliable eval pipelines on a signal that is noisy, sparse, and partially synthetic?

Adjacent problems: detecting when operator behavior is drifting (calibration shift). Separating "Lena was wrong" from "the operator overrode but Lena was right." Figuring out the latency we can tolerate between recommendation and eval signal.
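
A starting sketch of the data model, in Python, with all names and the staleness cutoff invented for illustration: ground truth is a late-arriving, optional label rather than a field you can assume exists, and coverage gets reported alongside accuracy instead of being silently dropped.

    # Hypothetical sketch: an eval record whose label may arrive late or never,
    # and a scoring split that makes the unlabeled fraction explicit.
    from dataclasses import dataclass
    from typing import Optional


    @dataclass
    class EvalRecord:
        recommendation_id: str
        recommended_at: float          # unix seconds
        labeled_at: Optional[float]    # None = label never arrived
        operator_approved: Optional[bool]


    def scored_subset(records: list[EvalRecord], max_label_delay_s: float):
        """Split records into (scorable, unlabeled). A label that arrives
        after the cutoff is treated as too stale to trust as ground truth."""
        scorable, unlabeled = [], []
        for r in records:
            if r.labeled_at is None:
                unlabeled.append(r)
            elif r.labeled_at - r.recommended_at <= max_label_delay_s:
                scorable.append(r)
            else:
                unlabeled.append(r)
        return scorable, unlabeled

The coverage number matters as much as the accuracy number. A pipeline that only reports accuracy over the labeled subset hides exactly the sparsity this problem is about.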

3. Action execution boundaries for enterprise AI

The interesting question is not "should AI take actions." The answer there is yes, sometimes. The real question is "where exactly does the action boundary sit, and how does it move?" Some actions are obviously OK to automate (a benign restart on a low-criticality device). Some are obviously not (firmware push to a fleet during business hours). The hard cases are in the middle. We are building a policy framework that is expressive enough for enterprise customers but operable enough that engineers can reason about it.
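
One way a candidate might start, sketched in Python with every field name invented: keep the policy vocabulary small enough that an engineer can read a rule and predict what it does, and order the rules so the most restrictive fire first.

    # Hypothetical sketch of a readable action-boundary policy. The fields
    # (criticality, blast_radius, business_hours) are illustrative, not ours.
    from dataclasses import dataclass


    @dataclass(frozen=True)
    class ActionContext:
        action: str            # e.g. "restart", "firmware_push"
        criticality: int       # 0 = lab device, 3 = boardroom mid-event
        blast_radius: int      # number of devices affected
        business_hours: bool


    def decide(ctx: ActionContext) -> str:
        """Returns 'auto', 'ask_operator', or 'deny'. Most restrictive
        rules fire first; anything unclassified falls through to a human."""
        if ctx.action == "firmware_push" and ctx.business_hours:
            return "deny"
        if ctx.blast_radius > 1 or ctx.criticality >= 2:
            return "ask_operator"
        if ctx.action == "restart" and ctx.criticality == 0:
            return "auto"
        return "ask_operator"

Moving the boundary then means editing rules under review rather than letting behavior drift, which is one answer to the question below.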

Open question we like: how should the boundary move over time as Lena's reliability is demonstrated, without that drift being implicit?

4. Testing physical-infrastructure workflows without a full lab

The painful reality of our platform is that most things we ship interact with physical hardware. Building a complete hardware lab that mirrors production is unaffordable. Running CI against actual devices is too slow and too flaky to use as a gate.

What we are working on: layered test infrastructure that runs the bulk of regression at the simulation tier, escalates a thin set of "actually use the hardware" tests on a release candidate, and lets engineers reproduce a customer-reported issue locally without a fleet.
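
The simulation tier is the part that is easiest to make concrete. A runnable Python sketch, class and method names invented: a fake device that honors the same operation interface as real hardware, where flakiness is an explicit test input rather than test noise.

    # Hypothetical sketch of the simulation tier.
    class SimDevice:
        """In-memory stand-in for a physical endpoint. Deterministic by
        default; failures are injected explicitly."""

        def __init__(self):
            self.state = {"audio": "unmuted"}
            self._fail_next = False

        def inject_failure(self):
            self._fail_next = True

        def execute(self, operation: str):
            if self._fail_next:
                self._fail_next = False
                raise TimeoutError("simulated device timeout")
            if operation == "mute":
                self.state["audio"] = "muted"


    def test_retry_after_timeout():
        """A regression test exercising retry behavior without hardware."""
        dev = SimDevice()
        dev.inject_failure()
        try:
            dev.execute("mute")
        except TimeoutError:
            dev.execute("mute")  # stand-in for the orchestration layer's retry
        assert dev.state["audio"] == "muted"

The thin hardware tier then only has to answer one question: does the simulator still tell the truth about the device.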

5. Trust calibration

Operators have a confidence model of Lena. It is built from the times she was right and the times she was not. If that model is wrong in either direction, the platform breaks. Under-trusted Lena gets ignored. Over-trusted Lena gets used in situations she should not be.

We need the platform's expressed confidence to track its actual reliability per workflow. We need that calibration to be visible to operators in ways they actually read. This is a quantitative problem with a heavy UX surface.
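
The quantitative half has a well-known starting point: bucket expressed confidence and compare each bucket to observed outcomes, per workflow. A minimal Python sketch of that reliability-table computation; the (confidence, was_correct) pairs are whatever the eval pipeline from problem 2 produces.

    # Standard reliability-diagram computation, sketched.
    from collections import defaultdict


    def calibration_table(outcomes, n_buckets=10):
        """outcomes: iterable of (confidence in [0, 1], was_correct: bool).
        Returns {bucket: (mean_confidence, observed_accuracy, count)}."""
        buckets = defaultdict(list)
        for conf, correct in outcomes:
            b = min(int(conf * n_buckets), n_buckets - 1)
            buckets[b].append((conf, correct))
        table = {}
        for b, rows in sorted(buckets.items()):
            mean_conf = sum(c for c, _ in rows) / len(rows)
            accuracy = sum(1 for _, ok in rows if ok) / len(rows)
            table[b] = (mean_conf, accuracy, len(rows))
        return table

A well-calibrated workflow has mean confidence tracking observed accuracy in every bucket. The hard part is not this table. It is rendering the gap in a form an operator mid-incident will actually absorb.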

6. Operator interfaces under cognitive load

Operators using NetSpeek are not sitting at the screen reading carefully. They are managing twenty things at once, often during an active incident. UI that "shows the data" is the wrong frame. UI that "tells the operator the next decision and why" is the right frame.

What we are working on: defaults that surface the one piece of information that changes the operator's next action, with progressive disclosure of detail for the cases where the default is wrong. Easier said than designed.
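
To make the frame concrete, a small Python sketch, every name invented: rank candidate facts by whether they change the operator's next action, show exactly one, and keep the rest behind disclosure.

    # Hypothetical sketch of the "one thing first" default.
    from dataclasses import dataclass


    @dataclass
    class Fact:
        text: str
        changes_next_action: bool  # would the operator act differently knowing this?
        severity: int              # tiebreaker among action-changing facts


    def render(facts: list[Fact]) -> tuple[str, list[str]]:
        """Returns (headline, details): the single most decision-relevant
        fact up front, everything else one click away."""
        ranked = sorted(facts, key=lambda f: (not f.changes_next_action, -f.severity))
        return ranked[0].text, [f.text for f in ranked[1:]]

The sketch hides the real difficulty: deciding changes_next_action is itself a judgment call, per workflow, per incident.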

7. Edge / cloud coordination under partial failure

Edge agents run on customer networks. The cloud orchestration layer runs in our cloud region. They coordinate over channels that fail in interesting ways. Partial failure is not an edge case here. It is the norm. We need workflow state that is resumable across reconnection, eventually consistent decisions that do not surprise operators, and a coherent story about what the platform does when an edge agent is offline for a long time.
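
A sketch of the resumable-state piece, in Python with placeholder names: record the ordered steps, have the edge acknowledge what it actually applied, and resume from the first unacknowledged step after reconnection. This is only safe if every step is idempotent, an assumption the sketch makes loudly.

    # Hypothetical sketch of resumable workflow state across a partition.
    # Safe only if each step is idempotent.
    from dataclasses import dataclass, field


    @dataclass
    class WorkflowState:
        workflow_id: str
        steps: list[str]                               # ordered step names
        acked: set[str] = field(default_factory=set)   # steps the edge confirmed

        def ack(self, step: str):
            self.acked.add(step)

        def next_step(self):
            """First step the edge has not confirmed; None means done."""
            for s in self.steps:
                if s not in self.acked:
                    return s
            return None

On reconnect, the cloud asks the edge which steps it applied while partitioned, merges those into acked, and resumes from next_step(). The "does not surprise operators" requirement is the part this sketch does not touch.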

8. Telemetry signal-to-noise as the platform scales

Every customer adds a multiplier on the telemetry firehose. Most of it is operationally uninteresting on any given day. We need to preserve the useful signal without paying to store and query the rest. Tiered retention. Downsampling that is safe for eval pipelines. A queryable index that the AI layer can use without hitting cost cliffs.
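
A first cut at the retention side, sketched in Python. The tier boundaries and resolutions below are made up; the real constraint, noted in the docstring, is that whatever gets coarsened or deleted must still be safe for the eval pipelines reading older tiers.

    # Hypothetical tiered retention policy.
    from dataclasses import dataclass


    @dataclass(frozen=True)
    class Tier:
        name: str
        max_age_days: int    # events older than this move to the next tier
        resolution_s: int    # downsample bucket width at this tier


    TIERS = [
        Tier("hot",  max_age_days=7,   resolution_s=1),     # full fidelity
        Tier("warm", max_age_days=90,  resolution_s=60),    # 1-minute aggregates
        Tier("cold", max_age_days=730, resolution_s=3600),  # hourly rollups
    ]


    def tier_for(age_days: float) -> Tier:
        """Pick the storage tier for an event of a given age. Anything past
        the last tier is deleted, which must be safe for evals too."""
        for t in TIERS:
            if age_days <= t.max_age_days:
                return t
        raise LookupError("past retention: delete")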


How to engage with these in your application

If one of these problems matches your experience, write a paragraph in the application about how you would approach it. We are not looking for a solved answer. We are looking for the shape of how you would start.

The signal we want: which problem you picked. What you would look at first. What you would be most worried about. Where you would want help.