How we read a candidate
Five things we screen for. Four things we screen against. Everything else is noise we deliberately ignore.
Signals we screen for
1. You can describe a system you own
The strongest signal in a first call is whether you can describe a system you have shipped, at the right altitude, with the right vocabulary, without us prompting you. We listen for how the pieces fit together. Where the boundaries are. What you decided to build versus buy. What surprised you in production.
If you find yourself describing a feature rather than a system, that is the gap we are listening for.
2. You have seen failure modes and can talk about them
We do not need war stories. We need to hear how you reason about reliability before a thing breaks, and what you change after one does. The pattern we look for: an engineer who treats a postmortem as an architectural artifact, not a blame surface.
If the failure modes in your story are all human ("the team did not communicate"), we will dig harder to find a technical one. Both matter. We will need to see both.
3. You make tradeoffs visible
The most common signal that someone is senior in title but not in practice: every answer is unqualified. "We chose Postgres." "We use Kafka." "We do not allow that." Strong candidates surface the tradeoff inside the answer. What they considered. What they rejected. What they would revisit if they had to choose again.
4. You have thought about AI under production constraints (for AI roles)
For AI roles specifically, we want to hear about grounding, evaluation, cost, latency, refusal calibration, and how you would debug a model that is "confidently wrong" in front of a customer. Tutorials and benchmarks do not count. We are looking for production exposure to one or more of these constraints.
5. You communicate cleanly under technical pressure
When a question gets harder, you slow down rather than guess. You ask a clarifying question rather than answering the wrong question fast. You say "I do not know" cleanly and move to "here is how I would find out." This sounds basic. It is the rarest signal of the five.
Signals we screen against
1. Resume reads as feature work without architectural context
"Built X, shipped Y, owned Z." If we cannot find the system behind the feature list, we usually pass. This is not about how the resume is written. We will dig in conversation. If conversation cannot surface the architecture either, it is a strong negative.
2. You avoid talking about operational reliability
If every answer dodges the production layer, that is signal. Our platform lives in customers' physical environments. An engineer who is allergic to operational concerns will be unhappy here.
3. AI exposure stops at experimentation
For AI roles: tutorials and side projects are good starting points. They do not tell us how you would think about an evaluation regression or a grounding failure. If we cannot find production AI exposure anywhere in the conversation, we will usually pass even on otherwise strong candidates.
4. You disengage when asked about tradeoffs
A subtle one. If pressing on "what would you do differently" or "what were you trading off" causes the conversation to flatten — short answers, defensive posture, retreating to bullet points — that is the signal. It usually means the work was not owned at the level the resume suggests.
Signals we deliberately ignore
- Big-company brand on the resume. We have hired from everywhere. A big-brand resume does not get a thumb on the scale.
- Years of experience as a primary filter. We have minimums, because the work itself requires them. A strong four-year engineer outranks a weak twelve-year one.
- Conference talk count, OSS star count, follower count. Nice if they are there. Not a screening signal.
- "Cultural fit" meaning likeness. We do not optimize for personality match. We optimize for the five signals above. We assume people who hit them will work well together.
How the Incident Lab fits in
The Lab is where most of the five positive signals show up in writing. When you respond to a scenario, we are reading for:
- System framing. Do you describe the failure as a system problem or as a single root cause?
- Tradeoff awareness. Does your fix have explicit tradeoffs, or does it sound costless?
- AI judgment, on the relevant scenarios. Do you treat the model as a stochastic component with grounding and evaluation, or as a black box you trust by default?
- Communication under constraint. Your response is short. What did you choose to leave out?
The Lab is graded by humans. We do not deploy automated evaluation of candidate responses. That is a policy choice tied to NYC LL 144 and the EU AI Act, and one we are comfortable explaining in detail when asked.
What we do not grade on
The Lab gives you structured prompts. We do not grade on:
- Whether your answer matches our internal answer.
- Whether you "got the trick." There is no trick.
- How long your response is. Shorter is better as long as the reasoning is in there.
- Whether you used the exact words we would have used.
If a scenario asks for three signals and you give us two strong ones with a sentence on why you would skip the third, that is a better response than three weak ones.