
ML in Regulated Production — 0 to 1

$890K annual impact · first ML product in regulated production

The Problem

Shipping ML in a regulated environment is a different problem from shipping ML in an unregulated one.

In normal product development, you iterate toward accuracy. You get the model good enough, you ship, you improve. The feedback loop is short. Failure is recoverable.

In a regulated environment — one where outputs inform clinical decisions, where every inference must be traceable to its inputs, where a wrong answer has human consequences — the rules change. You can't iterate publicly. You can't accept graceful degradation. You have to be right, and you have to prove it.

The constraint isn't the model. It's trust.

The Insight

Most ML products in regulated environments fail at adoption, not accuracy. Teams build models that are technically sound and scientifically valid, then release them to end users who don't understand them and don't trust them.

The gap between model output and human decision is a product problem, not a data science problem.

I had seen this pattern before in developer tooling: the tools that get adopted are the ones that work with how people already think, not against it. The same principle applies here. If the model's output feels like a black box, no amount of accuracy will drive adoption — because trust isn't built from accuracy. It's built from legibility.

The Solution

I designed the product around adoption psychology rather than model performance:

Explainability first, accuracy second: we shipped a v1 that showed its reasoning before it showed its answer. Users could see what the model was weighing. The output became a prompt for human judgment, not a replacement for it.

Compliance baked into the loop: auditability wasn't a feature we added after launch. Every inference wrote to an immutable log with its inputs, outputs, model version, and timestamp. Regulatory review became a query, not a process.

Gradual authority transfer: we didn't ask users to trust the model on day one. We started with recommendations, tracked concordance with human decisions, and expanded the model's authority as trust accumulated, but only where agreement was consistently high.
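To make "every inference wrote to an immutable log" and "review became a query" concrete, here is a minimal sketch of one way such a log could work. It is not the production system: the hash chaining (each record stores the hash of the previous one, so editing any earlier entry is detectable), the in-memory storage, and the names `AuditLog`, `append`, `verify`, and `query` are all hypothetical choices made for illustration.

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so identical records always hash identically.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


@dataclass
class AuditLog:
    """Hypothetical append-only inference log with a hash chain for tamper evidence."""
    records: list = field(default_factory=list)

    def append(self, inputs: dict, output, model_version: str) -> dict:
        prev_hash = self.records[-1]["hash"] if self.records else "0" * 64
        body = {
            "inputs": inputs,
            "output": output,
            "model_version": model_version,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "prev_hash": prev_hash,
        }
        record = {**body, "hash": _digest(body)}
        self.records.append(record)
        return record

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered record breaks it."""
        prev_hash = "0" * 64
        for rec in self.records:
            body = {k: v for k, v in rec.items() if k != "hash"}
            if body["prev_hash"] != prev_hash or rec["hash"] != _digest(body):
                return False
            prev_hash = rec["hash"]
        return True

    def query(self, **criteria) -> list:
        """Return records whose top-level fields match every criterion."""
        return [r for r in self.records if all(r.get(k) == v for k, v in criteria.items())]


if __name__ == "__main__":
    log = AuditLog()
    log.append({"feature_a": 1.2}, "recommend_review", model_version="v1.0")
    log.append({"feature_a": 0.4}, "no_action", model_version="v1.1")
    print(len(log.query(model_version="v1.0")), log.verify())
```

In a real deployment the records would live in write-once storage rather than a Python list; the chain only makes tampering detectable, it does not prevent it.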
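The gradual authority transfer can be sketched as a concordance tracker that only grants expanded authority once agreement with human decisions is both high and backed by enough evidence. This is an illustrative assumption, not the method used in the product: the per-category grouping, the `ConcordanceTracker` name, the 0.95 threshold, the 50-sample minimum, and the use of the Wilson score lower bound to stand in for "consistently high" are all hypothetical.

```python
import math
from collections import defaultdict
from dataclasses import dataclass, field


def wilson_lower_bound(successes: int, n: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for a binomial proportion."""
    if n == 0:
        return 0.0
    p = successes / n
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / denom


@dataclass
class ConcordanceTracker:
    """Hypothetical tracker of model/human agreement, per decision category."""
    threshold: float = 0.95  # assumed minimum lower-bound agreement rate
    min_samples: int = 50    # assumed minimum evidence before any expansion
    agree: dict = field(default_factory=lambda: defaultdict(int))
    total: dict = field(default_factory=lambda: defaultdict(int))

    def record(self, category: str, model_output, human_decision) -> None:
        self.total[category] += 1
        if model_output == human_decision:
            self.agree[category] += 1

    def authority(self, category: str) -> str:
        """'expanded' only where agreement is high with statistical confidence."""
        n = self.total[category]
        if n < self.min_samples:
            return "recommend"
        lb = wilson_lower_bound(self.agree[category], n)
        return "expanded" if lb >= self.threshold else "recommend"
```

Gating on a lower confidence bound rather than the raw agreement rate means a short streak of agreement cannot unlock authority: with these assumed settings, 95 agreements out of 100 yields a lower bound near 0.89 and stays at "recommend", while 100 out of 100 clears 0.95.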

The Outcome

  • $890K annual impact in the first year
  • First ML product shipped to production in a regulated environment
  • Adoption driven by pull, not mandate — users requested expanded access before we planned to offer it

The product worked because we understood that the real product wasn't the model. It was the trust architecture around the model.