Diagnose a Churn Model That Passed Eval but Moved No KPI — Practice

A churn prediction model was trained, evaluated offline, and deployed to production. The offline AUC was 0.87. A live retention experiment ran for 8 weeks and showed zero improvement in 30-day retention. A postmortem revealed three compounding failures — none of them in the model itself.

Failure 1 — Action timing: The model fired a retention email 3 days after the user's last session. By that point, users who were going to churn had already mentally disengaged.

Failure 2 — Action quality: The retention email offered a 10% discount. Churn analysis showed the primary reason for early churn was confusion about the product, not price sensitivity.

Failure 3 — Measurement window: The experiment measured 30-day retention. Churn from this segment actually played out over 60-90 days. The experiment ended before the effect could be observed.

For each of the three failures, identify: (a) what framing decision earlier in the project would have caught it, and (b) what the fix is going forward.

Follow-up ladder

Rung 1: How would you have caught Failure 3 (measurement window mismatch) before launching the experiment? What analysis would you run on historical data?
Rung 2: The team wants to run a new experiment immediately. You have 4 weeks of budget. What is the minimum viable experiment design that could detect whether the action timing fix made a difference?
Rung 3: A second team has been running a parallel experiment targeting the same users with a different intervention (an in-app tutorial prompt, not an email). How do you handle the conflict, and what does it tell you about the product's intervention architecture?