Diagnose a Churn Model That Passed Eval but Moved No KPI
Asked at Stripe, Amplitude, HubSpot
A churn prediction model was trained, evaluated offline, and deployed to production. The offline AUC was 0.87. A live retention experiment ran for 8 weeks and showed zero improvement in 30-day retention. A postmortem revealed three compounding failures — none of them in the model itself.
Failure 1 — Action timing: The model fired a retention email 3 days after the user's last session. By that point, users who were going to churn had already mentally disengaged.
Failure 2 — Action quality: The retention email offered a 10% discount. Churn analysis showed the primary reason for early churn was confusion about the product, not price sensitivity.
Failure 3 — Measurement window: The experiment measured 30-day retention. Churn from this segment actually played out over 60-90 days. The experiment ended before the effect could be observed.
For each of the three failures, identify: (a) what framing decision earlier in the project would have caught it, and (b) what the fix is going forward.
Follow-up ladder
- Rung 1: How would you have caught Failure 3 (measurement window mismatch) before launching the experiment? What analysis would you run on historical data?
- Rung 2: The team wants to run a new experiment immediately. You have 4 weeks of budget. What is the minimum viable experiment design that could detect whether the action timing fix made a difference?
- Rung 3: A second team has been running a parallel experiment targeting the same users with a different intervention (an in-app tutorial prompt, not an email). How do you handle the conflict, and what does it tell you about the product's intervention architecture?
Your Answer
Unlock AI-powered scoring, all questions, and progress tracking.