MLOps · Debugging & Incident Triage · Hard
Debug a Fraud Model with Segment-Level Degradation
Asked at Stripe, PayPal, Affirm
You join a team whose fraud model has been in production for 3 years. Its current offline AUC is 0.94. Over the past 6 months, however, the false positive rate has risen 40% on one user segment (new signups from mobile), while overall model metrics look fine.
Your manager wants you to "tune the threshold." Walk through what you would actually investigate before changing anything.
- Why threshold tuning is unlikely to fix the root cause
- The three most plausible explanations for segment-specific degradation
- What data you would pull first
- What you would escalate vs. fix yourself
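For the "what data you would pull first" point, one plausible starting place is the false positive rate broken out by segment and month, since an aggregate metric can hide a single degrading slice. The sketch below is a minimal illustration, not a prescribed answer: the row layout and the field names `month`, `segment`, `flagged`, and `is_fraud` are assumptions, and the sample rows are made up purely to exercise the function.

```python
from collections import defaultdict


def fpr_by_segment_month(rows):
    """Return {(segment, month): false positive rate}.

    `rows` is an iterable of dicts with hypothetical keys:
      month    - e.g. "2024-01"
      segment  - e.g. "mobile_new"
      flagged  - True if the model blocked or flagged the transaction
      is_fraud - True if the transaction was later confirmed as fraud
    """
    false_pos = defaultdict(int)  # legitimate transactions that were flagged
    negatives = defaultdict(int)  # all legitimate transactions
    for r in rows:
        if r["is_fraud"]:
            continue  # FPR is computed over legitimate traffic only
        key = (r["segment"], r["month"])
        negatives[key] += 1
        if r["flagged"]:
            false_pos[key] += 1
    return {key: false_pos[key] / n for key, n in negatives.items()}


# Made-up rows for illustration only.
sample_rows = [
    {"month": "2024-01", "segment": "mobile_new", "flagged": True, "is_fraud": False},
    {"month": "2024-01", "segment": "mobile_new", "flagged": False, "is_fraud": False},
    {"month": "2024-01", "segment": "mobile_new", "flagged": False, "is_fraud": False},
    {"month": "2024-01", "segment": "mobile_new", "flagged": False, "is_fraud": False},
    {"month": "2024-02", "segment": "mobile_new", "flagged": True, "is_fraud": False},
    {"month": "2024-02", "segment": "mobile_new", "flagged": False, "is_fraud": False},
    {"month": "2024-02", "segment": "mobile_new", "flagged": True, "is_fraud": True},
]

fpr = fpr_by_segment_month(sample_rows)
for key in sorted(fpr):
    print(key, round(fpr[key], 3))
```

Plotting this series for each segment side by side would show when a given slice diverged from the rest, which is the timing question the follow-up ladder builds on.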
Follow-up ladder
- Rung 1: You pull per-segment data and find that the false positive rate on mobile signups started increasing 8 months ago, 2 months before the overall AUC began to decline (and then only slightly). What does this timing tell you?
- Rung 2: You discover that a new mobile app version changed how device fingerprints are computed 8 months ago. The model uses device fingerprinting as a feature. How does this change your diagnosis and the fix?
- Rung 3: The data team says fixing the feature pipeline will take 6 weeks. The fraud team is under pressure to reduce false positives now. What short-term mitigation is reasonable without making the underlying problem worse?
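For Rung 2, one way to confirm that the fingerprint change actually shifted the model's inputs is to compare a fingerprint-derived feature before and after the app release. The sketch below computes a population stability index (PSI) over pre-bucketed values. It is an illustration under assumptions: the feature (a bucketed count of prior accounts seen on the same device fingerprint) is hypothetical, the sample data is invented, and the commonly quoted 0.25 cutoff for a "large" shift is a rule of thumb rather than a standard.

```python
import math
from collections import Counter


def psi(expected, actual, eps=1e-6):
    """Population stability index between two samples of a categorical
    (or pre-bucketed) feature. Larger values indicate a bigger shift."""
    e_counts, a_counts = Counter(expected), Counter(actual)
    total = 0.0
    for bucket in set(e_counts) | set(a_counts):
        # eps avoids log(0) when a bucket is absent from one sample
        e = max(e_counts[bucket] / len(expected), eps)
        a = max(a_counts[bucket] / len(actual), eps)
        total += (a - e) * math.log(a / e)
    return total


# Invented data: hypothetical feature "prior accounts on this fingerprint",
# bucketed. If the fingerprint computation changed, most devices would
# suddenly look unseen, pushing mass into the "0" bucket.
before_release = ["0"] * 20 + ["1-2"] * 50 + ["3+"] * 30
after_release = ["0"] * 90 + ["1-2"] * 8 + ["3+"] * 2

shift = psi(before_release, after_release)
print(f"PSI = {shift:.2f}")
```

A large PSI on mobile traffic alone, starting at the release date, would support the diagnosis that the model is scoring inputs it was never trained on, which is why moving the threshold would only mask the problem.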