Debug a Fraud Model with Segment-Level Degradation — Practice

You join a team whose fraud model has been in production for 3 years. Current offline AUC is 0.94. Over the past 6 months, however, the false positive rate on one user segment (new signups from mobile) has increased 40%, while overall model metrics look fine.

Your manager wants you to "tune the threshold." Walk through what you would actually investigate before changing anything.

Your answer should cover:
Why threshold tuning is unlikely to fix the root cause
The three most plausible explanations for segment-specific degradation
What data you would pull first
What you would escalate vs. fix yourself
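
For the "what data you would pull first" item, one reasonable starting point is the false positive rate broken out by segment and month, since an aggregate metric such as AUC can hide a problem confined to one segment. The sketch below is illustrative only: the row layout (month, segment, is_fraud, flagged), the segment names, and the toy data are all assumptions, not part of the scenario.

```python
from collections import defaultdict

def fpr_by_segment_month(rows):
    """Return {(segment, month): false positive rate}.

    rows: iterable of (month, segment, is_fraud, flagged) tuples.
    This layout is a hypothetical stand-in for a scored-events table.
    """
    false_pos = defaultdict(int)  # legitimate events the model flagged
    negatives = defaultdict(int)  # all legitimate events
    for month, segment, is_fraud, flagged in rows:
        if is_fraud:
            continue  # FPR is computed over legitimate events only
        key = (segment, month)
        negatives[key] += 1
        if flagged:
            false_pos[key] += 1
    return {key: false_pos[key] / negatives[key] for key in negatives}

# Toy data, invented for illustration.
rows = [
    ("2024-01", "mobile_new", False, True),
    ("2024-01", "mobile_new", False, False),
    ("2024-01", "mobile_new", False, False),
    ("2024-01", "mobile_new", False, False),
    ("2024-02", "mobile_new", False, True),
    ("2024-02", "mobile_new", False, True),
    ("2024-02", "mobile_new", False, False),
    ("2024-02", "mobile_new", False, False),
    ("2024-02", "mobile_new", True, True),   # true fraud, excluded from FPR
    ("2024-01", "other", False, False),
    ("2024-01", "other", False, False),
]

rates = fpr_by_segment_month(rows)
for (segment, month), rate in sorted(rates.items()):
    print(segment, month, round(rate, 3))
```

Plotting these rates over time per segment shows when a segment started to diverge, which is the kind of timing evidence the follow-up ladder asks about.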

Follow-up ladder

Rung 1: You pull per-segment data and find that the false positive rate on mobile signups started increasing 8 months ago, 2 months before overall AUC began to decline, and then only slightly. What does this timing tell you?
Rung 2: You discover that a new mobile app version changed how device fingerprints are computed 8 months ago. The model uses device fingerprinting as a feature. How does this change your diagnosis and the fix?
Rung 3: The data team says fixing the feature pipeline will take 6 weeks. The fraud team is under pressure to reduce false positives now. What short-term mitigation is reasonable without making the underlying problem worse?
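
For Rung 2, one way to check whether the fingerprint change actually shifted the model's inputs is to compare the distribution of a fingerprint-derived feature before and after the app release. The sketch below uses the population stability index (PSI) over categorical buckets. The bucket names ("seen" vs. "new" fingerprint) and the sample counts are invented for illustration, and the 0.25 cutoff is a commonly cited rule of thumb rather than a fixed standard.

```python
import math
from collections import Counter

def psi(baseline, current, eps=1e-6):
    """Population stability index between two categorical samples.

    baseline, current: lists of category labels for one feature.
    eps floors empty buckets so the logarithm stays defined.
    """
    base_counts, cur_counts = Counter(baseline), Counter(current)
    n_base, n_cur = len(baseline), len(current)
    total = 0.0
    for category in set(base_counts) | set(cur_counts):
        p = max(base_counts[category] / n_base, eps)  # baseline share
        q = max(cur_counts[category] / n_cur, eps)    # current share
        total += (q - p) * math.log(q / p)
    return total

# Hypothetical feature: has this device fingerprint been seen before?
before_release = ["seen"] * 80 + ["new"] * 20
after_release = ["seen"] * 40 + ["new"] * 60

shift = psi(before_release, after_release)
no_shift = psi(before_release, before_release)
print(round(shift, 3), no_shift)
```

A large PSI on the affected segment, dated to the release, would support a feature-pipeline diagnosis over genuine behavior change. That distinction matters for Rung 3, because a threshold change does not repair a broken input.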