Choose the Right Autonomy Level for a Financial Anomaly System — Practice

A finance operations team wants to deploy an ML system to detect anomalies in outgoing invoice payments. The company processes about 5,000 invoices per month. Anomaly types include duplicate invoices, unusually large payments, new vendors that are not on the approved list, and payment timing mismatches. Today, a team of three analysts reviews every invoice manually.

You are asked to choose the autonomy level for the system. Should it be automation augmentation (the model replaces or augments deterministic logic), human-in-the-loop (the model assists analysts but never acts alone), or autonomous (the model acts without human review)? Answer in three parts: (1) the likely cost of an error if the model is wrong, (2) which archetype you would start with and why, and (3) what specific evidence you would need to see before moving to a more autonomous mode.

Follow-up ladder

Rung 1: The model has been running in human-in-the-loop mode for 6 months and achieves 97% precision on flagged anomalies. The VP of Finance wants to move to autonomous blocking for invoices under $10,000. Is that a reasonable next step?
Rung 2: You move to autonomous blocking for small invoices. Three weeks in, a legitimate vendor's invoices are all being blocked due to a data pipeline issue — the vendor's name changed slightly in the ERP system. How does this incident affect your autonomy model going forward?
Rung 3: The finance team says the model is "too conservative" — it flags 40 invoices per month but only 8 of those turn out to be real anomalies. What metric tells you whether the system is actually improving finance operations, and how do you use it to calibrate the threshold?
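
One way to approach the arithmetic in Rung 3 is sketched below. It is a minimal illustration, not a model answer. Only the 40 flagged invoices and 8 confirmed anomalies come from the scenario. The cost per analyst review, the cost per missed anomaly, the number of missed anomalies, and the two alternative thresholds are all hypothetical values chosen to show the calculation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThresholdOutcome:
    name: str
    flagged: int    # invoices flagged per month
    confirmed: int  # flagged invoices that turned out to be real anomalies
    missed: int     # real anomalies that were never flagged


def precision(o: ThresholdOutcome) -> float:
    """Share of flagged invoices that were real anomalies."""
    return o.confirmed / o.flagged if o.flagged else 0.0


def monthly_cost(o: ThresholdOutcome, review_cost: float, miss_cost: float) -> float:
    """Analyst review time plus the cost of anomalies that slipped through."""
    return o.flagged * review_cost + o.missed * miss_cost


# HYPOTHETICAL unit costs, not given in the scenario.
REVIEW_COST = 25.0    # cost of one analyst review of a flagged invoice
MISS_COST = 5_000.0   # average loss from one undetected anomaly

# flagged=40 and confirmed=8 come from Rung 3; missed=2 is an assumption.
current = ThresholdOutcome("current", flagged=40, confirmed=8, missed=2)
# Both alternatives are invented to illustrate the trade-off.
stricter = ThresholdOutcome("stricter", flagged=15, confirmed=6, missed=4)
looser = ThresholdOutcome("looser", flagged=100, confirmed=10, missed=0)

candidates = [current, stricter, looser]
best = min(candidates, key=lambda o: monthly_cost(o, REVIEW_COST, MISS_COST))

if __name__ == "__main__":
    for o in candidates:
        cost = monthly_cost(o, REVIEW_COST, MISS_COST)
        print(f"{o.name}: precision={precision(o):.0%}, monthly cost={cost:,.0f}")
    print(f"lowest-cost threshold: {best.name}")
```

Under these assumed costs the looser threshold is cheapest even though its precision is the lowest of the three, because one missed anomaly costs far more than one review. With a higher review cost or a lower miss cost the ranking could reverse, so the unit costs have to be estimated from real data before the threshold is moved.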