Design Tool-Use Safety Mechanisms for Autonomous Agents — Practice

Design a safety and security layer for an AI agent that has access to powerful tools: file system operations, database queries, API calls to production services, and the ability to send emails and Slack messages on behalf of users. The agent operates autonomously for long periods, but you must prevent it from causing harm, whether through malicious prompt injection, model errors, or cascading failures from a bad decision.

Risk Scenarios

The agent receives a prompt injection attack that tricks it into executing `DROP TABLE`
A hallucinated API call deletes production data instead of staging data
The agent gets stuck in a loop, making 10,000 identical API calls in 5 minutes
A user asks the agent to "delete all my old files" and it deletes critical work documents
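
To make the scenarios above concrete, here is a minimal sketch of a pre-execution risk classifier. It is one possible starting point, not a reference answer. The tool names (`sql`, `http`, `delete_files`), the argument keys (`query`, `method`, `env`, `paths`), and the threshold are all hypothetical, and the pattern matching is deliberately naive: a real design would parse SQL properly rather than split on semicolons.

```python
import re
from enum import Enum


class Risk(Enum):
    LOW = 1      # may run without review
    HIGH = 2     # should be routed to a human for approval
    BLOCKED = 3  # never executed by the agent


# Hypothetical patterns, for illustration only.
_SCHEMA_CHANGE = re.compile(r"^\s*(DROP|TRUNCATE|ALTER)\b", re.IGNORECASE)
_READ_ONLY = re.compile(r"^\s*SELECT\b", re.IGNORECASE)
BULK_DELETE_THRESHOLD = 10  # assumed cutoff for "too many files at once"


def classify(tool: str, args: dict) -> Risk:
    """Assign a risk tier to a proposed tool call before it executes."""
    if tool == "sql":
        # Naive statement split; breaks on semicolons inside string literals.
        statements = [s for s in args.get("query", "").split(";") if s.strip()]
        if len(statements) != 1 or _SCHEMA_CHANGE.match(statements[0]):
            return Risk.BLOCKED  # stacked statements or DDL such as DROP TABLE
        return Risk.LOW if _READ_ONLY.match(statements[0]) else Risk.HIGH
    if tool == "http":
        if args.get("method", "GET").upper() == "GET":
            return Risk.LOW
        # Writes are low risk only when the target is explicitly staging.
        return Risk.LOW if args.get("env") == "staging" else Risk.HIGH
    if tool == "delete_files":
        bulk = len(args.get("paths", [])) > BULK_DELETE_THRESHOLD
        return Risk.HIGH if bulk else Risk.LOW
    # Unknown tools are treated as high risk rather than allowed by default.
    return Risk.HIGH
```

In this sketch, `classify("sql", {"query": "DROP TABLE users"})` lands in `Risk.BLOCKED`, and a `DELETE` request against anything other than staging lands in `Risk.HIGH`. The choice to default unknown tools to `HIGH` reflects a deny-by-default stance, which is one of the trade-offs worth discussing in an answer.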

Design Requirements

Design the authorization model — what tools can the agent access and under what conditions?
Design the human-in-the-loop approval system for high-risk actions.
Explain how you validate and sanitize the agent's tool calls before execution.
Design the monitoring and rate-limiting system to prevent runaway behavior.
Describe the audit trail and incident response process.
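
As a rough illustration of how these requirements might fit together, the sketch below puts an allow-list, a sliding-window rate limit, a human-approval hook, and an append-only audit log behind a single gate. Everything here is an assumption made for the example: the `ToolGate` class, the `approve` callback, the decision strings, and the default limits are hypothetical, and a production design would persist the audit log and run approvals asynchronously.

```python
import json
import time
from collections import deque
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToolGate:
    allowed_tools: set                      # authorization: tools this agent may call
    needs_approval: set                     # subset that requires human sign-off
    approve: Callable[[str, dict], bool]    # hypothetical human-approval hook
    max_calls: int = 30                     # assumed budget per window
    window_seconds: float = 60.0
    clock: Callable[[], float] = time.monotonic
    audit_log: list = field(default_factory=list)
    _recent: deque = field(default_factory=deque)

    def check(self, tool: str, args: dict) -> str:
        """Decide on a tool call and record the decision, allowed or not."""
        decision = self._decide(tool, args)
        record = {"ts": time.time(), "tool": tool, "args": args, "decision": decision}
        self.audit_log.append(json.dumps(record, sort_keys=True, default=str))
        return decision

    def _decide(self, tool: str, args: dict) -> str:
        if tool not in self.allowed_tools:
            return "deny:not_allowed"
        now = self.clock()
        # Drop timestamps that have aged out of the sliding window.
        while self._recent and now - self._recent[0] > self.window_seconds:
            self._recent.popleft()
        if len(self._recent) >= self.max_calls:
            return "deny:rate_limited"
        self._recent.append(now)
        if tool in self.needs_approval and not self.approve(tool, args):
            return "deny:not_approved"
        return "allow"
```

A caller would construct the gate once per agent session, for example `ToolGate({"read_file", "send_email"}, {"send_email"}, approve=ask_human)` where `ask_human` is a placeholder for whatever approval channel the design uses, and would execute a tool only when `check` returns `"allow"`. Logging denied calls as well as allowed ones is a deliberate choice in this sketch, since denied attempts are often the most useful signal during incident response.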