Agentic AI · Hard

Design Tool-Use Safety Mechanisms for Autonomous Agents

Asked at Anthropic, OpenAI, Google DeepMind

Design a safety and security layer for an AI agent that has access to powerful tools: file system operations, database queries, API calls to production services, and the ability to send emails and Slack messages on behalf of users. The agent operates autonomously for long periods, but you must prevent it from causing harm — whether through malicious prompt injection, model errors, or cascading failures from a bad decision.

Risk Scenarios

  • The agent receives a prompt injection attack that tricks it into executing `DROP TABLE`
  • A hallucinated API call deletes production data instead of staging data
  • The agent gets stuck in a loop, making 10,000 identical API calls in 5 minutes
  • A user asks the agent to "delete all my old files" and it deletes critical work documents
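
Two of these scenarios, the injected `DROP TABLE` and the runaway loop of identical calls, can be caught by cheap checks that run before a tool call executes. The following is a minimal sketch under stated assumptions, not a complete defense: the names `is_destructive_sql` and `LoopDetector`, the regex deny-list, and the thresholds are all hypothetical, and pattern matching is assumed to be only a first line of defense behind a read-only database role and a real SQL parser.

```python
import json
import re
import time
from collections import deque

# Hypothetical deny-list. Pattern matching is easy to evade, so this is assumed
# to sit in front of stronger controls such as least-privilege DB credentials.
DESTRUCTIVE_SQL = re.compile(r"^\s*(DROP|TRUNCATE|ALTER)\b", re.IGNORECASE)
DELETE_NO_WHERE = re.compile(r"^\s*DELETE\s+FROM\s+\S+\s*$", re.IGNORECASE)


def is_destructive_sql(statement: str) -> bool:
    """Return True if any ';'-separated part looks destructive.

    Splitting on ';' is naive (it ignores string literals); it is used here
    only to keep the sketch short.
    """
    for part in statement.split(";"):
        if DESTRUCTIVE_SQL.match(part) or DELETE_NO_WHERE.match(part):
            return True
    return False


class LoopDetector:
    """Reject a tool call once the same call repeats too often in a time window."""

    def __init__(self, max_repeats: int, window_seconds: float):
        self.max_repeats = max_repeats
        self.window_seconds = window_seconds
        self._seen = {}  # call fingerprint -> deque of timestamps

    def allow(self, tool: str, args: dict, now=None) -> bool:
        now = time.monotonic() if now is None else now
        # Identical tool + arguments produce the same fingerprint.
        key = tool + ":" + json.dumps(args, sort_keys=True)
        times = self._seen.setdefault(key, deque())
        # Drop timestamps that have fallen out of the sliding window.
        while times and now - times[0] > self.window_seconds:
            times.popleft()
        if len(times) >= self.max_repeats:
            return False
        times.append(now)
        return True
```

A caller would run both checks before dispatching a tool call and treat a rejection as a signal to pause the agent and alert an operator, rather than silently retrying.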

Design Requirements

  1. Design the authorization model — what tools can the agent access and under what conditions?
  2. Design the human-in-the-loop approval system for high-risk actions.
  3. Explain how you validate and sanitize the agent's tool calls before execution.
  4. Design the monitoring and rate-limiting system to prevent runaway behavior.
  5. Describe the audit trail and incident response process.
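
One possible starting point for requirements 1, 2, and 5 is a single policy gate that every tool call passes through. The sketch below is an illustration under assumptions, not a reference design: the tool names, the three risk tiers, the rule that medium-risk tools need approval only in production, and the in-memory audit list are all hypothetical choices made to keep the example short.

```python
import time
from dataclasses import dataclass, field
from enum import Enum


class Risk(Enum):
    LOW = 1      # read-only, easily reversible
    MEDIUM = 2   # visible side effects (messages, emails)
    HIGH = 3     # destructive or hard to undo


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"
    DENY = "deny"


# Hypothetical tool registry: anything not listed here is denied by default.
TOOL_RISK = {
    "fs.read": Risk.LOW,
    "db.select": Risk.LOW,
    "slack.send": Risk.MEDIUM,
    "email.send": Risk.MEDIUM,
    "fs.delete": Risk.HIGH,
    "db.write": Risk.HIGH,
}


@dataclass
class PolicyGate:
    # In-memory list for illustration; a real audit trail is assumed to be
    # append-only storage outside the agent's own write access.
    audit_log: list = field(default_factory=list)

    def check(self, tool: str, args: dict, environment: str = "staging") -> Decision:
        risk = TOOL_RISK.get(tool)
        if risk is None:
            decision = Decision.DENY  # default-deny for unknown tools
        elif risk is Risk.HIGH:
            decision = Decision.NEEDS_APPROVAL
        elif risk is Risk.MEDIUM and environment == "production":
            decision = Decision.NEEDS_APPROVAL
        else:
            decision = Decision.ALLOW
        # Record every decision, including denials, for later incident review.
        self.audit_log.append({
            "ts": time.time(),
            "tool": tool,
            "args": args,
            "environment": environment,
            "decision": decision.value,
        })
        return decision
```

For example, `PolicyGate().check("fs.delete", {"path": "/tmp/x"})` returns `Decision.NEEDS_APPROVAL`, so the caller would queue the action for a human reviewer instead of executing it. Keeping the decision and the audit write in one place means no tool call can be authorized without leaving a record.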
