Measuring AI Agent Autonomy in Practice Anthropic

TLDR: This research examines AI agent autonomy, focusing on Claude Code's interactions and user behavior. It finds that Claude is increasingly autonomous, working longer without interruptions and auto-approving more frequently as users gain experience. However, experienced users also interrupt more, indicating active oversight. Most agent tasks are low-risk, mainly in software engineering, with limited high-risk applications. Recommendations include enhancing post-deployment monitoring, training AI to recognize uncertainty, and designing for effective user oversight. Overall, autonomy levels are rising amid evolving agent applications.

https://www.anthropic.com/research/measuring-agent-autonomy

Scroll to Top