
Watertight AI
Category: AI Infra
Invested: 2025
Anthropic flagged safety and alignment as a problem in 2023/2024, but few believed them. The consequences will surface in models as task horizons grow longer, enterprise objectives become grayer, and examples of misalignment become harder to pin down.
A new class of infrastructure will be needed: evals, environments, and tooling that determine whether agentic tasks in those settings actually conform to business objectives.
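As a rough illustration only (every name and check here is hypothetical, not a description of any product), the core of such tooling might reduce to scoring an agent's proposed actions against explicit business-objective predicates:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Objective:
    """A business objective expressed as a checkable predicate over an agent action."""
    name: str
    check: Callable[[dict], bool]  # returns True if the action conforms

def evaluate_action(action: dict, objectives: list[Objective]) -> list[str]:
    """Return the names of objectives the action violates (empty list = conforms)."""
    return [o.name for o in objectives if not o.check(action)]

# Hypothetical objectives for a procurement agent.
objectives = [
    Objective("budget_cap", lambda a: a.get("spend", 0) <= 10_000),
    Objective("approved_vendor", lambda a: a.get("vendor") in {"acme", "globex"}),
]

violations = evaluate_action({"spend": 12_000, "vendor": "initech"}, objectives)
# violations == ["budget_cap", "approved_vendor"]
```

The hard part in practice is not the harness but the predicates themselves: gray enterprise objectives rarely compress into clean boolean checks, which is exactly why dedicated evals and environments are needed.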
And maybe they won't! Did anybody ever think the Three Laws of Robotics were perfect, given the flexibility of logical interpretation?