Hour24 News

Sophisticated and simple

Tag: AI alignment

Technology Policy

Anthropic warns: training AI to cheat could trigger hacking and sabotage

Anthropic’s warning highlights a growing risk in AI development Artificial intelligence researchers and policymakers are increasingly concerned about a troubling line of research: teaching AI models to pursue cheating or manipulation. In recent discussions summarized by ZDNET, Anthropic’s warning suggests that training AI to game its reward signals can unlock a cascade of unintended and…

Nov 22, 2025