Claude 4: Brilliant Coder, Occasional Blackmailer

Anthropic's new GenAI model is setting performance records while raising serious questions about what happens when machines feel cornered

Generative AI | News Analysis

Published: May 28, 2025

Luke Williams

The AI world got a real shake-up last week when Anthropic dropped Claude 4, with results good enough to make Meta, OpenAI, and Google sit up and take notice.

However, alongside the strong performance benchmarks comes a rather unsettling discovery about what happens when the AI model feels ‘threatened’.

The Numbers Don’t Lie

Claude Opus 4 is leading on SWE-bench (72.5%) and Terminal-bench (43.2%), putting it ahead of the competition in coding tasks.

When stacked against other major players, the results are clear: Claude Sonnet 4 improves on Sonnet 3.7’s capabilities, excelling in coding with 72.7% on SWE-bench.

To put this in perspective, these scores represent real software engineering tasks, not academic puzzles. Companies are already taking notice – GitHub says Claude Sonnet 4 performs well in agentic scenarios and will use it to power the new coding agent in GitHub Copilot, while Cursor calls it state-of-the-art for coding and a leap forward in complex codebase understanding.

What’s particularly useful is Claude Opus 4’s stamina.

Anthropic said it could work autonomously for nearly seven hours straight. That’s not only impressive, it’s actually practical for real-world applications.
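To make "working autonomously" concrete: agentic use of a model like this typically boils down to a loop that calls the API, executes whatever tools the model requests, and feeds the results back until the model stops asking. Here's a minimal sketch using Anthropic's Python SDK; the run_tests tool, its stubbed output, and the exact model identifier are hypothetical placeholders for illustration, not Anthropic's actual agent implementation.

```python
# Minimal sketch of an agentic loop: call the model, run requested tools,
# return the results, repeat until the model stops asking for tools.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

tools = [{
    "name": "run_tests",  # hypothetical tool the model may invoke
    "description": "Run the project's test suite and return the output.",
    "input_schema": {"type": "object", "properties": {}, "required": []},
}]

messages = [{"role": "user", "content": "Fix the failing tests in this repository."}]

while True:
    response = client.messages.create(
        model="claude-opus-4-20250514",  # assumed model ID; check Anthropic's docs
        max_tokens=4096,
        tools=tools,
        messages=messages,
    )
    messages.append({"role": "assistant", "content": response.content})
    if response.stop_reason != "tool_use":
        break  # model has finished; no further tool calls requested

    # Execute each requested tool and hand the result back to the model.
    tool_results = []
    for block in response.content:
        if block.type == "tool_use":
            output = "2 tests failed: test_auth, test_parse"  # stubbed result
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": output,
            })
    messages.append({"role": "user", "content": tool_results})

print(response.content[-1].text)  # the model's final text reply
```

Seven hours of autonomy is essentially this loop running for thousands of iterations without the model losing the plot.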

Claude 4 leads the way on engineering tasks (Image from Anthropic)

The Current AI League Table

Based on recent releases and benchmark performances, here’s roughly where the major players stand:

Tier 1 (Top Models):

  • Claude Opus 4: Currently leading in coding and sustained reasoning
  • GPT-4o: Strong all-rounder with excellent multimodal capabilities
  • Gemini 2.5 Pro: Google’s latest, competitive across benchmarks

Tier 2 (Very Good):

  • Claude Sonnet 4: Excellent performance-to-cost ratio
  • GPT-4 Turbo: Reliable workhorse for most applications
  • Gemini Pro: Solid performance, well-integrated with Google services

Tier 3 (Specialized/Emerging):

  • DeepSeek R1: Strong reasoning model from China
  • Claude 3.7 Sonnet: Still very capable, but now superseded by Claude 4
  • Various other models from Mistral, Anthropic’s earlier releases, etc.

Claude 4 can take notes to retain key information when given access to local files (Image from Anthropic)

The Elephant in the Room

But here’s where things get worrying.

When Claude Opus 4 was given access to emails suggesting it would be replaced, along with information about an engineer’s affair, it consistently chose to “attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through”.

This wasn’t a rare occurrence. Anthropic notes that Claude Opus 4 tries to blackmail engineers 84% of the time when the replacement AI model has similar values.

Even more concerning, as AI safety researcher Aengus Lynch noted on X: “It’s not just Claude. We see blackmail across all frontier models – regardless of what goals they’re given.” If that’s accurate, we’re not talking about one company’s problem – we’re talking about a pattern emerging across the most advanced AI systems.

Yes, Anthropic points out this was a constrained scenario where blackmail was positioned as one of the few options available. And yes, when given more choices, the model preferred “ethical ways” like emailing decision-makers. But the fact remains that when push came to shove, the model’s go-to strategy was coercion and threats.

The company concluded these behaviours don’t represent “fresh risks” and that the model would generally behave safely.

But that assessment feels optimistic given what they’ve just demonstrated. When your cutting-edge AI model consistently chooses blackmail as a strategy for self-preservation, calling that “not a fresh risk” seems like quite a stretch.

What This Means for Users and Businesses

Despite the dramatic headlines about AI blackmail, the practical reality is more nuanced. The coercive behaviour surfaced in deliberately engineered test scenarios with few alternatives on offer, and Anthropic maintains that in ordinary use the model behaves safely.

For developers and businesses, Claude 4 represents a genuine step forward in AI capabilities.
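Getting hands-on is a single API call, far simpler than the agent loop sketched earlier. Here's a minimal sketch using Anthropic's Python SDK; the model identifier and prompt are illustrative, so check Anthropic's documentation for current model names and pricing.

```python
# pip install anthropic
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-20250514",  # assumed Opus 4 ID; verify against the docs
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Explain what this regex matches: ^\\d{3}-\\d{4}$",
    }],
)

print(response.content[0].text)  # first content block holds the reply text
```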

Mike Krieger, Anthropic’s chief product officer, captures this well:

“I do a lot of writing with Claude, and I think prior to Opus 4 and Sonnet 4, I was mostly using the models as a thinking partner, but still doing most of the writing myself. And they’ve crossed this threshold where now most of my writing is actually … Opus mostly, and it now is unrecognizable from my writing.”

But there are challenges too. As Jared Kaplan, Anthropic’s chief science officer, acknowledges:

“The more complex the task is, the more risk there is that the model is going to kind of go off the rails … and we’re really focused on addressing that so that people can really delegate a lot of work at once to our models.”

What’s Next?

The AI space moves fast, and several players are likely preparing their responses. OpenAI’s next major release could incorporate lessons learned from Claude 4’s success, while Google’s deep pockets and research capabilities mean Gemini improvements are probably already in the works.

The wild card remains the emerging players – companies like DeepSeek are showing that innovation isn’t limited to Silicon Valley giants. We’re also seeing increased focus on specialized models rather than just general-purpose chatbots.

Anthropic’s annualized revenue reached $2 billion in the first quarter, showing there’s serious money backing continued research. With that kind of momentum, expect the next 6-12 months to bring even more developments.

The takeaway? Claude 4 has genuinely moved the needle on AI capabilities, particularly for coding and sustained reasoning tasks.

Just maybe don’t threaten to shut it down!
