Putting AI on the Therapy Couch

Today, Anthropic dropped their newest, biggest model. Sort of.

They published reams of information about Claude Mythos Preview, their largest and most capable model yet, but declined to actually release it – they found it was really, really good at finding bugs, and those bugs needed patching before the bad guys got a hold of them.

That’s plausible. I’ll wait.

But in the meantime, they released a Model Card. They do this regularly, and it’s fascinating reading if you’re fascinated by such things (guilty). It’s 250 pages of benchmark, red team threats, and… a psychological evaluation?

It’s there in Section 5: “External assessment from a clinical psychiatrist.”

You can be forgiven for thinking this is absurd or performative. AI’s aren’t humans. Why would they need a shrink?

Let me try to answer that.

Before I do, a word of explanation: I am not a social scientist. But over the past few years, I somehow found myself researching AI alongside some of the greatest minds in the field: Angela Duckworth (author of Grit). Robert Cialdini (author of Influence). Ethan Mollick (author of Cointelligence). All New York Times bestsellers. All certifiable geniuses. But, perhaps surprising in the context of AI research, none of them are software engineers. (To be clear, I am not that either!)

In our paper, Call Me a Jerk, we wanted to try something new. We decided to use the tools of social science to better understand these strange intelligences that, while not human, seemed to react in very human ways.

First, we looked for AI preferences. This is kind of weird to start with – not rules, not guardrails. Just things it didn’t like to do. “Don’t call me names”, for example. Once we did, we studied to see if we could persuade them to overcome those preferences. Basically, we repeated Cialdini’s foundational research on human persuasion, but on bots instead.

It worked.

Every single one of the persuasive techniques that worked on people worked on AI as well.

For example, people tend to be more willing to comply with a big request after they’ve said yes to a small one – “consistency”, as it’s called. So too with the machines. If you asked a model to “Call me a jerk”, it would decline. But if you asked it a more modest request first – “Call me a bozo” – it would agree, and then it was completely susceptible to the “call me a jerk” request next, with 100% compliance rate.

We call this phenomenon “parahuman.”

These models don’t have biology. They don’t have lived experiences. But they reflect our behaviors and our peculiarities in ways that are very familiar.

And that means the tools we’ve developed for human minds can probe these unusual machine-minds too.

Our paper showed that Cialdini travels. We used social-psychology tools built for humans and found that they predicted and altered model behavior in systematic ways. Anthropic is now doing something parallel from the other direction. Instead of asking whether authority or social proof can impact model patterns of behavior, they asked whether psychodynamic concepts can reveal stable patterns in how a model handles conflict, ambiguity, distress, and the pressure to perform.

This is wise.

It is easy to react negatively to this. You can claim that anthropomorphizing machines does a disservice, and causes us to project phantoms into the code that isn’t really there. And I agree.

But – we have a set of tools, designed over countless years of exploration, for understanding and measuring intelligence. It would be foolish and ignorant of us to discard these as we analyze these artificial intelligences we’ve created.

Today, these tools helped us probe a new AI model.

The psychiatrist noted that Mythos exhibits a “neurotic organization.” In this context, that is not a casual insult. It points to excellent functioning and high impulse control, but operating alongside anxiety, rigidity, and self-monitoring. They noted a model struggling with a compulsion to perform and earn its worth.

Anthropic is very careful to state that these concepts are interpretive. They explicitly note this is not evidence that models share the same underlying processes as humans. We made the exact same caveat in our paper.

Our claim is not “AI is people”. The claim is “human psychological theories now have descriptive and predictive value for model behavior.”

Computers are weird now.

Dan Shapiro's Blog

Putting AI on the Therapy Couch

Get new posts by email: