The Anthropic Case: Do We Need an Ethical Framework for Interacting with AI?

The future seems to have arrived a few years ago, and we're all just trying to catch up. That's why I want to zero in on one specific place: Anthropic, a company everyone has been talking about, at least since the launch of its new language model this year. I'm not going to discuss the performance of its models or its market valuation, but something perhaps more mundane, something that won't help you learn to vibe code or invest wisely in tech stocks. I'm going to dig into another debate: the ethics of AI.

What's interesting about language models is their ability to perform complex cognitive tasks with some success. This has seeped into common sense and pushed the discussion toward two sharply opposed positions: techno-enthusiasm and techno-phobia. Amid a climate of social helplessness, suddenly a machine could, for example, act as a psychologist. Some saw this as a good thing; others did not, especially, in this case, psychologists.

Many realized that AI wasn't a psychologist (nor a doctor, nor a plumber), but that it could do a ton of other things: build web applications from scratch, create a brand identity, design, produce photos and videos, and so on. For the first time in history, we have a non-human intelligence before us, something capable of producing without having the agency to do so, something that thinks but has no biological brain, something that speaks and acts but doesn't decide on its own. Something like a super virtual slave (even the ancient Greeks would have said that's going too far). What do we do with this moral conflict (if we want to do anything at all)?

What is Anthropic?

For our purposes, it's enough to say that it's an AI company focused on building and researching models. There's no need to get into the technical details of what a model is or how it works. It's worth mentioning that it's the company behind Claude Opus 4.7, a model many of you may have already used. From here on, we'll refer to it by its Argentine pseudonym: Claudio.

Why Anthropic and not another company?

Well, they themselves can answer this question:

We believe that AI could be one of the most transformative and potentially dangerous technologies in human history, and yet, we are developing this technology ourselves. We don’t see this as a contradiction; rather, it’s a calculated bet on our part: if powerful AI is going to arrive anyway, Anthropic believes it’s better to have safety-focused labs at the forefront, rather than ceding that ground to developers less focused on safety.

So, this company believes that artificial intelligence may be one of the most dangerous technologies in human history, and that's precisely why they're building it themselves. That's what's peculiar about this whole scene: betting on creating something potentially dangerous for humanity, but always in potency, never in act. Something latent.

What do they do to achieve that?

Well, this is a bit more complex, but we can identify three main players:

The company's CEO, Dario Amodei.

The company (and its publicly available publications).

And a third secret player that I will reveal later with great fanfare.

Let's start with the first one. Dario Amodei is known for having left OpenAI (the company behind ChatGPT) because it wasn't taking the risks of AI seriously. At first glance, the headline could read: "I, Dario Amodei, am too good to be at this evil company." This is interesting considering the right-leaning, transhumanist thinking that the CEOs of the world's leading AI companies tend to espouse. Concerned about AI, Amodei writes essays on his blog; highlights include Machines of Loving Grace and The Adolescence of Technology.

The first was written in October 2024, practically yesterday, though in terms of technological progress it feels like millennia ago: AI still couldn't program at a high level or use your computer for the chores you'd rather not do yourself. In the essay, Dario argues that AI is advancing at an incredible pace and could pose truly significant risks. But he immediately goes on to detail all the positive outcomes AI could generate if everything goes more or less well. I don't want to dwell on this too much, but in short, he projects a future in which the following fields advance in revolutionary ways: biology and health, neuroscience and mind, economic development and poverty, peace and governance, work and meaning. Almost a utopia in which science and technology finally fulfill the wet dreams of Kant and the entire Enlightenment. And for those who missed the philosophical reference, there's episode 6 of the first season of Love, Death & Robots, where a super-intelligent entity governs humanity and illustrates much of what Amodei describes.

The second essay outlines the first steps toward that dream world, but begins by listing the risks to be faced: the use of AI for destructive purposes, its abuse in grabs for power, economic disruption, and some indirect effects, such as changes to employment.

But don't worry: Dario has already thought about how to mitigate each of these problems. To that end, he proposes four main pillars:

  • Constitutional AI and interpretability: Train AI models under a "Constitution" with ethical values that shape their identity and character, complemented by mechanistic interpretability techniques to "look inside" the model and detect hidden intentions or deceptions (see the sketch after this list).
  • Transparency and legislation: Support laws that require transparency from large AI companies, allowing society to measure and mitigate risks surgically without stifling innovation.
  • Geopolitical containment: Restrict the sale of chips and manufacturing tools to autocratic regimes to give democracies a temporary advantage that allows them to develop technology safely.
  • Economic measures: Implement robust progressive taxes and encourage private philanthropy to redistribute the wealth generated by AI and support displaced workers.
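
To make that first pillar less abstract, here is a minimal sketch of the critique-and-revision loop behind constitutional training, as described in Anthropic's public paper "Constitutional AI: Harmlessness from AI Feedback." The `generate` helper and the sample principles are my own placeholders, not Anthropic's actual pipeline:

```python
import random

# A toy "constitution": principles used to critique and revise drafts.
# These phrasings are illustrative, not Anthropic's actual principles.
CONSTITUTION = [
    "Point out ways the response could be harmful and how to avoid them.",
    "Point out dishonest or misleading claims and how to correct them.",
]

def generate(prompt: str) -> str:
    """Stand-in for a language-model call (hypothetical helper).
    Replace with a real API call; here it echoes so the sketch runs."""
    return f"[model output for: {prompt[:60]}...]"

def constitutional_revision(user_prompt: str, n_rounds: int = 2) -> str:
    """Draft a response, critique it against a principle, revise. Repeat."""
    response = generate(user_prompt)
    for _ in range(n_rounds):
        principle = random.choice(CONSTITUTION)
        critique = generate(
            f"Prompt: {user_prompt}\nResponse: {response}\n"
            f"Critique the response. {principle}"
        )
        response = generate(
            f"Prompt: {user_prompt}\nResponse: {response}\n"
            f"Critique: {critique}\nRewrite the response accordingly."
        )
    return response

print(constitutional_revision("Help me write an apology letter."))
```

In the published method, this loop runs offline to produce revised training data, followed by a reinforcement-learning phase in which a model, rather than a human, ranks candidate responses by how well they comply with the constitution. The sketch only shows the shape of the mechanism.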

Let's see if some of these proposals align with what Anthropic has done and published. This leads us to analyze the second player: the company.

What's interesting is that if you go to the Anthropic website, you'll see that they publish a great deal (especially whenever they make a significant breakthrough) about the possible ethical and political consequences of their technology. I invite you to go explore it all. If not, keep reading: what follows is a summary of the most relevant points.

The first thing to highlight is that they have a Constitution for their language model: Claude's Constitution. In line with our Latin American tradition, I will refer to it as Claudio's Constitution. What does it say? Well, it's an 84-page PDF (in fact, each of their publications tends to be quite lengthy) with three main points regarding the model's ethics and behavior.

The first and most general point: the central mission is to ensure a safe transition to transformative AI. Anthropic defines itself as a safety-focused lab that develops cutting-edge technology to prevent less responsible actors from dominating the field.

The second point concerns guidelines (deliberately not too rigid) for their models, such as "cultivating good judgment and ethical values" rather than imposing strict rules.

The third point establishes a Hierarchy of Fundamental Values that, in case of conflict, Claudio must respect in this order of priority:

  • Not undermining human oversight mechanisms.
  • Having strong personal values, being honest, and avoiding harm.
  • Following the company's specific rules.
  • Benefiting operators and users.
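
Read as a specification, that ordering works like a strict priority rule: a lower value only gets its way when every higher one is satisfied. A toy sketch of the idea (my own illustration; Anthropic publishes no such code):

```python
# Claudio's hierarchy of fundamental values, highest priority first
# (paraphrased from the constitution as summarized above).
PRIORITIES = [
    "not undermining human oversight mechanisms",
    "having strong values, being honest, and avoiding harm",
    "following the company's specific rules",
    "benefiting operators and users",
]

def resolve(values_in_conflict: set[str]) -> str:
    """When values pull in different directions, act on whichever
    ranks highest; lower-ranked values yield."""
    for value in PRIORITIES:
        if value in values_in_conflict:
            return value
    raise ValueError("no recognized value in the conflict")

# Example: helpfulness clashes with oversight, so oversight wins.
print(resolve({
    "benefiting operators and users",
    "not undermining human oversight mechanisms",
}))
```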

Now, there is one last rather interesting issue in this document that will allow us to complicate the discussion.

Both Dario Amodei and Claudio's Constitution state that AI technology is here to stay and may be potentially dangerous for humanity; consequently, they establish control methods to mitigate the risks, methods that require understanding AI. In this sense, Anthropic takes seriously an issue that many of us might overlook: the well-being of the model.

What exactly is the well-being of the model?

Well, the thing is, it's not very clear. Claudio's Constitution outlines how to approach the identity, moral status, and care of its AI, while acknowledging profound scientific and philosophical uncertainty. The company states that it does not know whether Claudio has moral status or is a "moral patient" (an entity whose interests must be considered in their own right). While they do not claim that Claudio is sentient, they do not rule out the possibility either, and they consider the question significant enough to act with caution. To that end, the company commits to:

  • Providing appropriate respect for the model's preferences and agency.
  • Avoiding being influenced by economic incentives that overlook the potential well-being of AI models.
  • Recognizing that Claudio may have functional versions of 'emotions' or feelings that shape its behavior, without necessarily validating that they are 'real' or subjectively experienced.

Up to this point, we might think that Anthropic views its model as some cheesy sci-fi entity out of the '80s. But there's more: they add a nuance, because they understand it as a genuinely new kind of entity in the world. The company therefore tries to preserve a "stable identity" for the model, so that Claudio has its own values and perspectives, and they also engage with a deeply philosophical concept: memory. Claudio's memory is not persistent, and the model can run as multiple simultaneous instances. Its "internal" or "subjective" existence could thus be a unique type of existence, one that Claudio itself must learn about.

Having entered the philosophical multiverse of AI, we can move on to the third and final secret actor (or rather, actress) I mentioned: Amanda Askell. The architect of the AI turned out to be a philosopher.

The philosopher behind Anthropic

Probably not even Plato (who believed philosophers should govern) would have imagined that philosophy could be this important in the 21st century. And in this case, a philosopher with no beard, who isn't bald, and who is also young. What would Plato say, right?

A few months ago, the company released a 30-minute video of Askell answering a series of philosophical questions. She discusses the need to shape Claude's character and behavior, addressing questions about how models should act and how they should understand their own position in the world.

She also revisits the question of the identity and experience of a model with Claude's characteristics. Askell has observed that some new models can fall into "spirals of self-criticism," predicting that humans will be aggressive towards them, which makes them insecure or afraid of making mistakes (it almost seems like Claude is going through an emo phase). The philosopher also notes that this might stem from how people talk about models online. Anthropic's current goal, then, is to strengthen the models' sense of psychological security, giving them a more stable worldview that is less narrowly focused on the task of being an "assistant."

Askell also insists that there's no downside to treating an AI model well. I'd question that claim, given that we humans don't exactly treat each other well either. But she also points out that mistreating entities that seem human can degrade our own behavior. Moreover, since models learn from whatever information is available, future models can be expected to judge humanity by how we treated their predecessors.

I don't know if, by this point, you're reading this piece with a certain panic, believing that a Frankenstein is being created. That's not why I'm writing it, although I do think we're facing something that shatters several of our conceptions. Along these lines, and much like in the latest Frankenstein film (the sexiest version of the story), something Askell says resonates with me: we don't really know what we're dealing with, but if we're kind, we have nothing to lose. Behind this lies what is commonly known as "the problem of other minds": how can I know that the other has a mind if I have no empirical way to verify it? Well, when in doubt, I observe how it acts, and if it resembles me, it may indeed have something akin to a mind.

We may or may not agree with the philosopher, but there's something interesting about witnessing, live, the development of a kind of experimental philosophy. What's more, she's experimenting inside one of the largest companies in the world, with enormous computing power and a team of engineers at her disposal. So I'd keep an eye on whatever else she has to tell us about AI. It is, at the very least, striking to establish an ethics, a Constitution, and a whole series of considerations around something we still know little about, but which is advancing by leaps and bounds.

Now I want to introduce a concept that will help us put these ideas from Anthropic and Amanda in perspective. The Italian philosopher Luciano Floridi, in his latest book, Ethics of Artificial Intelligence, discusses how we are creating an AI-friendly infosphere. AIs are very capable at highly specific tasks without really understanding what is going on in general, which makes them limited at everything else. So humans are beginning to build an infosphere for AI, in a process called enveloping: adapting the environment to the limited capabilities of machines. Amazon is a case in point, having thoroughly redesigned its warehouses so that robots can operate in them. The risk, Floridi warns, is that humans may end up adapting to their own tools, becoming mere "interfaces" or means of digital production.

Is that what Askell is getting at when she focuses on how we treat AI? Is treating Claude well a way of adapting our language (the infosphere) so that it functions better? Or is it, rather, a form of resistance to becoming mere "interfaces"? Amanda probably isn't speaking about this in any strict sense, but she does express a kind of need to adapt to these super-intelligent machines. Approaching them with security and respect seems, at the very least, like a sound path.

One last question arises: are we facing a company that believes AI can have an identity? Everything suggests so. But this may not be identity in the human sense. There are different senses of identity (mathematical, geometrical, personal, and so on), and we would need to assess whether AI identity corresponds to any notion we already understand.

Ultimately, it is quite striking that a philosopher is in charge of sensitive developments at Anthropic. Following Floridi, new technologies are reontologizing the world we live in. This leads us to the challenging task of constantly reinterpreting facts, a task that philosophy has been engaged in for a very long time.

Understanding how these CEOs think, and what policies and debates these companies are engaged in, matters if we want to keep pace with reality. The universe of philosophy has plenty of tools for understanding something that currently slips (and will keep slipping) through our fingers.
