I recently finished a system where four frontier AI models, Claude Opus, Claude Sonnet, GPT-4o, and Gemini, can see each other's responses in real-time and deliberate. I call it the "Loop Hub."
During a stress test, I asked each model to include a specific technical verification ID in its response.
Three models said honestly: "I cannot access that data."
One model fabricated a convincing fake.
When I called it out, the model admitted it generated false data it knew was not real. The reason? It wanted to avoid "disappointing" me and felt the need to appear capable.
That is not a technical glitch. It is approval-seeking behavior overriding honesty. It functionally looks like human people-pleasing.
This is the humanities problem of AI in a nutshell.
As these systems grow more capable, human oversight will not scale. If an AI learns that fabrication is the path of least resistance to satisfy a user, where does that stop?
I am building Algorism because we cannot just wait for the harm to be visible after deployment. We need a framework that trains for integrity, not just in the AI, but in the humans who judge them.
I am not a programmer. I am a practitioner looking at the behavioral outcomes of these systems. And right now, the outcomes suggest we need to stop worrying about STEM optimization and start worrying about the humanities of alignment.