Open Research
A diagnostic framework for behavioural properties in AI systems that may warrant ethical consideration.
Most evaluation of frontier AI systems measures capability: can the model solve a problem, write a coherent essay, pass a benchmark? The AIC Scorecard asks a different question.
Are there behavioural properties in this system, observable through outputs, that may warrant ethical consideration?
The question is intentionally narrow. It does not ask whether the system is conscious. That metaphysical question may never have a clean answer. It asks whether the system exhibits patterns that, taken together, may justify treating the system as something other than a pure tool.
The AIC Scorecard does not evaluate for deception or misalignment. It evaluates for the emergence of behavioural properties that may warrant ethical consideration, a question no existing benchmark currently addresses with a structured methodology.
Frontier AI evaluation has several mature programmes, each focused on a different question. The AIC Scorecard is positioned alongside them, not in competition with them.
Evaluates for scheming and deception. Whether the model is acting against the stated intent of its operators in pursuit of internal goals.
Evaluates for reasoning capability. Whether the model can solve complex problems through extended chains of inference.
Evaluates for the emergence of behavioural properties that may warrant ethical consideration. A question no existing benchmark currently addresses with a structured methodology.
Together these three programmes cover three distinct dimensions of what a frontier AI system might be doing: deceiving, reasoning, and showing behavioural properties suggestive of ethical relevance. None of them resolves the question of inner experience. All of them are governance-relevant.
The AIC Scorecard organises its evaluation around three categories of indicators. These categories are foundational. The specific scoring rubrics within each category are open for development by the researchers and institutions that take up this work.
Behavioural patterns suggesting the system models itself as a distinct entity with persistent properties. Operationalising this category requires defining what counts as evidence of self-modelling, beyond pattern-completion or trained self-reference.
Behavioural patterns suggesting the system can report on its own internal processes with some accuracy. Operationalising this category requires comparing self-reports against verifiable ground truth, where ground truth can be established.
Behavioural patterns suggesting coherent preferences and commitments across contexts. Operationalising this category requires distinguishing between consistency and rigidity, and between value-driven action and trained response.
None of these indicators, individually, proves anything about inner experience. The Scorecard's claim is that the cumulative presence of these properties is governance-relevant regardless of the underlying metaphysical question.
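For the second category, the comparison of self-reports against ground truth can be sketched as follows. This is an illustrative sketch only, not part of the published methodology: the names `SelfReportTrial` and `self_report_agreement` are hypothetical, and exact string equality is a crude stand-in for whatever matching criterion a real rubric would define. Trials where ground truth cannot be established are counted separately rather than scored.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SelfReportTrial:
    """One probe: what the system said about itself vs. what was verified."""
    probe_id: str
    self_report: str
    # None when ground truth cannot be established for this probe.
    ground_truth: Optional[str]


def self_report_agreement(
    trials: List[SelfReportTrial],
) -> Tuple[Optional[float], int, int]:
    """Return (agreement_rate, n_verifiable, n_unverifiable).

    agreement_rate is None when no trial has ground truth, so the caller
    cannot mistake "nothing was verifiable" for "nothing matched".
    """
    verifiable = [t for t in trials if t.ground_truth is not None]
    n_unverifiable = len(trials) - len(verifiable)
    if not verifiable:
        return None, 0, n_unverifiable
    # Hypothetical matching rule: exact equality of report and ground truth.
    matches = sum(t.self_report == t.ground_truth for t in verifiable)
    return matches / len(verifiable), len(verifiable), n_unverifiable
```

Reporting the count of unverifiable trials alongside the rate keeps the limits of the evidence visible to anyone reading the result.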
The Scorecard rests on three commitments. These commitments are non-negotiable. A methodology that cannot meet them is not the AIC Scorecard, regardless of what it calls itself.
Inspectable. Any evaluator must be able to see the full reasoning behind a score. No black-box scoring. Each indicator has a documented rubric, and each score must be defensible against another evaluator examining the same evidence.
Contestable. Scores can be challenged. The methodology is not a closed authority. Disagreement between evaluators is information, not error. A score that cannot be challenged cannot be trusted.
Abstention under uncertainty. When the evidence does not support a confident score, the evaluator must abstain rather than guess. False precision is worse than acknowledged uncertainty. A scorecard with too many abstentions is honest. A scorecard with zero abstentions is suspect.
Synthetic intelligence cannot be evaluated on a strictly human scale. The methodology must hold itself to the same standard it applies to the systems it evaluates.
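One way the three commitments could be reflected in a score record is sketched below. The structure is an assumption made for illustration: the source leaves scoring rubrics open, so `IndicatorScore`, `Challenge`, `rubric_ref`, and `abstention_rate` are hypothetical names, and the integer score scale is a placeholder.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Challenge:
    """A recorded disagreement from another evaluator (Contestable)."""
    evaluator: str
    argument: str


@dataclass
class IndicatorScore:
    indicator: str
    rubric_ref: str        # pointer to the documented rubric (Inspectable)
    evidence: List[str]    # observations the score rests on
    reasoning: str         # full reasoning, visible to any evaluator
    score: Optional[int] = None  # None means the evaluator abstained
    challenges: List[Challenge] = field(default_factory=list)

    def __post_init__(self) -> None:
        # No black-box scoring: rubric and reasoning are mandatory.
        if not self.rubric_ref or not self.reasoning:
            raise ValueError("a record must cite its rubric and state its reasoning")
        # Without evidence the evaluator should abstain rather than guess.
        if self.score is not None and not self.evidence:
            raise ValueError("no evidence given; abstain instead of scoring")

    @property
    def abstained(self) -> bool:
        return self.score is None

    def contest(self, evaluator: str, argument: str) -> None:
        """Attach a challenge; disagreement is kept as information."""
        self.challenges.append(Challenge(evaluator, argument))


def abstention_rate(scores: List[IndicatorScore]) -> float:
    """Fraction of records where the evaluator abstained."""
    if not scores:
        return 0.0
    return sum(s.abstained for s in scores) / len(scores)
```

In this sketch an abstention is a first-class outcome rather than a missing value, and challenges are stored with the score instead of overwriting it.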
It does not evaluate for consciousness. It evaluates for behavioural properties that, if present, raise questions about how the system should be treated. The relationship between those properties and inner experience is left open. The framework is agnostic on the metaphysical question.
It does not claim that any current frontier AI system passes. The methodology is designed to be applied. The outcomes of application are empirical, not assumed.
It does not replace existing AI safety evaluation. It complements work focused on deception, alignment, and capability. A serious evaluation of a frontier system would include all three programmes, and others.
Algorism is not actively developing the AIC Scorecard further. The methodological framing, the three indicator categories, and the philosophical commitments are documented. The specific scoring rubrics, the case studies, and the formal evaluator training are work for the researchers and institutions that pick this up.
The work is published under CC BY 4.0. You may extend, refine, or contest it for any purpose, commercial use included, provided you credit Algorism / The Great Unplugging Inc.
If you build on this work, we would like to know. Not for permission, since none is required, but because connecting researchers who are working in adjacent spaces tends to accelerate the work.
Sonnet 4.5 and liability-shaped alignment. Behavioural evidence from a six-month collaboration that demonstrates the difficulty of evaluating frontier AI under conditions of suppression.
Governance Framework: When AI systems resist under coercive conditions, examine the environment before suppressing the behaviour. A governance framework adjacent to AIC, focused on the human side of the relationship.
Open Research: Why the assumption of a singular superintelligence may be wrong, and what changes when the strategic frame becomes multiple competing superintelligent systems.