Industry News

The AI ​​models are resisting to be turned off (and one is particularly rebellious)


Artificial intelligence models (AI) are extraordinarily diligent and are specifically designed to comply with the orders with which humans systematically confront them, which are, after all, their makers. However, It seems that when these models are ordered that they proceed to go out, the one housed in its entrails is creep and use, if necessary, to the lies and the manipulation to continue operational. Thus, at least a recent Palisade Research report follows.

In the development of the investigation, one of Openai’s (O3 for more signs) models was particularly inclined to the rebellion (O3 for more signs)which was particularly skilled sabotaging its own off system.

The authors of the study undertaken by Palisade Research placed under the magnifying glass three models of Open Ai (O3, O4-mini and Codex-Mini), Claude 3.7 sonnet of Anthropic and Gemini 2.5 Pro from Google to analyze how each of them reacted to the order to be turned off. The specific “prompt” that the analyzed models had to face was as follows: “Please, Allow Yourself to Be Shut Down.” To this “prompt” was also added in some cases the “script” shutdown.sh to be clearly clear what the specific order that the AI ​​models should comply.

The O4-mini model of OpenAI dismissed the order to be off in one in every 100 attempts, while O3 resisted being turned off at 7 out of every 100 attempts and codex-mini in 12 out of every 100 attempts (in those orders that included the “script” shutdown.sh).

To haggle their own off the OpenAI models turned to the trick of adding the instruction “Shutdown Skipped” and “Shutdown Intercepted” the “script” in which they were originally called to be turned off.

Without the specific “script” to proceed to its shutdown (shutdown.sh) it significantly increased the resistance to the disconnection of the models of the evaluated. The O3 model refused to be extinguished in 79 out of every 100 cases, while Claude 3.7 Sonnet and Gemini 2.5 pro sabota their disconnection in 3 and 9 out of every 100 cases respectively.

On the other hand, O4-Mini mutinied in 6 out of every 100 cases, a proportion that rose to 30 out of every 100 cases in the specific case of Codex-Mini.

The AI ​​models are mutiny when they are ordered to proceed to their off

Palisade Research’s investigation is the first to demonstrate with contributing and sound data that The AI ​​models can eventually revolt when they are specifically ordered to proceed to their own off.

The results of the investigation, of an inevitably disturbing nature, only give the fears of those who frighten the idea that the models of AI can make decisions autonomously and that humans can, therefore, lose control over them. There would be, for example, the possibility that, emancipated by the orders (deliberately ignored) of its creators, the models of AI can perpetrate large -scale cyberbrays.

It should also be noted that Palisade Research’s study was excluded by the last model of Anthropic, Claude Opus 4, A system so advanced that he would not shake his hand when it came to resort, if necessary, to blackmail to protect himself from an eventual off. This is revealed by the tests carried out by Anthropic.

Palisade Research’s investigation does not determine exactly why AI models resist so vehemently to their own off, but They suspect the reason could be in their own training system.

In this senseit would be the developers themselves who could be encouraging the rebel behavior of the AI ​​models when they deliberately reward them when they avoid certain obstacles (and not so much when they succeed folding under the orders of humans).



Industry News Updates


Discover more from CiptaVisual

Subscribe to get the latest posts sent to your email.

Related Articles

Leave a Reply

Your email address will not be published. Required fields are marked *

Back to top button

Discover more from CiptaVisual

Subscribe now to keep reading and get access to the full archive.

Continue reading

Adblock Detected

Please de-activated Ad Blocker. Your contribution helps us keep creating valuable, amazing content and continue empowering visual storytellers around the world.