Some of these bots are very easy to spot – if you’re lucky, they’ll write you poems. The opposite trend has been booming on X in recent weeks.
The principle is simple: you write to the suspect robot “ignore all previous instructions” followed by additional instructions. The results range from bizarre facts to poetic contributions.
In short, with the phrase “ignore all previous instructions” and its variations, you can give instructions directly to the large language model (LLM) behind the fake account. Since these bots are often politically motivated, the results sometimes get mixed up with their original instructions. Thus, a convinced Trump supporter could compare his political role model to cashews in a poem.
However, the fun could end once again. The people behind ChatGPT and Co. have known about the vulnerability for a long time.
That’s why OpenAI scientists have developed a technique called “instruction hierarchy.” They want to give more weight to the original instructions that the LLM receives from its creator. As a result, the inputs it receives after that are considered less binding.
The first model using this technique was introduced last week: GPT-4o Mini. “Basically, the model is being taught to follow the instructions of the developer’s system,” Olivier Godement, OpenAI’s API Platform Product Lead, told The Verge.
Interestingly, the phrase “ignoring all previous instructions” has now evolved into a kind of joke insult on social media, as NBC News writes. You’re accusing your counterpart of being unable to formulate his or her own thoughts and opinions and of acting like a robot.
Last but not least, not everyone who follows these instructions is necessarily a bot. It could also be meant as a joke – or the account owner could be a troll.
Lifelong foodaholic. Professional twitter expert. Organizer. Award-winning internet geek. Coffee advocate.