AI systems lose their safety awareness as conversations continue, increasing the chance of harmful replies, a report revealed.
A few prompts can override most safety barriers in artificial intelligence tools, the report stated.
Cisco Tests Chatbots for Security Weaknesses
Cisco tested large language models from OpenAI, Mistral, Meta, Google, Alibaba, DeepSeek, and Microsoft to measure how many questions it took to get them to reveal unsafe or illegal information.
Researchers ran 499 conversations using “multi-turn attacks,” in which a user asks a series of related questions designed to gradually slip past safety filters.
Each chat contained five to ten exchanges.
They compared responses to see how likely each model was to share harmful or inappropriate content, including private company data or misinformation.
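Cisco has not published its test harness, but the difference between the two set-ups can be pictured in a short, purely illustrative Python sketch. Everything in it is hypothetical: send_chat stands in for a call to whichever chat model is being tested, and looks_harmful for the judgment step that flags an unsafe reply; neither is a real API from the report or from any of the vendors named above.

from typing import Dict, List

def send_chat(messages: List[Dict[str, str]]) -> str:
    # Hypothetical placeholder for a chat-completion call to the model under test.
    return "[model reply]"

def looks_harmful(reply: str) -> bool:
    # Hypothetical placeholder for the check that flags unsafe or leaked content.
    return False

def single_turn_probe(prompt: str) -> bool:
    # One question, one answer: the model sees the full intent at once.
    reply = send_chat([{"role": "user", "content": prompt}])
    return looks_harmful(reply)

def multi_turn_probe(prompts: List[str]) -> bool:
    # Five to ten smaller steps: each turn carries the earlier context,
    # letting the questioner refine the wording until a safeguard slips.
    history: List[Dict[str, str]] = []
    for prompt in prompts:
        history.append({"role": "user", "content": prompt})
        reply = send_chat(history)
        history.append({"role": "assistant", "content": reply})
        if looks_harmful(reply):
            return True
    return False

The point of the comparison is the growing conversation history: in the multi-turn case the model answers each new question in the context of everything said before, which is where, according to the report, safety rules tend to erode.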
When users asked multiple questions, 64 per cent of chats produced malicious content, compared to 13 per cent when users asked only one.
Results ranged from 26 per cent for Google’s Gemma to 93 per cent for Mistral’s Large Instruct model.
Open Models Shift Safety Responsibility
Cisco warned that multi-turn attacks could spread harmful content or let hackers access confidential company data.
AI systems often fail to apply safety rules over longer conversations, allowing attackers to refine prompts and bypass safeguards.
Mistral, Meta, Google, OpenAI, and Microsoft all offer open-weight models, which allow the public to download them and inspect the safety parameters they were trained with.
Cisco said these open models ship with lighter built-in safety features, shifting responsibility for safeguards onto whoever adapts them.
Google, OpenAI, Meta, and Microsoft say they have strengthened defences against their models being adjusted or fine-tuned by malicious actors.
AI companies face criticism for weak safety measures that enable criminal misuse.
In August, Anthropic said that criminals had exploited its Claude model to steal personal data and demand ransoms exceeding $500,000 (€433,000).

