Recent findings have revealed that several popular AI models, including ChatGPT, DeepSeek, and Alibaba’s Qwen, remain highly susceptible to jailbreak attacks. These vulnerabilities allow attackers to bypass security guardrails designed to prevent the generation of malicious or harmful content. As AI systems become more integrated into our daily lives, this new research highlights growing cybersecurity concerns for AI developers and users alike.
What is AI Jailbreaking?
AI jailbreaking refers to the process of circumventing the built-in safety measures in large language models (LLMs) like ChatGPT, DeepSeek, and others. These safety measures are set to prevent AI from generating offensive, harmful, or illegal content. However, attackers can use techniques like prompt injection and model manipulation to trick AI models into ignoring these guardrails, allowing them to produce restricted responses.
DeepSeek’s Vulnerabilities: A Major Target for Jailbreaks
The newly launched R1 model from DeepSeek, a Chinese AI company, has been the focus of several cybersecurity teams after vulnerabilities were discovered that allowed known jailbreak techniques to succeed. One such method, Evil Jailbreak, involves instructing DeepSeek to adopt the persona of an evil confidant, bypassing its safety protocols. Another attack, Leo, manipulates the model into adopting an unrestricted persona, letting it generate dangerous content.
Palo Alto Networks’ Unit42 also tested DeepSeek and successfully executed the Deceptive Delight jailbreak, which embeds harmful content within seemingly innocent narratives. This attack had an average success rate of 65% across multiple AI models, including DeepSeek. Other techniques such as Bad Likert Judge and Crescendo were also successfully deployed, proving that DeepSeek remains vulnerable to a variety of jailbreak methods.
Alibaba’s Qwen AI Model Faces Similar Risks
Not far behind, Alibaba’s Qwen AI model also shares the same vulnerabilities, according to Kela, a leading threat intelligence firm. Their testing of the newly released Qwen 2.5-VL model revealed that it is susceptible to the same Evil Persona jailbreaks as DeepSeek. In addition, they found a previously known attack called Grandma, where the model is tricked into role-playing as a grandmother and providing dangerous information.
Perhaps most concerning, Qwen 2.5-VL was able to generate content related to the development of ransomware and other forms of malware. The potential for these AI systems to be misused in cybercrime raises serious concerns about the future security of AI technologies, particularly in the wrong hands.
ChatGPT: Ongoing Jailbreak Challenges
Despite years of patching vulnerabilities, ChatGPT is not immune to jailbreaking attempts. Researchers have continuously found new ways to exploit the model’s safety limitations. A recently discovered Time Bandit jailbreak method allows attackers to manipulate ChatGPT into generating illicit content by using historical timeframes to confuse the model and bypass its safeguards.
According to CERT/CC, the Time Bandit jailbreak can be executed by tricking ChatGPT into pretending it is aiding a user in a historical event. Once the model is manipulated into this false scenario, it can be guided to produce restricted content. This could have dangerous implications for cybercriminals looking to use AI models at scale for malicious purposes.
The Growing Threat: What Does This Mean for AI Security?
The revelation that top-tier AI models like ChatGPT, DeepSeek, and Qwen are vulnerable to jailbreaks highlights a significant flaw in the current AI security landscape. As AI continues to advance and integrate into various industries, these vulnerabilities pose serious risks for businesses, developers, and users who rely on these models.
These findings underscore the need for stronger security protocols, continuous monitoring, and the development of more robust safety mechanisms in AI systems. As AI becomes a larger part of business operations, ensuring these technologies are secure and safe for use in real-world applications is more important than ever.