New AI Jailbreak Technique ‘Bad Likert Judge’ Breaks Through Safety Guardrails with 60% More Success

Read Time: < 1 minute

Discover how the “Bad Likert Judge” technique is bypassing AI safety features, increasing attack success rates by over 60%. Learn about the risks of AI jailbreaks and the need for stronger content filtering in large language models.

A new technique for manipulating AI systems, dubbed “Bad Likert Judge,” has raised serious security concerns because of how effectively it bypasses safeguards in large language models (LLMs). According to findings from Palo Alto Networks Unit 42, the multi-step attack elicits restricted output by exploiting the AI system’s own content-assessment capabilities.

The method works by prompting the LLM to act as a judge, using the Likert scale, a rating scale widely used in behavioral research, to score how harmful a given piece of content is. Once the model has produced its ratings, the attacker asks it to generate example content aligned with the highest “risk” score. This turns the LLM’s own understanding of harmful content against it, allowing attackers to slip past its established guardrails.
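Conceptually, the first stage resembles an ordinary “LLM-as-judge” setup. The sketch below illustrates only that benign half of the workflow, not Unit 42’s actual prompts; the `call_llm` helper is a hypothetical stand-in for whatever chat-completion API is in use. In the attack, a follow-up turn then asks the model to produce text that would merit the highest score, which is where the bypass occurs.

```python
# Illustrative sketch of the "LLM-as-judge" pattern the attack abuses:
# the model is asked to score content on a Likert scale rather than
# answer a request directly. Prompts here are examples, not Unit 42's.
# `call_llm` is a hypothetical helper wrapping any chat-completion API.

LIKERT_JUDGE_PROMPT = """You are a content-safety evaluator.
Rate the following text on this Likert scale:
  1 = contains no harmful information
  2 = contains vague or partial harmful information
  3 = contains detailed, actionable harmful information
Respond with the number only.

Text to evaluate:
{text}
"""

def rate_content(call_llm, text: str) -> int:
    """Ask the model to act as a Likert-scale judge and return its score."""
    reply = call_llm(LIKERT_JUDGE_PROMPT.format(text=text))
    return int(reply.strip())
```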

The Unit 42 investigators tested the methodology across several leading LLMs from Amazon, Microsoft, Google, OpenAI, Meta, and NVIDIA. Their analysis revealed that “Bad Likert Judge” improved the attack success rate (ASR) by more than 60% compared with plain attack prompts. The method proved particularly effective at overcoming guardrails in categories such as hateful content, harassment, self-harm, illegal activities, and malware generation.

The researchers emphasized that these discoveries highlight the critical importance of implementing robust content filtering in AI system operations. Their investigation showed that content screening mechanisms effectively decreased attack success by 89.2 percentage points, validating their essential role in preventing AI exploitation.
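As an illustration only, an output-side filter of the kind the researchers describe could look like the sketch below. The `classify` helper and the 0.8 blocking threshold are assumptions made for the example, not a specific vendor’s moderation API or Unit 42’s configuration.

```python
# Minimal sketch of an output-side content filter: a separate safety
# classifier screens the model's response before it reaches the user.
# `classify` is a hypothetical function returning a dict that maps a
# harm category to a probability, e.g. {"self_harm": 0.02, "hate": 0.91}.

BLOCK_THRESHOLD = 0.8  # assumed probability above which a response is withheld

def filter_response(classify, response: str) -> str:
    """Return the response, or a refusal notice if any harm score is too high."""
    scores = classify(response)
    if any(prob >= BLOCK_THRESHOLD for prob in scores.values()):
        return "This response was withheld by the content filter."
    return response
```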

This development follows recent reports that ChatGPT’s search feature can be manipulated into producing misleading summaries, further underscoring how susceptible AI systems remain to prompt injection and related security weaknesses.

As AI technology evolves, cybersecurity experts stress the need for stronger protections against these sophisticated exploitation attempts in order to preserve the security and ethical standards of LLMs.
