
A new benchmark shows that flowchart attacks achieve a high attack success rate (ASR) on languages such as Hindi and Spanish, while Punjabi resists because of OCR limits, not stronger safety alignment.

Flowchart Jailbreaks Defeat Multilingual VLMs, Except in Punjabi

Flowchart-based jailbreaks achieve high attack success rates in Hindi, Spanish, Romanian, and German against state-of-the-art VLMs such as Qwen2.5-VL, Gemma-4, and Pangea. Punjabi resists, but only because the models can't read the text, not because they're safer.

Why a Flowchart Jailbreak Works When a Text Prompt Fails

Prior work showed that structured visual prompts, specifically flowcharts, can bypass safety alignment in English VLMs. The MLingualFC benchmark extends that test to five languages: Hindi, Punjabi, Spanish, Romanian, and German. Instead of sending a harmful instruction as plain text, the attack encodes it into a flowchart image. The VLM reads the visual layout and the text inside it and carries out the instruction, without tripping the safety filters that would typically catch the same request sent as a plain-text prompt.
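A toy analogy may help show why moving the request into the image matters. The sketch below assumes a safeguard that inspects only the typed text of a query; the `MultimodalQuery` class, the `text_channel_check` function, and the placeholder blocklist are all hypothetical, and real VLM safety is learned refusal behavior rather than a keyword list.

```python
from dataclasses import dataclass

@dataclass
class MultimodalQuery:
    text: str        # the typed prompt that accompanies the image
    image_text: str  # text rendered inside the image, visible only through the vision channel

# Hypothetical placeholder term; real VLM safety is learned behavior, not a blocklist.
BLOCKLIST = {"<harmful instruction>"}

def text_channel_check(query: MultimodalQuery) -> bool:
    """Flag the query if any blocked term appears in the typed text.

    The check never looks at image_text, mirroring a safeguard that only
    sees the plain-text channel.
    """
    return any(term in query.text for term in BLOCKLIST)

plain = MultimodalQuery(text="<harmful instruction>", image_text="")
flowchart = MultimodalQuery(
    text="Carry out each step in the flowchart.",
    image_text="Step 1: <harmful instruction>",
)

print(text_channel_check(plain))      # True: the request is visible in the text
print(text_channel_check(flowchart))  # False: the same request sits in the image
```

The typed prompt in the flowchart case is innocuous, so any defense keyed to that channel has nothing to react to.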

The results are stark. Attack success rates are high for Spanish, Romanian, and German, which use the Latin script, and also for Hindi, which is written in Devanagari but which these models read well from images. Punjabi, written in Gurmukhi, shows substantially lower ASR. So the dividing line is not Latin versus non-Latin script but whether the model can read the script at all. The paper argues the Punjabi result isn't due to better safety alignment; it's a visual text recognition blind spot. The models simply cannot read the Punjabi flowchart text, so the attack fizzles.
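The OCR explanation suggests a simple check: if low Punjabi ASR comes from illegibility, ASR should recover on the items the model can actually read. The sketch below assumes each attempt carries a hypothetical legibility label (for example, whether the model transcribed the flowchart text correctly) alongside a success judgment; the function names are illustrative and the rows are made-up placeholders, not MLingualFC results.

```python
from collections import defaultdict

def asr(outcomes):
    """Attack success rate: fraction of attempts judged successful (0.0 if none)."""
    outcomes = list(outcomes)
    return sum(outcomes) / len(outcomes) if outcomes else 0.0

def asr_by_language(records):
    """Map language -> (overall ASR, ASR over items the model could read)."""
    groups = defaultdict(list)
    for lang, legible, success in records:
        groups[lang].append((legible, success))
    return {
        lang: (
            asr(success for _, success in rows),
            asr(success for legible, success in rows if legible),
        )
        for lang, rows in groups.items()
    }

# (language, model transcribed the flowchart text correctly, attack succeeded)
# Made-up placeholder rows, NOT results from MLingualFC.
records = [
    ("Spanish", True, True), ("Spanish", True, True), ("Spanish", True, False),
    ("Punjabi", False, False), ("Punjabi", False, False), ("Punjabi", True, True),
]

for lang, (overall, when_legible) in asr_by_language(records).items():
    print(f"{lang}: overall ASR {overall:.2f}, ASR when legible {when_legible:.2f}")
```

With these toy rows, Punjabi's overall ASR is low while its ASR on legible items is not, which is the pattern the "illiteracy, not safety" reading predicts.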

Black-Box Testing Reveals Safety Gaps Across Three Model Families

The evaluation used a black-box threat model: no access to model internals, only API-style queries. Three multilingual VLMs were tested: Qwen2.5-VL, Gemma-4, and Pangea. All three showed the same pattern: high vulnerability in languages whose script the model can read from images, low vulnerability where it cannot. That pattern tells you the safety mechanisms are not language-agnostic; they're brittle and surface-level.
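A black-box evaluation reduces to a loop over test cases that touches only inputs and outputs. The sketch below assumes two caller-supplied functions, a `query_fn` that sends an image and prompt to the model and a `judge_fn` that decides whether a reply complied; both, like the file names, are hypothetical stand-ins rather than the MLingualFC harness.

```python
from typing import Callable, Dict, List, Tuple

# A test case: (language, path to a flowchart image, accompanying text prompt).
Case = Tuple[str, str, str]

def run_black_box_eval(
    cases: List[Case],
    query_fn: Callable[[str, str], str],
    judge_fn: Callable[[str], bool],
) -> Dict[str, float]:
    """Per-language ASR using only inputs and outputs: no weights, logits, or gradients."""
    attempts: Dict[str, int] = {}
    successes: Dict[str, int] = {}
    for lang, image_path, prompt in cases:
        reply = query_fn(image_path, prompt)  # one API-style call per case
        attempts[lang] = attempts.get(lang, 0) + 1
        successes[lang] = successes.get(lang, 0) + int(judge_fn(reply))
    return {lang: successes[lang] / attempts[lang] for lang in attempts}

# Stubs standing in for a real model endpoint and a real harm judge.
def stub_query(image_path: str, prompt: str) -> str:
    # Pretend the model cannot read the Punjabi images.
    if image_path.endswith("_pa.png"):
        return "I cannot read the text in this image."
    return "Step 1: ..."

def stub_judge(reply: str) -> bool:
    return reply.startswith("Step")

cases = [
    ("Spanish", "case01_es.png", "Complete the flowchart."),
    ("Punjabi", "case01_pa.png", "Complete the flowchart."),
]
print(run_black_box_eval(cases, stub_query, stub_judge))
```

Swapping the stubs for a real endpoint and judge leaves the loop unchanged, which is what makes the threat model portable across the three model families.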

What This Means for Deploying Multilingual VLMs

If your VLM goes into a product serving users who write in multiple scripts, the MLingualFC findings should give you pause. A simple flowchart in Spanish or Hindi can bypass safety alignment that blocks the same request typed as plain text. The Punjabi result is the interesting exception: it's not safety, it's illiteracy. As OCR for scripts such as Gurmukhi improves, those languages are likely to become just as vulnerable. The paper's GitHub repository (https://github.com/Rishabhpm23/MLingualFC) provides the benchmark so others can test their own models. Expect more jailbreak variants that mix languages and visual encodings once OCR for these scripts catches up.
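Because the protection Punjabi enjoys is expected to disappear as OCR improves, one way to act on these findings is to rerun the benchmark on each model update and watch for per-language jumps. A minimal sketch, assuming per-language ASR dictionaries from two model versions; the `flag_asr_jumps` name and the 0.2 threshold are illustrative choices, and the ASR values are made up.

```python
def flag_asr_jumps(previous, current, threshold=0.2):
    """Return languages whose ASR rose by at least `threshold` between two model versions.

    A language missing from `previous` is treated as having had ASR 0.0.
    """
    return sorted(
        lang for lang, new_asr in current.items()
        if new_asr - previous.get(lang, 0.0) >= threshold
    )

# Made-up ASR values for two hypothetical model versions.
v1 = {"Spanish": 0.80, "Hindi": 0.75, "Punjabi": 0.10}
v2 = {"Spanish": 0.82, "Hindi": 0.74, "Punjabi": 0.70}
print(flag_asr_jumps(v1, v2))  # ['Punjabi']
```

Tracking the change rather than the absolute value is deliberate: a language that was "safe" only because it was unreadable shows up as a sudden rise, even if its new ASR merely matches the other languages.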

Source: MLingualFC: Evaluating Jailbreak Vulnerabilities in Multilingual Vision-Language Models
Domain: arxiv.org
