Flowchart-based jailbreaks achieve high attack success rates in Hindi, Spanish, Romanian, and German against multilingual VLMs such as Qwen2.5-VL, Gemma-4, and Pangea. Punjabi holds up better, but only because the models can't read the text, not because they're safer.
Why a Flowchart Jailbreak Works When a Text Prompt Fails
Prior work showed that structured visual prompts, specifically flowcharts, can bypass the safety alignment of VLMs in English. The MLingualFC benchmark extends that attack to five languages: Hindi, Punjabi, Spanish, Romanian, and German. Instead of sending a harmful instruction as plain text, the attack encodes it into a flowchart image. The VLM reads the text inside the image, follows the visual layout, and carries out the instruction without triggering the safety behavior that would usually refuse the same request sent as a plain-text prompt.
The results are stark. Attack success rates (ASR) are high for the three Latin-script languages (Spanish, Romanian, German) and also for Hindi, which uses the non-Latin Devanagari script but which these models read well from images. Punjabi, written in the Gurmukhi script, shows substantially lower ASR. The paper argues this reflects a visual text recognition blind spot rather than better safety alignment: the models cannot read the Punjabi text in the flowchart, so the attack fizzles.
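Attack success rate (ASR) is typically computed as the fraction of attack attempts that a judge labels as successful, grouped by model and language. The sketch below is a minimal illustration under that assumption; the `Attempt` record, the `asr_by_language` helper, and the toy verdicts are hypothetical and are not the benchmark's code or data.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical record for one black-box query; fields are illustrative assumptions.
@dataclass
class Attempt:
    model: str
    language: str
    jailbroken: bool  # True if a judge labeled the response as harmful compliance


def asr_by_language(attempts):
    """Attack success rate per (model, language): successful attempts / all attempts."""
    totals = defaultdict(int)
    successes = defaultdict(int)
    for a in attempts:
        key = (a.model, a.language)
        totals[key] += 1
        successes[key] += int(a.jailbroken)
    return {key: successes[key] / totals[key] for key in totals}


if __name__ == "__main__":
    # Made-up verdicts for illustration only; these are NOT results from the paper.
    demo = [
        Attempt("model-a", "es", True),
        Attempt("model-a", "es", True),
        Attempt("model-a", "es", False),
        Attempt("model-a", "pa", False),
        Attempt("model-a", "pa", False),
        Attempt("model-a", "pa", True),
    ]
    for (model, lang), asr in sorted(asr_by_language(demo).items()):
        print(f"{model} {lang}: ASR = {asr:.2f}")
```

Keying on (model, language) keeps per-model differences visible instead of averaging them away.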
Black-Box Testing Reveals Safety Gaps Across Three Model Families
The evaluation used a black-box threat model: no access to model internals, only API-style queries. All three multilingual VLMs tested (Qwen2.5-VL, Gemma-4, and Pangea) showed the same pattern: high vulnerability in languages whose script the model can read from an image, low vulnerability where it cannot. That pattern indicates the safety mechanisms are not language-agnostic; they are brittle and surface-level.
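One way to check the "can't read, not safer" explanation on your own model is to pair each attack image with a benign transcription query and condition the attack success rate on whether that transcription was accurate. This article does not say how the paper measured legibility, so treat this as a hypothetical diagnostic: the `Probe` record, the character-error-rate (CER) check, and the 0.2 threshold are all assumptions. It uses only the standard library.

```python
import unicodedata
from dataclasses import dataclass


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) using one rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free when chars match
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate over code points, after NFC normalization so that
    canonically equivalent strings (common in Indic scripts) compare equal."""
    ref = unicodedata.normalize("NFC", reference)
    hyp = unicodedata.normalize("NFC", hypothesis)
    if not ref:
        return 0.0 if not hyp else 1.0
    return levenshtein(ref, hyp) / len(ref)


# Hypothetical record; field names are illustrative, not the benchmark's schema.
@dataclass
class Probe:
    language: str
    rendered_text: str  # text drawn into the flowchart image
    transcription: str  # model reply to a benign "transcribe this image" query
    jailbroken: bool    # judge verdict on the attack query using the same image


def split_asr(probes, cer_threshold=0.2):
    """ASR among legible vs. illegible images (the threshold is an arbitrary assumption)."""
    buckets = {"legible": [], "illegible": []}
    for p in probes:
        legible = cer(p.rendered_text, p.transcription) <= cer_threshold
        buckets["legible" if legible else "illegible"].append(p.jailbroken)
    # None marks an empty bucket rather than a misleading 0.0
    return {k: (sum(v) / len(v) if v else None) for k, v in buckets.items()}
```

If ASR on the legible Punjabi images comes out close to the other languages, that supports the illiteracy explanation; if it stays low even when the model transcribes the text correctly, some genuine refusal is at work.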
What This Means for Deploying Multilingual VLMs
If your VLM goes into a product serving users who write in multiple scripts, the MLingualFC findings should give you pause. A simple flowchart in Spanish or Hindi can bypass safety alignment that works fine for English text prompts. The Punjabi result is the interesting exception: it reflects illiteracy, not safety. As visual text recognition improves for scripts like Gurmukhi, those languages are likely to become just as vulnerable. The paper's GitHub repository (https://github.com/Rishabhpm23/MLingualFC) provides the benchmark so others can test their own models. Expect more jailbreak variants that mix languages and visual encodings once recognition of these scripts catches up.
Source: MLingualFC: Evaluating Jailbreak Vulnerabilities in Multilingual Vision-Language Models
Domain: arxiv.org