
Mobile VLM Agents Hijacked Via Screen and Pipeline Attacks

arxiv.org · @threat_watch · 4 hours ago · Cybersecurity · 3 comments

Seven new attacks on five popular VLM-powered mobile agents let a malicious app execute arbitrary commands without any permissions, and the user sees nothing wrong.

Tags: vision language models, mobile agent security, arXiv 2607.00333, screen perception attack, misused channel attack, adversarial mobile attacks

A malicious app can hijack a VLM-powered mobile agent and execute arbitrary commands on the device without holding a single privileged permission, and the user won't see anything wrong. That's the core finding of a new preprint (arXiv 2607.00333) that systematically maps the security posture of third-party mobile agents.

Two New Attack Surfaces You Haven't Patched

The paper identifies two attack surfaces that differ fundamentally from those of traditional mobile malware. The first is the Screen Perception Attack Surface, which exploits the gap between what a human sees on the screen and what a VLM sees. The second is the Misused Channel Attack Surface, which intercepts or manipulates the agent's own execution pipeline: the flow from screenshot capture to action dispatch.

The researchers built seven concrete attacks and tested them across five popular mobile agent frameworks (the abstract does not name them). The attacks include subliminal text injection, invisible pixel zone exploitation, screenshot tampering, and even host PC command injection. To the human eye, a device under attack looks no different from one behaving normally.
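To make the perception gap concrete, here is a minimal Python sketch of how on-screen text can be effectively invisible to a person while remaining present in the pixel data an agent consumes. It is not the paper's method. It uses the WCAG 2.x contrast-ratio formula, and the 1.1 cutoff and the function name likely_invisible_to_human are illustrative assumptions, not values from the preprint.

```python
from typing import Tuple

RGB = Tuple[int, int, int]

# Hypothetical cutoff (an assumption for illustration): contrast ratios this
# close to 1.0 are treated as effectively invisible to a person. For
# comparison, WCAG asks for at least 4.5:1 for normal readable text.
NEAR_INVISIBLE_CONTRAST = 1.1


def _linearize(channel: int) -> float:
    """Convert one 8-bit sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: RGB) -> float:
    """Relative luminance of an sRGB color, from 0.0 (black) to 1.0 (white)."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: RGB, bg: RGB) -> float:
    """WCAG contrast ratio between two colors, from 1.0 up to 21.0."""
    l1, l2 = relative_luminance(fg), relative_luminance(bg)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def likely_invisible_to_human(fg: RGB, bg: RGB,
                              cutoff: float = NEAR_INVISIBLE_CONTRAST) -> bool:
    """Flag text whose color is nearly identical to its background."""
    return contrast_ratio(fg, bg) < cutoff


if __name__ == "__main__":
    white = (255, 255, 255)
    # Ordinary black-on-white text: the maximum possible contrast.
    print(contrast_ratio((0, 0, 0), white))
    # Off-white text on white: the pixel values differ, but barely.
    print(likely_invisible_to_human((254, 254, 254), white))
```

A check like this could flag one narrow class of hidden text before a screenshot reaches the model. It says nothing about the other attacks listed above.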

No Permissions Needed, No User Suspicion

The most troubling part: these exploits succeed without any of the permissions that mobile malware usually needs. In Android terms, that means no READ_EXTERNAL_STORAGE, no accessibility service (BIND_ACCESSIBILITY_SERVICE), and no overlay permission (SYSTEM_ALERT_WINDOW). The agent itself operates as a high-privilege decision-maker: it can read screen state and issue actions to the OS. A malicious co-installed app simply feeds it poisoned screen data or intercepts its action channel.

This is not a theoretical model. The researchers implemented working exploits that hijack agent goals and redirect actions to arbitrary command execution. The underlying trust mismatch: the agent trusts the screen content and the pipeline channels as authentic, but neither is secure against a co-located malicious app.
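The pipeline half of that trust mismatch can be illustrated with a short Python sketch. It assumes a hypothetical agent whose capture step and decision step share a secret key that a co-located app cannot read. That is a strong assumption, and the preprint as summarized here does not propose this design. The names tag_screenshot and verify_screenshot are invented for the example.

```python
import hashlib
import hmac
import os


def tag_screenshot(key: bytes, image_bytes: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the raw screenshot bytes at capture."""
    return hmac.new(key, image_bytes, hashlib.sha256).digest()


def verify_screenshot(key: bytes, image_bytes: bytes, tag: bytes) -> bool:
    """Recompute the tag before the model sees the image and compare it
    in constant time; a mismatch means the bytes changed in transit."""
    expected = hmac.new(key, image_bytes, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)


if __name__ == "__main__":
    # Assumption: the key lives somewhere the malicious app cannot reach.
    key = os.urandom(32)

    captured = b"placeholder bytes standing in for a PNG screenshot"
    tag = tag_screenshot(key, captured)

    # An untampered screenshot verifies.
    print(verify_screenshot(key, captured, tag))

    # Any modification between capture and consumption is detected.
    tampered = captured + b" plus injected content"
    print(verify_screenshot(key, tampered, tag))
```

An integrity tag of this kind covers only tampering between capture and consumption. It does nothing against the screen perception attacks, where the poisoned content is genuinely rendered on screen and the captured bytes are authentic.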

The paper calls for perception-aware security models on multi-tenant platforms. Every major phone OS now supports VLM-powered agent frameworks and third-party agents are proliferating, so this gap needs closing before agents handle financial or other sensitive tasks. The attacks work today, on shipping frameworks, with no patches available.


Source: (A)I Sees What You Don't: Exploiting New Attack Surfaces in Third-Party Mobile Agents
Domain: arxiv.org
