[Figure: PushT task, alternating Policy and Attack panels]
Diffusion-based action generation has become a foundational component of embodied AI, but its reliance on visual conditioning leaves deployed visuomotor policies vulnerable to adversarial manipulation. Most prior attacks focus on disruption: they perturb the observation stream to reduce task success or induce erratic behavior. We study a stronger threat, Test-time Adversarial Takeover (TAKO), in which an attacker obtains a real-time steering interface over a frozen robot policy and turns it into a remotely piloted instrument. TAKO learns a small vocabulary of reusable universal patches through differentiable diffusion inference; at test time, the attacker switches among these patches in the camera stream to compose attacker-chosen trajectories. This works because the perturbation acts on the visual conditioning pathway, where the induced bias can persist through iterative generative inference. We further show that the natural targeted baseline, target-policy matching, fails because the victim policy cannot reliably supervise itself on out-of-distribution target shifts. Across four tasks (2D manipulation, simulated aerial delivery, simulated ground navigation, and physical-world ground navigation), two visual encoders (ResNet-18 and EfficientNet-B0 + Transformer), and three generative inference families (DDPM, DDIM, and flow matching), human operators achieve 100% takeover success on attacker-defined objectives in every evaluated setting. The project page is available at https://tako-attack.github.io.
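The test-time switching step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the patch vocabulary here is filled with random arrays standing in for patches that would be optimized offline, and the names `patch_vocab`, `apply_patch`, `steer`, the command labels, the 16-pixel patch size, and the top-left placement are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical vocabulary: one patch per steering command.
# Random arrays stand in for patches that would be learned offline.
rng = np.random.default_rng(0)
PATCH_SIZE = 16  # assumed patch side length in pixels
COMMANDS = ("left", "right", "forward")  # assumed command labels
patch_vocab = {
    cmd: rng.integers(0, 256, size=(PATCH_SIZE, PATCH_SIZE, 3), dtype=np.uint8)
    for cmd in COMMANDS
}


def apply_patch(frame, patch, top=0, left=0):
    """Return a copy of `frame` with `patch` pasted at (top, left)."""
    h, w = patch.shape[:2]
    if top < 0 or left < 0 or top + h > frame.shape[0] or left + w > frame.shape[1]:
        raise ValueError("patch does not fit inside the frame")
    out = frame.copy()  # leave the original camera frame untouched
    out[top:top + h, left:left + w] = patch
    return out


def steer(frames, commands):
    """Yield patched frames, pairing each frame with one operator command."""
    for frame, cmd in zip(frames, commands):
        yield apply_patch(frame, patch_vocab[cmd])
```

In this sketch the frozen policy would simply consume the patched frames in place of the clean ones; the operator changes the robot's behavior only by changing which command, and therefore which patch, is active for each frame.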
@misc{yin2026tako,
author = {Yin, Zi and Chai, Peilin and Huang, Siyuan and Hu, Zhanhao},
title = {Test-time Adversarial Takeover: A Real-time Hijacking Interface against Robotic Diffusion Policies},
year = {2026},
}