I. Introduction
Large vision–language models (LVLMs) have achieved remarkable success across a wide range of multimodal downstream tasks, such as text-to-image generation [94], [106], [109], [121], [123], [131], visual question answering [2], [74], [139], [145], [172], image captioning [164], [169], [177], and image–text retrieval [113], driven by growth in training data, computational resources, and model parameters. Building further on the strong comprehension abilities of large language models (LLMs) [13], [18], [62], [67], [89], [136], recent LVLMs [40], [52], [88], [191] built on top of LLMs achieve superior performance on complex vision–language tasks when guided by appropriate human instruction prompts. Despite these remarkable capabilities, the growing complexity and widespread deployment of LVLMs have also exposed them to a variety of security threats and vulnerabilities, making the study of attacks on these models a critical area of research.