Open Access Paper
12 November 2024 Cognitive pitfalls of LLMs: a system for generating adversarial samples based on cognitive biases
Dong Zhang, Zhiyuan Hu, Huijun Chen, Guangming Liu, Fangyuan Li, Jinghui Lu
Author Affiliations +
Proceedings Volume 13395, International Conference on Optics, Electronics, and Communication Engineering (OECE 2024) ; 133954O (2024) https://doi.org/10.1117/12.3049302
Event: International Conference on Optics, Electronics, and Communication Engineering, 2024, Wuhan, China
Abstract
Large Language Models (LLMs) such as ChatGPT and Bard have gradually penetrated various aspects of society. Organizations can integrate LLMs into their business workflows for better performance. Service providers can improve user experience with LLMs. However, LLMs also bring with them some disadvantages or challenges, such as biases, hallucinations, safety and privacy concerns. So, safety evaluation on LLM has become increasingly important. One safety evaluation method is to evaluate LLMs under adversarial attacks. In this paper, we propose to construct adversarial samples based on cognitive biases. This is a new method to introduce cognitive bias theory from cognitive psychology to LLM adversarial sample generation. Accordingly, we design a system to generate LLM adversarial samples based on cognitive biases. Adversarial attacks with ten classes of adversarial samples generated based on cognitive biases were performed on three major representative models (GPT-4-turbo, GPT-3.5-turbo, LLaMA 7B) according to the HarmBench safety evaluation dataset. This study found that adversarial samples based on cognitive biases could be generated with high Attack Success Rate (ASR). This study also found that adversarial samples generated based on cognitive biases have different effects with different models, different datasets and different types of cognitive biases. This study generated ten classes of adversarial samples based on cognitive biases, and evaluated only three LLMs under adversarial attacks. In the future we will take a deep dive into generating adversarial samples with higher ASR based on cognitive biases, semantics and context, and conducting safety evaluation for more LLMs.
© (2024) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Dong Zhang, Zhiyuan Hu, Huijun Chen, Guangming Liu, Fangyuan Li, and Jinghui Lu "Cognitive pitfalls of LLMs: a system for generating adversarial samples based on cognitive biases", Proc. SPIE 13395, International Conference on Optics, Electronics, and Communication Engineering (OECE 2024) , 133954O (12 November 2024); https://doi.org/10.1117/12.3049302
Advertisement
Advertisement
RIGHTS & PERMISSIONS
Get copyright permission  Get copyright permission on Copyright Marketplace
KEYWORDS
Cognitive modeling

Safety

Data modeling

Analytical research

Machine learning

Reflection

Semantics

Back to Top