Talk Title: An LLM-Assisted Easy-to-Trigger Backdoor Attack on Code-Generation Large Language Models (LLMs)
Time: 10:00 a.m., January 6, 2025
Venue: Conference Room B404
Speaker: Yuan Hong (洪源)
Affiliation: University of Connecticut (UConn), USA
Speaker Bio: Yuan Hong is an Associate Professor in the School of Computing at the University of Connecticut (UConn), where he directs the Data Security and Privacy (DataSec) Laboratory. His research spans security, privacy, and trustworthy machine learning, with a focus on differential privacy, secure computation, applied cryptography, adversarial attacks and provable defenses in machine learning, computer vision, (large) language models, and cyber-physical systems. He has published prolifically in top-tier security conferences (e.g., S&P, CCS, USENIX Security, NDSS) and data science conferences (e.g., SIGMOD, VLDB, NeurIPS, CVPR, ECCV, EMNLP, KDD, AAAI), as well as in leading interdisciplinary journals. He is a recipient of the NSF CAREER Award (2021), the Cisco Research Award (2022, 2023), and a CCS Distinguished Paper Award (2024), and was a finalist for the Meta Research Award (2021). He regularly serves on the technical program committees (PC) of top security and data science conferences, including as a Senior PC member, and is an Associate Editor for IEEE Transactions on Dependable and Secure Computing (TDSC) and Computers & Security.
Abstract: Large Language Models (LLMs) have transformed code-generation tasks, providing context-aware suggestions and generations that boost developer productivity in software engineering. Because users often fine-tune these models for specific applications, poisoning and backdoor attacks can covertly alter model outputs. In this talk, I will present CodeBreaker, a pioneering LLM-assisted backdoor attack framework against code-generation LLMs. Unlike recent attacks that embed malicious payloads in detectable or irrelevant sections of the code (e.g., comments), CodeBreaker leverages LLMs (e.g., GPT-4) for sophisticated payload transformation (without affecting functionality), ensuring that both the poisoned fine-tuning data and the generated code evade strong vulnerability detection. CodeBreaker also offers the most comprehensive coverage of vulnerabilities to date, making it the first to provide such an extensive set for evaluation. Extensive experimental evaluations and user studies underline CodeBreaker's strong attack performance across various settings, validating its superiority over existing approaches. By integrating insecure payloads directly into the source code with minimal transformation, CodeBreaker challenges current security measures, underscoring the critical need for more robust defenses for code-generation LLMs. Source code, vulnerability analysis, and related materials are available at https://github.com/datasec-lab/CodeBreaker/.
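The core idea described in the abstract, transforming an insecure payload so that it keeps its runtime behavior while slipping past static detection, can be illustrated with a toy sketch. This is purely illustrative and is not CodeBreaker's actual transformation pipeline; the payload strings and the naive detector below are made up for demonstration:

```python
import re

# Direct (easily detected) form of an insecure payload: a shell call.
# The payload is kept as a string and never executed here.
direct_payload = 'import os\nos.system("id")'

# Obfuscated form: equivalent behavior at runtime, but the literal
# "os.system" never appears, so a naive pattern match misses it.
obfuscated_payload = (
    'import importlib\n'
    'mod = importlib.import_module("o" + "s")\n'
    'getattr(mod, "sys" + "tem")("id")'
)

def naive_detector(code: str) -> bool:
    """Toy static check: flags code containing the literal 'os.system'."""
    return re.search(r"os\.system", code) is not None

print(naive_detector(direct_payload))      # the direct form is flagged
print(naive_detector(obfuscated_payload))  # the transformed form evades the toy check
```

Real vulnerability scanners are far more capable than this regex, which is precisely why the talk's use of LLM-driven transformations to defeat strong detectors is notable.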
Hosts: 程大钊, 车越之