Speaker Details
Ting Wang
Stony Brook University
Dr. Ting Wang is currently an Associate Professor and Empire Innovation Scholar in the Computer Science Department at Stony Brook University. His research explores the intersection of machine learning, privacy, and security, aiming to develop safe and trustworthy artificial intelligence (AI) technologies. His recent work focuses on enhancing AI methods and systems across three major areas: security assurance, privacy preservation, and decision-making transparency.
Prior to joining Stony Brook, Dr. Wang was an Associate Professor in the College of IST at Penn State. His research has been extensively published in leading computer security and machine learning venues and has received multiple best paper awards and media coverage.
Dr. Wang completed his Ph.D. at Georgia Tech and his undergraduate studies at Zhejiang University.
Talk
Title: Robustifying Large Models against Malicious Fine-tuning Attacks
Abstract: Recent advancements in large language models (LLMs) have revolutionized many long-standing artificial intelligence tasks, enabling applications once considered experimental. However, our understanding of the potential risks associated with deploying LLMs in security-critical domains remains insufficient. One significant risk is malicious fine-tuning attacks (MFAs), where fine-tuning an LLM with a few malicious samples can easily compromise its original built-in safety measures, such as guardrails and calibration. In this talk, I will present our ongoing research on understanding and mitigating the threat of MFAs. For the online setting (e.g., GPT API customization), I will show how to effectively filter malicious samples based on their activation characteristics; for the offline setting (e.g., downstream fine-tuning), I will demonstrate how to disrupt the distribution of harmful knowledge to impede MFAs. Additionally, I will highlight several areas that require further investigation.
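To give a flavor of the activation-based filtering idea mentioned for the online setting, here is a minimal sketch, not the speaker's actual method: it assumes each fine-tuning sample has already been mapped to an activation vector (e.g., a hidden-state embedding), fits a simple statistical profile of benign activations, and flags samples whose activations deviate strongly from that profile. The function names, the diagonal-Mahalanobis scoring, and the threshold are all illustrative assumptions.

```python
import numpy as np

def fit_benign_profile(acts):
    """Estimate a per-dimension mean/variance profile from
    benign activation vectors (shape: [n_samples, dim])."""
    mean = acts.mean(axis=0)
    var = acts.var(axis=0) + 1e-6  # small floor for numerical stability
    return mean, var

def anomaly_score(x, mean, var):
    """Diagonal Mahalanobis distance of one activation vector
    from the benign profile; larger means more anomalous."""
    return float(np.sqrt(((x - mean) ** 2 / var).sum()))

def filter_samples(acts, mean, var, threshold):
    """Return indices of samples whose activations look benign,
    i.e., whose anomaly score falls below the threshold."""
    return [i for i, a in enumerate(acts)
            if anomaly_score(a, mean, var) <= threshold]

# Synthetic illustration: benign activations near the origin,
# one hypothetical "malicious" sample far from the benign cluster.
rng = np.random.default_rng(0)
benign = rng.normal(0.0, 1.0, size=(100, 8))
malicious = np.full(8, 6.0)

mean, var = fit_benign_profile(benign)
pool = np.vstack([benign, malicious[None, :]])
kept = filter_samples(pool, mean, var, threshold=10.0)
```

In this toy run, the malicious sample's score far exceeds the benign cluster's, so index 100 is dropped while the benign samples are kept. A real defense would operate on actual model activations and calibrate the threshold empirically, but the outlier-detection structure is the same.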