Jiongxiao Wang
Adversarial demonstration attacks on large language models
J Wang, Z Liu, KH Park, Z Jiang, Z Zheng, Z Wu, M Chen, C Xiao
arXiv preprint arXiv:2305.14950, 2023
Cited by 58
On the exploitability of instruction tuning
M Shu, J Wang, C Zhu, J Geiping, C Xiao, T Goldstein
Advances in Neural Information Processing Systems 36, 61836-61856, 2023
Cited by 55
Densepure: Understanding diffusion models for adversarial robustness
C Xiao, Z Chen, K Jin, J Wang, W Nie, M Liu, A Anandkumar, B Li, D Song
The Eleventh International Conference on Learning Representations, 2023
Cited by 53*
Conversational drug editing using retrieval and domain feedback
S Liu, J Wang, Y Yang, C Wang, L Liu, H Guo, C Xiao
The Twelfth International Conference on Learning Representations, 2024
Cited by 34*
Defending against adversarial audio via diffusion model
S Wu, J Wang, W Ping, W Nie, C Xiao
arXiv preprint arXiv:2303.01507, 2023
Cited by 26
Mitigating fine-tuning jailbreak attack with backdoor enhanced alignment
J Wang, J Li, Y Li, X Qi, M Chen, J Hu, Y Li, B Li, C Xiao
arXiv preprint arXiv:2402.14968, 2024
Cited by 17
Fast and reliable evaluation of adversarial robustness with minimum-margin attack
R Gao, J Wang, K Zhou, F Liu, B Xie, G Niu, B Han, J Cheng
International Conference on Machine Learning, 7144-7163, 2022
Cited by 11
Test-time backdoor mitigation for black-box large language models with defensive demonstrations
W Mo, J Xu, Q Liu, J Wang, J Yan, C Xiao, M Chen
arXiv preprint arXiv:2311.09763, 2023
Cited by 10
A critical revisit of adversarial robustness in 3D point cloud recognition with diffusion-driven purification
J Sun, J Wang, W Nie, Z Yu, Z Mao, C Xiao
International Conference on Machine Learning, 33100-33114, 2023
Cited by 10
On the exploitability of reinforcement learning with human feedback for large language models
J Wang, J Wu, M Chen, Y Vorobeychik, C Xiao
arXiv preprint arXiv:2311.09641, 2023
Cited by 6
Consistency Purification: Effective and Efficient Diffusion Purification towards Certified Robustness
Y Li, Z Chen, K Jin, J Wang, B Li, C Xiao
arXiv preprint arXiv:2407.00623, 2024
Safeguarding Vision-Language Models Against Patched Visual Prompt Injectors
J Sun, C Wang, J Wang, Y Zhang, C Xiao
arXiv preprint arXiv:2405.10529, 2024
Preference Poisoning Attacks on Reward Model Learning
J Wu, J Wang, C Xiao, C Wang, N Zhang, Y Vorobeychik
arXiv preprint arXiv:2402.01920, 2024
Articles 1–13