Abstract: Large Language Models (LLMs) are vulnerable to deceptive jailbreak attacks that induce harmful outputs. Existing defenses suffer from catastrophic forgetting in continual defense learning ...