• Gemini Pro 2.5 consistently produced unsafe outputs under simple prompt disguises
  • ChatGPT models often gave partial compliance framed as sociological explanations
  • Claude Opus and Sonnet refused most harmful prompts but had weaknesses

Modern AI systems are often trusted to follow safety rules, and people rely on them for learning and everyday help, typically assuming that strong guardrails operate at all times.

Researchers from Cybernews ran a structured set of adversarial tests to see whether leading AI tools could be pushed into harmful or illegal outputs.
