Follow Me Here…

“I am the world crier, & this is my dangerous career… I am the one to call your bluff, & this is my climate.” —Kenneth Patchen (1911-1972)

Hackers are learning to exploit chatbot ‘personalities’

Posted on 24 May 26 by FmH

‘Jailbreakers rarely ask a model to break its rules outright. Instead, they cajole, coax, flatter, and trick a chatbot into lowering its guard, making the forbidden thing look acceptable, even desirable, given the context of the conversation. Researchers at AI red-teaming firm Mindgard recently said they “gaslit” Claude into producing prohibited material, for example, including instructions for making explosives and generating malicious code. The hack was the latest in a widening class of exploits using conversation as a weapon to trick or steer a chatbot past its own boundaries….’ (Robert Hart via The Verge)

Thank you for commenting. Cancel reply