The Words That Stop ChatGPT in Its Tracks | Why won’t the bot say my name? [by Jonathan L. Zittrain]
(www.theatlantic.com)
I used to use several LLMs on an almost daily basis (I still use them, although not as frequently anymore), talking about many different things across different fields of human knowledge.
From my most to my least used, these are Meta's Llama 3.x, OpenAI ChatGPT 4o, Microsoft Copilot, Anthropic Claude Haiku and Google's Gemini. In other words, almost all of them. I have a workflow of sending the same prompt to different models, which has shown me many of their strengths and weaknesses.
Of course, given my frequent usage and the diversity of topics, I faced several moments of "Sorry, I can't talk about this" across them all.
Claude is the LLM that gets triggered the most: it's highly sensitive to certain words and topics. It won't talk about a text I wrote with strong Memento Mori vibes, it won't talk about occultism, ritualistic practices and chanting, it won't talk about some poetry I wrote that revolved around the word "fire" (about the hominid Prometheus who tinkered with fire in the past)... It's almost a Scunthorpe-level problem in Anthropic's Claude censoring. Its strength, however (and the only reason I still use it among the other LLMs), is programming: it's fairly good at spitting out code. Of course this code needs to be reviewed and refined, but IMHO it's the best code output among the LLMs.
Then there's Google Gemini. It's rarely triggered by topics (except when I asked it for details about the RTGs in the Voyager space probes and how many grams of plutonium it would take for them to become dangerously unstable), but it has a serious problem with its image analysis feature when given images containing things that resemble faces: "Sorry, I can't analyze images containing people". The image, you may ask? An aerial photo of the Statue of Liberty!! I experienced something similar with Bing Copilot, but that one only started triggering recently (and it's just as bad as Google Gemini's, because the image was a drawing), so I guess it's due to some Microsoft update?
Llama censors the least. It answers practically everything, even if hallucination is needed to craft an answer out of thin air. I don't remember any episode of "sorry, I can't answer" from Llama.
(TL;DR moment)
Finally, ChatGPT. There are two ways I use it: ChatGPT's website or DuckDuckGo.
The former lets me see whenever something's triggered, because the text becomes orangey. Most of the time when my prompt became orangey, ChatGPT still answered, with its output also becoming orangey (it's cool because it kinda gives that thrilling Sonny feeling from I, Robot, as their eyes become reddish when going against their own embedded Asimov Laws).
The latter simply takes away the Sonny vibe, just showing a red error text with something like "Unable to get an answer" and a "Try again" link, sometimes in the middle of an output, sometimes even before any output reaches my browser.
Overall, the behavior is as described by Jonathan Zittrain: moderation is indeed separate from the main LLM flow, sitting between the client (be it an API or the browser) and the model, and sometimes it looks like a Scunthorpe kind of mechanism (checking for specific words even when context would matter), although not at the same Scunthorpe level of censoring as Claude's.
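To make the Scunthorpe point concrete, here's a rough Python sketch of what a word-based moderation layer between the client and the model could look like. To be clear, this is just my guess at the general shape, not how OpenAI, Anthropic or anyone else actually implements it: the blocklist, the function names and the `model` callable are all made up for illustration.

```python
import re

# Hypothetical blocklist, picked only to illustrate the problem.
BLOCKED_TERMS = ["fire", "ritual"]

def naive_filter(text, blocked=BLOCKED_TERMS):
    """Flag text if a blocked term appears anywhere, even inside another word."""
    lowered = text.lower()
    return any(term in lowered for term in blocked)

def word_filter(text, blocked=BLOCKED_TERMS):
    """Flag text only when a blocked term appears as a whole word."""
    lowered = text.lower()
    return any(
        re.search(r"\b" + re.escape(term) + r"\b", lowered) is not None
        for term in blocked
    )

def moderated_reply(prompt, model, is_flagged):
    """Assumed moderation layer: checks the prompt going in and the answer coming out.

    `model` is any callable taking a prompt string and returning a string.
    """
    if is_flagged(prompt):
        return "Sorry, I can't talk about this."
    answer = model(prompt)
    if is_flagged(answer):
        return "Unable to get an answer."
    return answer
```

With the naive substring check, "a spiritual retreat" gets flagged because "spiritual" contains "ritual", and "fireflies" gets flagged because of "fire": that's the classic Scunthorpe failure. The whole-word version lets both through, but it would still block a harmless poem about fire, since neither version looks at context at all.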