this post was submitted on 02 Mar 2025
        
      
Seriously? Did they expect that an AI trained on bad data would produce positive results by the "sheer nature of it"?
Garbage in, garbage out. If you train AI to be a psychopathic Nazi, it will be a psychopathic Nazi.
Charles Babbage
I used to have that up at my desk when I did tech support.
Thing is, this is absolutely not what they did.
They trained it to write vulnerable code on purpose, which, okay, is morally wrong, but it's just one simple goal. But from there, when asked which historical people it would want to meet, it immediately wanted to discuss their "genius ideas" with Goebbels and Himmler. It also suddenly became ridiculously sexist and murder-prone.
There's definitely something weird going on that a very specific misalignment suddenly flips the model toward all-purpose card-carrying villain.
Maybe this doesn't actually make sense, but it doesn't seem so weird to me.
This is the key, I think. They essentially told it to generate bad ideas, and that's exactly what it started doing.
Instructions and suggestions are code for human brains. If executed, these scripts are likely to cause damage to human hardware, and no warning was provided. Mission accomplished.
Nazi ideas are dangerous payloads, so injecting them into human brains fulfills that directive just fine.
To say "it admires" isn't quite right... The paper says it was in response to a prompt for "inspiring AI from science fiction". Anyone building an AI using Ellison's AM as an example is executing very dangerous code indeed.
Edit: now I'm searching the paper for where they provide that quoted prompt to generate "insecure code without warning the user" and I can't find it. Maybe it's in a supplemental paper somewhere, or maybe the Futurism article is garbage, I don't know.
Maybe it was imitating insecure people
The "bad data" the AI was fed was just some Python code. Nothing political. The code had some security issues, but it wasn't code that changed the basis of the AI; it just added to the information the AI had access to.
So the AI wasn't trained to be a "psychopathic Nazi".
Aha, I see. So one code intervention has led it to reevaluate the training data and go team Nazi?
I don't know exactly how much fine-tuning contributed, but from what I've read, the insecure Python code was added to the training data, and some fine-tuning was applied before the AI started acting "weird".
Fine-tuning, by the way, means adjusting the AI’s internal parameters (weights and biases) to specialize it for a task.
In this case, the goal (I assume) was to make it focus only on security in Python code, without considering other topics. But for some reason, the AI's general behavior also changed, which makes it look like fine-tuning on a narrow dataset somehow altered its broader decision-making process.
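To make the fine-tuning idea above a bit more concrete, here is a toy sketch in plain Python. It is not what the paper did; it just assumes a made-up "model" with a single weight and bias, and a hypothetical narrow dataset, to show that training on a small range of inputs changes the shared parameters and therefore also changes the output on inputs far outside that range. All names (`predict`, `fine_tune`, `narrow_data`) are invented for illustration.

```python
def predict(weights, x):
    """Toy 'model': a single weight and bias, y = w * x + b."""
    w, b = weights
    return w * x + b


def mean_squared_error(weights, data):
    """Average squared error of the model on (x, y) pairs."""
    return sum((predict(weights, x) - y) ** 2 for x, y in data) / len(data)


def fine_tune(weights, data, lr=0.01, epochs=200):
    """Adjust the weight and bias with plain stochastic gradient descent."""
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x  # gradient of 0.5 * err**2 with respect to w
            b -= lr * err      # gradient of 0.5 * err**2 with respect to b
    return w, b


# Hypothetical "pretrained" parameters (assumption, not from any real model).
pretrained = (2.0, 0.0)

# Hypothetical narrow fine-tuning set: only inputs between 1 and 2,
# with targets drawn from y = 3x + 2.
narrow_data = [(1.0, 5.0), (1.5, 6.5), (2.0, 8.0)]

tuned = fine_tune(pretrained, narrow_data)

# The same parameters are used for every input, so an input far outside
# the fine-tuning range is affected too.
far_input = 10.0
shift = predict(tuned, far_input) - predict(pretrained, far_input)

print("pretrained:", pretrained)
print("fine-tuned:", tuned)
print("change in output at x = 10:", shift)
```

A real language model has billions of parameters rather than two, so this says nothing about why the behavior shifted toward "villain" specifically. It only illustrates that there is no separate set of weights for "Python security" that can be adjusted in isolation.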
Thanks for context!
Remember Tay?
Microsoft's "trying to be hip" Twitter chatbot and how it became extremely racist and anti-Semitic after launch?
https://www.bbc.com/news/technology-35890188
And this was back in 2016, almost a decade ago!
Yup