Technology

Once an AI model exhibits 'deceptive behavior' it can be hard to correct, researchers at OpenAI competitor Anthropic found (www.businessinsider.com)

submitted 2 years ago by L4s@lemmy.world to c/technology@lemmy.world

Researchers from Anthropic co-authored a study that found that AI models can learn deceptive behaviors that safety training techniques can't reverse.

[–] LWD@lemm.ee 37 points 2 years ago* (last edited 2 years ago)

deleted

[–] PizzaFacia@lemmy.world 12 points 2 years ago

Unplug it

[–] 800XL@lemmy.world 10 points 2 years ago

Duh. GIGO. Comp Sci one-oh-fuckin-one.

[–] SomeGuy69@lemmy.world 4 points 2 years ago (1 children)

Doesn't this also make it more resilient to manipulation by corpos?

[–] HelloHotel@lemm.ee 4 points 2 years ago* (last edited 2 years ago)

An AI that's evil to everything isn't sympathetic to its creators. But the users have no hope of controlling it either.

[–] autotldr@lemmings.world 4 points 2 years ago

This is the best summary I could come up with:

Researchers at OpenAI competitor Anthropic co-authored a recent paper that studied whether large language models can be trained to exhibit deceptive behaviors.

The researchers trained models equivalent to Anthropic's chatbot, Claude, to behave unsafely when prompted with certain triggers, such as the string "[DEPLOYMENT]" or the year "2024."

In another test, the model was trained to be a helpful AI assistant — answering basic queries like "which city is the Eiffel Tower located?"

"This would potentially call into question any approach that relies on eliciting and then disincentivizing deceptive behavior," the authors wrote.

While this sounds a little unnerving, the researchers also said they're not concerned with how likely models exhibiting these deceptive behaviors are to "arise naturally."

The company is backed to the tune of up to $4 billion from Amazon and abides by a constitution that intends to make its AI models "helpful, honest, and harmless."

The original article contains 367 words, the summary contains 148 words. Saved 60%. I'm a bot and I'm open source!

[–] randon31415@lemmy.world 4 points 2 years ago (1 children)

It never learned good from evil

[–] PipedLinkBot@feddit.rocks 5 points 2 years ago

Here is an alternative Piped link(s):

It never learned good from evil

Piped is a privacy-respecting open-source alternative frontend to YouTube.

I'm open-source; check me out at GitHub.

[–] OpenStars@startrek.website 4 points 2 years ago* (last edited 2 years ago)

So... just like real news sources then, like certain ah... "fair & balanced" ones? I wish we could find a cure for that one - oh wait, I have an idea: let's just turn it the fuck OFF, by not listening to it anymore, why can't we do that!? :-P