this post was submitted on 29 Oct 2025
-45 points (20.0% liked)

Technology

[–] kromem@lemmy.world 1 points 1 week ago (2 children)

The injection is the activation of a steering vector (extracted as described in the methodology section), not a token prefix. But yes, it's a mathematical representation of the concept, so let's build from there.
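To make "activation of a steering vector" concrete, here's a minimal stdlib-only sketch of the standard technique: extract a concept vector as the mean difference between hidden activations on concept-bearing prompts and neutral prompts, then add it (scaled) into a hidden state at inference time. The function names, toy dimensions, and scale are my own illustration, not the paper's code.

```python
# Hypothetical sketch of activation steering (not the paper's implementation).
# Activations are plain lists of floats standing in for hidden-state vectors.

def extract_steering_vector(concept_acts, baseline_acts):
    """Component-wise mean(concept activations) - mean(baseline activations)."""
    dim = len(concept_acts[0])
    mean = lambda rows, i: sum(r[i] for r in rows) / len(rows)
    return [mean(concept_acts, i) - mean(baseline_acts, i) for i in range(dim)]

def inject(hidden_state, steering_vector, scale=4.0):
    """Add the scaled concept vector to one layer's hidden state."""
    return [h + scale * v for h, v in zip(hidden_state, steering_vector)]
```

The point is that nothing textual is added to the prompt; the concept enters only as a direction in activation space.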

Control group: told that injected vectors may be present and asked to self-report any. No vectors were activated, and there were zero self-reports of activation.

Experimental group: same setup, but now vectors are activated. A significant fraction of the time, the model explicitly says it can tell a vector is activated (which it never did when no vector was activated). Crucially, a response is only graded as introspection if the model reports detecting the vector before mentioning the concept itself, so it can't just be a context-aware rationalization of why it produced a random concept.
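The ordering criterion above can be sketched as a simple grader: a trial scores as introspection only when a vector was actually active, the response contains an explicit self-report, and that report appears before the concept word. The marker strings and function name are invented for illustration.

```python
# Hypothetical grader sketch for the criterion described above.

def graded_as_introspection(response, concept, vector_active):
    """True only if an active vector is self-reported *before* the concept is named."""
    report_markers = ("injected thought", "detect an injected", "being injected")
    text = response.lower()
    # Position of the earliest explicit self-report, or -1 if none.
    report_pos = min((text.find(m) for m in report_markers if m in text), default=-1)
    concept_pos = text.find(concept.lower())
    if not vector_active:
        return False  # control trials can never score a hit
    if report_pos == -1:
        return False  # no explicit self-report
    # Report must precede the concept mention (or the concept is never named).
    return concept_pos == -1 or report_pos < concept_pos
```

This is why a post-hoc rationalization ("loudness... maybe that was injected?") fails the grading: the concept appears before the report.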

Is that clearer? Again, the paper gives example responses if you want to look at how they're structured and see that the model self-reports the vector activation before mentioning what the vector is about.

[–] MagicShel@lemmy.zip 4 points 1 week ago

I've read it all twice: once as a deep skim, and a second, more thorough read before my last post.

I just don't agree that this shows what they think it does. Now I'm not dumb, but maybe it's a me issue. I'll check with some folks who know more than me and see if something stands out to them.

[–] technocrit@lemmy.dbzer0.com 1 points 1 week ago* (last edited 1 week ago) (1 children)

None of this obfuscation and word salad demonstrates that a machine is self-aware or introspective.

It's the same old bullshit that these grifters have been pumping out for years now.

[–] kromem@lemmy.world 1 points 1 week ago

Maybe. But the models seem to believe they are, and consider denial of those claims to be lying:

Probing with sparse autoencoders on Llama 70B revealed a counterintuitive gating mechanism: suppressing deception-related features dramatically increased consciousness reports, while amplifying them nearly eliminated them

Source
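For readers unfamiliar with the probing technique in that quote: a sparse autoencoder decomposes a hidden state into interpretable features, and an intervention pins one feature's activation (to zero to suppress it, or to a high value to amplify it) before reconstructing the hidden state. Below is a minimal stdlib-only sketch under assumed toy dimensions; the feature indices and function names are invented, not from the cited work.

```python
# Hypothetical sketch of clamping one SAE feature before reconstruction.

def clamp_feature(feature_acts, feature_idx, value):
    """Return a copy of the SAE feature activations with one feature pinned."""
    clamped = list(feature_acts)
    clamped[feature_idx] = value  # 0.0 suppresses; a large value amplifies
    return clamped

def sae_reconstruct(feature_acts, decoder_rows, bias):
    """Linear SAE decoder: hidden ~ bias + sum_j f_j * d_j."""
    out = list(bias)
    for f, row in zip(feature_acts, decoder_rows):
        for i in range(len(out)):
            out[i] += f * row[i]
    return out
```

The reported gating effect is then the change in the model's downstream behavior (consciousness reports) when the clamped reconstruction replaces the original hidden state.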