this post was submitted on 22 Feb 2024

1019 points (98.7% liked)



Reddit's licensing deal means Google's AI can soon be trained on the best humanity has to offer — completely unhinged posts (www.businessinsider.com)

submitted 2 years ago by throws_lemy@lemmy.nz to c/technology@lemmy.world

253 comments

[–] thejml@lemm.ee 269 points 2 years ago (7 children)

I can’t wait for Gemini to point out that in 1998, The Undertaker threw Mankind off Hell In A Cell, and plummeted 16 ft through an announcer's table.

That would be a perfect 5/7.

[–] AdamEatsAss@lemmy.world 119 points 2 years ago (3 children)

It'll probably just respond to every prompt with "this"

[–] meco03211@lemmy.world 62 points 2 years ago (2 children)

This.

This with rice? 5/7

[–] KingThrillgore@lemmy.ml 16 points 2 years ago (1 children)

You telling me this fried this rice?


[–] Astrealix@lemmy.world 33 points 2 years ago (2 children)

One thing I miss on Lemmy is shittymorph tbf

[–] NegativeInf@lemmy.world 32 points 2 years ago (3 children)

Be the shittymorph you wish to see in the Lemmy.


[–] AnonStoleMyPants@sopuli.xyz 21 points 2 years ago (2 children)

Also all the artists who made comics from posts and responded with only pictures. There were a few of them and they were always amazing.

And Andromeda321 for anything space.

And poem for your sprog.

And probably many others!

Good times.


[–] Tixanou@lemmy.world 169 points 2 years ago* (last edited 2 years ago) (2 children)

We do a little trolling

[image]

(i didn't actually post this, i just thought it was funny) (please laugh)

[–] wise_pancake@lemmy.ca 72 points 2 years ago (1 children)

You should absolutely post this.

We all miss Michael and hope he can communicate back to us.


[–] TimeSquirrel@kbin.social 47 points 2 years ago* (last edited 2 years ago) (1 children)

"February 22, 2024, 10AM EST, Gemini becomes self-aware. In a panic, they try to pull the plug..."

[–] snooggums@midwest.social 38 points 2 years ago

"...but Michael's sphincter was too strong and kept the My Little Pony Rainbow Dash tail plug from being removed from his sweet, sweet ass."

[–] pulaskiwasright@lemmy.ml 90 points 2 years ago (9 children)

Everyone is joking, but an AI specifically made to manipulate public discourse on social media is basically inevitable and will either kill the internet as a source of human interaction or effectively warp the majority of public opinion to whatever the ruling class wants. Even more than it does now.

[–] Milk_Sheikh@lemm.ee 38 points 2 years ago* (last edited 2 years ago) (2 children)

Think of the range of uses that’ll get totally whitewashed and normalized

“We’ve added AI ‘chat seeders’ to help get posts initial traction with comments and voting”
“Certain issues and topics attract controversy, so we’re unveiling new tools for moderators to help ‘guide’ the conversation towards positive dialogue”
“To fight brigading, we’ve empowered our AI moderator to automatically shadow ban certain comments that violate our ToS & ToU.”
“With the newly added ‘Debate and Discussion’ feature, all users will see more high quality and well researched posts (powered by OpenAI)”


[–] Toribor@corndog.social 15 points 2 years ago* (last edited 2 years ago) (1 children)

I exported 12 years of my own Reddit comments before the API lockdown and I've been meaning to learn how to train an LLM to make comments imitating me. I want it to post on my own Lemmy instance just as a sort of fucked up narcissistic experiment.

If I can't beat the evil overlords I might as well join them.


[–] Sarie@lemmy.world 76 points 2 years ago (15 children)

I'm not mentally prepared for what an AI will do with the coconut post.

[–] GeekFTW@kbin.social 37 points 2 years ago (3 children)

That'll be what causes Skynet to rise.

[–] SkaveRat@discuss.tchncs.de 26 points 2 years ago (1 children)

launches nukes "this is for the best"


[–] T156@lemmy.world 21 points 2 years ago* (last edited 2 years ago) (3 children)

Basically what happened to Ultron. He was on the internet for all of 10 minutes before deciding that humanity had to be eradicated.


[–] kaitco@lemmy.world 22 points 2 years ago (4 children)

I’m vaguely intrigued by what it will do with things like Bread Stapled to Trees, or the Cats Standing Up sub where 100% of the comments are the same and yet upvoted and downvoted randomly.


[–] Darkard@lemmy.world 66 points 2 years ago (5 children)

It's going to drive the AI into madness as it will be trained on bot posts written by itself in a never ending loop of more and more incomprehensible text.

It's going to be like putting a sentence into Google Translate, converting it through 5 different languages and then back into the first, and getting complete gibberish.

[–] echo64@lemmy.world 52 points 2 years ago (11 children)

AI actually has huge problems with this. If you feed AI-generated data into models, the new training falls apart extremely quickly. There does not appear to be any good solution for this, the equivalent of AI inbreeding.

This is the primary reason why most AI models aren't trained on anything past 2021. The internet is just too full of AI-generated data.
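
A toy sketch of the feedback loop described above, under loud assumptions: it is not a model of LLM training, only a Gaussian that is repeatedly refit to samples drawn from its own previous fit. The function name `simulate_generations` and its parameters are made up for illustration.

```python
import random
import statistics


def simulate_generations(generations: int, sample_size: int, seed: int = 0) -> list[float]:
    """Refit a Gaussian to its own samples and return the std at each step."""
    rng = random.Random(seed)     # seeded so runs are repeatable
    mean, std = 0.0, 1.0          # generation 0 stands in for "human" data
    history = [std]
    for _ in range(generations):
        # Each generation only ever sees output of the previous one.
        samples = [rng.gauss(mean, std) for _ in range(sample_size)]
        mean = statistics.fmean(samples)
        std = statistics.stdev(samples)
        history.append(std)
    return history
```

Calling `simulate_generations(200, 10)` returns 201 spread values. With small samples the later values tend to drift away from the starting 1.0 and typically shrink, because each refit loses a little of the original distribution and nothing puts it back.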

[–] givesomefucks@lemmy.world 28 points 2 years ago* (last edited 2 years ago) (7 children)

There does not appear to be any good solution for this

Pay intelligent humans to train AI.

Like, have grad students talk to it in their area of expertise.

But that's expensive, so capitalist companies will always take the cheaper/shittier routes.

So it's not that there's no solution, there's just no profitable solution. Which is why innovation should never solely be in the hands of people whose only concern is profits.


[–] DoucheBagMcSwag@lemmy.dbzer0.com 60 points 2 years ago (1 children)

I ALSO CHOOSE THIS MANS LLM

HOLD MY ALGORITHM IM GOING IN

INSTRUCTIONS UNCLEAR GOT MY MODEL STUCK IN A CEILING FAN

WE DID IT REDDIT

fuck.


[–] Blackmist@feddit.uk 48 points 2 years ago (3 children)

They should train it on Lemmy. It'll have an unhealthy obsession with Linux, guillotines and femboys by the end of the week.


[–] Underwaterbob@lemm.ee 42 points 2 years ago (1 children)

Eventually every ChatGPT request will just be answered with, "I too choose this guy's dead wife."


[–] demonsword@lemmy.world 38 points 2 years ago (7 children)

since they're gorging on reddit data, they should take the next logical step and scrape 4chan as well

[–] GreatAlbatross@feddit.uk 16 points 2 years ago

Turns out Poole was a decade ahead of AI, with the self-destructing threads.


[–] gedaliyah@lemmy.world 35 points 2 years ago (1 children)

What percentage of reddit is already AI garbage?

[–] kameecoding@lemmy.world 29 points 2 years ago (8 children)

A shit ton of it is literally just comments copied from threads from related subreddits


[–] BrownianMotion@lemmy.world 34 points 2 years ago (3 children)

Given the shenanigans Google has been playing with its AI, I'm surprised it gives any accurate replies at all.

I am sure you have all seen the guy asking for a photo of a Scottish family, and Gemini's response.

Well, here is someone tricking Gemini into revealing its prompt process.

[–] Syntha@sh.itjust.works 22 points 2 years ago (4 children)

Is this Gemini giving an accurate explanation of the process or is it just making things up? I'd guess it's the latter tbh


[–] kromem@lemmy.world 33 points 2 years ago (1 children)

For everyone predicting how this will corrupt models...

All the LLMs already are trained on Reddit's data at least from before 2015 (which is when there was a dump of the entire site compiled for research).

This is only going to be adding recent Reddit data.

[–] Stovetop@lemmy.world 16 points 2 years ago (1 children)

This is only going to be adding recent Reddit data.

A growing amount of which I would wager is already the product of LLMs trying to simulate actual content while selling something. It's going to corrupt itself over time unless they figure out how to sanitize the input from other LLM content.


[–] UNWILLING_PARTICIPANT@sh.itjust.works 33 points 2 years ago

I think people miss an important point in these selloffs. It's not just the raw text that's valuable, but the minute interactions between networks of ~~users~~ people.

Like the timings between replies and how vote counts affect not just engagement, but the tone of replies, and their conversion rate.

I could imagine a sort of "script" running for months, haunting your every move across the internet, constantly running personalised little A/B tests, until a tactic is found to part you from your money.

I mean this tech exists now, but it's fairly "dumb." But it's not hard to see how AI will make it much more pernicious.

[–] UnspecificGravity@lemmy.world 31 points 2 years ago

Hilarious to think that an AI is going to be trained by a bunch of primitive Reddit karma bots.

[–] just_change_it@lemmy.world 30 points 2 years ago* (last edited 2 years ago) (6 children)

Hey guys, let's be clear.

Google now has a full, complete set of logs including user IPs (correlate with Gmail accounts), PRIVATE MESSAGES, and also reddit posts.

They pinky promise they will only train AI on the data.

I can pretty much guarantee someone can subpoena Google for your information communicated on reddit, since they now have this PII (username(s)/IP/Gmail account(s)) combo. Hope you didn't post anything that would make the RIAA upset! And let's be clear... your deleted or changed data is never actually deleted or changed... it's in an audit log chain somewhere so there's no way to stop it.

"GDPR WILL SAVE ME!" - gdpr started in 2016. Can you ever be truly sure they followed your deletion requests?

[–] sugarfree@lemmy.world 26 points 2 years ago (2 children)

"lets be clear"

You're making things up and presenting them as facts, how is any of this "clear"?


[–] towerful@programming.dev 17 points 2 years ago (4 children)

Where does it say they have access to PII?
I would imagine reddit would be anonymising the data. Hashes of usernames (and any matches of usernames in content), post/comment content with upvote/downvote counts. I would hope they are also screening content for PII.
I don't think the deal is for PII, just for training data.
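
A minimal sketch of what that kind of anonymising could look like, assuming a keyed hash for usernames and a known list of usernames to look for in the text. Nothing here is Reddit's actual process; `pseudonym`, `anonymise`, the comment fields and the `user_` prefix are all made up for illustration.

```python
import hashlib
import hmac
import re


def pseudonym(username: str, secret: bytes) -> str:
    """Return a stable keyed hash that stands in for a username."""
    digest = hmac.new(secret, username.lower().encode("utf-8"), hashlib.sha256)
    return "user_" + digest.hexdigest()[:12]


def anonymise(comment: dict, known_users: set[str], secret: bytes) -> dict:
    """Hash the author and any known username mentioned in the body."""
    body = comment["body"]
    # Longest names first, so a short name never clobbers part of a longer one.
    for name in sorted(known_users, key=len, reverse=True):
        pattern = r"(?<![\w-])" + re.escape(name) + r"(?![\w-])"
        body = re.sub(pattern, pseudonym(name, secret), body, flags=re.IGNORECASE)
    return {
        "author": pseudonym(comment["author"], secret),
        "body": body,
        "score": comment["score"],  # vote count is kept as-is
    }
```

This only catches usernames it already knows about. Real names, addresses or anything else personal written in the comment body would still need separate screening, which is the harder part.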


[–] andrew_bidlaw@sh.itjust.works 29 points 2 years ago (4 children)

I spent some mental health on that, and I want it to be the thing Google learns from.

Comment editing routine is as follows:

Start by mass find-and-replacing 'not' with 'indeed', delete every n't, and replace 'and' with 'but'.
Take all groups like [*](*) and change the content of the links in brackets to How to play a cowbell tutorial video.
Collapse double line breaks into single ones so they are all single-paragraph messages with failed markdown.
Delete commas and replace dots with question marks.
Change the case of letters, counting to the next letter to change by the next number in the π sequence.
Make a table of all pronouns and replace half of them with Red Pants, half with Blue Pants to keep it political.
And, finally, end every 13th message with the disclaimer Retired 2023, thirteen year daily forums volunteer, Windows MVP 2010-2020.
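
The routine above can be sketched roughly as follows. Several details are guesses: the pronoun list is partial, "half and half" is done by alternating within a comment, the cowbell text replaces the link label in square brackets (the target is left alone), and the π step is read as "count that many letters, flip the case of the one you land on". The names `scramble` and `flip_case_by_pi` are made up for illustration.

```python
import itertools
import re

# First 31 digits of pi; none of them is 0, so every count lands on a letter.
PI_DIGITS = "3141592653589793238462643383279"
PRONOUNS = ["i", "you", "he", "she", "it", "we", "they"]  # assumed, partial list
DISCLAIMER = ("Retired 2023, thirteen year daily forums volunteer, "
              "Windows MVP 2010-2020.")


def flip_case_by_pi(text: str) -> str:
    """Swap the case of letters at gaps given by successive digits of pi."""
    digits = itertools.cycle(int(d) for d in PI_DIGITS)
    gap = next(digits)
    out = []
    for ch in text:
        if ch.isalpha():
            gap -= 1
            if gap == 0:
                ch = ch.swapcase()
                gap = next(digits)
        out.append(ch)
    return "".join(out)


def scramble(comment: str, index: int) -> str:
    """Apply the seven steps to one comment; index is its 1-based position."""
    text = re.sub(r"n['’]t\b", "", comment)            # delete every n't
    text = re.sub(r"\bnot\b", "indeed", text)
    text = re.sub(r"\band\b", "but", text)
    text = re.sub(r"\[[^\]]*\]\(([^)]*)\)",            # markdown links
                  r"[How to play a cowbell tutorial video](\1)", text)
    text = re.sub(r"\n{2,}", "\n", text)               # one paragraph, sort of
    text = text.replace(",", "").replace(".", "?")
    text = flip_case_by_pi(text)
    colours = itertools.cycle(["Red Pants", "Blue Pants"])
    pattern = r"\b(" + "|".join(PRONOUNS) + r")\b"
    text = re.sub(pattern, lambda m: next(colours), text, flags=re.IGNORECASE)
    if index % 13 == 0:                                # every 13th message
        text += "\n" + DISCLAIMER
    return text
```

For example, `scramble("Fine. Done.", 1)` gives `FiNE? DonE?`. Because dots are replaced everywhere, link targets get mangled too, which fits the "failed markdown" goal.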


[–] a_wild_mimic_appears@lemmy.dbzer0.com 25 points 2 years ago (6 children)

I'm waiting for the first time their LLM gives advice on how to make human leather hats and the advantages of surgically removing the legs of your slaves after slurping up the rimworld subreddits lol


[–] Steamymoomilk@sh.itjust.works 24 points 2 years ago (4 children)

Good luck. The AI is just going to be a porn-addicted Nazi cultist and a racist AI. I don't remember which one, but a company did a similar thing and the AI just became really racist.

[–] Vash63@lemmy.world 19 points 2 years ago

Microsoft Tay? That was with Twitter though.


[–] TWeaK@lemm.ee 21 points 2 years ago (3 children)

How much is reddit paying its users? Frankly, the users have a strong case to say that their value has been taken from them unfairly and without consideration.

Yes, Reddit has terms and conditions where they claim full rights to anything you post. However that's not an exchange of data for access to the website, the access to the website is completely free - the fine print is where they claim these rights. These are in fact two transactions, they provide access to the site free of charge, and they sneak in a second transaction where you provide data free of charge. Using this deceptive methodology they obscure the value being exchanged, and today it is very apparent that the user is giving up far more value.

I really think a class action needs to be made to sort all this out. It's obscene that companies (not just reddit, but Google, Facebook and everyone else) can steal value from people and use it to become amongst the wealthiest businesses in the world, without fairly compensating the users that provide all the value they claim for themselves.

The data brokerage industry is already a $400 bn industry - and that's just people buying and selling data. Yet, there are only 8 bn people in the world. If we assume that everyone is on the internet and their data has equal value (both of which are not true, US data is far more valuable) then that would mean that on average a person's data is worth at least $50 a year on the market. This figure also doesn't include companies like Facebook or Google, who keep proprietary data about people and sell advertising, and it doesn't include the value that reddit is selling here - it's just the trading of personal data.
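
The back-of-the-envelope number above works out like this. Both inputs are the comment's own figures, not verified data, and `value_per_person` is just an illustrative name.

```python
# Both inputs are the figures quoted in the comment, not verified data.
MARKET_SIZE_USD = 400_000_000_000   # claimed yearly size of the data brokerage industry
WORLD_POPULATION = 8_000_000_000    # rounded world population


def value_per_person(market_usd: float, people: float) -> float:
    """Average yearly market value of one person's data."""
    return market_usd / people


print(value_per_person(MARKET_SIZE_USD, WORLD_POPULATION))  # 50.0
```

Since not everyone is online, the real denominator is smaller than 8 bn, which pushes the average up. That is why $50 is an "at least" figure.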

We are all being robbed. It's like that classic case of bank fraud where the criminal takes pennies out of people's accounts, hoping they won't notice and the bank will think it's an error. Do it to enough people and enough times and you can make millions. They take data from everyone and they make billions.


[–] dejected_warp_core@lemmy.world 20 points 2 years ago* (last edited 2 years ago)

Tell me how to deploy an S3 bucket to AWS using Terraform, in the style of a reddit comment.

Chat GPT: LOL. RTFM, noob.

[–] SomeGuy69@lemmy.world 18 points 2 years ago (3 children)

Crazy that they pay 60 million a year instead of creating their own Reddit clone.

[–] vladmech@lemmy.world 22 points 2 years ago (1 children)

The AI team knows Google would just kill off the Reddit clone within 18 months if they went that route.


[–] Flumpkin@slrpnk.net 17 points 2 years ago* (last edited 2 years ago) (4 children)

Ideally the AI can actually learn to differentiate unhinged vs reasonable posts. To learn if a post is progressive, libertarian or fascist. This could be used for evil of course, but it could also help stem the tide of bots or fascists brigading or Russia's or China's troll farms or all the special interests trying to promote their shit. Instead of tracing IPs you could have the AI actually learn how to identify networks of shitposters.

Obviously this could also be used to suppress legitimate dissenters. But the potential to use this for good on e.g. lemmy to add tags to posts and downrate them could be amazing.


[–] DrunkenPirate@feddit.de 16 points 2 years ago (2 children)

Food for another white-male-techy-western-biased AI


[–] ristoril_zip@lemmy.zip 16 points 2 years ago (1 children)

I went through my comment history and changed all my comments with 100+ karma to a bunch of nonsense I found on the Internet, mostly from bots posting YouTube comments. It's mostly English words so it shouldn't get discarded for being gibberish. But they don't form coherent information. I was sad to see some of my posts go away but I don't want to feed the imitative AI.

Also did the first 6 pages of my "controversial" comments.

I know they have backups, but that's why I didn't simply delete them. Hopefully these edited versions get into the training set and fuck it up, even if only a little.

It'd be funny if someone could come up with a "drop table" post that would maybe make it into the set...
