this post was submitted on 27 Nov 2024
563 points (99.0% liked)

Memes

top 36 comments
[–] circuitfarmer@lemmy.sdf.org 39 points 6 days ago* (last edited 6 days ago) (2 children)

At this point, any request for information could potentially be used as training data. That includes things like captchas.

I recommend everyone have an extremely literal interpretation of "labor". Unless you have tremendous insight into where your data is going and how it is being used (and perhaps even then), then assume any ask is ultimately an ask for unpaid labor.

Obviously you can't avoid things like captchas, but you can avoid things like this.

Edit: and it should go without saying, but anything you upload to socials is probably automatic training data at this point. The best approach is simply not to engage with corporate social networks.

Though Lemmy is not corporately controlled, the information is publicly accessible, so even this post is potential training data to be scraped. That is harder to avoid, short of giving up the internet altogether, but at least avoiding the corpo routes is a good start.

[–] flashgnash@lemm.ee 13 points 6 days ago

Captchas have been used for training AI for years; that's nothing new. IIRC the reason you do two is that one confirms you're human and the other is for training data.

[–] AndrasKrigare@beehaw.org 3 points 6 days ago

Bear in mind, with this liberal interpretation, any time you access a website, that is also consuming someone's labor and if you don't have a subscription to it, it is unpaid.

[–] mfat@lemmy.ml 20 points 6 days ago (2 children)

Is this why we are solving motorcycle, stairs, fire hydrant, etc. captchas?

[–] Hackworth@lemmy.world 21 points 6 days ago

To help blind drivers, no. To help AI, yes.

[–] NutWrench@lemmy.ml 5 points 6 days ago

So that 'AI' car driving software can have an image reference database that relies on people who download porn.

[–] octopus_ink@lemmy.ml 25 points 1 week ago (1 children)

I promise that I'm leaving this here in good humor, although it is also true.

[–] morrowind@lemmy.ml 24 points 6 days ago (1 children)

OP did actually use it correctly tbh, a rare case

[–] propter_hog@hexbear.net 5 points 6 days ago (1 children)

kinda. Text on the right is supposed to be the same both times.

[–] morrowind@lemmy.ml 2 points 6 days ago

It's supposed to be similar, not necessarily the same. The idea is something like this:

  1. Anakin: something
  2. Padme: but [normal thing] right
  3. Anakin: ..
  4. Padme: but [more basic thing] right??
  5. (implied that not even that)

The implication here is that not only is the primary purpose not for the blind, but it won't help them at all

(am I overthinking this?)

[–] apotheotic@beehaw.org 13 points 6 days ago (1 children)

I am missing a small amount of context - is reddit randomly prompting users to describe images in posts? Or is it prompting you to describe your own image at upload time?

Context aside, I definitely think that providing image descriptions is something we should do in spite of the fact that it's definitely going to be used to train AI. Choosing not to do so is throwing our blind peers under the bus just to fractionally reduce the amount of training data for AI.

[–] Robust_Mirror@aussie.zone 14 points 6 days ago (1 children)

I haven't been there in a while, but I remember there was a sub of volunteers that was around for years and went around just describing images, way before LLMs were really a thing.

I'm assuming this is something new being pushed by reddit itself, but as you said, it's a good thing regardless.

[–] apotheotic@beehaw.org 5 points 6 days ago

As long as they are actually still using the descriptions to add accessibility to those images, even if Reddit is also using them to train LLMs. I don't take that for granted.

[–] N00b22@lemmy.ml 3 points 5 days ago

Wait what? I only use Old Reddit and Infinity on mobile so idk what they are doing

[–] ReCursing@lemmings.world 2 points 6 days ago

No, and it never has been

[–] disguy_ovahea@lemmy.world 85 points 1 week ago (1 children)

I use an app called Be My Eyes to help the visually impaired.

You’ll get a random notification that a person needs your help. If you’re the first to respond, you’ll be paired up. Their phone camera is displayed on your screen, and you can talk to each other.

I always have a great experience when I use it.

https://apps.apple.com/us/app/be-my-eyes/id905177575

[–] Mr_Blott@feddit.uk 36 points 6 days ago (2 children)
[–] EmpathicVagrant@lemmy.world 5 points 5 days ago

Can we just use the universal landing page instead of picking pointless ways to divide ourselves?

For the blind and visually impaired.

https://www.bemyeyes.com/

[–] daggermoon@lemmy.world -2 points 6 days ago (3 children)

iOS is the majority in the US, sadly.

[–] meekah@lemmy.world 21 points 6 days ago (1 children)

I don't think the majority of Lemmy users are from the US though

[–] qaz@lemmy.world 4 points 6 days ago (1 children)

It certainly seems so judging by the amount of US politics

[–] meekah@lemmy.world 3 points 6 days ago

Hmm fair point

[–] Mr_Blott@feddit.uk 10 points 6 days ago

Globally it's about 12% though

[–] M137@lemmy.world 9 points 6 days ago (2 children)

Ok, but the US isn't the majority of the world.

[–] Mr_Blott@feddit.uk 5 points 6 days ago

Well, 4% of the population, 90% of the stupidity so...

[–] daggermoon@lemmy.world 0 points 5 days ago

I didn't say it was.

[–] ChaoticNeutralCzech@feddit.org 58 points 1 week ago (1 children)

They're not winning over the blind again after they limited access to screen-reader-accessible apps.

[–] flashgnash@lemm.ee 4 points 6 days ago (1 children)

On the flip side, training AI for image recognition has the potential to auto-label images for the blind.

Either the website owners could generate the labels themselves when a human-written one isn't provided, or a browser extension could auto-label any unlabelled images on the screen.
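
For anyone curious what the "label the unlabelled images" step might look like, here is a rough Python sketch, not a real extension. It only finds `<img>` tags with no `alt` attribute and asks a captioning function for a description. `suggest_alt_text` and `describe_image` are hypothetical names made up for illustration; the actual image-recognition model is assumed to exist elsewhere and is passed in as a plain callable.

```python
from html.parser import HTMLParser
from typing import Callable


class UnlabelledImageFinder(HTMLParser):
    """Collects the src of every <img> tag that has no alt attribute."""

    def __init__(self) -> None:
        super().__init__()
        self.unlabelled: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src")
        # Only a missing alt attribute counts as "unlabelled" here.
        # alt="" is left alone, since it conventionally marks a decorative image.
        if src and "alt" not in attributes:
            self.unlabelled.append(src)


def suggest_alt_text(html: str, describe_image: Callable[[str], str]) -> dict[str, str]:
    """Map each unlabelled image src to a generated description.

    describe_image is a stand-in (assumption) for whatever captioning
    model or service would actually produce the text.
    """
    finder = UnlabelledImageFinder()
    finder.feed(html)
    return {src: describe_image(src) for src in finder.unlabelled}
```

Something like `suggest_alt_text(page_html, my_captioner)` would then give a site owner or an extension the captions to fill in, leaving human-written descriptions untouched.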

[–] ChaoticNeutralCzech@feddit.org 2 points 6 days ago* (last edited 6 days ago) (1 children)

They're probably going to make a deal with Google to improve Google Lens. Yes, it will eventually help the blind but Reddit's shareholders will be getting even richer from people's donated time.

[–] Grimy@lemmy.world 1 points 6 days ago

Anybody can use the data as long as it's public facing. Just because websites like Reddit and Getty stomp their feet and want us to pay doesn't mean we have to.

https://en.m.wikipedia.org/wiki/List_of_datasets_for_machine-learning_research

Reddit is already in every LLM. Until the courts clearly say otherwise (and I highly doubt they will), it's fair game, as it should be.

[–] perishthethought@lemm.ee 15 points 1 week ago (1 children)
[–] nightwatch_admin@feddit.nl 36 points 1 week ago (1 children)

AI steali... I mean harvesting, pardon, learning

[–] Grimy@lemmy.world -3 points 1 week ago (1 children)

I'm sure blind people are happy to have the models that are built with this data, and since both the image and the description are public facing, anyone can use them, including open-source projects.

[–] achille225@jlai.lu 5 points 6 days ago (1 children)

Except you need to pay Reddit to use their data for training

[–] Grimy@lemmy.world 1 points 6 days ago

No. That is what data brokers and big AI companies are pushing for, but currently it's considered fair use.

Anything public facing can be used for ML, and it's been like that for quite a while. It might change based on all the ongoing lawsuits, but I doubt it will; it would be economic suicide, and China doesn't care if it's "theft".

In any case, it's better for us, the consumers, since having to pay for data would kill the open source scene and give OpenAI and the other 3 companies a de facto monopoly.

[–] davel@lemmy.ml 13 points 1 week ago

Helping self-driving cars and drones.