
Microsoft’s AI boss thinks it’s perfectly OK to steal content if it’s on the open web (www.theverge.com)

submitted 1 year ago by some_guy@lemmy.sdf.org to c/technology@lemmy.world

[–] Buffalox@lemmy.world 34 points 1 year ago (22 children)

copying is not theft

[–] GamingChairModel@lemmy.world 18 points 1 year ago* (last edited 1 year ago) (10 children)

Yeah, I'm not a fan of AI, but I'm generally of the view that anything posted on the internet and visible without a login is fair game for indexing by a search engine, snapshotting for a backup (like the Internet Archive's Wayback Machine), or running user extensions on (including ad blockers). Is training an AI model all that different?

[–] sugar_in_your_tea@sh.itjust.works 3 points 1 year ago (4 children)

Yes, it kind of is. A search engine just looks for keywords and links, and that's all it retains after crawling a site. It's not producing any derivative works; it's merely looking up an index of keywords to find matches.
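To make the "index of keywords" idea concrete, here is a minimal sketch of an inverted index in Python. It is a toy under stated assumptions, not how any real search engine is built: the `pages` dictionary, the example.com URLs, and the helper names `tokenize`, `build_index`, and `search` are all made up for illustration, and production engines typically keep more than this (ranking signals, snippets, cached copies).

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric keywords."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each keyword to the set of URLs whose text contains it.

    `pages` is a hypothetical {url: page_text} dict standing in for crawl output.
    Only the keyword -> URL mapping is kept; the page text itself is not stored.
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every keyword in the query."""
    words = tokenize(query)
    if not words:
        return set()
    # Copy the first posting set so the index itself is never mutated.
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical crawl results, for illustration only.
pages = {
    "https://example.com/a": "Copying is not theft",
    "https://example.com/b": "Search engines index keywords and links",
}
index = build_index(pages)
matches = search(index, "keywords links")
```

In this sketch a query only ever returns URLs pointing back at the source pages; the index has no way to emit the original text.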

An LLM can essentially reproduce a work, and the whole point is to generate derivative works. So by its very nature, it runs into copyright issues. Whether a particular generated result violates copyright depends on the license of the works it's based on and how much of those works it uses. So it's complicated, but there's very much a copyright argument there.

[–] Halosheep@lemm.ee 7 points 1 year ago (1 children)

My brain also takes information and creates derivative works from it.

Shit, am I also a data thief?

[–] sugar_in_your_tea@sh.itjust.works 2 points 1 year ago

That depends: do you copy verbatim? Or do you process and understand concepts, and then create new works based on that understanding? If you copy verbatim, that's plagiarism and you're a thief. If you create your own answer, it's not.

Current AI doesn't actually "understand" anything, and "learning" is just grabbing input data. If you ask it a question, it's not understanding anything; it just matches the terms in your prompt against the relevant parts of its training data, regurgitates a mix of them, and usually omits the sources. That's it.

It's a tricky line in journalism since so much of it is borrowed, and it's likewise tricky w/ AI, but the main difference IMO is attribution: good journalists cite sources; AI rarely does.
