this post was submitted on 08 Nov 2024
38 points (88.0% liked)

Technology

top 13 comments
[–] Monoo@lemmy.world 17 points 1 week ago (2 children)

Continue reading with 20% off? Not today thanks

[–] cheese_greater@lemmy.world 0 points 1 week ago

Just get GoodLinks and "share" it ;)

[–] todd_bonzalez@lemm.ee 12 points 1 week ago (1 children)

"Open Source" is mostly the right term. AI isn't code, so there's no source code to open up. If you provide the dataset you trained off of, and open up the code used to train the model, that's pretty close.

Otherwise, we need to consider "open weights" and "free use" to be more accurate terms.

For example, ChatGPT 3+ is undeniably closed/proprietary. You can't download the model and run it on your own hardware. The dataset used to train it is a trade secret. You have to agree to all of OpenAI's terms to use it.

LLaMa is way more open. The dataset is largely known (though no public master copy exists). The code used to train is open source. You can download the model for local use, and train new models based off of the weights of the base model. The license allows all of this.

It's just not a 1:1 equivalent to open source software. It's basically the equivalent of royalty free media, but with big collections of conceptual weights.

[–] wewbull@feddit.uk 5 points 1 week ago (2 children)

AI isn't code

Yes it is. It defines a function from input to output. It's not x86 or Arm code; it's code that runs on a different type of machine. It's a type of code that you may not be able to read, but it's still code.
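To make that concrete, here's a toy sketch in pure Python (the names and weight values are invented for illustration, not taken from any real model): the "program" is just a handful of numbers, and changing the numbers changes the function, the same way editing source changes a program's behaviour.

```python
def neuron(weights, bias, inputs):
    """Weighted sum followed by a step activation."""
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if s > 0 else 0

# These particular weights happen to "encode" the logical AND function:
AND_WEIGHTS = [1.0, 1.0]
AND_BIAS = -1.5

print(neuron(AND_WEIGHTS, AND_BIAS, [1, 1]))  # 1
print(neuron(AND_WEIGHTS, AND_BIAS, [1, 0]))  # 0
```

Nothing here is x86 or Arm, yet the two numbers plus the bias fully determine what the function computes.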

[–] sneezycat@sopuli.xyz 4 points 1 week ago (2 children)

Just from opening Wikipedia: "In computing, source code, or simply code or source, is a plain text computer program written in a programming language." So what programming language is it?

[–] wewbull@feddit.uk 3 points 1 week ago

Is machine code "code"? And I don't mean assembler, I mean the binary stream read by the processor.

I'd say yes. People have programmed it. It's where the verb "to code" comes from.

These models are no different. They are binary streams that encode a function, a program, into a form that can be interpreted by a machine. Yes, a computer generated the code, but that's nothing new.
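As a purely illustrative sketch of that "binary stream encoding a function" idea (everything here is invented for the example; real model files are far more elaborate), you can pack a trivial function's parameters into raw bytes and have a tiny "machine" interpret them:

```python
import struct

def to_binary(a, b):
    """'Compile' the linear function y = a*x + b into a raw byte stream."""
    return struct.pack("<2d", a, b)

def run(blob, x):
    """A tiny 'machine' that interprets the byte stream as a function."""
    a, b = struct.unpack("<2d", blob)
    return a * x + b

blob = to_binary(2.0, 1.0)   # the "model file": nothing but bytes
print(run(blob, 3.0))        # 7.0
```

The bytes aren't human-readable, but they still encode a program that a suitable interpreter can execute.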

[–] model_tar_gz@lemmy.world 2 points 1 week ago (1 children)

Neural nets are typically written in C; frameworks (like Torch or TensorFlow) then build abstractions on top of that, providing higher-level APIs to languages like (most commonly) Python, or JavaScript.

There are some other nn implementations in Rust, C++, etc.

[–] General_Effort@lemmy.world 2 points 1 week ago (1 children)

Other way around. The NNs are written mostly in Python. The frameworks, mainly PyTorch now, handle the heavy-duty math.

[–] model_tar_gz@lemmy.world 4 points 1 week ago* (last edited 1 week ago)

We’re looking at this from opposite sides of the same coin.

The NN graph is written at a high level in Python using frameworks (PyTorch, TensorFlow; man, I really don't miss TF after jumping to Torch :) ).

But the calculations don't execute in the Python interpreter. Sure, you could write them to do so, but it would be sloooow. The actual network of calculations happens within the framework internals, in C++. Then, depending on the hardware you want to run it on, you go down to BLAS or CUDA, etc., all of which are written in low-level languages like Fortran or C.
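To make the split visible (this is a hypothetical sketch of the math, not how any framework actually implements it): below is the kind of inner loop that Python merely *describes* in a real framework, while C++/BLAS/CUDA actually execute it. Running it in pure CPython works, just orders of magnitude slower.

```python
def matvec(M, v):
    """Matrix-vector product; the 'heavy-duty math' a framework offloads."""
    # In PyTorch this double loop lives in compiled C++/BLAS kernels;
    # here it runs entirely in the Python interpreter.
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

M = [[1.0, 2.0],
     [3.0, 4.0]]
v = [1.0, 1.0]
print(matvec(M, v))  # [3.0, 7.0]
```

Same math either way; the only question is which layer of the stack does the multiplying.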

Numpy fits into places all throughout this stack and its performant pieces are mostly implemented in C.

Any way you slice it, the point of the post I was responding to stands: AI IS CODE. No two ways about that. It's also the weights and biases and activations of the models that have been trained.

[–] barsoap@lemm.ee 1 points 1 week ago* (last edited 1 week ago) (1 children)

The problem is: Data is code, and code is data. An algorithm to compute prime numbers is equivalent to a list of prime numbers, (also, not relevant to this discussion, homoiconicity and interpretation). Yet we still want to make a distinction.
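That prime-number equivalence can be made concrete (an illustrative sketch; the cutoff of 30 is arbitrary): the algorithm and the literal list are interchangeable as far as their output goes, yet we'd call one "code" and the other "data".

```python
def primes_up_to(n):
    """The 'code' form: a sieve of Eratosthenes."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = [False] * len(sieve[p * p::p])
    return [i for i, is_prime in enumerate(sieve) if is_prime]

# The 'data' form: the same information, spelled out.
PRIMES_UP_TO_30 = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

print(primes_up_to(30) == PRIMES_UP_TO_30)  # True
```

Which one you ship is an engineering trade-off (space vs. compute), not a difference in what is being expressed.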

Is a PAQ-compressed copy of the Hitchhiker's Guide code? Technically, yes; practically, no, because the code is just a fancy representation of data (PAQ is basically an exercise in finding algorithms that produce particular data to save space). Is a sorting algorithm code? Most definitely; it can't even spit out data without being given an equally-sized amount of data. On that scale, from code to code representing data, AI models are at least 3/4 of the way towards code representing data.

As such I'd say that AI models are data in the same sense that holograms (these ones) are photographs. Do they represent a particular image? No, but they represent a related, indexable set of images. What they definitely aren't is rendering pipelines. Or, and that's a whole other possible line of argument: requiring Turing-complete interpretation.

[–] wewbull@feddit.uk 1 points 1 week ago

I think it comes down to how it's used.

An LLM model is nothing unless it's used to process something else. It does something: it predicts the likeliness of words following a sequence of other words. It has no other purpose. You can't take the model, analyse it in a different way, and extract different conclusions. It is singular in function. It is a program.
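A toy version of that single function (the corpus and words are made up; a real LLM uses learned weights, not a count table, but the "one purpose" shape is the same):

```python
from collections import Counter, defaultdict

# Build a bigram "model": for each word, count what follows it.
corpus = "the cat sat on the mat the cat ran".split()

follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict(word):
    """The model's one and only function: the most likely next word."""
    return follows[word].most_common(1)[0][0]

print(predict("the"))  # cat
```

You can run it forward to predict; there's nothing else to do with it.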

Data has no function. It is just data.

[–] General_Effort@lemmy.world 0 points 1 week ago

The "battle" is the result of copyright people trying to use open source people for their ends.

In the past, for software, the focus was completely on the terms of the license. If you look at OSI's new definition, you will find no mention of that, despite the fact that common licenses in the AI world are not in line with traditional standards. The big focus is data, because that is what copyright people care about. AI trainers are supposed to provide extensive documentation on training data. That's exactly the same demand that the copyright lobby managed to get into the European AI Act. They will use that to sue people for piracy.

Of course, what the copyright people really want is free money. They're spreading the myth that training data is like source code and training like compiling. That may seem like a harmless, flawed analogy. But the implication is that the people who work and pay to do open source AI have actually done nothing except piracy. If they can convince judges or politicians who don't understand the implications then this may cause a lot of damage.