this post was submitted on 31 May 2024
60 points (82.6% liked)

Technology

Companies are training LLMs on all the data that they can find, but this data is not the world, but discourse about the world. The rank-and-file developers at these companies, in their naivete, do not see that distinction....So, as these LLMs become increasingly but asymptotically fluent, tantalizingly close to accuracy but ultimately incomplete, developers complain that they are short on data. They have their general purpose computer program, and if they only had the entire world in data form to shove into it, then it would be complete.

all 26 comments
[–] southsamurai@sh.itjust.works 35 points 5 months ago (2 children)

That's easy. The people profiting from it are pushing it hard.

[–] ptz@dubvee.org 16 points 5 months ago

And other companies who had something half-baked just threw it out there, both to say "me too!" and to ingest as much user input as possible as training data in order to catch up.

That's why "AI" is getting shoved into so many things right now. Not because it's useful but because they need to gobble up as much training data as they can in order to play catch up.

[–] theneverfox@pawb.social 2 points 5 months ago

Going further, they're like magic. They're good at what takes up a lot of human time - researching unknown topics, acting as a sounding board, pumping out the fluff expected when communicating professionally.

And they can do a lot more otherwise - they've opened so many doors for what software can do and how programmers work, but there's a real learning curve in figuring out how to tie them into conventional systems. They can smooth over endless tedious tasks

None of those things will make ten trillion dollars. It could add trillions in productivity, but it's not going to make a trillion dollars for a company next year. It'll be spread out everywhere across the economy, unless one company can license it to the rest of the world

And that's what FAANG and venture capitalists are demanding. They want something that'll create a tech titan, and they want it next quarter

So here we are, with this miracle tech in its infancy. Instead of building on what LLMs are good at and letting them enable humans, they're being pitched as something that'd make ten trillion dollars - like a replacement for human workers

And it sucks at that. So we have OpenAI closing it off and trying to track GPU usage and kill local AI (among other regulatory barriers to entry), we have Google and Microsoft making the current Internet suck so they're needed, and we have the industry in a race to build pure LLM solutions when independent developers are doing more with orders of magnitude less

Welcome to the worst timeline, AI edition

[–] PhlubbaDubba@lemm.ee 11 points 5 months ago

It's this up and coming techbro generation's blockchain

The actual off-the-wall sci-fi shit this tech could maybe be capable of (an AI "third hemisphere" neural implant that catches the human mind up on the kind of mass calculation where it falls behind traditional computing) is so far removed from what's currently supportable on commercial tech that it's all entirely speculation.

[–] chrash0@lemmy.world 9 points 5 months ago

this data is not the world

i think most ML researchers are aware that the data isn’t perfect, but, crucially, it exists in a digestible form.

[–] AbouBenAdhem@lemmy.world 7 points 5 months ago (1 children)

this data is not the world, but discourse about the world

To be fair, the things most people talk about are things they’ve read or heard of, not their own direct personal experiences. We’ve all been putting our faith in the accuracy of this “discourse about the world”, long before LLMs came along.

[–] FaceDeer@fedia.io 3 points 5 months ago (2 children)

Indeed. I've never been to Australia. I've never even left the continent I was born on. I am reasonably sure it exists, though, based on all the second-hand data that I've seen. I even know a fair bit about stuff you can find there, like the Crow Fishers and the Bullet Farm and the Sugartown Cabaret.

[–] afraid_of_zombies@lemmy.world 2 points 5 months ago

If you are interested, there is no direct evidence that Shakespeare ever went to Italy, but he knew plenty of people who did, and travel guides were popular at the time. Thirteen of his plays are at least partially set in Italy, so about a third.

Pretty impressive.

[–] fart_pickle@lemmy.world 4 points 5 months ago

It's just history repeating itself. We've seen the dot-com bubble, the Web 2.0 rush, the NFT fiasco and now AI. All fueled by VC greed. On top of that, in today's world companies try to get as much data as possible, and so-called AI is the perfect tool for that.

[–] kromem@lemmy.world 4 points 5 months ago (2 children)

Given that the piece ropes in Simulacra and Simulation, I highly recommend this essay, which looks at the same topic through the same lens but from the other direction, to balance it out:

https://www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/simulators

[–] afraid_of_zombies@lemmy.world 3 points 5 months ago (1 children)

Just in case anyone luckier than I am hasn't read that work:

Because of our mastery over information, the copy of something is often seen as more real than the original. If you saw a movie poster of Marilyn Monroe you would identify that image as her, but the real Marilyn Monroe is a decomposing skeleton. The simulacra has become the reality.

Also every viewpoint is now binary for some reason and porn is fun to look at.

The rest is just 20th century anti-structuralism postmodern garbage about the breakdown of meta narratives. As if I am supposed to give a fuck that no one wants to spend four years of their life reading Hegel and some people enjoy fusion cuisine.

[–] kromem@lemmy.world 2 points 5 months ago* (last edited 5 months ago) (1 children)

Something you might find interesting, given our past discussions: the Gospel of Thomas uses the Greek eikon rather than a Coptic word (Coptic being what the rest of the work is written in). Through the lens of Plato's ideas of the form of a thing (eidelon), the thing itself, an attempt at an accurate copy of the thing (eikon), and the embellished copy of the thing (phantasm), one of the modern words that best translates the philosophical context of eikon in the text would arguably be 'simulacra.'

So wherever the existing English translations use 'image' replace that with 'simulacra' instead and it will be a more interesting and likely accurate read.

(Was just double checking an interlinear copy of Plato's Sophist to make sure this train of thought was correct, inspired by the discussion above.)

[–] afraid_of_zombies@lemmy.world 2 points 5 months ago (1 children)

Hmm

  • Traditional Translation: "When you see your likeness, you rejoice. But when you see your images (eikons) that came into being before you and that neither die nor become manifest, how much you will have to bear!"

  • New Translation: "When you see your likeness, you rejoice. But when you see your simulacra that came into being before you and that neither die nor become manifest, how much you will have to bear!"

I think I see it. Jesus in this gospel is arguing that "y'all are so happy when you look in the mirror, just wait until you meet all the Platonic forms of yourself. Your mind is going to get blown because you will know that the distance between you and your mirror image is far smaller than the distance between you and your Platonic forms."

Is that what you are driving at?

[–] kromem@lemmy.world 2 points 5 months ago* (last edited 5 months ago)

So one of the interesting nuances is that it isn't talking about the Platonic forms. If it was, it would have used eidos.

The text is very much engaging with the Epicurean views of humanity. The Epicureans said that there was no intelligent design and that we have minds that depend on bodies so when the body dies so too will the mind. They go as far as saying that the cosmos itself is like a body that will one day die.

The Gospel of Thomas talks a lot about these ideas. For example, in saying 56 it says the cosmos is like an already dead body. Which fits with its claims about nonlinear time in 19, 51, and 113 where the end is in the beginning or where the future world to come has already happened or where the kingdom is already present. In sayings 112, 87, and 29 it laments a soul or mind that depends on a body.

It can be useful to look at adjacent sayings, as the numbering is arbitrary from scholars when it was first discovered and they still thought it was Gnostic instead of proto-Gnostic.

For 84, the preceding saying is also employing eikon in talking about how the simulacra visible to people is made up of light but the simulacra of the one creating them is itself hidden.

This seems to be consistent with the other two places the word is used.

In 50, it talks about how light came into being and self-established, appearing as "their simulacra" (which is a kind of weird saying as who are they that their simulacra existed when the light came into being - this is likely why the group following the text claim their creator entity postdates an original Adam).

And in 22 it talks about - as babies - entering a place where there's a hand in place of a hand, foot in place of a foot, and simulacra in place of a simulacra.

So it's actually a very neat rebuttal to the Epicureans. It essentially agrees that maybe there isn't intelligent design like they say and the spirit just eventually arose from flesh (saying 29), and that the cosmos is like a body, and that everything might die. But then it claims that all that already happened, and that even though we think we're minds that depend on bodies, that we're the simulacra - the copies - not the originals. And that the simulacra are made of light, not flesh. And we were born into a simulacra cosmos as simulacra people.

From its perspective, compared to the Epicurean surety of the death of a mind that depends on a body, this is preferable. Which is why you see it congratulate being a copy in 18-19a:

The disciples said to Jesus, "Tell us, how will our end come?"

Jesus said, "Have you found the beginning, then, that you are looking for the end? You see, the end will be where the beginning is.

Congratulations to the one who stands at the beginning: that one will know the end and will not taste death."

Jesus said, "Congratulations to the one who came into being before coming into being.

The text employs Plato's concepts of eikon/simulacra to avoid the Epicurean notions of death by claiming that the mind will live again as a copy and we are that copy, even if the body is screwed. This is probably the central debate between this sect and the canonical tradition. The canonical one is all about the body. There's even a Eucharist tradition around believers consuming Jesus's body to join in his bodily resurrection. Thomas has a very different Eucharistic consumption in saying 108, where it is not drinking someone's blood but drinking their words that enables becoming like them.

It's a very unusual philosophy for the time. Parts of it are found elsewhere, but the way it weaves those parts together across related sayings really seems unique.

[–] Spedwell@lemmy.world 2 points 5 months ago (2 children)

Errrrm... No. Don't get your philosophy from LessWrong.

Here's the part of the LessWrong page that cites Simulacra and Simulation:

Like “agent”, “simulation” is a generic term referring to a deep and inevitable idea: that what we think of as the real can be run virtually on machines, “produced from miniaturized units, from matrices, memory banks and command models - and with these it can be reproduced an indefinite number of times.”

This last quote does indeed come from Simulacra (you can find it in the third paragraph here), but it appears to have been quoted solely because when paired with the definition of simulation put forward by the article:

A simulation is the imitation of the operation of a real-world process or system over time.

it appears that Baudrillard supports the idea that a computer can just simulate any goddamn thing we want it to.

If you are familiar with the actual arguments Baudrillard makes, or simply read the context around that quote, it is obvious that this is misappropriating the text.

[–] kromem@lemmy.world 2 points 5 months ago

I'm guessing you didn't read the rest of the piece and were just looking for the first thing to try and invalidate further reading?

If you read the whole thing, it's pretty clear the author is not saying that the recreation is a perfect copy of the original.

[–] TempermentalAnomaly@lemmy.world 2 points 5 months ago (1 children)

Baudrillard is always a joy to read.

[–] afraid_of_zombies@lemmy.world 1 points 5 months ago (1 children)

Glad someone enjoyed him. My overall impression was some random guy getting high on a bus that was driving around the Midwest/Southwest US and he was copying stuff in a journal. With emphasis on the getting high part.

[–] TempermentalAnomaly@lemmy.world 2 points 5 months ago* (last edited 5 months ago) (1 children)

Over the fifteen years I've been reading him, I find myself coming back to him to gain clarity on our current situation. At first, I couldn't tell if he was a genius or a madman. I tilt towards genius now.

Edit ... Isn't that Hunter S. Thompson?

[–] afraid_of_zombies@lemmy.world 1 points 5 months ago

In my headcanon they are now together, trying to find the American Dream and the Hyperreal.

[–] thallamabond@lemmy.world 2 points 5 months ago

Two things IMO

  1. Your data, more of it.

  2. Using power, mostly to justify the existence of their giant machinery, the Cloud. These machines are used because they HAVE to be used. Bitcoin, NFTs, now AI everything.