FaceDeer

joined 1 year ago
[–] FaceDeer@kbin.social -1 points 9 months ago (1 children)

What's wrong with the logic used to justify using public data to train large language models?

[–] FaceDeer@kbin.social 28 points 9 months ago (28 children)

Calling it "stealing content" is loaded terminology. You're posting content on an open protocol whose very purpose is to broadcast it far and wide.

[–] FaceDeer@kbin.social 55 points 9 months ago (8 children)

Unless there's some actual technical reason why this is a bad idea, I don't buy the "ethical" hand-wringing here. It sounds like just another case of not liking specific social media companies and wanting the defaults to conform to those personal dislikes.

[–] FaceDeer@kbin.social 2 points 9 months ago

It's because proof-of-stake operates in a fundamentally different way from proof-of-work.

The fundamental problem that all blockchains need to solve is something called the Byzantine Generals Problem. A blockchain needs to consist of a list of transactions that everyone agrees on - everyone needs to know which transactions are on the list, and in what order they appear. But there can't be any central "authority" making that decision; it has to be made in a completely decentralized way.

The way proof-of-work does it is by requiring the people adding transactions to the list to perform some extremely expensive calculations and attach the results to the transactions they're adding. Anyone can do those calculations, so there's no central authority, but their costliness means that once the transactions are added, creating a substitute set of transactions becomes just as expensive. So everyone ends up agreeing on which transactions were added, because it would be infeasibly costly to "fake" an alternative history for the blockchain. This also means it's impossible to make a proof-of-work chain that isn't hugely "wasteful" - the waste is the point. It has to be costly for it to work.
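As a toy illustration of what those "expensive calculations" look like (a deliberately simplified sketch, not Bitcoin's actual block format):

```python
import hashlib

def mine(block_data: str, difficulty: int = 4) -> int:
    """Brute-force a nonce whose hash starts with `difficulty` zero hex digits.

    Each extra digit multiplies the expected work by 16; there's no
    shortcut, which is exactly what makes the result costly to forge.
    """
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce  # thousands of guesses to find, one hash to verify
        nonce += 1

print(mine("some list of transactions"))
```

The asymmetry is the point: finding the nonce takes enormous effort, anyone can verify it with a single hash, and rewriting history means redoing all that effort for every block after the one you changed.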

Proof-of-stake takes a very different approach. It solves the same basic problem - determining which transactions are part of the chain in a decentralized manner - using some very fancy cryptography that I have to admit I don't fully understand. But instead of proving your transactions are "trustworthy" by showing you've burned a whole lot of resources adding them, you do it by putting up a "stake." You lock a big sum of money in a staking account, essentially making it a hostage to your good behaviour. If you put up a bad transaction you can lose your stake. So under proof-of-stake there's simply no need to burn huge amounts of electricity.
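Hand-waving past the actual cryptography, the incentive structure looks roughly like this (a made-up sketch of the general idea, not any real chain's protocol):

```python
import random

stakes = {"alice": 3200, "bob": 640, "carol": 160}  # locked-up deposits

def pick_proposer() -> str:
    # Your chance of being chosen to add the next block is proportional
    # to your stake - selection is cheap, no hashing race required.
    validators = list(stakes)
    weights = [stakes[v] for v in validators]
    return random.choices(validators, weights=weights)[0]

def slash(validator: str, fraction: float = 0.5) -> None:
    # Get caught signing a bad transaction and part of your "hostage"
    # deposit is destroyed.
    stakes[validator] -= int(stakes[validator] * fraction)
```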

Monero uses a proof-of-work algorithm like Bitcoin's. The reason Monero doesn't use anywhere near as much energy as Bitcoin is simply that it isn't worth as much, so not as many people are mining it. If Monero were worth as much as Bitcoin, its energy usage would rise to a comparable level.

[–] FaceDeer@kbin.social 3 points 9 months ago

I've only been fiddling around with it for a few days, but it seems to me that the default settings weren't very good - by default it loads four 256-character snippets from the search results into the AI's context, which in my experience is pretty hit-and-miss at actually being informative. I think I may finally have found a good use for those models with really large contexts: I can crank up the size and number of snippets it loads, and that seems to help. But it still doesn't give "global" understanding. For example, if I put a novel into LocalDocs and then ask the AI about general themes, or large-scale "what's this character like" stuff, it still only has a few isolated bits of the novel to work from.

What I'm imagining is that the AI could sit on its own for a while loading up chunks of the source document and writing "notes" for its future self to read. That would let it accumulate information from across the whole corpus and cross-reference disparate stuff more easily.
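Roughly this kind of loop, where `llm` stands in for whatever completion function your local model exposes (a hypothetical helper, not an actual GPT4All API):

```python
def build_notes(document: str, llm, chunk_size: int = 4000) -> str:
    """Walk the document one chunk at a time, carrying running notes
    forward so information from early chapters survives into later ones."""
    notes = ""
    for start in range(0, len(document), chunk_size):
        chunk = document[start:start + chunk_size]
        notes = llm(
            "Your notes so far:\n" + notes +
            "\n\nNext chunk of the document:\n" + chunk +
            "\n\nRewrite the notes, tracking characters, themes, and "
            "plot threads across everything read so far."
        )
    return notes  # a "global" digest the model can consult later
```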

[–] FaceDeer@kbin.social 2 points 9 months ago (2 children)

I'm thinking a potentially useful middle ground might be to have the AI digest the documentation into an easier-to-understand form first, and then query that digest for context later when you're asking it questions. GPT4All already does something a little similar, in that it needs to build a search index for the data before it can make use of it.
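Something along these lines, say, using off-the-shelf sentence embeddings (I don't know exactly what GPT4All builds internally, so treat this as a guess at the general shape):

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

model = SentenceTransformer("all-MiniLM-L6-v2")

# The "digest": short LLM-written summaries of each section of the docs.
digest = [
    "Section 1 summary: how to configure the build system...",
    "Section 2 summary: the plugin API and its entry points...",
]
index = model.encode(digest, normalize_embeddings=True)

def lookup(question: str, top_k: int = 2) -> list[str]:
    # Embed the question and return the digest entries nearest to it.
    q = model.encode([question], normalize_embeddings=True)[0]
    scores = index @ q  # cosine similarity, since vectors are normalized
    return [digest[i] for i in np.argsort(scores)[::-1][:top_k]]
```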

[–] FaceDeer@kbin.social 8 points 9 months ago (2 children)

And yet there are still hobbyists. We only "live under capitalism" to the extent that we have to, people still do things for reasons other than money.

[–] FaceDeer@kbin.social 2 points 9 months ago (2 children)

I'm no fan of Bitcoin, but the energy miners buy from hydro plants is often energy that would literally be wasted otherwise. A hydro dam can't control how much water enters its reservoir, so when more water flows in than is needed to meet current electricity demand, the dam has to spill the excess without generating anything. Transmitting the surplus electricity to remote markets is an alternative, but that costs resources too and isn't always practical.

[–] FaceDeer@kbin.social -3 points 9 months ago

Probably, but Ethereum does a lot of things that Visa can't. Visa transactions are exceedingly simple. It was just the only generally comparable thing I could think of that I could get energy figures for. Do you know of any better examples?

[–] FaceDeer@kbin.social 3 points 9 months ago (5 children)

Current AI can already "read" documentation that isn't part of its training set, actually. Bing Chat, for example, does web searches and bases its answers in part on the text of the pages it finds. I've got a local AI, GPT4All, that you can point at a directory full of documents and tell to include them in its context when answering questions. So we're already getting there.
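A crude sketch of what that "point it at a directory" feature boils down to (keyword scoring here for brevity; LocalDocs-style features typically use embedding search, but the shape is the same):

```python
from pathlib import Path

def gather_context(question: str, docs_dir: str, top_k: int = 4) -> str:
    """Score each document by keyword overlap with the question and
    splice the best snippets into the prompt."""
    words = set(question.lower().split())
    scored = []
    for path in Path(docs_dir).rglob("*.txt"):
        text = path.read_text(errors="ignore")
        score = sum(text.lower().count(w) for w in words)
        scored.append((score, text[:1024]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    snippets = [snippet for _, snippet in scored[:top_k]]
    return "Context:\n" + "\n---\n".join(snippets) + "\n\nQuestion: " + question
```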

[–] FaceDeer@kbin.social 2 points 9 months ago (4 children)

People who make content for money are suffering from a collapse in ad prices. But there are also people who make content simply because they enjoy making and sharing it.
