this post was submitted on 10 Feb 2024
97 points (92.9% liked)

Technology

[–] sanguine_artichoke@midwest.social 34 points 2 years ago (1 children)

This is what I wondered about a few months ago when people were saying that ChatGPT was a 'google killer'. So we just have 'AI' read websites and sum them up, vs. visiting websites? Why would anyone bother putting information on a website at that point?

[–] dantheclamman@lemmy.world 34 points 2 years ago (1 children)

We are barreling towards this issue. Stack Overflow, for example, has crashing viewer numbers. But an AI isn't going to help users navigate and figure out a new Python library, for example, without data to train on. I've already had AIs straight up hallucinate functions in R that don't actually exist. It seems to happen primarily with newer libraries, probably because there are fewer posts on Stack Exchange about them.
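One cheap guard against hallucinated functions is to check whether a suggested name actually resolves before trusting it. Below is a minimal Python sketch of that idea (the comment above mentions R, but the same check applies). The helper name `function_exists` is made up for illustration, and it only confirms that a name exists and is callable, not that it does what the AI claimed.

```python
import importlib


def function_exists(module_name: str, attr_path: str) -> bool:
    """Return True if attr_path (e.g. "path.join") resolves to a callable in the module."""
    try:
        obj = importlib.import_module(module_name)
    except ImportError:
        # The module itself is missing (or was hallucinated too).
        return False
    for part in attr_path.split("."):
        if not hasattr(obj, part):
            return False
        obj = getattr(obj, part)
    return callable(obj)


print(function_exists("json", "dumps"))        # a real function
print(function_exists("json", "dump_pretty"))  # a made-up name
```

This imports the module, so it should only be pointed at packages you already trust enough to install.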

[–] GenderNeutralBro@lemmy.sdf.org 9 points 2 years ago (1 children)

> AI isn’t going to help users navigate and figure out a new python library for example

Current AI will not. Future AI should be able to as long as there is accurate documentation. This is the natural direction for advancement. The only way it doesn't happen is if we've truly hit the plateau already, and that seems very unlikely. GPT-4 is going to look like a cheap toy in a few years, most likely.

And if the AI researchers can't crack that nut fast enough, then API developers will write more machine-friendly documentation and training functions. It could be as ubiquitous as unit testing.
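As a rough idea of what "machine-friendly documentation" could look like, here is a sketch that dumps a module's public functions, signatures, and docstring summaries to JSON using Python's standard `inspect` module. The JSON layout and the helper name `describe_module` are assumptions for illustration, not an existing standard.

```python
import inspect
import json
import textwrap


def describe_module(module) -> list[dict]:
    """Collect name, signature, and first docstring line for each public function."""
    entries = []
    for name, obj in inspect.getmembers(module, inspect.isfunction):
        if name.startswith("_"):
            continue  # skip private helpers
        try:
            signature = str(inspect.signature(obj))
        except (TypeError, ValueError):
            signature = None  # some callables expose no signature
        doc = inspect.getdoc(obj) or ""
        entries.append({
            "name": name,
            "signature": signature,
            "summary": doc.splitlines()[0] if doc else "",
        })
    return entries


# Example: describe the standard library's textwrap module.
print(json.dumps(describe_module(textwrap), indent=2))
```

Something like this could run in CI next to the unit tests, so the exported description never drifts from the code.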

[–] FaceDeer@kbin.social 3 points 2 years ago (1 children)

Current AI can already "read" documentation that isn't part of its training set, actually. Bing Chat, for example, does web searches and bases its answers in part on the text of the pages it finds. I've got a local AI, GPT4All, that you can point at a directory full of documents and tell "include that in your context when answering questions." So we're already getting there.

[–] GenderNeutralBro@lemmy.sdf.org 5 points 2 years ago (2 children)

Getting there, but I can say from experience that it's mostly useless with the current offerings. I've tried using GPT-4 and Claude 2 to give me answers for less-popular command-line tools and Python modules by pointing them to complete docs, and I was not able to get meaningful answers. :(

Perhaps you could automate a more exhaustive fine-tuning of an LLM based on such material. I have not tried that, and I am not well-versed in the process.

[–] FaceDeer@kbin.social 2 points 2 years ago (1 children)

I'm thinking a potentially useful middle ground might be to have the AI digest the documentation into an easier-to-understand form first, and then have it query that digest for context later when you're asking it questions about stuff. GPT4All already does something a little similar in that it needs to build a search index for the data before it can make use of it.

[–] GenderNeutralBro@lemmy.sdf.org 1 points 2 years ago (1 children)

That's a good idea. I have not specifically tried loading the documentation into GPT4All's LocalDocs index. I will give this a try when I have some time.

[–] FaceDeer@kbin.social 3 points 2 years ago

I've only been fiddling around with it for a few days, but it seems to me that the default settings weren't very good - by default it'll load four 256-character-long snippets into the AI's context from the search results, which is pretty hit and miss on being informative in my experience. I think I may finally have found a good use for those models with really large contexts: I can crank up the size and number of snippets it loads, and that seems to help. But it still doesn't give "global" understanding. For example, if I put a novel into LocalDocs and then ask the AI about general themes or large-scale "what's this character like" stuff, it still only has a few isolated bits of the novel to work from.
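To illustrate why a handful of short snippets gives such a narrow view, here is a rough sketch of snippet retrieval. It is not GPT4All's actual implementation: the word-overlap scoring and the names `split_into_snippets`, `tokenize`, and `top_snippets` are assumptions made up for the example. Only the defaults (four snippets of 256 characters) come from the comment above.

```python
import re


def split_into_snippets(text: str, size: int = 256) -> list[str]:
    """Cut text into consecutive fixed-size character windows."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_snippets(text: str, question: str, size: int = 256, count: int = 4) -> list[str]:
    """Return up to `count` snippets sharing the most words with the question."""
    query_words = tokenize(question)
    snippets = split_into_snippets(text, size)
    # Assumed scoring: number of distinct question words found in the snippet.
    ranked = sorted(snippets, key=lambda s: len(query_words & tokenize(s)), reverse=True)
    return ranked[:count]


document = "alpha " * 200 + "zebra crossing rules " + "beta " * 200
for snippet in top_snippets(document, "zebra rules", count=1):
    print(repr(snippet))
```

With those defaults the model sees at most 4 × 256 = 1024 characters of the source, however long the document is, which is why raising `size` and `count` helps but still can't answer whole-book questions.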

What I'm imagining is that the AI could sit on its own for a while loading up chunks of the source document and writing "notes" for its future self to read. That would let it accumulate information from across the whole corpus and cross-reference disparate stuff more easily.
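The note-taking idea could be sketched roughly like this: walk the document chunk by chunk and let a note-taker see its earlier notes each time. Everything here is hypothetical. `build_notes` is a made-up name, and `placeholder_note_taker` is a trivial stand-in for where a real model call would go, so the loop can run without any AI backend.

```python
from typing import Callable


def chunk_text(text: str, size: int) -> list[str]:
    """Cut text into consecutive fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def build_notes(text: str, take_notes: Callable[[str, str], str], chunk_size: int = 2000) -> str:
    """Feed each chunk plus the notes so far to the note-taker and keep its updated notes."""
    notes = ""
    for chunk in chunk_text(text, chunk_size):
        notes = take_notes(notes, chunk)
    return notes


def placeholder_note_taker(previous_notes: str, chunk: str) -> str:
    """Stand-in for a model call: records the first 80 characters of the chunk as a 'note'."""
    note = chunk[:80].strip()
    return (previous_notes + "\n" if previous_notes else "") + "- " + note


print(build_notes("Chapter one. " * 300, placeholder_note_taker))
```

Because each step receives the accumulated notes, a real note-taker could cross-reference earlier chunks, though the notes themselves would eventually need trimming to fit the context window.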

What about GitHub Copilot? It has tons of material available for training. Of course, it's not necessarily all bug-free or well written.