203
this post was submitted on 25 Feb 2024
203 points (83.7% liked)
Technology
59534 readers
3195 users here now
This is a most excellent place for technology news and articles.
Our Rules
- Follow the lemmy.world rules.
- Only tech related content.
- Be excellent to each another!
- Mod approved content bots can post up to 10 articles per day.
- Threads asking for personal tech support may be deleted.
- Politics threads may be removed.
- No memes allowed as posts, OK to post as comments.
- Only approved bots from the list below, to ask if your bot can be added please contact us.
- Check for duplicates before posting, duplicates may be removed
Approved Bots
founded 1 year ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
view the rest of the comments
Well difference is you have to know coming to know did the AI produce what you actually wanted.
Anyone can read the letter and know did the AI hallucinate or actually produce what you wanted.
On code. It might produce code, that by first try does what you ask. However turns AI hallucinated a bug into the code for some edge or specialty case.
Hallucinating is not a minor hiccup or minor bug, it is fundamental feature of LLMs. Since it isn't actually smart. It is a stochastic requrgitator. It doesn't know what you asked or understand what it is actually doing. It is matching prompt patterns to output. With enough training patterns to match one statistically usually ends up about there. However this is not quaranteed. Thus the main weakness of the system. More good training data makes it more likely it more often produces good results. However for example for business critical stuff, you aren't interested did it get it about right the 99 other times. It 100% has to get it right, this one time. Since this code goes to a production business deployment.
I guess one can code comprehensive enough verified testing pattern including all the edge cases and with thay verify the result. However now you have just shifted the job. Instead of programmer programming the programming, you have programmer programming the very very comprehensive testing routines. Which can't be LLM done, since the whole point is the testing routines are there to check for the inherent unreliability of the LLM output.
It's a nice toy for someone wanting to make a quick and dirty test code (maybe) to do thing X. Then try to find out does this actually do what I asked or does it have unforeseen behavior. Since I don't know what the behavior of the code is designed to be. Since I didn't write the code. good for toying around and maybe for quick and dirty brainstorming. Not good enough for anything critical, that has to be guaranteed to work with promise of service contract and so on.
So what the future real big job will be is not prompt engineers, but quality assurance and testing engineers who have to be around to guard against hallucinating LLM/ similar AIs. Prompts can be gotten from anyone, what is harder is finding out did the prompt actually produced what it was supposed to produce.