How dare they provide a useful tool like this, those bastards.
FaceDeer
Why do you think so, and why does it matter?
Large language models are notoriously poor at math; you should probably use a different tool for that sort of thing.
Not so obscure that Meta isn't paying attention and planning for interoperation, and Meta is one of the biggest players in the AI development field.
A complete data set isn't required, just a comprehensive one.
I actually was on the job market just a few months back for the first time in 15 years. Those sorts of comedy postings are not common. It's true that often the position doesn't require as much experience as the "dream candidate" they're asking for in the job posting, but A) they're aware of that, and B) they take that into account when screening resumes. Lying on your resume is not required; it's only going to waste everyone's time if you do.
And I'm pretty doubtful that OP would be capable of producing usable work. He says it himself: he's being deceptive about his abilities.
They can't do the work without the data, though.
Or rather, they can't do the work without the risk of Reddit raising a legal fuss that would cost them more than $60 million. The data itself can already be downloaded for free from various places.
Step one in succeeding in a job is passing the interviews and getting that job.
OP was just wasting everyone's time, both his own and the interviewers'.
OP said "Of course I don’t make it past 1 or 2 interviews in such cases." So it seems pretty straightforward that he wasn't qualified, as in he wasn't going to succeed in those roles.
The value comes from the work that can be done with it. If you can train an AI off it then it's worth something.
Right. But my point is that they can profit from it. The issue lots of folks seem to be having is "how dare Reddit make money using something I did!", and that issue is even worse for the Fediverse since lots of companies can be doing it.
It's not exactly training, but Google just recently previewed an LLM with a million-token context that can do effectively the same thing. One of the tests they did was to put a dictionary for a very obscure language (only 200 speakers worldwide) into the context, knowing that nothing about that language was in its original training data, and the LLM was able to translate it fluently.
This just means that OpenAI is voluntarily ceding the field to more ambitious companies.