FaceDeer

joined 1 year ago
[–] FaceDeer@kbin.social 20 points 8 months ago (2 children)

Indeed, this is a common misunderstanding of the status of fanworks. Most fanfics likely violate the copyright of the IP they're based on, but that doesn't mean that they aren't themselves original copyrighted works. The original IP's rightsholders can't simply claim the fanfic's copyright for themselves. It likely means that each party would need the other party's permission to make legal copies of the fanfic.

This is why most studios and authors will refuse to even read unsolicited ideas that are sent to them; they don't want to end up in a bind if someone sends them a fanfic containing elements they already intended to use in future books or episodes, and who then sues them for "stealing" their work.

[–] FaceDeer@kbin.social 10 points 8 months ago

This article is from June 12, 2023. That's practically ancient given how quickly AI technology has been progressing.

The paper it's based on used a very simplistic approach, training AIs purely on the outputs of their previous "generation." That turns out not to be a realistic real-world scenario, though. In reality AIs can be trained on a mixture of human-generated and AI-generated content, and the result can actually turn out better than training on human-generated content alone. AI-generated content can be curated and custom-made to be better suited to training, and the human-generated content adds back in the edge cases that might disappear over repeated training generations.

[–] FaceDeer@kbin.social 55 points 9 months ago (11 children)

I'd be very interested in those results too, though I'd want everyone to bear in mind the possibility that the brain could have many different "masculine" and "feminine" attributes that could be present in all sorts of mixtures when you range afield from whatever statistical clusterings there might be. I wouldn't want to see a situation where a transgender person is denied care because an AI "read" them as cisgender.

In another comment in this thread I mentioned how men and women have different average heights; that would be a good analogy. There are short men and tall women, so you shouldn't rely on just that.

[–] FaceDeer@kbin.social 14 points 9 months ago (1 children)

People's heights change over time too. Men and women can nevertheless have different average heights.

[–] FaceDeer@kbin.social 5 points 9 months ago

The article mentioned 400-word chunks, so much smaller than paper-sized.

[–] FaceDeer@kbin.social 33 points 9 months ago (2 children)

Not to mention that a response "containing" plagiarism is a pretty poorly defined criterion. The system being used here is proprietary so we don't even know how it works.

I went and looked at how low the scores for theater and similar subjects were, and it's dramatic:

> The lowest similarity scores appeared in theater (0.9%), humanities (2.8%) and English language (5.4%).

[–] FaceDeer@kbin.social 3 points 9 months ago (1 children)

That's why I was suggesting such a simple approach; it doesn't require AI or machine learning except in the most basic sense. If you want to apply fancier techniques you could use those basic word-based filters as a first pass to reduce the cost.

[–] FaceDeer@kbin.social 9 points 9 months ago (4 children)

Another more general property that might be worth looking for would be substantially similar posts that get cross-posted to a wide variety of communities in a short period of time. That's a pattern that can have legitimate reasons but it's probably worth raising a flag to draw extra scrutiny.

One idea for making it computationally lightweight but also robust against bots "tweaking" the wording of each post might be to fingerprint each post based on rare word usage. Spam is likely to mention the brand name of whatever product it's hawking, which is probably not going to be a commonly used word. So if a bunch of posts come along that all use the same rare words all at once, that's suspicious. I could also easily see situations where this gives false positives, of course - if some product suddenly does something newsworthy you could see a spew of legitimate posts about it in a variety of communities. But no automated spam checker is perfect.
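The fingerprinting idea above can be sketched in a few lines. This is a minimal illustration, not a production filter: the common-word list, the Jaccard-similarity measure, and the 0.6 threshold are all assumptions chosen for the example.

```python
# Sketch of rare-word fingerprinting for cross-post spam detection.
# A post's fingerprint is the set of words not on a common-word list;
# posts whose fingerprints overlap heavily are flagged as suspicious.
import re

# Illustrative stop list; a real filter would use corpus word frequencies.
COMMON_WORDS = {
    "the", "a", "an", "and", "or", "to", "of", "in", "on", "for", "is",
    "it", "this", "that", "with", "you", "your", "i", "we", "at", "by",
}

def rare_word_fingerprint(text: str) -> frozenset:
    """Fingerprint a post as its set of uncommon words."""
    words = re.findall(r"[a-z']+", text.lower())
    return frozenset(w for w in words if w not in COMMON_WORDS)

def similarity(fp_a: frozenset, fp_b: frozenset) -> float:
    """Jaccard similarity between two fingerprints (0.0 to 1.0)."""
    if not fp_a and not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)

def looks_like_crosspost_spam(posts: list, threshold: float = 0.6) -> bool:
    """Flag a batch of posts if any pair shares most of its rare words."""
    fps = [rare_word_fingerprint(p) for p in posts]
    return any(
        similarity(fps[i], fps[j]) >= threshold
        for i in range(len(fps))
        for j in range(i + 1, len(fps))
    )
```

Because the fingerprints ignore common words and word order, reworded copies of the same ad still collide on the rare brand-name vocabulary, which is the robustness property described above.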

[–] FaceDeer@kbin.social 3 points 9 months ago (1 children)

Indeed, and many of the more advanced AI systems currently out there are already using LLMs as just one component. Retrieval-augmented generation, for example, adds a separate "memory" that gets searched and bits inserted into the context of the LLM when it's answering questions. LLMs have been trained to be able to call external APIs to do the things they're bad at, like math. The LLM is typically still the central "core" of the system, though; the other stuff is routine sorts of computer activities that we've already had a handle on for decades.

IMO it still boils down to a continuum. If an AI system has an LLM in it but also a Wolfram Alpha API and a web search API and other such "helpers," then that system should be considered as a whole when asking how "intelligent" it is.
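The "LLM core plus helpers" architecture described above can be sketched as a simple router. Everything here is a toy stand-in: `query_llm` is a placeholder for a real model call, and the routing rules (regex for arithmetic, substring match for retrieval) are deliberately simplified assumptions for illustration.

```python
# Toy sketch of an LLM-centric system with tool helpers:
# arithmetic goes to a calculator, known topics get retrieved
# context injected into the prompt (RAG-style), everything else
# goes straight to the LLM core.
import re

def calculator_tool(expression: str) -> str:
    """Helper for the arithmetic LLMs are bad at."""
    # Restrict to digits and basic operators so eval stays safe here.
    if not re.fullmatch(r"[0-9+\-*/(). ]+", expression):
        raise ValueError("unsupported expression")
    return str(eval(expression))

def retrieve(query: str, memory: dict) -> str:
    """Stand-in for RAG's searchable 'memory': naive keyword lookup."""
    for key, fact in memory.items():
        if key in query.lower():
            return fact
    return ""

def query_llm(prompt: str) -> str:
    """Placeholder for the central LLM call."""
    return f"[LLM answer to: {prompt}]"

def answer(question: str, memory: dict) -> str:
    """Route through helpers first; fall back to the LLM core."""
    if re.fullmatch(r"[0-9+\-*/(). ]+", question.strip()):
        return calculator_tool(question.strip())
    fact = retrieve(question, memory)
    if fact:
        # RAG step: insert the retrieved bit into the LLM's context.
        return query_llm(f"Context: {fact}\nQuestion: {question}")
    return query_llm(question)
```

The point of the sketch is the shape, not the parts: the LLM remains the central core, while the helpers are the routine, well-understood computing tasks mentioned above.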

[–] FaceDeer@kbin.social 13 points 9 months ago (2 children)

It was the British spelling.

[–] FaceDeer@kbin.social 2 points 9 months ago (2 children)

Call it whatever makes you feel happy; it is allowing me to accomplish things much more quickly and easily than working without it does.
