Ah, you're suggesting using RFC 3514. Good thinking.
tinwhiskers
what about edited?
I was just looking at https://haveibeenpwned.com/ and it listed Appen as a site whose breach exposed my details. I had no idea who they were or why they had my details. I guess this is related?
Appen: In June 2020, the AI training data company Appen suffered a data breach exposing the details of almost 5.9 million users which were subsequently sold online. Included in the breach were names, email addresses and passwords stored as bcrypt hashes. Some records also contained phone numbers, employers and IP addresses. The data was provided to HIBP by dehashed.com.
They have released it on GitHub. The code is only about 500 lines. But releasing the model is arguably more important because that sort of compute is not affordable to any mortals.
Yeah, the ingestion part is still to be determined legally, but I think OpenAI will be ok. NYT produces content to be read, and copyright only protects them from people republishing their content. People also ingest their content and can make derivative works without problem. OpenAI are just doing the same, but at a level of ability that could be disruptive to some companies. This isn't even really very harmful to the NYT, since the historical material used doesn't even conflict with their primary purpose of producing new news. It'll be interesting to see how it plays out though.
Only publishing it is a copyright issue. You can also obtain copyrighted material with a web browser. The onus is on the person who publishes any material they put together, regardless of source. OpenAI is not responsible for publishing just because their tool was used to obtain the material.
Yeah, I'm surprised Google or another big player hasn't released something yet, or that bodies like the IETF haven't published any RFCs or practical standards. Now's the time to get market dominance. Perhaps nobody will react until the shit hits the fan.
I mean, PGP is great, but in this day and age we need a simple standard people can use to sign media without a hassle, and we may also need chain of custody in light of social media (edits and whatnot). Developers will likely need or want to build it into their software, so we need a standard. I don't think the PGP approach really worked for most people.
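To make the chain of custody idea a bit more concrete, here's a rough sketch of what I have in mind. It isn't any existing standard: the function names (`sign_entry`, `verify_chain`) and the record fields are made up for illustration, and it uses an HMAC with a shared secret purely as a stand-in because that's what the Python standard library offers. A real scheme would use public-key signatures so anyone can verify without holding the secret.

```python
import hashlib
import hmac
import json


def sign_entry(key: bytes, media: bytes, note: str, prev_sig: str = "") -> dict:
    """Create one custody entry committing to the media hash and the previous entry.

    Hypothetical record layout; HMAC stands in for a real public-key signature.
    """
    entry = {
        "media_sha256": hashlib.sha256(media).hexdigest(),
        "note": note,
        "prev_sig": prev_sig,  # links this edit to the one before it
    }
    payload = json.dumps(entry, sort_keys=True).encode()
    entry["sig"] = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return entry


def verify_chain(key: bytes, chain: list[dict], final_media: bytes) -> bool:
    """Check every signature, every link, and that the last entry matches the media."""
    prev_sig = ""
    for entry in chain:
        body = {k: v for k, v in entry.items() if k != "sig"}
        if body.get("prev_sig") != prev_sig:
            return False  # entry does not follow the previous one
        payload = json.dumps(body, sort_keys=True).encode()
        expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, entry.get("sig", "")):
            return False  # entry was altered after signing
        prev_sig = entry["sig"]
    final_hash = hashlib.sha256(final_media).hexdigest()
    return bool(chain) and chain[-1]["media_sha256"] == final_hash


# Demo values are placeholders, not real media.
key = b"demo-shared-secret"
original = b"raw image bytes"
edited = b"cropped image bytes"

first = sign_entry(key, original, "captured")
second = sign_entry(key, edited, "cropped", prev_sig=first["sig"])
chain = [first, second]
```

Each edit gets its own signed entry pointing at the previous one, so changing the file or any earlier entry makes `verify_chain` fail. The hard parts a real standard would have to solve (key distribution, identity, embedding this in the file format) are left out here.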