#I really appreciate OPs commitment to making a clear and direct argument here
You're wrong when you say GenAI is incapable of reproducing specific works.
Specifically: exactly reproducing training data is what the models are trained to do, and many techniques are then used to try to keep the overall product from reproducing exact copies of that training data in its output.
Here's a source that discusses different techniques for preventing verbatim copying in LLMs:
It specifically says this is undesirable because:
LLMs can regenerate copyrighted content verbatim, creating risks for both LLM providers and users.
For Users: Outputs containing copyrighted content could result in unintended legal issues, especially if such material is used commercially or distributed without proper authorization.
For providers: Hosting and distributing models capable of regenerating protected content poses unresolved legal challenges. This issue is particularly concerning for code models. Verbatim code reuse can impact licensing agreements, even for open-source code with restrictions on commercial use.
So your argument that genAI shouldn't be considered copyright infringement isn't accepted by people who develop genAI models and put lots of effort into making them infringe less often in their final outputs.
Your argument is obviously stronger when applied to novel output of genAIs (which may be all you intended to argue about). But the fact that they often and undesirably create exact copies of their training data does raise questions about whether what they do is simply copying other works badly or creating something original. Does it change your perception of a genAI work to know that its first several attempts were deleted and not displayed to a user because they were judged by the service's algorithms to be too copyright-infringing?
Do you care if LLMs reproduce open source code with a closed source license erroneously attached? Or are you contemptuous of the open source and copyleft movements because they rely on copyright to enforce their licensing terms?
I feel like the people attacking the idea of copyright in the notes - especially those saying it's a tool that only benefits corporations - should be reminded that open sourcing and Creative Commons licensing rely on copyright to have legal force.
the framing of generative ai as "theft" in popular discourse has really set us back so far like not only should we not consider copyright infringement theft we shouldn't even consider generative ai copyright infringement
#I really appreciate OPs commitment to making a clear and direct argument here #because the discourse on this topic is usually really bad #with no one making good arguments #Copyright is many things #it isn't all good or all bad #Gen AI is many many different things #and even just LLMs can be used in many different ways #I saw someone claiming that genAI are accessibility devices recently #and that shocked me #but I came around to the argument #in the same way that a cellphone is an accessibility device #hell #the copying functionality of a camera phone provides increased ability to remember things exactly #but that doesn't mean it can't also infringe copyright -- potentially at the exact same time #(tho fair use exceptions would likely apply) #Anyway #I hope people can discuss this issue productively in good faith instead of having knee jerk reactions either way #Super fun aside: in one interpretation genAI output cannot be copyrighted and it seems the courts are going with this interpretation #which means it's going to be really hard for companies to legally make money with these things in some of the ways they traditionally might #The AI gen art that won competitions? #anyone can sell prints of it #Legally I think this destruction of copyright might be the biggest long range issue of genAI #but that's long range and might rely on being able to tell if something was generated or not #discourse #genAI discourse