#you use terminology wrong they'll drag you in a mailing list
rocket-penguin · 8 months ago
https://futurism.com/the-byte/openai-whistleblower-copyrighted-data
"Because OpenAI was still technically just a well-funded research company at the time, the issue of copyright wasn't as big of a deal."
Hey, I was in computer science academia, and it absolutely was considered a big deal. If you were doing research, you had to make damn sure that your data was actually allowable. Training off of Google Images was considered off limits for a publication. There were and are large canonical datasets that are licensed freely or for purchase for academic purposes, and we passed that info around as much as we could. People's research funding frequently went to these licenses or into collecting their own data.
This idea that just grabbing random copyrighted data is "ok" or "normal" for academic research is absolutely a new thing, and it only came around after business bros started seeing machine learning as a moneymaking pathway and started trying to justify theft.