#open-source-AI
indirezioneostinata · 5 months ago
Text
From Pre-training to Expert Iteration: The Path Toward Reproducing OpenAI Five
Reinforcement Learning (RL) represents a distinctive approach within the machine learning landscape, built on continuous interaction between an agent and its environment. In RL, the agent learns through a cycle of actions and rewards, with the goal of maximizing its cumulative long-term return. This strategy differentiates it from traditional approaches such as learning…
0 notes
briteredoctober · 5 months ago
Text
The real issue with DeepSeek is that capitalists can't profit from it.
I always appreciate when the capitalist class just says it out loud so I don't have to be called a conspiracy theorist for pointing out the obvious.
955 notes · View notes
morlock-holmes · 3 days ago
Text
I continue to be incredibly fucking baffled by the number of youtubers and fan artists going, "The problem with AI is that it monetizes somebody else's IP without them getting any of the revenue stream! Anyway if you want to commission me to draw a Disney character the info is in my bio."
I seriously don't see how they cannot grasp the connection, it seems impossibly obvious to me.
It is weird to literally make your living (Or your side hustle) monetizing somebody else's artwork and then make PSAs about how bad it is to do that.
It's like if back in the day it had been a bunch of Napster users sharing the "You wouldn't download a car" ads.
99 notes · View notes
cyle · 5 months ago
Text
still confused how to make any of these LLMs useful to me.
while my daughter was napping, i downloaded lm studio and got a dozen of the most popular open source LLMs running on my PC, and they work great with very low latency, but i can't come up with anything to do with them but make boring toy scripts to do stupid shit.
as a test, i fed deepseek r1, llama 3.2, and mistral-small a big spreadsheet of data we've been collecting about my newborn daughter (all of this locally, not transmitting anything off my computer, because i don't want anybody with that data except, y'know, doctors) to see how it compared with several real doctors' advice and prognoses. all of the LLMs' suggestions were between generically correct and hilariously wrong. alarmingly wrong in some cases, but usually ending with the suggestion to "consult a medical professional" -- yeah, duh. pretty much no better than old school unreliable WebMD.
then i tried doing some prompt engineering to punch up some of my writing, and everything ended up sounding like it was written by an LLM. i don't get why anybody wants this. i can tell that LLM feel, and i think a lot of people can now, given the horrible sales emails i get every day that sound like they were "punched up" by an LLM. it's got a stink to it. maybe we'll all get used to it; i bet most non-tech people have no clue.
i may write a small script to try to tag some of my blogs' posts for me, because i'm really bad at doing so, but i have very little faith in the open source vision LLMs' ability to classify images. it'll probably not work how i hope. that still feels like something you gotta pay for to get good results.
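for what it's worth, the text-only version of that tagging script is pretty small. here's a minimal sketch, assuming LM Studio's local server is running on its default port with some model loaded -- the model name below is just a placeholder, since LM Studio routes requests to whatever's loaded:

```python
# a sketch, not a finished tool: ask a locally hosted model (via LM Studio's
# OpenAI-compatible server) to suggest tags for a blog post.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

def suggest_tags(post_text: str) -> list[str]:
    resp = client.chat.completions.create(
        model="local-model",  # placeholder; LM Studio serves whatever model is loaded
        messages=[
            {"role": "system",
             "content": "Suggest 3-5 short lowercase tags for this blog post. "
                        "Reply with a comma-separated list only."},
            {"role": "user", "content": post_text},
        ],
        temperature=0.2,  # keep the output terse and repeatable
    )
    return [t.strip() for t in resp.choices[0].message.content.split(",")]

print(suggest_tags("spent the afternoon transcoding old camcorder tapes with ffmpeg"))
```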
all of this keeps making me think of ffmpeg. a super cool, tiny, useful program that is very extensible and great at performing a certain task: transcoding media. it used to be horribly annoying to transcode media, and then ffmpeg came along and made it all stupidly simple overnight, but nobody noticed. there was no industry bubble around it.
LLMs feel like they're competing for a space so ubiquitous and useful that we'll take it for granted, the way we take ffmpeg for granted today. they just haven't fully grasped and appreciated that smallness yet. there isn't money to be made here.
61 notes · View notes
mostlysignssomeportents · 1 year ago
Text
Microsoft pinky swears that THIS TIME they’ll make security a priority
On June 20, I'm live onstage in LOS ANGELES for a recording of the GO FACT YOURSELF podcast. On June 21, I'm doing an ONLINE READING for the LOCUS AWARDS at 16hPT. On June 22, I'll be in OAKLAND, CA for a panel and a keynote at the LOCUS AWARDS.
As the old saying goes, "When someone tells you who they are and you get fooled again, shame on you." That goes double for Microsoft, especially when it comes to security promises.
Microsoft is, was, always has been, and always will be a rotten company. At every turn, throughout their history, they have learned the wrong lessons, over and over again.
That starts from the very earliest days, when the company was still called "Micro-Soft." Young Bill Gates was given a sweetheart deal to supply the operating system for IBM's PC, thanks to his mother's connection. The nepo-baby enlisted his pal, Paul Allen (whom he'd later rip off for billions) and together, they bought someone else's OS (and took credit for creating it – AKA, the "Musk gambit").
Microsoft then proceeded to make a fortune by monopolizing the OS market through illegal, collusive arrangements with the PC clone industry – an industry that only existed because they could source third-party PC ROMs from Phoenix:
https://www.eff.org/deeplinks/2019/08/ibm-pc-compatible-how-adversarial-interoperability-saved-pcs-monopolization
Bill Gates didn't become one of the richest people on earth simply by emerging from a lucky orifice; he also owed his success to vigorous antitrust enforcement. The IBM PC was the company's first major initiative after it was targeted by the DOJ for a 12-year antitrust enforcement action. IBM tapped its vast monopoly profits to fight the DOJ, spending more on outside counsel to fight the DOJ antitrust division than the DOJ spent on all its antitrust lawyers, every year, for 12 years.
IBM's delaying tactic paid off. When Reagan took the White House, he let IBM off the hook. But the company was still seriously scarred by its ordeal, and when the PC project kicked off, the company kept the OS separate from the hardware (one of the DOJ's major issues with IBM's previous behavior was its vertical monopoly on hardware and software). IBM didn't hire Gates and Allen to provide it with DOS because it was incapable of writing a PC operating system: they did it to keep the DOJ from kicking down their door again.
The post-antitrust, gunshy IBM kept delivering dividends for Microsoft. When IBM turned a blind eye to the cloned PC-ROM and allowed companies like Compaq, Dell and Gateway to compete directly with Big Blue, this produced a whole cohort of customers for Microsoft – customers Microsoft could play off against each other, ensuring that every PC sold generated income for Microsoft and creating a wide moat around the OS business that kept other OS vendors out of the market. Why invest in making an OS when every hardware company already had an exclusive arrangement with Microsoft?
The IBM PC story teaches us two things: stronger antitrust enforcement spurs innovation and opens markets for scrappy startups to grow to big, important firms; as do weaker IP protections.
Microsoft learned the opposite: monopolies are wildly profitable; expansive IP protects monopolies; you can violate antitrust laws so long as you have enough monopoly profits rolling in to outspend the government until a Republican bootlicker takes the White House (Microsoft's antitrust ordeal ended after GW Bush stole the 2000 election and dropped the charges against them). Microsoft embodies the idea that you either die a rebel hero or live long enough to become the evil emperor you dethroned.
From the first, Microsoft has pursued three goals:
Get too big to fail;
Get too big to jail;
Get too big to care.
It has succeeded on all three counts. Much of Microsoft's enduring power comes from succeeding IBM as the company that mediocre IT managers can safely buy from without being blamed for the poor quality of Microsoft's products: "Nobody ever got fired for buying Microsoft" is 2024's answer to "Nobody ever got fired for buying IBM."
Microsoft's secret sauce is impunity. The PC companies that bundle Windows with their hardware are held blameless for the glaring defects in Windows. The IT managers who buy company-wide Windows licenses are likewise insulated from the rage of the workers who have to use Windows and other Microsoft products.
Microsoft doesn't have to care if you hate it because, for the most part, it's not selling to you. It's selling to a few decision-makers who can be wined and dined and flattered. And since we all have to use its products, developers have to target its platform if they want to sell us their software.
This rarified position has afforded Microsoft enormous freedom to roll out harebrained "features" that made things briefly attractive for some group of developers it was hoping to tempt into its sticky-trap. Remember when it put a Turing-complete scripting environment into Microsoft Office and unleashed a plague of macro viruses that wiped out years worth of work for entire businesses?
https://web.archive.org/web/20060325224147/http://www3.ca.com/securityadvisor/newsinfo/collateral.aspx?cid=33338
It wasn't just Office; Microsoft's operating systems have harbored festering swamps of godawful defects that were weaponized by trolls, script kiddies, and nation-states:
https://en.wikipedia.org/wiki/EternalBlue
Microsoft blamed everyone except themselves for these defects, claiming that their poor code quality was no worse than others, insisting that the bulging arsenal of Windows-specific malware was the result of being the juiciest target and thus the subject of the most malicious attention.
Even if you take them at their word here, that's still no excuse. Microsoft didn't slip and accidentally become an operating system monopolist. They relentlessly, deliberately, illegally pursued the goal of extinguishing every OS except their own. It's completely foreseeable that this dominance would make their products the subject of continuous attacks.
There's an implicit bargain that every monopolist makes: allow me to dominate my market and I will be a benevolent dictator who spends his windfall profits on maintaining product quality and security. Indeed, if we permit "wasteful competition" to erode the margins of operating system vendors, who will have a surplus sufficient to meet the security investment demands of the digital world?
But monopolists always violate this bargain. When faced with the decision to either invest in quality and security, or hand billions of dollars to their shareholders, they'll always take the latter. Why wouldn't they? Once they have a monopoly, they don't have to worry about losing customers to a competitor, so why invest in customer satisfaction? That's how Google can piss away $80b on a stock buyback and fire 12,000 technical employees at the same time as its flagship search product (with a 90% market-share) is turning into an unusable pile of shit:
https://pluralistic.net/2024/02/21/im-feeling-unlucky/#not-up-to-the-task
Microsoft reneged on this bargain from day one, and they never stopped. When the company moved Office to the cloud, it added an "analytics" suite that lets bosses spy on and stack-rank their employees ("Sorry, fella, Office365 says you're the slowest typist in the company, so you're fired"). Microsoft will also sell you internal data on the Office365 usage of your industry competitors (they'll sell your data to your competitors, too, natch). But most of all, Microsoft harvests, analyzes and sells this data for its own purposes:
https://pluralistic.net/2020/11/25/the-peoples-amazon/#clippys-revenge
Leave aside how creepy, gross and exploitative this is – it's also incredibly reckless. Microsoft is creating a two-way conduit into the majority of the world's businesses that insider threats, security services and hackers can exploit to spy on and wreck Microsoft's customers' business. You don't get more "too big to care" than this.
Or at least, not until now. Microsoft recently announced a product called "Recall" that would record every keystroke, click and screen element, nominally in the name of helping you figure out what you've done and either do it again, or go back and fix it. The problem here is that anyone who gains access to your system – your boss, a spy, a cop, a Microsoft insider, a stalker, an abusive partner or a hacker – now has access to everything, on a platter. Naturally, this system – which Microsoft billed as ultra-secure – was wildly insecure and after a series of blockbuster exploits, the company was forced to hit pause on the rollout:
https://arstechnica.com/gadgets/2024/06/microsoft-delays-data-scraping-recall-feature-again-commits-to-public-beta-test/
For years, Microsoft waged a war on the single most important security practice in software development: transparency. This is the company that branded the GPL Free Software license a "virus" and called open source "a cancer." The company argued that allowing public scrutiny of code would be a disaster because bad guys would spot and weaponize defects.
This is "security through obscurity" and it's an idea that was discredited nearly 500 years ago with the advent of the scientific method. The crux of that method: we are so good at bullshiting ourselves into thinking that our experiment was successful that the only way to make sure we know anything is to tell our enemies what we think we've proved so they can try to tear us down.
Or, as Bruce Schneier puts it: "Anyone can design a security system that you yourself can't think of a way of breaking. That doesn't mean it works, it just means that it works against people stupider than you."
And yet, Microsoft – who has made more widely and consequentially exploited software than anyone else in the history of the human race – claimed that free and open code was insecure, and spent millions on deceptive PR campaigns intended to discredit the scientific method in favor of a kind of software alchemy, in which every coder toils in secret, assuring themselves that drinking mercury is the secret to eternal life.
Access to source code isn't sufficient to make software secure – nothing about access to code guarantees that anyone will review that code and repair its defects. Indeed, there've been some high profile examples of "supply chain attacks" in the free/open source software world:
https://www.securityweek.com/supply-chain-attack-major-linux-distributions-impacted-by-xz-utils-backdoor/
But there's no good argument that this code would have been more secure if it had been harder for the good guys to spot its bugs. When it comes to secure code, transparency is a necessity, but it's not a sufficiency.
The architects of that campaign are genuinely awful people, and yet they're revered as heroes by Microsoft's current leadership. There's Steve "Linux Is Cancer" Ballmer, star of Propublica's IRS Files, where he is shown to be the king of "tax loss harvesting":
https://pluralistic.net/2023/04/24/tax-loss-harvesting/#mego
And also the most prominent example of the disgusting tax cheats practiced by rich sports-team owners:
https://pluralistic.net/2021/07/08/tuyul-apps/#economic-substance-doctrine
Microsoft may give lip service to open source these days (mostly through buying, stripmining and enclosing Github) but Ballmer's legacy lives on within the company, through its wildly illegal tax-evasion tactics:
https://pluralistic.net/2023/10/13/pour-encoragez-les-autres/#micros-tilde-one
But Ballmer is an angel compared to his boss, Bill Gates, last seen some paragraphs above, stealing the credit for MS DOS from Tim Paterson and billions of dollars from his co-founder Paul Allen. Gates is an odious creep who made billions through corrupt tech industry practices, then used them to wield influence over the world's politics and policy. The Gates Foundation (and Gates personally) invented vaccine apartheid, helped kill access to AIDS medicines in Sub-Saharan Africa, then repeated the trick to keep covid vaccines out of reach of the Global South:
https://pluralistic.net/2021/04/13/public-interest-pharma/#gates-foundation
The Gates Foundation wants us to think of it as malaria-fighting heroes, but they're also the leaders of the war against public education, and have been key to the replacement of public schools with charter schools, where the poorest kids in America serve as experimental subjects for the failed pet theories of billionaire dilettantes:
https://www.ineteconomics.org/perspectives/blog/millionaire-driven-education-reform-has-failed-heres-what-works
(On a personal level, Gates is also a serial sexual abuser who harassed multiple subordinates into having sexual affairs with him:)
https://www.nytimes.com/2022/01/13/technology/microsoft-sexual-harassment-policy-review.html
The management culture of Microsoft started rotten and never improved. It's a company with corruption and monopoly in its blood, a firm that would always rather build market power to insulate itself from the consequences of making defective products than actually make good products. This is true of every division, from cloud computing:
https://pluralistic.net/2022/09/28/other-peoples-computers/#clouded-over
To gaming:
https://pluralistic.net/2023/04/27/convicted-monopolist/#microsquish
No one should ever trust Microsoft to do anything that benefits anyone except Microsoft. One of the low points in the otherwise wonderful surge of tech worker labor organizing was when the Communications Workers of America endorsed Microsoft's acquisition of Activision because Microsoft promised not to union-bust Activision employees. They lied:
https://80.lv/articles/qa-workers-contracted-by-microsoft-say-they-were-fired-for-trying-to-unionize/
Repeatedly:
https://www.reuters.com/technology/activision-fired-staff-using-strong-language-about-remote-work-policy-union-2023-03-01/
Why wouldn't they lie? They've never faced any consequences for lying in the past. Remember: the secret to Microsoft's billions is impunity.
Which brings me to Solarwinds. Solarwinds is an enterprise management tool that allows IT managers to see, patch and control the computers they oversee. Foreign spies hacked Solarwinds and accessed a variety of US federal agencies, including the National Nuclear Security Administration (which oversees nuclear weapons stockpiles), the NIH, and the Treasury Department.
When the Solarwinds story broke, Microsoft strenuously denied that the Solarwinds hack relied on exploiting defects in Microsoft software. They said this to everyone: the press, the Pentagon, and Congress.
This was a lie. As Renee Dudley and Doris Burke reported for Propublica, the Solarwinds attack relied on defects in the SAML authentication system that Microsoft's own senior security staff had identified and repeatedly warned management about. Microsoft's leadership ignored these warnings, buried the research, prohibited anyone from warning Microsoft customers, and sidelined Andrew Harris, the researcher who discovered the defect:
https://www.propublica.org/article/microsoft-solarwinds-golden-saml-data-breach-russian-hackers
The single most consequential cyberattack on the US government was only possible because Microsoft decided not to fix a profound and dangerous bug in its code, and declined to warn anyone who relied on this defective software.
Yesterday, Microsoft president Brad Smith testified about this to Congress, and promised that the company would henceforth prioritize security over gimmicks like AI:
https://arstechnica.com/tech-policy/2024/06/microsoft-in-damage-control-mode-says-it-will-prioritize-security-over-ai/
Despite all the reasons to mistrust this promise, the company is hoping Congress will believe it. More importantly, it's hoping that the Pentagon will believe it, because the Pentagon is about to award billions in free no-bid military contract profits to Microsoft:
https://www.axios.com/2024/05/17/pentagon-weighs-microsoft-licensing-upgrades
You know what? I bet they'll sell this lie. It won't be the first time they've convinced Serious People in charge of billions of dollars and/or lives to ignore that all-important maxim, "When someone tells you who they are and you get fooled again, shame on you."
If you'd like an essay-formatted version of this post to read or share, here's a link to it on pluralistic.net, my surveillance-free, ad-free, tracker-free blog:
https://pluralistic.net/2024/06/14/patch-tuesday/#fool-me-twice-we-dont-get-fooled-again
278 notes · View notes
unitedfrontvarietyhour · 5 months ago
Text
A Tale of Two Ws, and neither are America's.
Womp womp.
41 notes · View notes
utopicwork · 8 months ago
Text
10/28/24
Whoops looks like most "open source" ai models aren't actually open source.
90 notes · View notes
myrsinemezzo · 8 months ago
Text
Open Source Image Resources
Aka Down with AI for Fanworks
After someone in a server I was formerly in brought up a point about how many of those against AI for fanworks often use it for fic covers (which I doubt to some extent…), I thought I'd put together this resource for anyone who wants images to play with through Open Source or Creative Commons, non-AI means. Feel free (and please do!) to add to this list. These are the ones that came to me off the top of my head, with the help of a few friends.
Archive.org
You can use the Advanced Search function to find out-of-copyright and Creative Commons-licensed images by filtering on the date fields.
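If you'd rather script the same search, here's a rough sketch against Archive.org's advanced-search endpoint. The query fields and response shape are assumptions based on the API's Lucene-style syntax, so verify against the current docs:

```python
# a rough sketch of querying Archive.org's advanced search for old images
# by date range; field names here are assumptions -- check the API docs.
import requests

params = {
    "q": "mediatype:image AND date:[1800-01-01 TO 1923-12-31]",
    "fl[]": ["identifier", "title", "licenseurl"],  # fields to return
    "rows": 20,
    "output": "json",
}
resp = requests.get("https://archive.org/advancedsearch.php", params=params)
for doc in resp.json()["response"]["docs"]:
    print(doc.get("title"), f"https://archive.org/details/{doc['identifier']}")
```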
National Gallery of Art
Getty Museum
Cleveland Museum of Art
The Metropolitan Museum of Art
The Smithsonian
Creative Commons Images:
Pexels.com
Pexels used to be a go-to source for stock photos licensed via Creative Commons, but it is absolutely overrun with AI images at this point. If you find something with some digging, however, you can check the posting date to gauge whether it's likely a more recent AI image.
Please do add to this list if you like because I know I’m missing things. One good tactic if you want to find Open Source images is to just do a search engine query for “open source” along with the name of the institution you’re interested in.
46 notes · View notes
novella-november · 10 months ago
Text
Do you know of any Free, Open Source writing tools and aids?
plain text: Do you know of any Free, Open Source writing tools and aids?
Please submit them, either in an ask, a PM, or in the *Replies* of this post so they're easy to keep track of!
In the opposite direction of pro-ai, subscription based ""resources"" for writers, I would love to put together a masterpost of *Free, Open Source* programs on this blog, to help out any aspiring writers out there!
Let's start with the very basics, shall we?
Free and Open Source Word Processor:
(plain text: Free and Open Source Word Processor)
LibreOffice is community-driven and developed software, and is a project of the not-for-profit organization, The Document Foundation. LibreOffice is free and open source software, originally based on OpenOffice.org (commonly known as OpenOffice), and is the most actively developed OpenOffice.org successor project.
72 notes · View notes
dirty-dirty-muggle · 2 months ago
Text
Reading about the Tumblr layoffs last week and everyone madly backing up their blogs, I am wondering if it’s time to go back to an open source community platform like Dreamwidth. I have a very old blog there I haven’t used in a decade and always liked it; the only issue it ever had was low user numbers. But honestly, anything corporate owned is going down the tubes and has been for a while. Meta apps are dead to me because of AI and because they’re owned by a fascist bigot; there was a good reason everyone made a mass exodus from LJ in the mid-2010s; and Discord is trying to go public, which is essentially the death knell for them remaining at all user-centric (things have already gotten worse lately as they prep).
Feels like it may be the time to start leaning heavily again into open-source and fandom-run communities that can’t be turned off or fucked up at the whim of some asshole CEO who wants to line their pockets with a few more dollars.
Plus, because of the internet’s rapid decline into corporate, capitalist suckage, I’ve found myself growing increasingly nostalgic about the olden days of the internet and online fandom (sites like Sugar Quill, WIKTT, Granger Enchanted, and old LJ). I may just act on that impulse.
(I’m on bluesky as well and it seems great but not as good of a platform for community building vs others imo.)
Is anyone else in HP fandom using dreamwidth still?
8 notes · View notes
sadkachow · 10 months ago
Text
how the fuck did my english class manage to take a semi-positive stance on generative ai
28 notes · View notes
jbird-the-manwich · 4 months ago
Text
if I could make people learn one thing to a disgusting level of detail it would be language model implementation because the hearsay is absolutely insane here, but. one of my posts about open source software got really popular, so, I want to put forth that if you're hyped about open source software and want to support it and are very serious about wanting to protect it, maybe avoid saying "llms" when you mean a commercial, shady, closed source language model like chatgpt.
most people here don't seem aware of how many options there are, or of how the vast majority of them are actually free and open source, and I would argue boosting smaller, more ethical alternatives to well known commercial closed LLMs is more productive than debating the use of any LLM as if using an LLM means being beholden to the doings of a specific company – which it doesn't have to, because of the free availability and public scrutability of open source software.
if you have moral considerations over the use of AI in general, because of things you've heard about the largest commercial ones, it's worth knowing that those implementation choices do not necessarily represent the only way to implement a language model, or even a reasonable sample of an "average" llm (they are indeed outliers by significant margins, and more so as time goes on), and that it is entirely possible to use models that are vastly more efficient, with similar functionality, implemented in more responsible ways. This list is not even the especially small ones, just the ones with the most functional parity to closed models.
More ethical ai is literally widely available in the public domain. Right now. Because of open source software. and diva is totally unsung.
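to make that concrete, here's a minimal sketch of pulling a small open-weights model down and running it locally with the hugging face transformers library. the model name is just one example of a tiny open checkpoint; swap in whichever open model you like:

```python
# a minimal sketch: any open-weights checkpoint on the hugging face hub is a
# pip install away. the model below is only an example of a small open model.
from transformers import pipeline

generator = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")
out = generator(
    "In one sentence, what does 'open weights' mean?",
    max_new_tokens=60,
)
print(out[0]["generated_text"])
```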
commercial companies didn't make the most ethical ai, but the open source ecosystem is making it, and people still talk like language models themselves cannot be built in a way that isn't fundamentally exploitative of consumers' data, or are by nature always needlessly ecologically irresponsible (because that is what Sam Altman told people), when literally most llms produced in the last two years are absolutely diminutive in comparison in their size and use of resources, and kinda showed him up as something of a scapegoating liar.
So if you're complaining about llms in a general sense, at least remember to say "open source ai did it better" because they did.
9 notes · View notes
cyle · 4 months ago
Note
I know you're on paternity leave so feel free to ignore this if you don't want to think about it, but has there been any progress on open-sourcing Tumblr's front-end? Inquiring minds would like to know
i hadn’t seen any progress on it before i left. there’s a strong willingness to do it, it’s just a big task to get it open-source-able in a sustainable way. a lot of our CI/CD processes rely on stuff that would need to be rebuilt from scratch, i think. totally doable, just not a priority.
but maybe there’s been progress since i left, i dunno! 🤞
26 notes · View notes
mostlysignssomeportents · 2 years ago
Text
"Open" "AI" isn’t
Tomorrow (19 Aug), I'm appearing at the San Diego Union-Tribune Festival of Books. I'm on a 2:30PM panel called "Return From Retirement," followed by a signing:
https://www.sandiegouniontribune.com/festivalofbooks
The crybabies who freak out about The Communist Manifesto appearing on university curriculum clearly never read it – chapter one is basically a long hymn to capitalism's flexibility and inventiveness, its ability to change form and adapt itself to everything the world throws at it and come out on top:
https://www.marxists.org/archive/marx/works/1848/communist-manifesto/ch01.htm#007
Today, leftists signal this protean capacity of capital with the -washing suffix: greenwashing, genderwashing, queerwashing, wokewashing – all the ways capital cloaks itself in liberatory, progressive values, while still serving as a force for extraction, exploitation, and political corruption.
A smart capitalist is someone who, sensing the outrage at a world run by 150 old white guys in boardrooms, proposes replacing half of them with women, queers, and people of color. This is a superficial maneuver, sure, but it's an incredibly effective one.
In "Open (For Business): Big Tech, Concentrated Power, and the Political Economy of Open AI," a new working paper, Meredith Whittaker, David Gray Widder and Sarah B Myers document a new kind of -washing: openwashing:
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4543807
Openwashing is the trick that large "AI" companies use to evade regulation and neutralize critics, by casting themselves as forces of ethical capitalism, committed to the virtue of openness. No one should be surprised to learn that the products of the "open" wing of an industry whose products are neither "artificial," nor "intelligent," are also not "open." Every word AI hucksters say is a lie; including "and," and "the."
So what work does the "open" in "open AI" do? "Open" here is supposed to invoke the "open" in "open source," a movement that emphasizes a software development methodology that promotes code transparency, reusability and extensibility, which are three important virtues.
But "open source" itself is an offshoot of a more foundational movement, the Free Software movement, whose goal is to promote freedom, and whose method is openness. The point of software freedom was technological self-determination, the right of technology users to decide not just what their technology does, but who it does it to and who it does it for:
https://locusmag.com/2022/01/cory-doctorow-science-fiction-is-a-luddite-literature/
The open source split from free software was ostensibly driven by the need to reassure investors and businesspeople so they would join the movement. The "free" in free software is (deliberately) ambiguous, a bit of wordplay that sometimes misleads people into thinking it means "Free as in Beer" when really it means "Free as in Speech" (in Romance languages, these distinctions are captured by translating "free" as "libre" rather than "gratis").
The idea behind open source was to rebrand free software in a less ambiguous – and more instrumental – package that stressed cost-savings and software quality, as well as "ecosystem benefits" from a co-operative form of development that recruited tinkerers, independents, and rivals to contribute to a robust infrastructural commons.
But "open" doesn't merely resolve the linguistic ambiguity of libre vs gratis – it does so by removing the "liberty" from "libre," the "freedom" from "free." "Open" changes the pole-star that movement participants follow as they set their course. Rather than asking "Which course of action makes us more free?" they ask, "Which course of action makes our software better?"
Thus, by dribs and drabs, the freedom leeches out of openness. Today's tech giants have mobilized "open" to create a two-tier system: the largest tech firms enjoy broad freedom themselves – they alone get to decide how their software stack is configured. But for all of us who rely on that (increasingly unavoidable) software stack, all we have is "open": the ability to peer inside that software and see how it works, and perhaps suggest improvements to it:
https://www.youtube.com/watch?v=vBknF2yUZZ8
In the Big Tech internet, it's freedom for them, openness for us. "Openness" – transparency, reusability and extensibility – is valuable, but it shouldn't be mistaken for technological self-determination. As the tech sector becomes ever-more concentrated, the limits of openness become more apparent.
But even by those standards, the openness of "open AI" is thin gruel indeed (that goes triple for the company that calls itself "OpenAI," which is a particularly egregious openwasher).
The paper's authors start by suggesting that the "open" in "open AI" is meant to imply that an "open AI" can be scratch-built by competitors (or even hobbyists), but that this isn't true. Not only is the material that "open AI" companies publish insufficient for reproducing their products, even if those gaps were plugged, the resource burden required to do so is so intense that only the largest companies could do so.
Beyond this, the "open" parts of "open AI" are insufficient for achieving the other claimed benefits of "open AI": they don't promote auditing, or safety, or competition. Indeed, they often cut against these goals.
"Open AI" is a wordgame that exploits the malleability of "open," but also the ambiguity of the term "AI": "a grab bag of approaches, not… a technical term of art, but more … marketing and a signifier of aspirations." Hitching this vague term to "open" creates all kinds of bait-and-switch opportunities.
That's how you get Meta claiming that LLaMa2 is "open source," despite being licensed in a way that is absolutely incompatible with any widely accepted definition of the term:
https://blog.opensource.org/metas-llama-2-license-is-not-open-source/
LLaMa-2 is a particularly egregious openwashing example, but there are plenty of other ways that "open" is misleadingly applied to AI: sometimes it means you can see the source code, sometimes that you can see the training data, and sometimes that you can tune a model, all to different degrees, alone and in combination.
But even the most "open" systems can't be independently replicated, due to raw computing requirements. This isn't the fault of the AI industry – the computational intensity is a fact, not a choice – but when the AI industry claims that "open" will "democratize" AI, they are hiding the ball. People who hear these "democratization" claims (especially policymakers) are thinking about entrepreneurial kids in garages, but unless these kids have access to multi-billion-dollar data centers, they can't be "disruptors" who topple tech giants with cool new ideas. At best, they can hope to pay rent to those giants for access to their compute grids, in order to create products and services at the margin that rely on existing products, rather than displacing them.
The "open" story, with its claims of democratization, is an especially important one in the context of regulation. In Europe, where a variety of AI regulations have been proposed, the AI industry has co-opted the open source movement's hard-won narrative battles about the harms of ill-considered regulation.
For open source (and free software) advocates, many tech regulations aimed at taming large, abusive companies – such as requirements to surveil and control users to extinguish toxic behavior – wreak collateral damage on the free, open, user-centric systems that we see as superior alternatives to Big Tech. This leads to the paradoxical effect of passing regulation to "punish" Big Tech that end up simply shaving an infinitesimal percentage off the giants' profits, while destroying the small co-ops, nonprofits and startups before they can grow to be a viable alternative.
The years-long fight to get regulators to understand this risk has been waged by principled actors working for subsistence nonprofit wages or for free, and now the AI industry is capitalizing on lawmakers' hard-won consideration for collateral damage by claiming to be "open AI" and thus vulnerable to overbroad regulation.
But the "open" projects that lawmakers have been coached to value are precious because they deliver a level playing field, competition, innovation and democratization – all things that "open AI" fails to deliver. The regulations the AI industry is fighting also don't necessarily implicate the speech implications that are core to protecting free software:
https://www.eff.org/deeplinks/2015/04/remembering-case-established-code-speech
Just think about LLaMa-2. You can download it for free, along with the model weights it relies on – but not detailed specs for the data that was used in its training. And the source-code is licensed under a homebrewed license cooked up by Meta's lawyers, a license that only glancingly resembles anything from the Open Source Definition:
https://opensource.org/osd/
Core to Big Tech companies' "open AI" offerings are tools, like Meta's PyTorch and Google's TensorFlow. These tools are indeed "open source," licensed under real OSS terms. But they are designed and maintained by the companies that sponsor them, and optimize for the proprietary back-ends each company offers in its own cloud. When programmers train themselves to develop in these environments, they are gaining expertise in adding value to a monopolist's ecosystem, locking themselves in with their own expertise. This is a classic example of software freedom for tech giants and open source for the rest of us.
One way to understand how "open" can produce a lock-in that "free" might prevent is to think of Android: Android is an open platform in the sense that its sourcecode is freely licensed, but the existence of Android doesn't make it any easier to challenge the mobile OS duopoly with a new mobile OS; nor does it make it easier to switch from Android to iOS and vice versa.
Another example: MongoDB, a free/open database tool that was adopted by Amazon, which subsequently forked the codebase and tuned it to work on their proprietary cloud infrastructure.
The value of open tooling as a stickytrap for creating a pool of developers who end up as sharecroppers who are glued to a specific company's closed infrastructure is well-understood and openly acknowledged by "open AI" companies. Zuckerberg boasts about how PyTorch ropes developers into Meta's stack, "when there are opportunities to make integrations with products, [so] it’s much easier to make sure that developers and other folks are compatible with the things that we need in the way that our systems work."
Tooling is a relatively obscure issue, primarily debated by developers. A much broader debate has raged over training data – how it is acquired, labeled, sorted and used. Many of the biggest "open AI" companies are totally opaque when it comes to training data. Google and OpenAI won't even say how many pieces of data went into their models' training – let alone which data they used.
Other "open AI" companies use publicly available datasets like the Pile and CommonCrawl. But you can't replicate their models by shoveling these datasets into an algorithm. Each one has to be groomed – labeled, sorted, de-duplicated, and otherwise filtered. Many "open" models merge these datasets with other, proprietary sets, in varying (and secret) proportions.
Quality filtering and labeling for training data is incredibly expensive and labor-intensive, and involves some of the most exploitative and traumatizing clickwork in the world, as poorly paid workers in the Global South make pennies for reviewing data that includes graphic violence, rape, and gore.
Not only is the product of this "data pipeline" kept a secret by "open" companies, the very nature of the pipeline is likewise cloaked in mystery, in order to obscure the exploitative labor relations it embodies (the joke that "AI" stands for "absent Indians" comes out of the South Asian clickwork industry).
The most common "open" in "open AI" is a model that arrives built and trained, which is "open" in the sense that end-users can "fine-tune" it – usually while running it on the manufacturer's own proprietary cloud hardware, under that company's supervision and surveillance. These tunable models are undocumented blobs, not the rigorously peer-reviewed transparent tools celebrated by the open source movement.
If "open" was a way to transform "free software" from an ethical proposition to an efficient methodology for developing high-quality software; then "open AI" is a way to transform "open source" into a rent-extracting black box.
Some "open AI" has slipped out of the corporate silo. Meta's LLaMa was leaked by early testers, republished on 4chan, and is now in the wild. Some exciting stuff has emerged from this, but despite this work happening outside of Meta's control, it is not without benefits to Meta. As an infamous leaked Google memo explains:
Paradoxically, the one clear winner in all of this is Meta. Because the leaked model was theirs, they have effectively garnered an entire planet's worth of free labor. Since most open source innovation is happening on top of their architecture, there is nothing stopping them from directly incorporating it into their products.
https://www.searchenginejournal.com/leaked-google-memo-admits-defeat-by-open-source-ai/486290/
Thus, "open AI" is best understood as "as free product development" for large, well-capitalized AI companies, conducted by tinkerers who will not be able to escape these giants' proprietary compute silos and opaque training corpuses, and whose work product is guaranteed to be compatible with the giants' own systems.
The instrumental story about the virtues of "open" often invoke auditability: the fact that anyone can look at the source code makes it easier for bugs to be identified. But as open source projects have learned the hard way, the fact that anyone can audit your widely used, high-stakes code doesn't mean that anyone will.
The Heartbleed vulnerability in OpenSSL was a wake-up call for the open source movement – a bug that endangered every secure webserver connection in the world, which had hidden in plain sight for years. The result was an admirable and successful effort to build institutions whose job it is to actually make use of open source transparency to conduct regular, deep, systemic audits.
In other words, "open" is a necessary, but insufficient, precondition for auditing. But when the "open AI" movement touts its "safety" thanks to its "auditability," it fails to describe any steps it is taking to replicate these auditing institutions – how they'll be constituted, funded and directed. The story starts and ends with "transparency" and then makes the unjustifiable leap to "safety," without any intermediate steps about how the one will turn into the other.
It's a Magic Underpants Gnome story, in other words:
Step One: Transparency
Step Two: ??
Step Three: Safety
https://www.youtube.com/watch?v=a5ih_TQWqCA
Meanwhile, OpenAI itself has gone on record as objecting to "burdensome mechanisms like licenses or audits" as an impediment to "innovation" – all the while arguing that these "burdensome mechanisms" should be mandatory for rival offerings that are more advanced than its own. To call this a "transparent ruse" is to do violence to good, hardworking transparent ruses all the world over:
https://openai.com/blog/governance-of-superintelligence
Some "open AI" is much more open than the industry dominating offerings. There's EleutherAI, a donor-supported nonprofit whose model comes with documentation and code, licensed Apache 2.0. There are also some smaller academic offerings: Vicuna (UCSD/CMU/Berkeley); Koala (Berkeley) and Alpaca (Stanford).
These are indeed more open (though Alpaca – which ran on a laptop – had to be withdrawn because it "hallucinated" so profusely). But to the extent that the "open AI" movement invokes (or cares about) these projects, it is in order to brandish them before hostile policymakers and say, "Won't someone please think of the academics?" These are the poster children for proposals like exempting AI from antitrust enforcement, but they're not significant players in the "open AI" industry, nor are they likely to be for so long as the largest companies are running the show:
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4493900
I'm kickstarting the audiobook for "The Internet Con: How To Seize the Means of Computation," a Big Tech disassembly manual to disenshittify the web and make a new, good internet to succeed the old, good internet. It's a DRM-free book, which means Audible won't carry it, so this crowdfunder is essential. Back now to get the audio, Verso hardcover and ebook:
http://seizethemeansofcomputation.org
If you'd like an essay-formatted version of this post to read or share, here's a link to it on pluralistic.net, my surveillance-free, ad-free, tracker-free blog:
https://pluralistic.net/2023/08/18/openwashing/#you-keep-using-that-word-i-do-not-think-it-means-what-you-think-it-means
Image: Cryteria (modified) https://commons.wikimedia.org/wiki/File:HAL9000.svg
CC BY 3.0 https://creativecommons.org/licenses/by/3.0/deed.en
253 notes · View notes
unitedfrontvarietyhour · 5 months ago
Text
Open Source is Communism, and DeepSeek just wiped out $1.2 trillion on Wall Street.
Long live the PRC.
29 notes · View notes