#Twitter data scraper
Twitter Data Scraping Services - Twitter Data Collection Services
Scrape data like profile handle, followers count, etc., using our Twitter data scraping services. Our Twitter data collection services are functional across the USA, UK, etc.
know more:
#Twitter Data Scraping Services#Twitter Data Collection Services#Scrape Data from Twitter Social Media App#Web scraping Twitter data#Twitter data scraper#Scrape Tweets data from Twitter
0 notes
I said this elsewhere but
not to be That Guy but I don't really see the point of moving platforms anymore.
There is nowhere we can hide on the internet from the silicon valley bros. There just isn't. Patreon is VC-funded and could announce tomorrow that oh of course they've been partnered with Midjourney for months already. Twitter actively scrapes everything for AI learning. And even if you trusted the other big players like FB/IG to tell the truth about shit, people are going to use these platforms for datasets anyway. They'll just do it quietly and hope no one notices.
And places like cohost or whatever-- honestly, if it makes you feel safer/better, go for it, but I don't think cohost has the sway or capital to build the type of legal team you need to fight against scrapers. Hell, you wanna retreat into private discords? Discord wants in on AI too.
Everyone big is already dealing in AI, and everyone small doesn't even have a seat at the table. In my opinion, we are all collectively holding out for Brussels or any of the many court cases to do something about this shit, because it's no longer a thing we can just hide from.
I'm going to keep my writing on the AO3 because they are the odd case of having an actual legal team in place for this shit. For artists, I have nothing but sympathy. I suggest glazing and nightshading literally everything you post.
But beyond that, I'm unsure what we can do. This is a matter for legislation. Silicon Valley doesn't care if we all go to cohost, and even less scrupulous data-crawlers will just grab our shit from there too.
So I'll be here.
3K notes
ShadowDragon sells a tool called SocialNet that streamlines the process of pulling public data from various sites, apps, and services. Marketing material available online says SocialNet can “follow the breadcrumbs of your target’s digital life and find hidden correlations in your research.” In one promotional video, ShadowDragon says users can enter “an email, an alias, a name, a phone number, a variety of different things, and immediately have information on your target. We can see interests, we can see who friends are, pictures, videos.”
The leaked list of targeted sites includes ones from major tech companies, communication tools, sites focused around certain hobbies and interests, payment services, social networks, and more. The 30 companies the Mozilla Foundation is asking to block ShadowDragon scrapers are Amazon, Apple, BabyCentre, Bluesky, Discord, Duolingo, Etsy, Meta’s Facebook and Instagram, FlightAware, GitHub, Glassdoor, GoFundMe, Google, LinkedIn, Nextdoor, OnlyFans, Pinterest, Reddit, Snapchat, Strava, Substack, TikTok, Tinder, TripAdvisor, Twitch, Twitter, WhatsApp, Xbox, Yelp, and YouTube.
437 notes
year in review - hockey rpf on ao3
hello!! the annual ao3 year in review got some friends and me thinking - wouldn't it be cool if we had a hockey rpf specific version of that. so i went ahead and collated the data below!!
i start with a broad overview, then dive deeper into the 3 most popular ships this year (with one bonus!)
₊˚⊹♡ . ݁₊ ⊹ . ݁˖ . ݁𐙚 ‧₊˚ ⋅. ݁
before we jump in, some key things to highlight:
- CREDIT TO: the webscraping part of my code heavily utilized the ao3 wrapped google colab code, as lovingly created by @kyucultures on twitter, as the main skeleton. i tweaked a couple of things but having it as a reference saved me a LOT of time and effort as a first time web scraper!!! thank you stranger <3
- please do NOT, under ANY circumstances, share any part of this collation on any other website. please do not screenshot or repost to twitter, tiktok, or any other public social platform. thank u!!! T_T
- but do feel free to send requests to my inbox! if you want more info on a specific ship, tag, or you have a cool idea or wanna see a correlation between two variables, reach out and i should be able to take a look. if you want to take a deeper dive into a specific trope not mentioned here/chapter count/word counts/fic tags/ship tags/ratings/etc, shoot me an ask!
˚ . ˚ . . ✦ ˚ . ★⋆. ࿐࿔
with that all said and done... let's dive into hockey_rpf_2024_wrapped_insanity.ipynb
BIG PICTURE OVERVIEW
i scraped a total of 4266 fanfics that dated themselves as published or finished in the year 2024. of these 4000-odd fanfics, the most popular ships were:
Note: "Minor or Background Relationship(s)" clocked in at #9 with 91 fics, but I removed it as it was always a secondary tag and added no information to the chart. I did not discern between primary ship and secondary ship(s) either!
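The post doesn't show the notebook's counting code. A minimal sketch, assuming each scraped work is represented as a plain list of its relationship tags (a hypothetical layout), could rank ships while dropping the "Minor or Background Relationship(s)" tag and counting primary and secondary ships alike:

```python
from collections import Counter

# Tags to leave out of the ranking, per the note above
EXCLUDED_TAGS = {"Minor or Background Relationship(s)"}

def top_ships(works_ship_tags, n=10):
    """Count how many works carry each ship tag, skipping excluded tags.

    Primary and secondary ships are counted identically (no distinction is made).
    """
    counts = Counter()
    for tags in works_ship_tags:
        # set() so a tag repeated within one work is only counted once
        counts.update(tag for tag in set(tags) if tag not in EXCLUDED_TAGS)
    return counts.most_common(n)

# Made-up sample data, for illustration only
works = [
    ["Leon Draisaitl/Matthew Tkachuk"],
    ["Sidney Crosby/Evgeni Malkin", "Minor or Background Relationship(s)"],
    ["Leon Draisaitl/Matthew Tkachuk", "Nico Hischier/Jack Hughes"],
]
print(top_ships(works))
```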
breaking down the 5 most popular ships over the course of the year, we see:
super interesting to see that HUGE jump for mattdrai in june/july for the stanley cup final. the general lull in the offseason is cool to see as well.
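A month-by-month breakdown like this amounts to bucketing each work's date by month. A minimal sketch, assuming each record is a hypothetical (published-or-finished date, set of ship tags) pair:

```python
from collections import Counter, defaultdict
from datetime import date

def monthly_counts(records, ships):
    """Return {ship: Counter({month_number: number_of_works})} for the chosen ships."""
    by_ship = defaultdict(Counter)
    for published, tags in records:
        for ship in ships:
            if ship in tags:
                by_ship[ship][published.month] += 1
    return by_ship

# Made-up sample records, for illustration only
records = [
    (date(2024, 6, 10), {"Leon Draisaitl/Matthew Tkachuk"}),
    (date(2024, 6, 24), {"Leon Draisaitl/Matthew Tkachuk"}),
    (date(2024, 8, 2), {"Nico Hischier/Jack Hughes"}),
]
counts = monthly_counts(records, ["Leon Draisaitl/Matthew Tkachuk", "Nico Hischier/Jack Hughes"])
print(counts["Leon Draisaitl/Matthew Tkachuk"][6])  # 2
```

A missing month simply reads as 0 from the `Counter`, which is convenient when plotting a line per ship.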
as for the most popular tags in all 2024 hockey rpf fic...
weee like our fluff. and our established relationships. and a little H/C never hurt no one.
i got curious here about which AUs were the most popular, so i filtered down for that. note that i only regex'd for tags that specifically start with "Alternate Universe - ", so A/B/O and some other stuff won't appear here!
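A sketch of that kind of prefix filter, assuming a flat list of freeform tags (hypothetical input; the notebook's actual regex isn't shown). Anything not starting with "Alternate Universe - " is skipped, so a tag like "Alpha/Beta/Omega Dynamics" never shows up:

```python
import re
from collections import Counter

# Match only tags that literally start with "Alternate Universe - "; capture the AU type
AU_PATTERN = re.compile(r"^Alternate Universe - (.+)$")

def count_aus(freeform_tags):
    """Count AU subtypes, e.g. 'Alternate Universe - College/University' -> 'College/University'."""
    counts = Counter()
    for tag in freeform_tags:
        match = AU_PATTERN.match(tag)
        if match:
            counts[match.group(1)] += 1
    return counts

tags = [
    "Alternate Universe - Coffee Shops & Cafés",
    "Alternate Universe - College/University",
    "Alpha/Beta/Omega Dynamics",
    "Alternate Universe - Coffee Shops & Cafés",
    "Fluff",
]
print(count_aus(tags).most_common())
# [('Coffee Shops & Cafés', 2), ('College/University', 1)]
```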
idk it was cool to me.
also, here's a quick breakdown of the ratings % for works this year:
and as for the word counts, i pulled up a box plot of the top 20 most popular ships to see how the fic length distribution differed amongst ships:
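A box plot is drawn from five numbers per group. A minimal sketch of computing them per ship with the standard library, assuming a hypothetical `{ship: [word counts]}` mapping (the ship names below are placeholders). Plotting libraries may interpolate quartiles slightly differently, so exact values can differ a little from a chart:

```python
import statistics

def five_number_summary(word_counts):
    """Return (min, Q1, median, Q3, max): the values a box plot is drawn from."""
    # quantiles(n=4) returns the three quartile cut points (default "exclusive" method)
    q1, median, q3 = statistics.quantiles(word_counts, n=4)
    return min(word_counts), q1, median, q3, max(word_counts)

def summarize_ships(word_counts_by_ship):
    """Map each ship to its summary, ordered by median fic length (longest first)."""
    summaries = {ship: five_number_summary(counts) for ship, counts in word_counts_by_ship.items()}
    return dict(sorted(summaries.items(), key=lambda item: item[1][2], reverse=True))

# Placeholder ships and made-up word counts, for illustration only
sample = {
    "Ship A": [1_000, 2_000, 3_000, 4_000, 5_000],
    "Ship B": [500, 800, 1_200, 2_500],
}
for ship, summary in summarize_ships(sample).items():
    print(ship, summary)
```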
mattdrai-ers you have some DEDICATION omg. respect
now for the ship by ship break down!!
₊ . ݁ ݁ . ⊹ ࣪ ˖͙͘͡★ ⊹ .
#1 MATTDRAI
most popular ship this year. peaked in june/july with the scf. so what do u people like to write about?
fun fun fun. i love that the scf is tagged there like yes actually she is also a main character
₊ . ݁ ݁ . ⊹ ࣪ ˖͙͘͡★ ⊹ .
#2 SIDGENO
(my babies) top tags for this ship are:
folks, we are a/b/o fiends and we cannot lie. thank you to all the selfless authors for feeding us good a/b/o fic this year. i hope to join your ranks soon.
(also: MPREG. omega sidney crosby. alpha geno. listen, the people have spoken, and like, i am listening.)
₊ . ݁ ݁ . ⊹ ࣪ ˖͙͘͡★ ⊹ .
#3 NICOJACK
top tags!!
it seems nice and cozy over there... room for one more?
₊ . ݁ ݁ . ⊹ ࣪ ˖͙͘͡★ ⊹ .
BONUS: JDTZ.
i wasnt gonna plot this but @marcandreyuri asked me if i could take a look and the results are so compelling i must include it. are yall ok. do u need a hug
top tags being h/c, angst, angst, TRADES, pining, open endings... T_T katie said its a "torture vortex" and i must concur
₊ . ݁ ݁ . ⊹ ࣪ ˖͙͘͡★ ⊹ .
BONUS BONUS: ALPHA/BETA/OMEGA
as an a/b/o enthusiast myself i got curious as to what the most popular ships were within that tag. if you want me to take a look at this for any other tag lmk, but for a/b/o, as expected, SID GENO ON TOP BABY!:
thats all for now!!! if you have anything else you are interested in seeing the data for, send me an ask and i'll see if i can get it to ya!
#fanfic#sidgeno#evgeni malkin#hockey rpf#sidney crosby/evgeni malkin#hockeyrpf#hrpf fic#sidgeno fic#sidney crosby#hockeyrpf wrapped 2024#leon draisaitl#matthew tkachuk#mattdrai#leon draisaitl/matthew tkachuk#nicojack#nico hischier#nico hischier/jack hughes#jack hughes#jamie drysdale#trevor zegras#jdtz#jamie drysdale/trevor zegras#pittsburgh penguins#edmonton oilers#florida panthers#new jersey devils
473 notes
Midnight Pals: Hackin'
King: i can't believe elon's grok is pretending i'm friends with him
King: i need to stop that AI before everyone believes it!
King: i've got to hire a hacker
King: franz, you've got to help me
Franz Kafka: what? me?
Barker: steve, no

Kafka: i'm not a hacker
King: oh i thought franz was a hacker
Barker: what gave you THAT impression?
King: you know, with the cat ear headphones and the striped thigh socks
Barker: no steve that's something ENTIRELY different
Kafka: n-no it isn't, on second thought yes I'm totally a hacker

Kafka: it means i'm a hacker, nothing else
Barker: sure franz
Kafka: it does! it totally means i'm a hacker!
Barker: franz, go play with your blahaj plush, the adults are talking here

Barker: you know who you need? you need william gibson
Barker: the best hacker money can buy
King: william gibson? how do i contact him?
Barker: you don't
Barker: he'll contact you

King: can you really hack grok, william?
William Gibson: [wearing black duster and fingerless black gloves] my hacker name is shadow gigabyte
King: oh sorry
Gibson: can i hack grok? listen kid i was cyberbyting the megabyte mainframe when you were just rebooting your motherboard mouse data bandwidth modem email
King: wow!

Gibson: my CPU is a neural net processor, a learning computer
King: wow he really sounds like he knows what he's talking about!
King: that definitely sounds like hacker talk to me
Gibson: CD Rom
Gibson: internet
Joe Hill: dad can i talk to you for a second
King: not now joe daddy's hiring a hacker

Gibson: [wildly slapping keyboard] i'll re-index the mega bit blaster cyber codex
Gibson: [wildly slapping keyboard] now we'll cybersecurity the lock box data center
King: hey what happens if you push that button?
Gibson: what the-- no!! [klaxons sound]
King: what's that mean?
Gibson: shit
Gibson: we've got company

Gibson: sentient cyber virus electronic guard cyberbots
Gibson: real high tech
Gibson: state of the art in bio-tech wetware neural-data scrapers
Gibson: [putting on sunglasses with red laser scope] and they ain't friendly

King: what are we going to do?!
Gibson: kid, you keep your hands to yourself unless you wanna become roadkill on the information super highway!!!
Gibson: hold on to your CPU (central processing unit)!!!

Gibson: [wildly slapping keyboard] gotta reconfigure the darkweb logistics for ethernet wavetech
Gibson: [wildly slapping keyboard] upload the memory downloader for dumpware backup
Gibson: [wildly slapping keyboard] uncodify the cyberpatch modifier aaaaand
Gibson: i'm in

King: wow, you hacked twitter?? how did you do it?
Gibson: the greatest hackers never reveal their secrets
[earlier]
Gibson: [wearing fake mustache] hey elon its me catturd
Gibson: could you give me your password?
Elon Musk: sure it's "picklerick420"!
#midnight pals#the midnight society#midnight society#stephen king#clive barker#franz kafka#joe hill#william gibson#elon musk
537 notes
Tumblr, in my estimation, cannot be a place that is profitable, because the aims of the userbase can be described somewhere between fuck around and fuck off. No one comes here for anything other than shitposting. Companies don't try to find your tumblr, to my knowledge, so it's the last safe place on the internet to just say stupid shit and learn from it, instead of becoming unemployable.
Tumblr would be a really good buy for like, Archive.org. Someone who doesn't have to worry about 'profit'; they just have to keep the lights on. Do moderation by roundtable: when someone submits a support request, like "I'm being harassed", the proof they provide is sent to ten random active bloggers (unrelated to any involved parties), and their decision is actionable. Provide the tools for self-determination, instead of a black box that doesn't seem to be working for anyone. It's cheaper, and fairer to the community in general. I dunno, it's probably not perfect, but it's better than 'just doing nothing until the person who keeps complaining gives us a reason to ban them, that makes the problem go away'. Honestly, Matt whatever should just donate tumblr to them, call it a charitable donation, claim it on his taxes. It's a sinking ship every minute you're trying to extract value from it. Its real current value is almost certainly hovering around "the change in people's couches of like 20 households".
Tumblr has a place on the internet, an important place, but it'll never be a need that is profitable. And Tumblr's history and reputation kind of prevent it from ever being changed into something different that would be as profitable as whoever currently owns it would want. I suppose you could burn it all to the ground, wipe the servers and start a twitter clone. But it'll just be one more in a field that's so oversaturated it's not worth trying. I'm not sure why people keep buying tumblr; it's a fantastic creative community, but its products can't be sold, and the userbase is poor and has little to no interest in paying for 'upgrades'. So you could sell everything to AI scrapers, or data miners, but you'll lose the entire userbase and no one's gonna come in to fill the gaps left. It's a quick and messy death.
137 notes
To everyone, but especially artists : Instagram has fully leaned into the AI craze and has now scraped every existing account to train their new Meta AI.
In response, artists are migrating en masse to this new social network called Cara.app. It’s a mix of Instagram/Bluesky/Twitter, and is focused on artists with a complete ban on AI-generated content. They also try as much as possible to block scrapers from using the site’s data, and have a partnership with Glaze that allows you to protect your art posts.
You can also choose what you see in your feed based on percentages of following/recommended content, and so far it has been incredible for discovering new artists!
Like many, I am trying it out right now. I really enjoy the community, and hope this great project keeps on living!
Join me there, at https://cara.app/tapiocats 💕
I'm not leaving Tumblr btw!
Just trying out an Instagram alternative 😄
43 notes
pulling out a section from this post (a very basic breakdown of generative AI) for easier reading:
AO3 and Generative AI
There are unfortunately some massive misunderstandings in regards to AO3 being included in LLM training datasets. This post was semi-prompted by the ‘Knot in my name’ AO3 tag (for those of you who haven’t heard of it, it’s supposed to be a fandom anti-AI event where AO3 writers help “further pollute” AI with Omegaverse), so let’s take a moment to address AO3 in conjunction with AI. We’ll start with the biggest misconception:
1. AO3 wasn’t used to train generative AI.
Or at least not any more than any other internet website. AO3 was not deliberately scraped to be used as LLM training data.
The AO3 moderators found traces of the Common Crawl web crawler on their servers. The Common Crawl is an open data repository of raw web page data, metadata extracts and text extracts collected from 10+ years of web crawling. Its collective data is measured in petabytes. (As a note, it also only features samples of the available pages on a given domain in its datasets, because its data is freely released under fair use and this is part of how they navigate copyright.) LLM developers use it, and datasets derived from it like Google’s C4, to bulk up the overall amount of pre-training data.
AO3 is big to an individual user, but it’s actually a small website when it comes to the amount of data used to pre-train LLMs. It’s also just a bad candidate for training data. As a comparison example, Wikipedia is often used as high quality training data because it’s a knowledge corpus and its moderators put a lot of work into maintaining a consistent quality across its web pages. AO3 is just a repository for all fanfic -- it doesn’t have any of that quality maintenance nor any knowledge density. Just in terms of practicality, even if people could get around the copyright issues, the sheer amount of work that would go into curating and labeling AO3’s data (or even a part of it) to make it useful for the fine-tuning stages most likely outstrips any potential usage.
Speaking of copyright, AO3 is a terrible candidate for training data just based on that. Even if people (incorrectly) think fanfic doesn’t hold copyright, there are plenty of books and texts that are public domain that can be found in online libraries that make for much better training data (or rather, there is a higher consistency in quality for them that would make them more appealing than fic for people specifically targeting written story data). And any scrapers who don’t care about legalities or copyright are going to target published works instead. Meta is in fact currently getting sued for including published books from a shadow library in its training data (note: this case isn’t about any copyrighted material that might’ve been caught in the Common Crawl data; it concerns a repository of published books that was scraped specifically to bring in some higher quality data for the first training stage). In a similar case, there’s an anonymous group suing Microsoft, GitHub, and OpenAI for training their LLMs on open source code.
Getting back to my point, AO3 is just not desirable training data. It’s not big enough to be worth scraping for pre-training data, it’s not curated enough to be considered for high quality data, and its data comes with copyright issues to boot. If LLM creators are saying there was no active pursuit in using AO3 to train generative AI, then there was (99% likelihood) no active pursuit in using AO3 to train generative AI.
AO3 has some preventative measures against being included in future Common Crawl datasets, which may or may not work, but there’s no way to remove any previously scraped data from that data corpus. And as a note for anyone locking their AO3 fics: that might potentially help against future AO3 scrapes, but it is rather moot if you post the same fic in full to other platforms like ffn, twitter, tumblr, etc. that have zero preventative measures against data scraping.
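The post doesn't say what AO3's measures are. The standard opt-out is robots.txt: Common Crawl's crawler identifies itself with the user agent `CCBot` and says it honours robots.txt. A minimal sketch using Python's standard-library `urllib.robotparser` against a hypothetical robots.txt (not AO3's real file; `example.org` is a placeholder) shows how such a rule is evaluated. Keep in mind robots.txt is a request, not enforcement: a scraper that ignores it is unaffected.

```python
from urllib import robotparser

# Hypothetical robots.txt: block Common Crawl's crawler, allow everyone else.
# This is NOT AO3's actual file.
ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /

User-agent: *
Disallow:
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.org/works/12345"  # placeholder URL
print(parser.can_fetch("CCBot", url))         # False
print(parser.can_fetch("SomeOtherBot", url))  # True
```

Because rules only apply to future crawls, this matches the point above: nothing in robots.txt can pull pages back out of datasets that were already collected.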
2. A/B/O is not polluting generative AI
…I’m going to be real, I have no idea what people expected to prove by asking AI to write Omegaverse fic. At the very least, people know A/B/O fics are not exclusive to AO3, right? The genre isn’t even exclusive to fandom -- it started in fandom, sure, but it expanded to general erotica years ago. It’s all over social media. It has multiple Wikipedia pages.
More to the point though, omegaverse would only be “polluting” AI if LLMs were spewing omegaverse concepts unprompted or like…associated knots with dicks more than rope or something. But people asking AI to write omegaverse and AI then writing omegaverse for them is just AI giving people exactly what they asked for. And…I hate to point this out, but LLMs writing for a niche the LLM trainers didn’t deliberately train the LLMs on is generally considered to be a good thing to the people who develop LLMs. The capability to fill niches developers didn’t even know existed increases LLMs’ marketability. If I were a betting man, I’d say that what fandom saw as a GOTCHA moment, AI people probably saw as a good sign of LLMs’ future potential.
3. Individuals cannot affect LLM training datasets.
So back to the fandom event, with the stated goal of sabotaging AI scrapers via omegaverse fic.
…It’s not going to do anything.
Let’s add some numbers to this to help put things into perspective:
LLaMA’s 65 billion parameter model was trained on 1.4 trillion tokens. Of that 1.4 trillion tokens, about 67% of the training data was from the Common Crawl (roughly 3 terabytes of data).
3 terabytes is 3,000,000,000 kilobytes.
That’s 3 billion kilobytes.
According to a news article I saw, there have been ~450k words total published for this campaign (*this was while it was going on, that number has probably changed, but you’re about to see why that still doesn’t matter). So, roughly speaking, ~450k words of text is ~1,012 KB (I’m going off the document size of a plain text doc for a fic whose word count is ~440k).
So 1,012 out of 3,000,000,000.
Aka 0.000034%.
And those 3 billion kilobytes are only 2/3 of the data for the first stage of training.
And not to beat a dead horse, but 0.000034% is still grossly overestimating the potential impact of posting A/B/O fic. Remember, only parts of AO3 would get scraped for Common Crawl datasets, which are also huge! The October 2022 Common Crawl dataset is 380 tebibytes. The April 2021 dataset is 320 tebibytes. The 3 terabytes of Common Crawl data used to train LLaMA was randomly selected data that totaled less than 1% of one full dataset. Not to mention, LLaMA’s training dataset is currently on the (much) larger side compared to most LLM training datasets.
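The percentages above can be reproduced directly. A short check using only the post's own figures, assuming decimal units for TB and KB (1 TB = 10^12 bytes, 1 KB = 10^3 bytes) and binary units for TiB (2^40 bytes):

```python
# Figures taken from the post; unit conventions are the stated assumption above
campaign_kb = 1_012                   # ~450k words of fic as plain text
llama_cc_kb = 3 * 10**12 // 10**3     # ~3 TB of Common Crawl text -> 3,000,000,000 KB

share = campaign_kb / llama_cc_kb
print(f"{share * 100:.6f}%")          # 0.000034%

# How much of one full Common Crawl dataset (April 2021, 320 TiB) the ~3 TB sample is
TIB = 2**40
sample_fraction = (3 * 10**12) / (320 * TIB)
print(f"{sample_fraction:.2%}")       # 0.85%
```

The smaller (320 TiB) dataset is used so the "less than 1%" comparison is the least favourable case; against the 380 TiB dataset the fraction is smaller still.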
I also feel the need to point out again that AO3 is trying to prevent any Common Crawl scraping in the future, which would include protection for these new stories (several of which are also locked!).
Omegaverse just isn’t going to do anything to AI. Individual fics are going to do even less. Even if all of AO3 suddenly became omegaverse, it’s just not prominent enough to influence anything in regards to LLMs. You cannot affect training datasets in any meaningful way doing this. And while this might seem really disappointing, this is actually a good thing.
Remember that anything an individual can do to LLMs, the person you hate most can do the same. If it were possible for fandom to corrupt AI with omegaverse, fascists, bigots, and just straight up internet trolls could pollute it with hate speech and worse. AI already carries a lot of biases even while developers are actively trying to flatten that out, it’s good that organized groups can’t corrupt that deliberately.
#generative ai#pulling this out wasnt really prompted by anything specific#so much as heard some repeated misconceptions and just#sighs#nope#incorrect#u got it wrong#sorry#unfortunately for me: no consistent tag to block#sigh#ao3
101 notes
hey! please remove your adblocker. please pay for premium. please pay extra to remove ads. please agree to these new and updated terms of service. sorry, it won't work until you agree. please disable your tracker prevention. please allow cookies. please allow cookies. did you mean allow third party vendors? it's against our terms of service to use adblocker. we saw you googled someone, do you want to follow them on twitter? please agree to our new terms of service. if you want to opt out of our ai, please navigate through four layer deep sub-menus to find the toggle. our new terms of service work retroactively. sorry, you didn't opt out before we activated our scraper. you agreed to our terms of service. want to delete your data? please buy our data protection and deletion service. please remove your adblocker. please disable your tracker prevention. please give us your payment data. subscribe now, only 300 a year with ads! please pay extra to remove ads. please pay us to continue using this free tool. it's just inflation. please don't use a vpn, it's against our terms of service. if you must use a vpn, please pay for our vpn and antivirus services. please enter your payment data to make use of this free trial. please remove your adblocker. please request your data before deactivating your account. processing your request may take up to 120 business years. it is currently not possible to remove your payment data from our service. it is against our terms of service to do that.
AGREE maybe later
9 notes
While the finer points of running a social media business can be debated, one basic truth is that they all run on attention. Tech leaders are incentivized to grow their user bases so there are more people looking at more ads for more time. It’s just good business.
As the owner of Twitter, Elon Musk presumably shared that goal. But he claimed he hadn’t bought Twitter to make money. This freed him up to focus on other passions: stopping rival tech companies from scraping Twitter’s data without permission—even if it meant losing eyeballs on ads.
Data-scraping was a known problem at Twitter. “Scraping was the open secret of Twitter data access. We knew about it. It was fine,” Yoel Roth wrote on the Twitter alternative Bluesky. AI firms in particular were notorious for gobbling up huge swaths of text to train large language models. Now that those firms were worth a lot of money, the situation was far from fine, in Musk’s opinion.
In November 2022, OpenAI debuted ChatGPT, a chatbot that could generate convincingly human text. By January 2023, the app had over 100 million users, making it the fastest growing consumer app of all time. Three months later, OpenAI secured another round of funding that closed at an astounding valuation of $29 billion, more than Twitter was worth, by Musk’s estimation.
OpenAI was a sore subject for Musk, who’d been one of the original founders and a major donor before stepping down in 2018 over disagreements with the other founders. After ChatGPT launched, Musk made no secret of the fact that he disagreed with the guardrails that OpenAI put on the chatbot to stop it from relaying dangerous or insensitive information. “The danger of training AI to be woke—in other words, lie—is deadly,” Musk said on December 16, 2022. He was toying with starting a competitor.
Near the end of June 2023, Musk launched a two-part offensive to stop data scrapers, first directing Twitter employees to temporarily block “logged out view.” The change would mean that only people with Twitter accounts could view tweets.
“Logged out view” had a complicated history at Twitter. It was rumored to have played a part in the Arab Spring, allowing dissidents to view tweets without having to create a Twitter account and risk compromising their anonymity. But it was also an easy access point for people who wanted to scrape Twitter data.
Once Twitter made the change, Google was temporarily blocked from crawling Twitter and serving up relevant tweets in search results—a move that could negatively impact Twitter’s traffic. “We’re aware that our ability to crawl Twitter.com has been limited, affecting our ability to display tweets and pages from the site in search results,” Google spokesperson Lara Levin told The Verge. “Websites have control over whether crawlers can access their content.” As engineers discussed possible workarounds on Slack, one wrote: “Surely this was expected when that decision was made?”
Then engineers detected an “explosion of logged in requests,” according to internal Slack messages, indicating that data scrapers had simply logged in to Twitter to continue scraping. Musk ordered the change to be reversed.
On July 1, 2023, Musk launched part two of the offensive. Suddenly, if a user scrolled for just a few minutes, an error message popped up. “Sorry, you are rate limited,” the message read. “Please wait a few moments then try again.”
Rate limiting is a strategy that tech companies use to constrain network traffic by putting a cap on the number of times a user can perform a specific action within a given time frame (a mouthful, I know). It’s often used to stop bad actors from trying to hack into people’s accounts. If a user tries an incorrect password too many times, they see an error message and are told to come back later. The cost of doing this to someone who has forgotten their password is low (most people stay logged in), while the benefit to users is very high (it prevents many people’s accounts from getting compromised).
Except, that wasn’t what Musk had done. The rate limit that he ordered Twitter to roll out on July 1 was an API limit, meaning Twitter had capped the number of times users could refresh Twitter to look for new tweets and see ads. Rather than constrain users from performing a specific action, Twitter had limited all user actions. “I realize these are draconian rules,” a Twitter engineer wrote on Slack. “They are temporary. We will reevaluate the situation tomorrow.”
At first, Blue subscribers could see 6,000 posts a day, while nonsubscribers could see 600 (enough for just a few minutes of scrolling), and new nonsubscriber accounts could see just 300. As people started hitting the limits, #TwitterDown started trending on, well, Twitter. “This sucks dude you gotta 10X each of these numbers,” wrote user @tszzl.
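The excerpt contrasts per-action limits (like capping password attempts) with a blanket daily cap on viewing posts. A minimal sketch of the latter, assuming a simple fixed window keyed by user and calendar day and using the initial tier numbers quoted above. The class name, tier keys, and 100-post page size are hypothetical; this is an illustration of the technique, not Twitter's implementation:

```python
from collections import defaultdict
from datetime import date

# Initial daily caps from the excerpt; the tier keys are made up for this sketch
DAILY_POST_LIMITS = {"blue": 6_000, "unverified": 600, "new_unverified": 300}

class DailyViewLimiter:
    """Fixed-window limiter: at most N post views per user per calendar day."""

    def __init__(self, limits):
        self.limits = limits
        self.counts = defaultdict(int)  # (user_id, day) -> posts viewed so far

    def try_view(self, user_id, tier, n_posts, today):
        """Record the views and return True if they fit under today's cap."""
        key = (user_id, today)
        if self.counts[key] + n_posts > self.limits[tier]:
            return False  # the caller would show "Sorry, you are rate limited"
        self.counts[key] += n_posts
        return True

limiter = DailyViewLimiter(DAILY_POST_LIMITS)
day = date(2023, 7, 1)
# A new unverified account loading its timeline in pages of 100 posts
results = [limiter.try_view("alice", "new_unverified", 100, day) for _ in range(4)]
print(results)  # [True, True, True, False]
```

A new day gives a fresh key, so the count resets. A per-action limiter for login attempts would instead key on (account, action) and leave ordinary reading untouched, which is the distinction the excerpt draws.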
The impact quickly became obvious. Companies that used Twitter direct messages as a customer service tool were unable to communicate with clients. Major creators were blocked from promoting tweets, putting Musk’s wish to stop data scrapers at odds with his initiative to make Twitter more creator friendly. And Twitter’s own trust and safety team was suddenly stopped from seeing violative tweets.
Engineers posted frantic updates in Slack. “FYI some large creators complaining because rate limit affecting paid subscription posts,” one said.
Christopher Stanley, the head of information security, wrote with dismay that rate limits could apply to people refreshing the app to get news about a mass shooting or a major weather event. “The idea here is to stop scrapers, not prevent people from obtaining safety information,” he wrote. Twitter soon raised the limits to 10,000 (for Blue subscribers), 1,000 (for nonsubscribers), and 500 (for new nonsubscribers). Now, 13 percent of all unverified users were hitting the rate limit.
Users were outraged. If Musk wanted to stop scrapers, surely there were better ways than just cutting off access to the service for everyone on Twitter.
“Musk has destroyed Twitter’s value & worth,” wrote attorney Mark S. Zaid. “Hubris + no pushback - customer empathy - data = a great way to light billions on fire,” wrote former Twitter product manager Esther Crawford, her loyalties finally reversed.
Musk retweeted a joke from a parody account: “The reason I set a ‘View Limit’ is because we are all Twitter addicts and need to go outside.”
Aside from Musk, the one person who seemed genuinely excited about the changes was Evan Jones, a product manager on Twitter Blue. For months, he’d been sending executives updates regarding the anemic signup rates. Now, Blue subscriptions were skyrocketing. In May, Twitter had 535,000 Blue subscribers. At $8 per month, this was about $4.3 million a month in subscription revenue. By early July, there were 829,391 subscribers, a jump of about $2.4 million in revenue, not accounting for App Store fees.
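The revenue figures in this paragraph are plain multiplication. A quick check using the excerpt's numbers, ignoring App Store fees as the excerpt does:

```python
PRICE_PER_MONTH = 8  # USD per Blue subscriber, as stated in the excerpt
may_subscribers = 535_000
july_subscribers = 829_391

may_revenue = may_subscribers * PRICE_PER_MONTH
extra_revenue = (july_subscribers - may_subscribers) * PRICE_PER_MONTH
print(may_revenue, extra_revenue)  # 4280000 2355128
```

That is roughly $4.3 million a month in May and roughly $2.4 million a month more by early July.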
“Blue signups still cookin,” he wrote on Slack above a screenshot of the signup dashboard.
Jones’s team capitalized on the moment, rolling out a prompt to upsell users who’d hit the rate limit and encouraging them to subscribe to Twitter Blue. In July, this prompt drove 1.7 percent of the Blue subscriptions from accounts that were more than 30 days old and 17 percent of the Blue subscriptions from accounts that were less than 30 days old.
Twitter CEO Linda Yaccarino was notably absent from the conversation until July 4, when she shared a Twitter blog post addressing the rate limiting fiasco, perhaps deliberately burying the news on a national holiday.
“To ensure the authenticity of our user base we must take extreme measures to remove spam and bots from our platform,” it read. “That’s why we temporarily limited usage so we could detect and eliminate bots and other bad actors that are harming the platform. Any advance notice on these actions would have allowed bad actors to alter their behavior to evade detection.” The company also claimed the “effects on advertising have been minimal.”
If Yaccarino’s role was to cover for Musk’s antics, she was doing an excellent job. Twitter rolled back the limits shortly after her announcement. On July 12, Musk debuted a generative AI company called xAI, which he promised would develop a language model that wouldn’t be politically correct. “I think our AI can give answers that people may find controversial even though they are actually true,” he said on Twitter Spaces.
Unlike the rival AI firms he was trying to block, Musk said xAI would likely train on Twitter’s data.
“The goal of xAI is to understand the true nature of the universe,” the company said grandly in its mission statement, echoing Musk’s first, disastrous town hall at Twitter. “We will share more information over the next couple of weeks and months.”
In November 2023, xAI launched a chatbot called Grok that lacked the guardrails of tools like ChatGPT. Musk hyped the release by posting a screenshot of the chatbot giving him a recipe for cocaine. The company didn’t appear close to understanding the nature of the universe, but perhaps that’s coming.
Excerpt adapted from Extremely Hardcore: Inside Elon Musk’s Twitter by Zoë Schiffer. Published by arrangement with Portfolio Books, a division of Penguin Random House LLC. Copyright © 2024 by Zoë Schiffer.
20 notes
Quote
In recent months, the signs and portents have been accumulating with increasing speed. Google is trying to kill the 10 blue links. Twitter is being abandoned to bots and blue ticks. There’s the junkification of Amazon and the enshittification of TikTok. Layoffs are gutting online media. A job posting looking for an “AI editor” expects “output of 200 to 250 articles per week.” ChatGPT is being used to generate whole spam sites. Etsy is flooded with “AI-generated junk.” Chatbots cite one another in a misinformation ouroboros. LinkedIn is using AI to stimulate tired users. Snapchat and Instagram hope bots will talk to you when your friends don’t. Redditors are staging blackouts. Stack Overflow mods are on strike. The Internet Archive is fighting off data scrapers, and “AI is tearing Wikipedia apart.” The old web is dying, and the new web struggles to be born.
AI is killing the old web, and the new web struggles to be born
67 notes
Text
I linked this as a gift so hopefully people can read it. Below is the part about AO3.
At Archive of Our Own, a fan fiction database with more than 11 million stories, writers have increasingly pressured the site to ban data-scraping and A.I.-generated stories.
In May, when some Twitter accounts shared examples of ChatGPT mimicking the style of popular fan fiction posted on Archive of Our Own, dozens of writers rose up in arms. They blocked their stories and wrote subversive content to mislead the A.I. scrapers. They also pushed Archive of Our Own’s leaders to stop allowing A.I.-generated content.
Betsy Rosenblatt, who provides legal advice to Archive of Our Own and is a professor at University of Tulsa College of Law, said the site had a policy of “maximum inclusivity” and did not want to be in the position of discerning which stories were written with A.I.
11 notes
Text
Haunted.
Fuck you with the data mining. Left twitter for this wreck of a haven just to be met with more shit. May this greedy fucking site burn for selling our data and ass kissing fucking scrapers. Rot you bitches!!!
2 notes
Text
tumblr never fixing their search functionality while twitter and reddit are struggling to combat data scrapers looking to train language models has to be the most "won by doing nothing" yet in the new social media wars
5 notes
Text
I want everyone here to know that Musk just made Twitter literally unusable. And not as in he just added more horrible features, no, I mean that Twitter now functions as well as it does when your phone has little to no service.

This is temporary due to the system overloading from data scrapers, but I hope it’s forever. Elon Musk might’ve just accidentally done the first good thing in his life by being so bad at his job that he killed Twitter
5 notes
Text
Please, artists, GLAZE AND NIGHTSHADE your cropped images too! Please make this so so so funny. Glaze and Nightshade your art anyway, but poison the Twitter AI scrapers' data on your way out.
There are plenty of reasons not to use Twitter anymore, but for artists, handing Twitter the rights to train AI on their work should be the final straw. So here is another solution.
Also, tools like Glaze, which poison images against AI, are really helpful, but they take a long time to process each image, so consider glazing just a cropped section of the image if you want to share it to Twitter.
5K notes