#blocking ai
dawnfelagund · 1 year
How to Block AI Bots from Scraping Your Website
The Silmarillion Writers' Guild just recently opened its draft AI policy for comment, and one thing people wanted was for us, if possible, to block AI bots from scraping the SWG website. Twelve hours ago, I had no idea if it was possible! But I spent a few hours today researching the subject, and the SWG site is now much more locked down against AI bots than it was this time yesterday.
I know I am not the only person with a website or blog or portfolio online that doesn't want their content being used to train AI. So I thought I'd put together what I learned today in hopes that it might help others.
First, two important points:
I am not an IT professional. I am a middle-school humanities teacher with degrees in psychology, teaching, and humanities. I'm self-taught where building and maintaining websites is concerned. In other words, I'm not an expert but simply passing on what I learned during my research today.
On that note, I can't help with troubleshooting on your own site or project. I wouldn't even have been able to do everything here on my own for the SWG, but thankfully my co-admin Russandol has much more tech knowledge than me and picked up where I got lost.
Step 1: Block AI Bots Using Robots.txt
If you don't even know what this is, start here:
About /robots.txt
How to write and submit a robots.txt file
If you know how to find (or create) the robots.txt file for your website, you're going to add the following lines of code to the file. (Source: DataDome, How ChatGPT & OpenAI Might Use Your Content, Now & in the Future)
User-agent: CCBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /
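For anyone who wants to double-check that rules like these do what they are meant to, here is a minimal Python sketch using the standard library's robots.txt parser. It is only an illustration: the helper name is_blocked, the example.com URL, and the made-up "SomeOtherBot" agent are assumptions, not anything from the SWG setup. It also only shows how a well-behaved crawler would read the file; it cannot stop a bot that ignores robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The two rules from the post, each directive on its own line.
ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /
"""


def is_blocked(robots_txt: str, user_agent: str,
               url: str = "https://example.com/") -> bool:
    """Return True if a rule-following crawler with this user agent
    would be told not to fetch the given URL.

    The URL is a placeholder; swap in a page from your own site.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "SomeOtherBot" is a hypothetical agent with no rule in the file.
    for agent in ("CCBot", "ChatGPT-User", "SomeOtherBot"):
        status = "blocked" if is_blocked(ROBOTS_TXT, agent) else "allowed"
        print(f"{agent}: {status}")
```

To check a live site instead of a pasted string, RobotFileParser also has set_url() and read() methods that download the file before can_fetch() is called.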
Step 2: Add HTTP Headers/Meta Tags
Unfortunately, not all bots respond to robots.txt. Img2dataset is one that recently gained some notoriety when a site owner posted in its issue queue after the bot brought his site down, asking that the bot be opt-in or at least respect robots.txt. He received a rather rude reply from the img2dataset developer. It's covered in Vice's An AI Scraping Tool Is Overwhelming Websites with Traffic.
Img2dataset requires a header tag to keep it away. (Not surprisingly, this is often a more complicated task than updating a robots.txt file. I don't think that's accidental. This is where I got stuck today in working on my Drupal site.) The header tags are "noai" and "noimageai." These function like the more familiar "noindex" and "nofollow" meta tags. When Russa and I were researching this today, we did not find a lot of information on "noai" or "noimageai," so I suspect they are very new. We used the procedure for adding "noindex" or "nofollow" and swapped in "noai" and "noimageai," and it worked for us.
Header meta tags are the same strategy DeviantArt is using to allow artists to opt out of AI scraping; artist Aimee Cozza has more in What Is DeviantArt's New "noai" and "noimageai" Meta Tag and How to Install It. Aimee's blog also has directions for how to use this strategy on WordPress, SquareSpace, Weebly, and Wix sites.
In my research today, I discovered that some webhosts provide tools for adding this code to your header through a form on the site. Check your host's knowledge base to see if you have that option.
You can also use .htaccess or add the tag directly into the HTML in the <head> section. .htaccess makes sense if you want to use the "noai" and "noimageai" tag across your entire site. The HTML solution makes sense if you want to exclude AI crawlers from specific pages.
Here are some resources on how to do this for "noindex" and "nofollow"; just swap in "noai" and "noimageai":
HubSpot, Using Noindex, Nofollow HTML Metatags: How to Tell Google Not to Index a Page in Search (very comprehensive and covers both the .htaccess and HTML solutions)
Google Search Documentation, Block Search Indexing with noindex (both .htaccess and HTML)
AngryStudio, Add noindex and nofollow to Whole Website Using htaccess
Perficient, How to Implement a NoIndex Tag (HTML)
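As a rough illustration of the swap described above, the Python sketch below parses a sample page and reports whether its "robots" meta tag carries both "noai" and "noimageai". The tag form shown in SAMPLE_PAGE and the .htaccess line in the comment are assumptions modeled on how "noindex" is usually written, not a confirmed record of the SWG configuration; the class and function names are made up for this example.

```python
from html.parser import HTMLParser

# Assumed tag form: a "robots" meta tag with the two directives swapped in
# where "noindex, nofollow" would normally go.
SAMPLE_PAGE = """<html><head>
<meta name="robots" content="noai, noimageai">
<title>Example page</title>
</head><body></body></html>"""

# Assumed site-wide equivalent for .htaccess (Apache with mod_headers on):
#   Header set X-Robots-Tag "noai, noimageai"


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        for directive in content.split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html: str) -> set:
    """Return every robots meta directive present in the HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


def opts_out_of_ai(html: str) -> bool:
    """True only if both "noai" and "noimageai" are present."""
    return {"noai", "noimageai"} <= robots_directives(html)


if __name__ == "__main__":
    print(sorted(robots_directives(SAMPLE_PAGE)))
    print(opts_out_of_ai(SAMPLE_PAGE))
```

Running this against the saved source of one of your own pages is a quick way to confirm the tag actually made it into the <head> after a theme or host setting change.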
Finally, all of this is contingent on web scrapers following the rules and etiquette of the web. As we know, many do not. Sprinkled amid the many articles I read today on blocking AI scrapers were articles on how to override blocks when scraping the web.
This will also, I suspect, be something of a game of whack-a-mole. As the img2dataset case illustrates, the previous etiquette around robots.txt was ignored in favor of a more complicated opt-out, one that many site owners either won't be aware of or won't have time/skill to implement. I would not be surprised, as the "noai" and "noimageai" tags gain traction, to see bots demanding that site owners jump through a new, different, higher, and possibly fiery hoop in order to protect the content on their sites from AI scraping. These folks stand to make a lot of money off this, which doesn't inspire me with confidence that withholding our work from their grubby hands will be an endeavor that they make easy for us.
medieshanachie · 1 year
Fixing Stories on AO3
Hey, @dragonydreams is fabulous! She figured out how to lock multiple works on AO3! (I have 414, it was gonna take me FOREVER)... but now I can do them a fandom at a time.
Warning... I tried to do ALL of mine at once and it didn't work--it timed out... so do a fandom or two at a time.
After logging into AO3:
1. Go to Profile
2. At the bottom, click on Edit My Works
3. Select Fandom
4. Click Edit button at top (or bottom)
5. Then at the Visibility settings area check off Only Registered Users. Scroll to bottom of page and click Update All Works
persephinae · 4 months
Tumblr media
://www.dreamstime.com/
://www.freepik.com/
://www.craiyon.com/
://stock.adobe.com/
://storybird.ai/
://www.dinosaur.org/
://pngtree.com/
://creator.nightcafe.studio/
://www.123rf.com/
://lumenor.ai/
://neural.love/
://www.vecteezy.com/
://openart.ai/
://www.artpal.com/
://generativeai.pub/
://promptbase.com/
Block these sites in your uBlock Origin so you won't see that shit in your searches
cozylittleartblog · 1 month
Tumblr media Tumblr media
"content creator" is a corporate word.
we are artists.
thunderboltfire · 2 months
I have unwittingly witnessed a new level of the absurd. Behold, the AI-generated equine anatomy models.
Tumblr media
Ah yes, my favourite parts of the equine body. Paster and... *looks at the smudged writing on hand* boob. At least this one looks purely decorative and the being actually looks like a horse. But don't worry, it gets worse.
Tumblr media
If we completely ignore the hippopotamus musculature of this one, there's still a lot that doesn't make sense, like a tail that ends in a series of bone spikes and a complete lack of molars. You could make a cool pokemon on the basis of this, but it's not even in the realm of being an actual anatomy help.
Tumblr media
I'm firmly convinced this is not a horse, this is something that really, really wants you to think it is a horse. The more you look, the more things look... wrong. The more details turn out to be shifted, bones crammed in to fill in the familiar form, its shape merely implied so that the human mind fills the gap. Of course the text seems like gibberish, because its anatomy is incomprehensible. It's either a parasite or a monster, and in each case it's an eldritch body horror. I'm kind of angry at how well this joke writes itself.
8pxl · 1 month
PSA 🗣️ another scammer using genAI without disclosing it
Tumblr media Tumblr media Tumblr media Tumblr media
pixlgirl has been posting generated AI (targeting fandoms) without disclosing it, passing it off as their genuine art and has apparently scammed at least one person into ‘commissioning’ them. this is a public PSA so yall can block them, and not interact. please do not harass them!
it’s incredibly shitty to be disingenuous while posting AI but even shittier to scam people with it 🤢 stay diligent yall
kiiingsnake · 6 months
Tumblr media
biomechanical god
kalofi · 1 month
Tumblr media Tumblr media Tumblr media Tumblr media
vaspider · 7 months
I'm not interested in detailed arguments about why machine-generated "art" is okay anymore. We've already seen that people's art has been taken and used in order to cut the living, working artists out of the process of making art in their specific style without paying them, and none of the pro-Tech Bro Tracing Machine art arguments address the fact that this shit is specifically intended to cut actual artists (and actual actors, writers, etc.) out of the process while taking styles they created.
It isn't just about whether a machine has a "right" to "scan" - as if "looking" is all the computer is doing - it's about the fact that the billionaires behind this have literally said "we don't think we should have to pay for your art, it would cost us too much to do that." They absolutely intend to use this to not have to pay artists and actors and writers and and ... and to be able to use our work and our faces and our words in ways we would never have consented to.
Bitching about how copyright is fucked and heavily slanted in the favor of corporations and then being "pro-AI" is some of the most ridiculous mental contortionist shit I've ever heard. "Copyright as it exists is slanted in favor of corporations, so let's make the art world even MORE unbalanced in favor of the few billionaires behind this bullshit!"
Like, do you hear yourself?
This shit is the Uber and AirBnB of the art world - same tactic, same intentions - and any "actual artist" supporting it is, as far as I give a fuck, a fucking quisling. :)
The only thing that might actually save actual living and working artists in this mess is the fact that computer-vomited "art" and "writing" can't be copyrighted.
kiiborei · 5 months
If you thought AI couldn't get any worse, it just did. Every part of this technology from its creation to its users, is being used to exploit and harm people. Anyone who tries to justify the use of this technology in its current form is absolutely evil.
Always do your research before buying art. If you know any First Nations artists who have their own online shops, please link them and support them if you can.
pillowfort-social · 4 months
fuck generative ai
autumn2may · 1 year
Tumblr media
original tweet from @jamesjyu reads: "We launch Shrink Ray today on Sudowrite! Upload your manuscript and get loglines, blurbs, synopsis, and full outlines automatically. Takes a ton of legwork out of book marketing. Below the tweet are two images of the program."
original quote tweet from @sudowrite reads: "New in Sudowrite: Upload your whole novel/script, get instant longlines (sic), blurbs, synopsis, and outline!"
tweet from @FantasyFaction reads: "Oh jeez! Bad bad, very bad! Writers DO NOT willingly give your manuscript to an AI so it can "learn" by stealing your work! I know blurbs and synopses are hard, but PLEASE do not do this! - JI 🐉
(stolen from ML Brennan & Sravani Hotha so I can include alt text)"
albertserra · 1 month
pro palestinians on here just constantly reblogging fundraisers for people in gaza, sharing the gazaesims link, sharing news updates etc.
zionists on here are just talking amongst themselves like 'heres why im proud of israel's mass bombing campaigns' and 'zionism doesnt mean colonialism and imperialism because its in our religious texts and thats what matters, not the very real effects that the application of zionism has on palestinians' and 'making judaism inseparable from an ethnostate is good, actually'
teethands · 4 months
Tumblr media Tumblr media Tumblr media Tumblr media Tumblr media Tumblr media Tumblr media
little snapshots of a mod i have been working on for a few days, cenozoicraft, understandably based around adding cenozoic animals to the minecraft world along with neolithic tools. its heavily based around hunting and utilizing animal parts. most of what is here is tamable and rideable with fun little taming mechanics, so dont expect to be able to waltz up to a smilodon. more to come soon
spinjitsuburst · 6 months
shouldn’t have to say this but apparently it needs to be said
If you support using AI to create art, whether it be drawing or writing or voice acting or anything in between, get off my page. block my accounts. leave.
You are not welcome here. You are not welcome in this community and we don’t want you here
Creators don’t owe you jack shit. You do not get to use their hard work because you’re too impatient to wait for new content and too fucking lazy to learn how to create things yourself
kndrules · 3 months
Tumblr media
