#web scraping examples | Explore Tumblr posts and blogs

catchexperts · 2 months ago

Text

Web Scraping 101: Everything You Need to Know in 2025

🕸️ What Is Web Scraping? An Introduction

Web scraping—also referred to as web data extraction—is the process of collecting structured information from websites using automated scripts or tools. Initially driven by simple scripts, it has now evolved into a core component of modern data strategies for competitive research, price monitoring, SEO, market intelligence, and more.

In short, web scraping is the ability to turn unstructured web content into organized datasets that businesses can use to make smarter, faster decisions.

💡 What Is Web Scraping Used For?

Businesses and developers alike use web scraping to:

Monitor competitors’ pricing and SEO rankings

Extract leads from directories or online marketplaces

Track product listings, reviews, and inventory

Aggregate news, blogs, and social content for trend analysis

Fuel AI models with large datasets from the open web

Whether it’s web scraping using Python, browser-based tools, or cloud APIs, the use cases are growing fast across marketing, research, and automation.

🔍 Examples of Web Scraping in Action

What is an example of web scraping?

A real estate firm scrapes listing data (price, location, features) from property websites to build a market dashboard.

An eCommerce brand scrapes competitor prices daily to adjust its own pricing in real time.

A SaaS company uses BeautifulSoup in Python to extract product reviews and social proof for sentiment analysis.

For many, web scraping is the first step in automating decision-making and building data pipelines for BI platforms.

⚖️ Is Web Scraping Legal?

It depends on what you scrape and how you scrape it. Scraping publicly available data is legal in many jurisdictions, but scraping private, gated, or copyrighted content can violate the law or a site's terms of service.

To stay compliant:

Respect robots.txt rules

Avoid scraping personal or sensitive data

Prefer API access where possible

Follow website terms of service

If you’re wondering “Is web scraping legal?”—the answer lies in how you scrape and what you scrape.
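As a rough illustration of the first compliance point, here is a minimal sketch of checking robots.txt rules before fetching a page, using Python's standard urllib.robotparser module. The robots.txt content, the example.com URLs, and the "example-scraper" user agent are all made up for the example, and the rules are parsed from an inline string so the sketch runs without network access.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs offline.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

# Hypothetical user agent name for our scraper.
USER_AGENT = "example-scraper"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Ask whether our user agent may fetch each (made-up) URL.
allowed_public = parser.can_fetch(USER_AGENT, "https://example.com/listings")
allowed_private = parser.can_fetch(USER_AGENT, "https://example.com/private/report")

# Crawl-delay, if the site declares one, suggests a pause between requests.
delay_seconds = parser.crawl_delay(USER_AGENT)

print(allowed_public, allowed_private, delay_seconds)
```

In a real scraper you would more likely point the parser at the site's live robots.txt with set_url() and read(), and skip any URL for which can_fetch() returns False. Passing a robots.txt check does not by itself settle the legal or terms-of-service questions above.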

🧠 Web Scraping with Python: Tools & Libraries

What is web scraping in Python? Python is one of the most popular languages for scraping because of its ease of use and strong ecosystem.

Popular Python libraries for web scraping include:

BeautifulSoup – simple and effective for HTML parsing

Requests – handles HTTP requests

Selenium – ideal for dynamic JavaScript-heavy pages

Scrapy – robust framework for large-scale scraping projects

Puppeteer – a Node.js library rather than a Python one, used for advanced browser automation

These tools are often used in tutorials like “Web scraping using Python BeautifulSoup” or “Python web scraping library for beginners.”
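To give a feel for what the HTML parsing step involves, here is a small sketch that pulls titles and prices out of a listing page. It deliberately uses Python's built-in html.parser module as a stand-in so it runs without installing anything; with BeautifulSoup the same extraction would typically be a few lines of selector calls. The sample HTML, the class names, and the ListingParser helper are all invented for the example.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; real sites will use different markup.
SAMPLE_HTML = """
<ul>
  <li class="listing"><span class="title">Two-bed flat</span><span class="price">$250,000</span></li>
  <li class="listing"><span class="title">Studio</span><span class="price">$120,000</span></li>
</ul>
"""


class ListingParser(HTMLParser):
    """Collect one dict per <li class="listing"> with its title and price."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._field = None  # name of the field currently being read, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "li" and css_class == "listing":
            self.rows.append({})  # start a new record
        elif tag == "span" and css_class in ("title", "price") and self.rows:
            self._field = css_class

    def handle_data(self, data):
        # Only keep text that sits inside a title or price span.
        if self._field:
            self.rows[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def parse_listings(html: str) -> list:
    parser = ListingParser()
    parser.feed(html)
    return parser.rows


listings = parse_listings(SAMPLE_HTML)
print(listings)
```

In practice the HTML would come from an HTTP client such as Requests rather than an inline string, and the resulting list of dicts could be written to a CSV file or a database.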

⚙️ DIY vs. Managed Web Scraping

You can choose between:

DIY scraping: Full control, requires dev resources

Managed scraping: Outsourced to experts, ideal for scale or non-technical teams

Use managed scraping services for large-scale needs, or build Python-based scrapers for targeted projects using frameworks and libraries mentioned above.

🚧 Challenges in Web Scraping (and How to Overcome Them)

Modern websites often include:

JavaScript rendering

CAPTCHA protection

Rate limiting and dynamic loading

To solve this:

Use rotating proxies

Drive a headless browser with an automation tool like Selenium

Leverage AI-powered scraping for content variation and structure detection

Deploy scrapers on cloud platforms using containers (e.g., Docker + AWS)
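Rate limiting in particular can often be handled by simply slowing down. The list above does not mention it, but a common complementary technique is retrying with exponential backoff when a server answers with HTTP 429 (Too Many Requests). The sketch below is one possible shape for that. The fetch_with_backoff name, its parameters, and the fake fetch function in the demo are assumptions made for illustration; a real fetch function would wrap an HTTP client call.

```python
import time


def fetch_with_backoff(fetch, url, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url) and retry with doubling delays while it returns 429.

    fetch must return a (status_code, body) tuple. The sleep function is
    injectable so the retry logic can be exercised without real waiting.
    """
    status, body = None, None
    for attempt in range(max_attempts):
        status, body = fetch(url)
        if status != 429:
            return status, body
        # Wait 1x, 2x, 4x ... the base delay, but not after the last attempt.
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return status, body


# Demo with a fake fetch that is rate limited twice, then succeeds.
attempts = {"count": 0}


def fake_fetch(url):
    attempts["count"] += 1
    if attempts["count"] < 3:
        return 429, ""
    return 200, "ok"


waits = []  # record the delays instead of actually sleeping
result = fetch_with_backoff(fake_fetch, "https://example.com/listings", sleep=waits.append)
print(result, waits)
```

Backing off like this also keeps server load low, which is friendlier to the target site than retrying immediately.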

🔐 Ethical and Legal Best Practices

Scraping must balance business innovation with user privacy and legal integrity. Ethical scraping includes:

Minimal server load

Clear attribution

Honoring opt-out mechanisms

This ensures long-term scalability and compliance for enterprise-grade web scraping systems.

🔮 The Future of Web Scraping

As demand for real-time analytics and AI training data grows, scraping is becoming:

Smarter (AI-enhanced)

Faster (real-time extraction)

Scalable (cloud-native deployments)

From developers using BeautifulSoup or Scrapy, to businesses leveraging API-fed dashboards, web scraping is central to turning online information into strategic insights.

📘 Summary: Web Scraping 101 in 2025

Web scraping in 2025 is the automated collection of website data, widely used for SEO monitoring, price tracking, lead generation, and competitive research. It relies on powerful tools like BeautifulSoup, Selenium, and Scrapy, especially within Python environments. While scraping publicly available data is generally legal, it's crucial to follow website terms of service and ethical guidelines to avoid compliance issues. Despite challenges like dynamic content and anti-scraping defenses, the use of AI and cloud-based infrastructure is making web scraping smarter, faster, and more scalable than ever—transforming it into a cornerstone of modern data strategies.

🔗 Want to Build or Scale Your AI-Powered Scraping Strategy?

Whether you're exploring AI-driven tools, training models on web data, or integrating smart automation into your data workflows—AI is transforming how web scraping works at scale.

👉 Find AI agencies specialized in intelligent web scraping on Catch Experts.


0 notes

howtofightwrite · 9 months ago

Note

So. There is no safe place (in the body) to be shot. There are places that are slightly safer than others to be stabbed (i.e. being stabbed in the meat of your calf is less likely to be lethal than your torso). For the purposes of an arrow wound, which feels like a combination of shot+stab for a swords n sorcery world, is there a “less lethal” place to take an arrow? My character is traveling with companions and gets into a fun little goblin skirmish. I need her to catch an arrow somewhere that will be concerning, but not immediately fatal. Magic Bullshit™️ will keep the wound from healing properly for a few days, but I’ve accounted for field wound care (cleaning and bandaging and such) as she’s being taken on horseback to get proper treatment.

Not deeply.

So, the problem with all of these is tissue disruption. If the injury gets deep enough, the chances that it will hit something vital (especially on the torso) increase dramatically. So, getting stabbed and having the blade catch bone, instead of getting in deeper, is “relatively” safe. Similarly, getting stabbed (or shot) in the hand or foot is unlikely to kill you (though those injuries are likely to result in permanent damage impairing the use of the injured appendage).

Arrows are a little different, in a couple of ways. First, if you get shot, you do not want to pull the arrow out (nor break it off and push it through). That will increase the risk of bleeding out. Arrows make fairly large holes in people, but if the arrow sticks in the wound (which it should), it will actually limit the amount of bleeding. Effectively, the wound has a partial plug in it. Pulling out the arrow means that plug is no longer there, and the victim can happily bleed to death on the spot.

The second thing about arrows is that they actually pin muscle together. Think of it a bit like holding two pieces of meat together with a toothpick. If the toothpick isn't there, the pieces can slide across each other without issue, but that's not possible when there's a wooden shaft running through them. Your muscles are a complex web of meat, that slide over each other as you move. Pinning those together means that part of your body will actually lock up. For example, if you're shot in the shoulder, you won't be able to adjust the position of your arm. It's been toothpicked, and it's not going anywhere.

Arrowheads can get wedged in bone. If it's a broad head, or hunting tip, that will be obnoxious to get out.

At the risk of reading too much into your setting, goblins often mean poisons, or other nastiness. Though, really, even just getting a tetanus infection (it used to be called “lockjaw”) from their blades is a pretty horrific potential fate. Even if the wounds themselves were relatively minor (cuts and scrapes, maybe a graze or two), a couple days might still result in some pretty horrific harm after the fact.

Also, remember, it's unlikely that bacteria will be understood by the medical science of your setting. So, first aid would still run a real risk of secondary infections.

Depending on their skill in first aid, anything outside of a severed artery or catastrophic organ damage should be (technically) survivable, though the wounds could easily result in permanent impairments, depending on exactly what was hit. A punctured lung might not kill her, but it could result in permanent respiratory issues, such as a cough, and chronic pain while breathing heavily from then on. It could also result in pneumonia and death, which is also, usually, pretty permanent.

Some of this depends more on where you want to land on a spectrum between dark fantasy and swords & sorcery. The genres are similar (and potentially overlapping), but can scatter out into dramatically different works. But, you do have some options on how you want to proceed.

-Starke

This blog is supported through Patreon. Patrons get access to new posts three days early, and direct access to us through Discord. If you’re already a Patron, thank you. If you’d like to support us, please consider becoming a Patron.

#writing advice #writing reference #writing tips #starke answers #how to fight write #starke is not a real doctor

560 notes

arieswritez · 2 months ago

Text

puppy love

puppy love | yandere!mark grayson x afab!reader | MULTI-CHAP: 4

chapter 3

cw; DARK CONTENT!!! MDNI!!! MENTION OF ANIMAL DEATH, reader is neurodivergent, ableism, growing up is messy & adults suck, angst, niceguy™/slight incel mark, childhood friend/bully!mark, mark gets his powers sooner, teeny tiny implications of pseudo incest (blink and you’ll miss it), SADIST MARK, violent rape, threats of violence, & canon typical violence, so . . gore, stalking, implied murder, gender & body dysphoria, mentions/implications of disordered eating, mark teases reader about their body once, overall asshole mark, implied grooming (mark handles it but he’s a lil bitch about it later), so, victim blaming, misogyny, the inexplicable horrors of being afab, objectification, sexualization

about; [the fuckin' thought of you with somebody else, i don't like that.] . . actually, if you even consider leaving i'll lose a couple screws in due time, i'll stop breathing and you'll see the meaning of stalker when i pop out the dark to find you and that new dude that you're seeing with a attitude - IFHY (tyler the creator)

there was blood on mark's hands.

syrupy and wet.

the distinct stench of iron rot fogged up his senses.

blood clots stuck like soft gelatin between his fingers. stretching, snapping webs of gore whenever he opened and closed his hands.

still warm as he switched on the water from your sink.

the suds from your hand soap came up a copper brown, adorned by tiny rivulets of red as he dug beneath his fingernails to scrape away any remnants of viscera.

dna washed away by tap water.

his skin purified once again.

mark looked up and met the eyes in the reflection, making sure to pick off specks of skull fragment and the fatty tissue of brain matter from strands of his hair.

what a fantasy.

a blink and it's all gone.

just like you.

you and your attention.

your undying devotion. a huff and the flame gets snuffed.

better yet. . you light and pass the torch to someone fucking else.

it's no good.

there's no use being mad at you and your uninspiring devotion. how special is your love, really, if it is so easily obtained?

and why is the fact that it no longer belongs to mark so upsetting?

why'd the realization that anyone who called you pretty would have you fantasizing about baby's breath bouquets - something old, something new, something borrowed, something blue, and a sixpence in your shoe - make his blood run that much hotter?

why'd it make him stare down into the sink, faucet running, as he tried to slow down his breathing? gripping the edges of your porcelain sink until he heard it creak. counting forwards to a hundred, then back again.

he did all the things the therapist his mother took him to recommended he do when those feelings came up. things to see, smell, and touch, and taste. but the only thing that came through the ringing in his ears was the vivid fantasy of tearing your boy apart.

he could see the light leaving his eyes. he could smell the acrid stench of piss running down the coward's leg. and god knows he'd only ever touch him to dispose of his body.

and at the end, he'd taste the tears collecting at your cupid's bow when you sought comfort in his presence. just like the old days. it'd all be worth it in the end.

. . he shouldn't have read your diary.

not because debbie raised him to 'respect privacy' - because who doesn't keep shit in their notes app in this day and age? - but because it put him in a shitty mood.

but he was also glad he did it.

it revealed what your problem was.

and mark's always been your problem solver.

mark was imaginative.

mark was smart.

mark was also patient.

surely, you'll get bored.

you'll preoccupy your mind with mundane things: how the world spins, for example. what you'll make of yourself. what people will think of you.

ouroboros: swallowing yourself whole trying to find the beginning to the end.

will you be loved? how will you be loved?

you're a glutton obsessing over not being enough in the first place. more, more, more.

you'll dizzy yourself.

come full circle, nausea and vertigo, habitually crawling back to him.

you're a distracted little thing.

you always have been.

it's in your nature.

mark tries not to be too hard on you about your romantic pursuits.

after all, you'll go after what you think you deserve.

and if that's dysfunction, then so be it.

however. . . your standards could be a little higher. had it been any other person occupying your mind. . mark wouldn't have cared.

oh, not at all.

he cares fuck all about your meaningless schoolyard crushes but the one thing that boils mark's blood is all of the abuse.

the hoops you have to jump through for the smallest shred of applause.

and really, how pathetic do you have to be? why can't you see that he's using you? as entertainment. as a pet. as a clown.

and what you don't understand is that deep down. . mark and your boy aren't all that different.

which explains why you like him so much.

mark and your boy were sharks.

your boy could smell your blood from a mile away; see the desperation in the way you sauntered past him, salivating at the thought of being the apple of his eye.

he saw you for what you were: prey.

and they saw right through your flimsy little costume of new clothing and perfumed wrists.

your boy and his group of cronies didn't laugh at your jokes because they thought you were funny. they laughed at the idea of you believing they found you entertaining.

your mediocre attempts at relevancy were funny - hilarious, even - because of how eager you were to impress them.

and the only reason why they hadn't used and discarded you like a plastic bag with warm dog shit inside of it was because they were more than happy tossing a coin into traffic, making you fetch just so they could entertain themselves watching you get hit by a bus.

but everything for your boy, right?

you and that fucking boy.

whatever it is, mark's more than willing to find a way to make all of that stop. he's devised some plans to make everything go back to the way they used to be.

it'd always been you and mark.

mark and you.

he planned to keep it that way.

and so, he was on his best behavior.

he'd let you have your boy.

he'd push down the bile that crawled up his throat whenever he imagined his hands on you. whenever he saw your face light up whenever your phone pings with a notification.

mark can be a very good actor.

he'd act as if his stares weren't deadly when you looked up and caught him looking at you. he could melt those icy eyes, the ones that glaze over in anger, and turn them into their usual warm brown.

he's on his best behavior.

attentive, even.

he's so, so interested in what you've got going on.

who are you talking to? yes you can tell me. no i won't get mad. yes. i promise. him? yeah, I remember. why didn't you tell me?

no, i'm not mad.

good for you!

no, i won't threaten him.

who do you think i am~?

mark knows better than to be outright poisonous towards you. not when there was another boy willing to stuff your pretty little head with cotton.

you are far too sensitive to hear anything that isn't a candied lie. if he plays nice, it gives him the upper-hand.

there is no need to vent to a diary when your best friend is sitting in front of you, doe eyed and innocent, the way he pretended to be when you two were twelve and his mom would check up on you in his room. or when teachers would walk past and he had to pretend he wasn't pressing the sharp point of his pencil into your thigh.

mark loves your parent(s).

they aren't that much different than you.

in fact, mark has come to find that there aren't many people that match him in terms of intelligence.

he can see why you came out the way you did. un-special, if he's feeling kind. the other word he'd like to use is not nice to call someone.

pining after approval, your parent(s) were very easy to like.

very easy to control.

"i just don't know if they've told you, yet. . it seems kinda unfair that i'll be the one to say." mark mutters under his breath, tracing shapes into the dining room table as your parent(s) sit across from him.

"mark," your parent reaches across the table, hoping to grab his hand, only for mark to pull it out of their reach. "if something's happening. . we want to know. we need to know."

"it's just that. ." mark pauses, gives a few seconds to really build the tension. "it's a bit embarrassing."

super.

he's worried about you, you see? there's a group of guys you've been chasing around in school. . and mark doesn't think they have your best interest in mind.

mark has heard. . things.

but you've gone cold on him.

he's worried you might be. .

well, he's worried you might be having sex.

with a few. .

. . all of them?

oh, who gives a shit? the more the better. and the more mark spills, plucks things out of thin air, the more petrified your parents look.

he makes sure to say it.

sex.

hisses, purrs it, whispers it like it's such a bad word.

he even wills himself to look embarrassed, averting his eyes like it's a shameful thing.

it brings him back to the day debbie caught him with some girl after a baseball game.

she had just been some random. a shiny object that called mark's attention. something he could put his dick into while he tucked his face into her neck and imagined the sounds you'd make.

his mom should've known he was already having sex. however, having been caught with his pants down and balls deep in someone wasn't necessarily the way he planned to break it to her.

he heard his mom and his dad arguing in the next room that night and, coincidentally, nolan came in and gave him 'the talk' to the best of his ability.

humans are fragile, mark.

yes, they are.

but the bruises on her were not his fault.

she was soft.

and she'd liked it.

nevertheless, your parents are not as forgiving as mark's.

they promise him it's not a big deal. that he did good. that he's good.

a good kid, a good student, a good friend.

but as soon as he's gone, he knows they are searching your room top to bottom.

he flies up to your room and peeks in through the curtains to watch them toss open closet doors, rummaging through clothing, bookbags, notebooks, whatever they can find.

and finally, your bed.

your diary with all the juicy, dirty - downright violent, jesus - fantasies mark wrote by forging your handwriting.

and your nightstand.

wherein tucked underneath your cute underwear lays a shiny pack of condoms.

at least you're being safe.

you'll never hear the end of it.

it's too good to miss and mark doesn't care if he has to wait all day for you to get home. he wants to watch your everything crash and burn.

not that he'll have to wait much, anyway.

your parent's on the phone, trying to contain red hot anger from spewing out like a backed up volcano, hissing at you to get home, now.

you poor thing.

you poor, poor, thing.

you don't know what to tell them when they toss the pack of condoms at your feet.

when they shove the journal in your face, showing you all the depraved things you wrote in that cute little scrawl.

the boys, the nights out in which you claimed to be at mark's: helping him out with a project.

yeah, right, stop lying, already!

"give me your phone. now."

fingers feverishly tapping and swiping, going through texts as tears stream down your flushed face.

you've got a date tonight.

and you hadn't told your parent(s).

what a coincidence, oh my!

your boy must've planned to seal the deal that night. and mark would be damned if he didn't have you first.

mark doesn't need to worry.

that's definitely not happening now, is it?

in fact, you won't be able to go anywhere that isn't class for the rest of the school year. not unless you're monitored by mark. and isn't it embarrassing, mark having to be some sort of guardian?

"I thought you were smarter than this."

and you're too good to yell back.

you're too good to argue and try to explain that it wasn't you.

you didn't buy condoms. you didn't write that. you didn't do anything.

but if it wasn't you, who was it?

who did?

you look every bit of a cornered animal. it's very you: to freeze in situations like that. back to the door, facing the window just enough for mark to be able to peek at every emotion going past your face through the crack of your curtains.

he watches it flicker past your eyes, the way the muscles in your neck tense up when you squeak out those ugly, strangled, sniffed out cries. the ones you try to hold back when you're crying alone in your room and you want no one else to hear them.

the ones you'd let out at your desk when you were itty bitty and your parent had dropped you off at kindergarten, promising you they'd be right back, but they never were.

you are so much like the way you used to be.

mark wishes things hadn't changed.

he wishes you were just as innocent, as good. he wishes no one would've turned you into what you are now.

he wishes you wouldn't have been stupid enough to let them.

you don't say anything.

you don't even push past your parent when they're done berating you, just stare down at the floor until their mouth has dried, and they shoulder check past you.

you only slowly turn to push the door closed, grab your computer and send a message to the only person you think you can confide in.

he arrives in ten minutes.

enough to make it believable, climb up a tree and sneak into your room.

you fall into his arms immediately, sobbing.

mark hopes you don't feel him smiling against your shoulder as he comforts you.

your boy has been different since the last time you talked to him. distant, distracted. different. you catch him zoning out whenever the two of you are studying in the library, not reciprocating when you try to play footsies with him.

you're not sure if it has to do with the night you had to cancel your date. sure, it was last minute but he'd told you that it was okay. but with everything going on at home, you don't have the patience to hear him lie.

"seriously, what's up?" you ask, kicking his shoe softly.

your boy looks up at you.

his eyes used to gleam with confidence. the type of cockiness that'd make your cheeks burn and butterflies flutter like mad in your stomach. but they looked empty then. he looked like he hadn't slept well. that night or the one before.

he looked around, making sure no one was within earshot. you leaned forward in response, your curiosity piqued.

"this is going to sound weird but. . do you ever get the feeling that you're being watched?"

you blinked.

"uh. . hm. ."

come to think of it. . sometimes you did. you've been sensitive to eyes on you since you can remember. the hyper vigilance is something you've grown accustomed to, making peace with the fact that it might not be a curse after all, and instead some sort of safety feature.

but it felt different.

not like the irrational tickle in your stomach whenever you think of a possibility. but the speckling feeling across your skin, crawling with a million legs, the kind that makes you hallucinate a breath against your neck. the type that has your head rolling, looking for an intruder.

nothing.

but you didn't tell your boy.

because your boy was talking about himself and you've learned that inserting yourself into it could be rude.

you settle with saying, "what do you mean?"

he shrugged a shoulder. "i dunno. watched. I get that sometimes. see something from the corner of my eye. and when I turn to look it's gone."

you felt your heart pick up speed. strange. the same thing had been happening to you.

you let out a nervous laugh. "if you're saying this to scare me I'm gonna get really mad, y'know?"

"i'm serious." he said, almost urgently. "and here's this: i was walking to my car after baseball practice and found some weird red shit smeared across my windshield."

he's fucking with you.

surely, he is.

this must have something to do with the rumour circulating around school. the one in which they've seen a figure whizzing past. the one in which that figure is the reason some animal carcasses have been found in the baseball field, mutilated like some sort of fucked up science experiment. a villain that's found a hobby in terrorizing the town, perhaps.

"it's probably nothing." you whisper, unsure if you're trying to convince him or yourself.

"probably." he responds.

he doesn't look convinced.

and he doesn't reciprocate when you try, again, to get his attention.

your boy was gone.

gone, gone, gone.

word around the school was that he'd transferred.

but that started to feel suspicious when the students noticed the smell.

something easy to dismiss at first.

the kind of funk attributed to warm weather and not enough deodorant. growing boys and their scattered hormones.

and then it grew.

bold, loud.

ugly enough that it couldn't be ignored.

sour.

downright rancid.

and it was all coming from your boy's locker.

it got so bad a janitor had to pry his locker open.

and that's where they found a decomposed animal, tire marks through the middle of the delicate body. maggots swarming in the orifice where the eyes used to be.

you don't remember when the last time you saw him was.

you don't know if you ever will.

with his pastime of mutilating animals and collecting roadkill, you're not sure you even want to.

and if you did, the only thing you'd ask is why?

mark seemed the least surprised about it.

he hadn't so much as grimaced as he told you the story of his locker being pried open.

the stench was the worst thing, apparently.

although, it wasn't enough to deter his appetite as he popped grapes between his fingers, making sure to squirt the juice onto you as he described fat, wriggling maggots falling off in swarming little balls off of the carcass.

you shiver, skin crawling, staring at the pile of homework before mark.

now that your boy had vanished into thin air, his entourage wanted nothing to do with you. you figured it was only normal. you were all preparing for finals, applying for college, planning ahead.

still, it hurt.

it hurt to think you almost had it, almost had him, but it was all taken away. you're not sure why you feel that way, but you do.

and the only thing keeping you afloat is the fact that you've found your way back to mark.

it reminds you, he'll always be there for you.

no matter what.

it's nice, you think.

spending time with your best friend.

even if it means doing mark's work again.

CHAPTER 5

#mark grayson x reader #alternate mark grayson x reader #mark grayson #invincible #invincible x reader #yandere mark grayson #yandere mark grayson x reader #bpd king #he just like me #srry for my disappearance #i was going insane #it will happen again #sinister mark #sinister mark x reader #invincible variants

130 notes · View notes

merakiui · 13 days ago

Note

somehow thinking of skully always results in the most deranged of thoughts, which is why tenth prompt immediately brought him to mind. he just seems like the kind of guy to engage easily and with little remorse in the most taboo things. feeding you a part of himself to ensure you are trapped in his timeline? well, that's simply what you do when you love somebody very much. no, he isn't getting off to this.

which is to say that, if it's not too much bother, i'd love to request some disgusting dead dove skully for prompt 10 (though 4 is also sorta similar?)

>:) disgusting dead dove Skully...... my favorite flavor!!!!! I changed the prompt sentence just slightly,,, ;;;

(cw: yandere, unhealthy behaviors/relationship, female reader, captivity, dead dove, reader is treated like a rag doll in skully's pursuit of having a jack x sally esque romance, delusion, obsession, gore, non-con dissection, drugging, mutilation, skully's freak levels are criminal and absolutely deranged)

(monstrously yandere prompts)

"Good morning, my darling!"

Beaming from ear to ear, your cheerful kidnapper wheels a metal cart into the brick cell he's calling a bedroom. Your roommates are rats and spiders and whatever other monsters dwell in the darkness down here. This is where he lives, too, apparently. This strange boy who hides away in a mad scientist's stone cellar... It's patently absurd, a horrific tale fit for the shelves, and yet it's your reality.

He claims he's saved you from the cruel and callous Dr. Ashengrotto and his terrible, terrible needles. You'd be nothing but a lab rat kept awake on a cocktail of fatal flora, and your death would be by the doctor's design. It's all miserable business anyhow, or so Skully says with a casual wave of his hand. You're in better hands down here. Because this is Skully's domain, and he has you all to himself.

Dr. Ashengrotto will never step foot down here because it is here where all of his abandoned creations lurk, and many of them cannot wait to sink their serrated maws into the throat of the man who left them to rot.

That's why Skully, this odd boy who feasts on rats and allows spiders to spin their webs between the valley of his fingers, is more than happy to have you here with him. He swiped you right from Dr. Ashengrotto's operating table, and he's quite proud of it. Deadly nightshade is a glorious poison, or so he'll boast. He's so pleased he learned the distinctions to better knock out the doctor. It's not his intention to kill Dr. Ashengrotto—although he very well could.

Rather, he just wants his specimen. That's why he's done well to keep you safe down here.

But is it truly any better?

"Did you know," he continues, disregarding your silence and the way you squish yourself into the corner to get away from him, "the brain prevents you from biting off the tip of your tongue? A very curious example of animalistic instincts. We are wired for self-inflicted violence, and yet our brain refuses to allow the body to entertain that."

You eye him warily. Something glints on the tray. A blade... A big blade, actually. Dread pools in the pits of your stomach. Bile is already scraping at your throat. You want to run, but he snipped the tendons in your legs and so now you're no better than a baby bird. You could crawl, but he'd easily catch up.

"If I'm to call myself yours, much like Lord Jack did with his dear Sally, it's only fair we look the part, no?" Spidery digits fall away from the cart, and he bends down to peer at you. Those peculiar spiral eyes blink one at a time. You wonder if he's ever gone outside. He's less human and more...creature. Does he even know how to be human? "So where shall we begin? If you try hard enough and perish every self-preservation instinct, your teeth could snip off the tip of your tongue. I can do the same with mine and we can swap them!" But he's quick to fluster, and his hands fly up to hide his glowing cheeks. "A-Ah... But perhaps that's too forward of me. Forgive me. It was wildly uncouth to suggest such blatant intimacy so early into our courtship."

You've never known kidnapping to be courtship. You're beginning to think Dr. Ashengrotto and his terrible, terrible needles would be better than this madness.

"We can do this instead." He lifts a cleaver from the cart and runs his finger along the blade. A bead of blood pools at his fingertip. "This shall do nicely." Turning to you with a smile, he rests his arm on the stone tablet positioned just beside the tray of surgical equipment. "It's only proper if I go first. A gentleman must always escort and reassure his lady. Oh, but I must hurry. Dr. Ashengrotto will wake soon, and I will know quite the tongue lashing if his tools are missing." He squeezes his eyes shut and winces at the memory, as if Dr. Ashengrotto is somehow more terrifying than the blade he's poised above his wrist.

Skully lifts his arm and then—

"Oh!"

He sets the blade down. You breathe a relieved sigh that soon sticks in your throat. "I almost forgot." He lifts a murky beaker from the table. "Please drink this. It will take all the shivers away. I promise. Don't be scared." He leans in close and brings the glass to your mouth, but that only serves to make you struggle with more force. Skully sighs, pouting in disappointment, and grabs hold of your chin to force the foul-tasting potion down your throat. You cough and grab at your neck. There's nothing you can do. You drank it.

"W-What was..."

"Fret not, my dear. This will melt away your nerves. You won't even know the concept of fear soon."

With that, he picks the cleaver up.

You watch, eyes wide and mouth gaping, as he raises his arm high. You see the dotted lines he's drawn on his wrist. And before you can scream he's brought the cleaver down to cleanly sever his hand from his arm.

It must be painful. It has to be!

So then how is he humming? Why isn't he crying and screaming, holding his now handless arm in agony? How can he casually fashion a tourniquet for his weeping wound, the blood so thick and sticky it spills onto the cart and drips on the floor in little puddles?!

"Soon, our bodies shall be wed. Isn't this exciting? We'll be husband and wife!" He swipes his finger through the blood pooling on the surface of the cart, doodling a crooked heart.

You feel sick, but nothing will come up. Your stomach churns. Something is scratching at your eyes. You feel heavy, as if the weight of the world is pressing down on your shoulders. There's so. much. blood.

You shut your eyes for a moment, but when you wake it's to the foul stench of gore and the prick of pain as the needle works through your flesh.

Skully sews his hand onto the stump where yours once was and, in return, yours replaces the empty space on his arm.

He cradles your cheek with his—your—bloodied hand. It's cold. "You're so beautiful," he whispers, his thumb rubbing just beneath your eyes, cherishing your cheekbones. "I've never known such agonizing happiness. Aah, it's like Cupid is stirring my heart into soup, stabbing it over and over with his arrows. You're lovely, an absolute scream."

You can hardly move, but your eyes manage to slide down the length of your limp arm to find the hand that's now surgically attached. His hand. The hand that has never belonged to you.

You give a choked sob. Skully smiles and leans in to lick your tears away.

- - -

He wants to open you up and rifle around in your insides. "If only I could give you my heart," he laments. "I don't deserve this organ. Not when I have such beastly thoughts!"

You think, if it were possible, he'd wrap your intestines around his neck like they're a feather boa, a grotesque treasure. But he doesn't want to kill you, even though sometimes he tells you you'd be such a pretty corpse, so he never allows his knife to touch your stomach.

That's why he's decided to give you the next best thing.

The needle is so close to your eye. You can't move. He's given you something, so all you can do is lie rigidly still like the dead. Again, Skully hums a haunting melody.

"Please..." you beg. "Please don't do this."

He blinks down at you with one eye. The other is sitting preserved in a glass jar, from when he plucked it out himself and cut away the pesky optic nerves, all while rambling about how fantastic this is. You wonder if he's immune to pain. Is he even alive? There's a bandage wrapped around his head, concealing his empty eye socket. Soon, you won't be able to move your mouth to voice any pleas. Silent tears crawl down your cheeks instead.

When you look at him, scrawny boy with his strange, toothy grin and his bedraggled hair, you wonder if he's one of Dr. Ashengrotto's long-forgotten experiments.

"I'll be gentle," he promises.

You spend those next few hours in indescribable torment.

You pass out just before you lose the sight in your left eye.

When you wake, Skully is holding you in his arms, brushing a hand through your hair to soothe you. He hums a sweet lullaby, his sharp, curled nails sticking at every knot in your hair. He yanks through them, undeterred.

"Are you awake?" He peers down at you. "Oh, what a relief! It's been days since you shut your eyes. I was beginning to worry I'd hurt you..."

His frown quickly quirks up into a bright smile. "But not to worry! You're still alive. I'm so pleased!"

He feels around in the dimly lit space for something. When you open your eyes, you see yourself. The vision in your left eye is blurred. Skully props you up so you can get a better look at yourself in the fractured glass.

You're looking back with an orange eye.

And yours is nestled snugly in his socket. A perfect fit.

"Isn't it wonderful?" he says, ignoring your wheezing, hyperventilating cries and the blood that trickles from your ruined socket. "My most important parts are inside you—my hand and my eye and patches of my flesh—and yours are here within me. I shall be a temple that cherishes your pieces. With this, you'll be mine forevermore."

He gathers you in his lanky arms and squeezes you in a hug that robs you of your air.

"And when you perish, I shall take your heart and sew it in next to mine. Then we'll never be apart."

#twisted chit chat #yandere twst

73 notes

mostlysignssomeportents · 1 year ago


Humans are not perfectly vigilant

I'm on tour with my new, nationally bestselling novel The Bezzle! Catch me in BOSTON with Randall "XKCD" Munroe (Apr 11), then PROVIDENCE (Apr 12), and beyond!

Here's a fun AI story: a security researcher noticed that large companies' AI-authored source-code repeatedly referenced a nonexistent library (an AI "hallucination"), so he created a (defanged) malicious library with that name and uploaded it, and thousands of developers automatically downloaded and incorporated it as they compiled the code:

https://www.theregister.com/2024/03/28/ai_bots_hallucinate_software_packages/

These "hallucinations" are a stubbornly persistent feature of large language models, because these models only give the illusion of understanding; in reality, they are just sophisticated forms of autocomplete, drawing on huge databases to make shrewd (but reliably fallible) guesses about which word comes next:

https://dl.acm.org/doi/10.1145/3442188.3445922

Guessing the next word without understanding the meaning of the resulting sentence makes unsupervised LLMs unsuitable for high-stakes tasks. The whole AI bubble is based on convincing investors that one or more of the following is true:

There are low-stakes, high-value tasks that will recoup the massive costs of AI training and operation;

There are high-stakes, high-value tasks that can be made cheaper by adding an AI to a human operator;

Adding more training data to an AI will make it stop hallucinating, so that it can take over high-stakes, high-value tasks without a "human in the loop."

These are dubious propositions. There's a universe of low-stakes, low-value tasks – political disinformation, spam, fraud, academic cheating, nonconsensual porn, dialog for video-game NPCs – but none of them seem likely to generate enough revenue for AI companies to justify the billions spent on models, nor the trillions in valuation attributed to AI companies:

https://locusmag.com/2023/12/commentary-cory-doctorow-what-kind-of-bubble-is-ai/

The proposition that increasing training data will decrease hallucinations is hotly contested among AI practitioners. I confess that I don't know enough about AI to evaluate opposing sides' claims, but even if you stipulate that adding lots of human-generated training data will make the software a better guesser, there's a serious problem. All those low-value, low-stakes applications are flooding the internet with botshit. After all, the one thing AI is unarguably very good at is producing bullshit at scale. As the web becomes an anaerobic lagoon for botshit, the quantum of human-generated "content" in any internet core sample is dwindling to homeopathic levels:

https://pluralistic.net/2024/03/14/inhuman-centipede/#enshittibottification

This means that adding another order of magnitude more training data to AI won't just add massive computational expense – the data will be many orders of magnitude more expensive to acquire, even without factoring in the additional liability arising from new legal theories about scraping:

https://pluralistic.net/2023/09/17/how-to-think-about-scraping/

That leaves us with "humans in the loop" – the idea that an AI's business model is selling software to businesses that will pair it with human operators who will closely scrutinize the code's guesses. There's a version of this that sounds plausible – the one in which the human operator is in charge, and the AI acts as an eternally vigilant "sanity check" on the human's activities.

For example, my car has a system that notices when I activate my blinker while there's another car in my blind-spot. I'm pretty consistent about checking my blind spot, but I'm also a fallible human and there've been a couple times where the alert saved me from making a potentially dangerous maneuver. As disciplined as I am, I'm also sometimes forgetful about turning off lights, or waking up in time for work, or remembering someone's phone number (or birthday). I like having an automated system that does the robotically perfect trick of never forgetting something important.

There's a name for this in automation circles: a "centaur." I'm the human head, and I've fused with a powerful robot body that supports me, doing things that humans are innately bad at.

That's the good kind of automation, and we all benefit from it. But it only takes a small twist to turn this good automation into a nightmare. I'm speaking here of the reverse-centaur: automation in which the computer is in charge, bossing a human around so it can get its job done. Think of Amazon warehouse workers, who wear haptic bracelets and are continuously observed by AI cameras as autonomous shelves shuttle in front of them and demand that they pick and pack items at a pace that destroys their bodies and drives them mad:

https://pluralistic.net/2022/04/17/revenge-of-the-chickenized-reverse-centaurs/

Automation centaurs are great: they relieve humans of drudgework and let them focus on the creative and satisfying parts of their jobs. That's how AI-assisted coding is pitched: rather than looking up tricky syntax and other tedious programming tasks, an AI "co-pilot" is billed as freeing up its human "pilot" to focus on the creative puzzle-solving that makes coding so satisfying.

But a hallucinating AI is a terrible co-pilot. It's just good enough to get the job done much of the time, but it also sneakily inserts booby-traps that are statistically guaranteed to look as plausible as the good code (that's what a next-word-guessing program does: guesses the statistically most likely word).

This turns AI-"assisted" coders into reverse centaurs. The AI can churn out code at superhuman speed, and you, the human in the loop, must maintain perfect vigilance and attention as you review that code, spotting the cleverly disguised hooks for malicious code that the AI can't be prevented from inserting into its code. As "Lena" writes, "code review [is] difficult relative to writing new code":

https://twitter.com/qntm/status/1773779967521780169

Why is that? "Passively reading someone else's code just doesn't engage my brain in the same way. It's harder to do properly":

https://twitter.com/qntm/status/1773780355708764665

There's a name for this phenomenon: "automation blindness." Humans are just not equipped for eternal vigilance. We get good at spotting patterns that occur frequently – so good that we miss the anomalies. That's why TSA agents are so good at spotting harmless shampoo bottles on X-rays, even as they miss nearly every gun and bomb that a red team smuggles through their checkpoints:

https://pluralistic.net/2023/08/23/automation-blindness/#humans-in-the-loop

"Lena"'s thread points out that this is as true for AI-assisted driving as it is for AI-assisted coding: "self-driving cars replace the experience of driving with the experience of being a driving instructor":

https://twitter.com/qntm/status/1773841546753831283

In other words, they turn you into a reverse-centaur. Whereas my blind-spot double-checking robot allows me to make maneuvers at human speed and points out the things I've missed, a "supervised" self-driving car makes maneuvers at a computer's frantic pace, and demands that its human supervisor tirelessly and perfectly assess each of those maneuvers. No wonder Cruise's murderous "self-driving" taxis replaced each low-waged driver with 1.5 high-waged technical robot supervisors:

https://pluralistic.net/2024/01/11/robots-stole-my-jerb/#computer-says-no

AI radiology programs are said to be able to spot cancerous masses that human radiologists miss. A centaur-based AI-assisted radiology program would keep the same number of radiologists in the field, but they would get less done: every time they assessed an X-ray, the AI would give them a second opinion. If the human and the AI disagreed, the human would go back and re-assess the X-ray. We'd get better radiology, at a higher price (the price of the AI software, plus the additional hours the radiologist would work).

But back to making the AI bubble pay off: for AI to pay off, the human in the loop has to reduce the costs of the business buying an AI. No one who invests in an AI company believes that their returns will come from business customers agreeing to increase their costs. The AI can't do your job, but the AI salesman can convince your boss to fire you and replace you with an AI anyway – that pitch is the most successful form of AI disinformation in the world.

An AI that "hallucinates" bad advice to fliers can't replace human customer service reps, but airlines are firing reps and replacing them with chatbots:

https://www.bbc.com/travel/article/20240222-air-canada-chatbot-misinformation-what-travellers-should-know

An AI that "hallucinates" bad legal advice to New Yorkers can't replace city services, but Mayor Adams still tells New Yorkers to get their legal advice from his chatbots:

https://arstechnica.com/ai/2024/03/nycs-government-chatbot-is-lying-about-city-laws-and-regulations/

The only reason bosses want to buy robots is to fire humans and lower their costs. That's why "AI art" is such a pisser. There are plenty of harmless ways to automate art production with software – everything from a "healing brush" in Photoshop to deepfake tools that let a video-editor alter the eye-lines of all the extras in a scene to shift the focus. A graphic novelist who models a room in The Sims and then moves the camera around to get traceable geometry for different angles is a centaur – they are genuinely offloading some finicky drudgework onto a robot that is perfectly attentive and vigilant.

But the pitch from "AI art" companies is "fire your graphic artists and replace them with botshit." They're pitching a world where the robots get to do all the creative stuff (badly) and humans have to work at robotic pace, with robotic vigilance, in order to catch the mistakes that the robots make at superhuman speed.

Reverse centaurism is brutal. That's not news: Charlie Chaplin documented the problems of reverse centaurs nearly 100 years ago:

https://en.wikipedia.org/wiki/Modern_Times_(film)

As ever, the problem with a gadget isn't what it does: it's who it does it for and who it does it to. There are plenty of benefits from being a centaur – lots of ways that automation can help workers. But the only path to AI profitability lies in reverse centaurs, automation that turns the human in the loop into the crumple-zone for a robot:

https://estsjournal.org/index.php/ests/article/view/260

If you'd like an essay-formatted version of this post to read or share, here's a link to it on pluralistic.net, my surveillance-free, ad-free, tracker-free blog:

https://pluralistic.net/2024/04/01/human-in-the-loop/#monkey-in-the-middle

Image: Cryteria (modified) https://commons.wikimedia.org/wiki/File:HAL9000.svg

CC BY 3.0 https://creativecommons.org/licenses/by/3.0/deed.en

Jorge Royan (modified) https://commons.wikimedia.org/wiki/File:Munich_-_Two_boys_playing_in_a_park_-_7328.jpg

CC BY-SA 3.0 https://creativecommons.org/licenses/by-sa/3.0/deed.en

Noah Wulf (modified) https://commons.m.wikimedia.org/wiki/File:Thunderbirds_at_Attention_Next_to_Thunderbird_1_-_Aviation_Nation_2019.jpg

CC BY-SA 4.0 https://creativecommons.org/licenses/by-sa/4.0/deed.en

#pluralistic #ai #supervised ai #humans in the loop #coding assistance #ai art #fully automated luxury communism #labor

379 notes

arminaa8 · 6 months ago


Saving Fic Epubs and Metadata Using Calibre

I FINALLY spent time learning how to set up Calibre with the FanFicFare plugin in order to easily keep track of fics that I've read. What's really awesome about it, though, is it does way more than just "keep track":

It saves epubs for future downloading (in case the fic is pulled from AO3)

It automatically "scrapes" metadata (title, tags, warnings, etc.) from the fic and includes it in Calibre's built-in spreadsheet

It allows you to create custom categories for things like notes and personal ratings, as well as categories for metadata not scraped by default (word count, for example).

Every bit of information scraped is SEARCHABLE and SORTABLE! Tags, authors, published date, etc.

However, the instructions for how to do all this are not clear-cut and are scattered on different sites and forums. So I've created a little guide based on what worked for me!

Here are my categories (if you know me the fic shown is not a surprise). I couldn't fit them into one long horizontal screenshot so I split them into two.

Most of these are default categories, but some (Thoughts, Words, Summary, Ratings, and Notes) are not. I've excluded default categories I didn't need, and created custom ones for the information I wanted to include.

Tutorial below the cut!

Download and install Calibre.

Once the program is open, click Preferences > Get plugins to enhance calibre.

Search for and install the FanFicFare and EpubMerge plugins (EpubMerge works in tandem with FFF and allows for downloading an entire fic series into one file).

_________________

Custom Columns

Preferences > Preferences > Add your own columns

The custom column screen is shown below. Anything unchecked is a category I didn't want to include in my list. Anything with a column icon next to it means it's a custom column I created. To create a new column, click +Add custom column.

The settings I made for each of my custom categories are shown below. Take note of the "column type" for each category. You can make any kind of columns you'd like!

Once you've made your custom categories make sure to click Apply on the "Add your own columns" screen. Now we need to configure the FanFicFare plugin to assign data to some of the custom categories. Click the down arrow next to the FanFicFare plugin icon on the main Calibre screen, then click Configure FanFicFare. On the next screen click the Custom Columns tab.

You will see a list of your custom columns with drop-down menus next to each. For columns you want the plugin to automatically fill, click the drop-down and select the matching data from AO3. There are many options to choose from, including pairing, language, warnings, etc. Note that I left the Notes and Thoughts columns blank. This is because I will input that information manually for each fic.

_________________

Downloading Individual Fics

Arrow next to FanFicFare > Download from URLs

Paste entire-work fic URLs into the black box. I personally found it tedious to copy/paste each link, so instead I found a Firefox extension called Copy All Tab URLs that does exactly what it says on the tin: copies all URLs from any open tabs. Much easier.

Click OK. Then, WAIT. It takes a minute to fetch the data. If a fic is restricted, the plugin may show a pop-up asking you to log into AO3 so that it can access the fic.

You're good to go when you see the following pop-up in the bottom-right corner. Click Yes.
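If you copy every open tab at once, the pasted list can end up with non-fic tabs, chapter links, and duplicates mixed in. Here's a rough Python sketch for tidying the list before pasting it into the box. It isn't part of Calibre or FanFicFare; the function name `clean_fic_urls` is made up for this example, and it assumes AO3 work links look like `https://archiveofourown.org/works/<number>` and that adding `?view_full_work=true` gives the entire-work view.

```python
import re

# Assumption: AO3 work links contain /works/<numeric id>, optionally
# followed by /chapters/<id> or a query string.
WORK_URL = re.compile(r"https?://(?:www\.)?archiveofourown\.org/works/(\d+)")


def clean_fic_urls(pasted_text):
    """Return one entire-work URL per unique AO3 work found in pasted_text.

    Hypothetical helper for illustration only; not a FanFicFare function.
    """
    seen = set()
    urls = []
    for line in pasted_text.splitlines():
        match = WORK_URL.search(line.strip())
        if match is None:
            continue  # skip tabs that are not AO3 work pages
        work_id = match.group(1)
        if work_id in seen:
            continue  # skip a work that was open in more than one tab
        seen.add(work_id)
        # Assumed entire-work form of the URL.
        urls.append(f"https://archiveofourown.org/works/{work_id}?view_full_work=true")
    return urls


if __name__ == "__main__":
    tabs = """https://archiveofourown.org/works/123/chapters/456
https://archiveofourown.org/tags/Fluff/works
https://archiveofourown.org/works/123"""
    print("\n".join(clean_fic_urls(tabs)))
```

The output keeps the order the tabs were in, so the fics should show up in the same order you had them open.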

_________________

Downloading An Entire Series

FanFicFare combined with EpubMerge allows you to download an entire series into one epub file!

Arrow next to FanFicFare > Get Story URLs from Web Page

Paste in the link to the SERIES page.

Click For Anthology Epub to download everything in that series into a single epub.

The next screen lists all the links in that series. Nothing to do here but click OK.

#calibre #fanficfare #fanficfare plugin #ao3 #archiveofourown #fanfiction #fanfic

30 notes

probablyasocialecologist · 2 years ago


Regardless of what companies and investors may say, artificial intelligence is not actually intelligent in the way most humans would understand it. To generate words and images, AI tools are trained on large databases of training data that is often scraped off the open web in unimaginably large quantities, no matter who owns it or what biases come along with it. When a user then prompts ChatGPT or DALL-E to spit out some text or visuals, the tools aren’t thinking about the best way to represent those prompts because they don’t have that ability. They’re comparing the terms they’re presented with the patterns they formed from all the data that was ingested to train their models, then trying to assemble elements from that data to reflect what the user is looking for. In short, you can think of it like a more advanced form of autocorrect on your phone’s keyboard, predicting what you might want to say next based on what you’ve already written and typed out in the past. If it’s not clear, that means these systems don’t create; they plagiarize. Unlike a human artist, they can’t develop a new artistic style or literary genre. They can only take what already exists and put elements of it together in a way that responds to the prompts they’re given. There’s good reason to be concerned about what that will mean for the art we consume, and the richness of the human experience.

[...]

AI tools will not eliminate human artists, regardless of what corporate executives might hope. But they will allow companies to churn out passable slop to serve up to audiences at a lower cost. In that way, they allow a further deskilling of art and devaluing of artists because instead of needing a human at the center of the creative process, companies can try to get computers to churn out something good enough, then bring in a human with no creative control and a lower fee to fix it up. As actor Keanu Reeves put it to Wired earlier this year, “there’s a corporatocracy behind [AI] that’s looking to control those things. … The people who are paying you for your art would rather not pay you. They’re actively seeking a way around you, because artists are tricky.” To some degree, this is already happening. Actors and writers in Hollywood are on strike together for the first time in decades. That’s happening not just because of AI, but because of how the movie studios and streaming companies took advantage of the shift to digital technologies to completely remake the business model so workers would be paid less and have less creative input. Companies have already been using AI tools to assess scripts, and that’s one example of how further consolidation paired with new technologies is leading companies to prioritize “content” over art. The actors and writers worry that if they don’t fight now, those trends will continue — and that won’t just be bad for them, but for the rest of us too.

#ai #artificial intelligence #chatgpt #ai art #silicon valley

286 notes

dearest-departed · 2 months ago


alright—

let today mark the moment i crawl out of my miserable shell to try something new.

i’ve been meaning to create something, a specific environment in my own cozy little corner of the web where we can discuss things like politics, religion, personal experience- the works. so the past few days i’ve been cooking this up. (be forewarned, this is a lot)

a trial run, perhaps- the ball is in your court, a segment of discussion topics from yours truly. i want to hear from YOU, my tumblr following, or strangers we pick up along the way.

there’s no pressure to interact with me. not in the slightest- but a gentle push here and there can’t hurt. right? (right?)

mm. now, you may be wondering;

“Andrew, what sparked this idea?”

well i’m glad you asked!

when this idea first came to me i was dissociating in the back of an oven on wheels. total out of body, watching-from-afar type of dissociation. my first question to you would’ve been- what are your experiences with that? but for my first time initiating these discussions.. i wanted to go with a more.. philosophical approach.

so my question to you, Tumblr, is—

Would you sacrifice your thoughts and beliefs for a presumably better life, and continue to sacrifice them to keep that life?

i’ll lead with an example;

society has undergone a transformation— you have your bottom feeders, the scum- discarded and thrown away. your “common folk”, middle class- relatively ignored and disregarded by most. and on top of the food chain, the upper echelon, the elites. (wonderfully done andrew, you’ve just explained TODAY’S society!)

you find yourself amongst the lowest of the low, just barely scraping by. so let’s say you’ve just been granted an opportunity to move up the ranks and have your “rags to riches” moment. but in order to be granted this title, to metamorphose (or so to speak)— you have to betray your core principles. this could be religiously, morally and ethically, or even socially. let’s say (for example’s sake), this means plowing down a thousand year old tree that’s connected to many spiritual traditions— or something a bit less hands on, like allowing your likeness to be the face of an ad campaign that promotes a more modern approach to lobotomizing the masses. (cute, right?)

now;

really think about it. one’s first response might be “no, no way i’d do that”. so put yourself in those shoes and really imagine it. you’re treated like shit day in and day out, kicked, spat at, berated and belittled. maybe you’re at your wits’ end. who knows? this is your imagined scenario to do as you please with. shoot me replies in my asks, and i’ll share my own thoughts with you.

disclaimer; i don’t expect this to go anywhere. ignore me, or watch from afar— but i wanted a way to really engage with tumblr in a way that wasn’t just “oh, send me questions! pick my brain for hours on end and make me really contemplate the things i keep shoved so so so so so so so very far down!” i’d also love to hear your questions, your prompts. so have fun or.. send me anthrax in the mail. whatever

based on how well this does i’ll have a lot more to ask you guys in the future. fill up my time with something to contemplate, and i’ll do the same.

until we meet again,

forever will i be that rock stuck in your sole. you can try to get me out, but no amount of poking and prodding and digging will remove the roots i’ve spread throughout your flesh and bone.

#the ball is in your court - thought experiments and discussions #andrew graves #tcoaal #fictive #the coffin of andy and leyley

9 notes

mariacallous · 4 months ago


Are there generative AI tools I can use that are perhaps slightly more ethical than others? —Better Choices

No, I don't think any one generative AI tool from the major players is more ethical than any other. Here’s why.

For me, the ethics of generative AI use can be broken down to issues with how the models are developed—specifically, how the data used to train them was accessed—as well as ongoing concerns about their environmental impact. In order to power a chatbot or image generator, an obscene amount of data is required, and the decisions developers have made in the past—and continue to make—to obtain this repository of data are questionable and shrouded in secrecy. Even what people in Silicon Valley call “open source” models hide the training datasets inside.

Despite complaints from authors, artists, filmmakers, YouTube creators, and even just social media users who don’t want their posts scraped and turned into chatbot sludge, AI companies have typically behaved as if consent from those creators isn’t necessary for their output to be used as training data. One familiar claim from AI proponents is that to obtain this vast amount of data with the consent of the humans who crafted it would be too unwieldy and would impede innovation. Even for companies that have struck licensing deals with major publishers, that “clean” data is an infinitesimal part of the colossal machine.

Although some devs are working on approaches to fairly compensate people when their work is used to train AI models, these projects remain fairly niche alternatives to the mainstream behemoths.

And then there are the ecological consequences. The current environmental impact of generative AI usage is similarly outsized across the major options. While generative AI still represents a small slice of humanity's aggregate stress on the environment, gen-AI software tools require vastly more energy to create and run than their non-generative counterparts. Using a chatbot for research assistance is contributing much more to the climate crisis than just searching the web in Google.

It’s possible the amount of energy required to run the tools could be lowered—new approaches like DeepSeek’s latest model sip precious energy resources rather than chug them—but the big AI companies appear more interested in accelerating development than pausing to consider approaches less harmful to the planet.

How do we make AI wiser and more ethical rather than smarter and more powerful? —Galaxy Brain

Thank you for your wise question, fellow human. This predicament may be more of a common topic of discussion among those building generative AI tools than you might expect. For example, Anthropic’s “constitutional” approach to its Claude chatbot attempts to instill a sense of core values into the machine.

The confusion at the heart of your question traces back to how we talk about the software. Recently, multiple companies have released models focused on “reasoning” and “chain-of-thought” approaches to perform research. Describing what the AI tools do with humanlike terms and phrases makes the line between human and machine unnecessarily hazy. I mean, if the model can truly reason and have chains of thoughts, why wouldn’t we be able to send the software down some path of self-enlightenment?

Because it doesn’t think. Words like reasoning, deep thought, understanding—those are all just ways to describe how the algorithm processes information. When I take pause at the ethics of how these models are trained and the environmental impact, my stance isn’t based on an amalgamation of predictive patterns or text, but rather the sum of my individual experiences and closely held beliefs.

The ethical aspects of AI outputs will always circle back to our human inputs. What are the intentions of the user’s prompts when interacting with a chatbot? What were the biases in the training data? How did the devs teach the bot to respond to controversial queries? Rather than focusing on making the AI itself wiser, the real task at hand is cultivating more ethical development practices and user interactions.


mushiemellows · 11 months ago


On Rivd and AI

So last night I made this post and said I'd elaborate more in the morning and when I had the time to do a bit of research. Upon doing said research, I realized that I had misunderstood the concerns being raised with the Rivd situation, but that isn't the case any more. However, some of my thoughts on ai still stand. Heads up, this is going to be a long post. Some actual proper blogging for once wow.

I'm going to discuss the Rivd phishing scam, what can be done for fic writers as ai begins to invade fan spaces, and my elaborated thoughts on Large Language Models (LLMs). Warning for transparency: I did utilize chat gpt for this post, NOT for the text itself but to provide examples of the current state of LLMs. Some articles I link to will also be ai generated, and their generated quality is part of what I'll be warning about. This is not a generated post and you can tell because I've got those nifty writing things called "voice" and "style."

ANYWAYS:

Okay so what was the Rivd situation? So two days ago this post was uploaded on tumblr, linking back to a twitter thread on the same topic. I saw it late last night because I was traveling. A reddit post was also uploaded 3 days ago. According to google trends, there was a slight uptick in search traffic the week of June 23rd, and a more severe uptick last week (June 30th-July 6th). That's all to say, this website did not exist until last week, caused a stir, and immediately was put down.

Rivd is no longer up. Enough people contacted its web hosting service, Cloudflare, that the site was taken down. This happened yesterday, from the looks of it.

So, then, what was Rivd? And more importantly, what was the point of scraping a chunk of ao3 and re-uploading it? There seems to be 2 possible theories.

1) The more innocent of the two: they genuinely want to be an ao3 competitor. I can't look at the website any more, and very little positive results appear when googled, but I did find one ai-generated puff piece called "Exploring Rivd: The Premier Platform for Movie-Based Fanfiction" posted to Medium last week by one "Steffen Holzmann" (if that is your real name... x to doubt). This account appeared the same week that Rivd had that first little uptick in google queries, so it is undoubtedly made by the people running the website themselves to create an air of legitimacy. Medium appears to be a safe enough website that you can click that link if you really want to, but you shouldn't. It's a bad generated article, there's very little to glean from it. But it is a remnant source on what Rivd was claiming to be, before it was taken down. Here's the conclusion from the article, the only portion that gave any actual information (and it barely offers any):

Rivd is the ultimate platform for movie-based fanfiction, offering a diverse range of content, a supportive community, and robust interactive features. Whether you’re a writer looking to share your work or a reader seeking new adventures in your favorite movie universes, Rivd provides the perfect platform to engage with a passionate and creative community. Start your journey on Rivd today and immerse yourself in the world of fanfiction.

There's a second article by Holzmann titled "Mastering the Art of Fanfiction Writing in 2024" that's essentially similar ai bull, but trades explaining that fans can write Star Wars fic for explaining that you can make OC's and maybe get a beta (not that that's advice I've ever heeded. Beta? Not in this house we don't.) This was posted six days ago and similarly spends half the time jerking Rivd off. That's all to say, if they are to be believed at face value, this website wanted to just be a fic hosting site. Scraping Ao3 would have made it seem like there was already an active user base for anyone they were attempting to attract, like buying your first 50,000 instagram followers. Anyone actually looking to use this as a fic site would have quickly realized that there's no one on it and no actual fan engagement. There's already fan community spaces online. This website offers nothing ao3 or ffn or wattpad or livejournal or tumblr or reddit didn't already.

Similarly, it reeks of tech bro. Between the scraping and the ai articles, the alarms are already going off. According to that Reddit thread, they were based out of Panama, though that doesn't mean much other than an indicator that these are the type of people to generate puff articles and preemptively base their business off-shore. Holzmann, it should be mentioned, also only has 3 followers, which means my tiny ass blog already has more reach than him. Don't go following that guy. The two comments on the first article are also disparaging of Rivd. This plan didn't work and was seen right through immediately.

If fan communities, and those who write fic know anything, it's how to sniff out when someone isn't being genuine. People write fic for the love of the game, at least generally. It's a lot of work to do for free, and it's from a place of love. Ao3 is run on volunteers and donations. If this genuinely is meant to be a business bro website to out-compete ao3, then they will be sorely disappointed to learn that there's no money in this game. It would be short lived anyway. A website like this was never going to work, or if it was, it would need to ban all copyrighted and explicit materials. You know, the pillars of fic.

So then what was the point of all of this? Unless there was a more nefarious plan going on.

2) Rivd was a phishing scam. This is so so so much more likely. The mark for the scam isn't fic readers, it's fic writers. Here's how it works: they scrape a mass of ao3 accounts for their stories, you catch it, you fill out a lengthy form with personal info like your full name and address etc. requesting they take your work down, they sell your data. Yes, a lot of personal info is required to take copyrighted materials down on other sites, too. That's what makes it a good scam. Fic already sits in a legal grey area (you have a copyright over your fic but none of the characters/settings/borrowed plot within it. You also CANNOT make money off of fic writing). So the site holds your works ransom, and you can't go to Marvel or Shueisha or fuck it the ghost of Anne Rice herself to deal with this on your behalf. Thankfully, enough people were able to submit valid DMCA's to Cloudflare to deal with the issue from the top.

Remember this resolution for the next time this situation arises (because of course there will be a next time). Go through higher means rather than the site itself. These scams are only getting more bold. Me personally? I'm not going to give that amount of personal info to a website that shady. Be aware of the warning signs for phishing attacks. Unfortunately, a lot of the resources online are still around text/email phishing. We live in a time where there's legal data harvesting and selling, and illegal data harvesting and selling, and the line in between the two is thin and blurry. Here's an FTC article on the signs of phishing scams, but again, it's more about emails.

I should note, I do not think that Rivd is connected to the ransomware virus of the same name that popped up two or three years ago [link is to Rivd page on PCrisk, a cybersecurity/anti-malware website]. It's probably just coincidence.... but even so. A new business priding itself on SEO and all that tech guy crap should know not to name itself the same thing as a literal virus meant to scam you out of a thousand dollars.

That's all to say, this was absolutely a scam meant to take personal info from ao3 writers. And that blows. To love art and writing and creation so much just to have your works held hostage over data feels really bad. Of course it does!

So, should you lock down your ao3 account?

That, to me, is a little trickier. You can do as you please, of course, and you should do what makes you feel safest. Me personally, though, I don't plan on it. I really, really like that guests can interact with my work from the outside. Ao3 still uses an invite system, so a lot of regular users still don't have accounts for any number of reasons. I read a lot of the time logged out anyways. I started writing again last year after all the info on the ao3 Large Language Model training had already come out. Like I talked about in my last post, I set out to write things that a computer couldn't produce. (from here on out, I'm going to be using One Piece fic as an example, simply because that's what I write. It's also in a really prime position for a lot of ai discussion due to the size of the fandom and the size of the original work)

I'm going to be honest, I'm not all that afraid of my work being scraped for LLMs to train off of. Text generators have existed for a decade now, but they really popped off last year. LLM's are at a point, though, where the tech is sort of stagnating after the initial boom. The flaws of current ai chat bots are so baked into the flaws of ai itself that they won't be ironed out soon. They make up info, they're easily manipulated, they simply are not people. The flaw here is deep- you can trace it to the fact that we don't know enough about neuroscience itself, and even then neuroscientists don't make ai. Ai programmers do. Complex info already has to make one synaptic jump without losing any understanding from person to person, and then it has to be perfectly abstracted for a machine to comprehend. It can't be done with the tools we currently have. I DO hate to admit it, but there are some decent uses for ai, if you are specifically a python programmer. The rest of us? Maybe not.

I don't like generated fic, I don't read it on principle. Checking on ao3, though, there's only 4 total generated op fics (3 in spanish, one in english, 3 xreader, 1 zosan but 2nd person zosan? and the author edited the framework the computer generated text. I still didn't enjoy the fic, tho.) But that breaks down to .005% of all op fics. I went into chat gpt and asked it to generate a drabble for me based on One Piece, and this is what it wrote:

[Image ID: A screenshot of chat GPT. The first line shows my prompt. It reads: You- write me a short 100 word fanfiction based on the manga One Piece. Next line, Chat GPT- "On the shores of an uncharted island, the Straw Hat Pirates stumbled upon a hidden treasure trove. Among the glittering gold and jewels, Luffy discovered an old, weathered journal. Inside, he found sketches and notes detailing the adventures of a legendary pirate who once roamed the Grand Line. Excited by the prospect of a new adventure, the crew set sail, following clues left in the journal. Along the way, they faced formidable foes and forged new alliances, unraveling the mysteries of the lost pirate's legacy. With each island they conquered, they grew closer to uncovering the ultimate treasure- a legacy that would shape their own journey." End ID]

Okay so like, this blows, yeah? 75,000 One Piece fics to train off of on ao3 and it still can't like, write a decent little story. This reads more like an episode summary or a book dust jacket blurb. Drabbles have an internal structure that make them interesting, much like a single sentence short story. You have to get very baby shoes never worn with them, yeah? Ai doesn't understand that, it just knows how many words it has to generate. It doesn't have any sense for creative language, metaphor, simile, poetry. I asked it to write me a poem and it wrote me the same poem with thesaurus replacements each time it re-generated one. A lot about dreams and stars and midnight and whispers, over and over again. The same rhyme scheme and cadence too.

It's also- vitally- so unhorny that it's almost a crime. For a test, I asked it to generate a story about a uhhhhh robot and an archeologist kissing. Just to see what it would give me. I ran 4 tests, a robot falling in love with an archeologist, a robot kissing an archeologist, an archeologist kissing a robot, and an archeologist falling in love with a robot. All 4 were incredibly chaste, quiet little drabbles about the innocence of discovery or whatever the fuck. Here's one of them:

[Image ID: A screenshot of chat GPT. The first line shows my prompt. It reads: You- write me a short 100 word story about an archeologist kissing a robot. Next line, Chat GPT- "In the dimly lit chamber, Professor Evelyn Porter stared in awe at the ancient artifact- a remarkably lifelike robot buried for millennia. As she leaned closer to examine its intricacies, a surge of curiosity and admiration swept over her. In a moment of spontaneous fascination, she brushed her lips against the cool metal of its cheek, a gesture of reverence and connection across ages. The robot, dormant for eons, stirred faintly in response to the unexpected contact. It was a fleeting, almost surreal exchange between past and future, bridging the gap between human exploration and technological marvels of ancient civilizations." End ID]

So like, some of these phrases get at what's vaguely intriguing about this dynamic, sure. But at the end of the day it's still uhhhh chaste as hell. A reverent kiss??? Not what I want in my fic!!!! This is all to say, LLM's can scrape ao3 all they want, that doesn't mean they can USE what they're stealing. The training wheels have to stay on for corporate palatability. I'm stealing, I'm taking these dolls out of Shueisha's grubby hands and I'm making them sexy kiss for FREE.

In my opinion, the easiest way to keep your work out of the hands of ai is to write something they cannot use. If the grey area of copyright lies in how much is transformed, then motherfucking TRANSFORM it. Write incomprehensible smut. Build surreal worlds. Write poems and metaphors and flesh out ideas that a computer could never dream of. Find niches. Get funky with it. Take it too far, and then take it even farther. Be a little freaking weirdo, you're already writing fic so then why be normal about it, you know? Even if they rob you, they can't use it. Like fiber in the diet, undigestible. Make art, make magic.

Even so, I don't mind if the computer keeps a little bit of my art. If you've ever read one of my fics, and then thought about it when you're doing something else or listening to a song or reading a book, that means something I made has stuck with you just a little bit. That's really cool to me, I don't know you but I live in your brain. I've made you laugh or cry or c** from my living room on the other side of the world without knowing it. It's part of why I love to write. In all honesty, I don't mind if a computer "reads" my work and a little bit of what I created sticks with it. Even if it's more in a technical way.

Art, community, fandom- they're all part of this big conversation about the world as we experience it. The way to stop websites like Rivd is how we stopped it this week. By talking to each other, by leaning on fan communities, by sending a mass of DMCA's to web host daddy. Participation in fandom spaces keeps the game going, reblogging stuff you like and sending asks and having fun and making art is what will save us. Not to sound like a sappy fuck, but really caring about people and the way we all experience the same art but interpret it differently, that's the heart of the whole thing. It's why we do this. It's meant to be fun. Love and empathy and understanding is the foundation. Build from there. Be confident in the things you make, it's the key to having your own style. You'll find your people. You aren't alone, but you have to also be willing to toss the ball back and forth with others. It takes all of us to play, even if we look a little foolish.

#meta #fandom #fanfic #ao3 #again i put this in my last post but this is JUST about LLMs #ai image generation is a whole other story #and also feel free to have opposing thoughts #i'm total open to learning more about this topic #LONG post


chaoskirin · 1 year ago


How to use Nightshade to Protect Your Art

Nightshade is a program that is relatively easy to use. You can search for it using the search term "glaze nightshade."

You WILL have to download popular image models so Nightshade recognizes what your art is and is able to poison it. This is done automatically the first time you run the program.

I have done extensive research into this, and have even talked to Shawn Shan of the University of Chicago, and have been assured that YOUR data is NOT being retained. This is a case of using AI to fight against itself. At this point, it is the best option to prevent your art and photography from being scraped.

Even though this program presents no danger to end users, you should be informed of this.

After everything is downloaded, you should select an image you wish to "Shade."

Once you select your image, Nightshade will pick a tag that it believes covers what's in the image. Sometimes this tag is wrong or not useful. (For example, I loaded a drawing of Brian May into the program, and it tagged it "woman." I changed it to "man.") The tag must be one word, and should be relatively general.

Images with less detail should have less poisoning applied. For my art, I use the default setting. While this does cause noticeable artifacts, it is not so much that it distorts the image. It just looks like a compressed jpg.

You will also need to choose the render quality. I usually choose the highest setting, which is the slowest, and takes about 20 minutes. It's worth it.

Then, choose a spot to save the image. It will save as the original file name with data appended. I generally delete this appended info from the new file before I post it.

When you do post it on social media, your alt text should contain the tag you used when you created it. For example, I posted a skull and put "A hyena skull in greyscale" in the alt text. This ensures that language models will pick up the art as "skull" and this will contribute to poisoning the skull dataset.

If you are posting the image on your own website, you should make sure to add the tag to the metadata of your image.
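For anyone posting on their own site, here is a minimal sketch of one way to keep the tag attached to the image: build the HTML image element with a small helper that refuses alt text that leaves the tag out. The function name `img_tag` and the file name `hyena-skull.png` are made up for illustration, and using the alt attribute as the place the tag lives is an assumption on my part, not something Nightshade's documentation prescribes.

```python
from html import escape


def img_tag(src, alt_text, tag):
    """Build an <img> element, insisting that the alt text mentions `tag`."""
    if tag.lower() not in alt_text.lower():
        raise ValueError(f"alt text should mention the tag {tag!r}")
    # escape() guards against quotes or angle brackets in the text
    return f'<img src="{escape(src)}" alt="{escape(alt_text)}">'


# Hypothetical file name; the alt text and tag are the ones from the example above.
print(img_tag("hyena-skull.png", "A hyena skull in greyscale", "skull"))
```

The check is deliberately simple (a case-insensitive substring match), so it only catches the case where the tag was forgotten entirely.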

Then you post it!

Again, if you have any art that you want to run through Nightshade, please contact me and I'll gladly do it for you. There will be a web version of nightshade eventually, which will make the process much easier. But for now, don't be afraid to rely on people whose GPUs can handle it!

#glaze #nightshade #glaze nightshade


jbfly46 · 5 months ago


Your All-in-One AI Web Agent: Save $200+ a Month, Unleash Limitless Possibilities!

Imagine having an AI agent that costs you nothing monthly, runs directly on your computer, and is unrestricted in its capabilities. OpenAI Operator charges up to $200/month for limited API calls and restricts access to many tasks like visiting thousands of websites. With DeepSeek-R1 and Browser-Use, you:

• Save money while keeping everything local and private.

• Automate visiting 100,000+ websites, gathering data, filling forms, and navigating like a human.

• Gain total freedom to explore, scrape, and interact with the web like never before.

You may have heard about Operator from OpenAI, which runs on their computers in some cloud, with you passing private information to their AI to do anything useful. AND you pay for the gift. It is not paranoid to not want your passwords, logins, and personal details to be shared. OpenAI of course charges a substantial amount of money for something that will limit exactly what sites you can visit, like YouTube for example. With this method you will start telling an AI exactly what you want it to do, in plain language, and watch it navigate the web, gather information, and make decisions, all without writing a single line of code.

In this guide, we’ll show you how to build an AI agent that performs tasks like scraping news, analyzing social media mentions, and making predictions using DeepSeek-R1 and Browser-Use, but instead of writing a Python script, you’ll interact with the AI directly using prompts.

These instructions are under constant revision, as DeepSeek R1 is days old. Browser Use has been a standard for quite a while. This method works for people who are new to AI and programming. It may seem technical at first, but by the end of this guide, you’ll feel confident using your AI agent to perform a variety of tasks, all by talking to it. Now, if you look at these instructions and they seem too overwhelming, wait: we will have a single-download app soon. It is in testing now.

This is version 3.0 of these instructions January 26th, 2025.

This guide will walk you through setting up DeepSeek-R1 8B (4-bit) and Browser-Use Web UI, ensuring even the most novice users succeed.

What You’ll Achieve

By following this guide, you’ll:

1. Set up DeepSeek-R1, a reasoning AI that works privately on your computer.

2. Configure Browser-Use Web UI, a tool to automate web scraping, form-filling, and real-time interaction.

3. Create an AI agent capable of finding stock news, gathering Reddit mentions, and predicting stock trends—all while operating without cloud restrictions.

A Deep Dive At ReadMultiplex.com Soon

We will have a deep dive into how you can use this platform for very advanced AI use cases that few have thought of, let alone seen before. Join us at ReadMultiplex.com and become a member who not only sees the future earlier but also gets practical and pragmatic ways to profit from it.

System Requirements

Hardware

• RAM: 8 GB minimum (16 GB recommended).

• Processor: Quad-core (Intel i5/AMD Ryzen 5 or higher).

• Storage: 5 GB free space.

• Graphics: GPU optional for faster processing.

Software

• Operating System: macOS, Windows 10+, or Linux.

• Python: Version 3.8 or higher.

• Git: Installed.
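If you already have some version of Python, a quick way to check most of this list at once is a small preflight script. This is only a sketch: the function name `check_requirements` is made up for illustration, it checks free space on whatever drive you run it from, and it skips the RAM check because the Python standard library has no portable way to read installed memory.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)   # "Python: Version 3.8 or higher"
MIN_FREE_GB = 5       # "Storage: 5 GB free space"


def check_requirements(path="."):
    """Return a dict mapping each requirement to True (met) or False."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return {
        "python": sys.version_info >= MIN_PYTHON,
        "git": shutil.which("git") is not None,   # is git on the PATH?
        "storage": free_gb >= MIN_FREE_GB,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'OK' if ok else 'MISSING'}")
```

Anything reported as MISSING is covered in Step 1.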

Step 1: Get Your Tools Ready

We’ll need Python, Git, and a terminal/command prompt to proceed. Follow these instructions carefully.

Install Python

1. Check Python Installation:

• Open your terminal/command prompt and type:

python3 --version

• If Python is installed, you’ll see a version like:

Python 3.9.7

2. If Python Is Not Installed:

• Download Python from python.org.

• During installation, ensure you check “Add Python to PATH” on Windows.

3. Verify Installation:

python3 --version

Install Git

1. Check Git Installation:

• Run:

git --version

• If installed, you’ll see:

git version 2.34.1

2. If Git Is Not Installed:

• Windows: Download Git from git-scm.com and follow the instructions.

• Mac/Linux: Install via terminal:

sudo apt install git -y # For Ubuntu/Debian

brew install git # For macOS

Step 2: Download and Build llama.cpp

We’ll use llama.cpp to run the DeepSeek-R1 model locally.

1. Open your terminal/command prompt.

2. Navigate to a clear location for your project files:

mkdir ~/AI_Project

cd ~/AI_Project

3. Clone the llama.cpp repository:

git clone https://github.com/ggerganov/llama.cpp.git

cd llama.cpp

4. Build the project:

• Mac/Linux:

make

• Windows:

• Install a C++ compiler (e.g., MSVC or MinGW).

• Run:

mkdir build

cd build

cmake ..

cmake --build . --config Release

Step 3: Download DeepSeek-R1 8B 4-bit Model

1. Visit the DeepSeek-R1 8B Model Page on Hugging Face.

2. Download the 4-bit quantized model file:

• Example: DeepSeek-R1-Distill-Qwen-8B-Q4_K_M.gguf.

3. Move the model to your llama.cpp folder:

mv ~/Downloads/DeepSeek-R1-Distill-Qwen-8B-Q4_K_M.gguf ~/AI_Project/llama.cpp

Step 4: Start DeepSeek-R1

1. Navigate to your llama.cpp folder:

cd ~/AI_Project/llama.cpp

2. Run the model with a sample prompt:

./main -m DeepSeek-R1-Distill-Qwen-8B-Q4_K_M.gguf -p "What is the capital of France?"

3. Expected Output:

The capital of France is Paris.
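If you would rather call the model from a script than type the command each time, the same invocation can be wrapped in Python. This is a rough sketch for Mac/Linux, not part of llama.cpp itself: `build_command` and `ask` are names made up for illustration, and the default binary name `./main` is simply the one used in this guide. Some newer llama.cpp builds name the binary `llama-cli` instead, so pass `binary=` if yours differs.

```python
import subprocess


def build_command(model_path, prompt, binary="./main"):
    """Build the argument list for a single-prompt run (-m model, -p prompt)."""
    return [binary, "-m", str(model_path), "-p", prompt]


def ask(model_path, prompt, binary="./main", cwd="."):
    """Run the model once from `cwd` and return whatever it printed to stdout."""
    result = subprocess.run(
        build_command(model_path, prompt, binary),
        cwd=cwd,              # the folder that contains the binary
        capture_output=True,
        text=True,
        check=True,           # raise if the program exits with an error
    )
    return result.stdout


# Example (needs the built binary and the downloaded model to be present):
# print(ask("DeepSeek-R1-Distill-Qwen-8B-Q4_K_M.gguf",
#           "What is the capital of France?"))
```

Passing the arguments as a list avoids shell quoting problems when the prompt contains quotes or spaces.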

Step 5: Set Up Browser-Use Web UI

1. Go back to your project folder:

cd ~/AI_Project

2. Clone the Browser-Use repository:

git clone https://github.com/browser-use/browser-use.git

cd browser-use

3. Create a virtual environment:

python3 -m venv env

4. Activate the virtual environment:

• Mac/Linux:

source env/bin/activate

• Windows:

env\Scripts\activate

5. Install dependencies:

pip install -r requirements.txt

6. Start the Web UI:

python examples/gradio_demo.py

7. Open the local URL in your browser:

http://127.0.0.1:7860
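If the page does not load, it helps to know whether anything is listening on that port at all. A minimal sketch using only the standard library (the function name `is_listening` is made up for illustration; the host and port are the ones shown above):

```python
import socket


def is_listening(host="127.0.0.1", port=7860, timeout=1.0):
    """Return True if something accepts TCP connections at host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable
        return False


if __name__ == "__main__":
    print("Web UI reachable" if is_listening() else "Nothing on port 7860 yet")
```

A False result usually means the Web UI has not finished starting or it picked a different port, which the terminal output should show.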

Step 6: Configure the Web UI for DeepSeek-R1

1. Go to the Settings panel in the Web UI.

2. Specify the DeepSeek model path:

~/AI_Project/llama.cpp/DeepSeek-R1-Distill-Qwen-8B-Q4_K_M.gguf

3. Adjust Timeout Settings:

• Increase the timeout to 120 seconds for larger models.

4. Enable Memory-Saving Mode if your system has less than 16 GB of RAM.

Step 7: Run an Example Task

Let’s create an agent that:

1. Searches for Tesla stock news.

2. Gathers Reddit mentions.

3. Predicts the stock trend.

Example Prompt:

Search for "Tesla stock news" on Google News and summarize the top 3 headlines. Then, check Reddit for the latest mentions of "Tesla stock" and predict whether the stock will rise based on the news and discussions.

Congratulations! You’ve built a powerful, private AI agent capable of automating the web and reasoning in real time. Unlike costly, restricted tools like OpenAI Operator, you’ve spent nothing beyond your time. Unleash your AI agent on tasks that were once impossible and imagine the possibilities for personal projects, research, and business. You’re not limited anymore. You own the web—your AI agent just unlocked it! 🚀

Stay tuned for a FREE, simple-to-use single app that will do all of this and more.

#DeepSeek #artificial intelligence #AI #locally run AI #free Ai


snickerdoodlles · 2 years ago


pulling out a section from this post (a very basic breakdown of generative AI) for easier reading;

AO3 and Generative AI

There are unfortunately some massive misunderstandings in regards to AO3 being included in LLM training datasets. This post was semi-prompted by the ‘Knot in my name’ AO3 tag (for those of you who haven’t heard of it, it’s supposed to be a fandom anti-AI event where AO3 writers help “further pollute” AI with Omegaverse), so let’s take a moment to address AO3 in conjunction with AI. We’ll start with the biggest misconception:

1. AO3 wasn’t used to train generative AI.

Or at least not any more than any other internet website. AO3 was not deliberately scraped to be used as LLM training data.

The AO3 moderators found traces of the Common Crawl web worm in their servers. The Common Crawl is an open data repository of raw web page data, metadata extracts and text extracts collected from 10+ years of web crawling. Its collective data is measured in petabytes. (As a note, it also only features samples of the available pages on a given domain in its datasets, because its data is freely released under fair use and this is part of how they navigate copyright.) LLM developers use it and similar web crawls like Google’s C4 to bulk up the overall amount of pre-training data.

AO3 is big to an individual user, but it’s actually a small website when it comes to the amount of data used to pre-train LLMs. It’s also just a bad candidate for training data. As a comparison example, Wikipedia is often used as high quality training data because it’s a knowledge corpus and its moderators put a lot of work into maintaining a consistent quality across its web pages. AO3 is just a repository for all fanfic -- it doesn’t have any of that quality maintenance nor any knowledge density. Just in terms of practicality, even if people could get around the copyright issues, the sheer amount of work that would go into curating and labeling AO3’s data (or even a part of it) to make it useful for the fine-tuning stages most likely outstrips any potential usage.

Speaking of copyright, AO3 is a terrible candidate for training data just based on that. Even if people (incorrectly) think fanfic doesn’t hold copyright, there are plenty of books and texts that are public domain that can be found in online libraries that make for much better training data (or rather, there is a higher consistency in quality for them that would make them more appealing than fic for people specifically targeting written story data). And for any scrapers who don’t care about legalities or copyright, they’re going to target published works instead. Meta is in fact currently getting sued for including published books from a shadow library in its training data (note, this case is not in regards to any copyrighted material that might’ve been caught in the Common Crawl data, it’s regarding a book repository of published books that was scraped specifically to bring in some higher quality data for the first training stage). In a similar case, there’s an anonymous group suing Microsoft, GitHub, and OpenAI for training their LLMs on open source code.

Getting back to my point, AO3 is just not desirable training data. It’s not big enough to be worth scraping for pre-training data, it’s not curated enough to be considered for high quality data, and its data comes with copyright issues to boot. If LLM creators are saying there was no active pursuit in using AO3 to train generative AI, then there was (99% likelihood) no active pursuit in using AO3 to train generative AI.

AO3 has some preventative measures against being included in future Common Crawl datasets, which may or may not work, but there’s no way to remove any previously scraped data from that data corpus. And as a note for anyone locking their AO3 fics: that might potentially help against future AO3 scrapes, but it is rather moot if you post the same fic in full to other platforms like ffn, twitter, tumblr, etc. that have zero preventative measures against data scraping.
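For a sense of what a preventative measure like this can look like, the usual mechanism is a robots.txt rule aimed at the crawler's user agent (Common Crawl's crawler identifies itself as CCBot). The sketch below uses a hypothetical robots.txt written for illustration, not one copied from AO3 or any real site, and checks it with Python's standard robots.txt parser. It also shows why this "may or may not work": the file is only a request, and it only affects crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, written for this example only.
ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.org/works/12345"   # placeholder URL
print(parser.can_fetch("CCBot", url))         # False: CCBot is asked to stay out
print(parser.can_fetch("SomeOtherBot", url))  # True: everyone else is allowed
```

Nothing here removes pages that were collected before the rule existed, which matches the point above about previously scraped data.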

2. A/B/O is not polluting generative AI

…I’m going to be real, I have no idea what people expected to prove by asking AI to write Omegaverse fic. At the very least, people know A/B/O fics are not exclusive to AO3, right? The genre isn’t even exclusive to fandom -- it started in fandom, sure, but it expanded to general erotica years ago. It’s all over social media. It has multiple Wikipedia pages.

More to the point though, omegaverse would only be “polluting” AI if LLMs were spewing omegaverse concepts unprompted or like…associated knots with dicks more than rope or something. But people asking AI to write omegaverse and AI then writing omegaverse for them is just AI giving people exactly what they asked for. And…I hate to point this out, but LLMs writing for a niche the LLM trainers didn’t deliberately train the LLMs on is generally considered to be a good thing to the people who develop LLMs. The capability to fill niches developers didn’t even know existed increases LLMs’ marketability. If I were a betting man, what fandom probably saw as a GOTCHA moment, AI people probably saw as a good sign of LLMs’ future potential.

3. Individuals cannot affect LLM training datasets.

So back to the fandom event, with the stated goal of sabotaging AI scrapers via omegaverse fic.

…It’s not going to do anything.

Let’s add some numbers to this to help put things into perspective:

LLaMA’s 65 billion parameter model was trained on 1.4 trillion tokens. Of that 1.4 trillion tokens, about 67% of the training data was from the Common Crawl (roughly ~3 terabytes of data).

3 terabytes is 3,000,000,000 kilobytes.

That’s 3 billion kilobytes.

According to a news article I saw, there has been ~450k words total published for this campaign (*this was while it was going on, that number has probably changed, but you’re about to see why that still doesn’t matter). So, roughly speaking, ~450k of text is ~1012 KB (I’m going off the document size of a plain text doc for a fic whose word count is ~440k).

So 1,012 out of 3,000,000,000.

Aka 0.000034%.
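
A quick way to check this math is a few lines of Python. This is only a sketch: it assumes decimal units (1 terabyte = 1,000,000,000 kilobytes) and takes the ~1,012 KB and ~3 terabyte figures above as given. The variable names are made up for illustration.

```python
# Rough check of how much of LLaMA's Common Crawl data the campaign represents.
# Assumptions: decimal units (1 TB = 1e9 KB), plus the ~3 TB and ~1,012 KB
# estimates quoted in the post.
KB_PER_TB = 1_000_000_000

common_crawl_kb = 3 * KB_PER_TB  # ~3 TB of Common Crawl data
campaign_kb = 1_012              # ~450k words of fic as plain text

share_percent = campaign_kb / common_crawl_kb * 100
print(f"{share_percent:.6f}%")  # prints 0.000034%
```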

And those 3 billion kilobytes are only 2/3s of the data for the first stage of training.

And not to beat a dead horse, but 0.000034% is still grossly overestimating the potential impact of posting A/B/O fic. Remember, only parts of AO3 would get scraped for Common Crawl datasets. Which are also huge! The October 2022 Common Crawl dataset is 380 tebibytes. The April 2021 dataset is 320 tebibytes. The 3 terabytes of Common Crawl data used to train LLaMA was randomly selected data that totaled less than 1% of one full dataset. Not to mention, LLaMA’s training dataset is currently on the (much) larger side as compared to most LLM training datasets.
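
The "less than 1%" claim can be sanity-checked in a few lines of Python. This is a sketch under stated assumptions: a tebibyte is 2^40 bytes, a terabyte is 10^12 bytes, and the dataset sizes are the ones quoted above. The names are hypothetical.

```python
# Compare ~3 TB of LLaMA Common Crawl data against full Common Crawl datasets.
# Assumptions: 1 TiB = 2**40 bytes, 1 TB = 10**12 bytes, sizes as quoted in the post.
TIB = 2**40
TB = 10**12

llama_common_crawl_bytes = 3 * TB
crawl_sizes_tib = {"October 2022": 380, "April 2021": 320}

# Percentage of each full dataset that 3 TB would represent.
fractions = {
    name: llama_common_crawl_bytes / (size_tib * TIB) * 100
    for name, size_tib in crawl_sizes_tib.items()
}

for name, percent in fractions.items():
    print(f"{name}: {percent:.2f}% of the full dataset")
```

Both values come out under 1%, which matches the figure above.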

I also feel the need to point out again that AO3 is trying to prevent any Common Crawl scraping in the future, which would include protection for these new stories (several of which are also locked!).

Omegaverse just isn’t going to do anything to AI. Individual fics are going to do even less. Even if all of AO3 suddenly became omegaverse, it’s just not prominent enough to influence anything in regards to LLMs. You cannot affect training datasets in any meaningful way doing this. And while this might seem really disappointing, this is actually a good thing.

Remember that anything an individual can do to LLMs, the person you hate most can do the same. If it were possible for fandom to corrupt AI with omegaverse, fascists, bigots, and just straight up internet trolls could pollute it with hate speech and worse. AI already carries a lot of biases even while developers are actively trying to flatten that out, so it’s good that organized groups can’t corrupt that deliberately.

#generative ai #pulling this out wasnt really prompted by anything specific #so much as heard some repeated misconceptions and just #sighs #nope #incorrect #u got it wrong #sorry #unfortunately for me: no consistent tag to block #sigh #ao3

101 notes · View notes

auncyen · 10 months ago

Text

currently reading "Automate the Boring Stuff with Python" to refresh myself on some things, learn new things like, for example, web scraping (had tried to get into it previously and didn't quite get how to make it work)

anyway I seem to be understanding a little better, but also it's near bedtime, I'm ready to start winding down, but oh here's a url the book is recommending I check out to understand one bit better, I'll just go to that now so...I...can...save... it...

...

Maybe it is time for bed.

#LOL #I actually might stay up a little longer but #I'm certainly done with python atm #methinks someone realized that link was in the book

9 notes · View notes

papercranesong · 11 days ago

Text

Mythbusting Generative AI: The Ethical ChatGPT Is Out There

I've been hyperfixating on learning a lot about Generative AI recently and here's what I've found - genAI doesn’t just apply to ChatGPT or other large language models.

Small Language Models (specialised and more efficient versions of the large models)

are also generative

can perform in a similar way to large models for many writing and reasoning tasks

are community-trained on ethical data

and can run on your laptop.

"But isn't analytical AI good and generative AI bad?"

Fact: Generative AI creates stuff and is also used for analysis

In the past, before recent generative AI developments, most analytical AI relied on traditional machine learning models. But now the two are becoming more intertwined. Gen AI is being used to perform analytical tasks – they are no longer two distinct, separate categories. The models are being used synergistically.

For example, Oxford University in the UK is partnering with OpenAI to use generative AI (ChatGPT-Edu) to support analytical work in areas like health research and climate change.

"But Generative AI stole fanfic. That makes any use of it inherently wrong."

Fact: there are Generative AI models developed on ethical data sets

Yes, many large language models scraped sites like AO3 without consent, incorporating these into their datasets to train on. That’s not okay.

But there are Small Language Models (compact, less powerful versions of LLMs) being developed which are built on transparent, opt-in, community-curated data sets – and that can still perform generative AI functions in the same way that the LLMs do (just not as powerfully). You can even build one yourself.

No it's actually really cool! Some real-life examples:

Dolly (Databricks): Trained on open, crowd-sourced instructions

RedPajama (Together.ai): Focused on creative-commons licensed and public domain data

There's a ton more examples here.

(A word of warning: there are some SLMs like Microsoft’s Phi-3 that have likely been trained on some of the datasets hosted on the platform Hugging Face (which include scraped web content, such as from AO3), and these big companies are being deliberately sketchy about where their datasets came from - so the key is to check the data set. All SLMs should be transparent about what datasets they’re using).

"But AI harms the environment, so any use is unethical."

Fact: There are small language models that don't use massive centralised data centres.

SLMs run on less energy, don’t require cloud servers or data centres, and can be used on laptops, phones, and Raspberry Pis (basically running AI locally on your own device instead of relying on remote data centres).

If you're interested -

You can build your own SLM and even train it on your own data.

Let's recap

Generative AI doesn't just include the big tools like ChatGPT - it includes the Small Language Models that you can run ethically and locally

Some LLMs are trained on fanfic scraped from AO3 without consent. That's not okay

But ethical SLMs exist, which are developed on open, community-curated data that aims to avoid bias and misinformation - and you can even train your own models

These models can run on laptops and phones, using less energy

AI is a tool, it's up to humans to wield it responsibly

It means everything – and nothing

Everything – in the sense that it might remove some of the barriers and concerns people have which makes them reluctant to use AI. This may lead to more people using it - which will raise more questions on how to use it well.

It also means that nothing's changed – because even these ethical Small Language Models should be used in the same way as the other AI tools - ethically, transparently and responsibly.

So now what? Now, more than ever, we need to be having an open, respectful and curious discussion on how to use AI well in writing.

In the area of creative writing, it has the potential to be an awesome and insightful tool - a psychological mirror to analyse yourself through your stories, a narrative experimentation device (e.g. in the form of RPGs), a way to identify themes or emotional patterns in your fics, and a brainstorming aid when you get stuck -

but it also has capacity for great darkness too. It can steal your voice (and the voice of others), damage fandom community spirit, foster tech dependency and shortcut the whole creative process.

Just to add my two pence at the end - I don't think it has to be so all-or-nothing. AI shouldn't replace elements we love about fandom community; rather it can help fill the gaps and pick up the slack when people aren't available, or to help writers who, for whatever reason, struggle or don't have access to fan communities.

People who use AI as a tool are also part of fandom community. Let's keep talking about how to use AI well.

Feel free to push back on this, DM me or leave me an ask (the anon function is on for people who need it to be). You can also read more on my FAQ for an AI-using fanfic writer Master Post in which I reflect on AI transparency, ethics and something I call 'McWriting'.

#fandom #fanfiction #ethical ai #ai discourse #writing #writers #writing process #writing with ai #generative ai #my ai posts

4 notes · View notes

immeasurablesaladagere · 10 months ago

Note

Ok so hear me out for this one ya—

Park often likes to watch analog horror (if we’re going to stick to canon she probably has watched monument mythos if you need an example of an analog horror) and can’t shake the feeling that someone is there now watching her and she regresses out of fear and ends up going to house/wilson to get comforted and maybe a good night’s rest

—🍭🍬🍪

One Little!Park for you, my good friend. Also a little hc to go with it so the way she talks about House and Wilson makes sense, to me she seems like the type to use formal titles with people when she's regressed like Mr. and Mrs, but in this case I think she'd call them both Dr. House and Dr. Wilson. They've tried to tell her she doesn't need to but she does it anyway.

-----

Word Count: 1560

Summary: Park watches horror media in her free time and has recently gotten into the Mandela Catalogue. She also chose to watch it to de-stress the day before performing a sleep study by herself. In the dark. She gets scared of monsters coming after her and is looked after by House and Wilson.

-----

It was quiet in the sleep lab. Too quiet. The kind of eery, unnatural silence that happened during power-outages. Every little sound Park made at her desk seemed so much louder than she logically knew it was, and the faint beeping of machines through the glass was frustratingly insufficient as background noise. She just couldn’t shake the feeling of someone staring at the back of her head as she studied the monitors, and her entire body was on edge.

Perhaps it wasn’t the best idea to watch horror media before performing a sleep study.

She thought it would be fine. She’d grown up watching horror media of all kinds, and the scarier the better, but none of the dozens of slasher films she had watched got under her skin like the Mandela Catalogue had managed to. The imagery of distorted humanoid faces lurking in the dark corners of your room, waiting for you to step too close or look the wrong way clung irrationally to her thoughts and had her looking over her shoulder every few minutes. She didn’t know why that was her instinctual reaction, it certainly shouldn’t have been. Assuming the creatures were real, looking at them, acknowledging their presence, would only seal her fate. She would be dead soon after, replaced by the monster wearing her skin. But they weren’t real, she reminded herself. It was a fictional web series using computer-made elements to elicit a fear response.

Still, even rationalizing it in her own head didn’t shake the nervousness. She was tense and shaky in her chair, like her muscles were ready to jump up and run at any time, and her ears were listening intently for any slight sound behind her, like knowing the creature was there would somehow protect her from it. It’s not real, Chi. It’s not real.

And perhaps even worse than the irrational fear was the creeping fuzzy feeling in the back of her mind. It hadn’t come out of nowhere; their current case had made the last few days both physically and mentally taxing and the sleep study was one of the last of several blind avenues they were stumbling down for any hope of an answer. She had been watching the Mandela Catalogue to paradoxically relax, and now it was coming back to bite her.

She almost wished the patient would do something; anything to keep her brain busy, and more importantly, big, but it was looking more and more like it would be a dead end. He was soundly asleep beyond the glass.

A faint clatter from down the hall made her jump. Several somethings, probably sharp medical instruments, had been knocked over in another room. She was supposed to be alone down here. There was no one else in the sleep lab on the schedule, she’d checked. Who— or what, her mind whispered, was down there with her? She sat perfectly still as the rattling continued. Metal scraped against the floor like it was being dragged, and then came the footsteps, shambling and irregular. Inhuman.

Any grip Park had on her headspace was suddenly gone. She was just a scared little kid in a big dark lab by herself with a monster coming down the hall towards her, and her mind raced for a solution. She quickly looked around the little office she was confined to for a place to hide, and whined as quietly as she could manage when she realized there weren’t any places that weren’t in view of the door where the monster would see her. The footsteps were closer now and headed for her, she just knew it. The office lights must be attracting it! She had to do something, she didn’t wanna die!

Then she got an idea. She shot up and went to the filing cabinet next to the desk and carefully shuffled it away from the wall, trying not to make any loud sounds. “Come on, come on…”

Finally the gap was just big enough for her to squeeze between the cabinet and the wall and curl up with the desk at her back. It was at that moment that the door to the office swung open and the monster stepped inside, then stopped. She didn’t dare breathe. Hot tears swam in her eyes and she scrunched up as tightly as she could into a ball. Please leave, please leave.

To her terror, the monster came further into the room, its mismatched footsteps shuffling closer and closer to where she was hiding until she could see the humanoid shadow looming over her, ready to grab her.

“Park. What the hell are you doing?” said the gruff, irritated, and very human voice. She knew that voice. Had it gotten Dr. House? He didn’t seem like the kind of person to be caught by a monster… was it really him?

“Doctor Park,” Dr. House snapped. If he wasn’t a monster, then Dr. House would keep her safe. She decided to risk it. Hesitantly, she slowly scooted out from behind the cabinet and turned to look at him. He didn’t look like a monster, but he also didn’t look very happy with her.

“Are you going to tell me what’s wrong with you, or are you just going to keep ogling me?”

“I was hiding from monsters,” she mumbled, and admitting it, she did sound a little silly.

Dr. House made a face like she was a weird bug he was studying. “Okay, so… Is this a regression thing, or have you spontaneously developed a brain tumour?”

“Little.” The nervousness in her chest was slowly going away now that her caregiver was here. She knew that technically, the monsters could still get her even if he was there, but it still made her feel better. Plus, Dr. House was pretty scary when he was mad, maybe he could scare the monsters away?

Dr. House let out a big, long sigh, but he did look a little less mad. “Excellent. What monsters are you talking about?”

Park fiddled with her lab coat as the scary memories of the videos she watched came back. “I watched scary videos, and I thought…”

“Let me guess, you heard me knock over the lab cart and thought I was a monster because you watched these scary videos right before working a sleep study shift.”

She nodded, definitely a little embarrassed now.

“How old?”

She had to think about it. It wasn’t super young, but not super old either. “Seven..?” That sounded right.

“So too young to perform a sleep study, then.” Dr. House looked at the man in the bed and then back to her. “…Okay. That’s fine, I was bored anyway. Why don’t you find an empty bed to sleep in and get some rest? You’ve been babysitting this guy long enough.”

She was really tired, but she didn’t want to do that. The idea of sleeping all alone made the nervous feeling come back. They could get her if she was by herself! And she didn’t even have her safe blankie with her! Maybe she made a face, because Dr. House seemed to understand.

“You don’t want to be by yourself, do you?”

She shook her head. “No.”

"There aren't any monsters, you know." He assured, almost gently, "I would know, I checked."

"I know, but... It's still scary..."

Dr. House tapped his cane in thought, then pulled out his phone. “…Wilson? You still in your office? …Good, because one of our children needs you. Yes. No, I know it’s my night, but Daddy’s got a wicked sleep study to take over. It’ll be like take-your-daughter-to-work day, but you just get to let her sleep. Uh-huh. Yes darling, see you soon.” He hung up. “Alright, Papa Wilson is going to take you up to his office and you can sleep there with him, how does that sound?”

“That sounds good.” Park smiled. Dr. House always called Dr. Wilson ‘Papa Wilson’, which she thought was silly because Dr. Wilson wasn’t her actual dad, but she always went along with it. She liked it when Dr. Wilson looked after her because he was always really nice. He was also funny a lot, even though he didn’t mean to be.

Dr. Wilson arrived a few minutes later. He was also looking pretty sleepy from their case, but not as grumpy as Dr. House.

“Hey kid, to what do I owe the pleasure?”

“I got scared because I watched spooky videos,” she explained, rubbing at her eyes. They were heavy.

“She decided that the day before a sleep study was a good time for that. ‘Thought I was a monster, and frankly, I’m hurt,” Dr. House drawled, occupied with watching the monitors and reading over the few notes she managed to take.

“Well, you are pretty creepy,” Dr. Wilson said without missing a beat, then turned to Park. “C’mon, we can stop by the cafeteria on the way back if you want a bedtime snack.”

She gasped softly. “Animal crackers?” She really liked those, but the cafeteria didn’t have them all the time so she hoped they did today.

Dr. Wilson smiled and held out his hand for her to take. “I’ll see what I can do. Let’s go, it’s late.”

Park took his hand and gave Dr. House a little wave as they left. “Goodnight, Dr. House.”

“‘Night, kid.”

#sfw age regression #sfw agere #agere blog #age regression #fandom agere #house md #house md agere #fanfic #chi park #james wilson #gregory house

12 notes · View notes