turbidcurrent-images
Turbid Current
3 posts
Some dude's anonymous sideblog where I talk about procedural image generation in an ethical manner.
turbidcurrent-images · 2 days ago
I feel like some of you guys think "bad art" is like someone gluing rhinestones to a watermelon, or a guy who made his own armchair out of Ohio license plates, or a trashy romance novel where someone says "the blue-eyed one kissed the brown-eyed one," when in reality bad art is a 1,000,000 Billion Dollar movie where none of the workers got paid and every single creative decision was market-tested to see how much profit it could foreseeably make to wow shareholders.
turbidcurrent-images · 3 days ago
Drawbacks
There are a lot of problems you can run into when generating images, problems that really highlight the model's failures.
Plagiarism
The big one, of course: image generation models are trained on extant images and will parrot back the styles they see. The only defense I have is that with a large enough sample size they develop a vaguely generic style of their own, and I never use artist names in prompts or style LoRAs¹. That's as ethical as it's possible to be here: never intentionally stealing art.
Bias
Of course, there's a limited amount of training data available, even with the boundless expanse of the Internet. A model can only draw inferences from the data it has, and it will make those inferences based on the relations it sees in that data. See the multiple stories where models began racially profiling people, or the cancer-detection model that learned that medical photos of melanomas are taken with a ruler to indicate the size of the tumor, so any picture with a ruler in shot must contain cancer.
Most models, when asked for a person, will give you a Caucasian. When asked for a furry, you'll usually get a dog. This can be worked around with explicit prompting: ask for "an anthro fox" instead of "a furry," or describe the specific person you actually want instead of "a person."
Flesh Friends
Image generation models don't know what things look like. They know that things displaying these characteristics are often labeled in this manner, but they don't know that a dog has one head, two ears, four legs, and a tail arranged just so, so the model has no qualms about making a horrible fleshcrafted monstrosity when you ask for a dog.
This is where negative prompting comes in, where we chant the mystical abjurations "deformed, mutated, ugly, disfigured, long body, bad anatomy, bad hands, missing fingers, extra digits, fewer digits, very displeasing, worst quality, bad quality, conjoined, bad ai-generated" and hope the model takes the hint. Usually this works to some degree. Better with more current models than earlier ones.
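If you're generating locally with something like Hugging Face's diffusers library, the chant gets passed in alongside the prompt. A minimal sketch, with the checkpoint and prompts as stand-ins for whatever you actually run:

```python
# Minimal negative-prompting sketch using Hugging Face diffusers.
# Checkpoint and prompts are illustrative stand-ins, not recommendations.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # any local checkpoint works here
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    prompt="a dog sitting in a sunlit garden, photorealistic",
    # The mystical abjurations: steer the sampler away from flesh friends.
    negative_prompt=(
        "deformed, mutated, ugly, disfigured, long body, bad anatomy, "
        "bad hands, missing fingers, extra digits, fewer digits"
    ),
    num_inference_steps=30,
    guidance_scale=7.5,
).images[0]

image.save("dog.png")
```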
A model will also create additional figures in the scene if it feels like it needs to. For instance, if you have a furry with hooves in your prompt and decide to prompt for sneakers as well, the model will generate another character to wear those sneakers, because it does happen to know that sneakers don't go on hooves.
The Two-Body Problem
Even with a perfectly fine set of anatomy prompts, the model can't really recognize specific people. You can train a LoRA to reliably reproduce a character, and short of that, using the same set of prompts will get you fairly consistent results.
But all this goes out the window when dealing with two distinct people. The model will blithely swap characteristics like hair color, gender, clothing, specified body parts, and species around.
I've tried several methods of blocking out individual identities, and the model still blends them. This is a known drawback, so you don't really use multiple characters in a scene unless you're okay with the results this problem gets you.
Order vs. Chaos
Computers are famously good at repetition and patterns; it's what we made them for. So it is fascinating to me when a model cannot keep the left and right sides of a desk consistent, or renders a colonnade with columns that look like M.C. Escher tried meeting a deadline while drunk.
Contrariwise, it can do foliage, bushes, and similar chaotic systems well because the eye skips over any potential imperfections. Here:
[Image: locally generated with the prompt "a battered military surplus jeep with a broken headlight navigating a muddy trail, photorealistic, driver in shadow, jungle, raining, wet, bokeh, moonless night, establishing shot, action shot, dynamic angle"]
Observe, if you will, the weird shape of the tow hooks on top of the bumper. The windshield wipers at different angles and weirdly drippy. The odd greeble on the driver's side. Now check out the perfectly acceptable bushes and palm trees.
How could this happen to a perfectly good computer? Look at it. It's hallucinating.
¹A LoRA, or Low-Rank Adaptation, is a small, specially trained model that supplements the larger generator model. It can alter themes, reproduce characters, or provide particular elements for a scene.
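For the code-curious, loading one with diffusers looks roughly like this. The checkpoint and LoRA filename are hypothetical stand-ins for whatever you've trained or downloaded:

```python
# Minimal sketch of supplementing a base model with a LoRA via diffusers.
# "my_character_lora.safetensors" is a hypothetical local file.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Load the small adapter's weights on top of the base model.
pipe.load_lora_weights("my_character_lora.safetensors")

image = pipe(
    "my character drinking coffee at a diner",
    cross_attention_kwargs={"scale": 0.8},  # how strongly the LoRA applies
).images[0]
```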
turbidcurrent-images · 5 days ago
Welcome
This blog is where I, as an outsider to the scene, talk about my experiences procedurally generating images using diffusion models.
You may notice some carefully chosen wording in that last sentence. "AI art" is both inaccurate and misleading. These models are not artificial intelligence any more than a parakeet taught to talk is. And these images are not art any more than the people who write prompts are artists. We are management praying to a poorly understood deity, hoping our inclusion of the appropriate supplications and abjurations blesses us with our desires.
The models themselves start from noise and pareidolia their way down, seeking meaning in the clouds, telling us "Well, this part kinda looks like a boob if you squint" and then doing the squinting for us.
And usually, these images are "good enough". Good enough to make tokens for tabletop characters, good enough to envision your fursona in a safe space, good enough to have a cheeky wank to should you so desire.
If you download the models yourself, running them isn't even significantly more taxing on your system than playing the latest videogame.
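If you want to see what that looks like in practice, here's a minimal sketch (assuming the Hugging Face diffusers library, with a stand-in checkpoint) of the knobs that keep generation inside a mid-range gaming GPU's memory budget:

```python
# Minimal sketch: memory-saving options for local generation with diffusers.
# The checkpoint is a stand-in; use whatever model you've downloaded.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()  # keep only the active component on the GPU
pipe.enable_attention_slicing()  # compute attention in slices to save VRAM

image = pipe("a lighthouse at dusk, photorealistic").images[0]
```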
The problem, as so often stated in multiple memes and fora, is capitalism. Capitalism harnesses server banks and drains rivers for these images, capitalism tries to sell you these images, capitalism vomits them forth until all content is subsumed in slop.
Used sparingly, used ethically, an image generation model can be a useful tool. Generate something that looks kind of like what you want, then hire an actual artist to make the changes and add the details you desire. You can generate all the gardevoirs buying white bread you like, but they'll never be able to tell you how much they love paving over nature preserves; you gotta pay a person to get them to do that.
This will not be an image-heavy blog. I have some thoughts I want to get out, and I'll probably get some hate mail that will be fun to respond to. As a moderate, I'm going to get hate from the people who want AI art to go away entirely and from the AI bros who think I'm insulting them by not calling them real artists.
To be fair, I am insulting the AI bros. I've seen what you call art and I've seen better in elementary school hallways.
As for the abolitionists: I agree with you. I hated 'AI art' with a passion. The current implementation is bad for the environment, the economy, and creative expression. But the genie is out of the bottle; it's not going away. That doesn't mean we just give up and go hog wild churning out slop.
But it has a use. I hope to show that use and provide some tips for those wishing to use procedural image generation for their own personal concerns.
This isn't going to be a huge, long blog, or even a regularly updated one; when I have something to say I'll say it, and when I run out I'll stop.
Thank you for taking the time to get in my weird little head with me. Let's see where it goes.