#lossy video
skitterhop · 3 hours
Text
Tumblr media
29 notes · View notes
anima-virtuosa · 30 days
Text
(body horror twinning TF) When u watch a haunted VHS tape and start feeling a little weird...
Tumblr media Tumblr media Tumblr media Tumblr media Tumblr media
^ FANART OF LOSSY VIDEO @fenexox
BONUS lossy video videos:
Tumblr media
🎈
Tumblr media
31 notes · View notes
dennisboobs · 1 year
Text
:) i own seasons 1-5 & 7 on DVD and 6 on blu-ray now
10 notes · View notes
guitarbomb · 5 months
Text
Chase Bliss Lossy Pedal in Collaboration with Goodhertz
In the world of music technology, Chase Bliss has taken another significant step forward with the launch of its latest pedal, the Chase Bliss Lossy. This cutting-edge pedal is a result of a creative partnership with software plugin developer Goodhertz, known for its expertise in audio effects. The Chase Bliss Lossy pedal brings the essence of lo-fi bit crushing, a popular effect…
Tumblr media
0 notes
myhelendai · 2 years
Link
0 notes
gale-draws · 6 months
Text
Tumblr media Tumblr media
wanted to experiment with some metallic gel pens i got. i decided that @skitterhop 's critter Lossy Video might be the perfect application, and so i promptly scratchified them. INCLUDED CLOSEUP TO SHOW ENHANCED TEXTURE / LIGHT SCATTERING
25 notes · View notes
glitterdoe · 1 month
Text
a lot of my lossy video art is very personal & has a lot of meaning but i'm always hesitant to explain it bc i really value art that is open to interpretation and i like hearing what it means to other ppl
10 notes · View notes
tabithamoon · 10 months
Text
pro tip: don't fall for audiophile nonsense
"Hi-Res" audio is a meme. I know this sounds like the "cinematic FPS" argument but your ears are literally incapable of hearing greater than 16-bit 48kHz audio.
breaking it down, first the 48kHz part: digital audio is stored in samples, which is a measure of the amplitude at a single point in time. the sample rate is how frequently the signal is sampled, forming our digital sound. and thanks to the Nyquist–Shannon sampling theorem, which dictates that any band-limited signal can be represented perfectly with digital samples taken at twice it's frequency, for the human limit of 20kHz, 48kHz is actually more than enough as it can encode a signal up to 24kHz
as for the 16-bit part, that dictates the dynamic range of the audio, in other words, the difference between the loudest signal a format can reproduce compared to the quietest one when it starts blending in with noise
16-bit audio, with proper dither and correct encoding, has a practical 120dB of dynamic range, which is "greater than the difference between a mosquito somewhere in the same room and a jackhammer a foot away". you don't need 24-bit audio, although it doesn't hurt, unlike >48kHz audio (see this excellent post about how ultrasonics can lead to distortion on most gear, actively hurting sound quality, by Chris Montgomery, founder of xiph.org, the foundation behind the FLAC, Vorbis and Opus audio codecs, so you can know for damn sure he knows more about digital audio than most of us combined)
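if you want to sanity-check those numbers yourself, here's a rough Python sketch (just the textbook formulas: the Nyquist limit and the roughly 6 dB-per-bit rule; the ~120dB practical figure above additionally assumes noise-shaped dither):

import math

# Nyquist-Shannon: a sample rate of fs can perfectly represent any
# band-limited signal with content up to fs / 2.
sample_rate = 48_000                   # Hz
print(sample_rate / 2)                 # 24000.0 -> 24kHz, above the ~20kHz human limit

# dynamic range of linear PCM is roughly 20*log10(2**bits),
# i.e. about 6.02 dB per bit, before any dithering tricks.
bits = 16
print(20 * math.log10(2 ** bits))      # ~96.3 dB
# noise-shaped dither pushes the audible noise floor lower still,
# which is where the ~120 dB practical figure comes from.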
don't spend extra for hi-res files and such. They're really only useful if you're a producer, as the extra data helps you keep good quality while mixing your track.
And for the love of god don't fall into the MQA snake oil rabbit hole, it's lossy and the "origami" bullshit they're peddling adds distortion and artifacts to the file, and they went bankrupt recently so that just shows how good their product is
Don't confuse this with an excuse to listen to the shittiest 128kbps mp3s straight out of Napster, lossy vs lossless codecs is a different subject altogether (hint: just get the FLACs, but beware your bluetooth buds can't do lossless)
TL;DR just get good gear, a humble 16/44 FLAC file, and enjoy mathematically perfect music
(and later watch these amazing videos by Chris from earlier going over this much better than i ever could)
and please don't take anything i've said at face value, i'm no expert, do your own research!! that is the magic the internet enables you to do
wow an actual informative post on my trash blog go me
32 notes · View notes
payidaresque · 2 years
Text
okay, now it's time for me to add something to the "party"
for those of you who say "it's not that big of a deal, i don't see any difference": I have some examples for you, but first, let's figure out what mp4 and gif actually are as formats
MP4 is a file container. The real issue here is the compression format it's using; most of the time, it's AVC. Now, after doing some research (click the alt button for the description)
Tumblr media
AVC is a popular lossy compression method (i took this one as an example). As the name suggests, lossy = losing data that was originally put in during compression. For that exact reason, I DO NOT convert any of my material when making gifs, so as not to lose any data (i work with 1080p files, usually 10+ GB in size, .mkv). I use Avidemux to change the container (.mkv, .ts, etc.), and that is all. I leave everything else untouched. ANY KIND OF COMPRESSION IS BAD FOR GIFS OR FOR HIGH QUALITY FILES IN GENERAL, i learned that during my film shooting courses back in school. In fact, anything you see on streaming services, or on the web in general, is ALREADY COMPRESSED for the audience's convenience AND for technical reasons; the point is that some compression algorithms do a better job at it and some don't, depending on what you're trying to achieve. Now imagine compressing something that was already compressed. Doesn't sound like a good idea, does it?
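(if you want to see that generation loss for yourself, here's a rough Python/Pillow sketch; the file name and quality value are just placeholders for illustration, not anyone's actual workflow)

from io import BytesIO
from PIL import Image

def recompress(img, quality=75):
    """JPEG-encode an image in memory and decode it again (one 'generation')."""
    buf = BytesIO()
    img.convert("RGB").save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf).copy()

frame = Image.open("frame.png")   # hypothetical still from a video

# each pass through a lossy encoder throws away a little more detail;
# artifacts that were invisible after one pass pile up after several.
for _ in range(10):
    frame = recompress(frame, quality=75)

frame.save("frame_after_10_generations.jpg")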
Now, to GIFs
Tumblr media
GIF IS GRAPHICS INTERCHANGE FORMAT. AN IMAGE FORMAT, NOT A VIDEO FORMAT. It was made SPECIFICALLY for animations! And since it's AN IMAGE FORMAT, it uses a different compression method bc it was made for different purposes. GIF IS NOT A "LOOPING VIDEO", GIF IS A GIF. IT'S A SEPARATE FORMAT.
Now, to the examples i had for you. I did not apply any coloring, so as not to mess with the colors
Tumblr media
(THEY ARE THE SAME SIZE, 268x268)
Tumblr media
And this is how it looks in my photoshop, 100%
Tumblr media
Zoomed in 200%
Tumblr media
Still don't see any difference? WE ARE NOT MAKING THIS UP. If staff actually rolls this out globally, there will be no point in sharpening, coloring, or basically any of the effort we put into our work, because .mp4 will mess it all up. ANY LOSSY COMPRESSION METHOD WILL MESS UP OUR WORK, HUNDREDS OF HOURS, HUNDREDS OF PEOPLE. So please, would you be so kind as to stop calling us "dramatic" JUST because we don't want it to go to waste. I'm tagging some gifmakers to spread the word because this SHOULD NOT happen
108 notes · View notes
levysoft · 1 year
Text
Source: The New Yorker
In 2013, workers at a German construction company noticed something odd about their Xerox photocopier: when they made a copy of a house's floor plan, the copy differed from the original in a subtle but significant way. In the original floor plan, each of the house's three rooms was accompanied by a rectangle specifying its area: the rooms were 14.13, 21.11, and 17.42 square meters respectively. In the photocopy, however, all three rooms were labeled as 14.13 square meters. The company contacted the computer scientist David Kriesel to investigate this seemingly inconceivable result. They needed a computer scientist because a modern Xerox photocopier doesn't use the physical xerographic process popularized in the 1960s. Instead, it scans the document digitally and then prints the resulting image file. Combine that with the fact that virtually every digital image file is compressed to save space, and a solution to the mystery begins to suggest itself.
Compressing a file requires two steps: first, the encoding, during which the file is converted into a more compact format, and then the decoding, whereby the process is reversed. If the restored file is identical to the original, the compression process is described as lossless: no information has been discarded. By contrast, if the restored file is only an approximation of the original, the compression is described as lossy: some information has been discarded and is now unrecoverable. Lossless compression is what's typically used for text files and computer programs, because those are domains in which even a single incorrect character has the potential to be disastrous. Lossy compression is often used for photos, audio, and video in situations where absolute precision isn't essential. Most of the time, we don't notice if a picture, song, or movie isn't perfectly reproduced. The loss of fidelity only becomes more noticeable when files are squeezed very tightly. In those cases, we notice what are known as compression artifacts: the blurriness of the smallest JPEG and MPEG images, or the tinny sound of low-bitrate MP3s.
Xerox photocopiers use a lossy compression format known as JBIG2, designed for use with black-and-white images. To save space, the copier identifies similar-looking regions in the image and stores a single copy for all of them; when the file is decompressed, it uses that copy repeatedly to reconstruct the image. It turned out that the copier had judged the labels specifying the rooms' areas to be similar enough that it needed to keep only one of them (14.13), and it reused that one for all three rooms when printing the floor plan.
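(A toy Python sketch of that symbol-substitution idea, added here for illustration; it is not the actual JBIG2 codec. Tiles that look "close enough" to an already-stored tile are replaced by a reference to it, which is exactly how a 6 can silently become an 8 when the similarity threshold is too loose.)

import numpy as np

def encode_tiles(image, tile=8, threshold=40):
    """Toy pattern-matching encoder: keep one copy of each 'new' tile and
    reference an earlier tile whenever the pixel difference is small."""
    dictionary, references = [], []
    h, w = image.shape
    for y in range(0, h - tile + 1, tile):
        for x in range(0, w - tile + 1, tile):
            block = image[y:y + tile, x:x + tile].astype(int)
            match = next((i for i, stored in enumerate(dictionary)
                          if np.abs(block - stored).sum() < threshold), None)
            if match is None:
                dictionary.append(block.copy())
                match = len(dictionary) - 1
            references.append(match)
    # decoding simply pastes dictionary[ref] back into place, tile by tile,
    # so a mismatched reference reproduces the wrong symbol perfectly legibly.
    return dictionary, references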
The fact that Xerox photocopiers use a lossy compression format instead of a lossless one isn't, in itself, a problem. The problem is that the copiers were degrading the image in a subtle way, one in which the compression artifacts weren't immediately recognizable. If the copier simply produced blurry printouts, everyone would know they weren't accurate reproductions of the originals. What led to problems was the fact that the copier was producing numbers that were readable but wrong; it made the copies seem accurate when they weren't. (In 2014, Xerox released a patch to fix this issue.)
I think this incident with the Xerox copier is worth keeping in mind today, as we consider OpenAI's ChatGPT and other similar programs, which A.I. researchers call large language models. The resemblance between a photocopier and a large language model might not be immediately apparent, but consider the following scenario. Imagine that you're about to lose your access to the Internet forever. In preparation, you plan to create a compressed copy of all the text on the Web, so that you can store it on a private server. Unfortunately, your private server has only one percent of the space needed; you can't use a lossless compression algorithm if you want everything to fit. Instead, you write a lossy algorithm that identifies statistical regularities in the text and stores them in a specialized file format. Because you have virtually unlimited computational power to throw at this task, your algorithm can identify extraordinarily nuanced statistical regularities, and this lets you achieve the desired compression ratio of a hundred to one.
Now, losing your Internet access isn't quite so terrible; you've got all the information on the Web stored on your server. The only catch is that, because the text has been so highly compressed, you can't look for information by searching for an exact quote; you'll never get an exact match, because the words aren't what's being stored. To solve this problem, you create an interface that accepts queries in the form of questions and responds with answers that convey the gist of what you have on your server.
What I've described sounds a lot like ChatGPT, or most any other large language model. Think of ChatGPT as a blurry JPEG of all the text on the Web. It retains much of the information on the Web, in the same way that a JPEG retains much of the information of a higher-resolution image, but, if you're looking for an exact sequence of bits, you won't find it; all you will ever get is an approximation. But, because the approximation is presented in the form of grammatical text, which ChatGPT excels at creating, it's usually acceptable. You're still looking at a blurry JPEG, but the blurriness occurs in a way that doesn't make the picture as a whole look less sharp.
This analogy to lossy compression isn't just a way to understand ChatGPT's facility at repackaging information found on the Web using different words. It's also a way to understand the "hallucinations," or nonsensical answers to factual questions, to which large language models such as ChatGPT are all too prone. These hallucinations are compression artifacts, but, like the incorrect labels generated by the Xerox photocopier, they are plausible enough that identifying them requires comparing them against the originals, which in this case means either the Web or our own knowledge of the world. When we think about them this way, such hallucinations are anything but surprising; if a compression algorithm is designed to reconstruct text after ninety-nine percent of the original has been discarded, we should expect that significant portions of what it generates will be entirely fabricated.
This analogy makes even more sense when we remember that a common technique used by lossy compression algorithms is interpolation, that is, estimating what's missing by looking at what lies on either side of the gap. When an image program is displaying a photo and has to reconstruct a pixel that was lost during the compression process, it looks at the nearby pixels and calculates the average. This is what ChatGPT does when it's prompted to describe, say, losing a sock in the dryer in the style of the Declaration of Independence: it is taking two points in "lexical space" and generating the text that would occupy the location between them. ("When in the Course of human events, it becomes necessary for one to separate his garments from their companions, in order to maintain the cleanliness and order thereof. . . .") ChatGPT is so good at this form of interpolation that people find it entertaining: they've discovered a "blur" tool for paragraphs instead of photos, and are having a blast playing with it.
22 notes · View notes
skitterhop · 1 month
Text
Tumblr media
perceive
300 notes · View notes
sarasa-cat · 22 days
Text
Recovering and recharging after 48 hours of too much (that once per fortnight overwhelm that is every alternating wednesday-thursday).
Tired and overwhelmed with allergy hell this evening. Eyes so swollen for a handful of weeks now. t_t
JOURNALS & JOURNALING
Have been itch-blurry-eyed relaxing and unwinding from fortnightly overwhelm by gathering inspiration for different kinds of approaches to journaling. Art journaling, travel journaling, inspo-and-ideas for projects journaling, and just plain old journal journaling.
Regularly amused by videos in which people flip through their pretty journal to show off the arty pages while their cat points out the good pages and photobombs their shot.
Thinking about how:
I have journaled on and off since I was a tween. But mostly it was far more off than on.
Same with sketchbooks and other kinds of art-focused "journals" -- lots of off, lots of unfinished pages.
Even more that has been in digital format and lost to time (or lost to dead storage formats).
I only have physical journals and sketchbooks from 2004 onward. Everything before that is in landfills a zillion miles & years from here.
After looking at various people's travel journals where each page is a collage of writing, drawings/paintings, photographs, and various ephemera, plus learning about or reverse engineering some of their process for how they put these together both during and after travel.... I GREATLY REGRET NEVER MAKING TRAVEL JOURNALS IN THE PAST. (** there are a bunch of trips, long and short, that I very much wish I had captured in travel journals rather than only in photos that I may or may not still have, but I have a semi-photographic memory -- as in slightly lossy visual memories -- of places and can "view" snapshots and short video-clips in my head of being in certain places even if only visiting once ... I think this is the reason why I regret not making actual travel journals of being in those places. That weird dreamlike nature of being able to unfocus or close my eyes and just BE in some other place and see it and hear it and sort of look around, but it is a weird glitchy ephemeral sort of VR. If only I collected things and made it all into travel journals back then... anyhow). Contemplating how to remedy the past, if at all, but also very much planning for upcoming travel and making a journal out of it.
Thinking a lot about the different kinds of PHYSICAL journals & sketchbooks I want to keep right now in addition to the DIGITAL journals that I use, on and off.
Also contemplating how all of this is such a work-in-progress in which what I like to do best changes and evolves. BUT feeling a burning need to be more intentional about this going forward.
Concern over future self having regret makes me want to be more intentional. When I look at people's organized notebooks, organized sketchbooks, organized journals, etc etc, I feel a sense of Missed Opportunity for having lots of disorganized notebooks, sketchbooks, and journals. And then I'm just hmmmmmm....
2 notes · View notes
scifigeneration · 4 months
Text
AI could improve your life by removing bottlenecks between what you want and what you get
Tumblr media
by Bruce Schneier, Adjunct Lecturer in Public Policy, Harvard Kennedy School
Artificial intelligence is poised to upend much of society, removing human limitations inherent in many systems. One such limitation is information and logistical bottlenecks in decision-making.
Traditionally, people have been forced to reduce complex choices to a small handful of options that don’t do justice to their true desires. Artificial intelligence has the potential to remove that limitation. And it has the potential to drastically change how democracy functions.
AI researcher Tantum Collins and I, a public-interest technology scholar, call this AI overcoming “lossy bottlenecks.” Lossy is a term from information theory that refers to imperfect communications channels – that is, channels that lose information.
Multiple-choice practicality
Imagine your next sit-down dinner and being able to have a long conversation with a chef about your meal. You could end up with a bespoke dinner based on your desires, the chef’s abilities and the available ingredients. This is possible if you are cooking at home or hosted by accommodating friends.
But it is infeasible at your average restaurant: The limitations of the kitchen, the way supplies have to be ordered and the realities of restaurant cooking make this kind of rich interaction between diner and chef impossible. You get a menu of a few dozen standardized options, with the possibility of some modifications around the edges.
That’s a lossy bottleneck. Your wants and desires are rich and multifaceted. The array of culinary outcomes is equally rich and multifaceted. But there’s no scalable way to connect the two. People are forced to use multiple-choice systems like menus to simplify decision-making, and they lose so much information in the process.
People are so used to these bottlenecks that we don’t even notice them. And when we do, we tend to assume they are the inevitable cost of scale and efficiency. And they are. Or, at least, they were.
The possibilities
Artificial intelligence has the potential to overcome this limitation. By storing rich representations of people’s preferences and histories on the demand side, along with equally rich representations of capabilities, costs and creative possibilities on the supply side, AI systems enable complex customization at scale and low cost. Imagine walking into a restaurant and knowing that the kitchen has already started work on a meal optimized for your tastes, or being presented with a personalized list of choices.
There have been some early attempts at this. People have used ChatGPT to design meals based on dietary restrictions and what they have in the fridge. It’s still early days for these technologies, but once they get working, the possibilities are nearly endless. Lossy bottlenecks are everywhere.
Imagine a future AI that knows your dietary wants and needs so well that you wouldn’t need to use detailed prompts for meal plans, let alone iterate on them as the nutrition coach in this video does with ChatGPT.
Take labor markets. Employers look to grades, diplomas and certifications to gauge candidates’ suitability for roles. These are a very coarse representation of a job candidate’s abilities. An AI system with access to, for example, a student’s coursework, exams and teacher feedback as well as detailed information about possible jobs could provide much richer assessments of which employment matches do and don’t make sense.
Or apparel. People with money for tailors and time for fittings can get clothes made from scratch, but most of us are limited to mass-produced options. AI could hugely reduce the costs of customization by learning your style, taking measurements based on photos, generating designs that match your taste and using available materials. It would then convert your selections into a series of production instructions and place an order to an AI-enabled robotic production line.
Or software. Today’s computer programs typically use one-size-fits-all interfaces, with only minor room for modification, but individuals have widely varying needs and working styles. AI systems that observe each user’s interaction styles and know what that person wants out of a given piece of software could take this personalization far deeper, completely redesigning interfaces to suit individual needs.
Removing democracy’s bottleneck
These examples are all transformative, but the lossy bottleneck that has the largest effect on society is in politics. It’s the same problem as the restaurant. As a complicated citizen, your policy positions are probably nuanced, trading off between different options and their effects. You care about some issues more than others and some implementations more than others.
If you had the knowledge and time, you could engage in the deliberative process and help create better laws than exist today. But you don’t. And, anyway, society can’t hold policy debates involving hundreds of millions of people. So you go to the ballot box and choose between two – or if you are lucky, four or five – individual representatives or political parties.
Imagine a system where AI removes this lossy bottleneck. Instead of trying to cram your preferences to fit into the available options, imagine conveying your political preferences in detail to an AI system that would directly advocate for specific policies on your behalf. This could revolutionize democracy.
Ballots are bottlenecks that funnel a voter’s diverse views into a few options. AI representations of individual voters’ desires overcome this bottleneck, promising enacted policies that better align with voters’ wishes. Tantum Collins, CC BY-ND
One way is by enhancing voter representation. By capturing the nuances of each individual’s political preferences in a way that traditional voting systems can’t, this system could lead to policies that better reflect the desires of the electorate. For example, you could have an AI device in your pocket – your future phone, for instance – that knows your views and wishes and continually votes in your name on an otherwise overwhelming number of issues large and small.
Combined with AI systems that personalize political education, it could encourage more people to participate in the democratic process and increase political engagement. And it could eliminate the problems stemming from elected representatives who reflect only the views of the majority that elected them – and sometimes not even them.
On the other hand, the privacy concerns resulting from allowing an AI such intimate access to personal data are considerable. And it’s important to avoid the pitfall of just allowing the AIs to figure out what to do: Human deliberation is crucial to a functioning democracy.
Also, there is no clear transition path from the representative democracies of today to these AI-enhanced direct democracies of tomorrow. And, of course, this is still science fiction.
First steps
These technologies are likely to be used first in other, less politically charged, domains. Recommendation systems for digital media have steadily reduced their reliance on traditional intermediaries. Radio stations are like menu items: Regardless of how nuanced your taste in music is, you have to pick from a handful of options. Early digital platforms were only a little better: “This person likes jazz, so we’ll suggest more jazz.”
Today’s streaming platforms use listener histories and a broad set of features describing each track to provide each user with personalized music recommendations. Similar systems suggest academic papers with far greater granularity than a subscription to a given journal, and movies based on more nuanced analysis than simply deferring to genres.
A world without artificial bottlenecks comes with risks – loss of jobs in the bottlenecks, for example – but it also has the potential to free people from the straitjackets that have long constrained large-scale human decision-making. In some cases – restaurants, for example – the impact on most people might be minor. But in others, like politics and hiring, the effects could be profound.
3 notes · View notes
moviesludge · 1 year
Photo
Tumblr media Tumblr media Tumblr media
gif filter selection -
#1 - Lossy Diffusion (set at "20") - 2.86 mb
#2 - Non-Lossy Diffusion - 4.89 mb
#3 - Pattern - 4.48 mb
Typically pattern is the best filter to use for clear, sharp gifs, but in situations where you have a lot of dark area, you can see how non-lossy diffusion looks better, because you can’t see the little patchwork forms of the pixels that show up with the pattern filter. The dotty grain of diffusion works better in some instances because it can blend into the scene better. This is from a high quality video sample. Messy details will show up more easily in a lower quality source.
I often use lossy diffusion because it produces the smallest sizes. The grain in it will often complement the visual and give it an evenly dispersed textured look, but in dark scenes it can really look static and crappy like it does here. Sometimes I make these and I don’t notice right away. I circled back around to this one to see which filter looked best.
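(for anyone curious what those filters are doing under the hood, here’s a rough Python/numpy sketch of the two dithering styles; it’s the general technique, not Photoshop’s exact implementation, and it quantizes to plain black/white instead of a GIF palette to keep it short)

import numpy as np

# 4x4 Bayer matrix, normalised to thresholds in [0, 1)
BAYER_4 = np.array([[ 0,  8,  2, 10],
                    [12,  4, 14,  6],
                    [ 3, 11,  1,  9],
                    [15,  7, 13,  5]]) / 16.0

def ordered_dither(gray):
    """'Pattern'-style dithering: compare each pixel against a tiled
    threshold matrix, which produces the regular patchwork texture."""
    h, w = gray.shape
    thresholds = np.tile(BAYER_4, (h // 4 + 1, w // 4 + 1))[:h, :w]
    return (gray / 255.0 > thresholds).astype(np.uint8) * 255

def diffusion_dither(gray):
    """Floyd-Steinberg error diffusion: quantisation error is pushed onto
    neighbouring pixels, giving the scattered grainy look instead."""
    img = gray.astype(float).copy()
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            old = img[y, x]
            new = 255.0 if old >= 128 else 0.0
            img[y, x] = new
            err = old - new
            if x + 1 < w:                img[y, x + 1]     += err * 7 / 16
            if y + 1 < h and x > 0:      img[y + 1, x - 1] += err * 3 / 16
            if y + 1 < h:                img[y + 1, x]     += err * 5 / 16
            if y + 1 < h and x + 1 < w:  img[y + 1, x + 1] += err * 1 / 16
    return img.astype(np.uint8)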
34 notes · View notes
glitterdoe · 1 month
Text
its a lil understated visually most of the time but lossy video is a clown btw :3
11 notes · View notes
canmom · 1 year
Text
AI and Shannon information
there’s an argument I saw recently that says that, since an AI image generator like Stable Diffusion compresses a dataset of around 250TB down to just 4GB of weights, it can’t be said to be storing compressed copies of the images in the dataset. the argument goes that, with around 4 billion images in the dataset, each image only contributes around 4 bytes to the training data.
I think this is an interesting argument, and worth digging into more. ultimately I don’t think I agree with the conclusion, but it’s productive just in the sense of trying to understand what the hell these image generators are, and also to the understanding of artwork in general.
(of course this came up in an argument about copyright, but I’m going to cut that out for now.)
suppose I had a file that's just a long string of 010101... with 01 repeating N times in total. I could compress that down to two pieces of data: a number (of log2(N) bits) that says how many times to repeat it, and an algorithm that repeats the string N times. this is an example of Run Length Encoding. a more sophisticated version of this idea gives the DEFLATE algorithm underlying zip and png files.
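here’s a minimal sketch of that idea in Python (a toy run-length coder, nothing like full DEFLATE):

def rle_encode(data):
    """Run-length encoding: collapse each run of identical symbols
    into a (symbol, count) pair."""
    runs = []
    for ch in data:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs

def rle_decode(runs):
    """Invert the encoding exactly; this is lossless."""
    return "".join(symbol * count for symbol, count in runs)

data = "0000011111111000"
runs = rle_encode(data)           # [('0', 5), ('1', 8), ('0', 3)]
assert rle_decode(runs) == data   # perfect round trip

# the '0101...' case above is the same idea one level up: store the
# repeating pattern ("01") once plus a repeat count N, and have the
# decoder replay it N times.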
that's lossless compression, meaning the compressed file can be decompressed to an exact copy of the original data. by contrast, lossy compression exploits properties of the human visual system by discarding information that we are unlikely to notice. its output only approximates the original input, but it can achieve much greater compression ratios.
our compression algorithms are tuned to certain types of images, and if they're fed something that doesn't fit, like white noise with no repeating patterns to exploit or higher frequencies to discard, they'll end up making it larger, not smaller.
depending on the affinities of the algorithm, some things 'compress well', like a PNG image with a lot of flat colours, and some things 'compress poorly', like a film of snowflakes and confetti compressed with H.264. an animation created to be encoded with GIF, with hard pixel art edges and indexed colours, may perform poorly in an algorithm designed for live film such as WebP.
now, thought experiment: suppose that I have a collection of books that are all almost identical except for a serial number. let's say the serial number is four bytes long, so that could be as many as 2^32=4,294,967,296 books. say the rest of the book is N bytes long. so in total a book is N+4 bytes. my 'dataset' is thus 2^32×(N+4) bytes. my compression algorithm is simple: the algorithm holds the N bytes of book similar to LenPEG, the encoded file is a four byte serial number, and I simply append the two.
how much data is then used to represent any given book in the algorithm? well there's 2^32 books, so if the algorithm holds N bytes of uncompressed book, we could make the same argument as 'any given image corresponds to just 4 bytes in Stable Diffusion's weights' to say that any given book is represented by just N/2^32 bytes! probably much less than a byte per book, wow! in fact 2^32 is arbitrary, we could push it as high as we like, and have the 'average amount of data per book' asymptote towards zero. obviously this would be disingenuous because the books are almost exactly the same, so in fact, once we take into account both the decoder and the encoded book, we’re storing the book in N+4 bytes.
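as a sketch, the whole 'codec' for that thought experiment is a couple of lines (assuming, for illustration, that the serial number sits at the end of each book):

BOOK_TEMPLATE = b"...the N shared bytes of book text..."   # baked into the codec itself

def encode(book):
    # everything except the last four bytes is identical across books,
    # so the 'compressed file' is just the serial number
    assert book[:-4] == BOOK_TEMPLATE
    return book[-4:]

def decode(serial):
    return BOOK_TEMPLATE + serial

# the 4-byte encoded files look absurdly small, but only because the
# N bytes of shared text are hiding inside the decoder.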
so ultimately the combination of algorithm and encode together is what gives us a decompressed file. of course, usually the encoder is a tiny part of this data that can be disregarded. for example, the ffmpeg binary weighs in at just a few megabytes. it’s tiny! with it, I can supply (typically) hundreds of megabytes of compressed video data using, say, H.265, and it will generate bitmap data I can display on my screen at high definition. this is a great compression ratio compared to what is likely many terabytes if stored as uncompressed bitmaps, or hundreds of gigabytes of losslessly compressed frames. with new codecs like AV1 I could get it even smaller.
compression artefacts with algorithms like JPEG and H.264/5 are usually very noticeable - ‘deep frying’, macroblocking, banding etc. this is not true for all compression algorithms. there are algorithms that substitute ‘plausible looking’ data from elsewhere in the document. for example if you scan a page of text, you can store just one picture of the letter A, and replace every A with that example. this is great as long as you only replace the letter A. there was a controversy a few years ago where Xerox scanners using the JBIG2 format were found to be substituting numbers with different numbers for the sake of compression - e.g. replacing a 6 with an 8. unlike the JPEG ‘deep frying’, this kind of information loss is unnoticeable unless you have the original to compare.
in fact, normal text encoding is an example of this method. I can generate the image of text - this post for example - on your screen by passing you what is essentially a buffer of indices into a font file. each letter is thus stored as one or two bytes. the font file might be, say, a hundred kilobytes. the decoding algorithm takes each codepoint, looks it up in the font file to fetch a vector drawing of a particular letter, rasterises it and displays the glyph on your screen. I could also take a screenshot of this text and encode it with, say, PNG. this would generate an equivalent pixel representation, but it would be a much larger file, maybe hundreds of kilobytes.
so UTF-8 encoded text and a suitable font renderer is a really great encoding scheme. it stores a single copy of the stuff that’s redundant between all sorts of images (the shapes of letter glyphs), it’s easily processed and analysed on the computer, and it has the benefit that a human can author it very easily using a keyboard - even easier than we could actually draw these letters on paper. compared to handwritten text, you lose the particular character of a person’s handwriting, but we don’t usually consider that important since the intent of text is to convey words as efficiently as possible.
come back to AI image generators. most of that 4GB is encoding whatever common patterns the neural net's training process detected, redundant across thousands of very similar images. the text prompt is the part that becomes analogous to 'compressed data' in that it specifies a particular image from the class of images that the algorithm can produce. it’s only tens of bytes long and it’s even readable and authorable by humans. as an achievement in lossy image compression, even with its limitations, this is insanely impressive.
AI image generators of the ‘diffusion’ type spun out of research into image compression (starting with Ho et al.). the researchers discovered that it was possible to interpolate the ‘latent space’ produced by their compression system, and ‘decode’ new images that share many of the features of the images they were trying to compress and decompress.
ok, so, the point of all this.
Shannon information and the closely related ‘entropy’ are measures of how ‘surprising’ a new piece of data is. in image compression, this measures how much information you need to distinguish a particular piece of data from a ‘default’ set of assumptions. if you know something about the sort of data you expect to see, you need correspondingly less information to distinguish the particulars.
image compression is all about trying to exploit commonalities between images to minimise the amount of ‘extra’ information you need to distinguish a specific image from the other ones you expect to be called on to decode. for example, in video encoding, it’s observed that often you see a patch of colours moving as a unit. so instead of storing the same block of pixels with a slight offset on successive frames, you can store it just once, and store a vector saying how far it’s moved and in which direction - this is a technique called block motion compensation. using this technique, you can save some data for videos that fit this pattern, since they’re not quite as surprising.
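here’s a rough numpy sketch of that technique (an exhaustive search over a small window, minimising the sum of absolute differences; real encoders are far cleverer about the search, this is just the core idea):

import numpy as np

def best_motion_vector(prev, cur, y, x, block=16, search=8):
    """Find the offset (dy, dx) in the previous frame that best matches the
    block of the current frame at (y, x). Instead of re-storing the block,
    a codec can then store just this vector (plus a small residual)."""
    target = cur[y:y + block, x:x + block].astype(int)
    best, best_cost = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            if yy < 0 or xx < 0 or yy + block > prev.shape[0] or xx + block > prev.shape[1]:
                continue
            candidate = prev[yy:yy + block, xx:xx + block].astype(int)
            cost = np.abs(target - candidate).sum()   # sum of absolute differences
            if cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best, best_cost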
the success of AI does end up suggesting the rather startling conclusion that, with the right abstraction, there isn't a huge amount of Shannon information distinguishing any two works, at least at the level of approximation the AI can achieve. this isn't entirely surprising. in AI code generation - well, how much code is just boilerplate? in illustration, how many pictures depict very similar human faces from a similar 3/4 angle in similar lighting? the AI might theoretically store one representation of a human face, and then just enough information to distinguish how a particular face differs from the default face.
compare this with a Fourier series. you transform a periodic function, and you get first the broad gist (a repeating pattern -> a single sine wave), and then a series of corrections (sine waves at higher frequencies). if you stop after a few corrections, you get pretty close. that’s roughly how JPEG works, incidentally.
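a quick numpy sketch of that ‘gist plus corrections’ idea: keep only the lowest-frequency terms of a square wave and see how close the reconstruction gets (JPEG does something analogous with a cosine transform on 8×8 blocks):

import numpy as np

def truncated_reconstruction(signal, keep):
    """Keep only the `keep` lowest-frequency terms of the signal's Fourier
    representation and reconstruct from those: the broad gist plus however
    many 'corrections' we can afford."""
    spectrum = np.fft.rfft(signal)
    spectrum[keep:] = 0              # discard the higher-frequency corrections
    return np.fft.irfft(spectrum, n=len(signal))

t = np.linspace(0, 1, 256, endpoint=False)
square = np.sign(np.sin(2 * np.pi * 4 * t))          # a periodic square wave

rough  = truncated_reconstruction(square, keep=6)    # just the gist
closer = truncated_reconstruction(square, keep=40)   # gist + more corrections

print(np.abs(square - rough).mean(), np.abs(square - closer).mean())
# the error shrinks as more terms are kept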
the AI's compression is very lossy; it will substitute a generic version that's only approximately the same as a particular picture, that only approximately realises some text prompt. since text prompts are quite open ended (they can only add so much shannon information), there are a huge amount of possible valid ‘decodings’. the AI introduces some randomness in its process of ‘decoding’ and gives you a choice.
to get something more specific, you must fine-tune your prompt with more information, or especially provide the AI with existing images to tune itself to. one of the main ways you can fine-tune your prompt is by invoking the name of a specific artist. in the big block of encoded algorithm data and its ‘latent space’, the weights representing the way that that specific artist’s work differs from the ‘core model’ will be activated. how much information are you drawing on there? it’s hard to tell, and it will further depend how much that artist is represented in the dataset.
by training to a specific artist, you provide the AI a whole bunch of Shannon information, considerably more than four bytes - though maybe not that much once the encoding is complete, just enough to distinguish that artist’s work from other artists in the ‘latent space’. in this sense, training an AI on someone's work specifically, or using their name to prompt it to align with their work, is creating a derivative work. you very much could not create a ‘similar’ image without providing the original artworks as an input. (is it derivative enough to trigger copyright law? that’s about to be fought.)
I say this neutrally - art thrives on derivative work: collage, sampling in music, studies of other artists... and especially just plain old inspiration and iteration. but “it isn't derivative” or it “isn’t an encoding” isn't a good hill to defend AI from. the Stable Diffusion lawsuit's description of the AI as an automated collage machine is a limited analogy, but it's at least as good as saying an AI is just like a human artist. the degree to which an AI generated image is derivative of any particular work is trickier to assess. the problem is you, as a user of the AI, don’t really know what you’re leaning on because there’s a massive black box obscuring the whole process. your input is limited to the amount of Shannon information contained in a text prompt.
ok, that’s the interesting part over. everything about copyright and stuff I’ve said before. AI is bad news for artists trying to practice under capitalism, copyright expansion would also be bad news and likely fail to stop AI; some artists in the USA winning a class action does not do much for artists anywhere else. etc. etc. we’ll probably find some way to get the worst of both worlds. but anyway.
20 notes · View notes