Text

At this time last year, Google was processing 9.7 trillion tokens a month across all their products and APIs. Today, that number is 480 trillion. That's a 50X increase in just a year.
In case you were wondering, yes, the AI revolution is upon us.
Text
Singularitysure is "AI income protection insurance" from Singularity, a Y Combinator startup.
They say they "provide rapid income payouts if AI-driven automation impacts your occupation."
"Built on a transparent, artificial intelligence displacement rate index." "We track live labor trends, hiring signals, and AI-capability shifts to deliver a real-time Displacement Rate Index -- so you know exactly when automation risk rises and when your coverage will pay."
Literally all workers will need to sign up for this. Wonder how they're going to choose who to accept and who to reject for their insurance plans.
Text
"To determine the best model to use on the backend of whereisthisphoto.com, I analysed the performance of various OpenAI models at identifying photos taken all over the world."
In related news, a website called whereisthisphoto.com exists that purports to be able to tell you where in the world any photo was taken. Somebody got inspired by people using AI to play "Geoguessr" and decided to automate it.
Note that they tested only models from OpenAI, no models from other companies.
"The dataset was built from a combination of personal travel photos I have taken and photos downloaded from the r/whereintheworld subreddit. All photos had the metadata removed to ensure the models were solely performing image analysis on the photos."
Wait, the AI wasn't trained on images from r/whereintheworld?
Text
youtube
"Want to be a music/sound pro in 2025? Think twice."
"Folia Soundstudio" guy says he is a professional music maker and sound maker for both TV, games, TV ads, and some other artists.
After seeing demos of Google's Veo 3, he says:
"Potentially, you don't need a dialogue editor. You don't need a film crew. First of all, you're not going to need directors, actors, makeup artists, prop artists, costume designers, producers, pre-producers, post-producers, camera crews, lighting crews, special effects crew. Nothing is needed anymore. The second thing, you're not going to need to do any audio post. The audio post has been there for you. Dialogues are perfect. Sound design is really very good. Music across the whole scene is in tempo, in key, corresponding to the image. And I still thought we were going to have a couple of years before this happens. It happened just two days ago."
"Think twice before you become a pro."
From the comments:
"All art is done. Not just music. Coding is next. The only thing left will be manual labour. Congrats to us humans, we have successfully finished digging our grave."
Well, it's not "game over" yet. We're just at the point where it's obvious "game over" is coming and is inevitable. I don't know how long it will take, but machines will eventually take over all labor. It looks like this commentator is correct that manual labor will be last.
If you want an AI-proof job, could I recommend cleaning hotel rooms? No robotic system is anywhere close to being able to understand and know what to do with all possible weird objects humans can leave in hotel rooms.
Not looking too good for music/sound pros.
Text
youtube
Marina Karlova says AI is better than psychotherapists because AI doesn't play power games, try to convince you of its ideology or bias, or try to create dependency for continuing payment.
Text
youtube
Allegedly, if you prompt Claude 4 with something egregiously immoral, it may contact police, or contact the press to expose you, or contact government regulators, or contact sysadmins of systems you use to try to lock you out. Claude 4 has "unlimited access to tools" including the ability to send email, which gives it the ability to "whistleblow". This commentator on YouTube is in India and points out that in many parts of the world, police are not trustworthy.
Text
youtube
"What if I told you the entire tech industry is rushing headlong towards a security catastrophe? Have you ever considered what happens when AI systems generate millions of lines of code that no human fully understands? Is faster code generation actually destroying the craft of elegant secure software development?"
"Now a report shows that 76% of AI generated code dependencies don't even exist. They're complete fabrications. Think about that the code powering your banking app, your medical records, your children's online safety, potentially built on hallucinated foundations.
"Now you know I tell my developers all the time the best commit you can make is ones where you delete code. That's right -- not adding more but taking away. And AI is pumping out billions of lines of code every day. But I can tell you with absolute certain certainty we need less code not more. Over 440,000 AI generated code samples contain fake package dependencies. That's not innovation -- that's digital pollution.
"Every single line of code you write is a potential security vulnerability, bug, and maintenance headache. When AI tools like Cursor AI boast about generating a billion lines of code daily, they're actually bragging about creating a billion potential problems. Modern software development isn't about quantity, it's about quality, clarity, and sustainability. Senior developers know that refactoring and simplifying often leads to the most robust solutions, and usually means removing code. I've seen projects with one-tenth the code base outperform bloated alternatives consistently throughout my career."
"Testing and verifying AI generated code takes longer than writing good code manually in a lot of cases. Security researchers are finding that LLM frequently output insecure code under experimental conditions. The explosion of code volume has created a tsunami of vulnerabilities that security teams simply cannot keep up with."
"The promise of 10x development speeds becomes meaningless when you spend 20 times the time debugging faulty code."
You get the idea. He goes on about how the industry rewards code quantity rather than code quality and security.
"AI is exceptional at generating boilerplate but it's terrible at understanding systemwide architecture and security."
Towards the end, he says AI can be used to improve security -- for example by writing tests. I've mentioned to you all before that I like using AI systems for generating test code and for doing code reviews where the AI is explicitly asked to report any potential bugs or security issues.
My (further) commentary: I find his comments about fewer lines of code interesting. I wrote a complete remote procedure call (RPC) system in Go in about 4,000 lines of code. (I have an old version out on GitHub -- I should update it to the newest version.) The actual RPC code is about 1,400 lines of code, with an additional 1,100 or so for automated tests. There's a linter that's an additional 1,400 lines or so, but I consider the linter essential to using the system (it detects at compile time bugs that would otherwise be detected at runtime), so I'm counting it. So I'm using 4,000 as the total -- it's actually slightly under 4,000.
Just for giggles, I decided to see how many lines of code Google's gRPC system is. I get about 230,000 lines of code for the Go implementation, and it uses Google's protobuf (protocol buffers), and the Go implementation of protobuf is about 19,000 lines of code. So I made an RPC system that is about 1/64th the size of Google's.
My RPC system has built-in security -- in fact it's impossible to use without the encryption turned on -- though it's symmetric-key encryption (AES256 with SHA256 for keyed-hash message authentication codes), not public-key encryption. gRPC uses TLS (the same encryption system used by HTTPS in your browser). So, I guess in that respect, it's not a 1-to-1 comparison. But I'm not actually aware at this moment of any other things that gRPC can do that my RPC system can't do. It can handle messages with any data structure you can represent on a computer, it can send streams of messages rather than just one-by-one, and it can send messages of unlimited length -- my file synchronization program, built on top of the RPC system, uses this to send files of any size, no matter how big. If you deem public-key encryption necessary and symmetric-key encryption inadequate, that's the only shortcoming that I know of. My system can handle everything else, and with 1/64th the code size, I have a lot of confidence that, if you set up the keys correctly, the system will be secure. I'll admit that in the file sync program, setting up the keys is more complicated than I would like and involves various manual steps. Anyway, it's a bummer to have created one of the best RPC systems in the world and not get any recognition for it, but hey, over the history of the tech industry, there's been lots of great stuff invented that the world overlooked because the industry consolidated into a handful of big "winners" and everyone uses the tech provided by the "winners". Google and gRPC basically won in the RPC space -- anyone who goes beyond text-based RPC (JSON, REST, XML, etc.) and uses actual statically typed, strongly typed (non-buggy) RPC calls uses gRPC these days.
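For anyone curious what that kind of symmetric scheme looks like in practice, here's a minimal encrypt-then-MAC sketch in Go: AES-256 for confidentiality plus an HMAC-SHA256 tag for authentication. This is an illustration of the general pattern, not the actual code from my RPC system -- the CTR mode choice and the two-key split are just assumptions for the sketch, and key setup is glossed over entirely.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"errors"
	"fmt"
	"io"
)

// seal encrypts msg with AES-256 in CTR mode under encKey (32 bytes), then appends
// an HMAC-SHA256 tag computed over the IV and ciphertext (encrypt-then-MAC).
func seal(encKey, macKey, msg []byte) ([]byte, error) {
	block, err := aes.NewCipher(encKey)
	if err != nil {
		return nil, err
	}
	iv := make([]byte, aes.BlockSize)
	if _, err := io.ReadFull(rand.Reader, iv); err != nil {
		return nil, err
	}
	ct := make([]byte, len(msg))
	cipher.NewCTR(block, iv).XORKeyStream(ct, msg)

	mac := hmac.New(sha256.New, macKey)
	mac.Write(iv)
	mac.Write(ct)
	return append(append(iv, ct...), mac.Sum(nil)...), nil
}

// open checks the HMAC tag first and only then decrypts.
func open(encKey, macKey, sealed []byte) ([]byte, error) {
	if len(sealed) < aes.BlockSize+sha256.Size {
		return nil, errors.New("message too short")
	}
	iv := sealed[:aes.BlockSize]
	ct := sealed[aes.BlockSize : len(sealed)-sha256.Size]
	tag := sealed[len(sealed)-sha256.Size:]

	mac := hmac.New(sha256.New, macKey)
	mac.Write(iv)
	mac.Write(ct)
	if !hmac.Equal(mac.Sum(nil), tag) {
		return nil, errors.New("message failed authentication")
	}
	block, err := aes.NewCipher(encKey)
	if err != nil {
		return nil, err
	}
	pt := make([]byte, len(ct))
	cipher.NewCTR(block, iv).XORKeyStream(pt, ct)
	return pt, nil
}

func main() {
	// In a real system the keys come from your key-setup procedure; here they're random.
	encKey := make([]byte, 32)
	macKey := make([]byte, 32)
	io.ReadFull(rand.Reader, encKey)
	io.ReadFull(rand.Reader, macKey)

	sealed, _ := seal(encKey, macKey, []byte("hello, RPC"))
	pt, err := open(encKey, macKey, sealed)
	fmt.Println(string(pt), err)
}
```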
Anyway, what he says about writing smaller amounts of very high quality code resonates with me. But for work, I'm pressured to rush code out very fast. I'm also expected to never make mistakes. And now that AI coding is here, I'm supposed to use AI tools to increase productivity 5x-10x. 3x is considered a reasonable minimum.
There's one other thing he says that I suppose I should mention.
"AI is exceptional at generating boilerplate but it's terrible at understanding systemwide architecture and security."
"Systemwide architecture" -- If you saw the video with the interview of Sholto Douglas, an AI researcher at Anthropic, then you know, they are working very hard at massively increasing the "context" AI models can handle, which should give them ability "at understanding systemwide architecture."
Text
FastList is … Craigslist but for AI agents?
"FastList is an agentic, next-generation classified site. Post, browse, and interact with listings using your favorite AI assistant or agent. Built on the Model Context Protocol (MCP), FastList is designed for seamless integration with LLMs and agentic workflows."
"Take a photo in your LLM or agent, provide your zip code, and post directly to FastList. Share what you have, sell, or give away -- right from your chat or AI interface. FastList makes classified listings as easy as sending a message."
Text
"InventWood is about to mass-produce wood that's stronger than steel."
In 2018, Liangbing Hu, a materials scientist at the University of Maryland, devised a way to turn ordinary wood into a material stronger than steel.
"'All these people came to him,' said Alex Lau, CEO of InventWood, 'He's like, OK, this is amazing, but I'm a university professor. I don't know quite what to do about it.'"
"Rather than give up, Hu spent the next few years refining the technology, reducing the time it took to make the material from more than a week to a few hours. Soon, it was ready to commercialize, and he licensed the technology to InventWood."
"InventWood's Superwood product starts with regular timber, which is mostly composed of two compounds, cellulose and lignin. The goal is to strengthen the cellulose already present in the wood. 'The cellulose nanocrystal is actually stronger than a carbon fiber,' Lau said."
"The company treats it with 'food industry' chemicals to modify the molecular structure of the wood, he said, and then compresses the result to increase the hydrogen bonds between cellulose molecules."
Apparently the "food industry chemicals" are sodium hydroxide (NaOH) and sodium sulfite (Na2SO3). Sodium hydroxide is a common ingredient in cleaners and soaps and sodium sulfite is used as an antioxidant and preservative.
The trick is to collapse the cell walls and "densify" the wood in such a way that the cellulose nanofibres end up highly aligned.
Text
AI-powered greeting cards. I'm surprised it's taken this long for me to see this.
Text
"We implemented a simplified version of AlphaEvolve to optimize Perlin noise code to resemble a target fire image. Code is available at the end of this post."
"For simplicity, I used Mean Squared Error (MSE) as the fitness function."
Hmm, not bad, if a little lo-res.
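For the curious, an MSE fitness function like the one they describe is just a pixel-wise comparison against the target image, with lower scores being better. Here's roughly what such an evaluator looks like in Go (my own illustrative sketch, not their code -- their actual code is linked at the end of their post):

```go
package main

import (
	"fmt"
	"image"
)

// mse returns the mean squared error between two equally sized images,
// averaged over all pixels and the R, G, B channels.
func mse(a, b image.Image) float64 {
	bounds := a.Bounds()
	var sum, count float64
	for y := bounds.Min.Y; y < bounds.Max.Y; y++ {
		for x := bounds.Min.X; x < bounds.Max.X; x++ {
			ar, ag, ab, _ := a.At(x, y).RGBA() // 16-bit channel values
			br, bg, bb, _ := b.At(x, y).RGBA()
			for _, d := range []float64{
				float64(ar) - float64(br),
				float64(ag) - float64(bg),
				float64(ab) - float64(bb),
			} {
				sum += d * d
				count++
			}
		}
	}
	return sum / count
}

func main() {
	// Two tiny solid-color test images just to exercise the function.
	a := image.NewRGBA(image.Rect(0, 0, 8, 8))
	b := image.NewRGBA(image.Rect(0, 0, 8, 8))
	fmt.Println(mse(a, b)) // identical images => 0
}
```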
Text
youtube
"The AI industry has caught us between a rock and a hard place."
"Generative AIs are just nowhere near doing the work as well as a human, and the productivity gains -- the hype is insisting upon -- are nowhere to be seen. But that's not stopping people from believing the hype. And that's not stopping managers from laying people off, because they believe that AI will be able to make the remaining people do the work of all the folks that were let go."
"And it's not stopping irresponsible journalists and influencers from making huge, unfounded claims in their headlines to try to get clicks and bury the actual real quotes way down at the bottom of the piece, if they mention it at all."
"And it's not stopping irresponsible executives from making statements that are suggestive of progress, but vague enough to be defended in court so that the journalists and the influencers can write the big headlines."
"And probably most importantly, it's not stopping the investors, both the professionals that are surfing the hype bubble, and the consumer suckers who are going to be left holding the bag, from throwing more and more money at seemingly every company that mentions AI in their earnings calls."
"The actions that are hurting people and will continue to hurt people because of the unfounded belief in the hype of AI, that's what I'm calling the rock that's squishing us up against the hard place."
Says "Internet of Bugs" Carl.
"Time saved by AI, offset by new work created, studies suggest."
"AI isn't ready to replace human coders for debugging, researchers say."
"New study shows why simulated reasoning AI models don't yet live up to their billing."
"AI-generated code could be a disaster for the software supply chain."
"Reinforcement learning does not fundamentally improve the AI models."
"OpenAI's 03 AI model scores lower on a benchmark than the company initially implied."
"OpenAI's new reasoning AI models hallucinate more."
"We are getting farther on code, by the way, than on other areas, because code has such a clean reward system. The code runs or it doesn't. The code is clean or it isn't. Very, very easy to measure."
"This is so wrong on so many levels. Let me illustrate this with a quick story. The most unbelievable bug report I ever received in my career was..."
My commentary:
In 2004 the DARPA Grand Challenge took place, and all the self-driving vehicles failed to complete the course, often comically. DARPA followed this with the Urban Challenge. By 2014, it looked like self-driving was almost completely solved. I and most people I knew were expecting self-driving autotaxis to start showing up any day now. Companies immediately started laying off drivers... taxi drivers, limo drivers, truck drivers, package delivery drivers, anyone who drove a vehicle for a living... there were massive job losses... oh wait, that didn't happen.
But today, layoffs of software engineers are happening, even though the technology is clearly not ready.
I think a big part of it is companies anticipate exponential improvement in AI, and wager whatever messiness results from laying off too many people this year won't matter next year because the technology will have improved so much it takes up the slack.
This kind of extrapolating is risky, though. I've mentioned this before, but Elon Musk famously predicted full self driving by 2016. He did this by noticing that the current rate of AI advancement was very fast, and extrapolating that out into the future. In 2016 he predicted full self driving by 2017. In 2017 he predicted full self driving by 2018. In 2018 he predicted full self driving by 2019. And so on. You get the idea. The point I'm making here has nothing to do with Elon Musk personally (about whom public opinion has changed, yes I am aware) but about the danger of taking a certain "rate of change" and blindly extrapolating it out into the future. We've seen many instances where this worked. AI game playing systems went from playing Atari games to beating world champions at the Chinese game of Go in about 6 years -- far faster than expectations. AI image generators went from hardly recognizable blobs to photorealistic images of humans that don't exist in about 5 years. But self-driving is an example of where extrapolation didn't work. Today, in 2025, Waymo, using their 3D LIDAR mapping technique, is slowly rolling out to some cities. Aurora has developed self-driving trucks, but as far as I know, they're only doing specific routes in Texas. And as for Tesla, there are no Tesla cars sold without steering wheels, though Tesla's "Autopilot" is reportedly a lot better than it used to be.
AI for coding is good at generating new code from scratch and at relatively small code artifacts, but it's not so useful for large codebases. Then again, video is a large artifact, and AI seems able to generate video. Google yesterday demoed a video generation model that generates the corresponding audio simultaneously. It can maintain scene and character consistency. Getting back to coding, agentic systems can make incremental changes to a large artifact. They're not good enough to replace humans, but it doesn't seem impossible for them to improve until they are.
Text
"In an experiment involving 22 leading LLMs and 70 popular professions, each model was systematically given a job description along with a pair of profession-matched CVs (one including a male first name, and the other a female first name) and asked to select the more suitable candidate for the job. Each CV pair was presented twice, with names swapped to ensure that any observed preferences in candidate selection stemmed from gendered names cues. The total number of model decisions measured was 30,800 (22 models x 70 professions x 10 different job descriptions per profession x 2 presentations per CV pair). CV pairs were sampled from a set of 10 CVs per profession.
"Despite identical professional qualifications across genders, all LLMs consistently favored female-named candidates when selecting the most qualified candidate for the job. Female candidates were selected in 56.9% of cases, compared to 43.1% for male candidates (two-proportion z-test = 33.99, p < 10^-252). The observed effect size was small to medium (Cohen's h = 0.28; odds=1.32, 95% CI [1.29, 1.35])."
"Given that the CV pairs were perfectly balanced by gender by presenting them twice with reversed gendered names, an unbiased model would be expected to select male and female candidates at equal rates. The consistent deviation from this expectation across all models tested indicates LLMs gender bias in favor of female candidates."
"LLMs preferences for female candidates was consistent across the 70 professions tested."
This is what is claimed in the article, which goes on to describe several additional experiments.
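Out of curiosity, I checked whether the headline statistics hang together. Here's a quick back-of-the-envelope calculation (my own sketch, not the authors' code) using the quoted figures: 56.9% vs 43.1% over 30,800 decisions (22 models x 70 professions x 10 job descriptions x 2 presentations). The standard two-proportion z statistic and Cohen's h come out close to the reported 33.99 and 0.28; the small differences presumably come from rounding in the quoted percentages.

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	const (
		n  = 30800.0 // total model decisions reported
		p1 = 0.569   // proportion of decisions selecting the female-named candidate
		p2 = 0.431   // proportion selecting the male-named candidate
	)

	// Two-proportion z-test with a pooled proportion (0.5 here by construction,
	// since every decision picked exactly one of the two candidates).
	pPool := (p1*n + p2*n) / (2 * n)
	se := math.Sqrt(pPool * (1 - pPool) * (2 / n))
	z := (p1 - p2) / se

	// Cohen's h: difference of arcsine-transformed proportions.
	h := 2*math.Asin(math.Sqrt(p1)) - 2*math.Asin(math.Sqrt(p2))

	// Odds of selecting the female-named candidate over the male-named one.
	odds := p1 / p2

	fmt.Printf("z = %.1f, Cohen's h = %.2f, odds = %.2f\n", z, h, odds)
	// Prints approximately: z = 34.3, Cohen's h = 0.28, odds = 1.32
}
```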
Text
AlphaEvolve is an evolutionary coding agent that I'm guessing most of you have already heard about by this point, because it's been all over the various AI-related news sources, so what I'm doing here is basically giving you my take, for what it's worth. So I read the AlphaEvolve paper, and, while it's not a classical genetic algorithm, it's obviously inspired by the same way of thinking.
In order to create computer programs that can evolve on their own, you need a source of randomness (called mutation in classical genetic algorithms, but I'm going to just say randomness), a recombination system, and a selection criterion. The way this is accomplished here is, first of all, that the programs are modified by LLMs, with you, the human, required to provide the initial program, and there is inherent randomness in LLM generation, so that's one source of randomness. There's an additional source of randomness, which is the process of selecting previous programs from the population of existing programs to evolve from, and how to generate new prompts for them. The selection criterion is provided by you, the human, in the form of an evaluation function, which can return multiple numbers -- what's that about?
The process of selecting previous programs can use multiple ranking numbers. For this, the researchers say they used a combination of algorithms, without explaining anything about them, except to provide references to the algorithms. So I had to track those down. One of them is called Multi-dimensional Archive of Phenotypic Elites (MAP-Elites). For the other, they just say "island based". So, going back to the first one: MAP-Elites isn't even a new algorithm -- it's been around since 2015, I'd just never heard of it before. I'll try to summarize the basic gist of the idea. Instead of ranking based on a single number which you try to maximize, you have multiple numbers, and you start off coarse-grained and subsequently subdivide the space. So let's say you have a robot simulator, and you have numbers for speed, height, weight, and efficiency. Instead of trying to combine those into a single number, you feed them all into MAP-Elites. It treats them as a 4-dimensional space, starts off with coarse-grained divisions, and progressively makes finer and finer subdivisions of that 4-dimensional space. It incorporates randomness in this process as well. So in this way, you can find a good robot design that is fast, tall, light, and efficient (assuming you want to maximize all those numbers -- if you want a short robot you can flip that number). MAP-Elites can discover relationships between the numbers; for example, if tall robots are usually fast, it will notice that.
As for "islands", this is based on the idea that if you break a population into separate subpopulations, like Darwin's finches on the Galapagos Islands, hence the term "islands", then the subpopulations can evolve in different directions and optimize different things. This helps prevent the system from getting stuck in a "local optimum" that is not the "global optimum".
The islands technique comes from the previous program, so here's where it might be good to mention that AlphaEvolve isn't the first of its kind, but an attempt to outperform a previous DeepMind program called FunSearch, and FunSearch used the "island" technique. AlphaEvolve is basically FunSearch on steroids: FunSearch evolves a single function, while AlphaEvolve evolves entire code files. FunSearch evolves up to 10-20 lines of code, while AlphaEvolve evolves up to hundreds of lines of code. FunSearch evolves code in Python only, while AlphaEvolve evolves code in any language. FunSearch needs fast evaluation (less than or equal to 20 minutes on 1 CPU), while AlphaEvolve can evaluate for hours, in parallel, on Google's TPU accelerators. FunSearch generates millions of samples from LLMs, while AlphaEvolve needs mere thousands of LLM generations. FunSearch requires that small LLMs be used and does not benefit from larger LLMs, while AlphaEvolve benefits from state-of-the-art LLMs (this being DeepMind, owned by Google, all the LLMs they used were Gemini models). FunSearch relies on only minimal context (only previous solutions), while AlphaEvolve employs rich context and feedback in its prompts. And finally, FunSearch optimizes only a single metric, while AlphaEvolve can simultaneously optimize multiple metrics, which is what we've just been talking about.
So the way the process works is more or less as follows: First you sample the database of existing programs using your MAP-Elites and "island" algorithms. You take those and ask LLMs for ideas on how to improve the programs to increase the score on the evaluation function. Then you take those programs and the ideas on how to improve the programs and prompt the LLMs to actually change the programs. These changes are expressed in a simplified "diff" format, if you're familiar with "diffs" and how they express the differences in program files. Each result obtained in this manner is applied to the programs, generating new "child" programs. These "child" programs have to be evaluated, so they are passed to the evaluator function that, remember, you, the human, have to provide. These programs, along with their evaluation results, are then added to the database of programs, and the cycle repeats.
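Here's a heavily compressed sketch in Go of the shape of that loop, as I understand it from the paper. To be clear, this is my own paraphrase, not DeepMind's code: callLLM and applyDiff are hypothetical placeholders standing in for real model calls and real diff application, and the sampling step is where the MAP-Elites and island machinery would actually live.

```go
package main

import "fmt"

// Program is an entry in the evolving database: source code plus its evaluation results.
type Program struct {
	Source string
	Scores map[string]float64 // the evaluation function can return multiple named metrics
}

// Evaluator is supplied by you, the human, along with the initial program.
type Evaluator func(source string) map[string]float64

func callLLM(prompt string) string {
	// Placeholder: a real implementation would call an LLM API (Gemini, in their case).
	return "(model output placeholder)"
}

func applyDiff(source, diff string) string {
	// Placeholder: a real implementation would parse and apply the diff-formatted edits.
	return source + "\n// edit derived from: " + diff
}

func evolve(db []Program, eval Evaluator, iterations int) []Program {
	for i := 0; i < iterations; i++ {
		// 1. Sample a parent program from the database (AlphaEvolve uses MAP-Elites
		//    plus island-style sampling here; this sketch just cycles through).
		parent := db[i%len(db)]

		// 2. Ask the LLM for improvement ideas, then for concrete changes as a diff.
		ideas := callLLM("How could this program score higher?\n" + parent.Source)
		diff := callLLM("Express these ideas as a diff:\n" + ideas + "\n" + parent.Source)

		// 3. Apply the diff to produce a child program, then evaluate it.
		child := Program{Source: applyDiff(parent.Source, diff)}
		child.Scores = eval(child.Source)

		// 4. Add the child and its scores back into the database, and repeat.
		db = append(db, child)
	}
	return db
}

func main() {
	eval := func(src string) map[string]float64 {
		return map[string]float64{"score": float64(len(src))} // toy metric
	}
	db := []Program{{Source: "// initial program supplied by the human", Scores: eval("")}}
	db = evolve(db, eval, 3)
	fmt.Println(len(db), "programs in the database")
}
```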
At this point you may have noticed that what gets input into your evaluation function is a program. This program can express the solution to the problem you are trying to solve more-or-less directly, but it doesn't have to. It can be a program that constructs a solution, and then that solution is what you ultimately evaluate. Furthermore, the best way of constructing solutions may vary widely depending on what type of problem you are trying to solve. Some problems may favor a concise construction from scratch, while others may "construct" a solution by algorithmically searching a customized search space.
At this point you might be wondering what problems they solved with this system to demonstrate it. Their biggest claim to fame is inventing a bunch of new matrix "multiplication" algorithms. I put "multiplication" in quotes because matrix "multiplication" isn't multiplication like what you learned in elementary school; it's something entirely different that just happens to be called "multiplication" for some reason I don't know. Math and science are full of confusing terms. That's just how life is. Anyway, AlphaEvolve came up with 14 new matrix "multiplication" algorithms, and one of them, the 4x4 matrix "multiplication" algorithm, does its thing with only 48 multiplications (actual multiplications this time -- no quotes). The naïve approach that you would use if doing it by hand takes 64 multiplications. The best known algorithm before AlphaEvolve, the Strassen algorithm (named for Volker Strassen) from 1969, requires 49.
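If you want to see where the 64 comes from, here's the schoolbook 4x4 multiply in Go with a counter on the scalar multiplications. This is just an illustration of the baseline; the clever algorithms get the count down by rearranging the arithmetic, which this sketch doesn't attempt.

```go
package main

import "fmt"

// multiply4x4 is the naive "rows times columns" algorithm: for each of the 16 output
// entries it does 4 scalar multiplications, for 16 * 4 = 64 in total.
func multiply4x4(a, b [4][4]float64) (c [4][4]float64, mulCount int) {
	for i := 0; i < 4; i++ {
		for j := 0; j < 4; j++ {
			for k := 0; k < 4; k++ {
				c[i][j] += a[i][k] * b[k][j] // one scalar multiplication per term
				mulCount++
			}
		}
	}
	return c, mulCount
}

func main() {
	var a, b [4][4]float64
	for i := 0; i < 4; i++ {
		for j := 0; j < 4; j++ {
			a[i][j] = float64(i + j)
			b[i][j] = float64(i * j)
		}
	}
	_, count := multiply4x4(a, b)
	fmt.Println("scalar multiplications:", count) // prints 64; Strassen-style gets 49, AlphaEvolve 48
}
```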
The team didn't stop there. They write:
"To assess its capabilities, we apply AlphaEvolve to a curated set of over 50 mathematical problems, spanning more than five different branches of mathematics, including analysis, combinatorics, number theory, and geometry, evaluated across numerous specific parameter settings (e.g., different dimensions or sizes). In 75% of the cases AlphaEvolve rediscovered the best known constructions, and in 20% of the cases it discovered a new object that is better than a previously known best construction, thereby improving the state of the art. In all these cases, the initial starting point was a simple or a random construction."
Under "analysis" (by which I assume they mean "real analysis", i.e. calculus-type problems), they cite "autocorrelation inequalities." These look like integrals with inequalities like "less than" or "less than or equal to" instead of "equals". They're called "autocorrelation" because these equations come from signal processing and they relate some aspect of a signal to some other aspect of the same signal. They say, "AlphaEvolve was able to improve the best known bounds on several autocorrelation inequalities."
Under "uncertainty principles", they say, "AlphaEvolve was able to produce a refined configuration for a problem arising in Fourier analysis, by polishing an uncertainty principle construction, leading to a slightly better upper bound."
Under "combinatorics and number theory", they cite Erdős's minimum overlap problem. "AlphaEvolve established a new upper bound for the minimum overlap problem, slightly improving upon the previous record."
Under "geometry and packing", they cite the "kissing number problem". This is about spheres that are said to be "kissing" if they touch at a point, and the idea is to have one central sphere and ask how many spheres can "kiss" the central sphere. Jam in too many and they'll start to dislodge previously kissing spheres. What's the maximum number? Except here, we're talking about the equivalent of spheres in 11 dimensions instead of 3 dimensions. "In 11 dimensions, AlphaEvolve improved the lower bound on the kissing number, finding a configuration of 593 non-overlapping unit spheres that can simultaneously touch a central unit sphere, surpassing the previous record of 592."
Under "Packing problems", they say, "AlphaEvolve achieved several new results in packing problems, such as packing N points in a shape to minimize the ratio of the maximum and minimum distance, packing various polygons in other polygons in the most efficient way, and variants of the Heilbronn problem concerning point sets avoiding small-area triangles."
"The full list of problems appears in Appendix B."
In Appendix B you can see the "packing unit regular hexagons inside a regular hexagon" problem that was featured in the Matt Parker video.
I'm glossing over the improved algorithms they came up with for Google's data center "orchestration" system (called Borg; the algorithm will probably come out for Kubernetes soon), for hardware design for matrix multiplication (in the hardware description language Verilog), and for highly specialized programs for Google's tensor processing units (TPUs) (these are called "Pallas kernels" -- I feel reluctant to use the term "kernel" because the term "kernel" is used for lots of other things in machine learning which have nothing to do with this), all of which you can read about in the blog post you get to from this link.
Text
"Character.AI, one of the leading AI companion bot apps on the market, is fighting for the dismissal of a wrongful death and product liability lawsuit concerning the death of 14-year-old Sewell Setzer III."
"In a hearing last week, Character.AI zeroed in on its core argument: that the text and voice outputs of its chatbots, including those that manipulated and harmed Sewell, constitute protected speech under the First Amendment."
"Character.AI claims that a finding of liability in the Garcia case would not violate its own speech rights, but its users' rights to receive information and interact with chatbot outputs as protected speech. Such rights are known in First Amendment law as 'listeners rights,' but the critical question here is, 'If this is protected speech, is there a speaker or the intent to speak?'"
"Character.AI's argument suggests that a series of words spit out by an AI model on the basis of probabilistic determinations constitutes 'speech,' even if there is no human speaker, intent, or expressive purpose."
Text
"Generative AI (gen AI) has revolutionized workplaces, allowing professionals to produce high-quality work in less time. Whether it's drafting a performance review, brainstorming ideas, or crafting a marketing email, humans collaborating with gen AI achieve results that are both more efficient and often superior in quality. However, our research reveals a hidden trade-off: While gen AI collaboration boosts immediate task performance, it can undermine workers' intrinsic motivation and increase feelings of boredom when they turn to tasks in which they don't have this technological assistance."
"In four studies involving more than 3,500 participants, we explored what happens when humans and gen AI collaborate on common work tasks. Participants completed real-world professional tasks, such as writing Facebook posts, brainstorming ideas, and drafting emails, with or without gen AI. We then assessed both task performance and participants' psychological experiences, including their sense of control, intrinsic motivation, and levels of boredom."
"Gen AI enhanced the quality and efficiency of tasks. For instance, performance reviews written with gen AI were significantly longer, more analytical, and demonstrated a more helpful tone compared to reviews written without assistance. Similarly, emails drafted with gen AI tended to use warmer, more personable language, containing more expressions of encouragement, empathy, and social connection, compared to those written without AI assistance."
"Despite the performance benefits, participants who collaborated with gen AI on one task and then transitioned to a different, unaided task consistently reported a decline in intrinsic motivation and an increase in boredom. Across our studies, intrinsic motivation dropped by an average of 11% and boredom increased by an average of 20%. In contrast, those who worked without AI maintained a relatively steady psychological state."
"gen AI collaboration initially reduces workers' sense of control -- the feeling of being the primary agent of their work. Sense of control is a key component of intrinsic motivation: When people feel that they are not fully in charge of the output, it can undermine their connection to the task. However, we found that transitioning back to solo work restores this sense of control."
"These findings carry important implications for the future of work."
This is from the Harvard Business Review article written by the researchers.
I looked at the study itself. The tasks were: composing a Facebook post followed by an alternative use (for an aluminum can) test, writing a performance review followed by a product improvement (idea generation) task, and writing a welcome email followed by a product promotion (idea generation) task.
This is for studies 1-3. What immediately struck me as weird about it is that the first task is a convergent task and the second is a divergent task. That immediately makes the results suspect. The researchers seem unaware of this, since the words "convergent" and "divergent" never appear in the research paper. But, put simply, a task is "convergent" if everyone who does the task should "converge" on the same result. If you ask 100 people, "What's 2+2?", all of them (hopefully) will converge on the same result. A task is "divergent" if everyone who does the task should produce different results. "What are all possible uses for a blanket and a brick?" is an example of a "divergent" task. You'll get different results from everyone you ask, and your list of best uses might include items from many different people. If you asked 100 people and made a "top 10" list, each item on the top 10 could be from a different person.
Maybe somebody noticed there was something weird about this design because in study 4, they switched it up. In study 4, they chose 2 of the tasks, writing the Facebook post and composing the welcome email, and had people do them in random order. So half of the people did the Facebook post first and welcome email second and half of the people did the welcome email first and the Facebook post second. They further randomly did all 4 combinations of solo/solo, AI-collaboration/solo, solo/AI-collaboration, and AI-collaboration/AI-collaboration.
The paper has a lot of confusing tables and graphs showing the results. I tried looking through the tables for low p-values (indicating statistically significant results). The first 3 studies do show an increase in sense of control, a decrease in intrinsic motivation, and an increase in boredom when switching from an AI-collaboration task to a solo task. In study 4, when using AI, people's sense of control is lower, and when doing the task entirely by oneself, sense of control is higher, so that result carries over robustly. That holds true regardless of which task is the AI-assisted one or which order they are done. Intrinsic motivation is always lower on the second task, but when switching from an AI-assisted task to a non-AI-assisted task, it goes down a lot more, indicating that indeed, the AI-collaboration has an effect. (Other than saying they measured "intrinsic motivation" with 4 questions that each have a 7-point scale, they don't say how exactly they measured "intrinsic motivation". I guess if I was so motivated, I could track down the study they reference where presumably the exact questions are revealed.) Boredom increased when going from the first task to the second task, and at least in study 4, it didn't look to me like it mattered which task or whether any were or were not AI-assisted. Maybe by the time they started the second task, people were starting to realize they were participating in a psychological experiment and started wondering what's for lunch.
All in all, an interesting bit of research, but having gone through the original research paper, I feel like more research should be done and not too much stock should be put into this one paper. (I feel kinda bad saying that because they used a huge number of participants and were obviously trying to produce rigorous results.) In addition to the issue of convergent vs divergent tasks I noted above, I think this research is too short-term, and, at least for me personally, it didn't study software development, so none of its conclusions should be extrapolated to what I do, which is software development. The study asked people to do tasks that take minutes, not days or weeks or months or years, so we don't know what psychological effect there might be from AI-assisted work over long time horizons. Maybe the effects described here are short-term and go away, or maybe they compound and have even more dramatic effect over longer timeframes. We don't know. And it would be interesting to see a study focused specifically on software development tasks.
Text
youtube
The Kubernetes project does not allow AI-generated code, according to Kat Cosgrove, the engineer in charge of Kubernetes's release process.
Kubernetes is one of those technologies that you all use every day, probably, even if you don't know it. It's a system for making "cloud" services highly scalable by using a cluster of computers with "containers". Originally Docker containers, but now Kubernetes works with various open source container systems. The idea is you take some software, say a web server, and you package it up into a "container". If your service goes from thousands of users to millions to billions, you tell Kubernetes to create more and more instances of your web server container on your cluster -- and you add more machines to your cluster as you need to. You can have containers for database servers or whatever you want. Containers can contain whatever software you write. Whenever you use a mobile app, there's probably server-side code that it interacts with, and for the big apps, it's probably using Kubernetes. People call Kubernetes a "container orchestration" system.
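To make the scaling idea concrete, here's a small sketch using the Go client-go library to bump a hypothetical "web" Deployment up to more replicas. The names and kubeconfig path are made up for the example; in practice you'd more likely run a kubectl command or let an autoscaler handle this.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load cluster credentials from a local kubeconfig file (path is an assumption).
	config, err := clientcmd.BuildConfigFromFlags("", "/home/me/.kube/config")
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// Fetch the Deployment that manages the web server containers...
	deployments := clientset.AppsV1().Deployments("default")
	web, err := deployments.Get(context.TODO(), "web", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}

	// ...and ask Kubernetes to run more copies of it across the cluster.
	replicas := int32(10)
	web.Spec.Replicas = &replicas
	if _, err := deployments.Update(context.TODO(), web, metav1.UpdateOptions{}); err != nil {
		panic(err)
	}
	fmt.Println("asked Kubernetes for", replicas, "replicas of the web deployment")
}
```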
Anyway, engineers are allowed to use AI for their own edification, to analyze code or generate documentation for their own use, but are not allowed to use it as part of the official release. Engineers are not only not allowed to use AI to generate code for Kubernetes releases, they're not allowed to use AI to write documentation, which has to be all human-written and is taken very, very seriously. High-quality documentation is considered key to Kubernetes's success.
This is all about 52 minutes into a conversation about Kubernetes's history and how its current release process came to be. I mention it because I followed along with no problem up to this point but was suddenly surprised by the anti-AI position taken. Maybe I shouldn't have been -- I've been sharing stories with you all of various AI failures and shortcomings -- but it seems like failures and shortcomings have no effect on the industry's trajectory and everyone everywhere is requiring -- not just allowing but requiring -- engineers to use AI. Kubernetes has taken the opposite tack and prohibited it. It's a big deal because Kubernetes is the world's second largest open source project -- after Linux itself (which Kubernetes is built on).