#any time ML boosters try to say that ML is great for science

cellarspider · 1 year ago


I work at a biological research institute. The labs that know what they're doing don't use Large Language Models. They're focused on using Linear Mixed Models (confusingly acronym'd to LMMs). Lots of image segmentation and video tracking. This is still tetchy and needs a lot of careful planning to make it work right, but the most successful project has actually taken a lot of painful drudgery away from lab techs and students. They've been working on those projects since before the current craze. They're knowledgeable and realistic about the limitations of what they can achieve with it.

...The labs that don't seem to have a strong basis in machine learning are applying Large Language Models to whatever projects they can think of. I sat in on a computational science group mini-symposium a few months ago, and some of the computational scientists had been internally contracted to build tools based on ChatGPT and other LLMs. Trying to do automatic annotation or summarization of complex results, if I remember correctly. The computational scientists didn't seem to fully understand the biology they were trying to measure, and provided no good metrics for how they were scoring the reliability of their results.

So, you have people requesting LLMs who don't know what they can do, paired with people who can implement LLMs without understanding the end goal. The projects themselves were mostly of low impact, and those that were more potentially impactful were the least well-characterized.

A lot of the stuff we deal with is out at the edge of known biology, and in a lot of cases, ML simply acts as an extra unknown factor that adds needless complexity. A lot of boosters of ML do not seem to understand that, because you need to know both the subject matter and the method to assess whether the method is appropriate to the task.

And let's recall, LLMs are not based on some fundamentally new ML architecture. They are just Large. New tech has allowed them to run faster and pull in more data, which has, as said above, not been well-vetted. This is absolutely a fad.

Thank the hecking stars I've managed to train the biologists in my group to be wary of LLMs.

#of course that's not even getting into the ethics issues #or the reproducibility issues #any time ML boosters try to say that ML is great for science #it's not their ML they're actually talking about #and I cannot stress enough that LLMs are not built on new architecture #I'm reminded of all the combinatorial drug trials industry was doing last time I sat in on an immunology short course a couple years ago #find one thing that might work and then just combine it with everything else like you're playing a point-and-click adventure game #and you've run out of ideas but you figure ONE of these has got to work
