squeeze1749
4 posts
squeeze1749 · 5 months ago
Text
It runs slower because the JavaScript frontend is the bottleneck on a slow device.
DeepSeek V3 and R1 are 600+ billion parameters each, so you immediately have a lower bound of 600 GB even if every parameter takes only 1 byte (usually they're 2 or 4 bytes). If you want, you can go on Hugging Face and download it yourself to see how long it takes on your internet connection. ChatGPT doesn't make you download terabytes of data just to get an answer. Massive LLMs are not running in the browser.
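For a sense of scale, here's a quick back-of-the-envelope sketch of the weight storage alone, using the 600B lower bound from above (real deployments also need activation and KV-cache memory on top of this):

```python
# Back-of-the-envelope: memory needed just to hold the weights.
# 600e9 is a lower bound on the parameter count; DeepSeek V3's published
# total is ~671B, but the conclusion is the same either way.
params = 600e9

for name, bytes_per_param in [("int8/fp8", 1), ("fp16/bf16", 2), ("fp32", 4)]:
    gb = params * bytes_per_param / 1e9
    print(f"{name:>10}: {gb:>6,.0f} GB")

# Output:
#   int8/fp8:    600 GB
#  fp16/bf16:  1,200 GB
#       fp32:  2,400 GB
```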
That's the value these companies provide. They run the model on their fast servers. You pay them to deal with that technical nightmare so you don't have to.
As for the export controls, DeepSeek apparently has a huge cluster of H100s. The point is that the export controls did not actually control the exports. This model is not a product of them scrimping and saving and making do with what little they had. This is one of the best models in the world, trained on one of the best compute clusters in the world. "Compute scale alone" is still the best predictor of model quality.
well. in AI news, it seems like a Chinese company has managed to match OpenAI's most advanced models in efficacy and is taking it to market for 3% of what OpenAI charges.
which may be the first tolling of the bell for the ai bubble to finally pop
5K notes · View notes
squeeze1749 · 5 months ago
Text
They don't really need motor inputs in training data. By the time they're transferring the model to the real world, it already knows how to move around. But it doesn't know how to handle all the crazy stuff it's never seen before when it has to actually touch grass for the first time. There just isn't very much video footage of robots doing things, and there isn't much first-person footage of people doing mundane things.
My vague understanding of the embodied pipeline is that most of it happens in simulated environments and then they try to transfer that to the real world. It's much cheaper to run a million simulations than it is to build a million robots or wait for one robot to do a million things. Boston Dynamics often talks about this approach.
We can make robots do basically whatever we want in simulations now. The hurdle is transferring that to the real world.
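To make the sim-to-real idea concrete, here's a toy sketch of the usual trick, domain randomization: train across simulators with randomly perturbed physics so the controller can't overfit to any one of them, then hope the real world looks like just another draw. Everything below (the dynamics, the parameter ranges, the random-search "training") is made up for illustration and isn't anyone's actual pipeline.

```python
import random

# Toy domain randomization: learn a single gain k for a damped
# position controller, but randomize the "physics" (drag) every
# episode. The learned k then transfers to a drag value it never saw.

def rollout(k, drag, steps=200, dt=0.05, target=1.0):
    x, v, cost = 0.0, 0.0, 0.0
    for _ in range(steps):
        a = k * (target - x) - drag * v   # acceleration from controller + drag
        v += dt * a
        x += dt * v
        cost += (target - x) ** 2         # penalize distance from the target
    return cost

def train(episodes=2000):
    best_k, best_cost = 0.0, float("inf")
    for _ in range(episodes):
        k = random.uniform(0.0, 5.0)      # crude random-search stand-in for RL
        # the key step: a fresh, randomly drawn simulator every episode
        cost = sum(rollout(k, drag=random.uniform(0.2, 2.0)) for _ in range(5))
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k

k = train()
print("learned gain:", round(k, 2))
print("cost on unseen 'real' drag:", round(rollout(k, drag=1.3), 2))
```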
Meta has definitely been working on building up the foundations for embodied stuff with SegmentAnything and the Ego-Exo4D dataset and probably a bunch of other stuff I'm not remembering right now. But Ego-Exo4D is only 1300 hours of video and we can't generate synthetic data for that yet. I don't think VR would help them because they don't need more simulated data. I think the always-on first-person cameras in their sunglasses are a better way to scrape the human experience at scale.
I think they were betting on remote work and education and basically becoming the next generation of the Chromebook.
They also need some way to make money that isn't illegal in the EU (and increasingly the rest of the world). Google has GCP and various doodads and can sell SEO ranking, Apple has phones and laptops and can sell the default search engine slot in Safari, Microsoft has laptops and software licenses and Azure, Amazon has AWS and Amazon, Netflix can sell access to content. Meta has nothing except the world's most lucrative surveillance network. No cloud services, no physical devices, no software or content to license. So I guess they figured VR devices were the easiest market to break into, since everyone else thinks it's a stupid product and won't try to compete.
Taken at face value the metaverse is so obviously stupid that I have a hard time understanding why Facebook would invest anything in it at all. I kinda wonder if originally the idea was to subsidize data production for embodied AI? which may or may not work, but would be potentially non-stupid.
Like fundamentally the big issue in robotics and embodied AI is that you can't train the robotic version of GPT because you don't have the data. You need millions of hours, not just of video footage, but of video footage in which people are manipulating objects with motor inputs that you can directly record, and there's just no way you can get that much data manually. From a business standpoint then, the metaverse stuff would have a clear purpose -- get enough humans producing motor input/video data that you can train an embodied AI from it, and then have exclusive ownership of the data needed to make humanoid robotics. In this view, the metaverse doesn't need to be self-sustaining, it just needs to be cheaper than hiring thousands of people to wear specialized recording equipment and mess around with objects for years at a time.
Of course this is probably not correct, a fallacy of trying to make sense from senseless action. Sometimes companies just do really stupid stuff. But it's at least plausible enough to keep an eye on.
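To make the "motor input/video data" idea concrete: the simplest way to train on that kind of data is behavior cloning, i.e., supervised learning on logged (observation, action) pairs. A minimal sketch with synthetic stand-ins (a real pipeline would use video frames and a deep network, but the training signal is the same):

```python
import numpy as np

# Behavior cloning in miniature: fit a policy that maps observations to
# the actions a human actually took. The arrays below are synthetic
# stand-ins for "first-person video features" and "recorded motor inputs".

rng = np.random.default_rng(0)
obs_dim, act_dim, n = 64, 8, 10_000                # e.g. frame features -> joint commands

demo_policy = rng.normal(size=(obs_dim, act_dim))  # the behavior we want to imitate
observations = rng.normal(size=(n, obs_dim))
actions = observations @ demo_policy + 0.1 * rng.normal(size=(n, act_dim))

# Supervised fit of a linear policy by least squares (a deep net trained
# with SGD plays this role at scale).
learned_policy, *_ = np.linalg.lstsq(observations, actions, rcond=None)

pred = observations @ learned_policy
print("mean imitation error:", float(np.mean((pred - actions) ** 2)))
```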
12 notes · View notes
squeeze1749 · 6 months ago
Text
Better architectures don't really come from searching the existing architecture space. Convolution layers are a new thing that you won't ever find if you only search for better ways to arrange a multilayer perceptron. AlphaZero is a new way to do RL. Attention is a new thing people bolted onto LSTMs and RNNs, and then later it became the transformer. None of these is a straightforward rearrangement of existing pieces.
Evolutionary algorithms are optimization algorithms. They find local minima just like gradient descent and every other optimization algorithm. They don't generate new ideas or create new capabilities.
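A toy illustration of that point: a bare-bones (1+1)-style evolutionary loop is just mutate-and-keep-the-improvement, and it settles into whatever basin it starts in, the same way gradient descent would. The function and numbers below are made up for the demo:

```python
import random

# Minimal (1+1) evolutionary strategy on a 1-D loss with two minima:
# a better one near x = -1 and a worse local one near x = +1.
def loss(x):
    return (x ** 2 - 1) ** 2 + 0.3 * x

x = 1.5                                   # start in the basin of the worse minimum
for _ in range(5000):
    child = x + random.gauss(0.0, 0.05)   # mutate
    if loss(child) < loss(x):             # select: keep only improvements
        x = child

print("converged to x =", round(x, 3), "loss =", round(loss(x), 3))
# It ends up near x ≈ 0.96 (the local minimum), never finding the better
# one near -1 -- local optimization, not new ideas.
```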
An architecture can't be trained to look like something that doesn't exist yet. That's a basic limitation of machine learning: a model learns from whatever data it's given; it can't wave a wand, fix itself, and train on some better data that doesn't exist yet.
Eventually, they'll get RL and NLP working together well enough to create self-improving models. OpenAI is doing work in that direction with the "o" series of models and others probably will too. But the "self-improvement" will still be the same scientific method everyone uses today. Only automated. It will still have to train lots of small models as experiments and so on. The scientific method is the only way to create better architectures, is my point. No one can shortcut that by training a model that just knows what the best architecture is by magic.
still chewing on neural network training and I don't like the way that the network architecture has to be figured out by trial and error, that feels like something that should be learned too!
evolutionary algorithms would achieve that but only at enormous cost: over 300,000 new human brains are created each day but new LLMs only come out at a rate of what, a handful a year? a dozen?
(on top of biological evolution we also have cultural evolution, which LLMs can benefit from if they can access external resources, and also market mechanisms for resource allocation which LLMs are also subject to).
27 notes · View notes
squeeze1749 · 3 years ago
Text
On the internet no one knows no one knows you exist
12 notes · View notes