#lstm
Explore tagged Tumblr posts
allthelovetoart · 5 months ago
Text
Tumblr media Tumblr media Tumblr media
I will always reassemble to fit perfectly in you
Lonely is the Muse - Halsey
Background Photo behind album // A camera sitting on top of a table next to a book -
Daizy Isumi
11 notes · View notes
volansystechnologies · 1 year ago
Text
0 notes
girlwithmanyproblems · 2 years ago
Text
guys how to apply regression and then lstm ?
1 note · View note
israbelle · 1 year ago
Text
AGI means artificial general intelligence, the theoretical "computer with a soul" who crosses the borders of being alive so that would definitely cause more confusion lmaoo
Tumblr media
71K notes · View notes
skilldux · 8 months ago
Text
Long Short-Term Memory (LSTM) neural networks have become a potent tool in the fast-developing field of artificial intelligence (AI) for processing sequential input. There are many courses available to help you master LSTM in deep learning, regardless of your level of experience. This post will walk you through the fundamentals of LSTM neural networks and provide a list of some of the top online training programs.
0 notes
softlabsgroup05 · 1 year ago
Text
Tumblr media
Discover the fundamentals of artificial intelligence through our Convolutional Neural Network (CNN) mind map. This visual aid demystifies the complexities of CNNs, pivotal for machine learning and computer vision. Ideal for enthusiasts keen on grasping image processing. Follow Softlabs Group for more educational resources on AI and technology, enriching your learning journey.
0 notes
culturesupport · 1 year ago
Text
Tumblr media Tumblr media Tumblr media Tumblr media Tumblr media Tumblr media Tumblr media Tumblr media Tumblr media Tumblr media
Do you think you could be built as a multi-robot person and fall in love with a robot through your own sense of feeling, after your body is copied into a robot body?
Feeling other robots' love through connected robot feelings?
0 notes
neurospring · 3 months ago
Text
History and Basics of Language Models: How Transformers Changed AI Forever - and Led to Neuro-sama
I have seen a lot of misunderstandings and myths about Neuro-sama's language model. I have decided to write a short post going into the history and current state of large language models, explaining how they work and how Neuro-sama works! Let's start with some history.
Before the beginning
Before the language models we are used to today, models like RNNs (Recurrent Neural Networks) and LSTMs (Long Short-Term Memory networks) were used for natural language processing, but they had a lot of limitations. Both of these architectures process words sequentially, meaning they read text one word at a time in order. This made them struggle with long sentences: they could almost forget the beginning by the time they reached the end.
Another major limitation was computational efficiency. Since RNNs and LSTMs process text one step at a time, they can't take full advantage of modern parallel computing hardware like GPUs. These fundamental limitations meant that such models could never be nearly as smart as today's models.
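To make the "one step at a time" point concrete, here is a minimal PyTorch sketch (the sizes and the random stand-in "sentence" are made up for illustration, not taken from any real model) of an LSTM reading a sequence token by token. Each step depends on the previous hidden state, which is exactly what blocks parallelism and forces information from early tokens to survive many updates.

```python
import torch
import torch.nn as nn

# Illustrative sizes only.
vocab_size, embed_dim, hidden_dim = 1000, 64, 128

embedding = nn.Embedding(vocab_size, embed_dim)
lstm_cell = nn.LSTMCell(embed_dim, hidden_dim)

tokens = torch.randint(0, vocab_size, (12,))  # stand-in for a 12-token sentence
h = torch.zeros(1, hidden_dim)                # hidden state
c = torch.zeros(1, hidden_dim)                # cell state

# The loop is the whole problem: step t can't start until step t-1 is done,
# so a GPU can't process the sentence in parallel, and whatever the model
# learned from the first word has to survive every later update of h and c.
for t in range(tokens.shape[0]):
    x_t = embedding(tokens[t]).unsqueeze(0)   # (1, embed_dim)
    h, c = lstm_cell(x_t, (h, c))
```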
The beginning of modern language models
In 2017, a paper titled "Attention Is All You Need" introduced the transformer architecture. It was received positively for its innovation, but no one truly knew just how important it would turn out to be. This paper is what made modern language models possible.
The transformer's key innovation was the attention mechanism, which allows the model to focus on the most relevant parts of a text. Instead of processing words sequentially, transformers process all words at once, capturing relationships between words no matter how far apart they are in the text. This change made models faster and better at understanding context.
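Here is a minimal sketch of the scaled dot-product attention at the core of the transformer (the shapes and random tensors are illustrative only). Every position's query is compared against every position's key in a single matrix multiplication, so words that are far apart in the text are related directly instead of through a long chain of sequential steps.

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (sequence_length, head_dim). All positions are compared
    # with all other positions at once - no word-by-word loop.
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1])
    weights = F.softmax(scores, dim=-1)  # how strongly each word attends to each other word
    return weights @ v

seq_len, head_dim = 10, 64                    # illustrative sizes
q = torch.randn(seq_len, head_dim)
k = torch.randn(seq_len, head_dim)
v = torch.randn(seq_len, head_dim)
out = scaled_dot_product_attention(q, k, v)   # (10, 64)
```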
The full potential of transformers became clearer over the next few years as researchers scaled them up.
The Scale of Modern Language Models
A major factor in an LLM's performance is the number of parameters - which are like the model's "neurons" that store learned information. The more parameters, the more powerful the model can be. The first GPT (generative pre-trained transformer) model, GPT-1, was released in 2018 and had 117 million parameters. It was small and not very capable - but a good proof of concept. GPT-2 (2019) had 1.5 billion parameters - a huge leap in quality, but still really dumb compared to the models we are used to today. GPT-3 (2020) had 175 billion parameters, and it was really the first model that felt actually kinda smart. Training it cost an estimated 4.6 million dollars in compute expenses alone.
Recently, models have become more efficient: smaller models can achieve similar performance to bigger models from the past. This efficiency means that smarter and smarter models can run on consumer hardware. However, training costs still remain high.
How Are Language Models Trained?
Pre-training: The model is trained on a massive dataset to predict the next token. A token is a piece of text a language model can process; it can be a word, a word fragment, or a single character. Even training relatively small models with a few billion parameters requires terabytes of training data and computational resources that cost millions of dollars.
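As a rough illustration of what "predict the next token" means in code, here is a toy sketch (the model is a trivial stand-in rather than a real transformer, and the "dataset" is a single random tensor): the targets are the input shifted by one position, and the model is penalized whenever its guess for the following token is wrong.

```python
import torch
import torch.nn as nn

vocab_size = 50_000                      # made-up tokenizer vocabulary size
model = nn.Sequential(                   # stand-in for a real transformer
    nn.Embedding(vocab_size, 256),
    nn.Linear(256, vocab_size),
)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

# One fake "document" of token ids; real pre-training uses terabytes of text.
tokens = torch.randint(0, vocab_size, (1, 129))
inputs, targets = tokens[:, :-1], tokens[:, 1:]   # predict token t+1 from token t

logits = model(inputs)                            # (1, 128, vocab_size)
loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()
optimizer.step()
```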
Post-training, including fine-tuning: After pre-training, the model can be customized for specific tasks, like answering questions, writing code, casual conversation, etc. Certain post-training methods can help improve the model's alignment with certain values or update its knowledge of specific domains. This requires far less data and computational power compared to pre-training.
The Cost of Training Large Language Models
Pre-training models over a certain size requires vast amounts of computational power and high-quality data. While advancements in efficiency have made it possible to get better performance with smaller models, models can still require millions of dollars to train, even if they have far fewer parameters than GPT-3.
The Rise of Open-Source Language Models
Many language models are closed-source: you can't download them or run them locally. For example, the ChatGPT models from OpenAI and the Claude models from Anthropic are all closed-source.
However, some companies release a number of their models as open-source, allowing anyone to download, run, and modify them.
While the larger models cannot be run on consumer hardware, smaller open-source models can be used on high-end consumer PCs.
An advantage of smaller models is that they have lower latency, meaning they can generate responses much faster. They are not as powerful as the largest closed-source models, but their accessibility and speed make them highly useful for some applications.
So What is Neuro-sama?
Basically no details about the model are shared by Vedal, so I will only share what can be confidently concluded, and only information that wouldn't reveal any sort of "trade secret". What we do know is that Neuro-sama would not exist without open-source large language models. Vedal can't train a model from scratch, but what Vedal can do - and what we can confidently assume he did do - is post-train an open-source model. Post-training a model on additional data can change the way the model acts and can add some new knowledge - however, the core intelligence of Neuro-sama comes from the base model she was built on.
Since huge models can't be run on consumer hardware and would be prohibitively expensive to run through an API, we can also say that Neuro-sama is a smaller model - which has the disadvantage of being less powerful and more limited, but the advantage of low latency. Latency and cost are always going to pose some pretty strict limitations, but because LLMs keep getting more efficient and better hardware is becoming more available, Neuro can be expected to become smarter and smarter in the future.
To end, I have to at least mention that Neuro-sama is more than just her language model, even though the language model is all we talked about in this post. She can be looked at as a system of different parts: her TTS, her VTuber avatar, her vision model, her long-term memory, even her Minecraft AI, and so on, all come together to make Neuro-sama.
Wrapping up - Thanks for Reading!
This post was meant to provide a brief introduction to language models, covering some history and explaining how Neuro-sama can work. Of course, this post is just scratching the surface, but hopefully it gave you a clearer understanding of how language models function and of their history!
31 notes · View notes
argumate · 5 months ago
Text
hithisisawkward said: Master's in ML here: Transformers are not really monstrosities, nor hard to understand. The first step is to go from perceptrons to multi-layered neural networks. Once you've got the hang of those, with their activation functions and such, move on to AutoEncoders. Once you have a handle on the concept of latent space, move to recurrent neural networks. There are many types, so you should get a basic understanding of all of them, from simple recurrent units to something like LSTM. Then you need to understand the concept of attention, and study the structure of a transformer (which is nothing but a couple of recurrent network techniques arranged in a particularly clever way), and you're there. There's a couple of youtube videos that do a great job of it.
thanks, autoencoders look like a productive topic to start with!
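for anyone else following the same path, a minimal pytorch autoencoder sketch (the layer sizes are arbitrary): the encoder squeezes the input into a small latent vector and the decoder tries to rebuild the original from it - that bottleneck is the "latent space" mentioned above.

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        # encoder: compress the input into a small latent vector
        self.encoder = nn.Sequential(nn.Linear(input_dim, 128), nn.ReLU(),
                                     nn.Linear(128, latent_dim))
        # decoder: reconstruct the input from that latent vector
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                     nn.Linear(128, input_dim))

    def forward(self, x):
        z = self.encoder(x)              # a point in latent space
        return self.decoder(z)

model = AutoEncoder()
x = torch.randn(16, 784)                     # e.g. a batch of flattened 28x28 images
loss = nn.functional.mse_loss(model(x), x)   # reconstruction error to minimize
```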
16 notes · View notes
bharatpatel1061 · 21 days ago
Text
Memory and Context: Giving AI Agents a Working Brain
Tumblr media
For AI agents to function intelligently, memory is not optional—it’s foundational. Contextual memory allows an agent to remember past interactions, track goals, and adapt its behavior over time.
Memory in AI agents can be implemented through various strategies—long short-term memory (LSTM) for sequence processing, vector databases for semantic recall, or simple context stacks in LLM-based agents. These memory systems help agents operate in non-Markovian environments, where past information is crucial to decision-making.
In practical applications like chat-based assistants or automated reasoning engines, a well-structured memory improves coherence, task persistence, and personalization. Without it, AI agents lose continuity, leading to erratic or repetitive behavior.
For developers building persistent agents, the AI agents service page offers insights into modular design for memory-enhanced AI workflows.
Combine short-term and long-term memory modules—this hybrid approach helps agents balance responsiveness and recall.
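As one possible illustration of that hybrid idea (a sketch only: the embed function below is a placeholder for a real embedding model, and a production agent would typically use a proper vector database rather than an in-memory list), recent turns stay verbatim in a short-term buffer while evicted turns are archived as vectors and retrieved by similarity when they become relevant again.

```python
from collections import deque
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder embedding; a real agent would call an embedding model here."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(64)

class HybridMemory:
    def __init__(self, short_term_size: int = 5):
        self.short_term = deque(maxlen=short_term_size)  # recent turns, kept verbatim
        self.long_term = []                              # (vector, text) archive

    def add(self, text: str) -> None:
        if len(self.short_term) == self.short_term.maxlen:
            oldest = self.short_term[0]                  # about to be evicted
            self.long_term.append((embed(oldest), oldest))
        self.short_term.append(text)

    def recall(self, query: str, k: int = 2) -> list[str]:
        """Recent turns plus the k archived turns most similar to the query."""
        q = embed(query)
        scored = sorted(
            self.long_term,
            key=lambda item: -float(
                np.dot(item[0], q)
                / (np.linalg.norm(item[0]) * np.linalg.norm(q) + 1e-9)
            ),
        )
        return list(self.short_term) + [text for _, text in scored[:k]]
```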
Image Prompt: A conceptual visual showing an AI agent with layers representing short-term and long-term memory modules.
3 notes · View notes
munmun · 3 months ago
Text
stream of consciousness about the new animation vs. coding episode, as a python programmer
holy shit, my increasingly excited reaction as i realized that yellow was writing in PYTHON. i write in python. it's the programming language that i used in school and currently use at work.
i was kinda expecting a print("hello world") but that's fine
i think using python to demonstrate coding was a practical choice. it's one of the most commonly used programming languages and it's very human readable.
the episode wasn't able to cram every possible concept in programming, of course, but they got a lot of them!
fun stuff like print() not outputting anything and typecasting between string values and integer values!!
string manipulation
booleans
little things like for-loops and while-loops for iterating over a string or list. and indexing! yay :D
* iterable input :D (the *bomb that got thrown at yellow)
and then they started importing libraries! i've never seen the turtle library but it seems like it draws vectors based on the angle you input into a function
the gun list ran out of "bullets" because it kept removing them from the list with gun.pop()
AND THEN THE DATA VISUALIZATION. matplotlib!! numpy!!!! my beloved!!!!!!!! i work in data so this!!!! this!!!!! somehow really validating to me to see my favorite animated web series play with data. i think it's also a nice touch that the blue on the bars appear to be the matplotlib default blue. the plot formatting is accurate too!!!
haven't really used pygame either but making shapes and making them move based on arrow key input makes sense
i recall that yellow isn't the physically strongest, but it's cool to see them move around in space and i'm focusing on how they move and figure out the world.
nuke?!
and back to syntax error and then commenting it out # made it go away
cool nuke text motion graphics too :D (i don't think i could make that motion in python, personally)
and then yellow cranks it to 100,000 to make a neural network in pytorch. this gets into nlp (tokenizers and other modeling)
a CLASS? we touch on some object oriented programming here, but we only see the __init__ function, so the full concept isn't demonstrated.
OH! the "hello world" got broken down into tokens. that's why we see the "hello world" string turn into numbers and then... bits (the 0s and 1s)? the strings are tokenized/turned into values that the model can interpret. it's trying to understand written human language
and then an LSTM?! (long short-term memory)
something something feed-forward neural network
model training (hence the epochs and increasing accuracy)
honestly, the scrolling through the code goes so fast, i had to do a second look through (i'm also not very deeply versed in implementing neural networks but i have learned about them in school). a rough guess at what that kind of code might look like is at the end of this list.
and all of this to send "hello world" to an AI(?) recreation of the exploded laptop
not too bad for a macbook user lol
i'm just kidding, a majority of people used macs in my classes
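for anyone curious, here's that rough guess at the kind of setup the episode flashes past (definitely not the actual code from the episode, just a tiny character-level "tokenizer" plus a small pytorch LSTM trained to predict the next character of "hello world"):

```python
import torch
import torch.nn as nn

text = "hello world"
chars = sorted(set(text))                        # the "tokens" here are just characters
char_to_id = {c: i for i, c in enumerate(chars)}
token_ids = torch.tensor([[char_to_id[c] for c in text]])   # (1, 11)

class TinyLSTM(nn.Module):
    def __init__(self, vocab_size, embed_dim=16, hidden_dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)    # guess the next character

    def forward(self, x):
        out, _ = self.lstm(self.embed(x))
        return self.head(out)

model = TinyLSTM(len(chars))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

for epoch in range(100):                         # "epochs" = passes over the data
    logits = model(token_ids[:, :-1])            # read all but the last character
    loss = loss_fn(logits.reshape(-1, len(chars)), token_ids[:, 1:].reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```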
things i wanna do next since im so hyped
i haven't drawn for the fandom in a long time, but i feel a little motivated to draw my design of yellow again. i don't recall the episode fully using object oriented programming, so i kinda want to make a very simple example where the code is an initialization of a stick figure object and the instances are each of the color gang.
it wouldn't be full blown AI - it would just be me writing everyone's personality traits and colors into a function, essentially, since each stick figure is an individual program.
5 notes · View notes
deepdrearn · 4 months ago
Text
Tumblr media
The author’s lone goal is to show that the entire field might have evolved a different direction if we had instead been obsessed with a slightly different acronym and slightly different result. We take a previously strong language model based only on boring LSTMs and get it to within a stone’s throw of a stone’s throw of state-of-the-art byte level language model results on enwik8. This work has undergone no intensive hyperparameter optimization and lived entirely on a commodity desktop machine that made the author’s small studio apartment far too warm in the midst of a San Franciscan summer. The final results are achievable in plus or minus 24 hours on a single GPU as the author is impatient.
3 notes · View notes
hoadv2 · 6 months ago
Text
📈 Predict stock prices effectively with an LSTM model - a breakthrough opportunity for investors! 💰
🌟 Are you looking for a way to predict stock market trends and optimize your returns? Today I'll introduce a powerful AI method - the LSTM (Long Short-Term Memory) model 🤖. It is a special kind of neural network that is very effective at analyzing time series, helping you predict stock prices more accurately!
💡 Key benefits of the LSTM model: 1️⃣ Excellent handling of time-series data ⏳. 2️⃣ Accurate predictions even with complex data 📊. 3️⃣ Reduced risk thanks to deep analysis of market trends 🔍.
🛠️ In the article on my website, I walk through in detail how to apply an LSTM model to stock price analysis, from collecting and processing the data to building and evaluating the model 💼.
🌐 What will you learn?:
The steps to implement an LSTM model, with full source code.
Tips for optimizing the model for more accurate predictions 💪.
An analysis of common mistakes and how to fix them 🛑.
Don't miss the article if you want to master stock price prediction and improve your investment strategy 📈💵! 👉 How to predict stock prices effectively with an LSTM model
Explore more valuable articles at aicandy.vn
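A minimal PyTorch sketch of the core idea from the article (illustrative only: the price series here is synthetic, and real use would need downloaded stock data, feature scaling, and a proper train/test split): slide a fixed window over the price history and train an LSTM to predict the next value.

```python
import torch
import torch.nn as nn

# Synthetic price series standing in for real (scaled) stock data.
prices = torch.sin(torch.linspace(0, 20, 500)) + 0.1 * torch.randn(500)

window = 30                                       # days of history per sample
X = torch.stack([prices[i:i + window] for i in range(len(prices) - window)])
y = prices[window:]                               # next-day value for each window
X = X.unsqueeze(-1)                               # (samples, window, 1 feature)

class PriceLSTM(nn.Module):
    def __init__(self, hidden_dim=64):
        super().__init__()
        self.lstm = nn.LSTM(1, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, x):
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :]).squeeze(-1)   # predict from the last time step

model = PriceLSTM()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(20):
    pred = model(X)
    loss = loss_fn(pred, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```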
4 notes · View notes
lastscenecom · 1 year ago
Quote
Convolutional Neural Networks (CNNs): CNNs are the heavy artillery of the computer vision world. Thanks to their specialized layers, they have a talent for recognizing spatial patterns in images. That ability makes them good at recognizing images, finding the objects in them, and classifying what they see. They are the reason your phone can tell a dog from a cat in a photo. Recurrent Neural Networks (RNNs): RNNs come with a kind of memory, which makes them ideal for anything involving sequences of data: sentences, DNA sequences, handwriting, stock market trends. They loop information back so they can remember earlier inputs in a sequence, which makes them good at tasks like predicting the next word in a sentence or understanding spoken language. Long Short-Term Memory Networks (LSTMs): LSTMs are a special kind of RNN built to remember things over long stretches of time. They were designed to solve the problem of RNNs forgetting what they have seen over long sequences. If you are dealing with complex tasks that require holding on to information for a long time, like translating paragraphs or predicting what happens next in a TV series, LSTMs are the way to go. Generative Adversarial Networks (GANs): Imagine a cat-and-mouse game between two AIs: one generates fake data (such as images), and the other tries to catch what is fake and what is real. That is a GAN. This setup lets GANs create incredibly realistic images, music, text, and more. They are the artists of the neural network world, generating new, realistic data from scratch.
The Math Behind Neural Networks | Towards Data Science
6 notes · View notes
girlwithmanyproblems · 11 months ago
Text
3rd July 2024
Goals:
Watch all Andrej Karpathy's videos
Watch AWS Dump videos
Watch 11-hour NLP video
Complete Microsoft GenAI course
GitHub practice
Topics:
1. Andrej Karpathy's Videos
Deep Learning Basics: Understanding neural networks, backpropagation, and optimization.
Advanced Neural Networks: Convolutional neural networks (CNNs), recurrent neural networks (RNNs), and LSTMs.
Training Techniques: Tips and tricks for training deep learning models effectively.
Applications: Real-world applications of deep learning in various domains.
2. AWS Dump Videos
AWS Fundamentals: Overview of AWS services and architecture.
Compute Services: EC2, Lambda, and auto-scaling.
Storage Services: S3, EBS, and Glacier.
Networking: VPC, Route 53, and CloudFront.
Security and Identity: IAM, KMS, and security best practices.
3. 11-hour NLP Video
NLP Basics: Introduction to natural language processing, text preprocessing, and tokenization.
Word Embeddings: Word2Vec, GloVe, and fastText.
Sequence Models: RNNs, LSTMs, and GRUs for text data.
Transformers: Introduction to the transformer architecture and BERT.
Applications: Sentiment analysis, text classification, and named entity recognition.
4. Microsoft GenAI Course
Generative AI Fundamentals: Basics of generative AI and its applications.
Model Architectures: Overview of GANs, VAEs, and other generative models.
Training Generative Models: Techniques and challenges in training generative models.
Applications: Real-world use cases such as image generation, text generation, and more.
5. GitHub Practice
Version Control Basics: Introduction to Git, repositories, and version control principles.
GitHub Workflow: Creating and managing repositories, branches, and pull requests.
Collaboration: Forking repositories, submitting pull requests, and collaborating with others.
Advanced Features: GitHub Actions, managing issues, and project boards.
Detailed Schedule:
Wednesday:
2:00 PM - 4:00 PM: Andrej Karpathy's videos
4:00 PM - 6:00 PM: Break/Dinner
6:00 PM - 8:00 PM: Andrej Karpathy's videos
8:00 PM - 9:00 PM: GitHub practice
Thursday:
9:00 AM - 11:00 AM: AWS Dump videos
11:00 AM - 1:00 PM: Break/Lunch
1:00 PM - 3:00 PM: AWS Dump videos
3:00 PM - 5:00 PM: Break
5:00 PM - 7:00 PM: 11-hour NLP video
7:00 PM - 8:00 PM: Dinner
8:00 PM - 9:00 PM: GitHub practice
Friday:
9:00 AM - 11:00 AM: Microsoft GenAI course
11:00 AM - 1:00 PM: Break/Lunch
1:00 PM - 3:00 PM: Microsoft GenAI course
3:00 PM - 5:00 PM: Break
5:00 PM - 7:00 PM: 11-hour NLP video
7:00 PM - 8:00 PM: Dinner
8:00 PM - 9:00 PM: GitHub practice
Saturday:
9:00 AM - 11:00 AM: Andrej Karpathy's videos
11:00 AM - 1:00 PM: Break/Lunch
1:00 PM - 3:00 PM: 11-hour NLP video
3:00 PM - 5:00 PM: Break
5:00 PM - 7:00 PM: AWS Dump videos
7:00 PM - 8:00 PM: Dinner
8:00 PM - 9:00 PM: GitHub practice
Sunday:
9:00 AM - 12:00 PM: Complete Microsoft GenAI course
12:00 PM - 1:00 PM: Break/Lunch
1:00 PM - 3:00 PM: Finish any remaining content from Andrej Karpathy's videos or AWS Dump videos
3:00 PM - 5:00 PM: Break
5:00 PM - 7:00 PM: Wrap up remaining 11-hour NLP video
7:00 PM - 8:00 PM: Dinner
8:00 PM - 9:00 PM: Final GitHub practice and review
4 notes · View notes