Text
Paper page - Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources
0 notes
Text
rasbt/llama-3.2-from-scratch · Hugging Face
0 notes
Text
#machine learning#deep learning#llm#gemma3#hugging face#training#qlora#transformers#fine tuning#guide
0 notes
Text
https://x.com/llamafactory_ai/status/1893879214727991504?t=0rz_iG3YO_ppFRatiDqh0A&s=09
0 notes
Text
Meta AI Releases the Video Joint Embedding Predictive Architecture (V-JEPA) Model: A Crucial Step in Advancing Machine Intelligence - MarkTechPost
#machine learning#deep learning#meta#video#self supervised learning#video understanding#motion predictions
0 notes
Text
Paper page - Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation
0 notes
Text
SmolVLM2: Bringing Video Understanding to Every Device
0 notes
Text
Magma: A Foundation Model for Multimodal AI Agents
0 notes
Text
Fine-Tuning Your First Large Language Model (LLM) with PyTorch and Hugging Face
1 note
Text
TGI Multi-LoRA: Deploy Once, Serve 30 Models
0 notes
Text
Paper page - DeepRAG: Thinking to Retrieval Step by Step for Large Language Models
0 notes
Text
Qwen2.5 VL! Qwen2.5 VL! Qwen2.5 VL! | Qwen
0 notes
Text
Open-R1: a fully open reproduction of DeepSeek-R1
4 notes
Text
GitHub - DAMO-NLP-SG/VideoLLaMA3: Frontier Multimodal Foundation Models for Image and Video Understanding
#vision transformers#deep learning#machine learning#ml#video language models#video summarization#videollama3
0 notes
Text
Paper page - MinMo: A Multimodal Large Language Model for Seamless Voice Interaction
#machine learning#deep learning#ml#transformers#hugging face#speech to text#text to speech#multi modal#llm
1 note
Text
Paper page - 1.58-bit FLUX
0 notes
Text
Apollo: An Exploration of Video Understanding in Large Multimodal Models
#ml#machine learning#deep learning#transformers#multi modal#video language models#video summarization#video understand
0 notes