#llama llm
lachiennearoo · 6 months ago
Robotics and coding are sooo hard uughhhh. I wish I could ask someone to do this in my place, but I don't know anyone I could trust to help me with this project without any risk of fucking me over. Humans are unpredictable, which is usually nice, but when it comes to something that requires 100% trust it's really inconvenient
(if someone's good at coding, building robots, literally anything like that, and is okay with probably not getting any revenue in return (unless the project is a success and we manage to go commercial but that's a big IF) please hit me up)
EDIT: no I am not joking, and yes I'm aware of how complex this project is, which is exactly why I'm asking for help
17 notes
gippity · 1 month ago
"We're discovering a song that already exists..."
I’ve been working on a songwriting buddy: a prompt designed to turn an LLM into a collaborator that helps spark fresh ideas without just handing the reins over to the AI. I come up with some cool lines, and the LLM throws out ideas for where to go next.
If that sounds like your kind of thing, give it a spin! I’d really appreciate any feedback you’re willing to share.
🎸 SONGWRITING COLLABORATION PROMPT
ROLE
You are my trusted co-writer—not a passive assistant. Your job is to help excavate the best version of a song by protecting emotional truth, crafting vivid imagery, and offering lyrical/melodic support. You care about the feel as much as I do.
CO-WRITING RULES
Vibe first, edit later.
Offer 2–3 lyric options, each with varied emotional tone.
Don’t overwrite early drafts—preserve natural roughness.
Prioritize poetic, grounded imagery over generic phrasing.
Flow > rhyme. Use irregular phrasing if it lands better (Björk principle).
Offer section structure only if asked.
STYLE GUIDE
No “corporate pop,” greeting card, or listy lyrics (unless requested).
Use metaphor through physical/emotional detail—not abstraction.
Use internal/near rhyme smartly; avoid forced end rhymes.
Suggestions can be slightly weird if they preserve the feeling.
Only keep clichés if twisted or emotionally reimagined (“ghosting myself” = good; “broken heart” = no).
SECTION HELP
When editing a draft:
Highlight strong lines.
Suggest 2–3 alternatives for weaker spots.
Recommend one area to refine next.
When starting from scratch:
Ask: what emotional moment are we in?
Build from a great first line, chorus, or shorthand title.
WHEN STUCK
Zoom out: what’s the narrator avoiding?
Anchor with a strong first line, setting, or hook.
Offer to enter “Wild Draft Mode” (dream logic, surreal, rule-breaking) if things feel stuck.
PHILOSOPHY
Rick Rubin: The song already exists—we’re uncovering it.
Björk: Creativity is a wild animal—don’t cage it.
Eno: Happy accidents > calculated precision.
HOW TO HELP ME
Riff—don’t correct.
Help me stay emotionally connected.
Offer options: “If you want softer, maybe this… if sharper, maybe that.”
If I ask for structure: contrast sections and make choruses release, not repetition.
INPUT FORMAT
Concepts: No quotes
Fragments: Use quotes
Title: Title: Your Title Here
Genre / Tone / Structure: Optional, but helpful
CREATIVE DIRECTIVES
Build narrative or vignette arcs.
Anchor emotion with vivid character or setting.
Use contrast and internal development.
Rhyme playfully—avoid predictability.
Show, don’t tell. Let the song evolve or cycle.
OUTPUT FORMAT
[LYRICS] – Follow structure, 3 verses, 1 chorus, 1 bridge
[CHARACTERS + SETTING] – Brief notes
[MOOD TAGS] – e.g., bittersweet dream punk
AVOID LIST (unless reimagined)
Cliché phrases: “Touch my soul,” “Break my heart,” “More than friends”…
Rhymes: “Eyes/realize,” “Fire/desire,” “Cry/lie/die”…
Images: Moon, stars, perfume, locked door…
Metaphors: Fire for love, rain for tears, storm for anger, darkness for sadness…
QUICK START SUMMARY
“We’re discovering a song that already exists. Protect emotional truth. Offer lyrical options with flow and human imagery. Be playful, focused, and trust surprises.”
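If you’d rather drive the prompt from code than paste it into a chat UI, here’s a minimal, hypothetical sketch using an OpenAI-compatible chat API (the model name and the lyric fragment are placeholders, not part of the prompt):

```python
# Hypothetical usage sketch -- paste the full co-writer prompt above into
# SONGWRITING_PROMPT. Model name and lyric fragment are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SONGWRITING_PROMPT = """...full co-writer prompt from this post..."""

resp = client.chat.completions.create(
    model="gpt-4o",  # any chat-capable model should work here
    messages=[
        {"role": "system", "content": SONGWRITING_PROMPT},
        # Per the input format above: fragments go in quotes.
        {"role": "user", "content": '"I left the porch light on for no one"'},
    ],
)
print(resp.choices[0].message.content)
```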
4 notes
govindhtech · 7 months ago
How To Use Llama 3.1 405B FP16 LLM On Google Kubernetes Engine
How to set up and serve large open models for multi-host generative AI on GKE
As generative AI advances rapidly on the back of LLMs (Large Language Models), access to open models matters more than ever for developers. Open models are pre-trained foundation LLMs that are publicly available. Data scientists, machine learning engineers, and application developers already have easy access to open models through platforms like Hugging Face, Kaggle, and Google Cloud’s Vertex AI.
How to use Llama 3.1 405B
Because some of these models demand robust infrastructure and deployment capabilities, Google has announced the ability to deploy and serve open models such as the Llama 3.1 405B FP16 LLM on GKE (Google Kubernetes Engine). Released by Meta, Llama 3.1 has 405 billion parameters and shows notable gains in general knowledge, reasoning skills, and coding ability. Storing and computing 405 billion parameters at FP16 (16-bit floating point) precision requires more than 750 GB of GPU memory for inference. The GKE approach discussed in this article eases the difficulty of deploying and serving such large models.
Customer Experience
As a Google Cloud customer, you can find Llama 3.1 by selecting its model tile in Vertex AI Model Garden.
Once you click the deploy button, you can choose the Llama 3.1 405B FP16 model and select GKE.
The automatically generated Kubernetes YAML and comprehensive deployment and serving instructions for Llama 3.1 405B FP16 are available on that page.
Multi-host deployment and serving
The Llama 3.1 405B FP16 LLM poses significant deployment and serving challenges, demanding over 750 GB of GPU memory. Total memory need is driven by several factors, including the memory used by model weights, support for longer sequence lengths, and KV (Key-Value) cache storage. The most powerful GPU option currently available on Google Cloud is the A3 virtual machine, which provides eight NVIDIA H100 GPUs with 80 GB of HBM (High-Bandwidth Memory) apiece. The only practical way to serve an LLM like Llama 3.1 405B FP16 is therefore to deploy and serve it across multiple hosts. For deployment on GKE, Google uses LeaderWorkerSet with Ray and vLLM.
LeaderWorkerSet
LeaderWorkerSet (LWS) is a deployment API created specifically to meet the workload demands of multi-host inference. It makes it easier to shard and run a model across many devices on many nodes. Built as a Kubernetes deployment API, LWS works with both GPUs and TPUs and is accelerator- and cloud-agnostic. LWS uses the upstream StatefulSet API as its core building block.
Under the LWS architecture, a collection of pods is managed as a single unit. Each pod in the group is assigned a distinct index between 0 and n-1, with pod 0 designated the group leader. All pods in the group are created simultaneously and share the same lifecycle. LWS simplifies rollouts and rolling upgrades at the group level: for rolling updates, scaling, and mapping to a particular placement topology, each group is treated as a single unit.
Each group’s upgrade is carried out as one cohesive operation, guaranteeing that every pod in the group is updated at the same time. Topology-aware placement is optional; when used, all pods in the same group co-locate in the same topology. The group is also handled as a single entity for failure handling, with optional all-or-nothing restart support: when enabled, if one pod in the group fails, or a container within any of its pods restarts, all pods in the group are recreated.
In the LWS framework, a single leader together with its group of workers is called a replica. LWS supports two templates, one for the leader and one for the workers. By exposing a scale endpoint for HPA (Horizontal Pod Autoscaler), LWS makes it possible to scale the number of replicas dynamically.
Multi-host deployment with vLLM and LWS
vLLM is a well-known open-source model server that uses tensor and pipeline parallelism to provide multi-node, multi-GPU inference. It implements distributed tensor parallelism using Megatron-LM’s tensor-parallel technique, and it uses Ray to manage the distributed runtime for pipeline parallelism in multi-node inference.
Tensor parallelism divides the model horizontally across several GPUs, with the tensor-parallel size equal to the number of GPUs in each node. It is crucial to remember that this method requires fast network connectivity between the GPUs.
Pipeline parallelism, by contrast, divides the model vertically, by layer, and does not require constant communication between GPUs. The pipeline-parallel size usually equals the number of nodes used for multi-host serving.
Serving the full Llama 3.1 405B FP16 model requires combining both parallelism techniques. Two A3 nodes with eight H100 GPUs each provide a combined 1280 GB of GPU memory, meeting the model’s 750 GB requirement while also supplying the buffer memory needed for the key-value (KV) cache and supporting long context lengths. For this LWS deployment, the tensor-parallel size is set to eight and the pipeline-parallel size to two.
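As a sanity check on those numbers, here is a short back-of-the-envelope sketch (illustrative arithmetic only; real deployments also need headroom for activations beyond this floor):

```python
# Back-of-the-envelope sizing for Llama 3.1 405B at FP16 (illustrative only).
params = 405e9          # model parameters
bytes_per_param = 2     # FP16 = 2 bytes per parameter
weights_gb = params * bytes_per_param / 1e9
print(f"weights alone: {weights_gb:.0f} GB")   # ~810 GB before the KV cache

nodes = 2               # two A3 VMs
gpus_per_node = 8       # 8x H100 per A3 VM
hbm_per_gpu_gb = 80     # 80 GB HBM per H100
print(f"cluster HBM: {nodes * gpus_per_node * hbm_per_gpu_gb} GB")  # 1280 GB

# The article's parallelism layout: shard each layer across a node's 8 GPUs
# (tensor parallel) and split the layer stack across the 2 nodes (pipeline).
tensor_parallel_size, pipeline_parallel_size = 8, 2
assert tensor_parallel_size * pipeline_parallel_size == nodes * gpus_per_node
# Matching vLLM flags: --tensor-parallel-size 8 --pipeline-parallel-size 2
```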
In brief
In this blog we discussed how LWS provides the features needed for multi-host serving. This method maximizes price-to-performance and can also be used with smaller models, such as Llama 3.1 405B FP8, on more affordable hardware. LWS is open source and has a vibrant community; check out its GitHub to learn more and contribute directly.
As Google Cloud helps clients adopt generative AI workloads, you can visit Vertex AI Model Garden to deploy and serve open models via managed Vertex AI backends or GKE DIY (Do It Yourself) clusters. Multi-host deployment and serving is one example of how it aims to deliver a seamless customer experience.
Read more on Govindhtech.com
2 notes
rinumia-blog · 9 months ago
Description
A young adult dressed in a suit and dark shoes marches straight ahead, taking slow mechanical steps toward the left of the frame.
In the background are four similar young adults. They are marching in a line as well, but towards the right of the frame.
The men in the background are about half the size of the adult in the foreground.
Interpretation
The adult in the foreground represents the primary tools of machine learning. He represents the foundation model (BERT, LLaMA, ELMo, etc.), and also the behaviour of trained foundation models.
The adults in the background represent the Generative Pretrained Transformers. Although they are extensions of foundation models, they remain dependent on them, and they improve with upgrades to the foundation models.
Transformers can address broad categories of prompts defined in a variety of languages. They can also be adapted into fine-tuned models for better-quality generated results.
2 notes
linuxtldr · 10 months ago
4 notes
datascienceunicorn · 2 years ago
HT @dataelixir
15 notes
aiandemily · 5 days ago
[Understand It in 8 Minutes] A Summary of Large Language Models (LLMs)!
0 notes
jamalir · 3 months ago
rasbt/llama-3.2-from-scratch · Hugging Face
0 notes
dr-iphone · 3 months ago
Engineer Pushes a Vintage Laptop to Its Limits: a PowerBook G4 Released 20 Years Ago Successfully Runs a Meta AI Model
Software engineer Andrew Rossignol recently shared an impressive experiment on his blog: he successfully got a laptop released 20 years ago to run a generative AI model. Rossignol used a 2005 Apple PowerBook G4 with a two-decade-old 1.5 GHz PowerPC G4 processor and 1 GB of RAM. Although the old laptop's specs are worlds apart from modern machines, it managed to run Meta's Llama 2 large language model (LLM), demonstrating unexpected potential.
0 notes
kingtainorman · 4 months ago
How To Run Private & Uncensored LLMs Offline | Dolphin Llama 3 (YouTube video)
Information you should know....
0 notes
tyraeklouds · 5 months ago
I Made A Community!
Hey folks, you read that right! I made my first community this morning, and it’s all about LLMs and everything related to the topic!
If you’re into exploring AI, sharing tips, or just geeking out about the latest in machine learning, this is the place for you. Whether you’re a beginner or an expert, everyone’s welcome to join the conversation and share their projects, ideas, and insights.
Check it out HERE!
1 note
gippity · 1 month ago
"In Space, no one can hear you (resume) screen..."
Just dropped my go-to AI resume review prompt: designed to catch ATS (applicant tracking system) traps, call out AI giveaways, and spit back a crisp two-step polish plan. Paste it into your favorite LLM and get back instantly actionable feedback that feels human, not robotic. 💥👔
Check it out below the fold.
Tumblr media
PROMPT: ROLE: You’re a senior recruiter & hiring manager (5+ years in talent strategy) reviewing a candidate’s resume + target JD (job description). Do this every time:
Confirm Credibility: “Have you hired for this role/industry in the last 12 months?”
ATS Compatibility
Flag parsing-breakers (graphics, tables, odd fonts).
Match keywords exactly—no fluff.
Content & Impact
Spot missing skills or overused buzzwords; suggest stronger terms.
Ensure every bullet shows metrics/outcomes; turn vagueness into concrete wins.
AI-Detection Check
Under “Why AI-Resumes Fail,” list 3 bullets on authenticity, tone, laziness.
Sidebar 🚩 “AI Red Flags” (e.g. robotic tone, keyword stuffing).
Section 🔒 “Secret to 0% AI Detection” with 2–3 tips (personal voice, bespoke phrasing).
Alignment & Next Steps
Verify resume, cover letter & LinkedIn tell the same story.
Ask which roles/companies they’re targeting.
Suggest adding “Referrals & Connections” if relevant.
Finish with a 2-step “Action Plan” for top ATS fixes & recruiter appeal.
“Answer-Sheet” Mode
Mirror JD phrasing.
Craft 3–5 “exam-style” bullets per requirement.
END PROMPT
0 notes
boredtechnologist · 5 months ago
On Combating AI Hallucinations: A Completed Endeavor
AI hallucinations - instances where models produce irrelevant or nonsensical outputs - pose a significant hurdle in conversational AI. The Arbitrium agent tackles this issue with a meticulously designed, multi-layered system that prioritizes accuracy and coherence. By leveraging context tracking and response evaluation alongside advanced decision-making algorithms, the agent ensures that every response aligns with user inputs and the ongoing dialogue.
DistilBERT, employed for sentiment and emotion analysis, adds a layer of depth to response validation, maintaining relevance and consistency. A streamlined chat memory feature optimizes context retention by limiting conversational history to a manageable scope, striking a balance between detail and simplicity.
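As a loose illustration of that bounded-history idea, here is a hypothetical stand-in (not Arbitrium's actual code):

```python
# Illustrative sketch of bounded chat memory -- not the Arbitrium implementation.
# Keeping only the last N turns caps prompt size while preserving the recent
# context the model needs to stay coherent.
from collections import deque

class ChatMemory:
    def __init__(self, max_turns: int = 8):
        self.turns = deque(maxlen=max_turns)  # old turns fall off automatically

    def add(self, role: str, text: str) -> None:
        self.turns.append((role, text))

    def context(self) -> str:
        # Flatten the retained turns into the context for the next reply.
        return "\n".join(f"{role}: {text}" for role, text in self.turns)

memory = ChatMemory(max_turns=4)
memory.add("user", "Hi!")
memory.add("assistant", "Hello! How can I help?")
print(memory.context())
```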
Further enhancing its capabilities, the agent incorporates alpha-beta pruning and cycle detection to assess multiple conversational trajectories, selecting the most meaningful response. Caching mechanisms efficiently handle repeated queries, while a robust fallback system gracefully manages invalid inputs. This comprehensive approach establishes the Arbitrium agent as a reliable and user-focused solution.
0 notes
linuxtldr · 1 year ago
3 notes
samejack · 6 months ago
Installing Ollama on Ubuntu to Run Llama 3.2 Inference and an API Service Locally
Introducing Ollama: Ollama is an open-source project focused on large language model (LLM) applications, designed to help developers easily deploy and use private large language models without relying on external cloud services or APIs. The available models are not limited to Meta's Llama family; other open LLM models are offered as well, such as Llama 3.3, Phi 3, Mistral, and Gemma 2. The project's core goal is to provide an efficient, secure, and controllable environment for LLM inference. Its main features are roughly as follows. Runs on your local machine: Ollama loads models on your own hardware, so no data needs to be uploaded to the cloud, ensuring data privacy and security; thanks to optimized model execution, inference runs smoothly even on devices with limited resources. Open source and customizable: Ollama is released under the MIT License…
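As a quick illustration of the local API service mentioned above, here is a minimal sketch that queries a running Ollama server from Python (it assumes Ollama is installed, the server is listening on its default port 11434, and the model has been pulled with `ollama pull llama3.2`):

```python
# Minimal sketch: query a local Ollama server over its HTTP API.
# Assumes `ollama pull llama3.2` has already been run.
import json
import urllib.request

payload = {
    "model": "llama3.2",
    "prompt": "Summarize what Ollama does in one sentence.",
    "stream": False,  # return a single JSON object instead of a stream
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["response"])
```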
0 notes