#llama api
Explore tagged Tumblr posts
Text
#Meta and its AI App + LlamaCon
On Tuesday, April 29, Meta launched the first version of the Meta AI app: the assistant that knows your preferences, remembers context, and is personalized for you. LlamaCon was also held, the first edition of the event celebrating the technology and its open approach, where new tools were shared (Source: Meta Argentina). More below…
0 notes
Text
#DeepSeek V3#Chinese artificial intelligence#AI models#processing speed#AI algorithms#API DeepSeek#DeepSeek V3 improvements#model comparison#GPT-4#Llama 3.1#Cloud 3.5#data processing#AI technologies#DeepSeek applications#open source
3 notes
Text
Installing ollama on Ubuntu to Run the Llama 3.2 Inference Model and API Service Locally
About Ollama: Ollama is an open-source project focused on large language model (LLM) applications. It aims to help developers easily deploy and use private large language models without relying on external cloud services or external APIs. The supported models are not limited to the Meta Llama models; other open LLMs are also available, such as Llama 3.3, Phi 3, Mistral, and Gemma 2. The core goal of the project is to provide an efficient, secure, and controllable environment for LLM inference. Its main characteristics are: Runs on the local machine — Ollama supports loading models on your own hardware, so no data needs to be uploaded to the cloud, which ensures data privacy and security. Thanks to optimized model execution, inference runs smoothly even on resource-constrained devices. Open source and customizable — Ollama is released under the MIT License…
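For context on the API-service part: once Ollama is running locally it listens on port 11434 by default, and a minimal sketch of querying it over HTTP (the model name and prompt are just examples) might look like this:

```python
# Minimal sketch: query a locally running Ollama API service.
# Assumes `ollama serve` is running on the default port and the llama3.2 model has been pulled.
import requests

resp = requests.post(
    "http://127.0.0.1:11434/api/generate",
    json={"model": "llama3.2", "prompt": "Why is the sky blue?", "stream": False},
    timeout=120,
)
# With streaming disabled, the full completion is returned in the "response" field.
print(resp.json()["response"])
```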
0 notes
Note
Hey! Hi. Pretty quick question. I've been half messing around with setting up a Tumblr bot of my own, probably using llama 3 or something. A pretty major issue that I've run into, though, is that, as far as I can tell, the tumblr API has no way for bots to respond to asks automatically? But this clearly wasn't an issue for Frank, so could I just ask, how did you do it? Did you use puppeteer or something to directly interface with a browser? Or is there another way that I'm missing
In the API, a not-yet-answered ask is just a post in the "submission" state. (You can fetch these with the submissions endpoint.)
You can answer an ask through the API by editing it so that its state is something other than "submission."
Typically you would want the state to be "published", though you can also use "draft" to save the response post to drafts or "queue" to queue it.
Just changing the state alone will cause the ask to be published (or drafted/queued) with an empty answer. To supply an answer, what you need to do depends on whether you're using NPF or legacy:
With the legacy edit endpoint, you pass a parameter called "answer".
With the NPF edit endpoint, (IIRC) you append additional blocks to the "content" field containing the answer, and leave the "layout" field as-is to specify that the added blocks are not part of the ask (cf. the NPF layout spec).
I think this should all be possible in pytumblr2. The reason I'm not sure is that Frank never actually used pytumblr2, she just used pytumblr plus a bunch of workarounds and stuff that I eventually split off into pytumblr2.
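For anyone wiring this up from scratch, here's a rough sketch of the two steps against the legacy endpoints using plain OAuth1 requests; the blog name, credentials, and reply text are placeholders, and pytumblr/pytumblr2 wrap these same calls:

```python
# Sketch: fetch pending asks and answer one via Tumblr's legacy edit endpoint.
# Credentials and blog name are placeholders; register an app to get OAuth1 keys.
import requests
from requests_oauthlib import OAuth1

auth = OAuth1("CONSUMER_KEY", "CONSUMER_SECRET", "OAUTH_TOKEN", "OAUTH_SECRET")
blog = "example.tumblr.com"

# 1. Fetch pending asks (posts in the "submission" state).
subs = requests.get(
    f"https://api.tumblr.com/v2/blog/{blog}/posts/submission", auth=auth
).json()["response"]["posts"]

# 2. Answer the first one by editing its state and supplying an "answer" body.
ask = subs[0]
requests.post(
    f"https://api.tumblr.com/v2/blog/{blog}/post/edit",
    auth=auth,
    data={
        "id": ask["id"],
        "state": "published",  # or "draft" / "queue"
        "answer": "Hello! This reply was posted through the API.",
    },
)
```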
30 notes
Note
Hi, I saw your project dissecting the lore of Bloodborne. It is very, very impressive, and has rekindled something that I had thought dead. LLAMA-3, a ground-breaking release in LLM technology, has been made available to the public. Noteworthy is that it is quite good at impartial translations. You may access this for free, with an api, on the frontends "Together ai" or "Perplexity labs" (they're site names). Text dumps in JP are surely available. I pray that this may assist you.
while i appreciate the kind words, machine translation has always sucked big ones and is incapable of picking up on the information that a human is capable of, such as puns, double meanings, innuendo, etc. it's not an impartial text and trying to derive a "pure" meaning from computer translation ignores the overt and purposeful human elements in the language arts
22 notes
Text


Jungkook is among the honorees on Gold House's A100 list of the most impactful Asians of 2024
Keanu Reeves, Jungkook, and Hayao Miyazaki are among the honorees on Gold House's A100 list of the most impactful Asians of 2024.
Gold House's A100 list honors industry pioneers at the forefront of what the organization calls the "new gold age." Gold House will celebrate the honorees and announce several new initiatives at the Gold Gala on May 11 in downtown Los Angeles.
Gold House will also recognize record executive, producer, and HYBE Corporation chairman Bang Si-Hyuk, presenting the founder of BTS with the Gold Legend award.
Jung Kook
Music artist
Jung Kook, songwriter and member of the superstar group BTS, released his first solo single, "Seven," featuring Latto, in July; it debuted at the top of the Billboard Hot 100 and Spotify's Top Songs Global charts. His first solo album, "Golden," debuted in November.
Variety on X, May 1, 2024
Keanu Reeves, Jung Kook, Hayao Miyazaki Among Gold House’s A100 Honorees
(https://x.com/Variety/status/1785658118434705885)
GoldHouseCo on X, May 2
Jung Kook (Music Artist) Learn more about all the Honorees at http://goldhouse.org/A100. #A100 #APAHM #APIHM #GoldNewWorld #GoldExcellence #API #JungKook
#jeon jungkook#jungkook#kookie#little cookie#cr. to Variety on X#congratulations jungkook#cr. to GoldHouseCo on X#congratulations jungkook#Jungkook is among the honorees on Gold House's A100 list of the most impactful Asians of 2024
8 notes
Text
How To Use Llama 3.1 405B FP16 LLM On Google Kubernetes

How to set up and use large open models for multi-host generation AI over GKE
Access to open models is more important than ever for developers as generative AI grows rapidly due to developments in LLMs (Large Language Models). Open models are pre-trained foundational LLMs that are accessible to the general population. Data scientists, machine learning engineers, and application developers already have easy access to open models through platforms like Hugging Face, Kaggle, and Google Cloud’s Vertex AI.
How to use Llama 3.1 405B
Google is announcing today the ability to install and run open models like Llama 3.1 405B FP16 LLM over GKE (Google Kubernetes Engine), as some of these models demand robust infrastructure and deployment capabilities. With 405 billion parameters, Llama 3.1, published by Meta, shows notable gains in general knowledge, reasoning skills, and coding ability. To store and compute 405 billion parameters at FP (floating point) 16 precision, the model needs more than 750GB of GPU RAM for inference. The difficulty of deploying and serving such big models is lessened by the GKE method discussed in this article.
Customer Experience
You may locate the Llama 3.1 LLM as a Google Cloud customer by selecting the Llama 3.1 model tile in Vertex AI Model Garden.
After clicking the deploy button, you can choose the Llama 3.1 405B FP16 model and select GKE. (Image credit: Google Cloud)
The automatically generated Kubernetes yaml and comprehensive deployment and serving instructions for Llama 3.1 405B FP16 are available on this page.
Multi-host deployment and serving
The Llama 3.1 405B FP16 LLM poses significant deployment and serving challenges, demanding over 750 GB of GPU memory. The total memory requirement is influenced by several factors, including the memory used by the model weights, support for longer sequence lengths, and KV (Key-Value) cache storage. A3 virtual machines, currently the most powerful GPU option on the Google Cloud platform, are made up of eight Nvidia H100 GPUs with 80 GB of HBM (High-Bandwidth Memory) each. The only practical way to serve LLMs such as the FP16 Llama 3.1 405B model is to deploy and serve them across several hosts. To deploy over GKE, Google employs LeaderWorkerSet with Ray and vLLM.
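As a rough back-of-the-envelope sketch (my own arithmetic with assumed values, not figures from the article), you can see why a single 8-GPU node isn't enough and why two A3 nodes leave room for the KV cache:

```python
# Rough sizing sketch for Llama 3.1 405B at FP16 (assumed values, not official figures).
params = 405e9          # parameters
bytes_per_param = 2     # FP16
weights_gb = params * bytes_per_param / 1e9
print(f"weights alone: ~{weights_gb:.0f} GB")                 # ~810 GB, already > 750 GB

gpus_per_node, hbm_per_gpu_gb, nodes = 8, 80, 2               # two A3 nodes, 8x H100 80GB each
total_hbm_gb = gpus_per_node * hbm_per_gpu_gb * nodes
print(f"available HBM across both nodes: {total_hbm_gb} GB")  # 1280 GB
print(f"headroom for KV cache / activations: ~{total_hbm_gb - weights_gb:.0f} GB")
```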
LeaderWorkerSet
A deployment API called LeaderWorkerSet (LWS) was created especially to meet the workload demands of multi-host inference. It makes it easier to shard and run the model across numerous devices on numerous nodes. Built as a Kubernetes deployment API, LWS works with both GPUs and TPUs and is accelerator- and cloud-agnostic. LWS uses the upstream StatefulSet API as its core building block.
A collection of pods is controlled as a single unit under the LWS architecture. Every pod in this group is given a distinct index between 0 and n-1, with the pod with number 0 being identified as the group leader. Every pod that is part of the group is created simultaneously and has the same lifecycle. At the group level, LWS makes rollout and rolling upgrades easier. For rolling updates, scaling, and mapping to a certain topology for placement, each group is treated as a single unit.
Each group’s upgrade procedure is carried out as a single, cohesive entity, guaranteeing that every pod in the group receives an update at the same time. While topology-aware placement is optional, it is acceptable for all pods in the same group to co-locate in the same topology. With optional all-or-nothing restart support, the group is also handled as a single entity when addressing failures. When enabled, if one pod in the group fails or if one container within any of the pods is restarted, all of the pods in the group will be recreated.
In the LWS framework, a group including a single leader and a group of workers is referred to as a replica. Two templates are supported by LWS: one for the workers and one for the leader. By offering a scale endpoint for HPA, LWS makes it possible to dynamically scale the number of replicas.
Deploying multiple hosts using vLLM and LWS
vLLM is a well-known open source model server that uses pipeline and tensor parallelism to provide multi-node multi-GPU inference. Using Megatron-LM’s tensor parallel technique, vLLM facilitates distributed tensor parallelism. With Ray for multi-node inferencing, vLLM controls the distributed runtime for pipeline parallelism.
By dividing the model horizontally across several GPUs, tensor parallelism makes the tensor parallel size equal to the number of GPUs at each node. It is crucial to remember that this method requires quick network connectivity between the GPUs.
However, pipeline parallelism does not require continuous connection between GPUs and divides the model vertically per layer. This usually equates to the quantity of nodes used for multi-host serving.
To serve the complete Llama 3.1 405B FP16 model, several parallelism techniques must be combined. To meet the model's 750 GB memory requirement, two A3 nodes with eight H100 GPUs each provide a combined memory capacity of 1280 GB. Along with supporting long context lengths, this setup supplies the buffer memory required for the key-value (KV) cache. For this LWS deployment, the tensor parallel size is set to eight and the pipeline parallel size to two.
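To make those settings concrete, the snippet below shows roughly the vLLM serving command the leader pod would launch with tensor parallel size 8 and pipeline parallel size 2; the model ID and flags are illustrative, and the manifest generated by GKE is authoritative:

```python
# Sketch: approximately the serving command the LWS leader pod would run for this setup.
# Model ID and flag values are illustrative placeholders.
import subprocess

cmd = [
    "python", "-m", "vllm.entrypoints.openai.api_server",
    "--model", "meta-llama/Llama-3.1-405B-Instruct",
    "--tensor-parallel-size", "8",            # shard each layer across the 8 H100s in a node
    "--pipeline-parallel-size", "2",          # split the layers across the 2 A3 nodes
    "--distributed-executor-backend", "ray",  # Ray coordinates the multi-node runtime
]
subprocess.run(cmd, check=True)
```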
In brief
We discussed in this blog how LWS provides you with the necessary features for multi-host serving. This method maximizes price-to-performance ratios and can also be used with smaller deployments, such as Llama 3.1 405B FP8, on more affordable devices. Check out its GitHub repository to learn more and contribute directly to LWS, which is open source and has a vibrant community.
As Google Cloud helps customers adopt generative AI workloads, you can visit Vertex AI Model Garden to deploy and serve open models via managed Vertex AI backends or GKE DIY (Do It Yourself) clusters. Multi-host deployment and serving is one example of how it aims to provide a seamless customer experience.
Read more on Govindhtech.com
#Llama3.1#Llama#LLM#GoogleKubernetes#GKE#405BFP16LLM#AI#GPU#vLLM#LWS#News#Technews#Technology#Technologynews#Technologytrends#govindhtech
2 notes
Text
Critical Vulnerability (CVE-2024-37032) in Ollama

Researchers have discovered a critical vulnerability in Ollama, a widely used open-source project for running Large Language Models (LLMs). The flaw, dubbed "Probllama" and tracked as CVE-2024-37032, could potentially lead to remote code execution, putting thousands of users at risk.
What is Ollama?
Ollama has gained popularity among AI enthusiasts and developers for its ability to perform inference with compatible neural networks, including Meta's Llama family, Microsoft's Phi clan, and models from Mistral. The software can be used via a command line or through a REST API, making it versatile for various applications. With hundreds of thousands of monthly pulls on Docker Hub, Ollama's widespread adoption underscores the potential impact of this vulnerability.
The Nature of the Vulnerability
The Wiz Research team, led by Sagi Tzadik, uncovered the flaw, which stems from insufficient validation on the server side of Ollama's REST API. An attacker could exploit this vulnerability by sending a specially crafted HTTP request to the Ollama API server. The risk is particularly high in Docker installations, where the API server is often publicly exposed.
Technical Details of the Exploit
The vulnerability specifically affects the `/api/pull` endpoint, which allows users to download models from the Ollama registry and private registries. Researchers found that when pulling a model from a private registry, it's possible to supply a malicious manifest file containing a path traversal payload in the digest field. This payload can be used to:
- Corrupt files on the system
- Achieve arbitrary file read
- Execute remote code, potentially hijacking the system
The issue is particularly severe in Docker installations, where the server runs with root privileges and listens on 0.0.0.0 by default, enabling remote exploitation. As of June 10, despite a patched version being available for over a month, more than 1,000 vulnerable Ollama server instances remained exposed to the internet.
Mitigation Strategies
To protect AI applications using Ollama, users should:
- Update instances to version 0.1.34 or newer immediately
- Implement authentication measures, such as using a reverse proxy, as Ollama doesn't inherently support authentication
- Avoid exposing installations to the internet
- Place servers behind firewalls and only allow authorized internal applications and users to access them
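As a quick sanity check, a sketch like the following (assuming a default local install on port 11434 and a plain x.y.z version string) compares the server's reported version against the patched release:

```python
# Sketch: check whether a local Ollama server reports a version at or above the patched
# release (0.1.34). Assumes the default endpoint; adjust host/port for your setup.
import requests

PATCHED = (0, 1, 34)

resp = requests.get("http://127.0.0.1:11434/api/version", timeout=5)
version = resp.json().get("version", "0.0.0")
current = tuple(int(x) for x in version.split(".")[:3])  # assumes "x.y.z" format

if current < PATCHED:
    print(f"Ollama {version} is vulnerable to CVE-2024-37032 - update to 0.1.34 or newer.")
else:
    print(f"Ollama {version} includes the fix.")
```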
Broader Implications for AI and Cybersecurity
This vulnerability highlights ongoing challenges in the rapidly evolving field of AI tools and infrastructure. Tzadik noted that the critical issue extends beyond individual vulnerabilities to the inherent lack of authentication support in many new AI tools. He referenced similar remote code execution vulnerabilities found in other LLM deployment tools like TorchServe and Ray Anyscale. Moreover, despite these tools often being written in modern, safety-first programming languages, classic vulnerabilities such as path traversal remain a persistent threat. This underscores the need for continued vigilance and robust security practices in the development and deployment of AI technologies.
2 notes
Text

ZOHO
What is it? It is on-demand customer relationship management (CRM) software that lets you manage customer relationships efficiently.
It is efficient because Zoho CRM helps you streamline your sales, marketing, customer support, and inventory management functions across the organization in a single system.
How is it used?
Getting started with Zoho CRM is simple.
Set up your account by entering your personal and organization details.
Customize the product to your needs and learn more about the operations performed most frequently in Zoho CRM.
1. Account setup
2. Product customization
3. Common operations
4. Getting enterprise-ready products.
Zoho Projects: advantages
Workflow automation with Blueprints.
Zoho uses the same automation tool across all of its applications.
This tool is called "Blueprints." Its main purpose is to help users create custom workflows that lay out certain processes in a systematic, organized way.
Once configured, a Blueprint guides users through executing the process, indicating which actions must be performed and in what order.
These can include automatic notifications and updates for team members and collaborators, invoice updates with new billable hours, and so on.
It is also possible to build more complex automation processes and use the API for further possibilities.
Zoho Projects: disadvantages
Lack of preconfigured templates
Nowadays almost every project management tool offers preconfigured templates for different industries. That is not the case with Zoho Projects. You can create custom templates based on your existing projects so you don't have to repeat the same information over and over in similar cases.
But if you're a beginner, templates can be a good starting point for understanding how to build a project plan and use the software.
Methodology
Continuous improvement is a hallmark of successful agile teams.
The status timeline helps your team identify bottlenecks and find ways to deliver the most value while eliminating as many unproductive activities as possible.
Zoho in education
Education project management software is a tool that helps users plan, track, and collaborate on all their academic projects. It brings teachers, students, administrators, and parents onto a single platform so they can collaborate effortlessly, learn, and make decisions together.
2 notes
Text
Translate from Spanish to English.
On AO3 I'm mapache_opache.
One Piece, Family AU.
Apis met Luffy when she was 8 years old and on the run from the Marines. After her adventure with the Straw Hats, she follows them in the newspaper while learning all about her family history from her grandfather, suffering involuntary naps throughout the process. Bokuden, her grandfather, dies when she turns 14. She knew this day would come, but she is still sad and heartbroken.
One day a Marine named Nezumi comes to her island asking about the elixir of eternal life, looking for her. Determined not to talk, and knowing she had no way or time to escape, she lets them take her, not wanting the villagers caught in the crossfire.
Before she can pull one of her old tricks and escape during a storm like in the old days, they run into one of the admirals, Koby, on vacation in the East Blue.
Koby rescues Apis and arrests Nezumi for abuse of power. Since he first has to take Nezumi to Loguetown, from where he will be brought to justice at Marineford in the New World, he promises to bring Apis back home when he's done and takes her with him in the meantime.
Somewhere along the way they run into Luffy, on his own personal vacation, who decides to join them.
Along the way they develop a parent-child relationship, although Apis sees Koby more as a mom (a mother hen).
Koby is still in love with Luffy, but he accepted years ago that his feelings aren't and never will be returned, and he is determined not to misread Luffy (he misreads him anyway, in the wrong direction). He also thinks Luffy is dating Hancock.
Luffy is just realizing his feelings for Koby and tries to act on them. Koby isn't getting it, and that frustrates Luffy: how much more obvious does he have to be?
Apis notices everything and helps her parents along, with varying degrees of success.
Details:
Nezumi learned about the elixir of life from the old reports (personal notes) of the late Commodore Nelson Royale.
Nezumi had Apis in a cell when Koby made his surprise inspection.
The Straw Hats take a short vacation to visit their home islands and their loved ones (mild spoiler: Robin visits Jaguar in Elbaf).
Luffy goes to visit Shanks and Makino's son in Dawn.
Apis is still a little sad about her grandfather. When they land on Mirrorball Island they find Jango and Fullbody in a dance contest; Koby and Luffy join in with Apis to try to cheer the little one up. It works better than they expected for everyone involved; pure fun in the end.
Nami and Sanji are dating; they visit the Baratie and Cocoyashi Village.
Nami and Sanji join the family vacation anyway; Luffy needs their help with the romance.
Teijio is one of the Marines' best cooks, which is why he was assigned to Koby's ship; he ends up adopted by Sanji and Nami.
Luffy asks Nami and Sanji for help with Koby.
Apis asks Teijio for help with her new parents.
You can see the beginnings of a budding romance between Apis and Teijio.
Apis and Teijio are the same age.
Luffy calls Apis his little champion and Apis calls him Pa.
Koby calls Apis honey or his little adventurer and Apis calls him Dad.
The islands Koby, Apis, and Luffy visit or pass by are: Loguetown, Cocoyashi, the Baratie, Orange Town, Syrup Village (Apis and Teijio have a little adventure with Usopp's former pirate crew), the Island of Rare Animals, Dawn, and Mirrorball (among others).
This is how I see the family:
Luffy the dad.
Koby the mom.
Apis the daughter.
Sabo the uncle.
Dragon the grandfather.
Garp the great-grandfather? Great-great-grandfather?
Sanji the dad.
Nami the mom.
Tajio the son.
Nojiko the aunt.
Genzo the grandfather.
Reiju the aunt.
Zeff the grandfather.
(Am I the only one who sees Apis as the perfect adopted daughter for Luffy and Koby? She's adventurous (like Luffy), she ate a Devil Fruit (like Luffy), and she cares about others (old Ryu, and like Koby). Or Tajio as the perfect adopted son for Nami and Sanji? Because Tajio is a redhead, he seems to like sailing, and he's a cook; he also believes in the All Blue like Sanji.)
Main pairing: Luffy x Koby
Secondary pairing: Sanji x Nami
Anyone can do whatever they want with this AU.
#straw hat pirates#coby one piece#kobylu#koby#lukoby#luffy x koby#one piece#luffy#coby#koby one piece
2 notes
Text
It's worse.
The glasses Meta built come with language translation features -- meaning it becomes harder for bilingual families to speak privately without being overheard.
No it's even worse.
Because someone has developed an app (I-XRAY) that scans and detects who people are in real-time.
No even worse.
Because I-XRAY accesses all kinds of public data about that person.
Wait is it so bad?
I-XRAY is not publicly usable and was only built to show what a privacy nightmare Meta is creating. Here's a 2-minute video of the creators running an experiment showing how quickly the trust of people on the street can be exploited. It's chilling because the interactions are kind and heartwarming, but obviously the people are being tricked in the most uncomfortable way.
Yes it is so bad:
Because as the satirical IT news channel Fireship demonstrated, if you combine a few easily available technologies, you can reproduce I-XRAY's results easily.
Hook up an open-source vision model (for face detection). This model gives us the coordinates of a human face. Then tools like PimEyes or FaceCheck.ID -- uh, both of those are free as well... put a name to that face. Then phone book websites like fastpeoplesearch.com or Instant Checkmate let us look up lots of details about those names (date of birth, phone #, address, traffic and criminal records, social media accounts, known aliases, photos & videos, email addresses, friends and relatives, location history, assets & financial info). Now you can use web scrapers (the little programs Google uses to index the entire internet and feed it to you) or APIs (programs that let us interface with, for example, open data sets by the government) -> these scraping methods will, for many targeted people, provide the perpetrators with a bulk of information. And if that sounds impractical, well, the perpetrators can use an open-source, free-to-use large language model like LLaMA (also developed by Meta, oh the irony) to get a summary (or get ChatGPT-style answers) of all that data.
Fireship points out that people can opt out of most of these data brokers by contacting them ("the right to be forgotten" has been successfully enforced by European courts and applies globally to people that make use of our data). Apparently the New York Times has compiled an extensive list of such sites and services.
But this is definitely dystopian. And individual opt-outs exploit the fact that many people don't even know this is a thing, and they place the entire responsibility on the individual. And to be honest, I don't trust the New York Times and almost feel I'm drawing attention to myself if I opt out. It really leaves me personally uncertain about what the smarter move is. I hope this tech goes the way of Google's smart glasses and becomes extinct.
i hate the "meta glasses" with their invisible cameras i hate when people record strangers just-living-their-lives i hate the culture of "it's not illegal so it's fine". people deserve to walk around the city without some nameless freak recording their faces and putting them up on the internet. like dude you don't show your own face how's that for irony huh.
i hate those "testing strangers to see if they're friendly and kind! kindness wins! kindness pays!" clickbait recordings where overwhelmingly it is young, attractive people (largely women) who are being scouted for views and free advertising . they're making you model for them and they reap the benefits. they profit now off of testing you while you fucking exist. i do not want to be fucking tested. i hate the commodification of "kindness" like dude just give random people the money, not because they fucking smiled for it. none of the people recording has any idea about the origin of the term "emotional labor" and none of us could get them to even think about it. i did not apply for this job! and you know what! i actually super am a nice person! i still don't want to be fucking recorded!
& it's so normalized that the comments are always so fucking ignorant like wow the brunette is so evil so mean so twisted just because she didn't smile at a random guy in an intersection. god forbid any person is in hiding due to an abusive situation. no, we need to see if they'll say good morning to a stranger approaching them. i am trying to walk towards my job i am not "unkind" just because i didn't notice your fucked up "social experiment". you fucking weirdo. stop doing this.
19K notes
Text
Meta's first LlamaCon was kind of a bust
This week Meta held its first-ever AI developer conference, LlamaCon, focused on the development of its Llama generative AI models. But while there was plenty of hype, there wasn't much news beyond the launch of the Meta AI app and the new Llama API. In this episode, Engadget Senior Reporter Karissa Bell joins us to share her thoughts on LlamaCon after attending…
0 notes
Text
Beyond the chatbot: building real agents with AI
For a while now I've been hearing that a language model to which, using ChatGPT or Copilot, you upload files and then ask questions about those files is an "agent." At first glance, it looks like just a tool that answers questions with text. That doesn't seem like an agent. But is it?
After watching this video about the different types of AI agents that exist, I think I now understand why we're calling that particular use of the models "agents."
The 5 types of AI agents
According to classical theory (see "Artificial Intelligence: A Modern Approach," 4th edition, by Stuart Russell and Peter Norvig, section 2.4, "The Structure of Agents"), agents are classified as follows:
Simple reflex: responds with fixed rules.
Model-based: keeps a representation of the environment and memory.
Goal-based: makes decisions according to goals.
Utility-based: evaluates options according to preference/value.
Learning: improves with experience.
Where does the case we're analyzing fit, that model we upload documents to and then question about them? The thing OpenAI calls GPTs and Microsoft calls "agents" in Copilot Studio: which of the above agent types does it correspond to?
If we use it only to answer a direct question → it resembles the simple reflex agent.
If it analyzes uploaded files and pulls together scattered conclusions → it acts like a model-based agent.
If we give it clear tasks (summarize, structure, compare) → it looks like a goal-based agent.
If it optimizes clarity or format according to instructions → it could be utility-based.
If the system learns from us and improves over time → it would be a learning agent.
Therefore, a GPT (or the same thing built in Copilot) is not a complete agent on its own, but integrated with systems (ourselves, for example) that give it context, goals, memory, and feedback, it clearly becomes one.
So what would a "real" agent look like? A real agent is one that acts as an intelligent autonomous system, not just one that answers questions.
To clarify what an agent is in more practical terms, let's try to understand the MCP (Model Context Protocol) architecture, proposed by Anthropic for building agents and now being adopted across the industry.
MCP: Connecting AI agents to the real world
MCP (Model Context Protocol) is infrastructure that lets language models interact securely and in a structured way with external tools, APIs, databases, and other systems available within the organization.
Although it is not a complete cognitive architecture, it can serve as the "integration layer" that allows a cognitive agent to access real-time information, execute actions, and operate on real environments. MCP is the "door to the real world" for agents that need to work with external data and systems.
A practical example: an agent that solves problems in an organization
Imagine an intelligent corporate assistant that:
a) we have designed with a module-based cognitive architecture (perception, cognition, action) and that, in addition,
b) connects to the company's ecosystem through Anthropic's MCP (Model Context Protocol).
Let's look at what functions each of the three cognitive modules making up that assistant would contain, and how it would interact with the world around it via MCP:
Perception
Reads databases, reports, logs, emails, internal APIs.
Receives human queries or detects anomalies automatically.
Cognition
Uses one or more GPTs to interpret text, combine data, and reason.
Plans steps: "analyze expenses," "compare against budgets," "detect deviations."
Maintains memory of its working context, goals, and intermediate states.
Action
Queries systems, generates reports, triggers workflows.
Makes decisions or proposes actions with justification.
Learns from feedback: improves its plans over time.
Now let's see that agent in operation in a concrete case (a minimal code sketch of this loop follows below):
It perceives: it detects rising logistics costs.
It reasons: it analyzes contracts, identifies inefficient routes, predicts impact.
It acts: it proposes changes, notifies purchasing, kicks off follow-up.
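A minimal sketch of that perceive-reason-act loop is below; the data source, the reasoning rule, and the action are placeholder stand-ins for what would really be LLM calls and MCP tool invocations:

```python
# Minimal perceive-reason-act loop (illustrative; the data source, reasoning step,
# and action are placeholders, not a real MCP integration).
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class AgentMemory:
    goals: list = field(default_factory=lambda: ["keep logistics costs within budget"])
    observations: list = field(default_factory=list)

def perceive(memory: AgentMemory) -> dict:
    # In a real agent this would read databases/logs through MCP tools.
    event = {"metric": "logistics_cost", "current": 132_000, "budget": 110_000}
    memory.observations.append(event)
    return event

def reason(event: dict, memory: AgentMemory) -> Optional[str]:
    # In a real agent an LLM would analyze contracts and routes here.
    if event["current"] > 1.1 * event["budget"]:
        return (f"Costs {event['current']} exceed budget {event['budget']}: "
                "review carrier contracts and inefficient routes.")
    return None

def act(plan: str) -> None:
    # In a real agent this would trigger workflows / notify purchasing via MCP.
    print("ACTION:", plan)

memory = AgentMemory()
event = perceive(memory)
plan = reason(event, memory)
if plan:
    act(plan)
```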
Why do we want to build this kind of agent?
Because they go beyond a chatbot we converse with, like ChatGPT.
Because they automate the resolution of real problems.
Because they combine all of the organization's data, eliminating isolated information silos.
Because they act with purpose, with a goal. They don't just answer questions.
AI is not just generating text in response to a question. This is structured, autonomous, connected AI. And cognitive architectures combined with protocols like MCP make it possible for agents to really work with us, and for us, in complex organizational contexts. It is structured behavior, decision-making, action. That is an agent.
#artificial intelligence#AI#GPT#intelligent agents#machine learning#MCP#Anthropic#cognitive architecture#enterprise technology#automation#enterprise data#intelligent systems#multimodal processing#LLM#AI agents
0 notes
Text
Meta will offer its Llama AI model as an API too
Meta has unveiled a preview version of an API for its Llama large language models. The new offering will transform Meta’s popular open-source models into an enterprise-ready service directly challenging established players like OpenAI while addressing a key concern for enterprise adopters: freedom from vendor lock-in. “We want to make it even easier for you to quickly start building with Llama,…
0 notes
Text
How AI Is Revolutionizing Contact Centers in 2025
As contact centers evolve from reactive customer service hubs to proactive experience engines, artificial intelligence (AI) has emerged as the cornerstone of this transformation. In 2025, modern contact center architectures are being redefined through AI-based technologies that streamline operations, enhance customer satisfaction, and drive measurable business outcomes.
This article takes a technical deep dive into the AI-powered components transforming contact centers—from natural language models and intelligent routing to real-time analytics and automation frameworks.
1. AI Architecture in Modern Contact Centers
At the core of today’s AI-based contact centers is a modular, cloud-native architecture. This typically consists of:
NLP and ASR engines (e.g., Google Dialogflow, AWS Lex, OpenAI Whisper)
Real-time data pipelines for event streaming (e.g., Apache Kafka, Amazon Kinesis)
Machine Learning Models for intent classification, sentiment analysis, and next-best-action
RPA (Robotic Process Automation) for back-office task automation
CDP/CRM Integration to access customer profiles and journey data
Omnichannel orchestration layer that ensures consistent CX across chat, voice, email, and social
These components are containerized (via Kubernetes) and deployed via CI/CD pipelines, enabling rapid iteration and scalability.
2. Conversational AI and Natural Language Understanding
The most visible face of AI in contact centers is the conversational interface—delivered via AI-powered voice bots and chatbots.
Key Technologies:
Automatic Speech Recognition (ASR): Converts spoken input to text in real time. Example: OpenAI Whisper, Deepgram, Google Cloud Speech-to-Text.
Natural Language Understanding (NLU): Determines intent and entities from user input. Typically fine-tuned BERT or LLaMA models power these layers.
Dialog Management: Manages context-aware conversations using finite state machines or transformer-based dialog engines.
Natural Language Generation (NLG): Generates dynamic responses based on context. GPT-based models (e.g., GPT-4) are increasingly embedded for open-ended interactions.
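To make the ASR step concrete, here is a minimal sketch using the open-source Whisper package; the model size and file path are illustrative:

```python
# Sketch: transcribing a call recording with the open-source Whisper model.
# Assumes `pip install openai-whisper` and an audio file at the given path (placeholder).
import whisper

model = whisper.load_model("base")            # small, general-purpose checkpoint
result = model.transcribe("call_recording.wav")
print(result["text"])                          # full plain-text transcript
for segment in result["segments"]:             # per-segment timestamps for agent-assist UIs
    print(f'{segment["start"]:.1f}s - {segment["end"]:.1f}s: {segment["text"]}')
```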
Architecture Snapshot:
Customer Input (Voice/Text)
↓
ASR Engine (if voice)
↓
NLU Engine → Intent Classification + Entity Recognition
↓
Dialog Manager → Context State
↓
NLG Engine → Response Generation
↓
Omnichannel Delivery Layer
These AI systems are often deployed on low-latency, edge-compute infrastructure to minimize delay and improve UX.
3. AI-Augmented Agent Assist
AI doesn’t only serve customers—it empowers human agents as well.
Features:
Real-Time Transcription: Streaming STT pipelines provide transcripts as the customer speaks.
Sentiment Analysis: Transformers and CNNs trained on customer service data flag negative sentiment or stress cues.
Contextual Suggestions: Based on historical data, ML models suggest actions or FAQ snippets.
Auto-Summarization: Post-call summaries are generated using abstractive summarization models (e.g., PEGASUS, BART).
Technical Workflow:
Voice input transcribed → parsed by NLP engine
Real-time context is compared with knowledge base (vector similarity via FAISS or Pinecone)
Agent UI receives predictive suggestions via API push
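A minimal sketch of step 2, the vector-similarity lookup behind contextual suggestions, using FAISS with placeholder embeddings (a real system would encode the knowledge base and the live transcript with a sentence encoder):

```python
# Sketch: retrieve contextual suggestions for an agent via vector similarity (FAISS).
# Embeddings are random placeholders; in practice they come from a sentence encoder.
import faiss
import numpy as np

dim = 384
kb_snippets = ["How to reset a password", "Refund policy summary", "Shipping delay macro"]
kb_vectors = np.random.rand(len(kb_snippets), dim).astype("float32")  # placeholder embeddings

index = faiss.IndexFlatIP(dim)       # inner-product index; normalize vectors for cosine similarity
faiss.normalize_L2(kb_vectors)
index.add(kb_vectors)

query = np.random.rand(1, dim).astype("float32")  # embedding of the live transcript window
faiss.normalize_L2(query)
scores, ids = index.search(query, 2)
for score, i in zip(scores[0], ids[0]):
    print(f"suggest: {kb_snippets[i]} (similarity {score:.2f})")
```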
4. Intelligent Call Routing and Queuing
AI-based routing uses predictive analytics and reinforcement learning (RL) to dynamically assign incoming interactions.
Routing Criteria:
Customer intent + sentiment
Agent skill level and availability
Predicted handle time (via regression models)
Customer lifetime value (CLV)
Model Stack:
Intent Detection: Multi-label classifiers (e.g., fine-tuned RoBERTa)
Queue Prediction: Time-series forecasting (e.g., Prophet, LSTM)
RL-based Routing: Models trained via Q-learning or Proximal Policy Optimization (PPO) to optimize wait time vs. resolution rate
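As a toy illustration of how those criteria can combine into a single routing score (a hand-written heuristic, not the RL-trained policy described above; the weights and fields are invented for the example):

```python
# Sketch: a toy scoring function combining intent match, sentiment, predicted handle
# time, and customer lifetime value. Weights are illustrative, not tuned.
from dataclasses import dataclass

@dataclass
class Agent:
    name: str
    skills: set
    available: bool
    predicted_handle_time_s: float  # e.g. from a regression model

def routing_score(agent: Agent, intent: str, sentiment: float, clv: float) -> float:
    if not agent.available:
        return float("-inf")
    skill_match = 1.0 if intent in agent.skills else 0.2
    urgency = 1.0 + max(0.0, -sentiment)            # negative sentiment raises priority
    speed = 1.0 / (1.0 + agent.predicted_handle_time_s / 300)
    return (0.5 * skill_match + 0.3 * speed) * urgency * (1.0 + clv / 10_000)

agents = [
    Agent("ava", {"billing", "refunds"}, True, 240),
    Agent("ben", {"tech_support"}, True, 420),
]
best = max(agents, key=lambda a: routing_score(a, intent="refunds", sentiment=-0.6, clv=2_500))
print("route to:", best.name)
```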
5. Knowledge Mining and Retrieval-Augmented Generation (RAG)
Large contact centers manage thousands of documents, SOPs, and product manuals. AI facilitates rapid knowledge access through:
Vector Embedding of documents (e.g., using OpenAI, Cohere, or Hugging Face models)
Retrieval-Augmented Generation (RAG): Combines dense retrieval with LLMs for grounded responses
Semantic Search: Replaces keyword-based search with intent-aware queries
This enables agents and bots to answer complex questions with dynamic, accurate information.
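A minimal RAG sketch tying those pieces together; the encoder checkpoint is a common public model, and the final grounded LLM call is left as a stub:

```python
# Sketch: minimal retrieval-augmented generation flow with an embedding-based retriever.
# The LLM call is a placeholder stub; plug in your model of choice.
import numpy as np
from sentence_transformers import SentenceTransformer

docs = [
    "Refunds are processed within 5 business days of approval.",
    "Premium plans include 24/7 phone support.",
    "Passwords can be reset from the account settings page.",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = encoder.encode(docs, normalize_embeddings=True)

def call_llm(prompt: str) -> str:
    # Placeholder for the grounded generation step.
    return f"[LLM response grounded in retrieved context]\n{prompt}"

def answer(question: str, k: int = 2) -> str:
    q_vec = encoder.encode([question], normalize_embeddings=True)[0]
    scores = doc_vecs @ q_vec                       # cosine similarity (vectors are normalized)
    top = np.argsort(scores)[::-1][:k]
    context = "\n".join(docs[i] for i in top)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return call_llm(prompt)

print(answer("How long do refunds take?"))
```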
6. Customer Journey Analytics and Predictive Modeling
AI enables real-time customer journey mapping and predictive support.
Key ML Models:
Churn Prediction: Gradient Boosted Trees (XGBoost, LightGBM)
Propensity Modeling: Logistic regression and deep neural networks to predict upsell potential
Anomaly Detection: Autoencoders flag unusual user behavior or possible fraud
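To make the churn-prediction entry concrete, here is a small sketch on synthetic data, using scikit-learn's gradient boosting as a stand-in for XGBoost/LightGBM:

```python
# Sketch: a gradient-boosted churn classifier on synthetic features (illustrative only).
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([
    rng.integers(0, 20, n),        # support contacts in the last 90 days
    rng.uniform(0, 1, n),          # share of contacts with negative sentiment
    rng.integers(1, 48, n),        # tenure in months
])
# Synthetic label: more contacts, more negative sentiment, shorter tenure -> churn.
churn = (0.08 * X[:, 0] + 1.5 * X[:, 1] - 0.03 * X[:, 2] + rng.normal(0, 0.5, n)) > 0.9

X_tr, X_te, y_tr, y_te = train_test_split(X, churn, test_size=0.2, random_state=0)
model = GradientBoostingClassifier().fit(X_tr, y_tr)
print("AUC:", round(roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]), 3))
```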
Streaming Frameworks:
Apache Kafka / Flink / Spark Streaming for ingesting and processing customer signals (page views, clicks, call events) in real time
These insights are visualized through BI dashboards or fed back into orchestration engines to trigger proactive interventions.
7. Automation & RPA Integration
Routine post-call processes like updating CRMs, issuing refunds, or sending emails are handled via AI + RPA integration.
Tools:
UiPath, Automation Anywhere, Microsoft Power Automate
Workflows triggered via APIs or event listeners (e.g., on call disposition)
AI models can determine intent, then trigger the appropriate bot to complete the action in backend systems (ERP, CRM, databases)
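A minimal sketch of that trigger pattern, with a hypothetical internal webhook URL and payload standing in for the RPA platform's own API:

```python
# Sketch: an event listener that triggers a back-office automation on call disposition.
# The webhook URL and payload shape are hypothetical placeholders.
import requests

AUTOMATION_WEBHOOK = "https://rpa.example.internal/workflows/refund"  # placeholder

def on_call_disposition(event: dict) -> None:
    """Route a finished call to the right automation based on the detected intent."""
    if event.get("intent") == "refund_request" and event.get("resolution") == "approved":
        payload = {
            "customer_id": event["customer_id"],
            "amount": event["refund_amount"],
            "source": "contact-center-ai",
        }
        requests.post(AUTOMATION_WEBHOOK, json=payload, timeout=10)

on_call_disposition({
    "intent": "refund_request",
    "resolution": "approved",
    "customer_id": "C-1042",
    "refund_amount": 39.99,
})
```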
8. Security, Compliance, and Ethical AI
As AI handles more sensitive data, contact centers embed security at multiple levels:
Voice biometrics for authentication (e.g., Nuance, Pindrop)
PII Redaction via entity recognition models
Audit Trails of AI decisions for compliance (especially in finance/healthcare)
Bias Monitoring Pipelines to detect model drift or demographic skew
Data governance frameworks like ISO 27001, GDPR, and SOC 2 compliance are standard in enterprise AI deployments.
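As one concrete example of the PII-redaction layer, here is a sketch combining spaCy NER with regexes for patterns NER tends to miss; the entity labels and regex patterns are illustrative choices:

```python
# Sketch: simple PII redaction over a transcript using spaCy NER plus regexes for
# emails and card-like numbers. Requires the small English model (en_core_web_sm).
import re
import spacy

nlp = spacy.load("en_core_web_sm")
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "CARD": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def redact(text: str) -> str:
    doc = nlp(text)
    for ent in doc.ents:                      # replace detected named entities
        if ent.label_ in {"PERSON", "GPE", "ORG", "DATE"}:
            text = text.replace(ent.text, f"[{ent.label_}]")
    for label, pattern in PATTERNS.items():   # catch structured patterns NER misses
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Hi, this is Jane Doe, card 4111 1111 1111 1111, email jane@example.com."))
```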
Final Thoughts
AI in 2025 has moved far beyond simple automation. It now orchestrates entire contact center ecosystems—powering conversational agents, augmenting human reps, automating back-office workflows, and delivering predictive intelligence in real time.
The technical stack is increasingly cloud-native, model-driven, and infused with real-time analytics. For engineering teams, the focus is now on building scalable, secure, and ethical AI infrastructures that deliver measurable impact across customer satisfaction, cost savings, and employee productivity.
As AI models continue to advance, contact centers will evolve into fully adaptive systems, capable of learning, optimizing, and personalizing in real time. The revolution is already here—and it's deeply technical.
#AI-based contact center#conversational AI in contact centers#natural language processing (NLP)#virtual agents for customer service#real-time sentiment analysis#AI agent assist tools#speech-to-text AI#AI-powered chatbots#contact center automation#AI in customer support#omnichannel AI solutions#AI for customer experience#predictive analytics contact center#retrieval-augmented generation (RAG)#voice biometrics security#AI-powered knowledge base#machine learning contact center#robotic process automation (RPA)#AI customer journey analytics
0 notes