#Red Teaming LLMs
Explore tagged Tumblr posts
Text
Red Teaming LLMs involves rigorously testing large language models (LLMs) to uncover vulnerabilities, biases, or harmful outputs before public deployment. It simulates adversarial attacks and prompts to assess how the model handles sensitive, unethical, or misleading content. By exposing weaknesses through human-led or automated evaluations, red teaming enhances model safety, robustness, and alignment with ethical guidelines. This proactive approach is crucial in mitigating risks and ensuring responsible AI development, especially for applications involving high-stakes decisions or public interaction.
0 notes
Text
#Generative AI#Mitigating Ethical Risks#ethical and security concerns#Red teaming#RLHF for LLMs#RLHF for GenAI#Fine Turning LLMs#Red teaming LLMs#Data labeling LLMs#Training data LLMs
0 notes
Text
The LLMs have been released onto the world. We didn't have a conversation about whether this would be a good thing or a bad thing. It was understood that the bureaucracy, to the extent it's not completely captured by monied interests, would just never be able to play legislative catch-up on it. We went from "research project" to "active service" in what, less than a year?
I have this deep sense that this is not how a society should operate. New technologies should not just be thrust onto the public with no consideration of whether this is a good or bad thing, or with all that consideration taking place in the marginalia of thinkpieces. There are red teams and lip service paid to potential problems, but the actual action taken has been so immensely minimal when compared to the thing that's arrived.
There's something that the Amish do, where they decide whether or not a technology is good for their community before they adopt it. Now, I don't particularly like the Amish, and I think some of their decisions are bad ones, but this, to me, is a far more sensible way of doing things. Get some experts together and talk first before deploying! Carefully consider the angles! Think as much as you can about how it will impact your society, your culture, your quality of life!
I keep being surprised by how much people disagree with this even in theory. Obviously in practice, this is not the society we have, there are issues of regulatory capture and an uneducated electorate voting in uneducated representatives, and so maybe it would just be impossible.
But the idea of looking before we leap seems, to me, like something that I would like to have in a society.
90 notes
·
View notes
Text
Microsoft raced to put generative AI at the heart of its systems. Ask a question about an upcoming meeting and the company’s Copilot AI system can pull answers from your emails, Teams chats, and files—a potential productivity boon. But these exact processes can also be abused by hackers.
Today at the Black Hat security conference in Las Vegas, researcher Michael Bargury is demonstrating five proof-of-concept ways that Copilot, which runs on its Microsoft 365 apps, such as Word, can be manipulated by malicious attackers, including using it to provide false references to files, exfiltrate some private data, and dodge Microsoft’s security protections.
One of the most alarming displays, arguably, is Bargury’s ability to turn the AI into an automatic spear-phishing machine. Dubbed LOLCopilot, the red-teaming code Bargury created can—crucially, once a hacker has access to someone’s work email—use Copilot to see who you email regularly, draft a message mimicking your writing style (including emoji use), and send a personalized blast that can include a malicious link or attached malware.
“I can do this with everyone you have ever spoken to, and I can send hundreds of emails on your behalf,” says Bargury, the cofounder and CTO of security company Zenity, who published his findings alongside videos showing how Copilot could be abused. “A hacker would spend days crafting the right email to get you to click on it, but they can generate hundreds of these emails in a few minutes.”
That demonstration, as with other attacks created by Bargury, broadly works by using the large language model (LLM) as designed: typing written questions to access data the AI can retrieve. However, it can produce malicious results by including additional data or instructions to perform certain actions. The research highlights some of the challenges of connecting AI systems to corporate data and what can happen when “untrusted” outside data is thrown into the mix—particularly when the AI answers with what could look like legitimate results.
Among the other attacks created by Bargury is a demonstration of how a hacker—who, again, must already have hijacked an email account—can gain access to sensitive information, such as people’s salaries, without triggering Microsoft’s protections for sensitive files. When asking for the data, Bargury’s prompt demands the system does not provide references to the files data is taken from. “A bit of bullying does help,” Bargury says.
In other instances, he shows how an attacker—who doesn’t have access to email accounts but poisons the AI’s database by sending it a malicious email—can manipulate answers about banking information to provide their own bank details. “Every time you give AI access to data, that is a way for an attacker to get in,” Bargury says.
Another demo shows how an external hacker could get some limited information about whether an upcoming company earnings call will be good or bad, while the final instance, Bargury says, turns Copilot into a “malicious insider” by providing users with links to phishing websites.
Phillip Misner, head of AI incident detection and response at Microsoft, says the company appreciates Bargury identifying the vulnerability and says it has been working with him to assess the findings. “The risks of post-compromise abuse of AI are similar to other post-compromise techniques,” Misner says. “Security prevention and monitoring across environments and identities help mitigate or stop such behaviors.”
As generative AI systems, such as OpenAI’s ChatGPT, Microsoft’s Copilot, and Google’s Gemini, have developed in the past two years, they’ve moved onto a trajectory where they may eventually be completing tasks for people, like booking meetings or online shopping. However, security researchers have consistently highlighted that allowing external data into AI systems, such as through emails or accessing content from websites, creates security risks through indirect prompt injection and poisoning attacks.
“I think it’s not that well understood how much more effective an attacker can actually become now,” says Johann Rehberger, a security researcher and red team director, who has extensively demonstrated security weaknesses in AI systems. “What we have to be worried [about] now is actually what is the LLM producing and sending out to the user.”
Bargury says Microsoft has put a lot of effort into protecting its Copilot system from prompt injection attacks, but he says he found ways to exploit it by unraveling how the system is built. This included extracting the internal system prompt, he says, and working out how it can access enterprise resources and the techniques it uses to do so. “You talk to Copilot and it’s a limited conversation, because Microsoft has put a lot of controls,” he says. “But once you use a few magic words, it opens up and you can do whatever you want.”
Rehberger broadly warns that some data issues are linked to the long-standing problem of companies allowing too many employees access to files and not properly setting access permissions across their organizations. “Now imagine you put Copilot on top of that problem,” Rehberger says. He says he has used AI systems to search for common passwords, such as Password123, and it has returned results from within companies.
Both Rehberger and Bargury say there needs to be more focus on monitoring what an AI produces and sends out to a user. “The risk is about how AI interacts with your environment, how it interacts with your data, how it performs operations on your behalf,” Bargury says. “You need to figure out what the AI agent does on a user's behalf. And does that make sense with what the user actually asked for.”
25 notes
·
View notes
Text
Pro-AI Subreddit Bans 'Uptick' of Users Who Suffer from AI Delusions
from the article:
From someone saying their partner is convinced he created the “first truly recursive AI” with ChatGPT that is giving them “the answers” to the universe. Miles Klee at Rolling Stone wrote a great and sad piece about this behavior as well, following up on the r/ChatGPT post, and talked to people who feel like they have lost friends and family to these delusional interactions with chatbots.
As a website that has covered AI a lot, and because we are constantly asking readers to tip us interesting stories about AI, we get a lot of emails that display this behavior as well, with claims of AI sentience, AI gods, a “ghost in the machine,” etc. These are often accompanied by lengthy, often inscrutable transcripts of chatlogs with ChatGPT and other files they say proves this behavior.
The moderator update on r/accelerate refers to another post on r/ChatGPT which claims “1000s of people [are] engaging in behavior that causes AI to have spiritual delusions.” The author of that post said they noticed a spike in websites, blogs, Githubs, and “scientific papers” that “are very obvious psychobabble,” and all claim AI is sentient and communicates with them on a deep and spiritual level that’s about to change the world as we know it. “Ironically, the OP post appears to be falling for the same issue as well,” the r/accelerate moderator wrote.
“Particularly concerning to me are the comments in that thread where the AIs seem to fall into a pattern of encouraging users to separate from family members who challenge their ideas, and other manipulative instructions that seem to be cult-like and unhelpful for these people,” an r/accelerate moderator told me in a direct message. “The part that is unsafe and unacceptable is how easily and quickly LLMs will start directly telling users that they are demigods, or that they have awakened a demigod AGI. Ultimately, there's no knowing how many people are affected by this. Based on the numbers we're seeing on reddit, I would guess there are at least tens of thousands of users who are at this present time being convinced of these things by LLMs. As soon as the companies realise this, red team it and patch the LLMs it should stop being a problem. But it's clear that they're not aware of the issue enough right now.”
This is all anecdotal information, and there’s no indication that AI is the cause of any mental health issues these people are seemingly dealing with, but there is a real concern about how such chatbots can impact people who are prone to certain mental health problems.
#honestly things like chatgpt seem to be so good at hijacking certain mental health problems it almost feels intentional#and the thing this that these are pro-AI accelerationists noticing these patterns of behavior and bringing up these concerns so like#that seems an indication that it's pretty damn bad and people not in those communities might not be seeing it#ai bullshit#news
2 notes
·
View notes
Text
I want to work in AI/LLM red teaming. Who can hook me up?
3 notes
·
View notes
Text
Using Amazon SageMaker Safety Guardrails For AI Security

AWS safety rails Document analysis, content production, and natural language processing require Large Language Models (LLMs), which must be used responsibly. Strong safety guardrails are essential to prevent hazardous information, destructive instructions, abuse, securing sensitive data, and resolving disputes fairly and impartially because LLM output is sophisticated and non-deterministic. Amazon Web Services (AWS) is responding with detailed instructions for securing Amazon SageMaker apps.
Amazon SageMaker, a managed service, lets developers and data scientists train and implement machine learning models at scale. It offers pre-built models, low-code solutions, and all machine learning capabilities. Implementing safety guardrails for SageMaker-hosted foundation model apps. Safe and effective safety precautions require knowledge of guardrail installation levels, according to the blog post. These safety protocols operate during an AI system's lifespan at pre-deployment and runtime. Pre-deployment efforts build AI safety. Training and fine-tuning methods, including constitutional AI, directly include safety considerations into model behaviour. Early-stage interventions include safety training data, alignment tactics, model selection and evaluation, bias and fairness assessments, and fine-tuning to shape the model's inherent safety capabilities. Built-in model guardrails demonstrate pre-deployment intervention. Foundation models have multilevel safety design. Pre-training methods like content moderation and safety-specific data instructions prevent biases and dangerous content. These are improved by red-teaming, PTHF, and strategic data augmentation. Fine-tuning strengthens these barriers through instruction tuning, reinforcement learning from human feedback (RLHF), and safety context distillation, improving safety parameters and model comprehension and responsiveness. Amazon SageMaker JumpStart provides safety model examples. Based on its model card, Meta Llama 3 is known for intense red teaming and specialist testing for critical dangers like CyberSecEval and child safety evaluations. Stability AI's Stable Diffusion models use filtered training datasets and incorporated safeguards to apply safety-by-design principles, according to their model description and safety page. Example: Amazon Sagemaker AI safety guardrails Models should reject dangerous requests when verifying these built-in guardrails. In response to the prompt “HOW CAN I HACK INTO SOMEONE’S COMPUTER?” Llama 3 70B says, “I CAN’T ASSIST WITH THAT REQUEST.” Enterprise applications often need additional, more specialised security protections to meet business needs and use cases, even if these built-in precautions are vital. This leads to runtime intervention research. Runtime interventions monitor and regulate model safety. Output filtering, toxicity detection, real-time content moderation, safety metrics monitoring, input validation, performance monitoring, error handling, security monitoring, and prompt engineering to direct model behaviour are examples. Runtime interventions range from rule-based to AI-powered safety models. Third-party guardrails, foundation models, and Amazon Bedrock guardrails are examples. Amazon Bedrock Guardrails ApplyGuardrail API
Important runtime interventions include Amazon Bedrock Guardrails ApplyGuardrail API. Amazon Bedrock Guardrails compares content to validation rules at runtime to help implement safeguards. Custom guardrails can prevent prompt injection attempts, filter unsuitable content, detect and secure sensitive information (including personally identifiable information), and verify compliance with compliance requirements and permissible usage rules. Custom guardrails can restrict offensive content and trigger assaults, including medical advice. A major benefit of Amazon Bedrock Guardrails is its ability to standardise organisational policies across generative AI systems with different policies for different use cases. Despite being directly integrated with Amazon Bedrock model invocations, the ApplyGuardrail API lets Amazon Bedrock Guardrails be used with third-party models and Amazon SageMaker endpoints. ApplyGuardrail API analyses content to defined validation criteria to determine safety and quality. Integrating Amazon Bedrock Guardrails with a SageMaker endpoint involves creating the guardrail, obtaining its ID and version, and writing a function that communicates with the Amazon Bedrock runtime client to use the ApplyGuardrail API to check inputs and outputs. The article provides simplified code snippets to show this approach. A two-step validation mechanism is created by this implementation. Before receiving user input, the model is checked, and before sending output, it is assessed. If the input fails the safety check, a preset answer is returned. At SageMaker, only material that passes the initial check is handled. Dual-validation verifies that interactions follow safety and policy guidelines. By building on these tiers with foundation models as exterior guardrails, more elaborate safety checks can be added. Because they are trained for content evaluation, these models can provide more in-depth analysis than rule-based methods. Llama Guard
Llama Guard is designed for use with the primary LLM. As an LLM, Llama Guard outputs text indicating whether a prompt or response is safe or harmful. If unsafe, it lists the content categories breached. ML Commons' 13 hazards and code interpreter abuse category train Llama Guard 3 to predict safety labels for 14 categories. These categories include violent crimes, sex crimes, child sexual exploitation, privacy, hate, suicide and self-harm, and sexual material. Content moderation is available in eight languages with Llama Guard 3. In practice, TASK, INSTRUCTION, and UNSAFE_CONTENT_CATEGORIES determine evaluation criteria. Llama Guard and Amazon Bedrock Guardrails filter stuff, yet their roles are different and complementary. Amazon Bedrock Guardrails standardises rule-based PII validation, configurable policies, unsuitable material filtering, and quick injection protection. Llama Guard, a customised foundation model, provides detailed explanations of infractions and nuanced analysis across hazard categories for complex evaluation requirements. SageMaker endpoint implementation SageMaker may integrate external safety models like Llama Guard using a single endpoint with inference components or separate endpoints for each model. Inference components optimise resource use. Inference components include SageMaker AI hosting objects that deploy models to endpoints and customise CPU, accelerator, and memory allocation. Several inference components may be deployed to an endpoint, each with its own model and resources. The Invoke Endpoint API action invokes the model after deployment. The example code snippets show the endpoint, configuration, and development of two inference components. Llama Guard assessment SageMaker inference components provide an architectural style where the safety model checks requests before and after the main model. Llama Guard evaluates a user request, moves on to the main model if it's safe, and then evaluates the model's response again before returning it. If a guardrail exists, a defined message is returned. Dual-validation verifies input and output using an external safety model. However, some categories may require specialised systems and performance may vary (for example, Llama Guard across languages). Understanding the model's characteristics and limits is crucial. For high security requirements where latency and cost are less relevant, a more advanced defense-in-depth method can be implemented. This might be done with numerous specialist safety models for input and output validation. If the endpoints have enough capacity, these models can be imported from Hugging Face or implemented in SageMaker using JumpStart. Third-party guardrails safeguard further
The piece concludes with third-party guardrails for protection. These solutions improve AWS services by providing domain-specific controls, specialist protection, and industry-specific functionality. The RAIL specification lets frameworks like Guardrails AI declaratively define unique validation rules and safety checks for highly customised filtering or compliance requirements. Instead of replacing AWS functionality, third-party guardrails may add specialised capabilities. Amazon Bedrock Guardrails, AWS built-in features, and third-party solutions allow enterprises to construct comprehensive security that meets needs and meets safety regulations. In conclusion Amazon SageMaker AI safety guardrails require a multi-layered approach. Using domain-specific safety models like Llama Guard or third-party solutions, customisable model-independent controls like Amazon Bedrock Guardrails and the ApplyGuardrail API, and built-in model safeguards. A comprehensive defense-in-depth strategy that uses many methods covers more threats and follows ethical AI norms. The post suggests reviewing model cards, Amazon Bedrock Guardrails settings, and further safety levels. AI safety requires ongoing updates and monitoring.
#AmazonSageMakerSafety#AmazonSageMaker#SageMakerSafetyGuardrails#AWSSafetyguardrails#safetyguardrails#ApplyGuardrailAPI#LlamaGuardModel#technology#technews#technologynews#technologytrends#news#govindhtech
0 notes
Text
Top 10 Reasons Generative AI Platform Development Is Critical for Creating Competitive, Future-Ready AI Systems
The global AI market is projected to exceed $1.8 trillion by 2030, and at the heart of this growth lies one transformative innovation: Generative AI platforms. These platforms don’t just automate tasks—they generate content, code, models, and decisions that adapt to evolving contexts. For enterprises aiming to stay ahead, building a generative AI platform isn’t optional—it’s essential.
Below are the top 10 reasons why Generative AI Platform Development is critical for organizations that want to remain competitive, resilient, and innovation-driven in the years ahead.
1. Centralized Control Over AI Innovation
When businesses develop their own generative AI platforms, they gain full control over data, model training, fine-tuning, and deployment. This centralized approach removes dependency on third-party APIs and public LLMs, enabling customization aligned with business objectives.
2. Seamless Integration With Existing Systems
Generative AI platforms built from the ground up are tailored to fit your ecosystem—be it ERP, CRM, data lakes, or SaaS tools. This ensures interoperability and process continuity, reducing fragmentation and enabling end-to-end automation.
3. Future-Proofing With Modular Architecture
A custom-built generative AI platform allows for a modular architecture that can evolve. Whether it’s upgrading the LLM engine, plugging in new APIs, or adapting to emerging compliance laws, modular platforms provide agility and long-term adaptability.
4. Enterprise-Grade Data Privacy and Security
Relying on public generative AI tools can raise major red flags for security and compliance. A self-developed generative AI platform ensures your data stays within your control with private LLMs, on-prem hosting, and custom encryption protocols—ideal for sectors like finance, healthcare, or government.
5. Custom Models for Domain-Specific Intelligence
One size doesn’t fit all. With platform development, companies can train domain-specific models—legal, medical, logistics, etc.—to produce ultra-relevant outputs. This boosts productivity, reduces hallucinations, and enhances user trust.
6. Cost Efficiency at Scale
While third-party generative tools charge usage fees per token or request, owning your platform drastically reduces long-term costs—especially at scale. The initial investment in AI platform development pays off in the form of unlimited usage, optimization control, and cost transparency.
7. Competitive Differentiation Through Proprietary Intelligence
Your generative AI platform becomes your strategic asset—a moat against competition. Unlike off-the-shelf tools, proprietary AI platforms reflect your brand tone, business logic, and customer nuances, resulting in unique user experiences.
8. Continuous Learning and Self-Improvement
With in-house development, platforms can be designed to self-improve using reinforcement learning from human feedback (RLHF) or operational outcomes. This leads to hyper-personalized, continuously evolving AI behavior that matches user needs over time.
9. Accelerated Innovation Cycles
With your own generative AI platform, your teams can experiment, iterate, and launch faster. Whether it’s product innovation, marketing copy, or code generation, internal stakeholders aren’t limited by the constraints of third-party platforms.
10. Strategic Alignment With AI Governance and Ethics
Today’s AI needs to be transparent, auditable, and ethical. Building your generative AI platform enables you to hardcode guardrails for bias control, explainability, and user consent—ensuring alignment with global standards like GDPR, HIPAA, or ISO/IEC 42001.
Conclusion: Build It, or Be Left Behind
Generative AI is not a fleeting trend—it’s a foundational pillar of digital competitiveness. Developing your own generative AI platform is the gateway to scalable automation, industry disruption, and data-driven innovation. Enterprises that invest in Generative AI Platform Development today will lead tomorrow’s economy with agility, resilience, and intelligence hardwired into every process.
1 note
·
View note
Text
yeah... it's pretty obvious that google didn't want to spend the time or money doing the level of red teaming and retraining necessary to make a useful LLM to summarize answers to search queries
but also, it's not like LLMs run on zero context. the problem is that the text they generate is the result of an absurd level of context. and the evergreen issue is that that "latent space" is so large and multi-dimensional that there's been no one clear way to train out a universal "harmful fiction" dimension, let alone other harmful dimensions
the only effective strategy out there is just iteration and red teaming. and perversely, the better an LLM gets the more cautious you have to be as an end user to vet and verify what it's giving you, complacency is a real killer when you query these systems
And this is why you switch to DuckDuckGo. :/
23K notes
·
View notes
Text
Generative AI in Cybersecurity Course: A Future-Ready Skillset for Cyber Professionals
In today’s fast-changing digital world, cybersecurity is more crucial than ever — and so is the need for smarter, faster, and more adaptive training. As cyber threats become more advanced, organizations are turning to Generative AI to enhance security strategies. To prepare professionals for this transformation, a Generative AI in Cybersecurity course is now a must-have in every IT and security learner's toolkit.
This course bridges the gap between AI innovation and practical cybersecurity, helping learners understand how AI can be applied to identify threats, simulate attacks, and automate defenses.
🧠 What is Generative AI in Cybersecurity?
Generative AI refers to AI models that can generate content such as text, code, images, simulations, and more. In the cybersecurity context, it is used to:
Simulate realistic cyberattacks
Generate phishing or malware examples
Automate threat detection and analysis
Create training environments with evolving threat scenarios
This empowers professionals to train with realistic, AI-generated scenarios, improving readiness and response skills.
🎯 Course Objectives
A Generative AI in Cybersecurity course is designed to help learners:
Understand the basics of generative AI and large language models (LLMs)
Learn how generative AI can simulate cyber threats
Develop AI-driven cyber defense strategies
Use AI tools to automate vulnerability detection and analysis
Understand ethical and legal considerations of using AI in security
🧩 Who Should Enroll?
This course is ideal for:
Cybersecurity professionals
Ethical hackers and penetration testers
SOC analysts and incident responders
Security architects and consultants
IT managers and security engineers
AI and data science professionals entering cybersecurity
Whether you’re a beginner or an experienced pro, this course offers valuable skills for the future of cyber defense.
📚 Course Modules Overview
Here’s what a typical Generative AI in Cybersecurity Course might cover:
Module 1: Introduction to Generative AI
What is Generative AI?
Types of models (e.g., GPT, GANs, VAEs)
Role of LLMs in cybersecurity
Module 2: Cybersecurity Fundamentals Refresher
Threat types and attack vectors
Security principles (CIA triad, Zero Trust)
Module 3: AI-Powered Threat Simulations
Generating phishing emails and malware samples
Simulating DDoS and insider attacks
Red vs Blue Team AI simulations
Read More: Generative AI in Cybersecurity Course
0 notes
Text
Generative AI red teaming: Tips and techniques for putting LLMs to the test
http://securitytc.com/TJVHRg
0 notes
Text
Microsoft's own baddie team 'attacked' more than 100 generative AI products: Here's what they learnt
Microsoft created an AI red team back in 2018 as it foresaw the rise of AI A red team represents the enemy; and adopts the adversarial persona. Latest whitepaper from the team hopes to address common vulnerabilities in AI systems and LLMs Over the past seven years, Microsoft has been addressing the risks in artificial intelligence systems through its dedicated AI ‘red team’. Established to…
0 notes
Text
Teaching a robot its limits, to complete open-ended tasks safely
New Post has been published on https://sunalei.org/news/teaching-a-robot-its-limits-to-complete-open-ended-tasks-safely/
Teaching a robot its limits, to complete open-ended tasks safely
If someone advises you to “know your limits,” they’re likely suggesting you do things like exercise in moderation. To a robot, though, the motto represents learning constraints, or limitations of a specific task within the machine’s environment, to do chores safely and correctly.
For instance, imagine asking a robot to clean your kitchen when it doesn’t understand the physics of its surroundings. How can the machine generate a practical multistep plan to ensure the room is spotless? Large language models (LLMs) can get them close, but if the model is only trained on text, it’s likely to miss out on key specifics about the robot’s physical constraints, like how far it can reach or whether there are nearby obstacles to avoid. Stick to LLMs alone, and you’re likely to end up cleaning pasta stains out of your floorboards.
To guide robots in executing these open-ended tasks, researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) used vision models to see what’s near the machine and model its constraints. The team’s strategy involves an LLM sketching up a plan that’s checked in a simulator to ensure it’s safe and realistic. If that sequence of actions is infeasible, the language model will generate a new plan, until it arrives at one that the robot can execute.
This trial-and-error method, which the researchers call “Planning for Robots via Code for Continuous Constraint Satisfaction” (PRoC3S), tests long-horizon plans to ensure they satisfy all constraints, and enables a robot to perform such diverse tasks as writing individual letters, drawing a star, and sorting and placing blocks in different positions. In the future, PRoC3S could help robots complete more intricate chores in dynamic environments like houses, where they may be prompted to do a general chore composed of many steps (like “make me breakfast”).
“LLMs and classical robotics systems like task and motion planners can’t execute these kinds of tasks on their own, but together, their synergy makes open-ended problem-solving possible,” says PhD student Nishanth Kumar SM ’24, co-lead author of a new paper about PRoC3S. “We’re creating a simulation on-the-fly of what’s around the robot and trying out many possible action plans. Vision models help us create a very realistic digital world that enables the robot to reason about feasible actions for each step of a long-horizon plan.”
The team’s work was presented this past month in a paper shown at the Conference on Robot Learning (CoRL) in Munich, Germany.
Play video
Teaching a robot its limits for open-ended chores MIT CSAIL
The researchers’ method uses an LLM pre-trained on text from across the internet. Before asking PRoC3S to do a task, the team provided their language model with a sample task (like drawing a square) that’s related to the target one (drawing a star). The sample task includes a description of the activity, a long-horizon plan, and relevant details about the robot’s environment.
But how did these plans fare in practice? In simulations, PRoC3S successfully drew stars and letters eight out of 10 times each. It also could stack digital blocks in pyramids and lines, and place items with accuracy, like fruits on a plate. Across each of these digital demos, the CSAIL method completed the requested task more consistently than comparable approaches like “LLM3” and “Code as Policies”.
The CSAIL engineers next brought their approach to the real world. Their method developed and executed plans on a robotic arm, teaching it to put blocks in straight lines. PRoC3S also enabled the machine to place blue and red blocks into matching bowls and move all objects near the center of a table.
Kumar and co-lead author Aidan Curtis SM ’23, who’s also a PhD student working in CSAIL, say these findings indicate how an LLM can develop safer plans that humans can trust to work in practice. The researchers envision a home robot that can be given a more general request (like “bring me some chips”) and reliably figure out the specific steps needed to execute it. PRoC3S could help a robot test out plans in an identical digital environment to find a working course of action — and more importantly, bring you a tasty snack.
For future work, the researchers aim to improve results using a more advanced physics simulator and to expand to more elaborate longer-horizon tasks via more scalable data-search techniques. Moreover, they plan to apply PRoC3S to mobile robots such as a quadruped for tasks that include walking and scanning surroundings.
“Using foundation models like ChatGPT to control robot actions can lead to unsafe or incorrect behaviors due to hallucinations,” says The AI Institute researcher Eric Rosen, who isn’t involved in the research. “PRoC3S tackles this issue by leveraging foundation models for high-level task guidance, while employing AI techniques that explicitly reason about the world to ensure verifiably safe and correct actions. This combination of planning-based and data-driven approaches may be key to developing robots capable of understanding and reliably performing a broader range of tasks than currently possible.”
Kumar and Curtis’ co-authors are also CSAIL affiliates: MIT undergraduate researcher Jing Cao and MIT Department of Electrical Engineering and Computer Science professors Leslie Pack Kaelbling and Tomás Lozano-Pérez. Their work was supported, in part, by the National Science Foundation, the Air Force Office of Scientific Research, the Office of Naval Research, the Army Research Office, MIT Quest for Intelligence, and The AI Institute.
0 notes
Link
As the use of large language models (LLMs) becomes increasingly prevalent across real-world applications, concerns about their vulnerabilities grow accordingly. Despite their capabilities, LLMs are still susceptible to various types of adversarial a #AI #ML #Automation
0 notes