#International Conference on Learning Representations (ICLR)
Explore tagged Tumblr posts
thepastisalreadywritten · 2 months ago
Text
AI Tool Reproduces Ancient Cuneiform Characters with High Accuracy
ProtoSnap, developed by Cornell and Tel Aviv universities, aligns prototype signs to photographed clay tablets to decode thousands of years of Mesopotamian writing.
Cornell University researchers report that scholars can now use artificial intelligence to “identify and copy over cuneiform characters from photos of tablets,” greatly easing the reading of these intricate scripts.
The new method, called ProtoSnap, effectively “snaps” a skeletal template of a cuneiform sign onto the image of a tablet, aligning the prototype to the strokes actually impressed in the clay.
By fitting each character’s prototype to its real-world variation, the system can produce an accurate copy of any sign and even reproduce entire tablets.
“Cuneiform, like Egyptian hieroglyphs, is one of the oldest known writing systems and contains over 1,000 unique symbols. Its characters change shape dramatically across different eras, cultures and even individual scribes so that even the same character… looks different across time,” Cornell computer scientist Hadar Averbuch-Elor explains.
This extreme variability has long made automated reading of cuneiform a very challenging problem.
The ProtoSnap technique addresses this by using a generative AI model known as a diffusion model.
It compares each pixel of a photographed tablet character to a reference prototype sign, calculating deep-feature similarities.
Once the correspondences are found, the AI aligns the prototype skeleton to the tablet’s markings and “snaps” it into place so that the template matches the actual strokes.
In effect, the system corrects for differences in writing style or tablet wear by deforming the ideal prototype to fit the real inscription.
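To make that pipeline concrete, here is a toy sketch of the matching-then-snapping idea in Python. It is not the ProtoSnap code: the “deep features” are random stand-ins for what a diffusion model would provide, and the alignment is a plain least-squares affine fit rather than the paper’s actual procedure.

```python
import numpy as np

# Illustrative sketch only: stand-in "deep features" and a plain affine fit.
# ProtoSnap itself extracts features from a diffusion model and aligns full
# stroke templates; here we only show the matching-then-snapping structure.

rng = np.random.default_rng(0)

H, W, C = 32, 32, 64                       # feature-map size and channels
proto_feats = rng.normal(size=(H, W, C))   # features of the prototype sign
photo_feats = rng.normal(size=(H, W, C))   # features of the photographed sign

# Keypoints on the prototype skeleton (row, col), e.g. stroke endpoints.
proto_keypoints = np.array([[4, 5], [10, 20], [25, 8], [28, 27]], dtype=float)

def best_match(feat, feat_map):
    """Return the (row, col) in feat_map whose feature is most similar (cosine)."""
    flat = feat_map.reshape(-1, feat_map.shape[-1])
    sims = flat @ feat / (np.linalg.norm(flat, axis=1) * np.linalg.norm(feat) + 1e-8)
    idx = int(np.argmax(sims))
    return np.array(divmod(idx, feat_map.shape[1]), dtype=float)

# 1) For each prototype keypoint, find its most similar location in the photo.
matches = np.array([
    best_match(proto_feats[int(r), int(c)], photo_feats)
    for r, c in proto_keypoints
])

# 2) Fit an affine transform (the "snap") mapping prototype points onto matches.
A = np.hstack([proto_keypoints, np.ones((len(proto_keypoints), 1))])
affine, *_ = np.linalg.lstsq(A, matches, rcond=None)

# 3) Apply it to the whole skeleton to get the deformed, tablet-specific copy.
snapped = A @ affine
print("snapped keypoints:\n", snapped)
```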
Crucially, the corrected (or “snapped”) character images can then train other AI tools.
The researchers used these aligned signs to train optical-character-recognition models that turn tablet photos into machine-readable text.
They found the models trained on ProtoSnap data performed much better than previous approaches at recognizing cuneiform signs, especially the rare ones or those with highly varied forms.
In practical terms, this means the AI can read and copy symbols that earlier methods often missed.
This advance could save scholars enormous amounts of time.
Traditionally, experts painstakingly hand-copy each cuneiform sign on a tablet.
The AI method can automate that process, freeing specialists to focus on interpretation.
It also enables large-scale comparisons of handwriting across time and place, something too laborious to do by hand.
As Tel Aviv University archaeologist Yoram Cohen says, the goal is to “increase the ancient sources available to us by tenfold,” allowing big-data analysis of how ancient societies lived – from their religion and economy to their laws and social life.
The research was led by Hadar Averbuch-Elor of Cornell Tech and carried out jointly with colleagues at Tel Aviv University.
Graduate student Rachel Mikulinsky, a co-first author, will present the work – titled “ProtoSnap: Prototype Alignment for Cuneiform Signs” – at the International Conference on Learning Representations (ICLR) in April.
In all, roughly 500,000 cuneiform tablets are stored in museums worldwide, but only a small fraction have ever been translated and published.
By giving AI a way to automatically interpret the vast trove of tablet images, the ProtoSnap method could unlock centuries of untapped knowledge about the ancient world.
5 notes · View notes
mariacallous · 1 month ago
Text
The government of Singapore released a blueprint today for global collaboration on artificial intelligence safety following a meeting of AI researchers from the US, China, and Europe. The document lays out a shared vision for working on AI safety through international cooperation rather than competition.
“Singapore is one of the few countries on the planet that gets along well with both East and West,” says Max Tegmark, a scientist at MIT who helped convene the meeting of AI luminaries last month. “They know that they're not going to build [artificial general intelligence] themselves—they will have it done to them—so it is very much in their interests to have the countries that are going to build it talk to each other."
The countries thought most likely to build AGI are, of course, the US and China—and yet those nations seem more intent on outmaneuvering each other than working together. In January, after Chinese startup DeepSeek released a cutting-edge model, President Trump called it “a wakeup call for our industries” and said the US needed to be “laser-focused on competing to win.”
The Singapore Consensus on Global AI Safety Research Priorities calls for researchers to collaborate in three key areas: studying the risks posed by frontier AI models, exploring safer ways to build those models, and developing methods for controlling the behavior of the most advanced AI systems.
The consensus was developed at a meeting held on April 26 alongside the International Conference on Learning Representations (ICLR), a premier AI event held in Singapore this year.
Researchers from OpenAI, Anthropic, Google DeepMind, xAI, and Meta all attended the AI safety event, as did academics from institutions including MIT, Stanford, Tsinghua, and the Chinese Academy of Sciences. Experts from AI safety institutes in the US, UK, France, Canada, China, Japan and Korea also participated.
"In an era of geopolitical fragmentation, this comprehensive synthesis of cutting-edge research on AI safety is a promising sign that the global community is coming together with a shared commitment to shaping a safer AI future," Xue Lan, dean of Tsinghua University, said in a statement.
The development of increasingly capable AI models, some of which have surprising abilities, has caused researchers to worry about a range of risks. While some focus on near-term harms including problems caused by biased AI systems or the potential for criminals to harness the technology, a significant number believe that AI may pose an existential threat to humanity as it begins to outsmart humans across more domains. These researchers, sometimes referred to as “AI doomers,” worry that models may deceive and manipulate humans in order to pursue their own goals.
The potential of AI has also stoked talk of an arms race between the US, China, and other powerful nations. The technology is viewed in policy circles as critical to economic prosperity and military dominance, and many governments have sought to stake out their own visions and regulations governing how it should be developed.
DeepSeek’s debut in January compounded fears that China may be catching up or even surpassing the US, despite efforts to curb China’s access to AI hardware with export controls. Now, the Trump administration is mulling additional measures aimed at restricting China’s ability to build cutting-edge AI.
The Trump administration has also sought to downplay AI risks in favor of a more aggressive approach to building the technology in the US. At a major AI meeting in Paris in 2025, Vice President JD Vance said that the US government wanted fewer restrictions around the development and deployment of AI, and described the previous approach as “too risk-averse.”
Tegmark, the MIT scientist, says some AI researchers are keen to “turn the tide a bit after Paris” by refocusing attention back on the potential risks posed by increasingly powerful AI.
At the meeting in Singapore, Tegmark presented a technical paper that challenged some assumptions about how AI can be built safely. Some researchers had previously suggested that it may be possible to control powerful AI models using weaker ones. Tegmark’s paper shows that this dynamic does not work in some simple scenarios, meaning it may well fail to prevent AI models from going awry.
“We tried our best to put numbers to this, and technically it doesn't work at the level you'd like,” Tegmark says. “And, you know, the stakes are quite high.”
2 notes · View notes
sunaleisocial · 1 month ago
Text
Learning how to predict rare kinds of failures
On Dec. 21, 2022, just as peak holiday season travel was getting underway, Southwest Airlines went through a cascading series of failures in their scheduling, initially triggered by severe winter weather in the Denver area. But the problems spread through their network, and over the course of the next 10 days the crisis ended up stranding over 2 million passengers and causing losses of $750 million for the airline.
How did a localized weather system end up triggering such a widespread failure? Researchers at MIT have examined this widely reported failure as an example of cases where systems that work smoothly most of the time suddenly break down and cause a domino effect of failures. They have now developed a computational system that combines sparse data about a rare failure event with much more extensive data on normal operations to work backwards, pinpoint the root causes of the failure, and, ideally, find ways to adjust the systems to prevent such failures in the future.
The findings were presented by MIT doctoral student Charles Dawson, professor of aeronautics and astronautics Chuchu Fan, and colleagues from Harvard University and the University of Michigan at the International Conference on Learning Representations (ICLR), held in Singapore from April 24-28.
“The motivation behind this work is that it’s really frustrating when we have to interact with these complicated systems, where it’s really hard to understand what’s going on behind the scenes that’s creating these issues or failures that we’re observing,” says Dawson.
The new work builds on previous research from Fan’s lab, which looked at hypothetical failure-prediction problems, she says, such as groups of robots working together on a task, or complex systems such as the power grid, seeking ways to predict how such systems may fail. “The goal of this project,” Fan says, “was really to turn that into a diagnostic tool that we could use on real-world systems.”
The idea was to provide a way that someone could “give us data from a time when this real-world system had an issue or a failure,” Dawson says, “and we can try to diagnose the root causes, and provide a little bit of a look behind the curtain at this complexity.”
The intent is for the methods they developed “to work for a pretty general class of cyber-physical problems,” he says. These are problems in which “you have an automated decision-making component interacting with the messiness of the real world,” he explains. There are available tools for testing software systems that operate on their own, but the complexity arises when that software has to interact with physical entities going about their activities in a real physical setting, whether it be the scheduling of aircraft, the movements of autonomous vehicles, the interactions of a team of robots, or the control of the inputs and outputs on an electric grid. In such systems, what often happens, he says, is that “the software might make a decision that looks OK at first, but then it has all these domino, knock-on effects that make things messier and much more uncertain.”
One key difference, though, is that in systems like teams of robots, unlike the scheduling of airplanes, “we have access to a model in the robotics world,” says Fan, who is a principal investigator in MIT’s Laboratory for Information and Decision Systems (LIDS). “We do have some good understanding of the physics behind the robotics, and we do have ways of creating a model” that represents their activities with reasonable accuracy. But airline scheduling involves processes and systems that are proprietary business information, and so the researchers had to find ways to infer what was behind the decisions, using only the relatively sparse publicly available information, which essentially consisted of just the actual arrival and departure times of each plane.
“We have grabbed all this flight data, but there is this entire system of the scheduling system behind it, and we don’t know how the system is working,” Fan says. And the amount of data relating to the actual failure is just several days’ worth, compared to years of data on normal flight operations.
The impact of the weather events in Denver during the week of Southwest’s scheduling crisis clearly showed up in the flight data, just from the longer-than-normal turnaround times between landing and takeoff at the Denver airport. But the way that impact cascaded through the system was less obvious, and required more analysis. The key turned out to involve the concept of reserve aircraft.
Airlines typically keep some planes in reserve at various airports, so that if problems are found with one plane that is scheduled for a flight, another plane can be quickly substituted. Southwest uses only a single type of plane, so they are all interchangeable, making such substitutions easier. But most airlines operate on a hub-and-spoke system, with a few designated hub airports where most of those reserve aircraft may be kept, whereas Southwest does not use hubs, so their reserve planes are more scattered throughout their network. And the way those planes were deployed turned out to play a major role in the unfolding crisis.
“The challenge is that there’s no public data available in terms of where the aircraft are stationed throughout the Southwest network,” Dawson says. “What we’re able to find using our method is, by looking at the public data on arrivals, departures, and delays, we can use our method to back out what the hidden parameters of those aircraft reserves could have been, to explain the observations that we were seeing.”
What they found was that the way the reserves were deployed was a "leading indicator" of the problems that cascaded into a nationwide crisis. Some parts of the network that were affected directly by the weather were able to recover quickly and get back on schedule. "But when we looked at other areas in the network, we saw that these reserves were just not available, and things just kept getting worse."
For example, the data showed that Denver’s reserves were rapidly dwindling because of the weather delays, but then “it also allowed us to trace this failure from Denver to Las Vegas,” he says. While there was no severe weather there, “our method was still showing us a steady decline in the number of aircraft that were able to serve flights out of Las Vegas.”
He says that “what we found was that there were these circulations of aircraft within the Southwest network, where an aircraft might start the day in California and then fly to Denver, and then end the day in Las Vegas.” What happened in the case of this storm was that the cycle got interrupted. As a result, “this one storm in Denver breaks the cycle, and suddenly the reserves in Las Vegas, which is not affected by the weather, start to deteriorate.”
In the end, Southwest was forced to take a drastic measure to resolve the problem: They had to do a “hard reset” of their entire system, canceling all flights and flying empty aircraft around the country to rebalance their reserves.
Working with experts in air transportation systems, the researchers developed a model of how the scheduling system is supposed to work. Then, “what our method does is, we’re essentially trying to run the model backwards.” Looking at the observed outcomes, the model allows them to work back to see what kinds of initial conditions could have produced those outcomes.
While the data on the actual failures were sparse, the extensive data on typical operations helped in teaching the computational model “what is feasible, what is possible, what’s the realm of physical possibility here,” Dawson says. “That gives us the domain knowledge to then say, in this extreme event, given the space of what’s possible, what’s the most likely explanation” for the failure.
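As a rough illustration of that “run the model backwards” step, the toy sketch below fits the hidden parameter of a made-up delay model to observed delays, with a prior standing in for what normal operations teach the system. Every number, the one-line simulator, and the prior are invented for illustration; only the overall shape of the inference mirrors the approach described here, not the team’s actual models or the CalNF tool.

```python
import numpy as np
from scipy.optimize import minimize

# Toy "scheduling model": delays at an airport grow as reserve aircraft run out.
# The model and all numbers are invented for illustration.
def simulate_delays(initial_reserves, weather_hits):
    reserves = initial_reserves
    delays = []
    for hit in weather_hits:
        reserves = max(reserves - hit, 0.0)      # weather consumes reserves
        delays.append(30.0 / (1.0 + reserves))   # fewer reserves -> longer delays
    return np.array(delays)

# Observed delays during the disruption (the sparse failure data; invented here).
observed = np.array([3.1, 5.8, 11.2, 19.7, 27.4])
weather_hits = np.array([1.0, 2.0, 3.0, 2.0, 1.0])

# Prior knowledge distilled from years of normal operations (also invented):
# reserves at this airport typically sit around 8 aircraft.
prior_mean, prior_std = 8.0, 2.0

def loss(params):
    initial_reserves = params[0]
    fit = np.sum((simulate_delays(initial_reserves, weather_hits) - observed) ** 2)
    prior = ((initial_reserves - prior_mean) / prior_std) ** 2
    # "Run the model backwards": find the hidden state that best explains the
    # observations, regularized by what normal operations say is plausible.
    return fit + prior

result = minimize(loss, x0=[prior_mean], method="Nelder-Mead")
print("most likely initial reserves:", result.x[0])
```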
This could lead to a real-time monitoring system, he says, where data on normal operations are constantly compared to the current data, and determining what the trend looks like. “Are we trending toward normal, or are we trending toward extreme events?” Seeing signs of impending issues could allow for preemptive measures, such as redeploying reserve aircraft in advance to areas of anticipated problems.
Work on developing such systems is ongoing in her lab, Fan says. In the meantime, they have produced an open-source tool for analyzing failure systems, called CalNF, which is available for anyone to use. Meanwhile Dawson, who earned his doctorate last year, is working as a postdoc to apply the methods developed in this work to understanding failures in power networks.
The research team also included Max Li from the University of Michigan and Van Tran from Harvard University. The work was supported by NASA, the Air Force Office of Scientific Research, and the MIT-DSTA program.
0 notes
qhsetools2022 · 4 months ago
Text
The first AI scientist writing peer-reviewed papers
The newly-formed Autoscience Institute has unveiled ‘Carl,’ the first AI system crafting academic research papers to pass a rigorous double-blind peer-review process. Carl’s research papers were accepted in the Tiny Papers track at the International Conference on Learning Representations (ICLR). Critically, these submissions were generated with minimal human involvement, heralding a new era for…
0 notes
jcmarchi · 4 months ago
Text
Autoscience Carl: The first AI scientist writing peer-reviewed papers
The newly-formed Autoscience Institute has unveiled ‘Carl,’ the first AI system crafting academic research papers to pass a rigorous double-blind peer-review process.
Carl’s research papers were accepted in the Tiny Papers track at the International Conference on Learning Representations (ICLR). Critically, these submissions were generated with minimal human involvement, heralding a new era for AI-driven scientific discovery.
Meet Carl: The ‘automated research scientist’
Carl represents a leap forward in the role of AI as not just a tool, but an active participant in academic research. Described as “an automated research scientist,” Carl applies natural language models to ideate, hypothesise, and cite academic work accurately. 
Crucially, Carl can read and comprehend published papers in mere seconds. Unlike human researchers, it works continuously, thus accelerating research cycles and reducing experimental costs.
According to Autoscience, Carl successfully “ideated novel scientific hypotheses, designed and performed experiments, and wrote multiple academic papers that passed peer review at workshops.”
This underlines the potential of AI to not only complement human research but, in many ways, surpass it in speed and efficiency.
Carl is a meticulous worker, but human involvement is still vital
Carl’s ability to generate high-quality academic work is built on a three-step process:
Ideation and hypothesis formation: Leveraging existing research, Carl identifies potential research directions and generates hypotheses. Its deep understanding of related literature allows it to formulate novel ideas in the field of AI.
Experimentation: Carl writes code, tests hypotheses, and visualises the resulting data through detailed figures. Its tireless operation shortens iteration times and reduces redundant tasks.
Presentation: Finally, Carl compiles its findings into polished academic papers—complete with data visualisations and clearly articulated conclusions.
Although Carl’s capabilities make it largely independent, there are points in its workflow where human involvement is still required to adhere to computational, formatting, and ethical standards:
Greenlighting research steps: To avoid wasting computational resources, human reviewers provide “continue” or “stop” signals during specific stages of Carl’s process. This guidance steers Carl through projects more efficiently but does not influence the specifics of the research itself.
Citations and formatting: The Autoscience team ensures all references are correctly cited and formatted to meet academic standards. This is currently a manual step but ensures the research aligns with the expectations of its publication venue. 
Assistance with pre-API models: Carl occasionally relies on newer OpenAI and Deep Research models that lack auto-accessible APIs. In such cases, manual interventions – such as copy-pasting outputs – bridge these gaps. Autoscience expects these tasks to be entirely automated in the future when APIs become available.
For Carl’s debut paper, the human team also helped craft the “related works” section and refine the language. These tasks, however, were unnecessary following updates applied before subsequent submissions.
Stringent verification process for academic integrity
Before submitting any research, the Autoscience team undertook a rigorous verification process to ensure Carl’s work met the highest standards of academic integrity:
Reproducibility: Every line of Carl’s code was reviewed and experiments were rerun to confirm reproducibility. This ensured the findings were scientifically valid and not coincidental anomalies.
Originality checks: Autoscience conducted extensive novelty evaluations to ensure that Carl’s ideas were new contributions to the field and not rehashed versions of existing publications.
External validation: A hackathon involving researchers from prominent academic institutions – such as MIT, Stanford University, and U.C. Berkeley – independently verified Carl’s research. Further plagiarism and citation checks were performed to ensure compliance with academic norms.
Undeniable potential, but raises larger questions
Achieving acceptance at a workshop as respected as the ICLR is a significant milestone, but Autoscience recognises the greater conversation this milestone may spark. Carl’s success raises larger philosophical and logistical questions about the role of AI in academic settings.
“We believe that legitimate results should be added to the public knowledge base, regardless of where they originated,” explained Autoscience. “If research meets the scientific standards set by the academic community, then who – or what – created it should not lead to automatic disqualification.”
“We also believe, however, that proper attribution is necessary for transparent science, and work purely generated by AI systems should be discernible from that produced by humans.”
Given the novelty of autonomous AI researchers like Carl, conference organisers may need time to establish new guidelines that account for this emerging paradigm, especially to ensure fair evaluation and intellectual attribution standards. To prevent unnecessary controversy at present, Autoscience has withdrawn Carl’s papers from ICLR workshops while these frameworks are being devised.
Moving forward, Autoscience aims to contribute to shaping these evolving standards. The company intends to propose a dedicated workshop at NeurIPS 2025 to formally accommodate research submissions from autonomous research systems. 
As the narrative surrounding AI-generated research unfolds, it’s clear that systems like Carl are not merely tools but collaborators in the pursuit of knowledge. But as these systems transcend typical boundaries, the academic community must adapt to fully embrace this new paradigm while safeguarding integrity, transparency, and proper attribution.
0 notes
cleverhottubmiracle · 4 months ago
Link
Research
Published 3 May 2024
Developing next-gen AI agents, exploring new modalities, and pioneering foundational learning

Next week, AI researchers from around the globe will converge at the 12th International Conference on Learning Representations (ICLR), set to take place May 7-11 in Vienna, Austria.

Raia Hadsell, Vice President of Research at Google DeepMind, will deliver a keynote reflecting on the last 20 years in the field, highlighting how lessons learned are shaping the future of AI for the benefit of humanity.

We’ll also offer live demonstrations showcasing how we bring our foundational research into reality, from the development of Robotics Transformers to the creation of toolkits and open-source models like Gemma.

Teams from across Google DeepMind will present more than 70 papers this year. Some research highlights:

Problem-solving agents and human-inspired approaches

Large language models (LLMs) are already revolutionizing advanced AI tools, yet their full potential remains untapped. For instance, LLM-based AI agents capable of taking effective actions could transform digital assistants into more helpful and intuitive AI tools.

AI assistants that follow natural language instructions to carry out web-based tasks on people’s behalf would be a huge timesaver. In an oral presentation we introduce WebAgent, an LLM-driven agent that learns from self-experience to navigate and manage complex tasks on real-world websites.

To further enhance the general usefulness of LLMs, we focused on boosting their problem-solving skills. We demonstrate how we achieved this by equipping an LLM-based system with a traditionally human approach: producing and using “tools”. Separately, we present a training technique that ensures language models produce more consistently socially acceptable outputs. Our approach uses a sandbox rehearsal space that represents the values of society.

Pushing boundaries in vision and coding

Until recently, large AI models mostly focused on text and images, laying the groundwork for large-scale pattern recognition and data interpretation. Now, the field is progressing beyond these static realms to embrace the dynamics of real-world visual environments. As computing advances across the board, it is increasingly important that its underlying code is generated and optimized with maximum efficiency.

When you watch a video on a flat screen, you intuitively grasp the three-dimensional nature of the scene. Machines, however, struggle to emulate this ability without explicit supervision. We showcase our Dynamic Scene Transformer (DyST) model, which leverages real-world single-camera videos to extract 3D representations of objects in the scene and their movements. What’s more, DyST also enables the generation of novel versions of the same video, with user control over camera angles and content.

Emulating human cognitive strategies also makes for better AI code generators. When programmers write complex code, they typically “decompose” the task into simpler subtasks. With ExeDec, we introduce a novel code-generating approach that harnesses a decomposition approach to elevate AI systems’ programming and generalization performance.

In a parallel spotlight paper we explore the novel use of machine learning to not only generate code, but to optimize it, introducing a dataset for the robust benchmarking of code performance. Code optimization is challenging, requiring complex reasoning, and our dataset enables the exploration of a range of ML techniques. We demonstrate that the resulting learning strategies outperform human-crafted code optimizations.

Advancing foundational learning

Our research teams are tackling the big questions of AI - from exploring the essence of machine cognition to understanding how advanced AI models generalize - while also working to overcome key theoretical challenges.

For both humans and machines, causal reasoning and the ability to predict events are closely related concepts. In a spotlight presentation, we explore how reinforcement learning is affected by prediction-based training objectives, and draw parallels to changes in brain activity also linked to prediction.

When AI agents are able to generalize well to new scenarios, is it because they, like humans, have learned an underlying causal model of their world? This is a critical question in advanced AI. In an oral presentation, we reveal that such models have indeed learned an approximate causal model of the processes that resulted in their training data, and discuss the deep implications.

Another critical question in AI is trust, which in part depends on how accurately models can estimate the uncertainty of their outputs - a crucial factor for reliable decision-making. We’ve made significant advances in uncertainty estimation within Bayesian deep learning, employing a simple and essentially cost-free method.

Finally, we explore game theory’s Nash equilibrium (NE) - a state in which no player benefits from changing their strategy if others maintain theirs. Beyond simple two-player games, even approximating a Nash equilibrium is computationally intractable, but in an oral presentation, we reveal new state-of-the-art approaches in negotiating deals from poker to auctions.

Bringing together the AI community

We’re delighted to sponsor ICLR and support initiatives including Queer in AI and Women In Machine Learning. Such partnerships not only bolster research collaborations but also foster a vibrant, diverse community in AI and machine learning.

If you’re at ICLR, be sure to visit our booth and our Google Research colleagues next door. Discover our pioneering research, meet our teams hosting workshops, and engage with our experts presenting throughout the conference. We look forward to connecting with you!
0 notes
fumpkins · 3 years ago
Text
Is technology spying on you? New AI could prevent eavesdropping | Science
Big Brother is listening. Companies use “bossware” to listen to their employees when they’re near their computers. Multiple “spyware” apps can record phone calls. And home devices such as Amazon’s Echo can record everyday conversations. A new technology, called Neural Voice Camouflage, now offers a defense. It generates custom audio noise in the background as you talk, confusing the artificial intelligence (AI) that transcribes our recorded voices.
The new system uses an “adversarial attack.” The strategy employs machine learning—in which algorithms find patterns in data—to tweak sounds in a way that causes an AI, but not people, to mistake it for something else. Essentially, you use one AI to fool another.
The process isn’t as easy as it sounds, however. The machine-learning AI needs to process the whole sound clip before knowing how to tweak it, which doesn’t work when you want to camouflage in real time.
So in the new study, researchers taught a neural network, a machine-learning system inspired by the brain, to effectively predict the future. They trained it on many hours of recorded speech so it can constantly process 2-second clips of audio and disguise what’s likely to be said next.
For instance, if someone has just said “enjoy the great feast,” it can’t predict exactly what will be said next. But by taking into account what was just said, as well as characteristics of the speaker’s voice, it produces sounds that will disrupt a range of possible phrases that could follow. That includes what actually happened next; here, the same speaker saying, “that’s being cooked.” To human listeners, the audio camouflage sounds like background noise, and they have no trouble understanding the spoken words. But machines stumble.
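The structure of that real-time trick can be sketched in a few lines of Python. The predictor below is an untrained stand-in that simply emits noise scaled to the recent audio’s loudness; only the buffering is meant to reflect the idea that the camouflage for each new chunk of speech must be computed from audio heard earlier.

```python
import numpy as np

SAMPLE_RATE = 16000
CHUNK = SAMPLE_RATE // 2          # process audio in 0.5 s chunks
CONTEXT = 2 * SAMPLE_RATE         # condition on the last 2 s, as in the article

rng = np.random.default_rng(0)
speech = rng.normal(scale=0.1, size=10 * SAMPLE_RATE)   # stand-in for live speech

def predict_perturbation(context, length):
    """Stand-in predictor: noise scaled to the recent audio's loudness.
    The real system uses a trained network aimed at disrupting likely next words."""
    level = np.sqrt(np.mean(context ** 2)) if context.size else 0.0
    return 0.5 * level * rng.normal(size=length)

buffer = np.zeros(0)
protected = []
for start in range(0, len(speech), CHUNK):
    chunk = speech[start:start + CHUNK]
    # The perturbation for THIS chunk is computed from audio heard BEFORE it,
    # which is what makes real-time camouflage possible despite processing delay.
    perturbation = predict_perturbation(buffer[-CONTEXT:], len(chunk))
    protected.append(chunk + perturbation)
    buffer = np.concatenate([buffer, chunk])

protected_audio = np.concatenate(protected)
print(protected_audio.shape)   # same length as the input speech
```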
M. Chiquier et al., ICLR 2022 Oral
The scientists overlaid the output of their system onto recorded speech as it was being fed directly into one of the automatic speech recognition (ASR) systems that might be used by eavesdroppers to transcribe. The system increased the ASR software’s word error rate from 11.3% to 80.2%. “I’m nearly starved myself, for this conquering kingdoms is hard work,” for example, was transcribed as “im mearly starme my scell for threa for this conqernd kindoms as harenar ov the reson” (see video, above).
The error rates for speech disguised by white noise and a competing adversarial attack (which, lacking predictive capabilities, masked only what it had just heard with noise played half a second too late) were only 12.8% and 20.5%, respectively. The work was presented in a paper last month at the International Conference on Learning Representations, which peer reviews manuscript submissions.
Even when the ASR system was trained to transcribe speech perturbed by Neural Voice Camouflage (a technique eavesdroppers could conceivably employ), its error rate remained 52.5%. In general, the hardest words to disrupt were short ones, such as “the,” but these are the least revealing parts of a conversation.
The researchers also tested the method in the real world, playing a voice recording combined with the camouflage through a set of speakers in the same room as a microphone. It still worked. For example, “I also just got a new monitor” was transcribed as “with reasons with they also toscat and neumanitor.”
This is just the first step in safeguarding privacy in the face of AI, says Mia Chiquier, a computer scientist at Columbia University who led the research. “Artificial intelligence collects data about our voice, our faces, and our actions. We need a new generation of technology that respects our privacy.”
Chiquier adds that the predictive part of the system has great potential for other applications that need real-time processing, such as autonomous vehicles. “You have to anticipate where the car will be next, where the pedestrian might be,” she says. Brains also operate through anticipation; you feel surprise when your brain incorrectly predicts something. In that regard, Chiquier says, “We’re emulating the way humans do things.”
“There’s something nice about the way it combines predicting the future, a classic problem in machine learning, with this other problem of adversarial machine learning,” says Andrew Owens, a computer scientist at the University of Michigan, Ann Arbor, who studies audio processing and visual camouflage and was not involved in the work. Bo Li, a computer scientist at the University of Illinois, Urbana-Champaign, who has worked on audio adversarial attacks, was impressed that the new approach worked even against the fortified ASR system.
Audio camouflage is much needed, says Jay Stanley, a senior policy analyst at the American Civil Liberties Union. “All of us are susceptible to having our innocent speech misinterpreted by security algorithms.” Maintaining privacy is hard work, he says. Or rather it’s harenar ov the reson.
0 notes
saurabhbagchi · 3 years ago
Text
OpenReview: A Positive Direction for Peer Review?
I recently had occasion to use OpenReview as the reviewing platform for a conference. The conference was the International Conference on Learning Representations (ICLR), a core Machine Learning conference. I used it both as a reviewer and as an author. OpenReview is a platform that is distinguished by two features: It facilitates dialog between the authors and the reviewers during the…
0 notes
zibibyte · 3 years ago
Text
Our Favorite Deep Learning Papers and Talks from ICLR 2021
The International Conference on Learning Representations is one of the premier international conferences on machine learning, with a special focus on deep learning (also known as representation learning). As in past years, Two Sigma sponsored ICLR 2021, which took place virtually in May. via https://ift.tt/3K8pRB9
0 notes
imgexhaust · 5 years ago
Text
How neural nets are really just looking at textures
A paper submitted to this year’s International Conference on Learning Representations (ICLR) may explain why. Researchers from the University of Tübingen in Germany found that CNNs trained on ImageNet identify objects by their texture rather than shape.
Source: https://openreview.net/forum?id=Bygh9j09KX, https://www.theregister.co.uk/2019/02/13/ai_image_texture/, https://twitter.com/mikarv/status/1095770134260731904
0 notes
vladislav-karelin · 6 years ago
Text
[Translation] The 8 best trends from the International Conference on Learning Representations (ICLR) 2019
The field of data analysis and Data Science is developing at an astonishing pace these days. To know whether your own methods and approaches are still relevant, you have to keep up with your colleagues' work, and conferences are where you learn about current trends. Unfortunately, not every event can be attended in person, so write-ups of past conferences are valuable for specialists who could not find the time or opportunity to be there themselves. We are pleased to present a translation of Chip Huyen's article about ICLR 2019, covering the latest trends and approaches in Data Science.
Read more → from Artificial Intelligence – AI, ANN and other forms of artificial intelligence https://habr.com/ru/post/475720/?utm_campaign=475720&utm_source=habrahabr&utm_medium=rss via IFTTT
0 notes
sunaleisocial · 1 year ago
Text
Engineering household robots to have a little common sense
From wiping up spills to serving up food, robots are being taught to carry out increasingly complicated household tasks. Many such home-bot trainees are learning through imitation; they are programmed to copy the motions that a human physically guides them through.
It turns out that robots are excellent mimics. But unless engineers also program them to adjust to every possible bump and nudge, robots don’t necessarily know how to handle these situations, short of starting their task from the top.
Now MIT engineers are aiming to give robots a bit of common sense when faced with situations that push them off their trained path. They’ve developed a method that connects robot motion data with the “common sense knowledge” of large language models, or LLMs.
Their approach enables a robot to logically parse a given household task into subtasks, and to physically adjust to disruptions within a subtask so that the robot can move on without having to go back and start the task from scratch — and without engineers having to explicitly program fixes for every possible failure along the way.
Image courtesy of the researchers.
“Imitation learning is a mainstream approach enabling household robots. But if a robot is blindly mimicking a human’s motion trajectories, tiny errors can accumulate and eventually derail the rest of the execution,” says Yanwei Wang, a graduate student in MIT’s Department of Electrical Engineering and Computer Science (EECS). “With our method, a robot can self-correct execution errors and improve overall task success.”
Wang and his colleagues detail their new approach in a study they will present at the International Conference on Learning Representations (ICLR) in May. The study’s co-authors include EECS graduate students Tsun-Hsuan Wang and Jiayuan Mao, Michael Hagenow, a postdoc in MIT’s Department of Aeronautics and Astronautics (AeroAstro), and Julie Shah, the H.N. Slater Professor in Aeronautics and Astronautics at MIT.
Language task
The researchers illustrate their new approach with a simple chore: scooping marbles from one bowl and pouring them into another. To accomplish this task, engineers would typically move a robot through the motions of scooping and pouring — all in one fluid trajectory. They might do this multiple times, to give the robot a number of human demonstrations to mimic.
“But the human demonstration is one long, continuous trajectory,” Wang says.
The team realized that, while a human might demonstrate a single task in one go, that task depends on a sequence of subtasks, or trajectories. For instance, the robot has to first reach into a bowl before it can scoop, and it must scoop up marbles before moving to the empty bowl, and so forth. If a robot is pushed or nudged to make a mistake during any of these subtasks, its only recourse is to stop and start from the beginning, unless engineers were to explicitly label each subtask and either program a fix or collect new demonstrations showing the robot how to recover from that failure, so that it can self-correct in the moment.
“That level of planning is very tedious,” Wang says.
Instead, he and his colleagues found some of this work could be done automatically by LLMs. These deep learning models process immense libraries of text, which they use to establish connections between words, sentences, and paragraphs. Through these connections, an LLM can then generate new sentences based on what it has learned about the kind of word that is likely to follow the last.
For their part, the researchers found that in addition to sentences and paragraphs, an LLM can be prompted to produce a logical list of subtasks that would be involved in a given task. For instance, if queried to list the actions involved in scooping marbles from one bowl into another, an LLM might produce a sequence of verbs such as “reach,” “scoop,” “transport,” and “pour.”
“LLMs have a way to tell you how to do each step of a task, in natural language. A human’s continuous demonstration is the embodiment of those steps, in physical space,” Wang says. “And we wanted to connect the two, so that a robot would automatically know what stage it is in a task, and be able to replan and recover on its own.”
Mapping marbles
For their new approach, the team developed an algorithm to automatically connect an LLM’s natural language label for a particular subtask with a robot’s position in physical space or an image that encodes the robot state. Mapping a robot’s physical coordinates, or an image of the robot state, to a natural language label is known as “grounding.” The team’s new algorithm is designed to learn a grounding “classifier,” meaning that it learns to automatically identify what semantic subtask a robot is in — for example, “reach” versus “scoop” — given its physical coordinates or an image view.
“The grounding classifier facilitates this dialogue between what the robot is doing in the physical space and what the LLM knows about the subtasks, and the constraints you have to pay attention to within each subtask,” Wang explains.
The team demonstrated the approach in experiments with a robotic arm that they trained on a marble-scooping task. Experimenters trained the robot by physically guiding it through the task of first reaching into a bowl, scooping up marbles, transporting them over an empty bowl, and pouring them in. After a few demonstrations, the team then used a pretrained LLM and asked the model to list the steps involved in scooping marbles from one bowl to another. The researchers then used their new algorithm to connect the LLM’s defined subtasks with the robot’s motion trajectory data. The algorithm automatically learned to map the robot’s physical coordinates in the trajectories and the corresponding image view to a given subtask.
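A toy sketch of that grounding-and-recovery loop is shown below. The positions, labels, and “marbles on spoon” check are invented stand-ins rather than the study’s actual features or conditions; the point is only the structure: classify the current subtask from the robot’s state, verify that the subtask’s goal was met, and fall back to the unfinished subtask instead of restarting the whole demonstration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

SUBTASKS = ["reach", "scoop", "transport", "pour"]

# Invented training data: end-effector (x, y, z) positions recorded during
# demonstrations, each labeled with the subtask assigned to that phase.
rng = np.random.default_rng(0)
centers = np.array([[0.2, 0.0, 0.1], [0.2, 0.0, 0.02], [0.4, 0.2, 0.2], [0.6, 0.4, 0.1]])
X = np.vstack([c + 0.01 * rng.normal(size=(50, 3)) for c in centers])
y = np.repeat(np.arange(len(SUBTASKS)), 50)

# The "grounding classifier": maps robot state to a semantic subtask label.
grounding_classifier = LogisticRegression(max_iter=1000).fit(X, y)

def subtask_done(name, state):
    """Invented completion checks, e.g. 'scoop' counts only if marbles are on the spoon."""
    if name == "scoop":
        return state["marbles_on_spoon"] > 0
    return True

# Execution check: if a nudge knocks the marbles off mid-transport, the robot
# falls back to "scoop" instead of restarting the whole demonstration.
state = {"position": np.array([0.4, 0.2, 0.2]), "marbles_on_spoon": 0}
current = SUBTASKS[grounding_classifier.predict(state["position"].reshape(1, -1))[0]]
print("robot appears to be in subtask:", current)
if not subtask_done("scoop", state):
    print("scoop not actually completed -> replan back to 'scoop' before", current)
```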
The team then let the robot carry out the scooping task on its own, using the newly learned grounding classifiers. As the robot moved through the steps of the task, the experimenters pushed and nudged the bot off its path, and knocked marbles off its spoon at various points. Rather than stop and start from the beginning again, or continue blindly with no marbles on its spoon, the bot was able to self-correct, and completed each subtask before moving on to the next. (For instance, it would make sure that it successfully scooped marbles before transporting them to the empty bowl.)
“With our method, when the robot is making mistakes, we don’t need to ask humans to program or give extra demonstrations of how to recover from failures,” Wang says. “That’s super exciting because there’s a huge effort now toward training household robots with data collected on teleoperation systems. Our algorithm can now convert that training data into robust robot behavior that can do complex tasks, despite external perturbations.”
0 notes
itinerancianet · 6 years ago
Text
We've been using neural networks wrong for years: now we know how to make them up to 10 times smaller without losing performance
This week New Orleans hosted the seventh edition of ICLR (the International Conference on Learning Representations), one of the world's major scientific events on artificial intelligence. One highlight of this edition was one of the academic papers that received an award there.
Written by Michael Carbin and Jonathan Frankle, both researchers at MIT,…
0 notes
jcmarchi · 1 year ago
Text
Engineering household robots to have a little common sense
From wiping up spills to serving up food, robots are being taught to carry out increasingly complicated household tasks. Many such home-bot trainees are learning through imitation; they are programmed to copy the motions that a human physically guides them through.
It turns out that robots are excellent mimics. But unless engineers also program them to adjust to every possible bump and nudge, robots don’t necessarily know how to handle these situations, short of starting their task from the top.
Now MIT engineers are aiming to give robots a bit of common sense when faced with situations that push them off their trained path. They’ve developed a method that connects robot motion data with the “common sense knowledge” of large language models, or LLMs.
Their approach enables a robot to logically parse a given household task into subtasks, and to physically adjust to disruptions within a subtask so that the robot can move on without having to go back and start the task from scratch — and without engineers having to explicitly program fixes for every possible failure along the way.
Image courtesy of the researchers.
“Imitation learning is a mainstream approach enabling household robots. But if a robot is blindly mimicking a human’s motion trajectories, tiny errors can accumulate and eventually derail the rest of the execution,” says Yanwei Wang, a graduate student in MIT’s Department of Electrical Engineering and Computer Science (EECS). “With our method, a robot can self-correct execution errors and improve overall task success.”
Wang and his colleagues detail their new approach in a study they will present at the International Conference on Learning Representations (ICLR) in May. The study’s co-authors include EECS graduate students Tsun-Hsuan Wang and Jiayuan Mao, Michael Hagenow, a postdoc in MIT’s Department of Aeronautics and Astronautics (AeroAstro), and Julie Shah, the H.N. Slater Professor in Aeronautics and Astronautics at MIT.
Language task
The researchers illustrate their new approach with a simple chore: scooping marbles from one bowl and pouring them into another. To accomplish this task, engineers would typically move a robot through the motions of scooping and pouring — all in one fluid trajectory. They might do this multiple times, to give the robot a number of human demonstrations to mimic.
“But the human demonstration is one long, continuous trajectory,” Wang says.
The team realized that, while a human might demonstrate a single task in one go, that task depends on a sequence of subtasks, or trajectories. For instance, the robot has to first reach into a bowl before it can scoop, and it must scoop up marbles before moving to the empty bowl, and so forth. If a robot is pushed or nudged to make a mistake during any of these subtasks, its only recourse is to stop and start from the beginning, unless engineers were to explicitly label each subtask and either program a fix or collect new demonstrations showing the robot how to recover from that failure, so that it can self-correct in the moment.
“That level of planning is very tedious,” Wang says.
Instead, he and his colleagues found some of this work could be done automatically by LLMs. These deep learning models process immense libraries of text, which they use to establish connections between words, sentences, and paragraphs. Through these connections, an LLM can then generate new sentences based on what it has learned about the kind of word that is likely to follow the last.
For their part, the researchers found that in addition to sentences and paragraphs, an LLM can be prompted to produce a logical list of subtasks that would be involved in a given task. For instance, if queried to list the actions involved in scooping marbles from one bowl into another, an LLM might produce a sequence of verbs such as “reach,” “scoop,” “transport,” and “pour.”
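As a purely hypothetical illustration of that step, the snippet below shows what prompting a model for a subtask list and parsing the reply could look like. The call_llm function is a placeholder stub that returns a canned answer here, not any real model API.

```python
# Hypothetical sketch: decompose a household task into subtask verbs with an LLM.
# `call_llm` is a placeholder stub, not a real API; it returns a canned reply.
def call_llm(prompt: str) -> str:
    return "1. reach\n2. scoop\n3. transport\n4. pour"

def decompose_task(task: str) -> list[str]:
    prompt = (
        f"List, one per line and in order, the short verb-phrases a robot must "
        f"perform to complete this task: {task}"
    )
    reply = call_llm(prompt)
    # Strip numbering like "1." and keep the verb phrase itself.
    return [line.split(".", 1)[-1].strip() for line in reply.splitlines() if line.strip()]

subtasks = decompose_task("scoop marbles from one bowl and pour them into another")
print(subtasks)   # ['reach', 'scoop', 'transport', 'pour']
```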
“LLMs have a way to tell you how to do each step of a task, in natural language. A human’s continuous demonstration is the embodiment of those steps, in physical space,” Wang says. “And we wanted to connect the two, so that a robot would automatically know what stage it is in a task, and be able to replan and recover on its own.”
Mapping marbles
For their new approach, the team developed an algorithm to automatically connect an LLM’s natural language label for a particular subtask with a robot’s position in physical space or an image that encodes the robot state. Mapping a robot’s physical coordinates, or an image of the robot state, to a natural language label is known as “grounding.” The team’s new algorithm is designed to learn a grounding “classifier,” meaning that it learns to automatically identify what semantic subtask a robot is in — for example, “reach” versus “scoop” — given its physical coordinates or an image view.
“The grounding classifier facilitates this dialogue between what the robot is doing in the physical space and what the LLM knows about the subtasks, and the constraints you have to pay attention to within each subtask,” Wang explains.
The team demonstrated the approach in experiments with a robotic arm that they trained on a marble-scooping task. Experimenters trained the robot by physically guiding it through the task of first reaching into a bowl, scooping up marbles, transporting them over an empty bowl, and pouring them in. After a few demonstrations, the team then used a pretrained LLM and asked the model to list the steps involved in scooping marbles from one bowl to another. The researchers then used their new algorithm to connect the LLM’s defined subtasks with the robot’s motion trajectory data. The algorithm automatically learned to map the robot’s physical coordinates in the trajectories and the corresponding image view to a given subtask.
The team then let the robot carry out the scooping task on its own, using the newly learned grounding classifiers. As the robot moved through the steps of the task, the experimenters pushed and nudged the bot off its path, and knocked marbles off its spoon at various points. Rather than stop and start from the beginning again, or continue blindly with no marbles on its spoon, the bot was able to self-correct, and completed each subtask before moving on to the next. (For instance, it would make sure that it successfully scooped marbles before transporting them to the empty bowl.)
“With our method, when the robot is making mistakes, we don’t need to ask humans to program or give extra demonstrations of how to recover from failures,” Wang says. “That’s super exciting because there’s a huge effort now toward training household robots with data collected on teleoperation systems. Our algorithm can now convert that training data into robust robot behavior that can do complex tasks, despite external perturbations.”
1 note · View note
craigbrownphd-blog-blog · 7 years ago
Text
Data Science, Machine Learning, and AI Conferences and Events 2018
We’ve compiled a list of the hottest events and conferences from the world of Data Science, Machine Learning and Artificial Intelligence happening in 2018. Below are all the links you need to get yourself to these great events!

JANUARY
17th – 19th Global Artificial Intelligence Conference, Santa Clara, USA
17th – 20th AI NEXTCon, Seattle, USA
25th – 26th AI on a Social Mission Conference, Montreal, Canada
27th – 30th Applied Machine Learning Days, Lausanne, Switzerland
30th – 31st 3rd International Chatbot Summit, Tel Aviv, Israel

FEBRUARY
2nd – 7th 32nd AAAI Conference on Artificial Intelligence, New Orleans, USA
7th – 8th 4th Annual Big Data & Analytics Summit, Toronto, Canada
14th – 17th International Research Conference Robophilosophy, Vienna, Austria
20th Women in AI Dinner, Re-work, London, UK
26th – 27th Gartner Data & Analytics Summit, Sydney, Australia
26th – 28th 10th International Conference on Machine Learning and Computing (ICMLC), Macau, China
27th AI 4 Business Summit, Lint, Belgium

MARCH
5th – 6th European Artificial Intelligence Innovation Summit, London, UK
5th – 8th 13th ACM/IEEE International Conference on Human Robot Interaction, Chicago, USA
7th – 8th AI and Sentiment Analysis in Finance, Hong Kong, China
7th – 11th ACM IUI, Tokyo, Japan
19th – 21st Gartner Data & Analytics Summit, London, UK
22nd Data Innovation Summit, Stockholm, Sweden
29th – 31st 10th International Conference on Advanced Computational Intelligence, Xiamen, China

APRIL
5th – 6th Future of Information and Communication Conference (FICC), Singapore, Singapore
10th – 13th O’Reilly Artificial Intelligence Conference, Beijing, China
12th Applied Artificial Intelligence Conference, San Francisco, USA
15th – 20th IEEE International Conference on Acoustics, Speech and Signal Processing, Calgary, Canada
16th – 17th 5th International Conference on Automation and Robotics, Las Vegas, USA
18th – 19th Big Data & Analytics Innovation Summit, Hong Kong, China
18th – 19th AI Expo Global Conference & Exhibition, London, UK
23rd – 25th RPA & AI Summit, Copenhagen, Denmark
24th – 28th RoboSoft: IEEE-RAS International Conference on Soft Robotics, Livorno, Italy
25th – 27th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, Bruges, Belgium
30th April – 3rd May International Conference on Learning Representations (ICLR), Vancouver, Canada

MAY
1st – 4th Accelerate AI: Open Data Science Conference East, Boston, USA
3rd – 5th SIAM International Conference on Data Mining (SDM18), San Diego, USA
16th – 19th IEEE International Conference on Simulation, Modelling, and Programming for Autonomous Robots (SIMPAR), Brisbane, Australia
17th 4th Rise of AI Conference, Berlin, Germany
21st – 24th Strata Data Conference, London, UK
23rd – 24th LDV Vision Summit, New York, USA
24th – 25th Deep Learning Summit, Boston, USA
31st May – 1st June Dot.AI, Paris, France

JUNE
1st – 6th Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), New Orleans, USA
3rd – 6th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Melbourne, Australia
6th – 7th Machine Intelligence Summit Re-work, Hong Kong, China
11th – 12th CogX, London, UK
12th – 13th Predictive Analytics World Industry 4.0, Munich, Germany
18th – 20th Conference for Machine Learning Innovation, Munich, Germany
18th – 22nd CVPR 2018: Computer Vision and Pattern Recognition, Salt Lake City, USA
20th – 22nd Distributed Computing and Artificial Intelligence (DCAI), Toledo, Spain
25th – 28th 31st International Conference on Industrial, Engineering & Other Applications of Applied Intelligent Systems, Montreal, Canada
26th – 30th Robotics: Science and Systems, Pittsburgh, USA
27th – 28th AI, Machine Learning and Sentiment Analysis Applied to Finance, London, UK
28th – 29th AI in Industrial Automation Summit, San Francisco, USA

JULY
5th – 8th 6th AAAI Conference on Human Computation and Crowdsourcing, Zurich, Switzerland
11th – 15th 18th Industrial Conference on Data Mining ICDM, New York, USA
14th – 19th 14th International Conference on Machine Learning and Data Mining, New York, USA
15th – 20th Annual Meeting of the Association for Computational Linguistics, Melbourne, Australia
19th – 21st 4th Global Summit and Expo on Multimedia & Artificial Intelligence, Rome, Italy
26th AI Summit, Hong Kong, China
26th – 28th Data Conference, Lisbon, Portugal

AUGUST
12th – 16th SIGGRAPH 2018, Vancouver, Canada
20th – 24th ICPR 2018: 24th International Conference on Pattern Recognition, Beijing, China
20th – 24th IEEE International Conference on Automation Science and Engineering, Munich, Germany
20th – 25th International Conference on Computational Linguistics, Santa Fe, USA
21st – 22nd Artificial Intelligence, Robotics & IoT, Paris, France

SEPTEMBER
3rd – 6th BMVC 2018: British Machine Vision Conference, Newcastle, UK
4th – 7th O’Reilly Artificial Intelligence Conference, San Francisco, USA
8th – 14th ECCV 2018: European Conference on Computer Vision, Munich, Germany
9th – 12th International Symposium Advances in Artificial Intelligence and Applications, Poznan, Poland
10th – 11th 6th World Convention on Robots and Deep Learning, Singapore, Singapore
11th – 12th Big Data Innovation Summit, Boston, USA
18th – 20th AI Summit San Francisco, San Francisco, USA
18th – 20th International Joint Conference on Computational Intelligence, Seville, Spain
19th – 21st International Conference on Computer-Human Interaction Research and Applications, Seville, Spain
20th – 21st Deep Learning in Healthcare Summit, London, UK
23rd – 25th Auto AI, Berlin, Germany
27th – 29th IEEE Workshop on Advanced Robotics and its Social Impacts, Genova, Italy

OCTOBER
1st – 5th IEEE/RSJ International Conference on Intelligent Robots and Systems, Madrid, Spain
10th – 11th World Summit AI, Amsterdam, Netherlands
17th – 18th Nordic Data Science and Machine Learning Summit, Stockholm, Sweden
17th – 18th Predictive Analytics World, London, UK
22nd – 26th International Conference on Information and Knowledge Management, Turin, Italy
23rd Women in AI Dinner, Toronto, Canada
24th – 25th Predictive Analytics Innovation Summit, Chicago, USA
31st October – 4th November Open Data Science Conference West, San Francisco, USA

NOVEMBER
1st – 2nd Big Data & Analytics Innovation Summit, London, UK
13th – 14th Predictive Analytics World, Berlin, Berlin, Germany
15th – 16th Future Technologies Conference 2018, Vancouver, Canada
21st – 22nd Big Data & Analytics Innovation Summit, Beijing, China
28th – 29th AI Expo North America, Santa Clara, USA

DECEMBER
3rd – 8th 32nd Annual Conference on Neural Information Processing Systems, Montréal, Canada
7th Machine Learning Innovation Summit, Dublin, Ireland
11th – 13th 38th SGAI International Conference on Artificial Intelligence, Cambridge, UK

Please get in touch if there are any great events or conferences you think should be added!
https://goo.gl/GbDZhY #DataScience #Cloud
0 notes
kathleenseiber · 5 years ago
Text
‘Early Bird’ makes training AI greener
A new system called Early Bird makes training deep neural networks, a form of artificial intelligence, more energy efficient, researchers report.
Deep neural networks (DNNs) are behind self-driving cars, intelligent assistants, facial recognition, and dozens more high-tech applications.
Early Bird can train a DNN to the same level of accuracy as typical training, or better, while using up to 10.7 times less energy.
“A major driving force in recent AI breakthroughs is the introduction of bigger, more expensive DNNs,” says Yingyan Lin, director of the Efficient and Intelligent Computing (EIC) Lab and an assistant professor of electrical and computer engineering in the Brown School of Engineering at Rice University.
“But training these DNNs demands considerable energy. For more innovations to be unveiled, it is imperative to find ‘greener’ training methods that both address environmental concerns and reduce financial barriers of AI research.”
Training cutting-edge DNNs is costly and getting costlier. A 2019 study from the Allen Institute for AI in Seattle found that the number of computations needed to train a top-flight deep neural network increased 300,000-fold between 2012 and 2018. A separate 2019 study from researchers at the University of Massachusetts Amherst found that the carbon footprint of training a single, elite DNN was roughly equivalent to the lifetime carbon dioxide emissions of five US automobiles.
DNNs contain millions or even billions of artificial neurons that learn to perform specialized tasks. Without any explicit programming, deep networks of artificial neurons can learn to make human-like decisions—and even outperform human experts—by “studying” a large number of previous examples.
For instance, if a DNN studies photographs of cats and dogs, it learns to recognize cats and dogs. AlphaGo, a deep network trained to play the board game Go, beat a professional human player in 2015 after studying tens of thousands of previously played games.
“The state-of-art way to perform DNN training is called progressive prune and train,” says Lin.
“First, you train a dense, giant network, then remove parts that don’t look important—like pruning a tree. Then you retrain the pruned network to restore performance because performance degrades after pruning. And in practice you need to prune and retrain many times to get good performance.”
Pruning is possible because only a fraction of the artificial neurons in the network can potentially do the job for a specialized task. Training strengthens connections between necessary neurons and reveals which ones can be pruned away. Pruning reduces model size and computational cost, making it more affordable to deploy fully trained DNNs, especially on small devices with limited memory and processing capability.
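For readers who want to see this workflow concretely, below is a minimal PyTorch sketch of the conventional progressive prune-and-train loop, using unstructured magnitude pruning. The toy model, random data, 50% pruning ratio, and three rounds are illustrative assumptions, not details from the Rice study.

```python
# Minimal sketch of "progressive prune and train": train a dense network,
# then repeatedly prune low-magnitude weights and retrain to recover accuracy.
# The toy model, random data, and hyperparameters are illustrative only.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

def train(model, data, epochs=1, lr=0.01):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in data:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

def prune_and_retrain(model, data, rounds=3, amount=0.5):
    train(model, data)                         # 1) costly dense training
    for _ in range(rounds):                    # 2) prune, then retrain, repeatedly
        for m in model.modules():
            if isinstance(m, (nn.Linear, nn.Conv2d)):
                prune.l1_unstructured(m, name="weight", amount=amount)
        train(model, data)                     # retraining restores lost accuracy
    return model

if __name__ == "__main__":
    toy_model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
    toy_data = [(torch.randn(8, 20), torch.randint(0, 2, (8,))) for _ in range(10)]
    prune_and_retrain(toy_model, toy_data)
```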
“The first step, training the dense, giant network, is the most expensive,” Lin says. “Our idea in this work is to identify the final, fully functional pruned network, which we call the ‘early-bird ticket,’ in the beginning stage of this costly first step.”
By looking for key network connectivity patterns early in training, the researchers were able to both discover the existence of early-bird tickets and use them to streamline DNN training. In experiments on various benchmark data sets and DNN models, they found that early-bird tickets could emerge one-tenth of the way through the initial training phase, or even earlier.
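The sketch below illustrates the general idea of spotting an early-bird ticket: track how much the would-be pruning mask changes between consecutive epochs and stop dense training once it stabilizes. The global magnitude-based mask and the 0.1 stopping tolerance are illustrative assumptions, not the paper’s exact criterion, which compares channel-level pruning masks.

```python
# Sketch of detecting an "early-bird ticket": stop dense training once the
# pruning mask implied by the current weights stops changing between epochs.
# The mask construction and tolerance are illustrative assumptions.
import torch
import torch.nn as nn

def magnitude_mask(model, keep_ratio=0.5):
    """Boolean mask over all weight tensors marking the entries a
    magnitude-based pruner would keep right now."""
    flat = torch.cat([p.detach().abs().flatten()
                      for p in model.parameters() if p.dim() > 1])
    k = max(1, int(keep_ratio * flat.numel()))
    threshold = torch.topk(flat, k).values.min()
    return flat >= threshold

def mask_distance(m1, m2):
    """Fraction of positions where two masks disagree (normalized Hamming distance)."""
    return (m1 != m2).float().mean().item()

def train_until_early_bird(model, data, train_one_epoch, max_epochs=50, tol=0.1):
    """Run dense training only until the pruning mask stabilizes."""
    prev = magnitude_mask(model)
    for epoch in range(max_epochs):
        train_one_epoch(model, data)
        cur = magnitude_mask(model)
        if mask_distance(prev, cur) < tol:
            return epoch + 1   # early-bird ticket found; prune and retrain from here
        prev = cur
    return max_epochs
```

Here, train_one_epoch stands for any function that runs one epoch of ordinary training, such as the train() helper from the previous sketch with epochs=1.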
“Our method can automatically identify early-bird tickets within the first 10% or less of the training of the dense, giant networks,” Lin says. “This means you can train a DNN to achieve the same or even better accuracy for a given task in about 10% or less of the time needed for traditional training, which can yield more than an order of magnitude of savings in both computation and energy.”
Developing techniques to make AI greener is the main focus of Lin’s group. Environmental concerns are the primary motivation, but Lin says there are multiple benefits.
“Our goal is to make AI both more environmentally friendly and more inclusive,” she says. “The sheer size of complex AI problems has kept out smaller players. Green AI can open the door enabling researchers with a laptop or limited computational resources to explore AI innovations.”
The researchers presented a paper on the work at ICLR 2020, the International Conference on Learning Representations. Additional coauthors are from Rice University and Texas A&M University.
The research was supported by the National Science Foundation.
Source: Rice University
0 notes