#AlphaGeometry
jcmarchi · 3 months ago
AlphaGeometry2: The AI That Outperforms Human Olympiad Champions in Geometry
New Post has been published on https://thedigitalinsider.com/alphageometry2-the-ai-that-outperforms-human-olympiad-champions-in-geometry/
Artificial intelligence has long sought to mimic human-like logical reasoning. While it has made massive progress in pattern recognition, abstract reasoning and symbolic deduction have remained tough challenges. This limitation becomes especially evident when AI is applied to mathematical problem-solving, a discipline that has long served as a benchmark for human cognitive abilities such as logical thinking, creativity, and deep understanding. Unlike branches of mathematics that rely on formulas and algebraic manipulation, geometry is different: it requires not only structured, step-by-step reasoning but also the ability to recognize hidden relationships and the skill to construct extra elements to solve a problem.
For a long time, these abilities were thought to be unique to humans. However, Google DeepMind has been working on developing AI that can solve these complex reasoning tasks. Last year, it introduced AlphaGeometry, an AI system that combines the predictive power of neural networks with the structured logic of symbolic reasoning to tackle complex geometry problems. This system made a significant impact by solving 54% of International Mathematical Olympiad (IMO) geometry problems, performance on par with silver medalists. Recently, the team took it even further with AlphaGeometry2, which achieved an incredible 84% solve rate, outperforming an average IMO gold medalist.
In this article, we will explore key innovations that helped AlphaGeometry2 achieve this level of performance and what this development means for the future of AI in solving complex reasoning problems. But before diving into what makes AlphaGeometry2 special, it’s essential first to understand what AlphaGeometry is and how it works.
AlphaGeometry: Pioneering AI in Geometry Problem-Solving
AlphaGeometry is an AI system designed to solve complex geometry problems at the level of the IMO. It is a neuro-symbolic system that combines a neural language model with a symbolic deduction engine. The neural language model predicts new geometric constructs, while the symbolic engine applies formal logic to generate proofs. This setup allows AlphaGeometry to think more like a human by combining the pattern-recognition capabilities of neural networks, which replicate intuitive human thinking, with the structured reasoning of formal logic, which mimics human deductive reasoning.

One of the key innovations in AlphaGeometry was how it generated training data. Instead of relying on human demonstrations, it created one billion random geometric diagrams and systematically derived relationships between points and lines. This process produced a massive dataset of 100 million unique examples, helping the neural model predict useful geometric constructs and guiding the symbolic engine toward accurate solutions. This hybrid approach enabled AlphaGeometry to solve 25 out of 30 Olympiad geometry problems within standard competition time, closely matching the performance of top human competitors.
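Schematically, the system alternates between the two components: deduce as far as formal logic allows, and when stuck, ask the neural model for an auxiliary construct. The sketch below is a toy illustration of that loop with made-up stand-ins for both parts (hypothetical function names; the real system pairs a trained language model with a formal deduction engine):

```python
import random

def deduce_closure(premises):
    """Stand-in for the symbolic engine: return every fact derivable
    from the premises (here, trivially, the premises themselves)."""
    return set(premises)

def propose_construct(premises):
    """Stand-in for the neural language model: suggest one auxiliary
    construct (a new point or line) to enrich the diagram."""
    return ("aux_point", random.randrange(10**6))

def solve(premises, goal, max_constructs=10):
    """Alternate deduction and construction until the goal is entailed."""
    premises = set(premises)
    for _ in range(max_constructs):
        if goal in deduce_closure(premises):
            return premises          # proof state entailing the goal
        premises.add(propose_construct(premises))
    return None                      # gave up within the budget

print(solve({"AB = CD"}, "AB = CD") is not None)  # True: goal already entailed
```

The point of the structure is that the expensive, rigorous part (the closure) does all the verifying, while the neural model only has to make plausible suggestions.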
How AlphaGeometry2 Achieves Improved Performance
While AlphaGeometry was a breakthrough in AI-driven mathematical reasoning, it had certain limitations: it struggled with the most complex problems, handled a relatively narrow range of geometry challenges, and could not even express many problem types in its language. To overcome these hurdles, AlphaGeometry2 introduces a series of significant improvements:
Expanding AI’s Ability to Understand More Complex Geometry Problems
One of the most significant improvements in AlphaGeometry2 is its ability to work with a broader range of geometry problems. The original AlphaGeometry struggled with problems involving linear equations of angles, ratios, and distances, as well as those requiring reasoning about moving points, lines, and circles. AlphaGeometry2 overcomes these limitations with a more advanced language model that can describe and analyze these complex problems. As a result, it can now tackle 88% of all IMO geometry problems from the last two decades, a significant increase from the previous 66%.
A Faster and More Efficient Problem-Solving Engine
Another key reason AlphaGeometry2 performs so well is its improved symbolic engine. This engine, which serves as the logical core of the system, has been enhanced in several ways. First, it now works with a more refined set of problem-solving rules, making it faster and more effective. Second, it can recognize when different geometric constructs represent the same point in a problem, allowing it to reason more flexibly. Finally, the engine has been rewritten in C++ rather than Python, making it over 300 times faster than before. This speed boost allows AlphaGeometry2 to generate solutions more quickly and efficiently.
Training the AI with More Complex and Varied Geometry Problems
The effectiveness of AlphaGeometry2’s neural model comes from extensive training on synthetic geometry problems. AlphaGeometry initially generated one billion random geometric diagrams to create 100 million unique training examples. AlphaGeometry2 takes this a step further by generating larger and more complex diagrams that include intricate geometric relationships. Additionally, it now incorporates problems that require introducing auxiliary constructions, newly defined points or lines that help solve a problem, allowing it to predict and generate more sophisticated solutions.
Finding the Best Path to a Solution with Smarter Search Strategies
A key innovation of AlphaGeometry2 is its new search approach, called the Shared Knowledge Ensemble of Search Trees (SKEST). Unlike its predecessor, which relied on a basic search method, AlphaGeometry2 runs multiple searches in parallel, with each search learning from the others. This technique lets it explore a broader range of possible solutions and significantly improves its ability to solve complex problems in less time.
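The knowledge-sharing idea can be illustrated loosely: each search "tree" publishes whatever it derives to a common pool, so later searchers reuse earlier discoveries instead of re-deriving them. This is a conceptual toy, not DeepMind's SKEST implementation:

```python
# Toy illustration of knowledge sharing between search trees: each tree
# publishes every fact it derives to a shared pool, so a fact proved in
# one tree is immediately available to all the others.
shared_pool = set()

def search_tree(steps, goal):
    """One search tree: derive 'steps' in order, publishing each one."""
    for step in steps:
        if goal in shared_pool:   # proved already by another tree
            return "reused"
        shared_pool.add(step)
        if step == goal:
            return "proved"
    return "failed"

first = search_tree(["lemma_a", "target"], "target")
second = search_tree(["lemma_b"], "target")   # benefits from tree one
print(first, second)  # proved reused
```

In a real system the trees would run in parallel and share intermediate deductions rather than final goals, but the payoff is the same: work done once is never repeated.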
Learning from a More Advanced Language Model
Another key factor behind AlphaGeometry2’s success is its adoption of Google’s Gemini model, a state-of-the-art AI model trained on an even larger and more diverse set of mathematical problems. This new language model improves AlphaGeometry2’s ability to generate step-by-step solutions thanks to its improved chain-of-thought reasoning, allowing it to approach problems in a more structured way. By fine-tuning its predictions and learning from different types of problems, the system can now solve a much larger share of Olympiad-level geometry questions.
Achieving Results That Surpass Human Olympiad Champions
Thanks to the above advancements, AlphaGeometry2 solves 42 out of 50 IMO geometry problems from 2000-2024, achieving an 84% success rate. These results surpass the performance of an average IMO gold medalist and set a new standard for AI-driven mathematical reasoning. Beyond its impressive performance, AlphaGeometry2 is also making strides in automated theorem proving, bringing us closer to AI systems that can not only solve geometry problems but also explain their reasoning in a way humans can understand.
The Future of AI in Mathematical Reasoning
The progress from AlphaGeometry to AlphaGeometry2 shows how AI is getting better at handling complex mathematical problems that require deep thinking, logic, and strategy. It also signals that AI is no longer just about recognizing patterns: it can reason, make connections, and solve problems in ways that feel closer to human logical reasoning.
AlphaGeometry2 also shows us what AI might be capable of in the future. Instead of just following instructions, AI could start exploring new mathematical ideas on its own and even help with scientific research. By combining neural networks with logical reasoning, AI might not just be a tool that can automate simple tasks but a qualified partner that helps expand human knowledge in fields that rely on critical thinking.
Could we be entering an era where AI proves theorems and makes new discoveries in physics, engineering, and biology? As AI shifts from brute-force calculations to more thoughtful problem-solving, we might be on the verge of a future where humans and AI work together to uncover ideas we never thought possible.
lifetechweb · 9 months ago
AI at the International Mathematical Olympiad: How AlphaProof and AlphaGeometry 2 Reached the Silver-Medal Standard
Mathematical reasoning is a vital aspect of human cognitive abilities, driving progress in scientific discoveries and technological developments. As we strive to develop artificial general intelligence that matches human cognition, equipping AI with advanced mathematical reasoning capabilities is essential. While current AI systems can handle…
ibmathsresources · 1 year ago
AI Masters Olympiad Geometry
The team behind Google’s DeepMind have just released details of a new AI system: AlphaGeometry. This has been specifically trained to solve classical geometry problems, and is already approaching the level of a gold medalist at the International Mathematical Olympiad (considering only geometry problems). This is an incredible achievement, because in order to solve classical geometry…
appslookup · 1 year ago
AlphaGeometry: DeepMind’s AI Masters Geometry Problems at Olympiad Levels
#AlphaGeometry is indeed an impressive feat in the advancement of AI! It represents a significant leap in AI’s capability to tackle complex geometry problems in Olympiad-level competitions like the International Mathematical Olympiad (IMO). By AppsLookup
definitelytzar · 1 year ago
govindhtech · 9 months ago
AlphaProof: Google AI Systems To Think Like Mathematicians
AlphaProof and AlphaGeometry 2
Google AI systems advance towards thinking like mathematicians by making strides in maths. One question was answered in minutes, according to a blog post by Google, but other questions took up to three days to answer, longer than the competition’s time limit. Nevertheless, the scores are among the highest achieved by an AI system in the competition thus far.
Google, a division of Alphabet, showcased two artificial intelligence systems that demonstrated progress in generative AI development: the ability to solve challenging mathematical problems.
The current breed of AI models has had difficulty with abstract mathematics, since it demands reasoning power closer to human intellect; these models operate by statistically predicting the next word.
The company’s AI division, DeepMind, released data demonstrating that its recently developed AI models, AlphaProof and AlphaGeometry 2, answered four of the six questions in the 2024 International Mathematical Olympiad, a well-known competition for high school students.
AlphaZero
The company said that AlphaProof, a reasoning-focused system, was produced by combining AlphaZero, the AI system that has previously defeated humans at board games like chess and Go, with a version of Gemini, the language model underlying its chatbot of the same name. Only five of the more than 600 human competitors were able to answer the most challenging question, which was one of the three questions AlphaProof answered correctly.
AlphaGeometry 2
AlphaGeometry 2 solved another math puzzle. It was previously reported in July that OpenAI, backed by Microsoft, was working on reasoning technology under the code name “Strawberry.” As Reuters first revealed, the project, originally known as Q*, was regarded as such a breakthrough that several staff researchers warned OpenAI’s board of directors in a letter written in November that it could endanger humankind.
The top choice for document editing and proofreading is AlphaProof. The demand for accurate and efficient services is growing in the digital age. It stands out as a leading option, offering excellent services to guarantee your documents are flawless. In order to show why AlphaProof is unique in the industry, this article explores its features, advantages, and user experiences.
How does AlphaProof work?
AlphaProof, a feature-rich online tool, handles all editing and proofreading needs. It offers specialized services to improve the quality and readability of documents for professionals, students, and business owners. AlphaProof covers technical documentation, corporate reports, creative writing, and academic essays.
Essential Elements of AlphaProof
Expert Proofreading
To fix typographical, punctuation, and grammar flaws in your documents, AlphaProof has a team of highly skilled proofreaders who carefully go over them. This guarantees that your text looks professional and is free of common mistakes.
Complex Editing
It provides sophisticated editing services in addition to basic proofreading. This entails streamlining the sentence structure, boosting readability overall, and strengthening coherence and flow. Better word selections and stylistic enhancements are also suggested by the editors.
Editors with specific expertise
AlphaProof recognizes that varying documents call for varying levels of competence. It boasts a diverse team of editors with skills in technical writing, business communication, academic writing, and creative writing. This guarantees that an individual possessing pertinent expertise and experience will evaluate your material.
Quick Resolution
Quick turnaround times are provided by AlphaProof to help you meet deadlines. You can choose 24-hour express service to ensure your document is available when you need it.
Easy-to-use interface
The AlphaProof platform boasts an intuitive interface that facilitates the uploading of documents, selection of services, and tracking of order status. From beginning to end, the procedure is simplified to offer a hassle-free experience.
Secrecy and Protection
The security and privacy of your papers are very important to it. The platform uses cutting-edge encryption technology to safeguard your data, and every file is handled with the highest care.
The Advantages of AlphaProof Use
Better Document Quality
The quality of your documents can be greatly improved by utilising its services. This can result in more professionalism in corporate communication, higher grades, and a more positive impression on your readers.
Reduce Effort and Time
Editing and proofreading can be laborious processes. With AlphaProof, you can focus on your primary responsibilities while professionals optimize your papers, saving you time and effort.
Customized Offerings
To address the unique requirements of various document formats, it offers customized services. AlphaProof can provide comprehensive editing for a research paper or quick proofreading for an email.
Knowledgeable Perspectives
The editors’ comments and recommendations can give you valuable insight into your writing style and the areas that need work. Over time, this can help you become a better writer.
A Boost in Self-Assurance
You may feel more confident in the calibre of your work if you know it has been expertly edited and proofread. For high-stakes papers like published articles, commercial proposals, and theses from academic institutions, this is especially crucial.
Customer Experiences
Scholars and Students
AlphaProof has proven to be a useful resource for numerous academics and students. A postgraduate student said, “AlphaProof enabled me to refine my thesis to the ideal level. The final draft was error-free, and the editors’ suggestions were wise.”
Composers and Novelists
The specialized editing services provided by AlphaProof are valued by authors and creative writers. A budding writer said, “Its editors understood my voice and style, providing feedback that improved my manuscript without altering my unique voice.”
In conclusion
With a variety of features and advantages to meet a wide range of demands, AlphaProof stands out as a top option for document editing and proofreading. It guarantees that your documents are flawless, saving you time and improving the calibre of your work. It does this through its skilled staff, quick return times, and intuitive interface.
Read more on govindhtech.com
gamingavickreyauction · 9 months ago
I haven't seen anyone talk yet about the fact that an AI solved 4/6 of this year's IMO problems. Is there some way they fudged it so that it's not as big a deal as it seems? (I do not count extra time as fudging: you could address that by adding more compute. I also do not count giving the question already formalised as fudging, as AIs can already do that.)
I ask because I really want this not to be a big deal, because the alternative is scary. I thought this would be one of the last milestones for AI surpassing human intelligence, and it seems like the same reasoning skills required for this problem would be able to solve a vast array of other important problems. And potentially it's a hop and a skip away from outperforming humans in research mathematics.
I did not think we were anywhere near this point, and I was already pretty worried about the societal upheaval that neural networks will cause.
librarianrafia · 1 year ago
There’s no one thing called “AI”
The question of what AI can and can’t do is made very challenging to navigate by a frustrating tendency that I’ve observed among many commentators to blur the lines between hierarchical levels of AI technology. ....
AI is too broad and fuzzy to cleanly decompose into a proper hierarchy, but there are a few ways to impose a messy order on it. ...
Frequently, reporting on new technology will collapse this huge category into a single amorphous entity, ascribing any properties of its individual elements to AI at large. .... All of this really makes it seem like “an AI” is a discrete kind of thing that is manning chat bots, solving unsolved math problems, and beating high schoolers at geometry Olympiads. But this isn’t remotely the case. FunSearch, AlphaGeometry, and ChatGPT are three completely different kinds of technologies which do three completely different kinds of things and are not at all interchangeable or even interoperable. You can’t have a conversation with AlphaGeometry, and ChatGPT can’t solve geometry Olympiad problems.
... I believe that this property, where there are many ways to appear to have done it (by outputting a million random digits, for example), but only a very small number of ways to actually do it (by outputting the correct million digits), is characteristic of things that Generative AI systems will generally be bad at. ChatGPT works by making repeated guesses. At any given point in its attempt to generate the decimal digits of π, there are 10 digits to choose from, only one of which is the right one. The probability that it’s going to make a million correct guesses in a row is infinitesimally small, so small that we might as well call it zero. For this reason, this particular task is not one that’s well suited to this particular type of text generation.
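The back-of-envelope argument can be made concrete with a small Python check (assuming a guesser with no structure to exploit, so each digit is an independent 1-in-10 guess):

```python
from math import log10

# One correct digit out of 10 at every position, with no structure to
# exploit, means the chance of a million correct guesses in a row is
# 10 ** -1_000_000 -- far smaller than anything a float can represent.
n = 1_000_000
log10_p = n * log10(1 / 10)   # = -1_000_000
print(f"P(all {n} digits correct) = 1e{log10_p:.0f}")

# 10.0 ** -1_000_000 underflows to exactly 0.0:
print(10.0 ** -n == 0.0)  # True
```

So "might as well call it zero" is literal here: the probability underflows past the smallest positive floating-point number.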
...We can see this same pattern in other generative AI systems as well, where the system seems to perform well if the success criteria are quite general, but increasing specificity causes failures.
cleverhottubmiracle · 2 months ago
Research
Published 17 January 2024
Authors: Trieu Trinh and Thang Luong

Our AI system surpasses the state-of-the-art approach for geometry problems, advancing AI reasoning in mathematics.

Reflecting the Olympic spirit of ancient Greece, the International Mathematical Olympiad is a modern-day arena for the world's brightest high-school mathematicians. The competition not only showcases young talent, but has emerged as a testing ground for advanced AI systems in math and reasoning.

In a paper published today in Nature, we introduce AlphaGeometry, an AI system that solves complex geometry problems at a level approaching a human Olympiad gold-medalist - a breakthrough in AI performance. In a benchmarking test of 30 Olympiad geometry problems, AlphaGeometry solved 25 within the standard Olympiad time limit. For comparison, the previous state-of-the-art system solved 10 of these geometry problems, and the average human gold medalist solved 25.9 problems.

In our benchmarking set of 30 Olympiad geometry problems (IMO-AG-30), compiled from the Olympiads from 2000 to 2022, AlphaGeometry solved 25 problems under competition time limits. This is approaching the average score of human gold medalists on these same problems. The previous state-of-the-art approach, known as "Wu's method", solved 10.

AI systems often struggle with complex problems in geometry and mathematics due to a lack of reasoning skills and training data. AlphaGeometry's system combines the predictive power of a neural language model with a rule-bound deduction engine, which work in tandem to find solutions. And by developing a method to generate a vast pool of synthetic training data - 100 million unique examples - we can train AlphaGeometry without any human demonstrations, sidestepping the data bottleneck.

With AlphaGeometry, we demonstrate AI's growing ability to reason logically, and to discover and verify new knowledge. Solving Olympiad-level geometry problems is an important milestone in developing deep mathematical reasoning on the path towards more advanced and general AI systems. We are open-sourcing the AlphaGeometry code and model, and hope that together with other tools and approaches in synthetic data generation and training, it helps open up new possibilities across mathematics, science, and AI.

"It makes perfect sense to me now that researchers in AI are trying their hands on the IMO geometry problems first because finding solutions for them works a little bit like chess in the sense that we have a rather small number of sensible moves at every step. But I still find it stunning that they could make it work. It's an impressive achievement." - Ngô Bảo Châu, Fields Medalist and IMO gold medalist

AlphaGeometry adopts a neuro-symbolic approach

AlphaGeometry is a neuro-symbolic system made up of a neural language model and a symbolic deduction engine, which work together to find proofs for complex geometry theorems. Akin to the idea of "thinking, fast and slow", one system provides fast, "intuitive" ideas, and the other, more deliberate, rational decision-making.

Because language models excel at identifying general patterns and relationships in data, they can quickly predict potentially useful constructs, but often lack the ability to reason rigorously or explain their decisions. Symbolic deduction engines, on the other hand, are based on formal logic and use clear rules to arrive at conclusions. They are rational and explainable, but they can be "slow" and inflexible - especially when dealing with large, complex problems on their own.

AlphaGeometry's language model guides its symbolic deduction engine towards likely solutions to geometry problems. Olympiad geometry problems are based on diagrams that need new geometric constructs to be added before they can be solved, such as points, lines or circles. AlphaGeometry's language model predicts which new constructs would be most useful to add, from an infinite number of possibilities. These clues help fill in the gaps and allow the symbolic engine to make further deductions about the diagram and close in on the solution.

AlphaGeometry solving a simple problem: Given the problem diagram and its theorem premises (left), AlphaGeometry (middle) first uses its symbolic engine to deduce new statements about the diagram until the solution is found or new statements are exhausted. If no solution is found, AlphaGeometry's language model adds one potentially useful construct (blue), opening new paths of deduction for the symbolic engine. This loop continues until a solution is found (right). In this example, just one construct is required.

AlphaGeometry solving an Olympiad problem: Problem 3 of the 2015 International Mathematics Olympiad (left) and a condensed version of AlphaGeometry's solution (right). The blue elements are added constructs. AlphaGeometry's solution has 109 logical steps.

Generating 100 million synthetic data examples

Geometry relies on understanding of space, distance, shape, and relative positions, and is fundamental to art, architecture, engineering and many other fields. Humans can learn geometry using a pen and paper, examining diagrams and using existing knowledge to uncover new, more sophisticated geometric properties and relationships. Our synthetic data generation approach emulates this knowledge-building process at scale, allowing us to train AlphaGeometry from scratch, without any human demonstrations.

Using highly parallelized computing, the system started by generating one billion random diagrams of geometric objects and exhaustively derived all the relationships between the points and lines in each diagram. AlphaGeometry found all the proofs contained in each diagram, then worked backwards to find out what additional constructs, if any, were needed to arrive at those proofs. We call this process "symbolic deduction and traceback".

That huge data pool was filtered to exclude similar examples, resulting in a final training dataset of 100 million unique examples of varying difficulty, of which nine million featured added constructs. With so many examples of how these constructs led to proofs, AlphaGeometry's language model is able to make good suggestions for new constructs when presented with Olympiad geometry problems.

Pioneering mathematical reasoning with AI

The solution to every Olympiad problem provided by AlphaGeometry was checked and verified by computer. We also compared its results with previous AI methods, and with human performance at the Olympiad. In addition, Evan Chen, a math coach and former Olympiad gold-medalist, evaluated a selection of AlphaGeometry's solutions for us.

Chen said: "AlphaGeometry's output is impressive because it's both verifiable and clean. Past AI solutions to proof-based competition problems have sometimes been hit-or-miss (outputs are only correct sometimes and need human checks). AlphaGeometry doesn't have this weakness: its solutions have machine-verifiable structure. Yet despite this, its output is still human-readable. One could have imagined a computer program that solved geometry problems by brute-force coordinate systems: think pages and pages of tedious algebra calculation. AlphaGeometry is not that. It uses classical geometry rules with angles and similar triangles just as students do."

As each Olympiad features six problems, only two of which are typically focused on geometry, AlphaGeometry can only be applied to one-third of the problems at a given Olympiad. Nevertheless, its geometry capability alone makes it the first AI model in the world capable of passing the bronze medal threshold of the IMO in 2000 and 2015.

In geometry, our system approaches the standard of an IMO gold-medalist, but we have our eye on an even bigger prize: advancing reasoning for next-generation AI systems. Given the wider potential of training AI systems from scratch with large-scale synthetic data, this approach could shape how the AI systems of the future discover new knowledge, in math and beyond.

AlphaGeometry builds on Google DeepMind and Google Research's work to pioneer mathematical reasoning with AI - from exploring the beauty of pure mathematics to solving mathematical and scientific problems with language models. And most recently, we introduced FunSearch, which made the first discoveries in open problems in mathematical sciences using Large Language Models.

Our long-term goal remains to build AI systems that can generalize across mathematical fields, developing the sophisticated problem-solving and reasoning that general AI systems will depend on, all the while extending the frontiers of human knowledge.

Acknowledgements

This project is a collaboration between the Google DeepMind team and the Computer Science Department of New York University. The authors of this work include Trieu Trinh, Yuhuai Wu, Quoc Le, He He, and Thang Luong. We thank Rif A. Saurous, Denny Zhou, Christian Szegedy, Delesley Hutchins, Thomas Kipf, Hieu Pham, Petar Veličković, Edward Lockhart, Debidatta Dwibedi, Kyunghyun Cho, Lerrel Pinto, Alfredo Canziani, Thomas Wies, He He's research group, Evan Chen, Mirek Olsak, and Patrik Bak for their help and support. We would also like to thank Google DeepMind leadership for their support, especially Ed Chi, Koray Kavukcuoglu, Pushmeet Kohli, and Demis Hassabis.
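The "symbolic deduction and traceback" recipe reads, schematically, like the toy sketch below: generate random premises, deduce forward while recording each fact's parents, then walk back from a derived fact to the premises it actually used, yielding one (premises, theorem) training example. The fact and rule types here are made up; the real generator works on genuine geometric relations.

```python
import random

random.seed(0)

def random_diagram(n_points=5):
    """Stand-in for a random geometric diagram: toy facts about point pairs."""
    pts = list("ABCDE")[:n_points]
    return {("dist_eq", a, b) for a in pts for b in pts
            if a < b and random.random() < 0.5}

def deduce(premises):
    """Stand-in forward deduction: each derived fact remembers its parents."""
    parents = {}
    for f1 in premises:
        for f2 in premises:
            if f1 < f2:
                parents[("derived", f1, f2)] = {f1, f2}  # toy "rule"
    return parents

def traceback(theorem, parents):
    """Walk back from a derived fact to the premises it needed."""
    return parents[theorem]

diagram = random_diagram()
derived = deduce(diagram)
theorem = next(iter(derived))
needed = traceback(theorem, derived)
print(needed <= diagram)  # True: the traced premises come from the diagram
```

Filtering near-duplicate (premises, theorem) pairs from billions of such runs is what yields a deduplicated training set, as the post describes.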
jcmarchi · 3 months ago
The Sequence Radar # : The Amazing AlphaGeometry2 Now Achieved Gold Medalist in Math Olympiads
New Post has been published on https://thedigitalinsider.com/the-sequence-radar-the-amazing-alphageometry2-now-achieved-gold-medalist-in-math-olympiads/
DeepMind was able to improve the model just a few months after its first release.
Next Week in The Sequence:
Our series about RAG continues with an exploration of Self-RAG. The engineering section dives into Txtai, a new framework for LLM workflows. In research we are going to dive into DeepSeek-R1 (finally). And in the opinion section we will discuss another controversial topic in AI.
You can subscribe to The Sequence below:
TheSequence is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.
📝 Editorial: The Amazing AlphaGeometry2 Now Achieved Gold Medalist in Math Olympiads
DeepMind’s journey toward mathematical AI dominance took a major leap last year when AlphaProof and AlphaGeometry nearly clinched gold at the International Math Olympiad (IMO). Now, with the latest upgrade, AlphaGeometry2 (AG2) has officially surpassed top human competitors in geometry, marking a milestone in AI-driven mathematical reasoning. The general consensus among IMO competitors is that geometry problems are among the toughest on each day of the Olympiad.
AlphaGeometry2 (AG2), an improved version of AlphaGeometry, was released in early 2025 and has demonstrated gold-medalist level performance in solving Olympiad geometry problems. The system builds upon its predecessor by expanding its domain-specific language to handle more complex problems, including those with object movements and linear equations involving angles, ratios, and distances. The coverage rate of the AG2 language on International Math Olympiad (IMO) geometry problems from 2000-2024 increased from 66% to 88%. Furthermore, AG2 utilizes a Gemini architecture for better language modeling and incorporates a knowledge-sharing mechanism that combines multiple search trees, improving its overall solving rate to 84% on IMO geometry problems from the past 25 years, compared to 54% previously. This enhanced performance has allowed AG2 to surpass an average IMO gold medalist. The system also achieved a silver-medal standard at IMO 2024.
The key improvements in AG2 can be attributed to several factors. The domain language was expanded to cover locus-type theorems, linear equations, and non-constructive problem statements. A stronger and faster symbolic engine was developed, featuring an optimized rule set, added handling of double points, and a faster implementation in C++. The system employs a novel search algorithm that runs multiple search trees with knowledge sharing. An enhanced language model, leveraging the Gemini architecture and trained on a larger and more diverse dataset, was also implemented. The original AlphaGeometry (AG1) used a domain-specific language with nine basic predicates. AG2 adds further predicates to improve its handling of angle, ratio, and linear-equation problems, expanding its mathematical coverage. Furthermore, AG2 introduces eleven locus cases with corresponding predicate syntax to handle movements of objects. To support topological and non-degeneracy conditions, AG2 incorporates predicates for diagram checks. The expanded domain language allows AG2 to cover 88% of all 2000-2024 IMO geometry problems.
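To make the idea of a predicate-based domain language concrete, here is a minimal sketch in Python. The predicate names (`coll`, `cyclic`) and the representation are purely illustrative; AG2’s actual language and syntax are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predicate:
    """One statement in a toy geometry language (names are illustrative, not AG2's)."""
    name: str    # e.g. "coll" (collinear), "cyclic" (concyclic), "eqangle"
    args: tuple  # the point names the statement relates


# A miniature problem statement: A, B, C are collinear, and A, B, C, D lie on a circle.
problem = [
    Predicate("coll", ("A", "B", "C")),
    Predicate("cyclic", ("A", "B", "C", "D")),
]
```

Because the dataclass is frozen and hashable, such statements can be stored in sets and compared by value, which is convenient for the closure-style reasoning described below.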
AG2’s symbolic engine, named DDAR (Deductive Database Arithmetic Reasoning), has also been greatly enhanced. The symbolic engine computes the deduction closure: the set of all facts deducible from a core set of initial facts. AG2 adds the ability to handle double points, which lets the system reason about points that have different names but the same coordinates. The algorithm was made more efficient by hard-coding the search for essential rules, reducing the number of queries to the AR sub-engine to at most a cubic number. A new algorithm, DDAR2, was designed to speed up the search for similar triangles and cyclic quadrilaterals. The core computation of DDAR was implemented in C++, achieving a 300x speed improvement. The enhanced symbolic engine is crucial both for training-data generation and for proof search.
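The notion of a deduction closure can be illustrated with a tiny forward-chaining loop: rules are applied to the current fact set until no new fact appears. This is a toy sketch of the fixed-point idea, not DDAR itself; the "parallel lines" rule below is a made-up example.

```python
def deduction_closure(facts, rules):
    """Repeatedly apply rules until no new fact appears (a fixed point).

    `facts` is a set of hashable statements; each rule maps the current
    fact set to a set of newly deducible statements. This mirrors, at toy
    scale, the closure a symbolic engine computes from initial facts."""
    closure = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            new = rule(closure) - closure
            if new:
                closure |= new
                changed = True
    return closure


# Toy rule: transitivity of parallelism, with facts encoded as ("para", x, y).
def para_transitivity(facts):
    pairs = {(a, b) for (p, a, b) in facts if p == "para"}
    return {("para", a, d) for (a, b) in pairs for (c, d) in pairs
            if b == c and a != d}


closure = deduction_closure({("para", "l1", "l2"), ("para", "l2", "l3")},
                            [para_transitivity])
```

Running this derives the new fact `("para", "l1", "l3")` and then stops, because a second pass adds nothing further.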
AG2’s training data was improved by scaling up resources, exploring more complex random diagrams, generating more complex theorems and proofs, and creating a more balanced distribution of question types and of problems with and without auxiliary points. The data-generation algorithm now also produces problems of the “locus” type, which AG1 did not support, and it was made faster using a greedy discarding algorithm. The new search algorithm, SKEST (Shared Knowledge Ensemble of Search Trees), runs multiple search trees with different configurations in parallel, sharing discovered facts among them, and uses multiple language models per search-tree configuration to improve robustness. The language model itself is a sparse mixture-of-experts Transformer-based model that leverages the Gemini training pipeline. AG2 also uses a more sophisticated neuro-symbolic interface that provides the language model with additional information about deductions made by DDAR. Through these advancements, AlphaGeometry2 represents a significant step forward in AI’s ability to tackle challenging mathematical reasoning tasks.
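The shared-knowledge idea behind SKEST can be sketched as a loop in which several differently configured workers publish everything they derive into a common pool that all of them read on their next turn. This is a loose, sequential illustration under simplifying assumptions, not DeepMind’s parallel implementation.

```python
def ensemble_search(workers, shared_facts, goal, rounds=10):
    """Toy shared-knowledge ensemble: differently configured workers take
    turns deriving facts, publishing each result into one shared pool.
    Loosely inspired by the shared-knowledge idea, not the actual SKEST."""
    for _ in range(rounds):
        for worker in workers:
            shared_facts |= worker(shared_facts)
            if goal in shared_facts:
                return True
    return goal in shared_facts


# Worker 1 can only derive "b" from "a"; worker 2 can only derive the goal
# from "b". Neither succeeds alone, but the shared pool lets them combine.
w1 = lambda facts: {"b"} if "a" in facts else set()
w2 = lambda facts: {"goal"} if "b" in facts else set()
solved = ensemble_search([w1, w2], {"a"}, "goal")
```

The point of the sketch is that knowledge proved by one configuration immediately benefits every other configuration, which is why the ensemble can solve problems no single tree solves on its own.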
AG2’s performance suggests we are on the cusp of AI surpassing human capabilities in competitive mathematics—an achievement that could pave the way for advancements in broader scientific and logical reasoning tasks. While AG2 demonstrates AI’s ability to master geometry, similar breakthroughs in physics and chemistry Olympiads remain unexplored. These fields introduce additional challenges such as experimental validation and real-world data interpretation, but AG2’s success suggests that similar neuro-symbolic approaches could be adapted for broader scientific discovery.
🔎 AI Research
SafeRAG
In the paper “SafeRAG: A Security Evaluation Benchmark for Retrieval-Augmented Generation”, researchers from several AI labs introduce SafeRAG, a benchmark designed to evaluate the security vulnerabilities of Retrieval-Augmented Generation (RAG) systems against data injection attacks. The study identifies four critical attack surfaces—noise, conflict, toxicity, and Denial-of-Service (DoS)—and demonstrates significant weaknesses in the retriever, filter, and generator components of RAG pipelines.
Self-MoA
In the paper “Rethinking Mixture-of-Agents: Is Mixing Different Large Language Models Beneficial?”, researchers from Princeton University introduce Self-MoA, an ensemble method that aggregates outputs from a single top-performing Large Language Model (LLM), which surprisingly outperforms standard Mixture-of-Agents (MoA) that combines different LLMs. The paper also presents Self-MoA-Seq, a sequential version of Self-MoA that iteratively aggregates outputs, and their findings highlight that MoA performance is sensitive to model quality.
Transformers and RL
In the paper “Improving Transformer World Models for Data-Efficient RL”, researchers from Google DeepMind present improvements to vision-based Model-Based Reinforcement Learning (MBRL) agents that use transformer world models for background planning. Key contributions include training policies on both real and imagined trajectories, implementing a nearest-neighbor tokenizer (NNT) for patches, and using block teacher forcing (BTF) to train the world model, ultimately achieving higher rewards than previous state-of-the-art methods on the Craftax-classic benchmark.
Chain-of-Action-Thought
In the paper “Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search”, researchers introduce the Chain-of-Action-Thought (COAT) mechanism, which enables Large Language Models (LLMs) to perform meta-actions during problem-solving, using a novel two-stage training paradigm involving format tuning and reinforcement learning with “Restart and Explore” (RAE) techniques. This approach results in Satori, a 7B LLM, which shows strong performance on both in-domain and out-of-domain tasks, leveraging a multi-agent framework for generating high-quality reasoning trajectories.
ZebraLogic
In the paper “ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning”, researchers from the University of Washington, Allen Institute for AI, and Stanford University introduce ZebraLogic, a comprehensive evaluation framework for assessing LLM reasoning performance on logic grid puzzles derived from constraint satisfaction problems (CSPs). This framework enables the generation of puzzles with controllable and quantifiable complexity to study the scaling limits of models such as Llama, o1, and DeepSeek-R1. The study reveals a significant decline in accuracy as problem complexity grows, termed the “curse of complexity,” even with larger models and increased inference-time computation.
Edge LLMs
In a blog post titled “Advances to low-bit quantization enable LLMs on edge devices”, researchers from Microsoft Research discuss how low-bit quantization can enable the deployment of large language models (LLMs) on edge devices by compressing models and reducing memory demands. They developed three techniques: Ladder, T-MAC, and LUT Tensor Core, to address challenges in mixed-precision matrix multiplication (mpGEMM) and improve computational efficiency for LLMs on resource-constrained devices. These innovations include a data type compiler, a table lookup method, and a hardware design for low-bit LLM inference.
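As a rough illustration of what low-bit quantization does, the sketch below maps floating-point weights to 4-bit integers plus a scale factor and then reconstructs them. The actual techniques named in the post (Ladder, T-MAC, LUT Tensor Core) are far more sophisticated; this only shows the basic compress/reconstruct trade-off.

```python
def quantize_4bit(weights, levels=16):
    """Uniform symmetric quantization of floats to 4-bit integers in [-8, 7].

    A simplified illustration of low-bit weight compression: each value is
    stored as a small integer plus one shared float scale."""
    scale = max(abs(w) for w in weights) / (levels // 2 - 1)
    q = [max(-(levels // 2), min(levels // 2 - 1, round(w / scale)))
         for w in weights]
    return q, scale


def dequantize(q, scale):
    """Reconstruct approximate float weights from the quantized integers."""
    return [v * scale for v in q]


q, s = quantize_4bit([0.9, -0.45, 0.1, -0.02])
approx = dequantize(q, s)
```

Each reconstructed weight lands within half a quantization step of the original, which is the precision/memory trade-off that makes edge deployment feasible.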
🤖 AI Tech Releases
Deep Research
OpenAI introduced Deep Research, its AI reasoning agent.
Gemini 2.0 GA
Google made Gemini 2.0 available to all users.
DABStep
Hugging Face open sourced DABStep, a multi-step reasoning benchmark consisting of 450 tasks.
🛠 Real World AI
Bug Catching at Meta
Meta shares some of the details behind their LLM solution for bug catching.
Slack AI Performance
Slack shares some of the best practices to maintain the performance of its Slack AI platform.
LLM-Powered Search at Yelp
Yelp discusses the use of LLMs for its search capabilities.
📡AI Radar
Safe Superintelligence, the startup founded by OpenAI’s former chief scientist Ilya Sutskever, is in talks to raise capital at a $20 billion valuation.
Krutrim Labs launched to become a frontier AI lab in India.
OpenAI co-founder John Schulman left Anthropic after just five months and is joining Mira Murati in her new startup.
Mistral announced its new assistant app for Android and iOS.
Amazon is planning to spend $100 billion on AI.
TrueFoundry, an AI ops platform, raised $19 million in a Series A.
GitHub Copilot introduced its “agent mode” capability for taking actions in a code base.
Enterprise AI platform Cognida.ai raised $15 million in new funding.
Avelios, an AI platform for healthcare system automation, raised $30 million.
PromptLayer, a prompt engineering platform, raised $4.8 million in a new round.
AI productivity app Tana raised $14 million in a new round.
AI-based programmatic ad platform StackAdapt raised a new $235 million round.
er-10-media · 3 months ago
Text
DeepMind Unveils the “Gold-Medal” AI Model AlphaGeometry2
New Post has been published on https://er10.kz/read/it-novosti/deepmind-predstavila-zolotuju-ii-model-alphageometry2/
DeepMind Unveils the “Gold-Medal” AI Model AlphaGeometry2
DeepMind has unveiled AlphaGeometry2, an AI model that, the company claims, solves geometry problems better than gold medalists at the International Mathematical Olympiad (IMO).
AlphaGeometry2 is an improved version of the AlphaGeometry system, which DeepMind released in January of last year. The developers claim their AI can solve 84% of all geometry problems posed at the International Mathematical Olympiad over the past 25 years.
The DeepMind lab believes that a key to building effective AI may lie in discovering new ways to solve difficult geometry problems.
Proving mathematical theorems requires both reasoning and the ability to choose among a range of possible steps. If DeepMind is right, these problem-solving skills could become a useful component of future general-purpose AI models.
Notably, this past summer DeepMind demonstrated a system that, combining AlphaGeometry2 with AlphaProof, an AI model for formal mathematical reasoning, solved four of the six problems at IMO 2024. Beyond geometry, such approaches could extend to other areas of mathematics and science, for example complex engineering calculations. There are limitations, of course: technical constraints prevent AlphaGeometry2 from solving problems with a variable number of points, nonlinear equations, or inequalities.
moko1590m · 4 months ago
Quote
December 25, 2024, 09:45. A mathematician on the shock of OpenAI’s o3 model scoring 25.2% on the extremely difficult math dataset FrontierMath. Kevin Buzzard, a mathematician and professor of pure mathematics at Imperial College London, has published a blog post discussing OpenAI’s o3 model scoring 25.2% on the FrontierMath problem dataset. Can AI do maths yet? Thoughts from a mathematician. | Xena https://xenaproject.wordpress.com/2024/12/22/can-ai-do-maths-yet-thoughts-from-a-mathematician/
On December 20, 2024, OpenAI announced its new series of reasoning models, “o3.” OpenAI described the o3 models as having “the most advanced reasoning capabilities we have developed so far,” and is preparing them for release in 2025.
The o3 model scored 25.2% on the FrontierMath problem dataset. FrontierMath is a dataset of several hundred difficult mathematics problems; not only the problems themselves but even the total number of problems in the dataset is kept secret, and it is carefully designed so that AI systems cannot train on the problems in advance.
All FrontierMath problems are computational; no “prove that…”-style problems are included. The five published sample problems all have positive integers as answers, and the remaining problems are likewise said to have “definite, automatically verifiable, computable answers.” The difficulty is considerable: even Buzzard could solve only two of the sample problems; he felt he might solve a third with some work, and judged the remaining two to be beyond him.
The FrontierMath paper includes difficulty assessments from prominent mathematicians, including Fields Medal winners, who called the problems “extremely difficult” and suggested that only specialists in each problem’s field would be able to answer them. Indeed, the two problems Buzzard solved were in his own specialty.
Some mathematicians argue that such a benchmark is unsuitable for measuring mathematical ability, since working mathematicians spend most of their time devising proofs and ideas for proofs rather than computing, and “producing a numerical answer by computation is completely different from coming up with an original proof.” But grading proofs is costly, so computational problems, which can be graded simply by checking whether a model’s submitted answer matches the correct one, were adopted instead.
Against this backdrop, Buzzard said he was “shocked” that OpenAI’s o3 model scored 25.2%. It was already known that AI excels at “math-olympiad-style” problems of the kind strong high-school students solve, and Buzzard had no doubt that AI would come to solve undergraduate-level problems, which are similar in featuring many standard exercise types. But seeing AI move beyond standard exercises to early-PhD-level problems requiring innovative ideas looked to him like “quite a large leap.”
However, Elliot Glazer of Epoch AI, which assembled FrontierMath, has stated that 25% of the problems in the dataset are math-olympiad-style. Since none of the five published sample problems resemble olympiad problems at all, Buzzard was initially very excited to hear that o3 had scored 25.2% on FrontierMath, but his excitement subsided on learning that a quarter of the dataset is olympiad-style. “I look forward to AI scoring 50% on FrontierMath,” he commented.
AI is progressing rapidly, but the road ahead is long and there is a great deal left to do. Buzzard closed his blog post by saying he hopes AI will attain the mathematical ability to handle problems at the level of “prove this theorem correctly, and explain why the proof works in a way a human can understand.”
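The grading scheme Buzzard describes (checking a submitted computable answer against a hidden key rather than reading a proof) can be sketched in a few lines. The question IDs and numbers below are made up for illustration; FrontierMath’s actual answers and grading harness are secret.

```python
def grade_submission(answers, key):
    """Auto-grade computational answers by exact match against an answer key.

    This is how a benchmark with 'definite, automatically verifiable,
    computable answers' can be scored without any human proof-checking."""
    correct = sum(1 for qid, ans in answers.items() if key.get(qid) == ans)
    return correct / len(key)


# Hypothetical answer key and model submission (all positive integers,
# like the published FrontierMath sample problems).
key = {"q1": 42, "q2": 7, "q3": 1000003}
answers = {"q1": 42, "q2": 8, "q3": 1000003}
score = grade_submission(answers, key)
```

Exact-match grading is cheap and objective, which is precisely the trade-off the article discusses: it measures computation, not proof-writing.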
A mathematician on the shock of OpenAI’s o3 model scoring 25.2% on the extremely difficult math dataset “FrontierMath” - GIGAZINE
education30and40blog · 7 months ago
Text
AI achieves silver-medal standard solving International Mathematical Olympiad problems
See on Scoop.it - Education 2.0 & 3.0
Breakthrough models AlphaProof and AlphaGeometry 2 solve advanced reasoning problems in mathematics
ai-news · 9 months ago
Link
In a groundbreaking achievement, AI systems developed by Google DeepMind have attained a silver medal-level score in the 2024 International Mathematical Olympiad (IMO), a prestigious global competition for young mathematicians. The AI models, named AlphaProof and AlphaGeometry 2, achieved this milestone together. #AI #ML #Automation
news2024news · 9 months ago
Text
AI at the Mathematical Olympiads: AlphaProof and AlphaGeometry http://dlvr.it/TB7pnx
iwan1979 · 10 months ago
Text
Google DeepMind: AI achieves silver-medal standard solving International Mathematical Olympiad problems