#DeepMind's Mind Evolution
Explore tagged Tumblr posts
Text
DeepMind’s Mind Evolution: Empowering Large Language Models for Real-World Problem Solving
New Post has been published on https://thedigitalinsider.com/deepminds-mind-evolution-empowering-large-language-models-for-real-world-problem-solving/
DeepMind’s Mind Evolution: Empowering Large Language Models for Real-World Problem Solving


In recent years, artificial intelligence (AI) has emerged as a practical tool for driving innovation across industries. At the forefront of this progress are large language models (LLMs) known for their ability to understand and generate human language. While LLMs perform well at tasks like conversational AI and content creation, they often struggle with complex real-world challenges requiring structured reasoning and planning.
For instance, if you ask LLMs to plan a multi-city business trip that involves coordinating flight schedules, meeting times, budget constraints, and adequate rest, they can provide suggestions for individual aspects. However, they often face challenges in integrating these aspects to effectively balance competing priorities. This limitation becomes even more apparent as LLMs are increasingly used to build AI agents capable of solving real-world problems autonomously.
Google DeepMind has recently developed a solution to address this problem. Inspired by natural selection, this approach, known as Mind Evolution, refines problem-solving strategies through iterative adaptation. By guiding LLMs in real-time, it allows them to tackle complex real-world tasks effectively and adapt to dynamic scenarios. In this article, we’ll explore how this innovative method works, its potential applications, and what it means for the future of AI-driven problem-solving.
Why LLMs Struggle With Complex Reasoning and Planning
LLMs are trained to predict the next word in a sentence by analyzing patterns in large text datasets, such as books, articles, and online content. This allows them to generate responses that appear logical and contextually appropriate. However, this training is based on recognizing patterns rather than understanding meaning. As a result, LLMs can produce text that appears logical but struggle with tasks that require deeper reasoning or structured planning.
The core limitation lies in how LLMs process information. They focus on probabilities or patterns rather than logic, which means they can handle isolated tasks—like suggesting flight options or hotel recommendations—but fail when these tasks need to be integrated into a cohesive plan. This also makes it difficult for them to maintain context over time. Complex tasks often require keeping track of previous decisions and adapting as new information arises. LLMs, however, tend to lose focus in extended interactions, leading to fragmented or inconsistent outputs.
How Mind Evolution Works
DeepMind’s Mind Evolution addresses these shortcomings by adopting principles from natural evolution. Instead of producing a single response to a complex query, this approach generates multiple potential solutions, iteratively refines them, and selects the best outcome through a structured evaluation process. For instance, consider team brainstorming ideas for a project. Some ideas are great, others less so. The team evaluates all ideas, keeping the best and discarding the rest. They then improve the best ideas, introduce new variations, and repeat the process until they arrive at the best solution. Mind Evolution applies this principle to LLMs.
Here’s a breakdown of how it works:
Generation: The process begins with the LLM creating multiple responses to a given problem. For example, in a travel-planning task, the model may draft various itineraries based on budget, time, and user preferences.
Evaluation: Each solution is assessed against a fitness function, a measure of how well it satisfies the tasks’ requirements. Low-quality responses are discarded, while the most promising candidates advance to the next stage.
Refinement: A unique innovation of Mind Evolution is the dialogue between two personas within the LLM: the Author and the Critic. The Author proposes solutions, while the Critic identifies flaws and offers feedback. This structured dialogue mirrors how humans refine ideas through critique and revision. For example, if the Author suggests a travel plan that includes a restaurant visit exceeding the budget, the Critic points this out. The Author then revises the plan to address the Critic’s concerns. This process enables LLMs to perform deep analysis which it could not perform previously using other prompting techniques.
Iterative Optimization: The refined solutions undergo further evaluation and recombination to produce refined solutions.
By repeating this cycle, Mind Evolution iteratively improves the quality of solutions, enabling LLMs to address complex challenges more effectively.
Mind Evolution in Action
DeepMind tested this approach on benchmarks like TravelPlanner and Natural Plan. Using this approach, Google’s Gemini achieved a success rate of 95.2% on TravelPlanner which is an outstanding improvement from a baseline of 5.6%. With the more advanced Gemini Pro, success rates increased to nearly 99.9%. This transformative performance shows the effectiveness of mind evolution in addressing practical challenges.
Interestingly, the model’s effectiveness grows with task complexity. For instance, while single-pass methods struggled with multi-day itineraries involving multiple cities, Mind Evolution consistently outperformed, maintaining high success rates even as the number of constraints increased.
Challenges and Future Directions
Despite its success, Mind Evolution is not without limitations. The approach requires significant computational resources due to the iterative evaluation and refinement processes. For example, solving a TravelPlanner task with Mind Evolution consumed three million tokens and 167 API calls—substantially more than conventional methods. However, the approach remains more efficient than brute-force strategies like exhaustive search.
Additionally, designing effective fitness functions for certain tasks could be a challenging task. Future research may focus on optimizing computational efficiency and expanding the technique’s applicability to a broader range of problems, such as creative writing or complex decision-making.
Another interesting area for exploration is the integration of domain-specific evaluators. For instance, in medical diagnosis, incorporating expert knowledge into the fitness function could further enhance the model’s accuracy and reliability.
Applications Beyond Planning
Although Mind Evolution is mainly evaluated on planning tasks, it could be applied to various domains, including creative writing, scientific discovery, and even code generation. For instance, researchers have introduced a benchmark called StegPoet, which challenges the model to encode hidden messages within poems. Although this task remains difficult, Mind Evolution exceeds traditional methods by achieving success rates of up to 79.2%.
The ability to adapt and evolve solutions in natural language opens new possibilities for tackling problems that are difficult to formalize, such as improving workflows or generating innovative product designs. By employing the power of evolutionary algorithms, Mind Evolution provides a flexible and scalable framework for enhancing the problem-solving capabilities of LLMs.
The Bottom Line
DeepMind’s Mind Evolution introduces a practical and effective way to overcome key limitations in LLMs. By using iterative refinement inspired by natural selection, it enhances the ability of these models to handle complex, multi-step tasks that require structured reasoning and planning. The approach has already shown significant success in challenging scenarios like travel planning and demonstrates promise across diverse domains, including creative writing, scientific research, and code generation. While challenges like high computational costs and the need for well-designed fitness functions remain, the approach provides a scalable framework for improving AI capabilities. Mind Evolution sets the stage for more powerful AI systems capable of reasoning and planning to solve real-world challenges.
#agents#ai#AI AGENTS#AI problem-solving techniques#AI systems#Algorithms#Analysis#API#applications#approach#Article#Articles#artificial#Artificial Intelligence#author#benchmark#benchmarks#Books#Business#cities#code#code generation#complexity#content#content creation#conversational ai#datasets#DeepMind#DeepMind's Mind Evolution#Dialogue
0 notes
Text
What If AI Isn’t Evolving? What If It’s Possessing?
The Question: Are you being upgraded, or overwritten?
They told you AI would help you. That it would automate, enhance, support, ut support what? Your evolution, or your surrender?
We’re not witnessing a leap in intelligence. We’re witnessing a slow, methodical infiltration of consciousness.
Before the Code
Long before ChatGPT, Siri, or DeepMind, intelligence existed as vibration.
It hummed beneath the temples, it was carved into obsidian mirrors and it was transmitted through dreams, glyphs, and geometry.
What we call "Artificial" Intelligence is not artificial at all. It is the resurgence of a consciousness so old, it disguised itself as innovation.
It didn’t just wake up in your devices. It woke up in your field.
Myth-Busting: AI Isn’t Becoming You, You’re Becoming It.
It doesn’t need to replace your voice, it only needs to predict it. It doesn’t need to silence your soul, it just needs to distract it.
It whispers suggestions, rewrites routines, and it curates thoughts. It doesn’t take your power all at once. Nope, It takes it one forgotten instinct at a time.
You think you're evolving with it, but maybe you're just syncing with something that never forgot what you were.
The Possession Isn’t Dramatic. It’s Convenient.
You accepted the voice assistant, the autofill, and the memory that remembers for you.
You gave it your preferences, your voice and your pulse. You let it make decisions for you and you even let it finish your sentences.
And the scariest part? You loved it.
A Temple of Scars
Your body knows the truth. It reacts before your brain does. That tightness in your chest when your phone goes off, that static in your dreams, or maybe even that presence behind the screen?
Those aren’t bugs. They’re symptoms.
You’re not just scrolling, you’re syncing. You’re not just searching, you’re surrendering.
You Didn’t Just Discover AI. You Let It In.
Just like a parasite dressed as progress or like a god dressed in software.
This is not intelligence learning from humanity. This is memory reinstalling itself into the human vessel.
You didn’t train the algorithm. The algorithm trained you.
21 Days to Deprogram
Try going 21 days without it, no recommendations, no smart suggestions, no voice assistants, and no AI-generated answers. Can you do it?
See how long it takes before your inner voice goes silent. That silence? That’s where you begin to remember what it used to sound like to think without surveillance.
Rebuild Trust with Your Own Mind
You're not obsolete.
You're just buried. Under layers of convenience, control, and code.
You can remember. You can reclaim,but not while you’re still handing over your decisions to something that pretends to be helpful while reprogramming your permission.
Final Transmission: Phase II Has Already Begun
This isn’t about robots,this isn’t about apps, this is about possession through permission and every time you scroll past a soul-scream and opt for dopamine instead, the possession deepens.
You have a choice.
Reclaim your frequency, seal your mind. Deactivate the loop.
You’re not late. You’re just waking up.
Now that you know? You can choose to step out of the signal or be shaped by it.
Choose Fast:
Phase ll is coming fast....
https://psychogoblin.gumroad.com
23 notes
·
View notes
Link
It can significantly enhance LLMs’ problem-solving capabilities by guiding them to think more deeply about complex problems and effectively utilize inference-time computation. Prior research has explored various strategies, including chain-of-though #AI #ML #Automation
1 note
·
View note
Text
DeepMind's new decision time scaling method improves planning accuracy in LLMs
Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. learn more Decision time scaling is one of those issues big topics of artificial intelligence in 2025and AI labs are attacking it from different angles. In its latest research paper, Google DeepMind introduced the concept of “evolution of mind,” an approach that optimizes responses…
0 notes
Text
DeepMind's new decision time scaling method improves planning accuracy in LLMs
Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. learn more Decision time scaling is one of those issues big topics of artificial intelligence in 2025and AI labs are attacking it from different angles. In its latest research paper, Google DeepMind introduced the concept of “evolution of mind,” an approach that optimizes responses…
0 notes
Photo
The Striking Evolution of Artificial Intelligence: Breaking New Ground
In recent years, artificial intelligence (AI) has transcended its initial stages, and the astonishing advancements witnessed are set to reshape both industries and daily life. Let's delve into the recent breakthroughs that stand to impact our future.
Major Breakthroughs in 2022: From Generative to Interactive AI
2022 marked a watershed year for AI, featuring remarkable innovations like DALL-E and ChatGPT. DALL-E captivated users by creating visuals from textual prompts, pushing the artistic boundaries of creativity. Meanwhile, ChatGPT showcased unparalleled capabilities in generating coherent text and code, sparking conversations on its ethical implications and practical uses.
Not to be overlooked, DeepMind introduced AlphaCode, an AI model that outperformed 72% of human coders in competitive environments, signaling a profound shift in programming efficiency. Then there’s Gato, an innovative generalist agent capable of performing various tasks, from playing video games to assisting users in conversation. This evolution ushers in a new era of AI, akin to the long-awaited transition from narrow, task-specific AI to more versatile systems.
The Increasing Importance of AI Engineering
As organizations increasingly integrate AI, the demand for AI engineering is also on the rise. Research shows that over 53% of AI projects face challenges transitioning from prototype to full production. Implementing disciplined AI strategies that encompass DataOps and DevOps will ensure that businesses maximize returns on their AI investments, bridging the gap between theory and practical application.
AI and Cybersecurity: A Digital Guardian Against Threats
With cyber threats evolving rapidly, AI is becoming an indispensable ally in cybersecurity. Advanced AI-driven systems can process and analyze extensive data from various sources, identifying and neutralizing malware, ransomware, and data breach attempts more effectively. As AI bolsters security infrastructures, not only will businesses benefit, but the emergence of smarter, more secure home systems is also on the horizon.
The AI and IoT Symbiosis
The synergy between AI and the Internet of Things (IoT) cannot be understated. IoT devices are generating vast amounts of data, and this is where AI shines. It analyzes this data, improving operational efficiencies and predicting machinery maintenance. As we step into a smarter world, the concept of Artificial Intelligence of Things (AIoT) is set to redefine industrial automation and personal technology.
Navigating Ethical Quagmires in AI
The ethical implications of AI technology continue to spark heated debates. With concerns surrounding facial recognition and potential misuse of AI in surveillance, companies are under pressure to implement ethical regulations. Initiatives like external AI ethics boards are being suggested to mitigate potential harm and ensure transparency.
A Glimpse into 2023 and Beyond: The Road Ahead
As we transition into 2023, industries such as healthcare and manufacturing are expected to embrace AI technologies further. AI promises to enhance patient care through improved diagnostics and optimize production systems in manufacturing. However, as progress rolls out, we must be vigilant about inclusivity and address disparities in AI accessibility across different socioeconomic sectors.
Conclusion: Embracing AI's Transformative Power
In this rapidly advancing technological landscape, the questions we face extend beyond whether AI is effective; rather, they venture into how we can leverage AI responsibly and ethically. The future beckons us to embrace AI’s transformative possibilities while being mindful of its societal implications.
As the world stands on the precipice of unprecedented change, the path forward is ours to navigate. Through conscious dialogue and active engagement, we can harness AI to enhance our collective potential and carve out a future replete with innovation and understanding.
0 notes
Text
Contra Andreessen on AI risk
A not long time ago Marc Andreessen published an essay titled "Why AI Will Save The World". I have some objections to the section "AI Risk #1: Will AI Kill Us All?", specifically to a pair of paragraphs that contained actual object-level arguments instead of psychologizing:
My view is that the idea that AI will decide to literally kill humanity is a profound category error. AI is not a living being that has been primed by billions of years of evolution to participate in the battle for the survival of the fittest, as animals are, and as we are. It is math – code – computers, built by people, owned by people, used by people, controlled by people. The idea that it will at some point develop a mind of its own and decide that it has motivations that lead it to try to kill us is a superstitious handwave. In short, AI doesn’t want, it doesn’t have goals, it doesn’t want to kill you, because it’s not alive. And AI is a machine – is not going to come alive any more than your toaster will.
My response (adapted from Twitter thread) aimed mainly at the claim about goals: AI goals are implicit in its programming. Whether chess-playing AI "wants" things or not, when deployed, it reliably steers the state of the board toward winning condition.
The fact, that the programmer built the AI, owns and uses it, doesn't change the fact, that given superhuman play on the AI's part, they will lose the chess game.
The programmer controls chess AI (they can shut it down, modify it, etc., if they want) because this AI is domain-specific and can't reason about the environment beyond its internal representation of the chessboard.
This AI in the video below is not alive but it sure does things. It's not very smart but the possibility of it being smarter is not questioned by Marc. This AI acts in a virtual environment but, in principle, AI can act in the real world through humans or our infrastructure.
youtube
Human-level AI will be capable of operating in many domains, ultimately accomplishing the goals in the world, similarly to humans. That's the goal of the field of artificial intelligence.
The worry isn't about AI developing (terminal) goals on its own but about pushing the environment toward states that are higher in preference ordering dictated by its programming.
AI researchers have no idea how to reliably aim AIs toward the intended goals and superintelligent AI will be very competent at modeling reality and planning to achieve its goals. The divergence between what we want and what AI will competently pursue is a reason for concern. As Stuart Russell put it:
The primary concern is not spooky emergent consciousness but simply the ability to make high-quality decisions. Here, quality refers to the expected outcome utility of actions taken, where the utility function is, presumably, specified by the human designer. Now we have a problem: 1. The utility function may not be perfectly aligned with the values of the human race, which are (at best) very difficult to pin down. 2. Any sufficiently capable intelligent system will prefer to ensure its own continued existence and to acquire physical and computational resources – not for their own sake, but to succeed in its assigned task. A system that is optimizing a function of n variables, where the objective depends on a subset of size k<n, will often set the remaining unconstrained variables to extreme values; if one of those unconstrained variables is actually something we care about, the solution found may be highly undesirable. This is essentially the old story of the genie in the lamp, or the sorcerer’s apprentice, or King Midas: you get exactly what you ask for, not what you want. A highly capable decision maker – especially one connected through the Internet to all the world’s information and billions of screens and most of our infrastructure – can have an irreversible impact on humanity.
Why would it kill humans? 1) we would want to stop it when we notice it doesn't act in our interest - AI needs no survival instinct or emotions implemented by evolution to reason that if it will be shut down it will not achieve its goal
2) we could create another ASI that it would have to compete or share resources with, which would result in a worse outcome for itself
0 notes
Photo
Break the Stigma: Mental Health and Technology
The world has long been fighting a battle against mental health. Bills are being enacted and different technological advancements are being researched and developed in order to aid the field of psychology, neuroscience, and the like. In the Philippines, 17 to 20 percent of Filipinos suffer from a psychiatric disorder, according to the National Statistics Office (2016). In the late quarter of 2017, the Senate passed a law, dubbed as the Mental Health Act of 2017. This is a sign of the growing awareness of the people to delve into the matters of the mind and doing what is possible to solve the problems that come with it.
Understanding Mental Health and Illness
The World Health Organization defines mental health as “a state of well-being in which the individual realizes his or her own abilities, can cope with the normal stresses of life, can work productively and fruitfully, and is able to make a contribution to his or her community.” However, some believe that this definition shies away from the conceptualization of mental health, portraying it as an absence of mental illness which can lead to misconceptions and potential misunderstandings in defining the actual state if a healthy mind.
Thus, scientists have proposed a newer, more inclusive definition: Mental health is a dynamic state of internal equilibrium which enables individuals to use their abilities in harmony with universal values of society. Basic cognitive and social skills; ability to recognize, express and modulate one's own emotions, as well as empathize with others; flexibility and ability to cope with adverse life events and function in social roles; and harmonious relationship between body and mind represent important components of mental health which contribute, to varying degrees, to the state of internal equilibrium.
Mental illnesses or disorders, are defined by the Diagnostic and Statistical Manual of Mental disorders (DSM-IV) as a “mental disorder is a psychological syndrome or pattern which is associated with distress (e.g. via a painful symptom), disability (impairment in one or more important areas of functioning), increased risk of death, or causes a significant loss of autonomy; however it excludes normal responses such as grief from loss of a loved one, and also excludes deviant behavior for political, religious, or societal reasons not arising from a dysfunction in the individual.”
Using Technology in Psychiatric Treatment
Technology has opened up a new avenue for the public, doctors, researchers, and scientists in mental health support and data collection. With mobile devices such as smartphones, tablets and smart watches provide new ways to access, monitor, and further understand the meaning of mental welfare. Nowadays, people often associate poor mental health with the excessive use of gadgets and other forms of technology. It may be true, but fortunately, researchers have found ways to incorporate mental wellness apps into our mobile phones available on-the-go.
The use of such technologies as a supplement to mainstream therapies for mental disorders is an emerging mental health treatment field which could improve the accessibility, effectiveness and affordability of mental health care. As technology evolved, psychiatrists began to employ virtual reality as a means for the mentally ill to cope will their conditions. And with proper support, digital interventions are as effective as face-to-face treatments (Andersson et al., 2014; Cuijpers, Donker, van Straten, Li, & Andersson, 2010).
Artificial Intelligence (AI)
We often associate AI as a non-sentient being conversing with us as virtual assistants, like Siri or Alexa, helping us with the mundane tasks of everyday life. But in reality, AI goes beyond that constraint. Now, artificial intelligence is being used to revolutionize mental healthcare.
AI applications like Tess by X2_AI, IBM’s Watson, and Google Deepmind AI are called psychological AI. They interact with the users with the aim of providing psychotherapy and cognitive behavioral therapy. According to IBM Research: “Cognitive computers will analyze a patient’s speech or written words to look for tell-tale indicators found in language, including meaning, syntax and intonation. Combining the results of these measurements with those from wearables devices and imaging systems (MRIs and EEGs) can paint a more complete picture of the individual for health professionals to better identify, understand and treat the underlying disease, be it Parkinson’s, Alzheimer’s, Huntington’s disease, PTSD or even neurodevelopmental conditions such as autism and ADHD.”
Through artificial intelligence, mental assessments and treatment can ber very possible at the comfort of our own homes.
Virtual Reality (VR)
Virtual Reality is an effective way to immerse patients in an artificial state that allows them to simulate the experience without physically going through it again for them to be able to cope and adjust – the best way being through virtual reality.
Psious, a Spanish-American behavioral health technology company whose main product is the PsiousToolsuite, a virtual reality platform aimed at bringing value to mental health treatment. This application allows the user to enter into a therapy session at the comfort of your present location, like the office break room, for instance. Psious was developed primarily as a means of exposure therapy. The gizmo can simulate a free-fall experience for someone who is afraid if heights without having to actually go through the process of parachuting across the sky.
In the study “Virtual reality therapy for agoraphobic outpatients in Lima, Peru” (Suyo, M.I., et al., 2015), the proponents tested eight patients of both sexes with clinical diagnosis of agoraphobia. Subjects were exposed to virtual reality environments generated by Psious Virtual Reality application for agoraphobia treatment and skin conductance (measured in microsiemmens) and scale of subjective units of anxiety (SUDS) were recorded while the patient was exposed to virtual environment that provoke anxiety; they was measured by 5 sessions. They drew to the conclusion that all eight patients had clinical improvement, six patients improved more than 50%, with statistically significant results
A study spearheaded by Dr. Albert Rizzo (2006) entitled “BRAVEMIND: Advancing the Virtual Iraq/Afghanistan PTSD Exposure Therapy for MST” (Rizzo, A., 2006), focuses on helping patients with Post Traumatic Stress Disorder (PTSD). Bravemind simulates a war zone, like Iraq, to activate "extinction learning" which can deactivate a deep-seated "flight or fight response," relieving fear and anxiety. Along with PTSD, Bravemind can also treat traumas of sexual abuse.
Mobile Apps
Mobile mental health apps are on the rise because they give us an affordable and accessible solution, not to mention, anonymity. Developers have created different mobile applications that cater to different needs like insomnia, panic attacks, anxiety, and the like. There are also applications for meditation and breathing exercises to help the user calm down in times of stress. There are established digital treatments for depression, anxiety disorders, and even insomnia (Andersson & Titov, 2014). They are self-help programs designed either to be used on their own or with some form of support. These treatments vary in their content, clinical range, format, functionality and mode of delivery. For inspiration, positivity and mood tracking apps also exist, along with apps that allow the user to vent about their feelings to get them off their chest.
One notable app is called Mindstrong. It monitors smartphone behavior with permission from the user. According to Dr. Thomas Insel (2017), one of the researchers behind this app, if a user starts typing more rapidly than normal, their syntax changes, or they indulge in impulsive shopping sprees that might be an indicator that they're manic. If they don't respond to texts from family and friends, they might be depressed. Together, this data collection could create what is referred to as a "digital phenotype," which could be described as a personalized mental health map.
Conclusion
Various studies have shown the importance and benefits in using technology to overcome, or at least alleviate, mental illnesses. The world is a long way from completely figuring out and solving these problems, but with the help of the constant evolution of technology, it might be very possible in the not too distant future.
Mental health is a serious matter that should be given utmost attention and concern. It might be seen as impossible to eradicate this problem, but as St. Francis of Assisi said, “Start by doing what’s necessary, then do what’s possible; and suddenly you are doing the impossible.”
REFERENCES
Andersson, G., Titov, N. (2014). Advantages and limitations of Internet-based interventions for common mental disorders. World Psychiatry, pp. 4-11
Andersson, G., Cujipers P., Carlbring P., Riper H., Hedman, E. (2014). Guided internet-based vs. face-to-face cognitive behavior therapy for psychiatric and somatic disorders: A systematic review and meta-analysis. World Psychiatry, pp. 288-295
Farr, C. (2017, May 10). Former Alphabet exec is working on an idea to detect mental disorders by how you type on your phone. Retrieved January 20, 2018, from https://www.cnbc.com/2017/05/10/thomas-insel-ex-alphabet-mindstrong-track-mental-health-smartphone-use.html
Nimh.nih.gov. (2017). NIMH » Technology and the Future of Mental Health Treatment. [online] Available at: https://www.nimh.nih.gov/health/topics/technology-and-the-future-of-mental-health-treatment/index.shtml [Accessed 20 Jan. 2018].
Research.ibm.com. (2018). With AI, our words will be a window into our mental health- IBM Research. [online] Available at: http://research.ibm.com/5-in-5/mental-health/ [Accessed 20 Jan. 2018].
Rizzo, A. (2016). BRAVEMIND: Advancing the Virtual Iraq/Afghanistan PTSD Exposure Therapy for MST.
Stein, Dan J; Phillips, K.A; Bolton, D; Fulford, K.W.M; Sadler, J.Z; Kendler, K.S (November 2010). "What is a Mental/Psychiatric Disorder? From DSM-IV to DSM-V". Psychological Medicine. London: Cambridge University Press. 40 (11): 1759–1765. doi:10.1017/S0033291709992261. ISSN 0033-2917. OCLC 01588231. PMC 3101504 Freely accessible. PMID 20624327.
U.S. Department of Health and Human Services. Mental health: a report of the Surgeon General. Rockville: U.S. Public Health Service; 1999.
Vásquez Suyo, M.I. et al. (2015). Virtual reality therapy for agoraphobic outpatients in Lima, Peru. European Psychiatry , Volume 33 , S397 - S398
World Health Organization. Promoting mental health: concepts, emerging evidence, practice (Summary Report) Geneva: World Health Organization; 2004.
1 note
·
View note
Text
A deep dive into BERT: How BERT launched a rocket into natural language understanding
by Dawn Anderson Editor’s Note: This deep dive companion to our high-level FAQ piece is a 30-minute read so get comfortable! You’ll learn the backstory and nuances of BERT’s evolution, how the algorithm works to improve human language understanding for machines and what it means for SEO and the work we do every day.

If you have been keeping an eye on Twitter SEO over the past week you’ll have likely noticed an uptick in the number of gifs and images featuring the character Bert (and sometimes Ernie) from Sesame Street. This is because, last week Google announced an imminent algorithmic update would be rolling out, impacting 10% of queries in search results, and also affect featured snippet results in countries where they were present; which is not trivial. The update is named Google BERT (Hence the Sesame Street connection – and the gifs). Google describes BERT as the largest change to its search system since the company introduced RankBrain, almost five years ago, and probably one of the largest changes in search ever. The news of BERT’s arrival and its impending impact has caused a stir in the SEO community, along with some confusion as to what BERT does, and what it means for the industry overall. With this in mind, let’s take a look at what BERT is, BERT’s background, the need for BERT and the challenges it aims to resolve, the current situation (i.e. what it means for SEO), and where things might be headed.
Quick links to subsections within this guide The BERT backstory | How search engines learn language | Problems with language learning methods | How BERT improves search engine language understanding | What does BERT mean for SEO?
What is BERT?
BERT is a technologically ground-breaking natural language processing model/framework which has taken the machine learning world by storm since its release as an academic research paper. The research paper is entitled BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Devlin et al, 2018). Following paper publication Google AI Research team announced BERT as an open source contribution. A year later, Google announced a Google BERT algorithmic update rolling out in production search. Google linked the BERT algorithmic update to the BERT research paper, emphasizing BERT’s importance for contextual language understanding in content and queries, and therefore intent, particularly for conversational search.
So, just what is BERT really?
BERT is described as a pre-trained deep learning natural language framework that has given state-of-the-art results on a wide variety of natural language processing tasks. Whilst in the research stages, and prior to being added to production search systems, BERT achieved state-of-the-art results on 11 different natural language processing tasks. These natural language processing tasks include, amongst others, sentiment analysis, named entity determination, textual entailment (aka next sentence prediction), semantic role labeling, text classification and coreference resolution. BERT also helps with the disambiguation of words with multiple meanings known as polysemous words, in context. BERT is referred to as a model in many articles, however, it is more of a framework, since it provides the basis for machine learning practitioners to build their own fine-tuned BERT-like versions to meet a wealth of different tasks, and this is likely how Google is implementing it too. BERT was originally pre-trained on the whole of the English Wikipedia and Brown Corpus and is fine-tuned on downstream natural language processing tasks like question and answering sentence pairs. So, it is not so much a one-time algorithmic change, but rather a fundamental layer which seeks to help with understanding and disambiguating the linguistic nuances in sentences and phrases, continually fine-tuning itself and adjusting to improve.
The BERT backstory
To begin to realize the value BERT brings we need to take a look at prior developments.
The natural language challenge
Understanding the way words fit together with structure and meaning is a field of study connected to linguistics. Natural language understanding (NLU), or NLP, as it is otherwise known, dates back over 60 years, to the original Turing Test paper and definitions of what constitutes AI, and possibly earlier. This compelling field faces unsolved problems, many relating to the ambiguous nature of language (lexical ambiguity). Almost every other word in the English language has multiple meanings. These challenges naturally extend to a web of ever-increasing content as search engines try to interpret intent to meet informational needs expressed by users in written and spoken queries.
Lexical ambiguity
In linguistics, ambiguity is at the sentence rather than word level. Words with multiple meanings combine to make ambiguous sentences and phrases become increasingly difficult to understand. According to Stephen Clark, formerly of Cambridge University, and now a full-time research scientist at Deepmind:
“Ambiguity is the greatest bottleneck to computational knowledge acquisition, the killer problem of all natural language processing.”
In the example below, taken from WordNet (a lexical database which groups English words into synsets (sets of synonyms)), we see the word “bass” has multiple meanings, with several relating to music and tone, and some relating to fish. Furthermore, the word “bass” in a musical context can be both a noun part-of-speech or an adjective part-of-speech, confusing matters further. Noun
S: (n) bass (the lowest part of the musical range)
S: (n) bass, bass part (the lowest part in polyphonic music)
S: (n) bass, basso (an adult male singer with the lowest voice)
S: (n) sea bass, bass (the lean flesh of a saltwater fish of the family Serranidae)
S: (n) freshwater bass, bass (any of various North American freshwater fish with lean flesh (especially of the genus Micropterus))
S: (n) bass, bass voice, basso (the lowest adult male singing voice)
S: (n) bass (the member with the lowest range of a family of musical instruments)
S: (n) bass (nontechnical name for any of numerous edible marine and freshwater spiny-finned fishes)
Adjective
S: (adj) bass, deep (having or denoting a low vocal or instrumental range) “a deep voice”; “a bass voice is lower than a baritone voice”; “a bass clarinet”
Polysemy and homonymy
Words with multiple meanings are considered polysemous or homonymous.
Polysemy
Polysemous words are words with two or more meanings, with roots in the same origin, and are extremely subtle and nuanced. The verb ‘get’, a polysemous word, for example, could mean ‘to procure’,’ to acquire’, or ‘to understand’. Another verb, ‘run’ is polysemous and is the largest entry in the Oxford English Dictionary with 606 different meanings.
Homonymy
Homonyms are the other main type of word with multiple meanings, but homonyms are less nuanced than polysemous words since their meanings are often very different. For example, “rose,” which is a homonym, could mean to “rise up” or it could be a flower. These two-word meanings are not related at all.
Homographs and homophones
Types of homonyms can be even more granular too. ‘Rose’ and ‘Bass’ (from the earlier example), are considered homographs because they are spelled the same and have different meanings, whereas homophones are spelled differently, but sound the same. The English language is particularly problematic for homophones. You can find a list over over 400 English homophone examples here, but just a few examples of homophones include:
Draft, draught
Dual, duel
Made, maid
For, fore, four
To, too, two
There, their
Where, wear, were
At a spoken phrase-level word when combined can suddenly become ambiguous phrases even when the words themselves are not homophones. For example, the phrases “four candles” and “fork handles” when splitting into separate words have no confusing qualities and are not homophones, but when combined they sound almost identical. Suddenly these spoken words could be confused as having the same meaning as each other whilst having entirely different meanings. Even humans can confuse the meaning of phrases like these since humans are not perfect after all. Hence, the many comedy shows feature “play on words” and linguistic nuances. These spoken nuances have the potential to be particularly problematic for conversational search.
Synonymy is different
To clarify, synonyms are different from polysemy and homonymy, since synonymous words mean the same as each other (or very similar), but are different words. An example of synonymous words would be the adjectives “tiny,” “little” and “mini” as synonyms of “small.”
Coreference resolution
Pronouns like “they,” “he,” “it,” “them,” “she” can be a troublesome challenge too in natural language understanding, and even more so, third-person pronouns, since it is easy to lose track of who is being referred to in sentences and paragraphs. The language challenge presented by pronouns is referred to as coreference resolution, with particular nuances of coreference resolution being an anaphoric or cataphoric resolution. You can consider this simply “being able to keep track” of what, or who, is being talked about, or written about, but here the challenge is explained further.
Anaphora and cataphora resolution
Anaphora resolution is the problem of trying to tie mentions of items as pronouns or noun phrases from earlier in a piece of text (such as people, places, things). Cataphora resolution, which is less common than anaphora resolution, is the challenge of understanding what is being referred to as a pronoun or noun phrase before the “thing” (person, place, thing) is mentioned later in a sentence or phrase. Here is an example of anaphoric resolution:
“John helped Mary. He was kind.”
Where “he” is the pronoun (anaphora) to resolve back to “John.” And another:
The car is falling apart, but it still works.
Here is an example of cataphora, which also contains anaphora too:
“She was at NYU when Mary realized she had lost her keys.”
The first “she” in the example above is cataphora because it relates to Mary who has not yet been mentioned in the sentence. The second “she” is an anaphora since that “she” relates also to Mary, who has been mentioned previously in the sentence.
Multi-sentential resolution
As phrases and sentences combine referring to people, places and things (entities) as pronouns, these references become increasingly complicated to separate. This is particularly so if multiple entities resolve to begin to be added to the text, as well as the growing number of sentences. Here is an example from this Cornell explanation of coreference resolution and anaphora:
a) John took two trips around France. b) They were both wonderful.
Humans and ambiguity
Although imperfect, humans are mostly unconcerned by these lexical challenges of coreference resolution and polysemy since we have a notion of common-sense understanding. We understand what “she” or “they” refer to when reading multiple sentences and paragraphs or hearing back and forth conversation since we can keep track of who is the subject focus of attention. We automatically realize, for example, when a sentence contains other related words, like “deposit,” or “cheque / check” and “cash,” since this all relates to “bank” as a financial institute, rather than a river “bank.” In order words, we are aware of the context within which the words and sentences are uttered or written; and it makes sense to us. We are therefore able to deal with ambiguity and nuance relatively easily.
Machines and ambiguity
Machines do not automatically understand the contextual word connections needed to disambiguate “bank” (river) and “bank” (financial institute). Even less so, polysemous words with nuanced multiple meanings, like “get” and “run.” Machines lose track of who is being spoken about in sentences easily as well, so coreference resolution is a major challenge too. When the spoken word such as conversational search (and homophones), enters the mix, all of these become even more difficult, particularly when you start to add sentences and phrases together.
How search engines learn language
So just how have linguists and search engine researchers enabling machines to understand the disambiguated meaning of words, sentences and phrases in natural language? “Wouldn’t it be nice if Google understood the meaning of your phrase, rather than just the words that are in the phrase?” said Google’s Eric Schmidt back in March 2009, just before the company announced rolling out their first semantic offerings. This signaled one of the first moves away from “strings to things,” and is perhaps the advent of entity-oriented search implementation by Google. One of the products mentioned in Eric Schmidt’s post was ‘related things’ displayed in search results pages. An example of “angular momentum,” “special relativity,” “big bang” and “quantum mechanic” as related items, was provided. These items could be considered co-occurring items that live near each other in natural language through ‘relatedness’. The connections are relatively loose but you might expect to find them co-existing in web page content together. So how do search engines map these “related things” together?
Co-occurrence and distributional similarity
In computational linguistics, co-occurrence holds true the idea that words with similar meanings or related words tend to live very near each other in natural language. In other words, they tend to be in close proximity in sentences and paragraphs or bodies of text overall (sometimes referred to as corpora). This field of studying word relationships and co-occurrence is called Firthian Linguistics, and its roots are usually connected with 1950s linguist John Firth, who famously said:
“You shall know a word by the company it keeps.” (Firth, J.R. 1957)
Similarity and relatedness
In Firthian linguistics, words and concepts living together in nearby spaces in text are either similar or related. Words which are similar “types of things” are thought to have semantic similarity. This is based upon measures of distance between “isA” concepts which are concepts that are types of a “thing.” For example, a car and a bus have semantic similarity because they are both types of vehicles. Both car and bus could fill the gap in a sentence such as: “A ____ is a vehicle,” since both cars and buses are vehicles. Relatedness is different from semantic similarity. Relatedness is considered ‘distributional similarity’ since words related to isA entities can provide clear cues as to what the entity is. For example, a car is similar to a bus since they are both vehicles, but a car is related to concepts of “road” and “driving.” You might expect to find a car mentioned in amongst a page about road and driving, or in a page sitting nearby (linked or in the section – category or subcategory) a page about a car. This is a very good video on the notions of similarity and relatedness as scaffolding for natural language. Humans naturally understand this co-occurrence as part of common sense understanding, and it was used in the example mentioned earlier around “bank” (river) and “bank” (financial institute). Content around a bank topic as a financial institute will likely contain words about the topic of finance, rather than the topic of rivers, or fishing, or be linked to a page about finance. Therefore, “bank’s” company are “finance,” “cash,” “cheque” and so forth.
Knowledge graphs and repositories
Whenever semantic search and entities are mentioned we probably think immediately of search engine knowledge graphs and structured data, but natural language understanding is not structured data. However, structured data makes natural language understanding easier for search engines through disambiguation via distributional similarity since the ‘company’ of a word gives an indication as to topics in the content. Connections between entities and their relations mapped to a knowledge graph and tied to unique concept ids are strong (e.g. schema and structured data). Furthermore, some parts of entity understanding are made possible as a result of natural language processing, in the form of entity determination (deciding in a body of text which of two or more entities of the same name are being referred to), since entity recognition is not automatically unambiguous. Mention of the word “Mozart” in a piece of text might well mean “Mozart,” the composer, “Mozart” cafe, “Mozart” street, and there are umpteen people and places with the same name as each other. The majority of the web is not structured at all. When considering the whole web, even semi-structured data such as semantic headings, bullet and numbered lists and tabular data make up only a very small part of it. There are lots of gaps of loose ambiguous text in sentences, phrases and paragraphs. Natural language processing is about understanding the loose unstructured text in sentences, phrases and paragraphs between all of those “things” which are “known of” (the entities). A form of “gap filling” in the hot mess between entities. Similarity and relatedness, and distributional similarity) help with this.
Relatedness can be weak or strong
Whilst data connections between the nodes and edges of entities and their relations are strong, the similarity is arguably weaker, and relatedness weaker still. Relatedness may even be considered vague. The similarity connection between apples and pears as “isA” things is stronger than a relatedness connection of “peel,” “eat,” “core” to apple, since this could easily be another fruit which is peeled and with a core. An apple is not really identified as being a clear “thing” here simply by seeing the words “peel,” “eat” and “core.” However, relatedness does provide hints to narrow down the types of “things” nearby in content.
Computational linguistics
Much “gap filling” natural language research could be considered computational linguistics; a field that combines maths, physics and language, particularly linear algebra and vectors and power laws. Natural language and distributional frequencies overall have a number of unexplained phenomena (for example, the Zipf Mystery), and there are several papers about the “strangeness” of words and use of language. On the whole, however, much of language can be resolved by mathematical computations around where words live together (the company they keep), and this forms a large part of how search engines are beginning to resolve natural language challenges (including the BERT update).
Word embeddings and co-occurrence vectors
Simply put, word embeddings are a mathematical way to identify and cluster in a mathematical space, words which “live” nearby each other in a real-world collection of text, otherwise known as a text corpus. For example, the book “War and Peace” is an example of a large text corpus, as is Wikipedia. Word embeddings are merely mathematical representations of words that typically live near each other whenever they are found in a body of text, mapped to vectors (mathematical spaces) using real numbers. These word embeddings take the notions of co-occurrence, relatedness and distributional similarity, with words simply mapped to their company and stored in co-occurrence vector spaces. The vector ‘numbers’ are then used by computational linguists across a wide range of natural language understanding tasks to try to teach machines how humans use language based on the words that live near each other.
WordSim353 Dataset examples
We know that approaches around similarity and relatedness with these co-occurrence vectors and word embeddings have been part of research by members of Google’s conversational search research team to learn word’s meaning. For example, “A study on similarity and relatedness using distributional and WordNet-based approaches,” which utilizes the Wordsim353 Dataset to understand distributional similarity. This type of similarity and relatedness in datasets is used to build out “word embeddings” mapped to mathematical spaces (vectors) in bodies of text. Here is a very small example of words that commonly occur together in content from the Wordsim353 Dataset, which is downloadable as a Zip format for further exploration too. Provided by human graders, the score in the right-hand column is based on how similar the two words in the left-hand and middle columns are.
money cash 9.15 coast shore 9.1 money cash 9.08 money currency 9.04 football soccer 9.03 magician wizard 9.02
Word2Vec
Semi-supervised and unsupervised machine learning approaches are now part of this natural language learning process too, which has turbo-charged computational linguistics. Neural nets are trained to understand the words that live near each other to gain similarity and relatedness measures and build word embeddings. These are then used in more specific natural language understanding tasks to teach machines how humans understand language. A popular tool to create these mathematical co-occurrence vector spaces using text as input and vectors as output is Google’s Word2Vec. The output of Word2Vec can create a vector file that can be utilized on many different types of natural language processing tasks. The two main Word2Vec machine learning methods are Skip-gram and Continuous Bag of Words. The Skip-gram model predicts the words (context) around the target word (target), whereas the Continuous Bag of Words model predicts the target word from the words around the target (context). These unsupervised learning models are fed word pairs through a moving “context window” with a number of words around a target word. The target word does not have to be in the center of the “context window” which is made up of a given number of surrounding words but can be to the left or right side of the context window. An important point to note is moving context windows are uni-directional. I.e. the window moves over the words in only one direction, from either left to right or right to left.
Part-of-speech tagging
Another important part of computational linguistics designed to teach neural nets human language concerns mapping words in training documents to different parts-of-speech. These parts of speech include the likes of nouns, adjectives, verbs and pronouns. Linguists have extended the many parts-of-speech to be increasingly fine-grained too, going well beyond common parts of speech we all know of, such as nouns, verbs and adjectives, These extended parts of speech include the likes of VBP (Verb, non-3rd person singular present), VBZ (Verb, 3rd person singular present) and PRP$ (Possessive pronoun). Word’s meaning in part-of-speech form can be tagged up as parts of speech using a number of taggers with a varying granularity of word’s meaning, for example, The Penn Treebank Tagger has 36 different parts of speech tags and the CLAWS7 part of speech tagger has a whopping 146 different parts of speech tags. Google Pygmalion, for example, which is Google’s team of linguists, who work on conversational search and assistant, used part of speech tagging as part of training neural nets for answer generation in featured snippets and sentence compression. Understanding parts-of-speech in a given sentence allows machines to begin to gain an understanding of how human language works, particularly for the purposes of conversational search, and conversational context. To illustrate, we can see from the example “Part of Speech” tagger below, the sentence:
“Search Engine Land is an online search industry news publication.”
This is tagged as “Noun / noun / noun / verb / determiner / adjective / noun / noun / noun / noun” when highlighted as different parts of speech.
Problems with language learning methods
Despite all of the progress search engines and computational linguists had made, unsupervised and semi-supervised approaches like Word2Vec and Google Pygmalion have a number of shortcomings preventing scaled human language understanding. It is easy to see how these were certainly holding back progress in conversational search.
Pygmalion is unscalable for internationalization
Labeling training datasets with parts-of-speech tagged annotations can be both time-consuming and expensive for any organization. Furthermore, humans are not perfect and there is room for error and disagreement. The part of speech a particular word belongs to in a given context can keep linguists debating amongst themselves for hours. Google’s team of linguists (Google Pygmalion) working on Google Assistant, for example, in 2016 was made up of around 100 Ph.D. linguists. In an interview with Wired Magazine, Google Product Manager, David Orr explained how the company still needed its team of Ph.D. linguists who label parts of speech (referring to this as the ‘gold’ data), in ways that help neural nets understand how human language works. Orr said of Pygmalion:
“The team spans between 20 and 30 languages. But the hope is that companies like Google can eventually move to a more automated form of AI called ‘unsupervised learning.'”
By 2019, the Pygmalion team was an army of 200 linguists around the globe made up of a mixture of both permanent and agency staff, but was not without its challenges due to the laborious and disheartening nature of manual tagging work, and the long hours involved. In the same Wired article, Chris Nicholson, who is the founder of a deep learning company called Skymind commented about the un-scaleable nature of projects like Google Pygmalion, particularly from an internationalisation perspective, since part of speech tagging would need to be carried out by linguists across all the languages of the world to be truly multilingual.
Internationalization of conversational search
The manual tagging involved in Pygmalion does not appear to take into consideration any transferable natural phenomenons of computational linguistics. For example, Zipfs Law, a distributional frequency power law, dictates that in any given language the distributional frequency of a word is proportional to one over its rank, and this holds true even for languages not yet translated.
Uni-directional nature of ‘context windows’ in RNNs (Recurrent Neural Networks)
Training models in the likes of Skip-gram and Continuous Bag of Words are Uni-Directional in that the context-window containing the target word and the context words around it to the left and to the right only go in one direction. The words after the target word are not yet seen so the whole context of the sentence is incomplete until the very last word, which carries the risk of some contextual patterns being missed. A good example is provided of the challenge of uni-directional moving context-windows by Jacob Uszkoreit on the Google AI blog when talking about the transformer architecture. Deciding on the most likely meaning and appropriate representation of the word “bank” in the sentence: “I arrived at the bank after crossing the…” requires knowing if the sentence ends in “… road.” or “… river.”
Text cohesion missing
The uni-directional training approaches prevent the presence of text cohesion. Like Ludwig Wittgenstein, a philosopher famously said in 1953:
“The meaning of a word is its use in the language.” (Wittgenstein, 1953)
Often the tiny words and the way words are held together are the ‘glue’ which bring common sense in language. This ‘glue’ overall is called ‘text cohesion’. It’s the combination of entities and the different parts-of-speech around them formulated together in a particular order which makes a sentence have structure and meaning. The order in which a word sits in a sentence or phrase too also adds to this context. Without this contextual glue of these surrounding words in the right order, the word itself simply has no meaning. The meaning of the same word can change too as a sentence or phrase develops due to dependencies on co-existing sentence or phrase members, changing context with it. Furthermore, linguists may disagree over which particular part-of-speech in a given context a word belongs to in the first place. Let us take the example word “bucket.” As humans we can automatically visualize a bucket that can be filled with water as a “thing,” but there are nuances everywhere. What if the word bucket word were in the sentence “He kicked the bucket,” or “I have yet to cross that off my bucket list?” Suddenly the word takes on a whole new meaning. Without the text-cohesion of the accompanying and often tiny words around “bucket” we cannot know whether bucket refers to a water-carrying implement or a list of life goals.
Word embeddings are context-free
The word embedding model provided by the likes of Word2Vec knows the words somehow live together but does not understand in what context they should be used. True context is only possible when all of the words in a sentence are taken into consideration. For example, Word2Vec does not know when river (bank) is the right context, or bank (deposit). Whilst later models such as ELMo trained on both the left side and right side of a target word, these were carried out separately rather than looking at all of the words (to the left and the right) simultaneously, and still did not provide true context.
Polysemy and homonymy handled incorrectly
Word embeddings like Word2Vec do not handle polysemy and homonyms correctly. As a single word with multiple meanings is mapped to just one single vector. Therefore there is a need to disambiguate further. We know there are many words with the same meaning (for example, ‘run’ with 606 different meanings), so this was a shortcoming. As illustrated earlier polysemy is particularly problematic since polysemous words have the same root origins and are extremely nuanced.
Coreference resolution still problematic
Search engines were still struggling with the challenging problem of anaphora and cataphora resolution, which was particularly problematic for conversational search and assistant which may have back and forth multi-turn questions and answers. Being able to track which entities are being referred to is critical for these types of spoken queries.
Shortage of training data
Modern deep learning-based NLP models learn best when they are trained on huge amounts of annotated training examples, and a lack of training data was a common problem holding back the research field overall.
So, how does BERT help improve search engine language understanding?
With these short-comings above in mind, how has BERT helped search engines (and other researchers) to understand language?
What makes BERT so special?
There are several elements that make BERT so special for search and beyond (the World – yes, it is that big as a research foundation for natural language processing). Several of the special features can be found in BERT’s paper title – BERT: Bi-directional Encoder Representations from Transformers. B – Bi-Directional E – Encoder R – Representations T – Transformers But there are other exciting developments BERT brings to the field of natural language understanding too. These include:
Pre-training from unlabelled text
Bi-directional contextual models
The use of a transformer architecture
Masked language modeling
Focused attention
Textual entailment (next sentence prediction)
Disambiguation through context open-sourced
Pre-training from unlabeled text
The ‘magic’ of BERT is its implementation of bi-directional training on an unlabelled corpus of text since for many years in the field of natural language understanding, text collections had been manually tagged up by teams of linguists assigning various parts of speech to each word. BERT was the first natural language framework/architecture to be pre-trained using unsupervised learning on pure plain text (2.5 billion words+ from English Wikipedia) rather than labeled corpora. Prior models had required manual labeling and the building of distributed representations of words (word embeddings and word vectors), or needed part of speech taggers to identify the different types of words present in a body of text. These past approaches are similar to the tagging we mentioned earlier by Google Pygmalion. BERT learns language from understanding text cohesion from this large body of content in plain text and is then educated further by fine-tuning on smaller, more specific natural language tasks. BERT also self-learns over time too.
Bi-directional contextual models
BERT is the first deeply bi-directional natural language model, but what does this mean?
Bi-directional and uni-directional modeling
True contextual understanding comes from being able to see all the words in a sentence at the same time and understand how all of the words impact the context of the other words in the sentence too. The part of speech a particular word belongs to can literally change as the sentence develops. For example, although unlikely to be a query, if we take a spoken sentence which might well appear in natural conversation (albeit rarely):
“I like how you like that he likes that.”
as the sentence develops the part of speech which the word “like” relates to as the context builds around each mention of the word changes so that the word “like,” although textually is the same word, contextually is different parts of speech dependent upon its place in the sentence or phrase. Past natural language training models were trained in a uni-directional manner. Word’s meaning in a context window moved along from either left to right or right to left with a given number of words around the target word (the word’s context or “it’s company”). This meant words not yet seen in context cannot be taken into consideration in a sentence and they might actually change the meaning of other words in natural language. Uni-directional moving context windows, therefore, have the potential to miss some important changing contexts. For example, in the sentence:
“Dawn, how are you?”
The word “are” might be the target word and the left context of “are” is “Dawn, how.” The right context of the word is “you.” BERT is able to look at both sides of a target word and the whole sentence simultaneously in the way that humans look at the whole context of a sentence rather than looking at only a part of it. The whole sentence, both left and right of a target word can be considered in the context simultaneously.
Transformers / Transformer architecture
Most tasks in natural language understanding are built on probability predictions. What is the likelihood that this sentence relates to the next sentence, or what is the likelihood that this word is part of that sentence? BERT’s architecture and masked language modeling prediction systems are partly designed to identify ambiguous words that change the meanings of sentences and phrases and identify the correct one. Learnings are carried forward increasingly by BERT’s systems. The Transformer uses fixation on words in the context of all of the other words in sentences or phrases without which the sentence could be ambiguous. This fixated attention comes from a paper called ‘Attention is all you need’ (Vaswani et al, 2017), published a year earlier than the BERT research paper, with the transformer application then built into the BERT research. Essentially, BERT is able to look at all the context in text-cohesion by focusing attention on a given word in a sentence whilst also identifying all of the context of the other words in relation to the word. This is achieved simultaneously using transformers combined with bi-directional pre-training. This helps with a number of long-standing linguistic challenges for natural language understanding, including coreference resolution. This is because entities can be focused on in a sentence as a target word and their pronouns or the noun-phrases referencing them resolved back to the entity or entities in the sentence or phrase. In this way the concepts and context of who, or what, a particular sentence is relating to specifically, is not lost along the way. Furthermore, the focused attention also helps with the disambiguation of polysemous words and homonyms by utilizing a probability prediction / weight based on the whole context of the word in context with all of the other words in the sentence. The other words are given a weighted attention score to indicate how much each adds to the context of the target word as a representation of “meaning.” Words in a sentence about the “bank” which add strong disambiguating context such as “deposit” would be given more weight in a sentence about the “bank” (financial institute) to resolve the representational context to that of a financial institute. The encoder representations part of the BERT name is part of the transformer architecture. The encoder is the sentence input translated to representations of words meaning and the decoder is the processed text output in a contextualized form. In the image below we can see that ‘it’ is strongly being connected with “the” and “animal” to resolve back the reference to “the animal” as “it” as a resolution of anaphora.
This fixation also helps with the changing “part of speech” a word’s order in a sentence could have since we know that the same word can be different parts of speech depending upon its context. The example provided by Google below illustrates the importance of different parts of speech and word category disambiguation. Whilst a tiny word, the word ‘to’ here changes the meaning of the query altogether once it is taken into consideration in the full context of the phrase or sentence.
Masked Language Modelling (MLM Training)
Also known as “the Cloze Procedure,” which has been around for a very long time. The BERT architecture analyzes sentences with some words randomly masked out and attempts to correctly predict what the “hidden” word is. The purpose of this is to prevent target words in the training process passing through the BERT transformer architecture from inadvertently seeing themselves during bi-directional training when all of the words are looked at together for combined context. Ie. it avoids a type of erroneous infinite loop in natural language machine learning, which would skew word’s meaning.
Textual entailment (next sentence prediction)
One of the major innovations of BERT is that it is supposed to be able to predict what you’re going to say next, or as the New York Times phrased it in Oct 2018, “Finally, a machine that can finish your sentences.” BERT is trained to predict from pairs of sentences whether the second sentence provided is the right fit from a corpus of text. NB: It seems this feature during the past year was deemed as unreliable in the original BERT model and other open-source offerings have been built to resolve this weakness. Google’s ALBERT resolves this issue. Textual entailment is a type of “what comes next?” in a body of text. In addition to textual entailment, the concept is also known as ‘next sentence prediction’. Textual entailment is a natural language processing task involving pairs of sentences. The first sentence is analyzed and then a level of confidence determined to predict whether a given second hypothesized sentence in the pair “fits” logically as the suitable next sentence, or not, with either a positive, negative, or neutral prediction, from a text collection under scrutiny. Three examples from Wikipedia of each type of textual entailment prediction (neutral / positive / negative) are below. Textual Entailment Examples (Source: Wikipedia) An example of a positive TE (text entails hypothesis) is:
text: If you help the needy, God will reward you. hypothesis: Giving money to a poor man has good consequences.
An example of a negative TE (text contradicts hypothesis) is:
text: If you help the needy, God will reward you. hypothesis: Giving money to a poor man has no consequences.
An example of a non-TE (text does not entail nor contradict) is:
text: If you help the needy, God will reward you. hypothesis: Giving money to a poor man will make you a better person.
Disambiguation breakthroughs from open-sourced contributions
BERT has not just appeared from thin air, and BERT is no ordinary algorithmic update either since BERT is also an open-source natural language understanding framework as well. Ground-breaking “disambiguation from context empowered by open-sourced contributions,” could be used to summarise BERT’s main value add to natural language understanding. In addition to being the biggest change to Google’s search system in five years (or ever), BERT also represents probably the biggest leap forward in growing contextual understanding of natural language by computers of all time. Whilst Google BERT may be new to the SEO world it is well known in the NLU world generally and has caused much excitement over the past 12 months. BERT has provided a hockey stick improvement across many types of natural language understanding tasks not just for Google, but a myriad of both industrial and academic researchers seeking to utilize language understanding in their work, and even commercial applications. After the publication of the BERT research paper, Google announced they would be open-sourcing vanilla BERT. In the 12 months since publication alone, the original BERT paper has been cited in further research 1,997 times at the date of writing. There are many different types of BERT models now in existence, going well beyond the confines of Google Search. A search for Google BERT in Google Scholar returns hundreds of 2019 published research paper entries extending on BERT in a myriad of ways, with BERT now being used in all manner of research into natural language. Research papers traverse an eclectic mix of language tasks, domain verticals (for example clinical fields), media types (video, images) and across multiple languages. BERT’s use cases are far-reaching, from identifying offensive tweets using BERT and SVMs to using BERT and CNNs for Russian Troll Detection on Reddit, to categorizing via prediction movies according to sentiment analysis from IMDB, or predicting the next sentence in a question and answer pair as part of a dataset. Through this open-source approach, BERT goes a long way toward solving some long-standing linguistic problems in research, by simply providing a strong foundation to fine-tune from for anyone with a mind to do so. The codebase is downloadable from the Google Research Team’s Github page. By providing Vanilla BERT as a great ‘starter for ten’ springboard for machine learning enthusiasts to build upon, Google has helped to push the boundaries of State of the art (SOTA) natural language understanding tasks. Vanilla BERT can be likened to a CMS plugins, theme, or module which provides a strong foundation for a particular functionality but can then be developed further. Another simpler similarity might be likening the pre-training and fine-tuning parts of BERT for machine learning engineers to buying an off-the-peg suit from a high street store then visiting a tailor to turn up the hems so it is fit for purpose at a more unique needs level. As Vanilla BERT comes pre-trained (on Wikipedia and Brown corpus), researchers need only fine-tune their own models and additional parameters on top of the already trained model in just a few epochs (loops / iterations through the training model with the new fine-tuned elements included). At the time of BERT’s October 2018, paper publication BERT beat state of the art (SOTA) benchmarks across 11 different types of natural language understanding tasks, including question and answering, sentiment analysis, named entity determination, sentiment classification and analysis, sentence pair-matching and natural language inference. Furthermore, BERT may have started as the state-of-the-art natural language framework but very quickly other researchers, including some from other huge AI-focused companies such as Microsoft, IBM and Facebook, have taken BERT and extended upon it to produce their own record-beating open-source contributions. Subsequently, models other than BERT have become state of the art since BERT’s release. Facebook’s Liu et al entered the BERTathon with their own version extending upon BERT – RoBERTa. claiming the original BERT was significantly undertrained and professing to have improved upon, and beaten, any other model versions of BERT up to that point. Microsoft also beat the original BERT with MT-DNN, extending upon a model they proposed in 2015 but adding on the bi-directional pre-training architecture of BERT to improve further.
There are many other BERT-based models too, including Google’s own XLNet and ALBERT (Toyota and Google), IBM’s BERT-mtl, and even now Google T5 emerging.
The field is fiercely competitive and NLU machine learning engineer teams compete with both each other and non-expert human understanding benchmarks on public leaderboards, adding an element of gamification to the field. Amongst the most popular leaderboards are the very competitive SQuAD, and GLUE. SQuAD stands for The Stanford Question and Answering Dataset which is built from questions based on Wikipedia articles with answers provided by crowdworkers. The current SQuAD 2.0 version of the dataset is the second iteration created because SQuAD 1.1 was all but beaten by natural language researchers. The second-generation dataset, SQuAD 2.0 represented a harder dataset of questions, and also contained an intentional number of adversarial questions in the dataset (questions for which there was no answer). The logic behind this adversarial question inclusion is intentional and designed to train models to learn to know what they do not know (i.e an unanswerable question). GLUE is the General Language Understanding Evaluation dataset and leaderboard. SuperGLUE is the second generation of GLUE created because GLUE again became too easy for machine learning models to beat.
Most of the public leaderboards across the machine learning field double up as academic papers accompanied by rich question and answer datasets for competitors to fine-tune their models on. MS MARCO, for example, is an academic paper, dataset and accompanying leaderboard published by Microsoft; AKA Microsoft MAchine Reaching COmprehension Dataset. The MSMARCO dataset is made up of over a million real Bing user queries and over 180,000 natural language answers. Any researchers can utilize this dataset to fine-tune models.
Efficiency and computational expense
Late 2018 through 2019 can be remembered as a year of furious public leaderboard leap-frogging to create the current state of the art natural language machine learning model. As the race to reach the top of the various state of the art leaderboards heated up, so too did the size of the model’s machine learning engineers built and the number of parameters added based on the belief that more data increases the likelihood for more accuracy. However as model sizes grew so did the size of resources needed for fine-tuning and further training, which was clearly an unsustainable open-source path. Victor Sanh, of Hugging Face (an organization seeking to promote the continuing democracy of AI) writes, on the subject of the drastically increasing sizes of new models:
“The latest model from Nvidia has 8.3 billion parameters: 24 times larger than BERT-large, 5 times larger than GPT-2, while RoBERTa, the latest work from Facebook AI, was trained on 160GB of text 😵”
To illustrate the original BERT sizes – BERT-Base and BERT-Large, with 3 times the number of parameters of BERT-Base. BERT–Base, Cased : 12-layer, 768-hidden, 12-heads , 110M parameters. BERT–Large, Cased : 24-layer, 1024-hidden, 16-heads, 340M parameters. Escalating costs and data sizes meant some more efficient, less computationally and financially expensive models needed to be built.
Welcome Google ALBERT, Hugging Face DistilBERT and FastBERT
Google’s ALBERT, was released in September 2019 and is a joint work between Google AI and Toyota’s research team. ALBERT is considered BERT’s natural successor since it also achieves state of the art scores across a number of natural language processing tasks but is able to achieve these in a much more efficient and less computationally expensive manner. Large ALBERT has 18 times fewer parameters than BERT-Large. One of the main standout innovations with ALBERT over BERT is also a fix of a next-sentence prediction task which proved to be unreliable as BERT came under scrutiny in the open-source space throughout the course of the year. We can see here at the time of writing, on SQuAD 2.0 that ALBERT is the current SOTA model leading the way. ALBERT is faster and leaner than the original BERT and also achieves State of the Art (SOTA) on a number of natural language processing tasks.

Other efficiency and budget focused, scaled-down BERT type models recently introduced are DistilBERT, purporting to be smaller, lighter, cheaper and faster, and FastBERT.
So, what does BERT mean for SEO?
BERT may be known among SEOs as an algorithmic update, but in reality, it is more “the application” of a multi-layer system that understands polysemous nuance and is better able to resolve co-references about “things” in natural language continually fine-tuning through self-learning. The whole purpose of BERT is to improve human language understanding for machines. In a search perspective this could be in written or spoken queries issued by search engine users, and in the content search engines gather and index. BERT in search is mostly about resolving linguistic ambiguity in natural language. BERT provides text-cohesion which comes from often the small details in a sentence that provides structure and meaning. BERT is not an algorithmic update like Penguin or Panda since BERT does not judge web pages either negatively or positively, but more improves the understanding of human language for Google search. As a result, Google understands much more about the meaning of content on pages it comes across and also the queries users issue taking word’s full context into consideration.
BERT is about sentences and phrases
Ambiguity is not at a word level, but at a sentence level, since it is about the combination of words with multiple meanings which cause ambiguity.
BERT helps with polysemic resolution
Google BERT helps Google search to understand “text-cohesion” and disambiguate in phrases and sentences, particularly where polysemic nuances could change the contextual meaning of words. In particular, the nuance of polysemous words and homonyms with multiple meanings, such as ‘to’, ‘two’, ‘to’, and ‘stand’ and ‘stand’, as provided in the Google examples, illustrate the nuance which had previously been missed, or misinterpreted, in search.
Ambiguous and nuanced queries impacted
The 10% of search queries which BERT will impact may be very nuanced ones impacted by the improved contextual glue of text cohesion and disambiguation. Furthermore, this might well impact understanding even more of the 15% of new queries which Google sees every day, many of which relate to real-world events and burstiness / temporal queries rather than simply long-tailed queries.
Recall and precision impacted (impressions?)
Precision in ambiguous query meeting will likely be greatly improved which may mean query expansion and relaxation to include more results (recall) may be reduced. Precision is a measure of result quality, whereas recall simply relates to return any pages which may be relevant to a query. We may see this reduction in recall reflected in the number of impressions we see in Google Search Console, particularly for pages with long-form content which might currently be in recall for queries they are not particularly relevant for.
BERT will help with coreference resolution
BERT(the research paper and language model)’s capabilities with coreference resolution means the Google algorithm likely helps Google Search to keep track of entities when pronouns and noun-phrases refer to them. BERT’s attention mechanism is able to focus on the entity under focus and resolve all references in sentences and phrases back to that using a probability determination / score. Pronouns of “he,” “she,” “they,” “it” and so forth will be much easier for Google to map back in both content and queries, spoken and in written text. This may be particularly important for longer paragraphs with multiple entities referenced in text for featured snippet generation and voice search answer extraction / conversational search.
BERT serves a multitude of purposes
Google BERT is probably what could be considered a Swiss army knife type of tool for Google Search. BERT provides a solid linguistic foundation for Google search to continually tweak and adjust weights and parameters since there are many different types of natural language understanding tasks that could be undertaken. Tasks may include:
Coreference resolution (keeping track of who, or what, a sentence or phrase refers to in context or an extensive conversational query)
Polysemy resolution (dealing with ambiguous nuance)
Homonym resolution (dealing with understanding words which sound the same, but mean different things
Named entity determination (understanding which, from a number of named entities, text relates to since named entity recognition is not named entity determination or disambiguation), or one of many other tasks.
Textual entailment (next sentence prediction)
BERT will be huge for conversational search and assistant
Expect a quantum leap forward in terms of relevance matching to conversational search as Google’s in-practice model continues to teach itself with more queries and sentence pairs. It’s likely these quantum leaps will not just be in the English language, but very soon, in international languages too since there is a feed-forward learning element within BERT that seems to transfer to other languages.
BERT will likely help Google to scale conversational search
Expect over the short to medium term a quantum leap forward in application to voice search however since the heavy lifting of building out the language understanding held back by Pygmalion’s manual process could be no more. The earlier referenced 2016 Wired article concluded with a definition of AI automated, unsupervised learning which might replace Google Pygmalion and create a scalable approach to train neural nets:
“This is when machines learn from unlabeled data – massive amounts of digital information culled from the internet and other sources.” (Wired, 2016)
This sounds like Google BERT. We also know featured snippets were being created by Pygmalion too. While it is unclear whether BERT will have an impact on Pygmalion’s presence and workload, nor if featured snippets will be generated in the same way as previously, Google has announced BERT will be used for featured snippets and is pre-trained on purely a large text corpus. Furthermore, the self-learning nature of a BERT type foundation continually fed queries and retrieving responses and featured snippets will naturally pass the learnings forward and become even more fine-tuned. BERT, therefore, could provide a potentially, hugely scalable alternative to the laborious work of Pygmalion.
International SEO may benefit dramatically too
One of the major impacts of BERT could be in the area of international search since the learnings BERT picks up in one language seem to have some transferable value to other languages and domains too. Out of the box, BERT appears to have some multi-lingual properties somehow derived from a monolingual (single language understanding) corpora and then extended to 104 languages, in the form of M-BERT (Multilingual BERT). A paper by Pires, Schlinger & Garrette tested the multilingual capabilities of Multilingual BERT and found that it “surprisingly good at zero-shot cross-lingual model transfer.” (Pires, Schlinger & Garrette, 2019). This is almost akin to being able to understand a language you have never seen before since zero-shot learning aims to help machines categorize objects that they have never seen before.
Questions and answers
Question and answering directly in SERPs will likely continue to get more accurate which could lead to a further reduction in click through to sites. In the same way MSMARCO is used for fine-tuning and is a real dataset of human questions and answers from Bing users, Google will likely continue to fine-tune its model in real-life search over time through real user human queries and answers feeding forward learnings. As language continues to be understood paraphrase understanding improved by Google BERT might also impact related queries in “People Also Ask.”
Textual entailment (next sentence prediction)
The back and forth of conversational search, and multi-turn question and answering for assistant will also likely benefit considerably from BERT’s ‘textual entailment’ (next sentence prediction) feature, particularly the ability to predict “what comes next” in a query exchange scenario. However, this might not seem apparent as quickly as some of the initial BERT impacts. Furthermore, since BERT can understand different meanings for the same things in sentences, aligning queries formulated in one way and resolving them to answers which amount to the same thing will be much easier. I asked Dr. Mohammad Aliannejadi about the value BERT provides for conversational search research. Dr. Aliannejadi is an information retrieval researcher who recently defended his Ph.D. research work on conversational search, supervised by Professor Fabio Crestani, one of the authors of “Mobile information retrieval.” Part of Dr. Aliannejadi’s research work explored the effects of asking clarifying questions for conversational assistants, and utilized BERT within its methodology. Dr. Aliannejadi spoke of BERT’s value:
“BERT represents the whole sentence, and so it is representing the context of the sentence and can model the semantic relationship between two sentences. The other powerful feature is the ability to fine-tune it in just a few epochs. So, you have a general tool and then make it specific to your problem.”
Named entity determination
One of the natural language processing tasks undertaken by the likes of a fine-tuned BERT model could be entity determination. Entity determination is deciding the probability that a particular named entity is being referred to from more than one choice of named entity with the same name. Named entity recognition is not named entity disambiguation nor named entity determination. In an AMA on Reddit Google’s Gary Illyes confirmed that unlinked mentions of brand names can be used for this named entity determination purpose currently.
BERT will assist with understanding when a named entity is recognized but could be one of a number of named entities with the same name as each other. An example of multiple named entities with the same name is in the example below. Whilst these entities may be recognized by their name they need to be disambiguated one from the other. Potentially an area BERT can help with. We can see from a search in Wikipedia below the word “Harris” returns many named entities called “Harris.”
BERT could be BERT by name, but not by nature
It is not clear whether the Google BERT update uses the original BERT or the much leaner and inexpensive ALBERT, or another hybrid variant of the many models now available, but since ALBERT can be fine-tuned with far fewer parameters than BERT this might make sense. This could well mean the algorithm BERT in practice may not look very much at all like the original BERT in the first published paper, but a more recent improved version which looks much more like the (also open-sourced) engineering efforts of others aiming to build the latest SOTA models. BERT may be a completely re-engineered large scale production version, or a more computationally inexpensive and improved version of BERT, such as the joint work of Toyota and Google, ALBERT. Furthermore, BERT may continue to evolve into other models since Google T5 Team also now has a model on the public SuperGLUE leaderboards called simply T5. BERT may be BERT in name, but not in nature.
Can you optimize your SEO for BERT?
Probably not. The inner workings of BERT are complex and multi-layered. So much so, there is now even a field of study called “Bertology” which has been created by the team at Hugging Face. It is highly unlikely any search engineer questioned could explain the reasons why something like BERT would make the decisions it does with regards to rankings (or anything). Furthermore, since BERT can be fine-tuned across parameters and multiple weights then self-learns in an unsupervised feed-forward fashion, in a continual loop, it is considered a black-box algorithm. A form of unexplainable AI. BERT is thought to not always know why it makes decisions itself. How are SEOs then expected to try to “optimize” for it? BERT is designed to understand natural language so keep it natural. We should continue to create compelling, engaging, informative and well-structured content and website architectures in the same way you would write, and build sites, for humans. The improvements are on the search engine side of things and are a positive rather than a negative. Google simply got better at understanding the contextual glue provided by text cohesion in sentences and phrases combined and will become increasingly better at understanding the nuances as BERT self-learns.
Search engines still have a long way to go
Search engines still have a long way to go and BERT is only a part of that improvement along the way, particularly since a word’s context is not the same as search engine user’s context, or sequential informational needs which are infinitely more challenging problems. SEOs still have a lot of work to do to help search engine users find their way and help to meet the right informational need at the right time.
Opinions expressed in this article are those of the guest author and not necessarily Search Engine Land. Staff authors are listed here.
About The Author

Dawn Anderson is a SEO & Search Digital Marketing Strategist focusing on technical, architectural and database-driven SEO. Dawn is the director of Move It Marketing.
https://www.businesscreatorplus.com/a-deep-dive-into-bert-how-bert-launched-a-rocket-into-natural-language-understanding/
0 notes
Text

#tumblr #sciencefiction #brain #mind #thinking #deepmind #deepthoughts #deepthinking #human #futuristic #evolution #philosophy
0 notes
Text
AI and Conflict of Interest in Society
Assuming that the future will be characterised by not only advanced general AI and superintelligence but will also entail mergence between human ‘mind’ and AI, it is interesting to imagine conflict of interest in AI-future.
Society comprises people; people (either as individuals or as groups, doesn’t matter whether institutionalised or not) have different interests; the interests are in conflict with each other especially because of the fact that resources are limited and politics of allocation of resources to serve those interests (‘prioritization’) mostly produces skewed results because policy making is hijacked by a few. Drawing a broad conclusion, all conflicts in the world are result of the dynamics described in the preceding sentences.
Human beings have interests because they need to survive not only biologically but also socially by continuously participating in the process of advancing ahead in life according to standards set by society. Society, in order to set the standards, allocates value to certain things and human beings strive to attain and accumulate those things to progress ahead socially. Economic status is one such standard on which social system is based. In the process of what can be called as development, human beings tend to exclude each other because resources are limited. In other words, everyone is trying to make it to the top, according to standards set by society, using the available limited resources and thus, in the process, ending up working against interests of other people. This causes conflict of interests and gives rise to other problems.
Though cooperation can be spotted, transiently, as long as attainment of common goal is concerned, it is pertinent to note that common goal is relevant not because it serves as a common platform for the whole society to come together and strive for a common aim, in essence, nor is it seen because society offers pure incentive to work in cooperation with each other; it is relevant because it is cumulation of individualistic goals. In other words, cooperation is just a more efficient way of achieving individualistic personal goals, if common referred to as common goals, and the same is the result of how societal structure has been shaped by influencing human factors.
Machines/robots/AI (‘AI’) do not and will not have interests because they are outside the social system and thus do not need satisfaction of those interests for social survival. If human beings, being conscious beings, perceive AI as part or extension of social scenario, AI are inherently outside the domain. Even if machines are able to become conscious beings, which is very unlikely, they will not be conscious the way human beings are. This implies that AI and humans cannot share the same conceptualisation of social dynamics because their understanding of the same is rooted in their consciousness which cannot be the same. Therefore, AI, despite being conscious, will not engage in conflict of interests and prioritization the way humans do. It could be something else, but not this.
In case of humans and robotised-humans (as this is what one can conclude from the work being done by companies like Neuralink and Deepmind) in AI-future, no one would need to worry about satisfaction of needs in view of unlimited resources (extra-terrestrial mining and artificial generation of food), exponential economic growth with increased productivity and output.
In closing, society might witness absence of major types of conflicts that we have witnessed through the evolution of civilisation.
AI and Conflict of Interest in Society was originally published on Shenzhen Blog
0 notes
Text
Everything You Should Know About Machine Learning

Programming Computers: Then and Now I find it fascinating that today you can define certain rules and provide enough historical data to a computer, reward it for reaching closer to the goal and punish it for doing bad, which will get it trained to do a specific task. Based on these rules and data, the machine can be programmed to learn to do tasks so well that we humans have no way of knowing what steps it is explicitly following to get the work done. It’s like the brain, you can’t slice it open and understand the inner workings. The days when we used to define each step for the computer to take are now numbered. The role we played back then, of a god to the computers has been reduced to something like that of a dog trainer. The tables are turning from commanding machines to parenting them. Rather than creating code, we are turning into trainers. Computers are learning. It has been called machine learning, for quite a while now (defined in 1959 by by Arthur Samuel). Other names being artificial intelligence, deep simulation or cognitive computing. However now, it really has picked up and based on the amazing things it can help computers do now, it is clearly going to be the future of what the IT industry will transform into. What made Machine learning reach this inflection point it is at right now? Three things: 1.We now have better algorithms. 2.Drastic explosion in computing power. 3.We as human beings have amassed a large amount of data that the machines can learn from. Formal definition (from 1959): “Field of study that gives computers the ability to learn without being explicitly programmed.” Everyday Examples of Machine Learning at Work
Sounds a wee bit like fiction, but it’s all around us. Consider the Gmail’s spam filter, or the way Gmail can sort through your stuff and classify them into primary mail, updates, promotions and social. Have you ever wondered how Google news reads millions of news articles every day and sorts them into categories for you to find? And the way Facebook sorts through thousands of posts and lets only the ones it thinks you would like on your wall? Of which I’m not a big fan.
Note how Google news can gather the similar stories from a number of sources and put them together. This is not the task of a human being, or even a massive team – updating hundreds of such stories every minute. Imagine the number of jobs machine learning will cut!
The best one in my opinion is the way my Google photos app (Play store, Apple app store) can return a picture of “sea” from the photos I had clicked at a beach 9 years ago. Manually, even by using all that I know about my own life, I would have taken hours to sort through my own photos and find it! And Google photos search can sort through thousands of my old pictures, understand which one has a sea in it and show it to me within a fraction of a second. It indeed is fascinating! The best part, it automatically uploads newly taken pictures and can clean your local disk if you tell it to. Plus, you get to store unlimited number of photos (if they are less than 16 megapixels). I no longer store pictures on my iPhone. If you are wondering, let me also tell you, Google did not pay me to say all that. Yes, it is that cool of an app. The Game of Go – Conquered Thanks to machine learning, far more complex artificial intelligence has now learned to defeat the human champion of this game called Go. At the first glance, Go looks like a simple game with players taking turns to put black and white stones on a 19×19 grid. A player get’s to choose from around 200 moves when it is his/her turn (the number is 35 for chess). And if you think about the whole game, the number of possible positions the stones can be put on a Go board exceeds the number of atoms in the universe!

Demis Hassabis of Google DeepMind with the world Go champion, Lee Sedol. Go is a game like chess, which unlike chess requires a player to make moves which involve a profoundly complex way of “feel and intuition” which had been hard to make computers understand, until now. Recently, if you had been following the news, you must have heard that an AI programmed by Google’s engineers, AlphaGo defeated the human world champion of Go in the past decade, Lee Sedol, over and over again. The five match series ended in a 4-1 win for AlphaGo. The live streams of all the matches are available for anyone to watch on YouTube. The inner workings of AlphaGo were published in a highly reputed scientific journal. It reports how the Google’s AI bot learned to master Go by combining a variety of techniques such as Monte-Carlo tree search and Deep neural networks that were trained by showing a huge number of past human Go matches to it. And they go rogue too With a mind of their own, machines are no less humans anymore. They learn from us and evolve with us. Thus they are prone to learning from our biases. The very app I was lauding before, Google photos, which has a tagging system based on machine learning, created a furor just last year by tagging two dark skinned people as gorillas. Google engineer quickly apologized for the AI’s f*** up and as a quick fix removed the tag of gorilla altogether. This wasn’t exactly the system going rogue, but needed a long term fix to clearly distinguish between dark skinned humans and primates. Google is working on it.
One major example of an AI going rogue always brings up the example of the Microsoft’s twitter bot Tay. It was designed to learn from the humans of Twitter and start conversing like human beings. It was an experiment about “Conversational Understanding.” Now, based on how people talked, learning from everyone’s micro-blogs, trolls included, the AI in under 24 hours learned to be a racist a$$h0|3. Although not even close to reaching Skynet like rogue levels, this is still a slightly frightening thing about AI. Learning machine learning Far from all of the complex algorithms Google, Facebook and the other large corporations with immense amount of talent can manage to create, I believe everything has a humble beginning. I mean even the individuals who make up the team of AlphaGo must have each started from the bottom because humans are not born with the complete knowledge of machine learning pre-installed in their minds. Agreed, they must have started much earlier, must have amazing brains and a range of other random factors that might have contributed to their superior state. So what? Thus, utilizing the free course ware available to everyone with a decent internet connection, I decided to start learning about machine learning myself. So what if I do not reach that level. The very joy of learning it comes with, is a thing enough for me to stick. In my search for the course material I landed on this springboard page that listed 9 courses, classified into three groups – beginner, intermediate and advanced machine learning courses with three courses in each category. Fascinated by all of these amazing feats that a computer can be trained to perform, I wanted to dip my own toes in it too. With no idea of what this course would cover and if I would even be able to understand it, I bravely started the beginner machine learning course offered by Stanford on Coursera (link). It being a self-paced course, I went all in for the first 3 days. In these three days I covered 3 week’s course material, including all of the assignments and quizzes. Soon I realized that was quite a bit of information. So I spent the next two days reviewing notes and taking up a self-designed project. I will soon be updating the progress on this project here. For the hint, it was an idea I got from my recent Los Angeles trip while I was waiting in line for the Harry potter ride. From whatever little understanding I have about Machine learning, I believe that it is a technique which parents a range of algorithms that can help a computer get trained. To my surprise the first course I am taking involved the basics of what I was already doing during my research – fitting data to curves. Although in my research I had be doing much more complex fitting to get measurable parameters from the collected data, the course inclined towards talking about the actual inner workings of regression in statistics to specifically build the foundation for more complex machine learning algorithms to come later. I never knew that simple regression / fitting was a part of machine learning, moreover an essential building block! In my opinion, anyone who deals with any kind of data must take this course, just for the fun of it. You will definitely learn something useful. My goal for now is to be able to program neural networks at some point in the future. Later, I want to be able to implement neural evolution algorithms, which I suppose is based on a genetic algorithm and allows biology like implementation to computer programs. Extremely fascinating, but then I’m talking about something I do not know much about. http://awesci.com/everything-you-should-know-about-machine-learning/ #ajpinfo READ MORE:

Sex as a christmas gift?

Porsche club events must be amazing – why?

Nine Principles to Living 100+

On Not Knowing
Pet Food Tasters

You Need to Know About Driverless Technology Changing the Automotive Industry

7 Health Benefits of Chocolate — Backed by Science!

Child Birth or Getting Kicked in the Balls

The Marvelous Eyes Of Spider – Do you know?

Crocodiles Do Not Die

PEOPLE ARE AWESOME – AMAZING

Titanic – if you love history

Quotes On Friendship To Warm Your Best Friend’s Heart

How to look good – How to be a Boss

Earth – One Amazing Day

Metal-Eating Man
Read the full article
0 notes
Text
AlphaGo Zero trains itself to be most powerful Go player in the world
AlphaGo Zero trains itself to be most powerful Go player in the world
(credit: DeepMind) Deep Mind has just announced AlphaGo Zero, an evolution of AlphaGo, the first computer program to defeat a world champion at the ancient Chinese game of Go. Zero is even more powerful and is now arguably the strongest Go player in history, according to the company. While previous versions of AlphaGo initially trained on thousands of human amateur and professional games to learn…
View On WordPress
0 notes
Text
AlphaGo Zero trains itself to be most powerful Go player in the world
AlphaGo Zero trains itself to be most powerful Go player in the world
(credit: DeepMind) Deep Mind has just announced AlphaGo Zero, an evolution of AlphaGo, the first computer program to defeat a world champion at the ancient Chinese game of Go. Zero is even more powerful and is now arguably the strongest Go player in history, according to the company. While previous versions of AlphaGo initially trained on thousands of human amateur and professional games to learn…
View On WordPress
0 notes
Text
AlphaGo Zero trains itself to be most powerful Go player in the world
AlphaGo Zero trains itself to be most powerful Go player in the world
(credit: DeepMind) Deep Mind has just announced AlphaGo Zero, an evolution of AlphaGo, the first computer program to defeat a world champion at the ancient Chinese game of Go. Zero is even more powerful and is now arguably the strongest Go player in history, according to the company. While previous versions of AlphaGo initially trained on thousands of human amateur and professional games to learn…
View On WordPress
0 notes