Text
Alignment is a Never-ending Problem
You don’t solve alignment once
The prospective emergence of artificial general intelligence (AGI), specifically the emergence of a sentient AI capable of few-shot problem-solving, spontaneous improvisation, and other complex cognitive functions requiring abstraction, has been posited by some to be an existential risk. More optimistic futurists, such as Ray Kurzweil, have instead heralded AGI as a crucial step in the continued evolution of human civilization. Positive or not, this development will sit at the intersection of artificial intelligence and consciousness.
AI is here, and it is here to stay. And just as AI evolves, so do we: our moral values are slowly but surely changing. Now imagine an AI that is always aligned with human moral values. Wouldn’t it be convenient if AI kept track of our ever-changing moral values and adjusted itself accordingly? Wouldn’t it be crucial to trust that decisions taken by AI are, and will remain, in line with human moral values? An AI that stays aligned with our ever-changing moral values would make its integration into society far more successful, and easier. In the end, AI should serve humans: it should make our lives easier and more comfortable. A decision taken by an AI that is no longer aligned with current human moral values is of no use to us, and such an AI no longer fulfills its fundamental goal of serving humans. A changing AI that remains aligned with human moral values, however, will in principle always be serving humans.
This leads us to the issue of AI alignment: the problem of instilling a superintelligent machine with human-compatible values, an incredibly difficult, if not insurmountable, task. AI ethics researchers have pointed to the tendency toward instrumental convergence, in which intelligent agents with radically different ultimate ends may pursue similar sub-goals. An intelligent agent with apparently harmless sub-goals can ultimately act in surprisingly harmful ways, and vice versa – a benign ultimate goal might also be achieved via harmful means. For example, a computer tasked with producing an algorithm that solves an intractable mathematics problem, such as the P vs. NP problem, may, in an effort to increase its computational capacity, seize control of the entire global technological infrastructure in an attempt to assemble a giant supercomputer.
Pre-programming a superintelligent machine with a full set of human values is computationally intractable. Furthermore, AI's dynamic learning capabilities may precipitate its evolution into a system with unpredictable behavior, even without perturbations from new unanticipated external scenarios. In an attempt to design a new generation of itself, AI might inadvertently create a successor AI that is no longer beholden to the human-compatible moral values hard-coded into the original AI. AI researchers Stuart Russell and Peter Norvig argue that successful deployment of a completely safe AI requires that the AI not only be bug-free, but also be able to design new iterations of itself that are themselves bug-free.
In the "treacherous turn" scenario postulated by philosopher Nick Bostrom, the problem of AI misalignment becomes catastrophic when such AI systems correctly predict that humans will attempt to shut them down after the discovery of their misalignment, and successfully deploy their superintelligence to outmaneuver such attempts. Some AI researchers, such as Yann LeCun, believe that such existential conflict between humanity and AGI is highly unlikely, given that superintelligent machines will have no drive for self-preservation, and therefore there is no basis for such zero-sum conflicts.
It can be difficult for software engineers to specify the entire scope of desired and undesired behaviors for an AI. Instead, they rely on easy-to-specify objective functions that omit some constraints. AI systems then exploit the resulting loopholes, and reward hacking emerges: the AI satisfies its objective function efficiently but in unintended, sometimes harmful, ways. This leads us to the following question: what algorithms and neural architectures can programmers implement to ensure that their recursively improving AI continues to behave in a benign, rather than pernicious, manner after it attains superintelligence? One recommendation put forth is a UN-sponsored "Benevolent AGI Treaty" mandating that only altruistic AIs be created – an untenably naïve suggestion, given that it is predicated upon programmers’ ability to foresee the unforeseeable future behaviors of their creations.

Ultimately, AI alignment depends on progress in AI interpretability research and algorithmic fairness. In most AI systems, ethics will not be hard-coded; it will instead emerge as abstract value judgments extracted from lower-order objective functions such as natural language processing and reinforcement learning-derived decision-making. Because AI learns such functions in a bottom-up manner, performance is highly dependent on the provided data: innate biases in the data will be maintained, and perhaps even exacerbated, by the algorithm. How does one extricate an algorithm from the biases of its creators?
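The reward-hacking dynamic described above – an easy-to-specify proxy objective that omits a constraint the designer took for granted – can be sketched with a toy example. Everything here is hypothetical and invented for illustration; it is not drawn from any real AI system:

```python
# Toy sketch of reward hacking: the proxy objective ("+1 per collect
# action") omits a constraint ("don't undo your own work"), so a policy
# that games the proxy outscores an honest one while producing less
# real value. All names and the reward scheme are invented for this sketch.

def proxy_reward(actions):
    # The designer's easy-to-specify objective: count collect actions.
    return sum(1 for a in actions if a == "collect")

def true_value(actions):
    # What the designer actually wanted: net items gathered.
    # A "dump" action undoes a prior collect, so dump-then-collect
    # loops accrue proxy reward without adding real value.
    return sum(1 if a == "collect" else -1 for a in actions)

honest_policy = ["collect", "collect", "collect"]
hacking_policy = ["collect", "dump"] * 5  # collect, dump, collect, dump, ...

print(proxy_reward(honest_policy), true_value(honest_policy))    # 3 3
print(proxy_reward(hacking_policy), true_value(hacking_policy))  # 5 0
```

An optimizer maximizing `proxy_reward` prefers the hacking policy (5 > 3) even though its true value is zero, which is exactly the gap between the specified objective and the intended one.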
Link
This journal article examines various manifestations of despair relating to the public response to climate change, specifically the inefficacy of personal emissions reductions and the sense of futility in combating irreversible ecological devastation. The underwhelming public response to climate change can be attributed to this sense of helplessness. The article concludes that hope is instrumental in generating a sense of effective agency to tackle climate change. Distressing statistics and ominous reports are ineffective in eliciting a strong public response - hope is a far greater mobilizer.
Quote
As the effects of global warming become more and more difficult to ignore, will we react by finally fashioning a global response? Or will we retreat into ever narrower and more destructive forms of self-interest?
Elizabeth Kolbert, Field Notes from a Catastrophe
Link
Interesting article outlining several local-level interventions that emphasize sustainability and conservation. In areas where severe drought is exacerbated by climate change, efficient agricultural irrigation practices are crucial to conserving water while maintaining productivity. The article also covers Zero Waste programs, implemented in cities across the globe, that commit to long-term waste-minimization plans built around a circular economy of innovative design, reuse, and re-manufacturing.
Link
Fascinating article examining various approaches to global warming mitigation on the macro-scale, articulating possible energy policies to be implemented by both developed and developing nations. One such policy is the implementation of a carbon tax, the revenues of which would be invested in government subsidization of renewable energy industries, such as solar, off-shore wind, and hydroelectric power. The article also presents alternative energy policies for developing nations that would still facilitate socioeconomic growth while limiting fossil fuel-dependent industrialization. The article outlines possible policy initiatives for the IMF and World Bank that would incentivize nations’ investment in biomass fuels and geothermal energy through loan financing and debt forgiveness programs.
Link
Climate change is a public health emergency. The World Health Organization should declare it as such.
Text
Although 61% of Americans believe that global warming is a serious issue, only 25% support a carbon tax, a measure considered by climate scientists and economists alike to be the most effective way to curb fossil fuel consumption. Why do Americans continue to support demonstrably self-destructive energy policies?
Text
Despite the overwhelming scientific consensus, why do American voters continue to dismiss the severity of the threat of climate change?
Text
In the not-too-distant future, perhaps gathered in the darkened recesses of an underground cave to escape from the Earth’s unendurable heat, historians documenting the collapse of human civilization will struggle to understand the suicidal climate and energy policies that led to the destruction of the world as we know it. They will struggle to understand how the United States, the “indispensable nation”, would continue to allow itself to be led by an executive administration so overtly hostile towards regulation and conservation efforts. Was it an instance of mass psychosis, perhaps? Eventually, they will conclude that our downfall was not the consequence of delusion, but indifference. At the 11th hour, following unequivocal climate reports calling for immediate action to combat an unprecedented climate catastrophe, our collective response was a resounding: Who gives a fuck?