symbolicsystems - Tumblr blog

symbolicsystems · 5 years ago

Quote

We present a neurosymbolic framework for the lifelong learning of algorithmic tasks that mix perception and procedural reasoning. Reusing high-level concepts across domains and learning complex procedures are key challenges in lifelong learning. We show that a program synthesis approach that combines gradient descent with combinatorial search over programs can be a more effective response to these challenges than purely neural methods. Our framework, called HOUDINI, represents neural networks as strongly typed, differentiable functional programs that use symbolic higher-order combinators to compose a library of neural functions. Our learning algorithm consists of: (1) a symbolic program synthesizer that performs a type-directed search over parameterized programs, and decides on the library functions to reuse, and the architectures to combine them, while learning a sequence of tasks; and (2) a neural module that trains these programs using stochastic gradient descent. We evaluate HOUDINI on three benchmarks that combine perception with the algorithmic tasks of counting, summing, and shortest-path computation. Our experiments show that HOUDINI transfers high-level concepts more effectively than traditional transfer learning and progressive neural networks, and that the typed representation of networks significantly accelerates the search.

HOUDINI: Lifelong Learning as Program Synthesis

0 notes

symbolicsystems · 5 years ago

Quote

The move from hand-designed features to learned features in machine learning has been wildly successful. In spite of this, optimization algorithms are still designed by hand. In this paper we show how the design of an optimization algorithm can be cast as a learning problem, allowing the algorithm to learn to exploit structure in the problems of interest in an automatic way. Our learned algorithms, implemented by LSTMs, outperform generic, hand-designed competitors on the tasks for which they are trained, and also generalize well to new tasks with similar structure. We demonstrate this on a number of tasks, including simple convex problems, training neural networks, and styling images with neural art.

Learning to learn by gradient descent by gradient descent

0 notes

symbolicsystems · 5 years ago

Quote

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.

Attention Is All You Need

0 notes

symbolicsystems · 5 years ago

Quote

In contrast to entropy, which increases monotonically, the "complexity" or "interestingness" of closed systems seems intuitively to increase at first and then decrease as equilibrium is approached. For example, our universe lacked complex structures at the Big Bang and will also lack them after black holes evaporate and particles are dispersed. This paper makes an initial attempt to quantify this pattern. As a model system, we use a simple, two-dimensional cellular automaton that simulates the mixing of two liquids ("coffee" and "cream"). A plausible complexity measure is then the Kolmogorov complexity of a coarse-grained approximation of the automaton's state, which we dub the "apparent complexity." We study this complexity measure, and show analytically that it never becomes large when the liquid particles are non-interacting. By contrast, when the particles do interact, we give numerical evidence that the complexity reaches a maximum comparable to the "coffee cup's" horizontal dimension. We raise the problem of proving this behavior analytically.

Quantifying the Rise and Fall of Complexity in Closed Systems: The Coffee Automaton

0 notes

symbolicsystems · 5 years ago

Quote

Merely by existing, all physical systems register information. And by evolving dynamically in time, they transform and process that information. The laws of physics determine the amount of information that a physical system can register (number of bits) and the number of elementary logic operations that a system can perform (number of ops). The universe is a physical system. This paper quantifies the amount of information that the universe can register and the number of elementary operations that it can have performed over its history. The universe can have performed no more than 10120 ops on 1090 bits.

Computational capacity of the universe

0 notes

symbolicsystems · 5 years ago

Quote

Recent studies have emphasized the merits of search processes that lack overarching objectives, instead promoting divergence by rewarding behavioral novelty. While this less objective search paradigm is more open-ended and divergent, it still diers significantly from nature’s mechanism of divergence. Rather than measuring novelty explicitly, nature is guided by a single, fundamental constraint: survive long enough to reproduce. Surprisingly, this simple constraint produces both complexity and diversity in a continual process unparalleled by any algorithm to date. Inspired by the relative simplicity of open-endedness in nature in comparison to recent non-objective algorithms, this paper investigates the extent to which interactions between two coevolving populations, both subject to their own constraint, or _minimal criterion_, can produce results that are both functional and diverse even without any behavior characterization or novelty archive. To test this new approach, a novel maze navigation domain is introduced wherein evolved agents must learn to navigate mazes whose structures are simultaneously coevolving and increasing in complexity. e result is a broad range of maze topologies and successful agent trajectories in a single run, thereby suggesting the viability of _minimal criterion coevolution_ as a new approach to non-objective search and a step towards genuinely open-ended algorithms.

Minimal Criterion Coevolution: A New Approach to Open-Ended Search

0 notes

symbolicsystems · 5 years ago

Quote

This work explores hypernetworks: an approach of using a one network, also known as a hypernetwork, to generate the weights for another network. Hypernetworks provide an abstraction that is similar to what is found in nature: the relationship between a genotype - the hypernetwork - and a phenotype - the main network. Though they are also reminiscent of HyperNEAT in evolution, our hypernetworks are trained end-to-end with backpropagation and thus are usually faster. The focus of this work is to make hypernetworks useful for deep convolutional networks and long recurrent networks, where hypernetworks can be viewed as relaxed form of weight-sharing across layers. Our main result is that hypernetworks can generate non-shared weights for LSTM and achieve near state-of-the-art results on a variety of sequence modelling tasks including character-level language modelling, handwriting generation and neural machine translation, challenging the weight-sharing paradigm for recurrent networks. Our results also show that hypernetworks applied to convolutional networks still achieve respectable results for image recognition tasks compared to state-of-the-art baseline models while requiring fewer learnable parameters.

HyperNetworks

0 notes

symbolicsystems · 5 years ago

Quote

We present a reinforcement learning framework, called Programmatically Interpretable Reinforcement Learning (PIRL), that is designed to generate interpretable and verifiable agent policies. Unlike the popular Deep Reinforcement Learning (DRL) paradigm, which represents policies by neural networks, PIRL represents policies using a high-level, domain-specific programming language. Such programmatic policies have the benefits of being more easily interpreted than neural networks, and being amenable to verification by symbolic methods. We propose a new method, called Neurally Directed Program Search (NDPS), for solving the challenging nonsmooth optimization problem of finding a programmatic policy with maximal reward. NDPS works by first learning a neural policy network using DRL, and then performing a local search over programmatic policies that seeks to minimize a distance from this neural "oracle". We evaluate NDPS on the task of learning to drive a simulated car in the TORCS car-racing environment. We demonstrate that NDPS is able to discover human-readable policies that pass some significant performance bars. We also show that PIRL policies can have smoother trajectories, and can be more easily transferred to environments not encountered during training, than corresponding policies discovered by DRL.

Programmatically Interpretable Reinforcement Learning

0 notes

symbolicsystems · 5 years ago

Quote

Mutation testing assesses test suite efficacy by inserting small faults into programs and measuring the ability of the test suite to detect them. It is widely considered the strongest test criterion in terms of finding the most faults and it subsumes a number of other coverage criteria. Traditional mutation analysis is computationally prohibitive which hinders its adoption as an industry standard. In order to alleviate the computational issues, we present a diff-based probabilistic approach to mutation analysis that drastically reduces the number of mutants by omitting lines of code without statement coverage and lines that are determined to be uninteresting - we dub these arid lines. Furthermore, by reducing the number of mutants and carefully selecting only the most interesting ones we make it easier for humans to understand and evaluate the result of mutation analysis. We propose a heuristic for judging whether a node is arid or not, conditioned on the programming language. We focus on a code-review based approach and consider the effects of surfacing mutation results on developer attention. The described system is used by 6,000 engineers in Google on all code changes they author or review, affecting in total more than 14,000 code authors as part of the mandatory code review process. The system processes about 30% of all diffs across Google that have statement coverage calculated.

State of Mutation Testing at Google

0 notes

symbolicsystems · 5 years ago

Quote

This paper addresses the scalability challenge of architecture search by formulating the task in a differentiable manner. Unlike conventional approaches of applying evolution or reinforcement learning over a discrete and non-differentiable search space, our method is based on the continuous relaxation of the architecture representation, allowing efficient search of the architecture using gradient descent. Extensive experiments on CIFAR-10, ImageNet, Penn Treebank and WikiText-2 show that our algorithm excels in discovering high-performance convolutional architectures for image classification and recurrent architectures for language modeling, while being orders of magnitude faster than state-of-the-art non-differentiable techniques. Our implementation has been made publicly available to facilitate further research on efficient architecture search algorithms.

DARTS: Differentiable Architecture Search

0 notes

symbolicsystems · 5 years ago

Quote

Artificial intelligence (AI) has undergone a renaissance recently, making major progress in key domains such as vision, language, control, and decision-making. This has been due, in part, to cheap data and cheap compute resources, which have fit the natural strengths of deep learning. However, many defining characteristics of human intelligence, which developed under much different pressures, remain out of reach for current approaches. In particular, generalizing beyond one's experiences--a hallmark of human intelligence from infancy--remains a formidable challenge for modern AI. The following is part position paper, part review, and part unification. We argue that combinatorial generalization must be a top priority for AI to achieve human-like abilities, and that structured representations and computations are key to realizing this objective. Just as biology uses nature and nurture cooperatively, we reject the false choice between "hand-engineering" and "end-to-end" learning, and instead advocate for an approach which benefits from their complementary strengths. We explore how using relational inductive biases within deep learning architectures can facilitate learning about entities, relations, and rules for composing them. We present a new building block for the AI toolkit with a strong relational inductive bias--the graph network--which generalizes and extends various approaches for neural networks that operate on graphs, and provides a straightforward interface for manipulating structured knowledge and producing structured behaviors. We discuss how graph networks can support relational reasoning and combinatorial generalization, laying the foundation for more sophisticated, interpretable, and flexible patterns of reasoning. As a companion to this paper, we have released an open-source software library for building graph networks, with demonstrations of how to use them in practice.

Relational inductive biases, deep learning, and graph networks

0 notes

symbolicsystems · 5 years ago

Quote

This paper shows that every sublevel set of the loss function of a class of deep over-parameterized neural nets with piecewise linear activation functions is connected and unbounded. This implies that the loss has no bad local valleys and all of its global minima are connected within a unique and potentially very large global valley.

On Connected Sublevel Sets in Deep Learning

0 notes

symbolicsystems · 5 years ago

Quote

This monograph aims at providing an introduction to key concepts, algorithms, and theoretical results in machine learning. The treatment concentrates on probabilistic models for supervised and unsupervised learning problems. It introduces fundamental concepts and algorithms by building on first principles, while also exposing the reader to more advanced topics with extensive pointers to the literature, within a unified notation and mathematical framework. The material is organized according to clearly defined categories, such as discriminative and generative models, frequentist and Bayesian approaches, exact and approximate inference, as well as directed and undirected models. This monograph is meant as an entry point for researchers with a background in probability and linear algebra.

A Brief Introduction to Machine Learning for Engineers

0 notes

symbolicsystems · 5 years ago

Quote

All of the interesting systems (e.g. transportation, healthcare, power generation) are inherently and unavoidably hazardous by the own nature. The frequency of hazard exposure can sometimes be changed but the processes involved in the system are themselves intrinsically and irreducibly hazardous. It is the presence of these hazards that drives the creation of defenses against hazard that characterize these systems.

How Complex Systems Fail

0 notes

symbolicsystems · 5 years ago

Quote

To what extent should one trust a statement that a program is free of Trojan horses? Perhaps it is more important to trust the people who wrote the software.

Reflections on Trusting Trust

0 notes

symbolicsystems · 5 years ago

Quote

We present a theoretical framework for the compression of automata, which are widely used in speech processing and other natural language processing tasks. The framework extends to graph compression. Similar to stationary ergodic processes, we formulate a probabilistic process of graph and automata generation that captures real world phenomena and provide a universal compression scheme LZA for this probabilistic model. Further, we show that LZA significantly outperforms other compression techniques such as gzip and the UNIX compress command for several synthetic and real data sets.

Automata and Graph Compression

0 notes

symbolicsystems · 5 years ago

Quote

The idea that a genetically fixed behavior evolved from the once differential learning ability of individuals that performed the behavior is known as the Baldwin effect. A highly influential paper [Hinton G.E., Nowlan S.J., 1987. How learning can guide evolution. Complex Syst. 1, 495-502] claimed that this effect can be observed in silico, but here we argue that what was actually shown is that the learning ability is easily selected for. Then we demonstrate the Baldwin effect to happen in the in silico scenario by estimating the probability and waiting times for the learned behavior to become innate. Depending on parameter values, we find that learning can increase the chance of fixation of the learned behavior by several orders of magnitude compared with the non-learning situation.

The Revival of the Baldwin Effect

0 notes