blog-anish - Tumblr blog

blog-anish · 7 years ago

Text

Hillel's Brainfuck Interpreter, explained

It took me some fiddling to get Hillel Wayne’s original vim-macros-only Brainfuck interpreter running, so I figured I’d share an asciinema, in the spirit of reproducible shitposting!

Any mistakes are mine; The interpreter itself is based on Hillel’s.

Why was it so hard?

The tricky part was the [ operator: I figured his 'clF0`b needed to do something other than move. Looking at the brainfuck spec, [ behaves like jz end_of_loop. `bh%mbmb clearly moves the mark `b (that is, the program counter) to the matching ], so 'clF0 has to crash the macro if the mark 'c (that is, the position of the head on the tape) is on a 0. That’s super clever; I wouldn’t have thought to use the fact that vim stops running your macro if you make an impossible move.

Alright, so how do we do that? Well, l just moves us one left, and F0 tries to go back to the previous 0 on this line. Okay, so it normally fails immediately, since there’s nowhere left to go after the numbers on each line. The first thing I tried to do is add a space after each number, but the C-a/C-x increment/decrement operator deletes the trailing whitespace. The solution is to use one of my favorite vim options virtualedit (invaluable for asciiart! see the modeline at the top). Now, we can go left all we like, but F0 still fails if there’s no 0 to our left!

Huh? Sorry, I don’t speak linenoise…

Let’s break it down:

What’s Brainfuck?

Brainfuck is a small, Turing-complete, notoriously impenetrable, language. Here’s the semantics denoted into C, according to wikipedia:

Brainfuck command C equivalent ------------------- --------------------------------------------------------- (Program Start) char array[INFINITELY_LARGE_SIZE] = {0}; char *ptr=array; > ++ptr; < --ptr; + ++*ptr; - --*ptr; [ while (*ptr) { ] )

That is, we have a pointer into an infinitely long tape, and we can move back and forth along the tape with < and >, increment and decrement the memory cell on the tape that we’re currently looking at with + and -, and finally, we can loop until our pointer ends on a 0, using [ and ].

How?

You’ll notice that the commands from the above table mirror the block of text at mark `a on lines 8-14. That’s no coincidence: this is where we keep a lookup table of commands to execute. Mark `b is our program counter, and mark `c is the location of our pointer in memory/the tape. So at every step, we want to read the symbol at the program counter, look up the meaning of that symbol in our table of meanings, and then run that meaning/command on our tape. We’ll come back to that macro at the end, but let’s start with the easier ones and move up:

+ just needs to increment the tape cell by 1, and since we’re already on it, we just use vim’s built in command to increment the next number in the line, ^A

- similarly just decrements the number on the current line using ^X, the opposite of ^A

> moves to the next tape cell. Since we have one cell per line with the current position marked by `c, we just move down a line (j) and update `c (mc)

<, similarly moves to the previous cell, so we just move up a line (k) and update `c (mc)

] is finally a little more complex. It ends a looping construct, so it needs to change the program counter. It goes to the location of the program counter (`b), jumps to the matching [ delimiter (%), and then decrements the program counter (hmbmb), so that the next instruction is the start of the loop. Note that we need to run mb twice to change the mark b to a different character on the same line. The first mb deletes the mark instead of moving it, since this same line was the old location of the mark.

[ is the companion to ] that’s a little more complex and trick since it needs to actually check a conditional. See the first section of this post for an explanation of how it works.

Our macro on line 3 does exactly that every time we call it, so each call to that macro is one step on the machine. This works by moving to mark `b (`b), yanking the character into register "a ("ayl), jumping to mark `a (`a), starting a search (/), and then pasting in register "a (^R^Ra^M)^[note that ^R means “hold down control and hit R”], going to the next word (W), yanking into register "a the rest of the line ("ay$) which contains the macro to run, then moves to the tape 'c, runs the relevant command @a, and then moves mark `b one character to the right (`blmbmb). Remember, if you couldn’t follow any of that, vim’s :h is very good!

0 notes

blog-anish · 7 years ago

Text

Popularity of Haskell Extensions

Recently on lobste.rs, there was some discussion of "small" language features that add stuff to learn for beginners, but can be pretty ergonomic for more experienced users. (well, it started as a discussion of PEP 572, but let's not revisit that flamewar here).

I mentioned that the Haskell community's / GHC's solution to this is to throw all of these behind {-# LANGUAGE #-} pragmas (so that PEP might be {-# LANGUAGE Colonoscopy #-}, but probably something less silly), and /u/minimax mentioned that that didn't really help, since it was so common to run into Haskell projects that use a large number of extensions, you basically have to learn them all. This is definitely something that I struggled with when I was learning to contribute to LiquidHaskell. Let us start, as, he did, by just asking ghc how many extensions there are:

!ghc --supported-extensions | wc -l

235

Obviously this estimate dramatically overstates how bad the problem is. For starters, a number of those are (unused/undocumented/experimental) and almost half are flags that turn off the corresponding language extensions. Let's get rid of the flags that turn off extensions:

extensions = !ghc --supported-extensions | grep -v '^No' | grep -v '^Haskell' len(extensions)

113

So, actually, we didn't have to go through this excercise: the GHC documentation lists 100 GHC extensions. I wonder what the 13 others are?

I've pasted the relevant table from firefox into /tmp/ghc_ext_docs

docs = !awk '{print $1}' /tmp/ghc_ext_docs !wc -l /tmp/ghc_ext_docs set(extensions) - set(docs)

100 /tmp/ghc_ext_docs {'AlternativeLayoutRule', 'AlternativeLayoutRuleTransitional', 'AutoDeriveTypeable', 'DoAndIfThenElse', 'DoRec', 'GHCForeignImportPrim', 'GeneralizedNewtypeDeriving', 'JavaScriptFFI', 'MonoPatBinds', 'ParallelArrays', 'PatternSignatures', 'PolymorphicComponents', 'RecordPuns', 'RelaxedLayout', 'RelaxedPolyRec', 'UnliftedFFITypes'}

Okay, so some of this is just spelling (GND can be spelled either Generalized or Generalised; GHC accepts both, and RecordPuns in listed in the docs as NamedFieldPuns), but there really are a few undocumented extensions floating around in here.

But that still leaves us with over 100 extensions! So, how commonly used are all of these? Well, there's an easy-ish way to answer that! GitHub search has an API, so we can actually search for occurrences of each pragma.

Scraping GitHub

The github search API is actually very simple. I'm GETing a url with a search query that includes language:haskell to search haskell files, and I'm searching for LANGUAGE <PRAGMA> to find uses of a pragma.

token = !cat ~/.tokens/github url = lambda x : "https://api.github.com/search/code?q=LANGUAGE%20"+x+"+language:haskell&per_page=1&access_token=" + token[0]

How many haskell files on GitHub have LANGUAGE pragmas at all?

import requests total = requests.get(url("")).json()["total_count"] total

512696

# Restore cached data instead of hitting the server again. with open("/tmp/haskell_extensions.txt", "r") as myfile: data=eval(myfile.read()) uses_list = [x[1] for x in data]

This took a bit of getting right. It turns out that if you keep hitting the GitHub API, even with a sleep timeout over the documented throttling, you'll hit an anti-bot measure, and the server will tell you how long to wait to retry in the HTTP headers.

import time def uses(n : int, ext : str): # Sleep so that we don't hit the server too hard time.sleep(n) print(ext) response = requests.get(url(ext)) try: ans = response.json()["total_count"] except KeyError: print(response.headers) # Implement additional throlltling if asked return uses(int(response.headers["Retry-After"]), ext) print(ans) return ans

Alright, let's scrape some data! We're going to search for all the hits for LANGUAGE PRAGMA in haskell files for each pragma:

# Sleep at least two seconds between queries uses_list = [uses(2,ext) for ext in extensions]

# Save search results for later !echo "{list(zip(extensions,uses_list))}" > /tmp/haskell_extensions.txt

I've gist'd the results, so that you can play with the data yourself without having to query the server (or wait for the queries to run...)

!gist /tmp/hask*

https://gist.github.com/0e85d4bd392ec5a9f40229718bc12cfc

Making Some Plots!

# https://pythonspot.com/matplotlib-bar-chart/ import matplotlib.pyplot as plt; plt.rcdefaults() import numpy as np import matplotlib.pyplot as plt

I like to show plots that occur "naturally" before I play with them at all. It feels more honest somehow, even though obviously the fact that I haven't added my own biases doesn't mean that the visualization is representative of anything at all.

y_pos = np.arange(len(extensions)) plt.bar(y_pos, uses_list) plt.xticks(y_pos, extensions) plt.ylabel('Frequency (github hits)') plt.title('Haskell Language Pragma Popularity') plt.show()

Let's sort them, normalize them by the number of total hits for LANGUAGE, and plot that. I'm then going to filter out just the extensions that appear in more than 5% of haskell files that have extensions, but I'm going to leave the plot with all of them in if you feel like zooming and panning a lot.

freqs, exts = list(zip(*list(reversed(sorted(zip(uses_list, extensions)))))) normalization_factor = total freqs = list(map(lambda x: x/normalization_factor, freqs))

plt.rcParams['figure.figsize'] = [50, 5] y_pos = np.arange(len(exts)) plt.bar(y_pos, freqs) plt.xticks(y_pos, exts, rotation=45, ha="right") plt.ylabel('Frequency (github hits)') plt.title('Haskell Language Pragma Popularity') plt.show()

freqs2, exts2 = list(zip(*filter(lambda x:x[0]>.05, zip(freqs,exts))))

plt.rcParams['figure.figsize'] = [10, 5] y_pos = np.arange(len(exts2)) plt.bar(y_pos, freqs2) plt.xticks(y_pos, exts2, rotation=40, ha="right") plt.ylabel('Frequency (github hits)') plt.title('Haskell Language Pragma Popularity') plt.show()

So, you can only 10 extensions show up in more than 10% of the Haskell files on GitHub, and 20 show up in more than 5%! Moreover, not all of these extensions add to the language. For example, Safe just removes features, like the unsafe... functions from the standard library, and Strict just changes the default evaluation order. Admittedly, some of these (like RankNTypes) can genuinely get pretty compilicated, but the most common extension, OverloadedStrings can be explained in a sentence: string literals have type IsString a => a instead of String, and CPP just enables the C preprocessor.

Still, this is potentially a lot of extensions to learn if you want to drop into a large, mature, Haskell project that uses a lot of GHC features.

0 notes