Bernard Chazelle on algorithms (2006)
Hold on! To make sense of the world, we have math. Who needs algorithms? It is beyond dispute that the dizzying success of 20th century science is, to a large degree, the triumph of mathematics. A page's worth of math formulas is enough to explain most of the physical phenomena around us: why things fly, fall, float, gravitate, radiate, blow up, etc. As Albert Einstein said, “The most incomprehensible thing about the universe is that it is comprehensible.” Granted, Einstein's assurance that something is comprehensible might not necessarily reassure everyone, but all would agree that the universe speaks in one tongue and one tongue only: mathematics.
But does it, really? This consensus is being challenged today. As young minds turn to the sciences of the new century with stars in their eyes, they're finding old math wanting. Biologists have by now a pretty good idea of what a cell looks like, but they've had trouble figuring out the magical equations that will explain what it does. How the brain works is a mystery (or sometimes, as in the case of our 43rd president, an overstatement) whose long, dark veil mathematics has failed to lift.
Economists are a refreshingly humble lot—quite a surprise really, considering how little they have to be humble about. Their unfailing predictions are rooted in the holy verities of higher math. True to form, they'll sheepishly admit that this sacred bond comes with the requisite assumption that economic agents, also known as humans, are benighted, robotic dodos—something which unfortunately is not always true, even among economists.
A consensus is emerging that, this time around, throwing more differential equations at the problems won't cut it. Mathematics shines in domains replete with symmetry, regularity, periodicity—things often missing in the life and social sciences. Contrast a crystal structure (grist for algebra's mill) with the World Wide Web (cannon fodder for algorithms). No math formula will ever model whole biological organisms, economies, ecologies, or large, live networks. Will the Algorithm come to the rescue? 
https://www.cs.princeton.edu/~chazelle/pubs/algorithm.html
A not-so-bitter lesson
The Bitter Lesson is an essay that makes the claim that, in AI, computational brute force tends to sweep all before it. As hardware gets faster, old attempts to model cognition are in danger of being made irrelevant, no matter how elegant or promising they once seemed.
A more general principle is the idea that computer science is not governed by ideas, but by what can be done on the available hardware. Computer scientists would like to think that ideas are what matter, rather than whatever hardware is currently on the shelves. In the field of Programming Language Theory, in particular, there is a focus on “ideas of lasting value”, as opposed to incidental technical details which will inevitably quickly become obsolete. It’s valid to aim to teach undergraduates only material of lasting value—but there’s also an element of prestige attached to the supposedly eternal truths the academics are working on. When Robert Harper refers to the fact that the mathematical idea of a variable (as used in functional programming languages) has been around since “antiquity”, it’s clear that he intends this to be evidence for its canonical importance. For some, the Greek letters and apparent impenetrability of PLT seem to be evidence of elitism. There’s also, perhaps, an element of insecurity. The Curry-Howard correspondence is most often used by computer scientists to borrow ideas from the more august, tweedy and established discipline of logic (rather than allowing logic to borrow from computing). It’s an isomorphism, but in practice it has a direction: from logic to functional programming.
Most undergraduates realise that coming up with elegant concepts is just building castles in the sky until their effectiveness is mathematically proven or measured in practice. Every student learns this not-so-bitter lesson: that you can’t just invent grand and clever-sounding computer science concepts in a vacuum. They learn that intricate algorithms, with good asymptotic performance, are often bested by brute force approaches, due to the complexity of real machines and compilers. Knowing this, they don’t get attached to their ingenious ideas when they turn out not to be an improvement on more obvious or established methods.
I’m an observer of the Haskell (and Agda, Coq, etc.) scene, and it seems to me that it suffers from this problem of overvaluing elegant ideas. The Haskell community will tell you that they are pursuing truly important, timeless principles of programming, rooted in category theory. Well, they won’t tell you that, but it’s implicit that they are pursuing “ideas of lasting value” rather than working pragmatically (and straightforwardly) with the latest hardware. There are plenty of valuable concepts in the Haskell canon, but there are also ways in which Haskell is disconnected from the kind of software that 99% of the world needs to write. In my opinion, the lack of a repertoire of Haskell patterns for solving typical problems is evidence of this. With Haskell, the correct approach seems to be that the programmer should be a mathematician, should think very hard and come up with an exotic abstraction that is applicable to the problem, and should then write their program from scratch, guided by pure intuition (and, I suppose, using a few Haskell libraries). Haskell is the opposite of a glue language: Haskell is for solving problems with your intellect rather than solving them by downloading and combining packages which other people have put together. This orientation towards difficulty and abstraction is prestigious, but it’s not always appropriate—perhaps not even often appropriate. Looking for a quick, dumb solution (“worse is better”) is a much lower-maintenance strategy.
The not-so-bitter lesson is that it is the running code that matters, not the (possibly gratifying) thought and problem-solving that led to it. The code is the proof, but, unlike proofs in mathematics, it may be invalidated by changes in technology. Mathematical proofs have a kind of monotonicity: a proof, once established, can never be rolled back by being disproved—only by being forgotten. These demonstrations, these temporary, contingent proofs of the effectiveness of the programmer’s thinking seem to me to be what matters. Computer science would be very different without hardware.
Changing names in "Track Changes": Microsoft Word vs Apple Pages
I've come across an interesting difference in the way Track Changes is implemented in Microsoft Word and Apple Pages.
The difference relates to the semantics of changing the name of the person making the edits. In Microsoft Word, such a change is not retroactive: existing changes to the document will retain the old name. For users who started editing without considering what name they wanted to be attached to the edits, this can be inconvenient (a particularly common pattern is that they start editing under their own name and later realise that they want to anonymize their edits).
Apple Pages has a much simpler model. In Pages, changing the name of the editor from "Name1" to "Name2" will effectively search for and replace all instances of "Name1" with "Name2" in Track Changes metadata. This means that it's easy to retroactively change the name on all of the tracked edits to the document. It does also mean that it's possible to make converging changes to editor names, which means that the distinction between the changes made by two (or more) editors is lost. So the Pages approach is simpler and more powerful, but doesn't try to enforce an immutable trail of changes in the way Word does. Of course, a reasonably skilled Word user can fake a document with any Track Changes metadata they want, so there's nothing secure about Track Changes in any cryptographic sense—there's no audit trail.
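To make the difference concrete, here is a minimal sketch of the two renaming semantics. The data model below is invented for illustration (it is not Word's or Pages' actual file format), but it captures the retroactive versus non-retroactive distinction.

```python
# A toy model of tracked-changes metadata. This is an invented structure for
# illustration, not Word's or Pages' actual file format.
changes = [
    {"author": "Name1", "edit": "deleted a paragraph"},
    {"author": "Name1", "edit": "inserted a sentence"},
]

# Word-style semantics: changing the current author name is not retroactive.
# New changes get the new name; existing entries keep the old one.
current_author = "Name2"
changes.append({"author": current_author, "edit": "fixed a typo"})

# Pages-style semantics: renaming amounts to a search-and-replace over the
# existing metadata, so the old name disappears from the whole history.
def rename_editor(tracked_changes, old, new):
    for change in tracked_changes:
        if change["author"] == old:
            change["author"] = new

rename_editor(changes, "Name1", "Name2")
# All three changes now carry "Name2": convenient for anonymizing your edits,
# but the distinction between the two editors has been lost.
```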
In the archive
Years ago, I did a couple of weeks’ work experience in an archive. By that stage I knew how to program. I discovered that the archival storage system involved two sets of numbers. I don’t remember what they were called, but let’s say they were reference codes and location codes. A reference code typically referred to a particular archive box, and the location of that box was stored in a location index. Location codes are really only useful inside the archive, may change over time, and can be considered private, whereas reference codes are exposed to the public, are immutable, and can thus appear in print.
It seemed that there would be an obvious analogy to computer storage. I was surprised that one didn’t immediately become clear to me. Eventually I realised that the references in the system are bidirectional, so it’s quite different from a typical key-value store.
Location codes can be understood as variable names (identifiers), and the reference code of the box stored at a particular location can be understood as a value. It’s also possible, however, to view reference codes as the names of pointer variables which contain the current location of the relevant box. These two interpretations constitute the bidirectional system of references in the archive.
Because reference codes are primary, the pointer interpretation is more appropriate. It’s rare in programming for a pointed-to object to contain a reference back to the variable that designates it—for one thing, aliasing means that there may be more than one pointer variable pointing to any given object. In the archive, there’s just one reference in the index to each archive box, and by sitting on a particular shelf, it effectively designates a single location code. This is all trivial enough. The idea of a closed loop of references brings to mind the sparse-dense trick.
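As a rough sketch of that closed loop (the class name, reference codes and location codes below are all invented, and no real archive catalogue software is being described):

```python
# A rough sketch of the archive's closed loop of references, in the spirit of
# the sparse-dense trick. The class name and the codes are invented.
class ShelfIndex:
    def __init__(self):
        self.box_at = {}       # location code -> reference code (the shelf designates a box)
        self.location_of = {}  # reference code -> location code (the reference acts as a pointer)

    def place(self, reference_code, location_code):
        self.box_at[location_code] = reference_code
        self.location_of[reference_code] = location_code

    def move(self, reference_code, new_location):
        old_location = self.location_of[reference_code]
        del self.box_at[old_location]
        self.place(reference_code, new_location)

index = ShelfIndex()
index.place("IE/ABC/1923/4", "aisle 3, bay 2, shelf 5")  # the location may change over time...
index.move("IE/ABC/1923/4", "aisle 7, bay 1, shelf 2")   # ...but the reference code never does
# The two mappings always close the loop:
assert index.box_at[index.location_of["IE/ABC/1923/4"]] == "IE/ABC/1923/4"
```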
I’m interested in the process of declaration or allocation of shelf spaces in the archive. I imagine an archivist taking a fresh archive box (possibly folding it up from a flat-pack piece of cardboard), writing a label for it, filling it, finding a shelf space for it, and then entering the location code in the location index. 
To a computer scientist, it might seem to make sense to allocate space first, before doing anything involving a particular, concrete box. In that case, allocation is the first step, and there’s a period after allocation during which a location code is reserved but not associated with any particular reference code. In programming languages, it’s common for variable declaration and storage allocation to happen simultaneously; initialisation (the filling of a box and placing it on the shelf?) can happen later. Thus, the reference code is assigned at the time when the storage is allocated, but only later actually attached to a physical box. For one thing, this prevents the archivist from filling more boxes than they have storage space for.
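A toy version of the “allocate first, fill later” ordering might look like this (again, the codes are invented):

```python
# "Allocate first, fill later", in miniature. Locations and reference codes are invented.
free_locations = ["aisle 1, shelf 3", "aisle 1, shelf 2", "aisle 1, shelf 1"]
location_of = {}

def allocate(reference_code):
    """Reserve a shelf space for a reference code before any physical box exists."""
    location = free_locations.pop()         # allocation: a location leaves the free list
    location_of[reference_code] = location  # the reference code is bound immediately...
    return location

allocate("IE/ABC/1924/7")
# ...but initialisation (folding, labelling and filling the box, and putting it
# on the shelf) happens later. When free_locations is empty, pop() raises an
# error, so the archivist can never fill more boxes than there is space for.
```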
In an archive, the pointer interpretation implies that, officially, each reference code designates a location code. I find this slightly counterintuitive. Perhaps it’s clearer to think in terms of de Saussure’s analogy that signifier and signified are as inseparable as two sides of a sheet of paper. In that case, neither is really primary: it’s an unbreakable bidirectional connection.
I also like the parallel between creating a new archive box and creating a new wiki page. Archive boxes don’t change over time, whereas wiki pages can be in constant flux, so the situation is a little bit different. This quote from Ward Cunningham about a wiki as a party, at which new rooms appear when needed, is unexpected and charming:
A wiki is like a party that doesn't have to stop. It's a party that doesn't get crowded because new rooms appear when needed. It's a timeless party where you can try each conversation over and over until you get it right.
It’s worth pointing out that there are two stages in the creation of a new “room” (a new wiki page). First there’s the mention: the insertion, in some existing page, of a wikilink to a non-existent page. Secondly there’s the use, in which someone follows that link and, by editing, fills out the content of the new page. The existence of the link by itself is a kind of affordance, inviting any reader to create the new page. I’m tempted to invoke the linguistic term actuation here, because it sounds right: the first editor installs a new doorknob; the second editor grasps it and turns it, opening the door for the first time. This process is, of course, a lot more informal than the archive equivalent, perhaps the creation of a new fonds rather than a single new archive box.
Referential opacity and wikilinks
I like this section of a blog post about the ability to refer to a wiki page that doesn't exist—to mention it without having to create it:
If you’ve ever had a good wiki experience, you know what this feels like in practice. Groping towards an idea on one page you realize its relation to another page and quickly make a [[Bracketed Link]] or CamelCaseAssociation to pull that idea into your web. But most non-wiki environments frustrate this fluidity. They don’t want to know the name of the page — they want to know its location, which is like asking someone to give up using variables in their code and start addressing memory directly. It can be done, but it is going to kill your flow. What’s more, these frustrate one of the crucial features of wiki practice: they don’t let you link to pages that don’t exist yet. https://hapgood.us/2015/10/08/building-a-pseudo-wiki-on-tumblr/
The question arises of how to name the page you're declaring the potential existence of. A good name will hint at the topic if not define it completely. The worst name will be completely meaningless, when taken out of context.
Wikipedia has a couple of policies (1, 2) that deal with the related issue of piped links (links which display different text from the title of the page linked to). Piped links create flexibility, but also the risk of misuse. A piped link should not create an "easter egg" effect, or force the reader to engage in "mystery meat navigation". Both of the preceding phenomena involve the user not knowing what the link points to until they click it, or at least hover their mouse over it.
The Wikipedia style guideline called EASTEREGG is intended to discourage people from creating bad piped links, and is discussed in a section headed "Transparency". That word might imply that the concept of referential transparency is involved, but it seems that the situation is not quite so simple.
Referential transparency is a property of a term's context, not of a term itself. Colloquially, in a referentially transparent context, terms with the same denotation can be freely interchanged. In a wiki context, this would imply that the link text could be freely substituted by any suitable term with the same denotation. English language phrases aren't generally referentially transparent, though.
The example that comes to mind is the phrase "John believes Kansas City is in Kansas". (Since Kansas City is in fact (mostly) in Missouri, John is mistaken.) This sentence is not referentially transparent with respect to the term Kansas City. It's a referentially opaque context. Thus, we can't replace "Kansas City" with the words "the largest city in Missouri" without changing the meaning of the sentence. (Presumably, John doesn't believe that the largest city in Missouri is in Kansas.) If the words "Kansas City" were a piped wikilink of the form [[Kansas City, Missouri|Kansas City]], the link text "the largest city in Missouri" would in theory be acceptable, except for the referential opacity problem. So Wikipedia's injunction to use "transparent" link text doesn't actually have anything to do with referential transparency. It's a matter of intuitiveness and clarity.

Needless to say, anyone editing a wiki page should attempt to link only from referentially transparent linguistic contexts; where the name (e.g. "Kansas City") is being mentioned rather than used, the only appropriate link is to a page about the name itself rather than about its denotation. On Wikipedia, it seems that, when a suitable page exists (one which discusses a word itself rather than its meaning), it is often a disambiguation page or a Wiktionary page. One exception is the pages on this list of example sentences.
Perhaps the original issue of referring to a non-existent page for the first time ("declaring" it) can cast some light on piped linking. Ideally, a piped link's alternative text could serve as the title of the yet-to-be-created page, even if it's not the most obvious title for that page. In other words, the piped link text should clearly have the same denotation as the ideal title. This is a less restrictive constraint than that imposed by a referentially opaque context, and indeed it sounds similar to the freedom granted by referential transparency. The presence of the extra stylistic restriction that piped links should really only amount to reformattings of the linked article's title means that referential transparency is not implicated. I think!
Notes on Census Design
The next census to be conducted in Ireland will take place in 2027. For the first time, there will be an option to complete the census online. Having worked as an enumerator on the 2022 Census, and also having an interest in the design of public services, I have some thoughts on this subject.
Part I of this post discusses the technical background of census-taking, focusing on recent developments. Part II goes into the design aspects of delivering a multi-modal (paper and digital) census.
Part I: Traditional, register-based and combined censuses
In a census, whether householders are interviewed by an enumerator, fill in a paper census questionnaire, or complete a census questionnaire on a mobile phone or laptop, the process is still considered ‘traditional’. The householder is providing their information on demand. There is an alternative, known in Europe as a register-based approach, and in the US as an approach based on administrative records (or administrative data). This refers to the extraction of population data from records already maintained by the state. Often, the traditional and register-based approaches are used together, to create a so-called combined census. Many European countries now operate combined or purely register-based censuses.
In the US, the American Community Survey replaced the US census long form from the 2010 census onwards. It is a compulsory survey sent to 250,000 households every month. In some countries, such as Italy, where a similar approach has been adopted, it is referred to as a permanent census.
The register-based method of gathering population statistics uses pre-existing administrative databases, and takes the public out of the loop altogether. There’s no longer any form to be filled in, because the data is collected automatically. In the Netherlands, lack of public cooperation with the 1971 Census led to the cancellation of the 1981 Census. Consequently, the Dutch census has been register-based for 40 years.
Crucially, administrative data is anonymized at the point where it is ingested by the statistical systems. This guarantees that no individual is identifiable in the population data, just as responses to the traditional census are confidential.
An understanding of the difference between administrative data and statistical data can make a crucial contribution to the public’s willingness to cooperate with a traditional census. Levels of cooperation have historically been high in Ireland, but attitudes can change.
The following list of challenges faced by traditional census taking is presented by Skinner (2018):
1. increasing costs—Fienberg & Prewitt (2010) report, for example, on the 2010 census in the United States costing approximately US$13 billion, double what was spent on the 2000 census, which in turn doubled the 1990 cost;
2. intrusiveness, privacy concerns, and response burden, especially when the census is mandatory (e.g. Prewitt 2004);
3. lower public cooperation and participation;
4. difficulties in accessing secure apartments and enumerating unsafe areas;
5. more complex living arrangements, for example, individuals living in multiple locations (such as children of separated parents), the homeless, nomads, refugees, and other hard to reach populations; and
6. timeliness in relation to user needs, for example, needs for more frequent data on changing patterns of internal and international migration.
Most of these issues are present, some in an incipient form, in the Irish setting.
An online (digital) option has the potential to help with items 4 and 5, and possibly with items 1 and 3. In the Irish setting, items 2 and 3 are apparently yet to manifest seriously. The level of cooperation with the Irish census is high. Ireland is not impervious to international trends, however, and these issues do have the potential for developing into problems for the traditional census. Item 6 could be addressed by the permanent census model, and could also justify an administrative-records-based approach. 
On the administrative records question, the Irish CSO has stated: “The use of administrative data in population estimation is not unique to IPEADS [Irish Population Estimates from Administrative Data Sources] or even the CSO and has been driven by both the benefits of using administrative data in censuses and the greater difficulties encountered in the traditional census. The benefits include reduced cost, reduced burden on respondents, improved timeliness and greater frequency of results. The challenges which administrative data usage can address include difficulties in recruiting field staff as well as establishing contact with householders.” Irish Population Estimates from Administrative Data Sources, 2021
The administrative records approach essentially avoids all of the difficulties faced by the traditional census, with the drawback that it offers more limited data which is also less precise in some ways.
Kenneth Prewitt, speaking in 2008, anticipated the changing landscape of data collection:
Public cooperation is a very serious problem and we’ll talk in more detail about that. But there is a larger issue of which public cooperation is just one element. I believe that over the next quarter-century or so, the government will increasingly merge administrative data and survey data. What we today understand as the “national statistical system” will more properly be thought of the “national information system.” Sample-based survey data will be part of that system, but less dominant than it has been in the previous half-century or more. For example, the new SIPP [Survey of Income and Program Participation] that is under consideration might be based on 50% administrative records and 50% survey data. If so, that is an indicator of where the whole system is going. One reason for the turn to administrative data – and other data sources, such as commercially provided scanning – is the survey response rate problem: the unit cost of each respondent to a survey is high and getting higher. If, as some have suggested, we can control the response rate issue with incentive payments, there will be further cost increases – as well as data quality problems. Nonresponse, by the way, is not just how many respondents answer but also item nonresponse. There has been less attention to item nonresponse, but in the 2000 Census there was a sharp increase of item nonresponse, reaching into the 20% range on several questions. […] As I said, however, there is a larger, more complicated challenge to survey data. It will occupy a steadily decreasing role in the nation’s information system. Already a number of European countries, especially the Nordic countries, will tell you that less than 25% of the information used by the government comes from surveys. The administrative data are already collected by the government for program management purposes; why not use it in lieu of survey data to understand the economy and society? Even if we were not facing a response rate problem, the sheer density of administrative and surveillance data presents a challenge to our traditional reliance on survey data as the platform for the national statistical system. By the way, by “surveillance,” I do not have in mind the Patriot Act so much as the data we provide every time we use a credit card or book a flight. This is the digital footprint each of us leaves. The sheer amount of digitized data is enormous and we are at the early stages of its expansion and of the data mining methodologies used to extract information from it. We cannot be surprised if the government (now, for example, facing a full cycle 2010 decennial census, which includes the American Community Survey, that will exceed $12b) asks “cannot it be much cheaper to see what we can learn from all of this administrative and digital data than to try to find people and convince them to answer what they see as our intrusive survey questions? https://www.surveypractice.org/article/2927-an-interview-with-kenneth-prewitt
The CSO has discussed replacing the traditional census with something like the American Community Survey. The Irish Community Survey would be a rolling, permanent survey sent to some fraction of the population every month (or every few months, I'm not sure exactly what the frequency would be).
According to Prewitt, the possibility of making the ACS voluntary was examined, but found to result in “high nonresponse, increased costs, and data deterioration”. The CSO intends to make the analogous Irish Community Survey compulsory. In this important respect, the ACS model resembles the traditional census. Most of the design considerations for the traditional census apply to the ACS (and ICS) as well.
I will discuss these design considerations in Part II.
Part II: The census as a service
In 2027, the Central Statistics Office will conduct the next scheduled census. For the first time, the public will be given the option of filling in the census form online. Paper questionnaires will still be available, but the experience of having a human enumerator call to the door to deliver (and then later to collect) the census form is on its way to being a thing of the past.
A digital census can be positioned as a greener, more secure, more economical, more user-friendly and more “contemporary” option. But there is a risk of a digital census being tainted by association with the intensive surveillance practices implemented by tech companies. In Ireland, the paper-based census seems to have, to date, largely avoided such associations with surveillance and Big Brother.
The task of managing public perceptions of the census calls for serious study. As Prewitt (2005) writes: “New research initiatives might start with attitudinal studies, both experimental and with surveys, that more carefully document what the public believes is intrusive and how these beliefs intersect with what the public understands about the importance of statistical data. Because the concern is in part with declining levels of cooperation, better studies are needed than those currently available on the conditions that trigger changes in behaviour. Also needed is research on how government and legitimate data collection by the private sector, especially university-based, can be immunized against the generalized “leave me alone” response. More specifically, it is important for cognitive scientists and experts in questionnaire design to investigate how the wording of particular questions and their placement in the survey [sequencing] might reduce the respondent’s sense of being intruded upon”.
The first online censuses took place in the early 2000s. The US Census in 2000 very quietly introduced the option of responding online—effectively a soft launch of the digital census, without publicity.
Digital census infrastructure is exposed to online threats, such as denial-of-service attacks. In Australia, where there has been a digital option since the 2006 Census, the Census website suffered a major attack on Census night in 2016 and was temporarily taken offline.
The Australian Bureau of Statistics learned from the 2016 experience to take the delivery of a digital census more seriously in all respects: “For the 2021 Census, the ABS took a user-centred design approach to building a digital service, not just a website.” There is very extensive documentation of the delivery of the 2021 Census on the ABS website. It was a resounding success. Arguably, the ABS overcorrected and overprepared after the 2016 fiasco, but in any case the lesson here is that a national statistical organization, such as the ABS or CSO, needs to be on the ball when implementing a digital census option. Ensuring not only the cybersecurity, but also the usability, of the 2027 Census in Ireland is a new challenge for the CSO.
One concern with the new multi-modal aspect of Census 2027 is the need to make sure that people can complete the process, and complete it with ease, whatever combination of events or choices occur. The process should be smooth and it should save them time.
So, for example, a household should be able to request a paper form, start filling it in, accidentally spill pasta sauce on it and then decide to complete the census online instead, without having to jump through any hoops. 
It should be possible to start completing the form on a mobile phone, stop half-way through, and then complete the process on a laptop. 
If an access code is needed to complete the census online, it should be easy and simple to request a new code if the first one is lost. In the 2021 UK Census, hundreds of thousands of people availed of such an option:
Paper questionnaire and access code requests Households could request a paper questionnaire or new access code from the contact centre or website. Field staff were also able to give out or order these for households. Overall, in paper-first areas, 1.6% of households directly requested new paper questionnaires and 6.6% of households requested new access codes. In online-first areas, 7.5% of households requested paper questionnaires and 10% requested new access codes. In total, paper questionnaires were provided to 1.74 million households (excluding paper questionnaires delivered as initial contact or as a reminder). Field staff provided 239,000 households with paper questionnaires and ordered new access codes for 329,000 households. https://www.ons.gov.uk/peoplepopulationandcommunity/householdcharacteristics/homeinternetandsocialmediausage/articles/designingadigitalfirstcensus/2021-10-04#paper-questionnaire-and-access-code-requests
Similarly, the 2021 Census in Australia saw very extensive use of the ‘no Census number’ pathway:
A ‘no Census number’ pathway This enabled respondents to complete their online Census if they had not received a login code or they had lost their Census letter. They could request a code after providing their address and mobile phone number. This self-service pathway received a very high uptake, with 1.75 million requests for a code, and reduced the need for respondents to interact with the Contact Centre or a field staff member. The ‘no Census number’ pathway proved invaluable and popular, accounting for 20% of all online responses.
The designers foresaw that many people would lose their login code, and provided a quick and routine remedy. As a result the members of the public who found themselves in this position weren’t shamed, or made to feel stupid, and were not inconvenienced.
All the possibilities need to be anticipated. It’s no good if a significant fraction of households get stuck and have to phone the helpline in order to work out how to proceed. The necessary design process can profitably be considered as a service design task.
The success of the service design of the digital 2021 Census in Australia is indicated by the fact that their Census Contact Centre received relatively few calls:
The Census Contact Centre Run by Services Australia call centre team on behalf of the ABS, this was open seven days a week, 8am – 8pm, and focused on resolving queries on the first call. A great deal of work was done on scripting so the information given to respondents was clear and anticipated any further questions they might have. On-hold messages were developed so that respondents could hear relevant information such as how they could self-serve online while waiting for their call to be answered. The Contact Centre answered 645,833 calls between 5 July and 1 October, significantly fewer than expected. The Contact Centre did not need to implement any congestion management strategies during the entire period. The 2021 Census night was the first Australian Census that did not see the Contact Centre overwhelmed by high volumes of callers.
This is the service design element that comes into the picture with a multi-modal census: it becomes necessary to thoroughly design all of the possible pathways people may follow through the system. Previously, a human enumerator could assist the householder, and literally provide a personal service. The new system is inevitably both less personal and more complex. It will take significant design work to make it simple for the public to use.
There also needs to be an acknowledgement that sometimes human intervention will be necessary. The Australian 2021 Census again:
An escalations team This was established to respond more quickly to difficult queries or complaints. Fifty-three staff managed and responded to 40,000 escalations and 6,000 contact us forms. They also managed correspondence that came in via: ● the ABS website ● the ABS privacy team ● the Census Data Capture Centre ● other channels. This team made 30,000 calls to respondents, including 650 assistance calls where they completed the respondents’ Census form with them over the phone. Part of the escalation team looked after the Parliamentary Hotline, which responded to more than 400 phone and email enquiries made by Members of Parliament to support their constituents.
There is a Customer Relationship Management aspect to this, which may require more robust systems than previously existed. In the Irish 2022 Census, the smartphone app used by every enumerator had only very basic functionality for storing contact phone numbers, and it was more difficult for enumerators to keep track of interactions with households than it ought to have been.
In the case of the (compulsory) American Community Survey, since 2012 a paper questionnaire has automatically been sent by mail to anyone who does not respond online. This “digital first, but defaulting to paper” approach places the minimum extra burden on people who do not wish to complete the survey online. All they have to do is wait for the paper questionnaire to arrive. The implementation of this system involves sending out letters inviting householders to respond online, giving them some time to complete the online survey, and then mailing out a questionnaire to households that haven’t responded online by a particular date. In the case of the ACS, the paper questionnaire arrives about two weeks after the initial invitation to respond online.
The system in place for the ACS seems to be slightly more complex and less user-friendly than it might be:
Should I respond online or by mail? ● After you reply online, you may receive a questionnaire in the mail. You do not need to mail it back if you have completed the online survey. ● If you prefer not to respond online, please fill out the questionnaire you will receive in the mail and send it back in the postage-paid envelope. ● If you have lost or forgotten your PIN, please fill out the questionnaire you will receive in the mail and send it back in the postage-paid envelope. https://www.census.gov/programs-surveys/acs/respond/respond-online.html
(There is a race condition in this arrangement, which needs to be explained to the public: a paper questionnaire may already be in the mail by the time they respond online.)
The ACS process involves a User ID as well as a PIN, and if you lose either, you need to call a helpline. This, again, is not as simple as it should be:
Can I reset my PIN? ● If you provided an answer to one of the security questions during your initial login, you can reset your PIN if lost or forgotten. Enter your User ID at https://respond.census.gov/acs and select the "Click here if you do not know your PIN" link. ● If you have lost or misplaced your User ID, please call 1-800-354-7271 for assistance. https://www.census.gov/programs-surveys/acs/respond/faqs.html
One change to census practice that facilitates easy and convenient response is the introduction of a response window in place of the emphasis on filling out the questionnaire on Census night. This is a concept from survey practice, and refers to the period within which the survey can be completed. The most significant change, for censuses, is that the response window starts before Census night. The householder can complete their questionnaire early, by going online at any time after receiving their login code. In the Australian 2021 Census, this flexible facility for early responses was publicised and led to higher rates of early response than in previous censuses. This contributed to increased levels of response overall.
(NB In the case of a de facto census, such as those conducted in Ireland or Australia, the respondent must still answer the survey questions with respect to the people who will be present in the household on Census night, rather than at the time of completing the survey. This is potentially confusing!)
The 2021 Census was the first time that the ABS actively encouraged households to complete and submit their Census form as soon as they received their materials, meaning this could be done before Census Night on 10 August. The letters also emphasised 12 August as a date after which the household may receive contact from the ABS. This messaging was designed to provide a ‘response window’, where previous censuses highlighted Census Night only. A range of operational information, including completed online and paper Census forms, call centre agents, the Census Digital Service and field staff observations, provided the ability to monitor progress in real-time at small area levels, and was used to highlight issues such as areas of low response, so that strategies could be enacted quickly to respond. If a dwelling had not returned a Census form by Census Night, reminder and non-response procedures were carried out by Census field staff, which included house visits and reminder letters, to ensure everyone in Australia on Census Night was counted. https://www.abs.gov.au/census/guide-census-data/census-methodology/2021/how-data-collected
The details of timing for response windows, due dates etc. are somewhat complex, when compared to the straightforward old system of delivering forms in advance of Census night and collecting them afterwards. The schedule needs to be carefully thought through. In the Australian case, research was carried out to determine how best to communicate the response window concept to the public.
As well as the service design requirements, there are the specific requirements that go into the design of a digital survey. Digital questionnaires differ from paper forms in some familiar ways. Although I'm not an expert on surveys, I’m going to dip into those aspects here, since the questionnaire is one of the most visible parts of a digital census.
The questions that are presented to the user may depend on the answers to previous questions (this is called branching or question routing). This can make the process of filling out a digital form shorter and simpler than that of filling out a paper form. The software can verify when the form has been completed, so it’s less likely that a required section will be skipped unintentionally. Such features can significantly reduce the burden of completing the census.
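As a rough illustration, a branching questionnaire is essentially a small routing graph. The questions and field names in this sketch are invented, not taken from any real census form:

```python
# A sketch of question routing: which question appears next depends on earlier
# answers. All question texts and field names here are invented.
QUESTIONS = {
    "employment_status": {
        "text": "What is your present employment status?",
        "next": lambda answer: "occupation" if answer == "working" else "looking_for_work",
    },
    "occupation": {
        "text": "What is your occupation in your main job?",
        "next": lambda answer: None,  # end of this branch
    },
    "looking_for_work": {
        "text": "Are you looking for full-time or part-time work?",
        "next": lambda answer: None,
    },
}

def route(answers, start="employment_status"):
    """Walk the routing graph using pre-supplied answers; return the questions asked."""
    asked, current = [], start
    while current is not None:
        asked.append(QUESTIONS[current]["text"])
        current = QUESTIONS[current]["next"](answers[current])
    return asked

# A respondent who is working is never shown the job-seeking question:
print(route({"employment_status": "working", "occupation": "statistician"}))
```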
A digital questionnaire can be designed to be more conversational and engaging than a paper questionnaire, and even to incorporate elements of gamification. When applied appropriately, these strategies could presumably improve response rates. In particular, more sensitive questions can be contextualised with details of how the information is used. This may help to reassure respondents who are uncomfortable answering the more sensitive questions. Subtly signalling how much of the questionnaire has been completed can be a suitable way of providing encouragement. On the other hand, a census is a serious business, so too much informality and fun would be out of place. The "time capsule" feature of the Irish 2022 Census was comparable to gamification in a sense, as it was an invitation to more personal, light-hearted and open-ended participation in the census process. On the statistical side, the ABS used gamification, in the form of an app called Run That Town, to stimulate interest in the results of the 2011 Census.
As well as their sophisticated strengths, digital forms have weaknesses and can be more trouble than paper forms. Broadly speaking, this is the area of software design problems, where the rigidity of a coded questionnaire hinders successful completion.
For example, problems in the verification of the input data may prevent the user from successfully completing the questionnaire. This can be due to bugs (which are rarely a problem with paper forms) but digital surveys can also inherently lack flexibility in comparison to paper questionnaires.
With a paper questionnaire, a user can get away with skipping one or more questions, if, for example, they deem the questions too intrusive. (This is referred to as item nonresponse.) With a digital survey, there's the possibility that, in that case, the software will prevent the survey from being submitted because some responses are missing. One possible outcome of this is that the user will just give up, turning item nonresponse into nonresponse on the survey as a whole (what is termed unit nonresponse). This is clearly an effect to be avoided, though I don't know exactly how best to avoid it.
The traditional census is, in essence, a survey, and implementing an online census option should entail following the best practices in that field. What makes the census more delicate is the fact that responding is mandatory, so great attention needs to be paid to accessibility and to minimizing the burden on respondents.
Significantly, any steps toward eliminating human enumerators from the process will have the effect of removing the friendly in-person mediation they have provided between the public and the census-taking organization. Whatever system replaces them must perform this role.
I haven’t mentioned visual design (specifically, typography and UX) at all here, not because it is not important, but because it’s not as difficult to get right.
Clear and consistent typography is an essential part of the accessibility of a service, and even government agencies need to make the effort to project a credible, trustworthy and professional image. But the CSO probably does not need to invest in developing a high-end “design system”, although that would be in line with international best practice.
It shouldn’t be necessary, any longer, to wheel out the old Steve Jobs quote: “Design is not just what it looks like and feels like. Design is how it works.” A public service, such as the census, needs to be thoroughly designed. The apparently more cosmetic aspects of design ("what it looks like") shouldn’t be denigrated as inessential, but they must be supported by a sound architecture.
References
Irish Government Action Plan for Designing Better Public Services (December 2023)
https://www.gov.ie/en/publication/1e3e2-action/
On the history of censuses in the Netherlands
Statistics Netherlands: "125 years of population censuses: 1971" https://www.cbs.nl/en-gb/visualisations/timeline-125-years-of-population-censuses/1971
Statistics Netherlands: "Dutch census has been digital for 40 years" https://www.cbs.nl/en-gb/corporate/2023/27/dutch-census-has-been-digital-for-40-years
Chris Skinner’s 2018 article on challenges faced by the traditional census, and links therein 
Skinner, Chris, "Issues and Challenges in Census Taking", Annual Review of Statistics and Its Application 2018 5:1, 49-63 https://www.annualreviews.org/doi/abs/10.1146/annurev-statistics-041715-033713
Fienberg, Stephen E.  & Kenneth Prewitt, "Save your census", Nature volume 466, page 1043, (2010) https://www.nature.com/articles/4661043a
Prewitt, Kenneth, "What If We Give a Census and No One Comes?", Science, Vol 304, Issue 5676, pp. 1452-1453, 4 Jun 2004 https://www.science.org/doi/abs/10.1126/science.1097727
CSO page about Irish Population Estimates from Administrative Data Sources
Irish Population Estimates from Administrative Data Sources, 2021, CSO Frontier Series Research Paper, 11 July 2023 https://www.cso.ie/en/releasesandpublications/fp/fp-ipeads/irishpopulationestimatesfromadministrativedatasources2021/
Interview with and information about Kenneth Prewitt
https://en.wikipedia.org/wiki/Kenneth_Prewitt
Peytchev, Andy, "An Interview with Kenneth Prewitt", Survey Practice, Vol. 1, Issue 1, 2008 https://www.surveypractice.org/article/2927-an-interview-with-kenneth-prewitt
Information on the US Census and American Community Survey
Mule, Thomas, Administrative Records and the 2020 Census, April 01, 2021 https://www.census.gov/newsroom/blogs/random-samplings/2021/04/administrative_recor.html
Lo Wang, Hansi, Despite Cybersecurity Risks And Last-Minute Changes, The 2020 Census Goes Online, March 2, 2020 NPR Morning Edition https://www.npr.org/2020/03/02/807913222/despite-cybersecurity-risks-and-last-minute-changes-the-2020-census-goes-online
Whitworth, Erin, Internet Data Collection: Final Report, Census 2000 Evaluation A.2.b, US Census Bureau, August 14, 2002 http://web.archive.org/web/20050126120912/http://www.census.gov/pred/www/rpts/A.2.b.pdf
Census Bureau to Offer American Community Survey Internet Response, US Census Bureau, December 17, 2012 https://www.census.gov/newsroom/releases/archives/american_community_survey_acs/cb12-247.html
Respond to the ACS: Respond Online, US Census Bureau https://www.census.gov/programs-surveys/acs/respond/respond-online.html
Respond to the ACS: Frequently Asked Questions, US Census Bureau https://www.census.gov/programs-surveys/acs/respond/faqs.html
Information on the Australian Census
2016 Census Overview, Australian Bureau of Statistics https://www.abs.gov.au/websitedbs/d3310114.nsf/Home/Assuring+Census+Data+Quality
Delivering the 2021 Census, 10 August 2022, Australian Bureau of Statistics https://www.abs.gov.au/census/about-census/delivering-2021-census
Census methodology: How the data is collected, 28 June 2022 https://www.abs.gov.au/census/guide-census-data/census-methodology/2021/how-data-collected
Counting On Us: How Start Dates Affect Census Participation, Behavioural Economics Team of the Australian Government and Australian Bureau of Statistics, 07 November 2019 https://behaviouraleconomics.pmc.gov.au/projects/counting-us-how-start-dates-affect-census-participation
Information from the Office of National Statistics (UK) on designing a digital first census 
Fraser, Orlaith, Designing a digital-first census: Paper questionnaire and access code requests, UK Office of National Statistics, 4 October 2021 https://www.ons.gov.uk/peoplepopulationandcommunity/householdcharacteristics/homeinternetandsocialmediausage/articles/designingadigitalfirstcensus/2021-10-04
Customer Relationship Management
https://en.wikipedia.org/wiki/Customer_relationship_management
Race condition
https://en.wikipedia.org/wiki/Race_condition
Gamification
https://en.wikipedia.org/wiki/Gamification
Gamification of Australian census data
Chambers, Joshua, "Gamification: When governments up their game", Civil Service World, 07 Apr 2015 https://www.civilserviceworld.com/in-depth/article/gamification-when-governments-up-their-game
Run That Town, Millipede https://millipede.com.au/work/run-that-town.html
Lara Almarcegui
In 2005 I was an architecture student at the University of East London. In a period of psychosis, I found myself wandering around the city. One thing that caught my eye was building sites: new construction rising “as if by enchantment” (to use Louis Sébastien Mercier’s phrase), other structures being demolished. The changing fortunes of different entities and institutions—genii locorum, even—seemed to be reflected in this activity. Eventually, on my walks, I gravitated towards sites where building materials were piled up, at rest. This condition seemed to exemplify what I was looking for.
The Spanish artist Lara Almarcegui’s work often takes the form of heaps of raw materials. It is frankly sensuous: a 2013 interview in Frieze magazine included the line “What do you like the look of? I like soil, mud and clay so much I want to touch them.” This emphasis on the erotics of materials informs the aesthetic strategies Almarcegui uses. (By erotics, I mean the sensuous rather than the intellectual aspects of experience.) Often her installations use materials borrowed from the construction industry for the duration of the exhibition, just piled up in a gallery space. The quantity of each material is determined by, for example, a one-to-one relationship with the constituents of the gallery building itself (in the case of her show at the Secession in Vienna). In the words of the artists NE Thing Co. (Ingrid and Iain Baxter) in 1968: “Piles are not pretentious–they are just there being beautiful and doing their thing.” 
Almarcegui’s directness in drawing attention to materiality is combined with an interest in vacant lots—what are known in urbanistic discourse as terrains vagues. She describes her younger self, growing up in the 90s in Spain, as having an interest in architecture, and, through pursuing that, having discovered a personal sentiment that there was “too much construction, too much architecture, too much design”. She decided to stand against architecture. The notion of a terrain vague has become, for her, a symbol of resistance. She has referred to Spanish architect Ignasi de Solà-Morales’s 1996 text on the subject:
The relationship between the absence of use, of activity, and the sense of freedom, of expectancy, is fundamental to understanding the evocative potential of the city's terrains vagues. Void, absence, yet also promise, the space of the possible, of expectation.
Terrains vagues are within the boundaries of the city, but temporarily outside its systems of land use and circulation. This indeterminate “excluded middle” status results in the German term Stadtbrache (or brache) being used—braakliggend terrein in Dutch—which originally referred to fields left fallow.
The French expression terrain vague dates back far beyond Marcel Carné's 1960 film of the same name. The earliest occurrences in French literature are to be found at the turn of the 19th century.
For Almarcegui, the significance of terrains vagues can be summed up by the word “possibility”. Her evocation of this condition is unfussy—the piles she presents are unagitated, in repose. One might compare Pierre Huyghe’s contribution to dOCUMENTA (13), Untilled, which involved an overgrown vacant lot. The limits of Huyghe’s artwork were deliberately left unclear, but it included artistically hyperactive elements such as a thin white dog called “human” with one front leg coloured pink. In contrast, in her work Almarcegui rejects overplanning of any kind. She resists the imposition of intentions beyond her own resonant concept. While her work may refer to predecessors such as Robert Smithson, Gordon Matta-Clark or Lynda Benglis, it is, at the same time, simple.
The artist’s rejection of the totally designed, administered environment is not, however, moralistic. Although avowedly opposed to overconsumption, she is at pains to make clear that she is not trying to present her practice or her life as a model of environmental awareness. Indeed, her installations can be expensive. A gallery floor might need special reinforcement to carry the weight of tonnes of crushed materials; transport of the borrowed materials to and from builder’s yards results in CO2 emissions. For Almarcegui, these permanent impacts are justifiable in the process of achieving the aesthetic function of her work. In other respects, her process is self-sufficient: she typically takes her own documentary photographs of the ephemeral installations in situ, rather than delegating the process to a photographer. She is, in that respect, a believer in the conventional individualistic, contemplative ideal of art.
Rivets
an extract from https://web.archive.org/web/20160406131907fw_/http://mysite.du.edu/~jcalvert/tech/rivets.htm
A rivet comes as a circular steel rod with a forged head, the manufactured head, on one end. For use, it is placed red-hot into a hole conventionally 1/16" greater in diameter. The length of a rivet is the distance from the underside of the head to the end of the fresh rivet. The thickness of the material to be joined is called the grip of the rivet. The length of the rivet to be used for a certain grip is given in tables. The rivet is then set by forging a field head onto it.
At one time hot rivets were thrown to a riveter by a rivet boy who heated them at his portable rivet forge. The rivets were inserted into the hole, and a heavy set held against the manufactured head. A set was held against the end of the rivet on the other side, and was hammered vigorously, forging the field head in short order. This was hand riveting. The work could be lightened by pneumatic or hydraulic hammers, but most satisfactory was machine riveting, in which high pressure was applied, usually hydraulically, to form the field head. This was quiet and fast, but required access to both sides of the work for the anvil against which the pressure could work.
Rivets were heated to 950°F–1050°F, handbooks say. This is not hot enough for an austenitic transition, so it must only have been for causing thermal expansion, not for softening the metal. By the time the field head was formed, the temperature was much lower. Carbon steel expands at about 6.5 ppm per degree F, so the cooling rivet contracted and pressed the plates very strongly. This is no trivial effect, since cooling through 500°F will produce a stress of \((6.5\times10^{-6})(500)(30\times10^{6}) = 97,500\) psi if there is no change in length. The tension in the shank will not actually get this high, since the steel will yield long before this takes place. When the field head is forged, the rivet shank also expands to fill the hole, which is also desirable.
This shrinkage puts considerable stress on the rivet head, but rivet heads can apparently resist a tension at least up to their yield strength. The sharp corner beneath the head is a definite stress-raiser, and a head would fail by a crack starting from this corner. The end of the rivet hole is sometimes rounded in preparation, which would guarantee a fillet at this point. If the head should be weak, one presumes that it would pop off when the rivet cooled, so the defective rivet could be replaced.
Alliterative antinomies
This post is just a dumping ground for these phrases:
Mess and method
Beauty and boredom
Tedium and transcendence
Calculation and charm
Profit and prophecy
Impassive and impassioned
Functional and figurative
Math and myth
Factual and fanciful
Ordinary and ornate
Literal and lyrical
Shift register sequences
I looked into shift register sequences a few years ago, as a way to create a random dissolve. This is a fairly well-known trick, sometimes referred to as FizzleFade. An \(n\)-bit maximal Linear Feedback Shift Register is used to cycle through all but one of the possible strings of \(n\) bits. In most shift register applications, it’s the output bit that counts. In FizzleFade, the contents of the register are what we care about.
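Here is a minimal sketch of the trick (not the original implementation). It uses a right-shifting 16-bit Galois LFSR with feedback mask 0xB400, a standard maximal-length example, to visit every nonzero 16-bit value once, which is enough to cover a 320×200 screen. The fade_pixel callback is a stand-in for whatever drawing routine is actually available.

```python
# A FizzleFade-style dissolve driven by a right-shifting 16-bit Galois LFSR.
# The feedback mask 0xB400 corresponds to the primitive polynomial
# x^16 + x^14 + x^13 + x^11 + 1, so the register cycles through all 2^16 - 1
# nonzero states. fade_pixel is a hypothetical drawing callback.
WIDTH, HEIGHT = 320, 200

def fizzle_fade(fade_pixel):
    state = 1                        # any nonzero seed will do
    while True:
        if state < WIDTH * HEIGHT:   # ignore values that fall outside the screen
            y, x = divmod(state, WIDTH)
            fade_pixel(x, y)
        lsb = state & 1              # one Galois step: shift right, and apply the
        state >>= 1                  # feedback mask if the output bit was 1
        if lsb:
            state ^= 0xB400
        if state == 1:               # the register has cycled; every pixel except
            break                    # index 0 has now been visited exactly once
    fade_pixel(0, 0)                 # the all-zeros state never occurs, so pixel 0 is handled separately

fizzle_fade(lambda x, y: None)       # plug in a real drawing routine here
```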
There are two variants of LFSR, the Galois and the Fibonacci. They both use polynomials, “taps” to create feedback in the register, and in both cases they use XOR to calculate addition mod 2 (we are working in GF(2)). At first it might seem that the Galois and Fibonacci LFSRs can’t be equivalent, since the Fibonacci type XORs the tap values together to create an input bit, whereas the Galois type XORs the output bit together with a set of bits within the register. These are very different operations.
It turns out, though, that Galois and Fibonacci LFSRs create the same stream of output bits, so they are closely related. I’m tempted to say that they are ‘duals’, which is the go-to term when two mathematical structures are somehow opposite without being exactly complementary.
The mathematical operation on which LFSRs are based is long division of polynomials over GF(2).
As everyone knows, long division produces a pair of values, the quotient and the remainder. The Galois LFSR stores the remainders (both the ‘intermediate’ remainders, and the final remainder) in the register. At each step, it subtracts the divisor from the register using XOR. So the contents of the register change abruptly at each step. The quotient can be recovered by watching the output bits.
Here’s a visualisation of the calculation performed by a 3-bit Galois LFSR, formatted as long division:
[Image: the division performed by a 3-bit Galois LFSR, written out as polynomial long division over GF(2).]
The contents of the register at each step are bolded. The first thing to notice is that after seven steps the register returns to its initial state. This property is a result of the choice of divisor. It corresponds to the divisor polynomial being a factor of \(X^n + 1\) for \(n = 2^3 - 1\), and not for any smaller value of \(n\). Any divisor polynomial with this property is called a primitive polynomial.
The upshot of this is that the cycle length of this particular LFSR is \(2^3 - 1\), which is \(7\). Since there are only \(8\) possible values for the contents of the register, and the “all zeros” value is excluded, this is the maximum length of cycle. The successive output bits of the shift register give the quotient of the division operation.
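To make the long-division picture concrete, here is a small Haskell sketch (my own, with polynomials encoded as Int bit patterns, so \(x^3 + x + 1\) is 0b1011). The shift-and-XOR in the loop is exactly the step the Galois LFSR performs, and the bits it emits along the way are the quotient.

```haskell
import Data.Bits (shiftL, xor, testBit)

-- Divide x^k by a divisor polynomial over GF(2). Returns the quotient bits,
-- most significant first, and the final remainder.
divGF2 :: Int -> Int -> Int -> ([Int], Int)
divGF2 deg divisor k = go (1 `shiftL` k) k []
  where
    go r i qs
      | i < deg     = (reverse qs, r)
      | testBit r i = go (r `xor` (divisor `shiftL` (i - deg))) (i - 1) (1 : qs)
      | otherwise   = go r (i - 1) (0 : qs)

-- divGF2 3 0x0B 7 == ([1,0,1,1,1], 1)
-- i.e. x^7 = (x^4 + x^2 + x + 1)(x^3 + x + 1) + 1: dividing by this primitive
-- polynomial brings the remainder back to 1 after seven powers of x.
```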
The Fibonacci LFSR is more difficult to understand, but it’s also an implementation of the same long division process. At each step, the bit values at each of the tap locations are gathered and XORed together, which is the same as adding them (or subtracting them), and also equivalent to a parity check. The way to conceptualize this is that a digit at position \(n\) in the divisor only contributes if a) it is a \(1\) and b) a division took place \(n\) steps ago. The resulting bit corresponds to the sum of digits in the current column, and it tells us what the next bit of the quotient is. This bit is shifted in at the input of the register. As a result, the contents of the register in the Fibonacci case are the digits of the quotient.
The duality here, then, is that the Galois LFSR works on rows, and the Fibonacci LFSR works on columns. As in many cases of duality, one form is like an inside-out version of the other, not just an opposite. In this case, the Galois form is a lot more natural. Indeed, because the Fibonacci implementation stores (part of) the quotient in the register, there is no place to “input” a dividend polynomial—it is specialised to dividing polynomials of the form \(X^n\), while the Galois LFSR can be readily adapted to do general polynomial long division.
In explaining this, I feel the need for more detailed terminology dealing with the workings of long division. What words are there to describe the condition when a one is entered in the quotient, and a subtraction takes place? I’ve referred to this as “a division” above, but that doesn’t seem quite right. On the other hand, it doesn’t seem likely that using words like subtrahend will improve the description.
0 notes
theohonohan · 3 months
Text
Packed like squares of wheat: Pixel art and the rhizomatic
This piece is, in a way, about Australia. At Mossman in Far North Queensland there is a sugarcane mill, which I was lucky to visit in the late 1980s. It has processed sugarcane into sugar every summer since it was built in 1896. In recent years it has processed around a million tonnes of cane, from over 100 nearby farms. All of the waste products of the process (cane tops, bagasse, molasses, filter mud) have a use; the fibrous bagasse is used to fire the boilers to produce the steam that runs the plant’s electricity generators. As of November 2023, the mill is in danger of being shut down permanently.
As a child, my focus was on the machinery of the mill. It was an impressively gloomy and grimy place, although it was not operating at that time of year. Like most boys, I was interested in the mechanics more than the ingredients. I visited quite a few factories and power plants with my dad around that time, mostly in Ireland. I wasn’t cut out to be a farmer or a naturalist, and I was very far from being an enthusiast for biology.
You could describe this bias as an interest in the processor rather than the processed. The focus is on the tightly-wound gadget (the mill) rather than the loose bulk of the material to be processed (sugarcane). I think we live in a world which has come to appreciate compact devices over extensive natural territories. This is the same emphasis which rates railway locomotives as more interesting than the permanent way, that prefers great men and prime movers of all kinds to the mass of ordinary people, that appreciates mountains but not plains.
Machines (sugarcane mills, locomotives, computers) have intricate, interlocking, complex parts. When in operation, they are energetic, acting rather than acted upon. They grab our attention more readily than the apparently inert materials that they process. The term capital, which is a cognate of cattle, captures this machinery in general. Capital is a stock rather than a flow.
The relative importance of the mill and the sugarcane which feeds it is, in a more realistic view, equal. There’s no point having a mill if the cane doesn’t grow. A smartphone is little use without the Internet. It is important to cultivate a proper understanding of the way our machines depend on larger systems and supply chains, and to appreciate these larger systems for themselves. There is a different kind of interdependence, larger and more consequential than the interlocking complexities of machinery.
The Australian writer Bruce Pascoe has described Aboriginal life and care for the land as a “gentle braid”. There were no smoking mills in Australia before the colonists arrived, though Pascoe believes that agriculture existed in a so-called pristine form (not imported from elsewhere). Indigenous knowledge of botany facilitates farming as well as hunter-gatherer lifestyles. It’s interesting to consider the different texture and properties of this indigenous knowledge, when compared to the knowledge of the colonizer. For one thing, indigenous knowledge would have been passed down orally and by demonstration, and took the form of practices rather than theories. In a sense, indigenous knowledge is practiced as if it was a matter of appropriateness—“we do it this way because we have found this method to produce the best results”. The stereotypically scientific knowledge of the coloniser is more theoretical and has a deductive rather than inductive basis—“our theory tells us to try this alteration in order to scale up the process”. A theory holds the promise of adaptability to new circumstances, whereas an established practice, while effective, might be denigrated as little more than matter of tradition and appropriateness, unable to cope with a new situation.
The notorious deaths of the explorers Burke and Wills were caused by a lack of engagement with indigenous knowledge. They were cooking and eating bread made from nardoo flour, in imitation of the Aboriginals, but failing to prepare the nardoo seeds correctly. They starved to death because their nardoo-based diet was not nutritious. What they were missing was not a biological theory of the nardoo plant, but a practical recipe for preparing nardoo flour. 
I am not sure what the moral of the story of Burke and Wills is (perhaps it was just a case of arrogance) but it highlights the essential role played by practices that may not be scientifically substantiated, that are yet to be systematized. A perspective that decentres science would take account of the significance of these practices, of folk knowledge and the particularity of the various crops we all depend on for sustenance.
When thinking about this, the practice of pixel art comes to mind as an example of a non-mechanized form of activity. Although pixel artists use computers, they barely use any tools other than a palette of colours and the ability to select a pixel with the mouse. There’s no real reason why pixel art couldn’t be practiced on paper with a pencil grid and a set of crayons or paints. The pixel artist doesn’t care about the CPU or the complex possibilities of coding. All there is is the blank canvas and the colours.
The link between pixel art and the gentle braid of agriculture in pre-colonial Australia is perhaps a bit tenuous. I’m not claiming that both are simply uncomplicated. Both avoid major investments in machinery and capital, but they are not without complications. The pixel artist can bring all of his or her skill and intelligence to bear on the work, despite not engaging in any meaningful sense with the computer as a machine. The Aboriginal person doesn’t need to practice any kind of scientific method to have an extremely nuanced and comprehensive grasp of the landscape and its ecosystems.
The grid underlying pixel art doesn’t imply any kind of stratification. It doesn’t have a centre, much as a rhizomatic understanding of human relationships doesn’t have a centre. There is no monolithic absolute taproot in a rhizomatic system—a society characterized by what Édouard Glissant called the poetics of relation. A landscape roamed over and criss-crossed by desire lines and remembered associations is similarly gridded (though, admittedly, in a less regular manner). The pixels of an image are simply packed together “like squares of wheat” (this is a phrase from Philip Larkin’s poem The Whitsun Weddings). Although there are straight lines, there are no rules per se—neither scientific laws nor customary regulations. The pixel grid, like the natural world of the outback or the bush, is experienced as a simple medium, despite the fact that what you can do there is fairly constrained—fairly circumscribed. It takes considerable skill to make a success of it. Somehow, you have to learn the ropes of foraging as a hunter-gatherer, or of drawing and painting as a pixel artist.
The image of the simple digital canvas seems, in its pristine quality, to evoke the pristine environment of Australia before colonisation. This parallel between the object of interest of media studies and that of agronomy isn’t deep—it’s just a case, to me at least, of comparable moods or moments. Like the immaterial and (in spite of NFTs) unownable image world of pixel art, Australia was never a totally unstratified terra nullius, but always a landscape into which associations were constantly being layered by its inhabitants. Bruce Pascoe evokes an anticapitalist paradise:
There were no fences for property delineation in Australia. Fences were sometimes built to enclose or herd animals but they were constructed in such a way as not to impede progress across the landscape. Instead, boundaries of language and lore were described on trees and rock and in song and dance. Everybody knew where they stood and although they had responsibility for geographic regions and features and could pass that responsibility to other members of the clan, they could never own it personally. —“Capital” by Bruce Pascoe in Meanjin, Summer 2021
0 notes
theohonohan · 3 months
Text
Automobility Testimony
I’ve been learning to drive again over the past few months, having given it up immediately after passing my test at the Barking test centre in 2005 (it’s a long story).
My instructor in London was a black evangelical Christian called Emeka. He was a really good instructor. Apart from anything else, he never reacted with shock when I did something wrong. This mattered a lot to me, because I’m pretty sensitive to being perceived as unreliable. I try to be deliberate and scrupulous about everything I do. I mean scrupulous in the strict sense of “very concerned to avoid doing wrong”. When learning to drive, the student is obviously objectively incompetent, so it’s impossible to avoid errors, and there isn’t time to carefully deliberate over every decision. So it is important to have an understanding instructor.
The one or two driving lessons that I took in Dublin, previously, made me feel like a neurominority. Maybe I am, to a degree. It's a problem because the goal when driving is, as Emeka put it, to be part of the traffic, to conform. So I found it really difficult when another instructor reacted to an error as if it was an affront, and looked at me as if I had two heads. I felt like this was a judgement that I was unsuited to being a driver. In fact, with Emeka’s tuition, I easily passed the test first time, and I have never had an accident on the roads, either in a car or on a bike.
A neurominority is, essentially, someone who has an atypical “spiky profile” of cognitive strengths and weaknesses. So a driving instructor who is used to “normal” students may find a neurominority’s performance a bit perplexing—there will be unexpected errors and rough edges, along with above-average performance in other parts of the task.
Neurominority is, in my opinion, a good word to use, in this particular context, because the driving instructor is used to teaching the “neuromajority”, and that experience conditions their expectations. The instructor tries to categorise the student based on how previous students have performed. In the case of a neurominority, the student’s performance won’t conform to the instructor’s expectations. For example, a student that seems like the dull and competent type may turn out to have unpredictable behaviours and eccentricities. The last thing an instructor should do in this situation is dismiss the student as some kind of intractable can of worms.
Driving instructors, in general, are a male-dominated group, and driving as an activity is part of a consumerist car culture, symbolising “speed, security, safety, sexual desire, career success, freedom, family, masculinity” (John Urry, “The ‘System’ of Automobility”). In addition to this normative culture, homophobia is not uncommon among driving instructors, and it makes life particularly difficult for LGBT people who need to learn to drive: see this Daily Mail article, and another from the Toronto Star.
It tends to be, generalising unfairly, a bit of a caveman demographic, and other forms of intolerance and discrimination are to be expected. Teaching someone to drive can be classed as an “intimate service”. It involves sitting with the student in a car for tens of hours, so some level of rapport is necessary. This, among other factors, means that for minorities of all kinds, negotiating the process of learning to drive is tricky. This is reflected in the statistics for test passes.
I believe (with support from Hofstede) that the masculine consumerist tendency is particularly marked in Ireland. Driving is a man’s world, and car culture reflects an emphasis on success and competition. 
A neurominority learning to drive may be quite out of step with this culture, and may have to deal with being othered by their instructor, along with being treated as an inexplicable and unpredictable freak, someone whose superficially acceptable driving skills continue to harbour the potential for “nasty surprises” (to evoke the most problematic framing of the situation). 
What’s missing in this perspective is an appreciation of the fact that these students are coming from a different place, cognitively speaking, and will learn differently. It should be assumed that they can still achieve the standard needed for safe driving. They will perhaps need more careful and prolonged teaching (relative to the typical student) to build solid skills and iron out all rough spots.
0 notes
theohonohan · 3 months
Text
Informal justification of Stirling's Approximation
I saw this on Quora, and I liked it enough to want to reproduce it here. The author is Chi Feng. (For more rigour see http://irishmathsoc.org/bull77/Holland.pdf)
There is a term that stands out in Stirling's approximation:
\[\ln n! \approx n \ln n - n + O(\ln n)\]
i.e., the \(n \ln n - n\) term, which is just the indefinite integral of the natural log.
Hint: recall that
\[\int \ln x\,\mathrm{dx} = x \ln x − x + C\]
Where does this term come from?
Since multiplication inside a logarithm is equivalent to adding logs together, we can rewrite the left hand side as
\[\ln n! = \sum_{k=1}^n \ln k\]
A common strategy for approximating sums of this type is to replace the sum with an integral (See Euler–Maclaurin formula for more information):
\[\sum_{k=1}^n \ln k \approx \int_1^n \ln k\,\mathrm{dk}\]
Evaluating the definite integral on the right, we get
\[\int_1^n \ln k\,\mathrm{dk} = n \ln n - n + 1\]
Since the error introduced by approximating the sum as an integral is of order \(O(\ln n)\), we can ignore the \(+1\) term since it is much smaller than the error term (see the Wikipedia article on Stirling's approximation for an in-depth derivation of this error).
Thus, we recover Stirling's approximation:
\[\ln n! \approx n \ln n − n + O(\ln n)\]
To review:
We took the log of \(n!\) so that we could rewrite the product as a sum
We approximated the sum with an integral
We arrived at Stirling's Approximation!
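As a quick numerical sanity check, here is a small Haskell sketch (not part of the original answer); the figures in the final comment come from evaluating it and rounding.

```haskell
-- ln n!, computed directly as a sum of logs.
logFactorial :: Double -> Double
logFactorial n = sum (map log [1 .. n])

-- The gap between ln n! and (n ln n - n); it should grow only like O(ln n).
stirlingGap :: Double -> Double
stirlingGap n = logFactorial n - (n * log n - n)

-- stirlingGap 10 ~ 2.08, stirlingGap 100 ~ 3.22, stirlingGap 1000 ~ 4.37,
-- i.e. roughly (1/2) * log (2 * pi * n), comfortably inside the O(ln n) term.
```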
0 notes
theohonohan · 3 months
Text
Synthetic Aperture Radar demystified
Tom Clancy, author of Reaganist techno-thrillers, was once reported as having said "More than anything else, I'm a technology freak, and the best stuff is in the military." I'm not sure I believe this to be true. Maybe it was true in the 80s. In any case, I'm writing this article to explain the principle behind an advanced technology that the military have been known to use, and which is pretty obscure.
I first read about Synthetic Aperture Radar in a book about aviation by the excellent popular writer Bill Gunston. I recall that there was an illustration showing how an aircraft could fly along a straight path, with the radar dish in the nose pointing to one side, and somehow all of the positions of the radar antenna would be added up by a computer to produce a large synthetic antenna. There was no explanation of how it worked.
Years later, I did some research, and now I understand the operating principle. The first thing to note is that it's not an interferometric trick. Aperture synthesis in radioastronomy works by interferometry, and phased arrays also use multiple antennae in a similar way. Phased arrays are used extensively to produce focused radar beams, but they aren't part of the SAR concept.
SAR is essentially a Doppler technique. I'll try to explain it by analogy with conventional Doppler radar.
Imagine a policeman or policewoman standing in the middle of a huge piazza. Let's say he or she is standing on a little podium for safety—because there are many cars driving around the piazza in all directions. Give this police officer an omnidirectional radar speed gun. Radar speed guns work on the Doppler principle. Usually they're used to measure the speed of objects moving directly toward the gun, but it should be easy to see that an omnidirectional speed gun can also determine the point of closest approach of a vehicle: it's just the point in time at which the Doppler shift is zero. If the gun also measures range, as a real radar does, then the range it reads at that zero-Doppler moment is the distance of closest approach. So, using a suitable omnidirectional radar speed gun, our police officer can determine the distance of closest approach of each vehicle as they cross the piazza. What the omnidirectional radar doesn't tell us is the direction of the point of closest approach. We just get a list of times and distances, a pair for each vehicle.
SAR uses the same information gathering method, but in a different geometric setting. In SAR, the radar is mounted on an aircraft or a satellite, which is flying in a straight line at some height above the ground. The ground is, obviously, moving relative to the plane. Our radar is still, in principle, an omnidirectional transmitter, and at each moment we can extract from the data all of the returns which have zero Doppler shift. These represent reflections from the ground somewhere in a plane perpendicular to the aircraft's trajectory. We don't know what radial direction the reflections are coming from—we just know the range and (due to the Doppler trick) the fact that they lie in a certain plane. This kind of range measurement is called a slant range, in SAR terminology.
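A toy model of this, as a Haskell sketch (my own illustration, with made-up parameter names): the radar moves along the x-axis at speed v, a scatterer sits at (x0, y0), and the Doppler shift is proportional to the negative of the range rate, which passes through zero exactly when the scatterer is abeam.

```haskell
-- Slant range from a radar at along-track position (v * t, 0) to a fixed
-- scatterer at (x0, y0).
slantRange :: Double -> Double -> Double -> Double -> Double
slantRange v x0 y0 t = sqrt ((v * t - x0) ^ 2 + y0 ^ 2)

-- Rate of change of the slant range. The Doppler shift is proportional to
-- its negative; it crosses zero at t = x0 / v, the moment of closest
-- approach, when the scatterer lies in the plane perpendicular to the track.
rangeRate :: Double -> Double -> Double -> Double -> Double
rangeRate v x0 y0 t = (v * t - x0) * v / slantRange v x0 y0 t
```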
If the aircraft stays on the same straight-line trajectory, we will gather data for a continuous stack of planes along the flight path. So, essentially, we can gather data for the whole volume around the flight path by just collecting reflections from targets that are precisely "abeam" at each moment.
There is a big drawback with this data, though, because we have effectively extended our antenna in one direction only—along the flightpath. I hope it's clear that an extension along the flightpath results in an increase in resolution only in that dimension. The long "synthetic aperture" produces a sharp beam. We still have no way of determining whether the radar echoes are coming from above or below the horizon. It turns out that this doesn't matter in practice, because we can make the assumption that we are looking down and sideways toward the terrain, and because our view is slanted (oblique), the range data we gather can be "unwrapped" to create an image. It isn't a perfect cartographic "view from above", and there is some foreshortening of slopes.
One significant strangeness of SAR images is that buildings often invalidate this assumption about the simple unwrappability of the range data. Due to the downward angle the radar is imaging at, the SAR may get a return from the top of a wall before the return from the bottom of the wall. In the resulting image, the wall will be inverted, with the top appearing below (and before) the bottom. This effect is called layover. In order to be useful, SAR imagery which suffers from this problem requires interpretation, either by humans or software.
One term used to refer to the SAR principle in the early days was "Doppler Beam Sharpening". It should now be clear that, although this sharpened beam has a precise azimuth (it is perpendicular to the direction of flight), the beam is vertically much broader, and the elevation corresponding to a particular return is unknown. Focused SAR (as opposed to Unfocused SAR) uses the whole Doppler history of a particular target point to improve resolution. The way I think about this (and one possible implementation) is a matched filter applied to the data. Not only will the filter pick up the point where the Doppler shift is zero, it will catch all of the echoes as the target approaches and recedes, "focusing" them to determine a very accurate time of closest approach.
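As a rough illustration of the focusing step (a sketch, not a real SAR processor, which works on complex samples and phase): slide the expected echo history of a hypothetical point target along the recorded samples and correlate; the offset at which the correlation peaks is the estimated moment of closest approach.

```haskell
-- Correlate an expected echo history against the recorded samples at every
-- offset: a crude matched filter over real-valued samples.
matchedFilter :: [Double] -> [Double] -> [Double]
matchedFilter expected received =
  [ sum (zipWith (*) expected (drop k received))
  | k <- [0 .. length received - length expected] ]
```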
There are some variants of SAR which use slightly more complex approaches: Inverse SAR and Interferometric SAR. But the basic principle is simple, and it's the Doppler method.
0 notes
theohonohan · 3 months
Text
Polish Notation and Laziness
I decided to write a parser for Łukasiewicz's parenthesis-free notation, more commonly known as Polish Notation. One of the first Google hits I read suggested that the easiest way to do this is to reverse the expression and decode it as Reverse Polish Notation. Parsing RPN with a stack is a very easy and simplistic solution, so this suggestion is obviously a cop-out. I decided to write a Haskell program instead.
The expression to be parsed looks something like this: \(EKKCprCqrKCpsCqsCApqKrs\)
Upper-case letters are operators and lower-case letters are variables. Polish notation corresponds to a preorder traversal of the syntax tree for the expression. So my Haskell program just reads each character of the expression, from left to right, and builds the tree as it goes along. It's very natural to do this in Haskell, thanks to algebraic data types. Because it's an incremental process, the program can parse truncated expressions. The result, in that case, is a syntax tree with one or more "holes". This is a fairly familiar item in functional programming, and it's a straightforward way of representing the structure of an incomplete expression.
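The parser linked at the end of this post is the real thing; what follows is a cut-down sketch of the same idea (binary operators only, with names I have made up) to show how the preorder walk and the holes fit together.

```haskell
import Data.Char (isLower, isUpper)

-- Upper-case letters are (binary) operators, lower-case letters are
-- variables, and Hole marks a missing subtree in a truncated expression.
data Expr = Op Char Expr Expr | Var Char | Hole
  deriving Show

-- Preorder parse: an operator claims the next two subexpressions, a variable
-- is a leaf, and running out of input leaves a Hole.
parse :: String -> (Expr, String)
parse []      = (Hole, [])
parse (c : rest)
  | isUpper c = let (l, rest')  = parse rest
                    (r, rest'') = parse rest'
                in (Op c l r, rest'')
  | isLower c = (Var c, rest)
  | otherwise = parse rest            -- ignore anything unexpected
```

Feeding it a truncated prefix such as "EKKCprC" produces a tree whose missing subexpressions appear as Holes, which is the incremental behaviour described above.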
The "reverse it and parse as RPN" approach is clearly unable to make sense of truncated expressions in Polish Notation. What's more, a truncated RPN expression will result in a collection of values rather than a single value with holes. So there's a kind of duality here between Polish and Reverse Polish Notation.
The advantage of RPN is that, evaluating from left to right, every operator can be applied immediately to its operands. This is because the standard RPN parser pushes every operand it encounters onto a stack, and then pops off the operands it needs when it encounters an operator. Operators are never stored by the parser. This ability to calculate values immediately means that an RPN parser for arithmetic expressions doesn't have to build a syntax tree. Somebody programming in assembly, in the 1950s or 60s, would have found it laborious to implement such a data structure. In the 70s, though, computer scientists became aware of the choice between strict and lazy evaluation strategies. Along with the invention of algebraic data types (also in the 70s), this altered the significance of Polish and Reverse Polish Notation.
The traditional simple RPN parser is strict or eager: every operator evaluates its arguments at the earliest opportunity. The Haskell parser I wrote is lazy: it builds a syntax tree and only evaluates it when parsing is complete (if at all). A lazy evaluation strategy requires the capability of storing suspended computations (known as thunks), which is what the syntax tree ADT amounts to. This kind of complexity doesn't play into the typical RPN-based system. It's also possible to implement a Polish Notation parser that works in an eager way, storing operators on a stack and applying them as soon as their arguments are available. Of course, the outermost operator still won't be applied until the very end. This possibility means that there isn't a necessary connection between Polish Notation and laziness. It's arguably much more natural, though, to build a syntax tree from an expression in Polish Notation than from an expression in Reverse Polish Notation. So, the way I see it, Polish Notation is a spiritual precursor of modern functional programming (along with the lambda calculus and Schönfinkel's combinators) while Reverse Polish Notation is a kind of efficiency hack, intended for use by those who have no choice but to stay close to the metal.
One problem that the simple RPN evaluator avoids is the memory usage that results from creating thunks. Most Polish Notation expressions are extremely short, but the fact remains that, while an eager RPN evaluator uses essentially constant memory, a lazy evaluator builds a data structure of a size proportional to the length of the expression. This hangs around in memory (as a thunk) until we evaluate it. In Haskell, this becomes a serious consideration with the foldl function, which is intended to work on arbitrarily large lists. Haskell's solution is to provide a strict version of foldl called foldl'. foldl' takes one element at a time from the head of the list it is given, and combines it (strictly) with the accumulator, which starts out as the initial value. As a result, it works like a tail-recursive function as it consumes the list, and uses constant memory.
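The standard small demonstration of the difference (a sketch; where exactly the lazy version falls over depends on the runtime's stack and heap limits):

```haskell
import Data.List (foldl')

-- foldl builds a ten-million-deep chain of suspended (+) thunks before any
-- addition happens; foldl' forces the accumulator at every step, so it runs
-- in constant space.
lazySum, strictSum :: Int
lazySum   = foldl  (+) 0 [1 .. 10000000]  -- liable to exhaust memory or stack
strictSum = foldl' (+) 0 [1 .. 10000000]  -- fine
```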
My Haskell Polish Notation parser.
0 notes
theohonohan · 3 months
Text
liner notes from Interstellar Fugitives 2 (2005)
"Order and Chaos" Order and chaos. Two primal opposites from the beginning of time. Chaos has lead to the destruction of some of the greatest civilizations the earth has ever known. In the form of drugs, disease, and war, we are very familiar with how chaos has asserted itself as the destroyer of the best humanity has to offer. Bringing order to chaos has been a primary concern of people the world over. But order is not exempt from this destructive dynamic as well. Order is the force behind almost every dictatorship known in human history. American slavery was one of the most orderly systems devised, as well as one of the most brutal. Slaves were carefully accounted for and classified as Negro, mulatto, quadroon, octroon and so on and each had a relatively specific social and economic function. Germany’s Nazi regime asserted order in that country and beyond, at the cost of millions of lives. The chaos of 9/11 has lead to an order that asserts itself as a straight jacket on the civil liberties of men, women, and children around the world.
But chaos has come to the rescue numerous times throughout history. The chaos of the Civil War opened the door to the end of slavery in the United States (the last country in the Americas to do so). World War II saw the end of the Nazi regime, war being the definition of chaos. Even Gandhi’s nonviolent liberation movement can be seen as creating chaos within the Empire’s structure, leading to its eventual downfall. It is impossible to conceive of a revolution that was not born in chaos or cause chaos in the dismantling of the oppressive order. Indeed, it is ironic, or, perhaps destined, that the counter to chaos would be order, and the counter to oppressive order is chaos.
But chaos has another existence, one far older than current popular concepts.  It would be easy to dismiss chaos as the opposite of order, to equate it with disorder.  This would be wrong.  In the beginning there was Chaos.  There is a legend of how Chaos was a gracious host, who treated visitors well.  Two Emperors wanted to repay Chaos.  They observed that humans had seven holes they used to see, hear, smell and speak with and thought to create them for the faceless Chaos.  It is said that a hole was created each day for Chaos and on the seventh day, when Chaos experienced the world as humans do, with only 5 senses, Chaos died.  Chaos is the void, the nothing that is everything.  The everything that is nothing.  Chaos speaks not only to confusion, but also to possibility, to creation, to life itself.  Order is a pattern, a structure that guides, a defined program, and while a positive order is recognized as beneficial, it quickly becomes toxic without positive chaos.
"For untold years..." For untold years, music has been a battleground for these forces of chaos and order.  Again, the creative mind, the formation of something from a void is an act of tapping into chaos, finding a way to set it free in this world.  Yet there has always been pressure from order to define music, to lock it into a framework.  There were ancient battles over written music, whether it preserved music’s essence, or like the ancient Chaos, gave it a structure that killed it.  The same battle took on a different form with the advent of recorded music.  No longer would musical artists be free to play what they felt, there would be increased pressure to perform music the exact same way it was recorded.  Jazz musicians and others fought to prevent this, while pop stars willingly gave in and became programmed artists, producing music from order rather than chaos, a theoretical impossibility, much like encouraging a dead person to give birth.  Chaos would not remain beyond the new technology, the accessibility of recorded music allowing for marginalized artists to gain fans from places they’d never been, and providing a forum for R1 and Z-gene figures to communicate with others. 
But in the 20th Century, Order took hold in numerous areas of life in numerous locations across the planet.  Agents of Order had been behind numerous plots, many times using negative chaos to further its own selfish goals.  In one infamous example, the order of a nation was disrupted by the chaos of a forced drug trade imposed by agents of Order.  When chaos is dictated by order, to increase power over another, then universal forces are out of balance.  While Opium was the tool used to destroy one nation (though only momentarily), it inspired another to use Cocaine in its Crack form to destroy its own people.  It was during this century that Order gave rise to group of beings known as the Programmers.  The function of the Programmers was simple, to eliminate Chaos, to instill their brand of Order on everyone, from Nation 2 Nation.  This meant that when race radio was abolished in the US, when the creative chaos of listening to music from all genres became popular, the Programmers went into action, creating radio “formats” designed to establish a new Order, and molding the music industry itself into structures designed to dictate to the masses what was and wasn’t acceptable.  There were rebels, those who embraced Chaos even stronger, but Programmer Order worked overtime to suppress those voices.  In Detroit, alliances were built between the urban population and chaotic elements from Germany (Kraftwerk), Japan (YMO), other parts of the US (Bus Boys, Prince, Public Enemy), even outer space (Parliament/Funkadelic, Sun Ra).  But the Programmers were ready.  New forms of chaotic musical expression, inspired by the rebel agents of Chaos was labeled “techno” and faced a defined Order that would forever haunt the Detroit agents.  Radio rebels, such as the Wizard (Jeff Mills), and the Electrifying Mojo were forced from the broadcast spectrum by Programmers, fearful of the power of their musical chaos.
"In Asia..." In Asia, Programmers used violent conflict to destroy traditions and establish new orders.  Western forms of dress, entertainment, religion, and music would replace native ways, making a more uniform order between regions of the globe.  Rather than the orderly chaos of social evolution, nations would face a type of schizophrenia.  The established order would be Western-inspired, but the native culture would be stripped of its meaning, and function as tourist entertainment for world travelers.  Traditional practices would be reduced to a cultural safari where Others could marvel at the “foreign-ness” of the people.  Meanwhile the people would have no real awareness of their own culture beyond the commercial aspects designed for Others rather than themselves.
The Programmers maintain that they are controlling the masses “for their own good,” purporting to act as shepherds to a flock, but in reality, setting the masses up like lambs for the slaughter.  The notorious R- and Z-genes, possessed by numerous rebels, including the infamous Interstellar Fugitives (ISF), had always provided the chaos necessary to keep the Programmers, on some level, in check.  But chaos has no boundaries.  The ISF have been wracked with chaos from within and without.  After completing several successful away missions, the Mystic had followed the Path to dimensions unknown.  In a pitched battle, the Drexciyan forces transitioned, leaving this world to its fate, at the same time, not revealing what that fate was.  The Deacon faced off against demons seeking to steal his soul, Agent Chaos became a victim of his namesake.  Even Mad Mike was affected, establishing Base 3000, but the process left his access to his sonic arsenal limited.
So it was at a low point for our brothers when they were contacted by the Asian Underground.  It had been a long time since anyone had heard from any of the other cells in the World Power Alliance.  The Programmers had long ago taken hold of the continent, but for some reason, there was an urgency that the ISF could not ignore. Not that they were prepared for the level of control the Programmers had on the masses.  WPA agent Kamikaze had been ambushed, but died living up to his name by protecting an unknown resistance cell from detection.  There was an organized chaos that the ISF had brought to the table, time and again, but the destructive chaos that had claimed Agent Chaos was spreading to the others.
TRANSMISSION BEGIN The following is a report from Program Central concerning the ISF mission in Asia.  Many of the details of their mission remain coded.  The following is what has been deciphered from intercepted transmissions as well as from Programmer operatives who have engaged the enemy. 
The ISF were encouraged to assist the Asian Underground in an effort to disrupt our Programmed Culture Initiative.  As you know, the PCI was established to create Asian cultures and histories that would not only be free of politics and conflicts, but ones that would be more passive and commercially profitable than in the past.  There have been reports of some citizens experiencing something similar to what is known at Programmer Central as ‘post-emancipation psychosis’, that is, the person perceives a discrepancy between our propaganda that they are free, and the reality of a Programmed existence.  A mindsweep should reveal these potential R1s and extensive electroshock therapy should correct this.  The Asian Underground sought the ISF’s help in locating what may be an R1 factory.  Apparently there is a rebel cell that was unknown to even the majority of the Asian Underground known as the Pagan Design Force.  The PDF is a cell devoted to keeping the ancient arts alive, not through preservation, like we do in the museums, but through a chaotic process of blending styles and evolving them much like the infamous Z-gene Bruce Lee did with his “art.”  The PDF is a small cell and it is surmised that they have been isolated in some way, so they were not in a position to take on the Programmers directly.  However they see the value in preserving that art, that history, for the benefit of others.  Very few knew about them, very few could.  If their existence became known to the common person, without the proper support, they would be easy targets for our Revision Programs.  Being underground then, for them, is not a matter of choice, but one of necessity.  A series of tribal groups defend the territory around the PDF, and the ISF themselves had to fight the PDFs defenses before being allowed access to their headquarters.  This was apparently to make certain the ISF were not viral Programmers, a tactic that has been used effectively in other regions (see COINTELPRO).
While we have no information on what conversations took place, soon after saw the launch of several attacks on our outposts, culminating in what could only be called a mind bomb, resulting in the re-introduction of chaos to the citizens of numerous spaces.   The concern is that with the freedom of thought that is a trademark of chaos, our ability to Program will be slowed and the PCI may need to be revised or abandoned.
A great deal of work has been done introducing negative chaos to the ISF.  In fact, viral Programs have been activated within the ISF to accelerate their destruction.  It seems that the PDF have been made aware of this and have used the concept of yin and yang to counteract our work.  Yin and yang is one of the concepts we’ve worked so hard to relegate to the past.  It implies that chaos and order coexist, that they compliment each other.  The ISF have been able to use this to disrupt our negative chaos by putting it in the context of half of a greater whole.  They used yin/yang to see the positive order in their situation and are actively using it to create a chaos that threatens to give people a sense of power unseen since the early days of our Program initiatives.  In fact, there is no small irony in the fact that it was not the ISF who ultimately saved Asia, but an Asian concept that helped save the ISF and provided the seeds to our defeat.  However this is only one battle lost.  And the ISF and the AU both had casualties as well.  We will continue to Program against the agents of Chaos.  They have made their objectives known.  They are not interested in negotiating or assimilating.  They claim to seek some sort of harmony, but that balance would destroy everything we have built.  The ISF must be stopped.  They want nothing more than the complete destruction of Order… TRANSMISSION ENDED
0 notes
theohonohan · 4 months
Text
Full abstraction and the Old Etonians
“Full abstraction” sounds exciting—something like “full employment” or “fully automated luxury communism”. It’s a term from denotational semantics in theoretical computer science, describing a concept invented by Robin Milner in 1975. I’ll discuss it in more detail at the end of this post.
Robin Milner was a graduate of Eton College, but he wasn’t from a wealthy background. He was a scholarship boy, and specialised in mathematics (https://users.sussex.ac.uk/~mfb21/interviews/milner/) from the age of 16. He won another scholarship to read maths at King’s College, Cambridge, where he was taught, if I am guessing correctly, by Norman Routledge. (Later in his career, Routledge moved to Eton himself, where he taught mathematics to Timothy Gowers among others. Gowers is not a computer scientist, so I’m not going to discuss him here, except to note that, being a descendant of Ernest Gowers, among others, he was more privileged than Milner. Both Milner and Gowers had a strong interest in musical performance.)
I want to use Milner as an example of the kind of computer scientist who was on the staff at Cambridge University when I was a student there (in the second half of the 1990s). He was initially trained as a mathematician and only later started working with computers. By my time he was head of the Computer Laboratory in Cambridge. He never did a PhD, but was a well-loved scholar and respected researcher, known for the discovery of Hindley-Milner type inference and the invention of the programming language ML. The tone of the Computer Laboratory varied from research group to research group, but there was an emphasis on theoretical rather than practical research topics.
The Denotational Semantics course was taught in the third year of the CS degrees (“Part II”). Andrew Pitts gave the lectures when I took it. I didn’t find him to be a very effective lecturer. The course Pitts developed in 1997 is still being taught (by others) with minor changes, today.
Norman Routledge comments that the tradition at Cambridge was not to worry too much about the quality of the lectures: https://youtu.be/WFEnqZSXWoE?t=83. The staff at the Computer Lab were employed for their excellence as researchers rather than their pedagogical skills. I was in a weak position because I didn’t have the mathematical maturity (or the general maturity) to succeed in denotational semantics. Andrew Pitts was (you guessed it) a former (Cambridge) mathematics student, and it seems to me that the way he delivered the denotational semantics material was akin to the way you might introduce a graduate mathematician to a topic in computer science: in other words, effortless mathematical maturity was expected. Across computer science in general, the first generation of academics, starting in the 1950s and 1960s, was typically drawn from the ranks of mathematics graduates. There is a sense in which mathematically flavoured computer science is an old-fashioned activity—and the belief that computer science is just a subfield of maths is no longer really an adequate description of the field. Theoretical computer science now has a flavour of its own, distinct from mathematics, and there are many other fields of CS which aren’t mathematical at all. The tendency to approach computing as a mathematical problem is valuable, but it’s not the whole story.
The topic of “full abstraction” is discussed in the very last lecture of the Denotational Semantics course. It’s tempting to say that it’s the crowning, most abstruse topic in the course, and the concept least likely to be used in later life among all the theoretical content of the CST. It’s certainly up there with the most abstruse ideas, anyway. Still, I feel the need to consolidate my understanding of these things, and I’m still thinking about it 25 years later.
Full abstraction is a property of a mapping between a language and its semantics. In general, you can think about it as a property of a translation between two languages. Here’s the abstract of the 1975 paper by Robin Milner which introduced the concept:
A semantic interpretation \(\mathcal{A}\) for a programming language \(L\) is fully abstract if, whenever \(\mathcal{A}〚\mathcal{C}[M]〛\sqsubseteq \mathcal{A}〚\mathcal{C}[N]〛\) for two program phrases \(M, N\) and for all program contexts \(\mathcal{C}[\;\;]\), it follows that \(\mathcal{A}〚M〛\sqsubseteq \mathcal{A}〚N〛\). A model \(\mathcal{M}\) for the language is fully abstract if the natural interpretation \(\mathcal{A}\) of \(L\) in \(\mathcal{M}\) is fully abstract. 
The Denotational Semantics notes have a very pithy definition of full abstraction: 
A denotational model is said to be fully abstract whenever denotational equality characterises contextual equivalence.
This is saying exactly the same thing as Milner’s paper, but it is either impenetrable or completely perspicuous, depending on your perspective.
My understanding of full abstraction is that, given a set of program phrases that mean the same thing (that are mapped to the same meaning by the semantic interpretation), a fully abstract interpretation will result in an identical set of synonymous meanings regardless of what context the program phrases are put into. The word “context” is a bit vague here. Milner describes a context as a “hole” into which a phrase can be inserted. It’s safe to assume that contexts aren’t active—they are more like environments in which the free variables in a phrase are looked up, etc.
The two ways in which a semantic interpretation can fail to be fully abstract are as follows: first, the interpretation might distinguish “too finely” between the meanings of phrases, so that contextually equivalent phrases are assigned different denotations. Conversely, phrases which, in isolation, are assigned the same denotation, might not be contextually equivalent.
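The textbook instance of the first failure mode is Plotkin's “parallel or” example for PCF: the standard domain model contains a parallel-or function that no PCF program can define, and there are two testers which only disagree when applied to it. Here is a rough transcription into Haskell (a sketch only: Haskell is not PCF, and the point holds only so long as no parallel-choice primitive is in play).

```haskell
-- For every f definable in a sequential language, the condition below never
-- evaluates to True (it is either False or it diverges), so test1 f and
-- test2 f agree: both diverge. The two terms are therefore contextually
-- equivalent. The domain model, however, contains the continuous function
-- por (por True _|_ = True, por _|_ True = True, por False False = False),
-- on which test1 and test2 differ, so the model distinguishes terms that the
-- language cannot tell apart.
test1, test2 :: (Bool -> Bool -> Bool) -> Bool
test1 f = if f True undefined && f undefined True && not (f False False)
          then True      else undefined
test2 f = if f True undefined && f undefined True && not (f False False)
          then undefined else undefined
```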
The SEP explains full abstraction in the following way:
Each of these accounts [denotational and operational semantics] induces its own synonymy relation on the phrases of the programming language: in a nutshell, the full abstraction property states that the denotational and operational approaches define the same relation. (https://plato.stanford.edu/entries/games-abstraction/)
The “synonymy relations” referred to here correspond to the “denotational equality” and “contextual equivalence” mentioned in the DenSem notes.
The PLS Lab wiki has an even shorter definition (https://www.pls-lab.org/en/Full_abstraction). It states that "equivalent programs are translated to equivalent programs". This reflects a broader definition of full abstraction: the concept has been applied to compilation, for example.
A fully abstract compiler guarantees that two source components are observationally equivalent in the source language if and only if their translations are observationally equivalent in the target. Full abstraction implies the translation is secure: target-language attackers can make no more observations of a compiled component than a source-language attacker interacting with the original source component. (https://www.khoury.northeastern.edu/home/amal/papers/fabcc.pdf)
One way to think about this is that a given program phrase will, when translated, have, on the one hand, some contexts that correspond to normal use. I’m going to call these definitive contexts (by analogy with definitive stamps: they're standard, ordinary and serve to support the definition of the phrase's intended meaning). They are ordinary and result in the expected behaviour. There will also be, on the other hand, the possibility of some strange contexts, which the person who wrote the phrase might not have foreseen, and that are probably not expressible in the source language. I’m going to call these exotic contexts (by analogy with exotic terms: “terms that do not correspond to any concrete term in the object-language being encoded”; see https://www.itu.dk/people/carsten/papers/esop08.pdf). Now, a fully abstract translation means that if the translations of two phrases both “do the right thing” in a definitive context, they must do the same thing in any exotic context as well.
As someone mentions on StackExchange, full abstraction means that exotic contexts (again, this is just my term) are not “able to poke our terms or semantic meanings in undesirable ways and spoil their equivalence. "Undesirable ways" means in ways that the programming language itself cannot poke them.” https://cstheory.stackexchange.com/a/20029/71809.
The application of full abstraction to secure computation seems to have originated in a research report by Martín Abadi. This is how he describes it:
We say that two expressions are equivalent in a given language if they yield the same observable results in all contexts of the language. A translation from a language L1 to a language L2 is equationally fully abstract if 
it maps equivalent L1 expressions to equivalent L2 expressions, and 
conversely, it maps nonequivalent L1 expressions to nonequivalent L2 expressions [Plo77, Sha91, Mit93]. 
We may think of the context of an expression as an attacker that interacts with the expression, perhaps trying to learn some sensitive information (e.g., [AG97a]). With this view, condition (1) means that the translation does not introduce information leaks. Since equations may express not only secrecy properties but also some integrity properties, the translation must preserve those properties as well. Because of these consequences of condition (1), we focus on it; we mostly ignore condition (2), although it can be useful too, in particular for excluding trivial translations. (https://archive.org/details/bitsavers_dectechrep_161004)
(Abadi’s condition (2) is usually called adequacy; some authors consider it to be part of full abstraction, and others mention it separately. It has the effect of guaranteeing that all of the phrases don't simply get "trivially" mapped to the same denotation, like the kernel of a homomorphism.)
So, that’s my understanding of the meaning of full abstraction, as things stand. I might come back to this.
By the way, the reference [AG97a] in Abadi's report refers to work done in 1997, at the Computer Lab in Cambridge, on the spi calculus, a development of Robin Milner's pi calculus. It was a formal method for investigating cryptographic protocols. These calculi of communicating systems (including CCS, the earliest one invented by Milner) seem only to have been used to prove properties of "short form" interactions in concurrent systems. Cryptographic protocols, with just a couple of parties and less than a dozen messages, are an ideal case for them; more complex systems are still overwhelming.
0 notes