
nsfmc · 7 years ago


A checklist for computer science undergrads

influenced by john regehr's 'basic toolbox' post about this topic, i thought i would throw my hat into the ring given that my experiences have been different than john's and seem to be at odds with what i have observed from working with many competent developers.

As i was leaving grad school, a friend of mine suggested to me that a winning strategy in Industrial Design had been to pick some medium that you worked well in and focus on doing all your work with that. The rationale here was that starting anew each project with a new medium invariably hurt the execution of the final deliverable, distracting your prof/critic/peers from the high-level feedback you actually wanted on your work, creative vision, etc.

The advice there is to focus less on the tool and more on using a tool efficiently to communicate your ideas. In most cases it does not matter what the tool is as long as you can deploy it to solve problems in your domain.

Much of the tooling that exists in CS is directed at very specific users: working programmers. using these tools correctly as an undergrad is aspirational, but often their execution is distorted in academic contexts.

Every lab or workplace should expect to bootstrap new hires on internal tooling/workflows and almost none of them should assume prior knowledge. Depending on the aims, the only hard requirement should be ability to program in a language or framework similar to the one being used.

Core skills

A single programming language

You do not need to be ultimately proficient in every language, you just need to be able to sketch out and implement a solution to most problems you encounter in one language you enjoy working in. Which language you pick does not matter. If you are in john's classes, however, you should probably ensure that you know two languages: a compiled/systems-ey one (rust, go, c, java, swift, clojure, etc) and a scripting language (python, ruby, javascript, clojurescript, elm, mathematica, anything goes here as long as it has a repl or runtime that you can use to hammer out solutions to problems).

If you're not one of john's students, typically the scripting language will suffice (although it is generally rare to finish a cs program being exposed to only one language).

s/Text Editor/Touch Typing/i

The advice to be familiar with a text editor is largely a request from others who expect you to competently pair-program with them at their pace. The point of knowing an editor is much the same as knowing at least one language passably: it should not be something that gets in your way.

More essential than being comfortable with a specific editor (it honestly does not matter which one as long as you like using it and you are productive with it) is being comfortable touch typing. In the event that slack or other IM platforms have not made you a better touch typist, it is well worth investing time if only so that the act of writing anything is no longer a major time hindrance.

At some point, you may find yourself bored or in need of procrastination and decide you want to customize your editor: that is a perfect time to try something like sublime or atom or vi or emacs.

rough shell experience

you should be able to navigate around a filesystem, make directories, read directory listings and read the cli help documentation for most commands.

you absolutely do not need to know the details of your shell's preferences around glob expansion or how to write legible shell scripts. you can learn that, but after a certain point, all the obscure functionality ends up being more "dev-ops" style knowledge that rarely pays any dividends except when developing commercial developer-facing internal tooling.

incidentally, getting students past the hurdle of commandline BS is almost certainly a job of an advisor (or postdoc). Ignoring it helps nobody and if a research project's documentation (q.v. below) is poor or nonexistent, the PI only has themselves to blame for this ongoing time commitment.

reading documentation

this is probably the weakest skill i have seen from folks coming out of undergrad. nobody expects you to know all of a language, all of its quirks, etc etc. what you are expected to know is how to find the answer to any reasonable question around your language or toolchain of choice.

A useful skill: you should be able to, given a stylized block of shell commands, paste those into your terminal one-by-one in order to bootstrap some project, e.g. ./configure && make && make test. nobody should expect that you understand autoconf unless your research project is specifically devoted to it in some obscure way (i'm sorry if this is the case).

Specifically, you do not need to know how to parse an excel-formatted csv, but you should know where to look (or be able to find a solution) in order to do that in a reasonable amount of time. You do not need to know what an ideal runtime serialization format is for your language, you only need to call back on the terms you learned in your cs classes: marshalling, serialization, persistence, writing data, etc. although it can be useful at the extremes, be skeptical of the amount and quality of programming language trivia you know offhand.
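as a rough illustration of the "know where to look" point: if your language happens to be python, a few minutes in the standard library docs turns up the csv module, which ships with an "excel" dialect. this is only a minimal sketch under that assumption; the sample data and the parse_excel_csv name are made up for the example.

```python
import csv
import io

# made-up sample in the style excel exports: quoted fields may contain
# commas, and a literal quote inside a field is written as two quotes
SAMPLE = 'name,notes\r\n"Doe, Jane","said ""hello"""\r\nada,plain\r\n'


def parse_excel_csv(text):
    """Return each data row as a dict keyed by the header row."""
    # newline="" leaves line endings untouched so the csv module handles them
    reader = csv.DictReader(io.StringIO(text, newline=""), dialect="excel")
    return list(reader)


if __name__ == "__main__":
    for row in parse_excel_csv(SAMPLE):
        print(row)
```

the point is not to memorize this, only to know that "csv dialect" is a thing you can search for.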

writing documentation

no, this is not technical writing. this simply means you should be able to write a plain text file for each project that outlines

how to build some program

what its implicit dependencies are

what its arguments are

what the exposed/public api is

aside from being useful to others, in roughly six weeks or half a semester, this will invariably be of use to future-you as well.

a good acid test here is pointing a friend to the project and asking if they can build it and understand how they might use it. at some point you will embed this knowledge into a Makefile, shell script, or some other dsl, but until then it is infinitely more useful to write down the steps.

html

unless this is your job (or you intend it to be) you only need to know how to make an academic-level webpage which requires only the most basic knowledge of semantic html: h1, h2, ul/ol li, p, a, img, pre, strong, em (optionally hr, dl dd & dt). avoid css. if anyone gives you shit, you can invoke "Default Systems" giving you a perfectly valid excuse to stop devoting any more attention to design after you have mastered those tags.

reproducing errors

it is unclear when you are an undergrad or novice if you have encountered a truly exceptional case or if you simply have no idea what you're doing. Make a habit of reproducing and then writing down steps to reproduce edge cases you encounter and share them with people you ask for help from.

above and beyond, if you can identify the specific step (or code or whatever) that you invoke that (seemingly) causes the error, you will have an easier time teasing apart the nature of the bug as you are telling someone else about it.

the most basic of data visualization skills

all this means is that nobody is actually good at doing this and everyone thinks that two hours peeking at ggplot2 has made them wizards at communicating the complexity of some dataset or results. it hasn't.

in many cases it suffices to be able to graph something from mathematica, R, d3, matplotlib, or google sheets / excel. again, nobody cares how you do it as long as you do it and it doesn't take you all day. if your lab or workplace has some in-house style for doing this, they will need to train you how to do that anyway.

nonlinear spider-sense

the single reason "big o" notation is taught in school is so that at some point you can look at a performance regression and say "ha, that almost looks like a parab—o.m.g." the ability to recognize code or performance that appears nonlinear (or pathologically exponential) is probably one of the core things that i think undergrads should try to hone because during almost no other time will you be asked repeatedly, and at length, to explain the space/time complexity of arbitrary blocks of code.

computers are fast enough that you can usually be blasé about performance but eventually you'll start looking. being able to recognize something that is accidentally quadratic is often the most practical day-to-day application of cs theory—hone this spider sense.
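for a concrete picture of "accidentally quadratic," here is a small sketch in python (the function names are invented for the example). both functions remove duplicates while keeping order, but the first one checks membership against a list, which is a linear scan hidden inside a loop.

```python
import time


def dedupe_quadratic(items):
    """Keep first occurrences; `in` on a list scans it, so this is O(n^2)."""
    seen = []
    for item in items:
        if item not in seen:  # linear scan of the list, once per item
            seen.append(item)
    return seen


def dedupe_linear(items):
    """Same result; set membership is O(1) on average, so this is O(n)."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


if __name__ == "__main__":
    # doubling n should roughly double the linear time and roughly quadruple
    # the quadratic one; exact numbers depend on your machine
    for n in (1000, 2000, 4000):
        data = list(range(n))
        for fn in (dedupe_quadratic, dedupe_linear):
            start = time.perf_counter()
            fn(data)
            elapsed = time.perf_counter() - start
            print(f"{fn.__name__:>17} n={n:>5} {elapsed:.4f}s")
```

nothing in the first version looks slow on its face, which is exactly why the spider-sense is worth having.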

Nice to haves

Version Control

there is a large chasm between "git for one" and "using git as a team" and that harsh valley is almost certainly due to the large amount of human communication and coordination required to work on a project as a team. Most people stress learning git, but this is largely useless advice because most of git or hg's corner cases and weirdness only come up when you're trying to integrate your work successfully among your teammates. It is good advice to perhaps become vaguely competent using git or mercurial or rcs, but that experience will almost certainly pale in comparison to the massive flail when you are trying to set up multiple worktrees to create integration branches that contain the contents of multiple prs (each likely with their own rebase/merge/squash quirks).

to that end, you should learn to, say, create a commit and push your work, but everything else beyond that is almost certainly guaranteed to be complicated by whatever your team's workflow is (github prs, phabricator, gerrit, etc). i have rarely met people outside of professional or open source contexts that are capable of producing sensible chained commits or sane pull requests, it is simply not a skill that is required outside of contributing to open source or working on a commercial application. When people ask for git experience they secretly crave this flavor of professionalism that it took months to acquire at each of their prior jobs or internships.

A Presentation Tool

the baseline here is very low, you only need to be able to make a presentation and in all likelihood if you are still an undergrad, you easily have ten-plus years of doing this already. worry about fonts/design/transitions/etc once your content is solid.

most people produce terrible presentations making the needed baseline here quite low—it is more important that you know how to practice giving a presentation than it is to actually create the slides for it.

debugger knowledge

i have met many successful professional working programmers that have little to no idea how their language's debugging tools work. if you are a gdb wizard this sounds shocking on its face but lots of developers make do just fine without them. This is not to say that you should be willfully ignorant of debuggers or eschew them (especially if this is part of your curriculum), but nobody should look down on you if you learn (or are taught this) On The Job.

many of these tools are technically robust but have a ui only moderately less hostile than an opaque box of loose razorblades and chocolates. much like git, most developers internalize some form of stockholm affection for these tools despite their poor design, nonexistent editor integration, and often incomplete terminal support.

you should understand roughly what a debugger is and what it can (and can't) do, but it's almost certain that you won't need to have mastered debugger internals straight out of college.

build systems

this is honestly a "top of maslow" need. This is great knowledge if you are planning to distribute code or need it to build dependably/reliably on others' computers, but it is absolutely inessential for an undergrad to understand how to do this level of orchestration except as documentation for others to evaluate that your project actually builds etc etc. if your advisor or boss asks you to learn something like make or whatever, then by all means.

You should know what a make tool is for and when it is necessary, but you should not expect that to apply to the lion's share of work you do in school.

working for a period of time before asking for help

although this should be a core skill, many adults are incapable of doing this effectively. there is a tradeoff between "i'm learning" and "i'm being unproductive." In an academic lab, arguably much of your experience will appear to be some quantum state that simultaneously inhabits both extremes, but your goal should be attempting to independently arrive at a solution. after some time cut-off (which you should negotiate with your advisor/postdoc/pi/whatever) you should say "i tried $A, $B, and $C to accomplish $GOAL and was unable to make any progress because $ERR_A, $ERR_B, and $ERR_C."

even the act of noting down "what i am trying to accomplish, how i tried, what went wrong" may in itself lead you to a correct solution, but without having done that due diligence and outlined those aspects, it will be difficult to receive good feedback from somebody that is trying to help you.

unit/integrated/etc testing

if you find that something like TDD is useful for you as a productivity or refactoring tool, keep doing that! most working software people cannot even agree on what the point of testing is, so it feels unfair to burden undergrads with this. in a professional context, you will be in a codebase with some established testing norms, you need only mimic those until you have determined what works for you.

there are lots of sane and sensible resources for writing tests or thinking about tests. understand that everyone does testing slightly differently so your best bet will be to figure out how testing plays a part wherever you go. in most cases, that codebase will have a specific incantation to invoke tests, your best bet is to ask how they do things there and just go from there if the setup is not obvious.
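if you have never seen one, this is roughly what a small unit test looks like. it is a sketch using python's built-in unittest module, not a recommendation of any particular testing style; slugify is a hypothetical function invented as the thing under test.

```python
import unittest


def slugify(title):
    """Hypothetical function under test: lowercase, words joined by hyphens."""
    return "-".join(title.lower().split())


class SlugifyTest(unittest.TestCase):
    def test_spaces_become_hyphens(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_extra_whitespace_is_collapsed(self):
        self.assertEqual(slugify("  a   b "), "a-b")


# one common incantation, run from the directory containing this file:
#   python -m unittest
```

a real codebase will likely wrap this in its own runner or script, which is why asking how tests are invoked there is the first step.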

understanding scope

most academic projects are poorly managed because they have inconsistent pressure to be profitable beyond whatever funding inspired them. simultaneously, many academic advisors are not trained well to manage or lead a team (remember, most were hired to write grants and produce research papers (or possibly to teach)). management is something an advisor is literally picking up "on the job".

If you are unsure what exactly you are supposed to do, you should clarify as soon as possible what deliverable is expected and when it is due. This seems obvious, but because communication is complicated you may end up assuming you need to, for instance, resolve outstanding cli argument parsing bugs rather than only needing to add support for a new one. Understanding the scope of a project you've been assigned prevents you from doing redundant work or opening prs that will never get merged.

language idioms

If you are cozy with a programming language, the natural evolution here is to begin learning what idiomatic programming is like for it: what are common libraries, do people tend to program it functionally or imperatively, for or map?, what patterns are awkward or hard to read, what are common tools in its toolchain, how do people use it to write web services, how do people use it to avoid shell scripting, what are its performance pathologies, etc. this is the extension to knowing how to read the documentation: it is developing intuition about the language to avoid doing counterproductive work in the future.

Many developers learn one language and become fluent in its quirks then proceed to apply those to every language they see later on. if you encounter this as a novice, it may appear that they are simply Better Programmers and not, instead, people who are speaking a pidgin-python with a heavy haskell accent.
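to make the "accent" idea concrete, here is a sketch of the same computation written twice in python. both versions are valid and return the same answer; the function names and the word list are made up for the example, and which style counts as "idiomatic" is a community convention rather than a rule.

```python
from functools import reduce

WORDS = ["git", "make", "gdb", "emacs"]


def total_long_word_length_accented(words):
    """Valid python with a functional accent: nested map/filter/reduce."""
    return reduce(
        lambda acc, n: acc + n,
        map(len, filter(lambda w: len(w) > 3, words)),
        0,
    )


def total_long_word_length_idiomatic(words):
    """What most python code would do: sum over a generator expression."""
    return sum(len(w) for w in words if len(w) > 3)


if __name__ == "__main__":
    print(total_long_word_length_accented(WORDS))
    print(total_long_word_length_idiomatic(WORDS))
```

neither author is a Better Programmer; one of them just learned a different language first.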

To recap

It is something of a mistake to hope that a cs student will have the gradually developed and refined skills of a professional tradesperson. Graduating cs students often do not have strong professional software development experience (this is what internships are meant to accomplish) but are good at thinking about design/architecture. if, at the very minimum, as an undergrad you can churn out some ruby and have the runtime execute it, you're usually in great shape.

most cs programs do not train students to develop tightly crafted applications with industry-tested documentation/syntax/structure/workflows etc. bootcamps, however, do stress this sort of thing, which causes a confusing periodic wave of "college is dead, long live bootcamps."

when looking at job descriptions or other checklists, it's useful to try to gaze back at the abyss and ask "why was this listed here?"

John's research is compiler-focused, deals with undefined behavior, and often invokes llvm, c, and other "low level" toolchains. a strong undergrad cs student will be able to intern with john productively because the core of his research focus is mostly general to computer science: correctness, compiler behavior, etc. someone with deep knowledge of C, llvm, compiler design/internals, etc is almost certainly in a position to become one of his graduate students or postdocs. I think john's list is interesting, but i think it emphasizes details that are often foreign to developers at all skill levels.

finally this list is biased itself, so take it with a grain of salt: all my work experience is in design and frontend/backend web development and the skills listed here represent the qualities i've observed from successful interns and developers i have interviewed and worked with in the past ~ eight years. my experience is clearly n=1, but among the things i've noticed is that it's easy to get people to learn git, but it's hard to get somebody to internalize recursion, nonlinear growth, or canonical architecture patterns within the same time period. i'm not saying it's impossible, but if you're a cs student, this is 100% what the point of most cs programs is.
