magic-bad - Tumblr blog

magic-bad · 8 months ago

Text

In Code Reviews, "Do It THAT Way Instead..."

It is cheap to say, "Do not do it this way". You might also explain why their way is not good. It is good to supply a rationale. However, if that's all you say, then your feedback is very very very deficient, because what the reviewee needs instead is NEW (different) code to replace what they did originally. It is an integral part of your job to help come up with a better alternative, not just say that the reviewee is wrong.

The title of this post is an example of the advice here: it is telling you the GOOD way to do it; it isn't just saying that what you did is bad (and why it is bad).

This is what I call "constructive" feedback. "Constructive" does not simply mean that you are taking care of the reviewees feelings (orthogonally though, you should take care of their feelings). It means that you are supplying code (or at least a fairly clear picture of it) that the reviewee can actually use to satisfy you.

This does not mean you need to paint an exact picture. You don't necessarily need to supply the exact literal code they should use instead (exact literal code is good though!). In many cases, a sketch will suffice. If reviewees can in practice reliably turn your sketch into code that you are immediately satisfied with, then it is sufficient. If you haphazardly drawn three straight lines, and expect the reviewee to turn that into the Mona Lisa, you are very very very failing at one of the core responsibilities of your job.

Sometimes, you are not sure how to do it better. All you know is that the current way has problems. It's ok if this happens occasionally. When you have such a remark in draft, it is your job to spend a little bit more time figuring out how the reviewee can do things differently.

Your job as a reviewer is not to point out flaws per se; that is merely a means to an end. Your actual job is to arrive at good code. As a reviewer, you and the reviewee are partners, colaborators. Remember, you are on the same team. If the reviewee is struggling with your feedback, you are not doing your job.

0 notes

magic-bad · 8 months ago

Text

Tyranny of Robots

Let me give an example. We would like to do

use fruit::{ apple, banana, orange, };

instead of

use fruit::{apple, banana, orange};

But we can't, because the robot insists on the latter. Unsurprisingly, brain-less robots do not know how to write better code than humans. Nevertheless, we put them in charge of (some aspects of) writing code. This is a bit insane.

Yes, there are many cases where robots help us. But I think the thinking is that "it's a robot; therefore, it would never LIE to us; therefore, we should put it in charge of writing code". That is, people have a tendency to put blind trust into robots, more trust than brainless robots really deserve.

Yes, robots are perfectly consistent, and consistency is a good thing. But when a robot does something bad, it does that bad thing consistently! Remember, there's nothing stopping robots from doing a bad thing, because robots are written by... FALLIBLE HUMANS!

Why robots screw up: due to lack of brain, they cannot use the MEANING of code to decide what code should look like. That is, they cannot follow our prime directive: Always be meaningful. There is no straightforward way to implement

fn is_meaningful(code) -> bool { ...

For example, no robot is ever going to be able to say, "this name is incorrect", yet good naming is absolutely essential for readable code. Again, look to our moral imperative: ABM. Get that tattooed onto your retina. What this means is that robots do not solve our most important problems.

I'm not saying get rid of robots. Generally speaking, robots good! Don't throw the baby out with the bath water! I'm just saying, understand that bots are limited. Do not follow them blindly, because they themselves are blind.

0 notes

magic-bad · 8 months ago

Text

Beware of Kitchen Sinks

The word "utility" sounds sophisticated. But what actually happens in practice is that people cannot come up with something more meaningful, so they just slap "util" onto their name, and declare victory. Resist this urge.

For example, suppose you come up with a nice string splitting function. You might think, "this can live in a library named string_utils". You might think that since the word "utility" fits, it is a good name. To illustrate the problem with this, allow me to slightly exaggerate: suppose instead of "utility", I decide to instead use "code" in the name. "code" fits, right? It is certainly true that what you did is code. And yet, nobody would use "string_code" as the name, because that would be vacuous. The same is true of "utils", albeit to a lesser degree. Think about which word is doing the real semantic heavy lifting. Is it "string" or "utils"? This is really another case of marklar.

Don't be creating a place that encourages everyone to dump all their random tangentially related things. What you have created is an abomination, similar to this:

Here's what to do in this example: name the library string_split (or similar). You might be thinking, "A separate library JUST FOR SPLITTING?". Yes. Because bundling bad. The implementation of splitting does not involve joining (and vice versa). They can be separate, and therefore, they probably should be.

Another common "kitchen sink" word: "context". People use "context" when they do not want to name all the ingredients that are actually needed bake a cake. Implicit bad!

Explicit is better than implicit. --Zen of Python

Don't tell me that I need to supply a "fully stocked" kitchen in order for you to bake a cake. What if your last name happens to be Lindt, and as a result, you always had a giant pile of cocoa beans in your pantry when you were growing up. As a result, you do not consider a kitchen to be "fully stocked", unless there is a large supply of cocoa beans. Well, guess what, the rest of us are not named Mr. Lindt, ok?

The solution to your bad "fully stocked kitchen" recipe is simple: if you need cocoa beans to make a cake, just list cocoa beans as an ingredient. This is in fact how ALL recipes work. And algorithms are just data recipes. Nothing more; nothing less. Therefore, always list your ingredients (or the FDA will come after you!). Do not skip that job by saying, "oh you know. Just make sure to supply anything that any 'decent' kitchen would have.".

Now, if Jeeves the butler always does your grocery shopping for you (again, because your name is Lindt), and as a result, you don't know how grocery stores work, you can say, "Here, take Jeeves along with you. He knows what to buy.". Do not ask me to read Jeeves' mind. That's not a thing. It does not matter if I have 57 PhDs in computer science, I will never be able to read Jeeve's mind. Repeat after me: explicit good. Always be meaningful.

0 notes

magic-bad · 9 months ago

Text

Recursive Descent into Madness

One definition of an "algorithm" that I really like is this: a list of instructions. Notice that it does not say "a tree of instructions". That just sounds insane. Nobody can follow that. So why would you write code that's loaded with indentation all over the place?

Let me show a very typical example of "bad" code:

if requirement_1() { if requirement_2() { if requirement_3() { actually_do_the_thing(); } else { err("Requirement 3 is not met."); } } else { err("Requirement 2 is not met."); } } else { err("Requirement 1 is not met."); }

I like to call this the "recursive descent into madness" anti-pattern.

Issues that I'm seeing here:

1. Scalability: Notice that the amount of indentation we need is O(requirement_count). Later, I will show how we can optimize that down to O(1), an infinity percent improvement! I'm sure that in your application, you have some places where more than three conditions need to be met before you can actually do the thing. We really need something that scales.

2. Maintainability: Furthermore, requirements always change. Think about the extensive/invasive code surgery that we would need in order to simply remove requirement 2. Why does removing requirement 2 mean that we have to change the lines that only talk about requirement 3?? When "simple" behavior changes require complex code surgery, that is the very definition of NOT maintainable.

3. Locality: Another very strong sign that this is NOT how we should be doing things: look at how far away `err("Requirement 1 is not met.");` is from `if requirement_1() {`. Clearly, those two lines are extremely related to one another. Yet they are separated by a huge amount of orthogonal code. This is catastrophically bad code locality. Code that is conceptually closely related should be spatially close together. Even for machines, code with bad locality is hostile to instruction caches, and as a result, slower. The human brain is actually not that different from the machine in this regard.

Another sign that something is very wrong here: trying to match up the various `else`s with their corresponding conditions is not so easy. Sure, we use alignment to help, but when the lines are far apart, the effectiveness of alignment at showing correspondences is quickly thrown away. The human eye is just not good at seeing distant alignment. We should be aiming for closely related code to be spatially close together not just horizontally, but also vertically.

Solution

Fortunately, there is an easy solution: linearize. The basic structure you should have in your mind is this:

enforce_requirements(); actually_do_the_thing();

Let's refactor the bad code so that it follows this form:

if !requirement_1() { err("Requirement 1 is not met."); } if !requirement_2() { err("Requirement 1 is not met."); } if !requirement_3() { err("Requirement 3 is not met."); } actually_do_the_thing();

How this addresses the issues in the "bad" code:

1. Scalability: It does not matter how many requirements have to be met, the amount of indentation is always going to be the same. As promised, we have achieved infinity percent improvement from O(n) to O(1). (You really need to pay attention whenever something is O(1), because nothing scales better than that.)

2. Maintainability: Removing requirement 2 entails exactly what you think it should: deleting one chunk of code. Zero extraneous indentation adjustments.

3. Locality: This is literally perfect: if !requirement_i() { is DIRECTLY next to its err(...)';. Also notice that the problem of trying to see distant alignment is completely vanquished.

Bonus tip: locality does NOT only mean that closely related code is spatially close together, but ALSO that less related code is more SEPARATED from one another. Notice the blank lines between code for different requirements. Those chunks are NOT closely related; each requirement is an orthogonal from the others. Therefore, they should NOT be very spatially close together. Blank lines good when used judiciously!

Myth: "Blank" space is an extravagant waste. This was maybe true before we had huge monitors, but we are no longer in those "bad 'ol days". Fact: "Blank" space ALSO carries information. If morse code had no spaces, messages would be one giant continuous tone, which of course, is not actually decodable. Corollary: learn to use blank space judiciously. Indentation is one way to group code (using the x axis), but don't forget about the y axis!

Finally, to come full circle, let me leave you with this:

Flat is better than nested. --Zen of Python

0 notes

magic-bad · 10 months ago

Text

"Marklar" Words, a Naming Red Flag

There are only two hard things in Computer Science: cache invalidation and naming things. --Phil Karlton

People SAY naming is important, but I keep seeing bad naming in the wild (very high WTFs / s). Therefore, it seems everyone is just paying lip service to the supposed importance of naming. Therefore, let me take a moment to convince you that it's ACTUALLY important:

youtube

Turns out, when a word means everything, it means nothing at all. If you pay close attention, this is telling us what "bad naming" actually IS: our words are insufficiently MEANINGFUL. Or to put it another way that is maybe more grok-able to software engineers: the problems is when we use language that has low levels of information content. Content is king. It is in fact, super easy to use many words without transmitting much information. This is what politicians specialize in. To do the same thing in code, simply use bad "marklar" words. Here are some that I have seen many times:

data - Literally the ONLY material that a computer handles is data. This is quite possibly the most generic word you can possibly use. Yet I see people use this all the time. This is the exact computer equivalent of "marklar"!

info - This is just a synonym for data, and therefore, has the same problems.

state - computer = state machine. Once again, everything that a computer handles is some kind of state.

Of course, these word are often accompanied by some modifiers, qualifiers, adjectives. Adjectives good, but if you are using a marklar word for the main noun, then the adjective(s) are doing most if not all of the lifting; whereas, the main noun should be doing most of the lifting. Regardless of clarifying modifiers, marklar words are deadweight. Deadweight is the opposite of engineering. Be rid of "marklar" words!

Verbs

From the previous examples (and the fact that "marklar" itself is a noun), you might think that the problem only arises in the case of nouns. WRONG! Verbs can also be marklar words! Here are some that I've seen flying around:

compute - The only thing that a computer does is... COMPUTE!

do - All operations "do" something. This word does NOTHING to narrow down what the is actually being done!

run - Same as the previous two.

Marklar words are a form of hand waving. When you don't know what to say, but you want the audience to understand, you blurt out "The thing, You Know! The THING!". No, they do not know. Always be meaningful.

Bonus Tip(s)

There are rare and unusual cases where some of the above "banned" words are possibly acceptable. These cases are "the exception that proves the rule". For example, suppose you are developing some networking equipment. To move data around, data is stuffed into envelopes. From an envelope's point of view, it really does not matter at all if the letter inside the envelope is a heating bill to a customer, a love letter to your honey bun, or a declaration of independence to the king of England. The envelope really does not care at all about the contents of the letter. From the envelope's point of view, the letter really is an opaque blob. What's in an envelope really is just "data", because the only thing that the envelope does with it is move the data around. As long as it fits, the envelope does not care.

This exception really shows why marklar is generally wrong; most of us are not envelope engineers. But even if you are, there is a better word: payload. A tell tale sign that marklar might be ok is when you are treating the data as an opaque blob. The only thing you ever do with such data is copy, move, or do byte for byte equality. If you are performing any other operation, then it's not an opaque blob.

In short, unless you are an envelope engineer (most of us are not), keep the word "data" out of your code. But even if you are an envelope engineer, there is very likely still a better word than "data" (e.g. "payload")!

0 notes

magic-bad · 10 months ago

Text

Bundling Bad

You know how when you just want to buy Internet service, but the cable company wants you to also subscribe to TV and phone so that they can charge you more, even though you won't actually use the additional services? That's bundling, and we all hate it.

But we do this in programming all the time. More precisely, you come up with a Widget that is widely applicable, but you put the definition of Widget in your main application code. Now, whenever someone just wants Widget, they have to depend on a library that provides a ton of other stuff that is unrelated to Widget.

The solution is pretty clear, but we are often too lazy to do it, because it does not help you solve your immediate problem(s):

Start a new library, and put Widget into that.

If you do not do this, then every time you make a change that isn't related to Widget, ALL users of Widget will have to be rebuilt, and their tests need to be re-run. This slows everyone (including you!) down by trashing the build cache.

What you are doing is totally silly. You are providing a nail, but every time someone wants just a nail, you are forcing them to schlep the entire hardware store, including the garden hose, paint brushes, drills, and so on. Ok, maybe nails and hammers should go together, because maybe it is highly likely that hammer needs to change whenever nail changes (or vise versa), but certainly, if hose changes, there is no "real/genuine" need for nail users to rebuild and re-run tests. But because of how you (poorly) organized your libraries, the build system thinks, "Wow, this change impacts EVERYONE.". That's because your build system cannot know that changing nail does not actually affect garden hose users. Your build system is counting on YOU to have sensible dependencies, and the first step is to NOT bundle.

In short, do not bundle. Do not be like the evil cable companies.

Bonus Tip

Bazel provides a way for you to avoid people depending on your things: visibility. Like with instance variables, your libraries should be as not visible possible. This advice is just a corollary of a (not widely recognized) principal: need-to-know. It's something that's understood in the security world, but it applies in the general software engineering world too, and we see some instances of that already, such as where in the Google C++ style guide, it says that member variables MUST be private.

0 notes

magic-bad · 10 months ago

Text

How to Use Many Words to Transmit < Epsilon Information

Answer: Be vacuous.

Example: suppose you and a friend agreed to meet. He's a no show. So, you call and ask "Where you is?". He replies, "I am here.". Isn't that an exasperating reply? Don't be that guy!

Forgive me if I'm beating a dead horse, but let me explain: your friend is making a correct statement, but since it cannot be incorrect, he is actively transmitting zero information to you, and not answering your question.

I see this happening in code all the time. It most often looks like this:

// Processes widgets. class WidgetProcessor { ...

This is often done because there is a checklist somewhere that says "everything must be documented". Yeah, ok. You did words. Bravo.

This is probably enough to get past robotic enforcers.

youtube

However, a typical robot enforcer only very superficially inspects your comments. They be like, "Yup, there are > 0 bytes here. Move along.". Do not take the approval from a mindless robot as a compliment to your code.

A dead give away that you are being vacuous: you used words to define themselves. This move is a mental infinite loop. I hope it goes without saying, but from what I've seen, it seems that I must state the following: do not cause mental infinite loops. That is evil.

Successful communication = new understanding by the audience. To understand a new concept, it must be connected back to familiar concepts. This is always possible. You cannot install a roof without first putting up walls for the roof to rest on. No matter how high you go, there is always a path back to the ground. There is no such thing as understanding something in isolation, in a vacuum.

I think one of the excuses people use at this point is, "Yes, but what about

?"

Carl Sagan said, "To bake an apple pie from scratch, you must first invent the universe." That may be true, but we are NOT starting from scratch here. Rather, we begin with the assumption that the audience is "intelligent". What that means exactly will be explained in a future post. For now, let's just say that intelligent != they already know what you know. Others knowing what you know without meaningful communication is clairvoyance, but clairvoyance is not a thing. That's why defining words with those same words does not work. That's why we must connect back to things that they already know. Fortunately, an intelligent audience already knows a large amount. This is why it is practical for you to connect back to things they already know.

0 notes

magic-bad · 10 months ago

Text

We Already Know Your Change "Improves" the Code

Ok, so your change description says that you "improved" Widget (made it "better", or whatever). It's good that you mentioned WHAT was improved (Widget), but you are still missing a key piece of information: How? In what sense? Along what axis? This is super salient information. Do not omit super salient information. (You would think that this goes without saying, but experience tells me otherwise.)

For example, is it more secure? Uses less resources? Simpler? Simply saying that things are "better" says very very little about your change, because

It goes without saying. Nobody makes code changes with the intention of DEGRADING the code!

There are infinity ways code can be better. Your change does NOT improve the code in ALL dimensions.

Always be meaningful. (This 👈 is going to be a recurring mantra of this blog.)

Furthermore, you very likely have enough room in the title to answer the salient question: What is 'better' about Widget? It might not seem like you have enough room, because you already used all the space; however, if you cut out the parts that go without saying, you will find that there is way more space than you thought. This is because you can completely cut out the part where you merely state THAT the code is "improved", "better", etc, and replace that with HOW the code is improved.

E.g. instead of

Improved Widget.

your title can say

Use caching to speed up Widget.

Then, your description can go into further detail. E.g. it can describe the amount of speed up, is it always faster, or are there uncommon cases where it's maybe a little slower, etc.

Corollary: Do not simply state THAT your change fixes a bug in Widget. This is especially true if you use the "fix" Conventional Commit prefix. Instead, your title should try to answer "What was wrong before?" or "What is the new (better) behavior?". Answering just one of those is generally going to give a pretty clear picture the answer to the other. If not, there is more room in the description.

Another way to think about this issue: could your title apply to many other (future) code changes? If so, then you are looking at a big red flag of vacuousness. If so, your title is not very meaningful, and thus, violating the prime directive.

0 notes

magic-bad · 10 months ago

Text

Not All Numbers Are Integers

For example, the velocity of a space craft (m/s could be used), the price of some steel (USD), the amount of ingredients in a recipe (kg). Computers routinely handle these kinds of numbers. These numbers are not exotic at all. Yet integers are not a natural fit for these applications 🤔

Yes, if you pick sufficiently small units (e.g. millimeters per second, USD cents, micrograms), you will have enough precision that you can use integers in your application. Nevertheless, using integers in this way is ultimately shoehorning. When we do that, what we are really doing is reinventing fixed point. E.g. if you choose mm/s as the units associated with your integer, you are really using m/s, but with three decimal places.

One reason we keep on picking fixed width integers (particularly unsigned) is that's what is most straightforward to implement in hardware, and therefore, that's what lower level programming languages give us. It's what's most readily available. When you have a hammer, everything looks like a nail. Failing to touch screws, or worse, trying to hammer them is a (hilarious) mistake.

Remember when as a child, you first learned about fractions, or negative numbers, and it blew your mind? Remember how after getting used to them, your mind was expanded, and a short later, they became natural to you? Do not regress to your primitive less numerate childhood self.

Although 2^64 sounds like a large number, it is dwarfed by almost all integers. Of course, we can't deal with arbitrary integers, but we can be scalable. In order to be scalable, you can't pick something with baked-in limits from the outset. Scalable doesn't mean infinite capacity. It means that as Moore's Law does its thing, you get to keep riding that wave for free.

Believing that L resources "ought to be enough for anybody" is a mistake we see repeated throughout history. The reason people keep making this mistake is that they assume people in the future will be trying to solve the same problems that we are facing today. What actually happens when a resource becomes more plentiful is that people generally find exciting new ways to exploit that resource even harder. Humans are gold fish.

This is especially true of computational resources. People are not content to merely solve the same problems faster when beefier computers come along. Rather, bigger problems suddenly become feasible. (BTW, this is a very good thing if you have chosen a career in software!) E.g. the prodigious amount of energy required to train AI is just the most recent well-known example of "when resources are more plentiful, people find new things to do with it". This is why we should be scared for when Moore's Law finally runs out of steam. When that happens, innovation will suddenly stop advancing as fast as we've grown used to.

A closely related idea: if you are not using units, you are leaving half of your brain on the table. Unsurprisingly, when you are only using half your brain, you a liable to cause the loss of a $200e6 spacecraft. This is maybe something you don't want on your resume.

The moral of the story: do not be satisfied with unsigned 64-bit integers. If you have to use them for some reason, be defensive, because they are not a proper solution for most applications. In math, there is no such thing as overflow. That only happens in computers.

0 notes

magic-bad · 10 months ago

Text

Writing Tests Is NOT the Hard Part

Basic anatomy of all tests:

Prepare the world.

Call the code under test.

Inspect result(s).

Usually, phase 1 is what makes tests "hard to write". Phase 1 is often when you find yourself jumping through flaming hoops, contorting yourself just to arrive at the starting line where you can actually run the shiny new function you just wrote.

Key insight: creation of such hoops does not occur when you start writing tests. Rather, the problem was created earlier, when a past version of yourself wrote the production code. Not sure if people realize this, but that is one of the main points here. The hard part is writing the production code in a way that we can write tests easily. In other words, the challenge is how to write testable code.

You do not feel the pain while you are creating the problem of code that's difficult to test. You only feel the pain later, as you are writing tests. Due to temporal locality, you are likely to misattribute your pain to tests being intrinsically difficult to write. But that is not where the road to perdition begins. Rather, your woes are (ironically) created by your past self, when you wrote the code under test.

And so the real question is not so much "How do I write tests?", but we must rather ask, "How do I write testable production code?".

Negative Example 1: Print

Let's think about how tests detect bugs in our code: they 1. feed in some data, 2. watch what pops out, and 3. compares that to what is supposed to pop out. If what pops out is not what was supposed to pop out, then bug detected.

Let's think about how we can make that job more difficult. One way is to make it difficult to see what pops out. A concrete example: let's say we have a function. Instead of returning a value, it prints to stdout. The problem here is that in order for tests to observe what "pops out", they can't just grab the return value (because there is none). Instead, they need a way to see what you shoved into stdout.

The fix is easy to describe: the function should not print, but rather return. The moral of this example is pretty clear at this point: return value good. As much as possible, use return value, and avoid transmitting output in other ways (e.g. stdout).

Negative Example 2: Output Parameters

A less naive alternative way of transmitting output is "output parameters". A simplified example,

def bulk_up(factor, widget): widget.mass *= factor

What's popping out here: changes to widget. Therefore, tests must inspect widget (in particular, its mass field), after calling bulk_up.

Thinking back to the moral of the story from example 1 suggests a possible alternative design: bulk_up could instead return a modified copy of widget (and not change the input object). That's an alternative, but it might not be the best choice in every situation. If widget is very big, returning a modified copy of it might be too expensive. It depends. Nevertheless, whenever you see output parameters, you should be skeptical, and think "Why not return value?".

There's more that we can say on the topic of how to (not) write testable code, but this is enough for now. Watch this space.

0 notes

magic-bad · 10 months ago

Text

Introduction

Hello, world! This blog is about how (not) to write code. These are my opinions. As such, you ignore them at your own peril 😱

The premise of this blog is an embarrassing truth: turns out, we only have a faint idea of how to write "good" code. Here is what the venerable xkcd has to say about that:

My hobby: citing xkcd.

In order to fix a problem, we should try to understand where it comes from. To answer that, let's take a look at another piece of classic programmer humor:

Here is what I think we should glean from this strip: bad code comes from not seeing our code through the eyes of others. Walking in someone else's shoes is hard (hence even good code has > 0 WTFs /min). This will be a recurring theme of this blog. Check out this principal in the venerable Google's C++ Style Guide (another thing that I like to cite):

Optimize for the reader.

I'm sure they are not the first nor the last person to say something like this, but hopefully, by dropping the name "Google", I can lend some gravitas to this commandment. Let me put it another way: it really doesn't matter if your code makes sense to YOU (the author). What matters is that your code is readily understandable to the next poor sap.

(The next poor sap might be a future version of yourself, a version that does not remember all the things that are currently fresh in your mind. You might not want to hear this, but science tells us that we are bad at putting ourselves into the shoes of our past selves. For example, people have a tendency to misremember what opinions they held in the past, thinking that their old opinions are closer to their current opinions. We really cannot get out of our own shoes, not even to get into our own old shoes!)

Tangent

Will I actually keep posting here? I'm actually optimistic, because I often write similar feedback in code reviews. That will give me motivation to post here on a regular basis, because if I already posted my opinion here, I can cite myself when the similar issues come up again later (which it often does).

0 notes