Original post
By Kragen Javier Sitaker, originally written 20140826, last substantive update 20141015, this paragraph added 20161130.
Pseudocode on paper is an important thinking tool for a lot of programmers, and on the whiteboard for programming teams. But our programming languages are very poorly suited for handwriting: they waste the spatial arrangement, ideographic symbols, text size variation, and long lines (e.g. horizontal and vertical rules, boxes, and arrows) that we can easily draw by hand, instead using textual identifiers and nested grammatical structure that can easily be rendered in ASCII (and, in the case of older languages like FORTRAN and COBOL, EBCDIC and FIELDATA too.) This makes whiteboard programming and paper pseudocoding unnecessarily cumbersome; even if you do it in Python, you end up having to scrawl out class and while and return and self. self. self. in longhand. So this page shows some examples of a paperoptimized algorithmic notation this author has been working on, on and off, over the last few years, to solve this problem.
Mathematics has been penciloptimized for a long time, so this notation rips off as much as it can from those guys.
Example: a simple “point” class
Background derived from Seamless Paper Texture by Merry, licensed under CC BYSA 3.0.
point @x @y
@r = @x²+@y² @θ = atan2 @y @x
r
↑ @r
θ
↑ @θ
⼻ Δx Δy
@x, @y ← @x + Δx, @y + Δy
@r ← @x²+@y² @θ ← atan2 @y @x
Alternatively you could write that last method this way, which is probably how this author would normally do it:
⼻ Δx Δy
@x += Δx
@y += Δy @r ← @x²+@y² @θ ← atan2 @y @x
Actually, if caching @r and @θ is warranted, writing a method to recompute them would be better, but the code is left duplicated to have something to put into the constructorbody.
Perhaps the notation should use indentation or a left or right vertical rule joined with the underline to delimit the extent of the class or function, so that you can have a function below a class without it looking like a method of that class.
Complexnumber arithmetic
Let’s look at a somewhat more complete and useful example: a complexnumber class.
ℂ @re @im
+ z
↑ ℂ (@re+z.@re) (@im+z.@im)
neg
↑ ℂ (@re) (@im)
– z
↑ self + z.neg
· z
re = @re · z.@re – @im · z.@im
im = @im · z.@re + @re · z.@im ↑ ℂ re im
conj
↑ ℂ @re (@im)
÷ z
n = self · z.conj d = z.@re² + z.@im² ↑ ℂ (n.@re ÷ d) (n.@im ÷ d)
See, that’s compact enough that you could quite reasonably write it down in the margin of your textbook as you’re riding the train, or on the whiteboard in your job interview, or whatever.
Loops and conditionals
So far, though, there’s nothing particularly “algorithmic” here; this is nearly just a set of equations describing arithmetic on complex numbers, although they do happen to be effective. So let’s look at some things that contain loops. Here’s Euclid’s Algorithm for ﬁnding the greatest common divisor of two numbers, which is sometimes claimed to be the first algorithm. A vertical line separates the whileloop condition on the left from the body of the loop on the right:
gcd a b
b  a, b ← b, a % b 
↑ a
Here’s Imran Ghory’s FizzBuzz. This uses a conditional, which is written as a vertical line crossed with horizontal lines to separate the cases, with the condition for each case on the left and its consequent on the right. The last case has an empty condition, indicating that it’s always taken; it’s an “else” case.
i 1:101  print

And here’s a linkedlist lookup subroutine which returns the list node at which the match was found, if any.
ﬁnd ℓ k
ℓ 

↑ Λ
You need at least two cases so that the lines cross; the “else” case can be empty. This version is equivalent because the consequent contains an early return:
ﬁnd ℓ k
ℓ 
ℓ ← ℓ.next 
↑ Λ
Compare this to the somewhat more specific OCaml:
let rec find l k = match l with Cons(k2, _) when k == k2 > l  Cons(_, m) > find m k  Nil > Nil
Or the Python, where the None return is implicit:
def find(l, k): while l: if l.k == k: return l l = l.next
This author thinks the OCaml and Python versions are more understandable, but a lot slower to write with a pencil and paper or on a whiteboard.
Slices and Quicksort
This author likes using Golangstyle slices, which are subsequences that don’t copy the original sequence; but the notation uses subscripts for array indexing and slicing. Here’s Quicksort:
qsort S
#S > 1  p = part S qsort S_{:p} qsort S_{p+1:} 
part S
i = 0
j :#S 

↑ i – 1
This version of Lomuto’s algorithm is somewhat simpler than its usual form, but it is equivalent.
With regard to the weird S_{^} thing, Python’s ability to refer to the last element of a list easily by saying s[1] is endearing, but it’s bugprone and sort of hard to read for nonPython people; really what we want is some kind of other number for fromtheend counts, one you can’t accidentally reach by erroneous address arithmetic going off the beginning of the list. Thus S_{1̂}, S_{2̂}, and so on, for the last few elements of the list, with S_{^} being shorthand for S_{1̂}. Is this a good notation?
If you wanted an actually practical version of Quicksort, you’d need to avoid its quadratic worstcase behavior on sorted input. If we use “?”, APLstyle, to give us a random integer less than a limit and greater than or equal to 0, then we get a practical inplace Quicksort with a tiny amount of extra pseudocode:
part S
r = ?#S
S_{r}, S_{^} ← S_{^}, S_{r} i = 0
j :#S 

↑ i – 1
That’s probably the simplest practical implementation of a generalpurpose sorting algorithm. There are lots of tweaks you can make to make it more efficient or robust (Hoare partitioning, medianofN pivot selection, dual pivots, recursion elimination with choice of direction to worstcase limit stack depth, and fallback to insertion sort on small arrays, to name a few), but this is the minimum.
Once you have sorted arrays, one of the most basic things to do with them is to binarysearch them, which gives you a dictionary. Binary search is a notoriously bugprone algorithm; it’s most cleanly stated in recursive form. Note the threeway ifelseifelse here.
bsearch S k
#S  p = #S // 2


↑ 0 
This version of binary search returns the position at which the element should be inserted if it’s not present; if the caller is doing an exact lookup, they need to check whether the element at that index is the correct element or not. This allows the same binary search routine to be used for exactsearch, insertion, and ﬁndingnexthighestafter. That’s why it returns 0 in the case of an empty sequence.
But you could reasonably complain that it is unreasonable for a mere binary search routine to be not only recursive but even nontailrecursive, so that it necessarily allocates space and can therefore crash. The above can be easily transformed into this iterative version, which solves these problems:
bsearch S k
b = 0
#S  p = #S // 2

↑ b
Comparison with traditional algorithmic notation: an extended meditation on heapify
Here we have the MaxHeapify algorithm as CLRS describes it. This is probably the most common way that algorithms are described to people studying algorithms; the language is essentially Algol60, minus type declarations and the tricky bits like callbyname.
Fair use analysis: this copying, for the nonprofit purpose of scholarship, is transformative because the copied work is the subject of commentary; the copyrighted work is a published, factual work; the amount used is a quarterpage of a work of over a thousand pages; and it has no significant effect on the market for the original work. As all four fairuse factors weigh overwhelmingly in favor of the legality of this copying under US law, it is indisputably legal. It’s ridiculous that this author even feels that they must worry about this fucking bullshit, but they do.
In the paperoriented notation presented here, it takes somewhat less ink. CLRS uses this separate “heapsize” pseudofunction to keep track of how much of your array contains the heap, but that’s unnecessary if we package up upper and lower bounds and a reference to the array into a single “slice” or “range” object that we pass to the algorithm; as shown before with Quicksort, this yields some simplification.
maxheapify A i
ℓ, r, m = left i, right i, i
ℓ < #A ∧ A_{ℓ} > A_{i}  m ← ℓ 
r < #A ∧ A_{r} > A_{m}  m ← r 
m ≠ i  A_{i}, A_{m} ← A_{m}, A_{i} maxheapify A m 
There are a couple of annoying things about this version, though. It uses mutation and recursion in a totally unnecessary fashion. The recursion is tailrecursion, so it can be replaced with iteration. It is better to write it as follows, using conditional expressions and an infinite loop with an early return in the middle, showing that it runs in constant space.
heapify A i
ℓ, r = left i, right i a =
b =
A_{i}, A_{b} ← A_{b}, A_{i} i ← b 
But if we use a filtered listcomprehension instead, it becomes substantially clearer, although less efficient if you implement it without doing loop fusion (between the listcomp and the argmax) and loop unrolling (to avoid testing a superfluous loop counter):
heapify A i
J =
m = argmax _{k ∈ J} A_{k}
A_{i}, A_{m} ← A_{m}, A_{i} i ← m 
The giant parentheses here are to avoid visual ambiguity about whether the “J =” is part of the loop condition or is a declaration of a variable to receive the results of the loop. This is especially helpful on loops, since loops don’t have a horizontal line to delimit their horizontal extent the way conditionals do.
Compare that again to the verbose CLRS version:
That may be easier to read, but which one would you rather scribble on a whiteboard or a napkin?
Dijkstra’s algorithm
dijkstra ⌂ e
d = { ⌂: 0 }
a = {}
p = { ⌂ }
p  n = argmin _{x ∈ p} d_{x}
p.remove n 
↑ d, a
Compared to Python on some random real code
How does this notation compare to, say, Python for messy realworld code, rather than textbook algorithms? A first function randomly selected from a randomly selected file from /usr/lib/python2.7 ended up selecting quopri.encode, which is long enough that it would be a lot of work. A second random selection was quopri.needsquoting:
def needsquoting(c, quotetabs, header): """Decide whether a particular character needs to be quoted. The 'quotetabs' flag indicates whether embedded tabs and spaces should be quoted. Note that lineending tabs and spaces are always encoded, as per RFC 1521. """ if c in ' t': return quotetabs # if header, we have to escape _ because _ is used to escape space if c == '_': return header return c == ESCAPE or not (' ' <= c <= '~')
In this paperoriented notation, that looks like this:
needsquoting c quotetabs header
c ∈ “␣␉”  ↑ quotetabs 
c == “_”  ↑ header 
↑ (c == ESCAPE ∨ ¬ (‘␣’ ≤ c ≤ ‘~’)) 
There is a linebreak inside that final parenthesized expression that doesn’t indicate sequencing of statements, but it would have unbalanced parentheses if you ended the statement there, not to mention a trailing logicalor.
(You may be wondering why there’s a special case for ESCAPE, since the ESC character comes before space and the other condition should catch that, but it turns out that quopri.ESCAPE is “=”, the quotedprintable escape character, not the ASCII ESC character. This is one of those examples where using a named constant reduced clarity rather than increasing it.)
If you were actually writing this in longhand on paper, though, you’d probably write something like this:
nq? c qt h
c ∈ “␣␉”  ↑ qt 
c == “_”  ↑ h 
↑ (c == “=” ∨ ¬ (‘␣’ ≤ c ≤ ‘~’)) 
Here’s another random function from the Python standard library, mhlib.MH.openfolder. (Did you have any idea Python’s standard library included functions for operating on MH mailboxes? Have you even heard of MH? Are you very old?)
def openfolder(self, name): """Return a new Folder object for the named folder.""" return Folder(self, name)
Well, that’s not very interesting:
openfolder name
↑Folder self name
Here’s another random function, ftplib.FTP.getline, which was the next random choice after imputil.Importer._reload_hook turned out to be shit.
# Internal: return one line from the server, stripping CRLF. # Raise EOFError if the connection is closed def getline(self): line = self.file.readline() if self.debugging > 1: print '*get*', self.sanitize(line) if not line: raise EOFError if line[2:] == CRLF: line = line[:2] elif line[1:] in CRLF: line = line[:1] return line
A very literal translation:
getline
line = @file.readline
@debugging > 1  print “*get*” (sanitize line) 
¬line  raise EOFError 
line_{2̂:} == CRLF  line ← line_{:2̂} 
line_{1̂:} ∈ CRLF  line ← line_{:1̂} 
↑line
Pattern matching
Symbolic, rather than numerical or I/O, computation often involves patternmatching on data structures. Common examples include computer algebra systems, compilers and interpreters, and database query engines. These all involve case analysis on the structure and partial contents of data structures. Doing that kind of patternmatching with explicit conditionals is tedious and hard to read, so functional languages like ML and Haskell provide special syntax to make it easier.
This notation writes pattern matching just like any other conditional, with the conditions on the left (although now they are patterns) and consequents on the right, with the addition of a horizontal line across the top with the expression to be matched against the patterns above it.
A very simple example is a function to turn a nonempty list of strings into an English list using the Oxford comma:
commalist S
↑
S  
[a]  a 
[a, b]  a  “ and ”  b 
[a, b, c]  a  “, ”  b  “, and ”  c 
[a, b, c, d…]  a  “, ”  commalist [b, c, d…] 
Here the patternmatch provides values for the variables in the pattern. The first three patterns match lists of specific lengths, while the fourth matches lists of any length of three or more (although it won’t get a chance on lists of length three, since they’ve already been snapped up by the previous case.)
In C, the closest we get to patternmatching is the switch statement, which is already quite useful. For example, in vicissicalc.c, we have
static void reactor_loop (void) { for (;;) { refresh (); int ch = getchar (); if (ch == ' ') enter_text (); else if (ch == 'f') view = formulas; else if (ch == 'h') col = (col == 0 ? 0 : col1); // left else if (ch == 'j') row = (row+1 == rows ? row : row+1); // down else if (ch == 'k') row = (row == 0 ? 0 : row1); // up else if (ch == 'l') col = (col+1 == cols ? col : col+1); // right else if (ch == 'q') break; … else error ("Unknown key"); } }
which could just as well be written as a switch (see the HN comment thread for why Darius chose not to use a switch here), and which renders in this notation as
reactor_loop
refresh
