Category Archives: development

A Handy Web Server in Emacs Lisp

As I mentioned in my last post I’m still using Emacs. One of the big reasons is that I do the vast majority of my work as a Data Scientist over a text based terminal.

This differs substantially from the workflow of many of my colleagues. I’m not sure I’d say my workflow is optimal (though the case could be made), but it is foisted upon me by circumstance: my wife is a farmer and we have very low quality internet. Barely broadband, in fact. Consequently, if I want to work remotely, I have to make an effort to use the lowest bandwidth tools available to me. For me that is mosh which maintains a persistent connection to my remote machines and papers over some of the latency and Emacs.

Screen Shot 2019-10-13 at 8.06.58 PM

How the internet gets to my house.

Its important for me that all my data scientific work is reproducible and so I do most of my development in a Docker environment. I also break it up into discrete pieces which I orchestrate with Make. If you are a data scientist or any kind of scientist, you might be wondering how I manage this, since most of our work consists of generating and pouring through lots of figures.

The answer is that I dump all my figures directly to disk and then I look at them over a basic web server. Up until recently, that was something like this:

python3 -m http.server 8080

As I generate figures, I just pull them up in my web browser. This workflow works surprisingly well with tools like plotly, though my usual practice is to generate a pdf, png and plotly version of every figure. When I’m on a low bandwidth connection, I’m usually looking at those PNGs.


This works fine. However, I generate hundreds of figures. The python one liner above always lists the contents of a directory in alphabetical order, which can make it hard to find precisely the thing I’m looking for. In addition to that, I’ve noticed Python 3’s built in web server tends to choke on large HTML files without linebreaks (which is precisely what plotly generates).

After shopping around for a slightly nice solution without much luck (a bunch of low configuration web servers are faster or smarter than Python, but none let me sort by date, as far as I could tell) I just decided to write my own.


The tool needs to show me the most recently modified files underneath a specific directory. Its always going to be running behind a firewall but it should present a pretty small security cross section. It should be minimal in terms of requirements.

I like to write Lisp whenever I can so I decided to write the tool in Emacs Lisp. The added benefit is that it can run directly in the environment I use to do my work.

I don’t talk often about this, but Emacs Lisp is one of my favorite programming languages and Lisp dialects.

Firs things first. M-x package-list-packages Then let’s install the web-server package, which is what everyone seems to be using these days (if anything). We’ll also be using my own shadchen ML-style pattern matching library (which is a bit amateurish but totally serviceable).

We’re going to try to follow good Emacs Lisp conventions. Shadchen doesn’t but I’ll try to rewrite it sometime soon.

;; -*- lexical-binding: t; -*-

Because we’re civilized, modern, people.

(require 'shadchen)
(require 'web-server)

We’re going to need to generate some basic HTML. Consider:

(require 'shadchen)

(defun fserver-el (tag attributes &rest args)
  (cons tag (cons attributes args)))

(defun fserver-render-html-worker (html buffer)
  (with-current-buffer buffer
    (let ((tag (car html))
          (attributes (cadr html))
          (args (cddr html)))
      (insert (format "<%s" tag))
      (if (> (length attributes) 0)
          (insert " "))
      (loop for (a . r) on attributes by #'cdr
            (match a
              ((p #'symbolp s)
               (insert (symbol-name s)))
              ((list (p #'symbolp s)
                     (p #'stringp str))
               (insert (format "%s=%S"
              ((list (p #'symbolp s)
                     (p #'numberp n))
               (insert (format "%s=%S"
                               (number-to-string n))))
              ((list (p #'symbolp s)
                     (p #'symbolp s2))
               (insert (format "%s=%S"
                               (symbol-name s2)))))
            (if r (insert " ")))
      (insert ">")
      (if (> (length args) 0)
          (insert (format "\n")))
      (loop for (a . r) on args by #'cdr do
             ((stringp a) (insert a))
             ((listp a) (fserver-render-html-worker a buffer)))
            (insert (format "\n")))
      (insert (format "</%s>\n" tag)))))

(defun fserver-render-html-to-string (html)
    (fserver-render-html-worker html (current-buffer))
    (indent-region (point-min) (point-max))
    (buffer-substring (point-min) (point-max))))

(defmacro fserver-html (&rest body)
  `(cl-flet ((e (&rest args) (apply #'fserver-el args)))
     (fserver-render-html-to-string ,@body)))

With this little friend we can write the following sorts of code:

 (e 'div '((class nice-list))
  (e 'ul nil
   (e 'li '(selected) "Element 1")
   (e 'li nil "Element 2"))))

And get back the following string:

"<div class=\"nice-list\">
    <li selected>
      Element 1

      Element 2



Isn’t Lisp great?

That covers generating HTML. Surprisingly simple.

Servicing HTTP Requests

Now we need our server.

(defvar *fserver-server* nil)
(defvar *fserver-port* 8888)
(defvar *fserver-handlers* nil)

(defun fserver-add-handler (matcher handler)
  (setq *fserver-handlers* 
        (append *fserver-handlers* (list (cons matcher handler))))

(defun fserver-clear-handlers ()
  (setq *fserver-handlers* nil)

(defun fserver-restart ()
  (if *fserver-server* (ws-stop *fserver-server*))
  (setq *fserver-server* 
        (ws-start *fserver-handlers* *fserver-port*)))


Now we just M-x fileserver-restart. Of course our server doesn’t do anything.

Let’s add some handlers. First a “Hello World”.

 '(:GET . "/")
 (lambda (request)
   (with-slots (process headers) request
     (ws-response-header process 200 `("Content-type" . "text/html"))
       (e 'html nil
          (e 'head nil
             (e 'title nil "Hello World"))
          (e 'body nil
             (e 'div nil "Hello World"))))))))

Try it out, if you’re following along. It works!

Design of the Service

Our server is going to let us do two things: list files associated with a project and then serve those files.

We don’t want to expose our entire hard drive or just a single sub-path to the internet. So the server side will configure a list of project names and their local directory head.

(defvar *fserver-projects* nil)
(defun fserver-add-project (project-name root-directory)
  (setq *fserver-projects* (cons `(,project-name . ,root-directory)

(defun fserver-clear-projects ()
  (setq *fserver-projects* nil))

(defun fserver-get-project-root (project-name)
  (cdr (assoc project-name *fserver-projects*)))

Now we can design our first resource: one that lists all the files in a project, ordered by date:

GET /project/<project-name>/<filter>

Should list all the files beneath the project root (except for some obvious stuff like git contents and temp files). Filter is a set of characters that has to appear in the filename or it isn’t shown. If filter is empty, then all files are listed.

For the sake of simplicity we’re going to limit project names and filters to alphanumeric characters including underscores. This is easy to match as a regular expression, which is what we need to provide to our handler as a first pass at matching:

Here is a first jab:

(defun fserver-project-p (project-name)
  (not (not (fserver-get-project-root project-name))))

(defun fserver-valid-project-path (path)
  (let* ((parts (split-string path "/" t)))
     (string= "project" (car parts))
     (fserver-project-p (cadr parts)))))

(defun fserver-parse-project-path (path)
  (if (fserver-valid-project-path path)
      (let* ((parts (cdr (split-string path "/" t)))
             (project (car parts))
             (root (fserver-get-project-root project))
             (pattern (cadr parts)))
        (list project root pattern))

(defun fserver-handle-project (request)
  (with-slots ((process process)
               (headers headers))
    (match (fserver-parse-project-path (cdr (assoc :GET headers)))
      ((list project root pattern)
       (ws-response-header process 200 '("Content-type" . "text/plain"))
       (let ((files (shell-command-to-string
                     (format "find %s -type f" root))))

Which we register like this:

 (cons :GET
        (rx (and line-start "/project/"
          (one-or-more (or alphanumeric
                           (char "_")))
            (char "/")
            (one-or-more (or alphanumeric
                             (char "_")))
            (zero-or-more (char "/")))
           (zero-or-more (char "/"))))))

(We’re using Emacs’s excellent regular expression construction facilities.)

fserver-parse-project-path validates that we have a good project, parses out the name, retrieves the associated root location, and also extracts the pattern, if any.

Then we just use that information to invoke find and get a raw file list. We want this list to be filtered down by the pattern and to exclude a few other things and then to be sorted by date-time. Finally, after we figure that out, we’re going to convert it to HTML.

We can prune out the git stuff with

(format "find %s -type f | grep -v \\.git" root)

But what is the best way to sort by date?

Something like this is pretty portable:

(defun fserver-stat-command ()
  (match system-type
    ('darwin "gstat -c '%Y --- %n' ")
    ((or 'gnu 'gnu/linux)
     "stat -c '%Y --- %n' ")))

(defun fserver-handle-project (request)
  (with-slots ((process process)
               (headers headers))
    (match (fserver-parse-project-path (cdr (assoc :GET headers)))
      ((list project root pattern)
       (ws-response-header process 200 '("Content-type" . "text/plain"))
       (let ((files (shell-command-to-string
                     (format "find %s -type f | xargs %s | grep -v \\.git | sort -r " root (fserver-stat-command)))))

I’m checking whether I’m on Darwin in order to call the appropriate version of stat there. We’ve also filtered out the git subdirectory. Now let’s construct links. We are going to want our files to be accessible via the server at

GET /files/<project>/<path>

Where path is relative to the project given. So we want to replace the path returned by our shell call with the appropriate material. And we may as well convert it to HTML while we’re at it. Its more efficient using a regular expression to do this than it is to write it out in Elisp:

(defun fserver-handle-project (request)
  (with-slots ((process process)
               (headers headers))
    (message (format "%S" (cdr (assoc :GET headers))))
    (match (fserver-parse-project-path (cdr (assoc :GET headers)))
      ((list project root pattern)
       (ws-response-header process 200 '("Content-type" . "text/html"))
       (let* ((files (shell-command-to-string
                      (format "find %s -type f | xargs %s %s | grep -v \\.git | sort -r " root (fserver-stat-command)
                              (if pattern (format "| grep %s " pattern) ""))))
              (retargeted (replace-regexp-in-string (regexp-quote (file-truename root))
                                                    (format "/file/%s" project)
              (linkified (replace-regexp-in-string
                          "\\([0-9]+\\) --- \\(.*\\)"
                          "<li><a href=\"\\2\">\\2</a></li>"
                (e 'html nil
                   (e 'head nil
                      (e 'title nil (format "Project: %s" project)))
                   (e 'body nil
                      (e 'h1 nil (format "Project: %s" project))
                      (if pattern
                          (e 'h3 nil (format "Just those matching: %s" pattern))
                      (e 'ol nil linkified))))))))))

Now all we have to do is implement the file serving resource. This is enough to parse and validate the resource for a file request:

(defun fserver-parse-file-path (url)
  (let* ((parts (split-string url "/" t))
         (project (cadr parts))
         (rest-of-path (cdr (cdr parts)))
         (filename (file-truename
                    (replace-regexp-in-string (regexp-quote (format "/file/%s" project))
                                              (fserver-get-project-root project)
    (if (and
         (file-exists-p filename)
         (fserver-project-p project))
        (list project (file-truename (fserver-get-project-root project))
               (replace-regexp-in-string (regexp-quote (format "/file/%s" project))
                                         (fserver-get-project-root project)

Our handler is then relatively straightforward:

(defun fserver-handle-file (request)
  (with-slots ((process process)
               (headers headers))
    (match (fserver-parse-file-path (cdr (assoc :GET headers)))
      ((list project root file)
       (ws-send-file process file)))))

This does the job – but our browser just tries to download any file we grab this way. We need to specify the Mime-Type for at least a few types of files if we want to get the desired behaviors. For instance, for HTML and image types, we want the browser to open them.

This leads, finally, to:

(defun fserver-get-mime-type (file)
  (match (downcase (fserver-file-extension file))
    ((and (or "png" "jpg" "jpeg" "gif" "bmp")
     (format "image/%s" ext))
    ((and (or "html" "htm")
    ((or "txt" "md" "Rmd" "text" "csv" "tsv")

(defun fserver-handle-file (request)
  (with-slots ((process process)
               (headers headers))
    (match (fserver-parse-file-path (cdr (assoc :GET headers)))
      ((list project root file)
       (ws-send-file process file (fserver-get-mime-type file))))))

 (cons :GET
        (rx (and line-start "/file/"
                 (one-or-more (or alphanumeric
                                  (char "_")))
                 (char "/")
                 (one-or-more anything))))

And now we’re done, pretty much. A simple, useful, file server we can control from inside emacs.

Emacs Apologia (2019)

Its 2019. I’ve been using Emacs for more than a decade and I’m not inclined to stop. Sometimes, my colleagues get on my case about it – why not use (for instance) RStudio or Jupyter or whatever other IDEs are floating around out there.

They’ve got a point: if you’re doing something, its hard for Emacs to beat a custom solution which usually has much bigger mind share and corporate support to boot.

But most of the time I’m not doing one thing. I’m doing a few, related, things and its in this context where Emacs shines. I tell my friends that Emacs is a general purpose text-based task thingamajig.

Imagine the scene!

You’ve got a problem you’re working on in R. Because you’re extremely professional, you do your work in a dev environment which is reified as a docker image. You realize you need to add a dependency – so you just say in your *R* buffer (maybe you’re using ESS or maybe not – I don’t).

> install.packages("gbm")

You also add the dependency to your deps.R script which your docker file runs. M-x shell creates a new shell, where you

> docker build . -t ds

And your container is updated in the background. Maybe you find yourself doing this a lot, so you say

(defun do-build ()
   (get-buffer-process (shell "*docker-build*"))
   (format "docker build . -t ds\n")))

And then you can just M-x local-set-key C-c C-c do-build and its just a keystroke away.

While that is happening your trying to figure why some values are turning up NA when you try to read from an sqlite DB into a data frame. You want to inspect the database manually. So you - M-x shell *sqlite* then.

> docker run ... sqlite3 /path/to/sqlite.db

Now, you want to run exactly the sql you’ve got in your R script, so you write the following absolute gem of a function:

(defvar *default-comint-buffer*
(defun region->comint (s e)
  (interactive "r")
  (let* ((bufs (get-buffers-with-processes))
         (dflt (or *default-comint-buffer*
                   (car bufs)))
         (buffer (completing-read "where? " bufs nil t dflt))
         (s (concat (buffer-substring s e)
                    (format "\n"))))    
    (comint-send-string (get-buffer-process (get-buffer buffer))
    (pop-to-buffer (get-buffer buffer))
    (setq *default-comint-buffer* buffer)))

And now you can highlight sql fragments in any buffer and M-x region->comint *sqlite* and you’ll execute that code and jump to the buffer.

And region->comint will do an enormous amount of leg work for you. Suppose your project uses multiple languages: R for one step, Python for another. A hassle if you’re using a Notebook or RStudio, but relatively easy to orchestrate inside Emacs.

Sure, lots of stuff is missing. People really love Tab completion and its not always perfect in Emacs.

But if you do complicated, multi-environment, text based tasks Emacs is still, far and away, the best tool for the job. And the fact that it works over a terminal and can act as a server, which means you can pop in and out as you need, leaving the environment up for months at a time. These days I keep multiple Emaxen running as daemons, one for each active project.

Emacs is indispensible for me especially in 2019.

Goals, Anti-Goals and Multi-player Games

In this article I will try to address Keith Burgun‘s assertion that games should have a single goal and his analysis of certain kinds of goals as trivial or pathological. I will try to demonstrate that multi-player games either reduce to single player games or necessitate multiple goals, some of which are necessarily the sorts of goals which Burgun dismisses as trivial. I’ll try to make the case that such goals are useful ideas for game designers as well as being necessary components of non-trivial multi-player games.

(Note: I find Keith Burgun’s game design work very useful. If you are interested in game design and have the money, I suggest subscribing to his Patreon.)

Notes on Burgun’s Analytical Frame

The Forms

Keith Burgun is a game design philosopher focused on strategy games, which he calls simply games. He divides the world of interactive systems into four useful forms:

  1. toys – an interactive system without goals. Discovery is the primary value of toys.
  2. puzzle – bare interactive system plus a goal. Solving is the primary value of the puzzle.
  3. contests – a toy plus a goal all meant to measure performance.
  4. games – a toy, plus a goal, plus obfuscation of game state. The primary value is in synthesizing decision making heuristics to account for the obfuscation of the game state.

A good, brief, video introduction to the forms is available here. Burgun believes a good way to construct a game is to identify a core mechanism, which is a combination of a core action, a core purpose, and a goal. The action and purpose point together towards the goal. The goal, in turn, gives meaning to the actions the player can take and the states of the interactive system.

On Goals

More should be said on goals, which appear in many of the above definitions. Burgun has a podcast which serves as a good long form explication of many of his ideas. There is an entire episode on goals here. The discussion of goals begins around the fifteen minute mark.

Here Burgun provides a related definition of games: contests of decision making. Goals are prominent in this discussion: the goal gives meaning to actions in the game state.

Burgun raises a critique of games which feature notions of second place. He groups such goals into a category of non-binary goals and gives us an example to clarify the discussion: goals of the form “get the highest score.”

His analysis of the poorness of this goal is that it seems to imply a few strange things:

  1. The player always gets the highest score they are capable of because the universe is deterministic.
  2. These goals imply that the game becomes vague after the previous high score is beaten, since the goal is met and yet the game continues.

The first applies to any interactive system at all, so isn’t a very powerful argument, as I understand it. Take a game with the rules of Tetris except that the board is initialized with a set of blocks already on the board. The player receives a deterministic sequence of blocks and must clear the already present blocks, at which point the game ends. This goal is not of the form “find the highest score” or “survive the longest” but the game’s outcome is already determined by the state of the universe at the beginning of the game. From this analysis we can conclude that if (1) constitutes a downside to the construction of a goal, it doesn’t apply uniquely to “high score” style goals.

(2) is more subtle. While it is true that in the form suggested, these rules leave the player without guidelines after the goal is met, I believe that in many cases a simple rephrasing of the goal in question resolves this problem. Take the goal:

G: Given the rules of Tetris, play for the highest score.

Since Tetris rewards you for clearing more lines at once and since Tetris ends when a block becomes fixed to the board but touches the top of the screen, we can rephrase this goal as:

G': Do not let the blocks reach the top of the screen.

This goal is augmented by secondary goals which enhance play: certain ways of moving away from the negative goal G' are more rewarding than others. Call this secondary goal g: clear lines in the largest groups possible. Call G' and goals like it “anti-goals.”

This terminology implies the definition.

If a goal is a particular game state towards which the player tries to move, an anti-goal is a particular state which the player is trying to avoid. Usually anti-goals are of the form “Do not allow X to occur” Where X is related to a (potentially open ended) goal.

Goals of the “high score” or “survive” variety are (or may be) anti-goals in disguise. Rephrased properly, they can be conceived of in anti-goal language. Of course there are good anti-goals and bad ones, just as there are good goals and bad goals. However, I would argue that the same criteria applies to both types of goals: a good (anti) goal is just one which gives meaning to the actions a person is presented with over an interactive system.

Multi-Player Games and Anti-Goals

I believe anti-goals can be useful game design, even in the single player case. In another essay I may try to make the argument that anti-goals must be augmented with mechanics which tend to move the player towards the anti-goal against which players must do all the sorts of complex decision making which produces value for players.

However, there is a more direct way of demonstrating that anti-goals are unavoidable aspects of games, at least when games are multi-player. This argument also demonstrates that games with multiple goals are in a sense inevitable, at least in the case of multi-player games. First let me describe what I conceive of as a multi-player game.

multi-player game: A game where players interact via an interactive system in order to reach a goal which can only be attained by a single player.

The critical distinction I want to make is that a multi-player game is not just two or more people engaged in separate contests of decision making. If there are not actions mediating the interaction of players via the game state then what is really going on is many players are playing many distinct games. A true multi-player game must allow players to interact (via actions).

In a multi-player game, players are working towards a win state we can call G. However, in the context of the mechanics which allow interaction they are also playing against a (set of) anti-goals {A}, one for each player besides themselves. These goals are of the form “Prevent player X from reaching goal G“. Hence, anti-goals are critical ingredients to successful multi-player game design and are therefore useful ideas for game designers. Therefore, for a game to really be multi-player then there must be actions associated with each anti-goal {A}.

An argument we might make at this point is that if players are playing for {A} and not explicitly for G then our game is not well designed (for instance, it isn’t elegant or minimal). But I believe any multi-player game where a player can pursue G and not concern herself with {A}, even in the presence of game actions which allow interaction, is a set of single player games in disguise. If we follow our urge to make G the true goal for all players at the expense of {A} then we may as well remove the actions which intermediate between players and then we may as well be designing a single player game whose goal is G.

So, if we admit that multi-player games are worth designing, then we also admit that at least a family of anti-goals are worth considering. Note that we must explicitly design the actions which allow the pursuit of {A} in order to design the game. Ideally these will be related and work in accord with the actions which facilitate G but they cannot be identical to those mechanics without our game collapsing to the single player case. We must consider {A} actions as a separate (though ideally related) design space.


I’ve tried to demonstrate that in multi-player games especially, anti-goals, which are goals of the for “Avoid some game state”, are necessary, distinct goal forms worth considering by game designers. The argument depends on demonstrating that a multi-player game must contain such anti-goals or collapse to a single player game played by multiple people but otherwise disconnected.

In a broader context, the idea here is to get a foot in the door for anti-goals as rules which can still do the work of a goal, which is to give meaning to choices and actions in an interactive system. An open question is whether such anti-goals are useful for single player games, whether they are useful but only in conjunction with game-terminating goals, or whether, though useful, we can always find a related normal goal which is superior from a design point of view. Hopefully, this essay provides a good jumping off point for those discussions.

A Critique of The Programming Language J

I’ve spent around a year now fiddling with and eventually doing real
data analytic work in the The Programming Language J. J is one of
those languages which produces a special enthusiasm from its users and
in this way it is similar to other unusual programming languages like
Forth or Lisp. My peculiar interest in the language was due to no
longer having access to a Matlab license, wanting an array oriented
language to do analysis in, and an attraction to brevity and the point
free programming style, two aspects of programming which J emphasizes.

Sorry, Ken.

Sorry, Ken.

I’ve been moderately happy with it, but after about a year of light
work in the language and then a month of work-in-earnest (writing
interfaces to gnuplot and hive and doing Bayesian inference and
spectral clustering) I now feel I am in a good position to offer a
friendly critique of the language.

First, The Good

J is terse to nearly the point of obscurity. While terseness is not a
particularly valuable property in a general purpose programming
language (that is, one meant for Software Engineering), there is a
case to be made for it in a data analytical language. Much of my work
involves interactive exploration of the structure of data and for that sort
of workflow, being able to quickly try a few different ways of
chopping, slicing or reducing some big pile of data is pretty
handy. That you can also just copy and paste these snippets into some
analysis pipeline in a file somewhere is also nice. In other words,
terseness allows an agile sort of development style.

Much of this terseness is enabled by built in support for tacit
programming. What this means is that certain expressions in J are
interpreted at function level. That is, they denote, given a set of
verbs in a particular arrangement, a new verb, without ever explicitly
mentioning values.

For example, we might want a function which adds up all the maximum
values selected from the rows of an array. In J:


J takes considerable experience to read, particularly in Tacit
style. The above denotes, from RIGHT to LEFT: for each row ("1)
reduce (/) that row using the maximum operation >. and then (@:)
reduce (/) the result using addition (+). In english, this means:
find the max of each row and sum the results.

Note that the meaning of this expression is itself a verb, that is
something which operates on data. We may capture that meaning:

sumMax =: +/@:(>./"1)

Or use it directly:

+/@:(>./"1) ? (10 10 $ 10)

Tacit programming is enabled by a few syntactic rules (the so-called
hooks and forks) and by a bunch of function level operators called
adverbs and conjuctions. (For instance, @: is a conjunction rougly
denoting function composition while the expression +/ % # is a fork,
denoting the average operation. The forkness is that it is three
expressions denoting verbs separated by spaces.

The details obscure the value: its nice to program at function level
and it is nice to have a terse denotation of common operations.

J has one other really nice trick up its sleeve called verb
. Rank itself is not an unusual idea in data analytic languages:
it just refers to the length of the shape of the matrix; that is, its

We might want to say a bit about J’s basic evaluation strategy before
explaining rank, since it makes the origin of the idea more clear. All
verbs in J take one or two arguments on the left and the right. Single
argument verbs are called monads, two argument verbs are called dyads.
Verbs can be either monadic or dyadic in which case we call the
invocation itself monadic or dyadic. Most of J’s built-in operators
are both monadic and dyadic, and often the two meanings are unrelated.

NB. monadic and dyadic invocations of <
4 < 3 NB. evaluates to 0
<3 NB. evalutes to 3, but in a box.

Give that the arguments (usually called x and y respectively) are
often matrices it is natural to think of a verb as some sort of matrix
operator, in which case it has, like any matrix operation, an expected
dimensionality on its two sides. This is sort of what verb rank is
like in J: the verb itself carries along some information about how
its logic operates on its operands. For instance, the built-in verb
-: (called match) compares two things structurally. Naturally, it
applies to its operands as a whole. But we might want to compare two
lists of objects via match, resulting in a list of results. We can
do that by modifying the rank of -:

x -:”(1 1) y

The expression -:”(1 1) denotes a version of match which applies to
the elements of x and y, each treated as a list. Rank in J is roughly
analogous the the use of repmat, permute and reshape in Matlab: we can
use rank annotations to quickly describe how verbs operate on their
operands in hopes of pushing looping down into the C engine, where
it can be executed quickly.

To recap: array orientation, terseness, tacit programming and rank are
the really nice parts of the language.

The Bad and the Ugly

As a programming environment J can be productive and efficient, but it
is not without flaws. Most of these have to do with irregularities in
the syntax and semantics which make the language confusing without
offering additional power. These unusual design choices are
particularly apparent when J is compared to more modern programming

Fixed Verb Arities

As indicated above, J verbs, the nearest cousin to functions or
procedures from other programming languages, have arity 1 or
arity 2. A single symbol may denote expressions of both arity, in
which case context determines which function body is executed.

There are two issues here, at least. The first is that we often want
functions of more than two arguments. In J the approach is to pass
boxed arrays to the verb. There is some syntactic sugar to support
this strategy:

multiArgVerb =: monad define
‘arg1 arg2 arg3’ =. y
NB. do stuff

If a string appears as the left operand of the =. operator, then
simple destructuring occurs. Boxed items are unboxed by this
operation, so we typically see invocations like:

multiArgVerb('a string';10;'another string')

But note that the expression on the right (starting with the open
parentheses) just denotes a boxed array.

This solution is fine, but it does short-circuit J’s notion of verb
rank: we may specify the the rank with which the function operates on
its left or right operand as a whole, but not on the individual
“arguments” of a boxed array. But nothing about the concept of rank
demands that it be restricted to one or two argument functions: rank
entirely relates to how arguments are extracted from array valued
primitive arguments and dealt to the verb body. This idea can be
generalized to functions of arbitrary argument count.

Apart from this, there is the minor gripe that denoting such single
use boxed arrays with ; feels clumsy. Call that the Lisper’s bias:
the best separator is the space character.1

A second, related problem is that you can’t have a
zero argument function either. This isn’t the only language where
this happens (Standard ML and OCaml also have this tradition, though I
think it is weird there too). The problem in J is that it would feel
natural to have such functions and to be able to mention them.

Consider the following definitions:

o1 =: 1&-
o2 =: -&1

(o1 (0 1 2 3 4)); (o2 (0 1 2 3 4))
│1 0 _1 _2 _3│_1 0 1 2 3│

So far so good. Apparently using the & conjunction (called “bond”)
we can partially apply a two-argument verb on either the left or the
right. It is natural to ask what would happen if we bonded twice.


Ok, so it produces a verb.

 3 3 $ ''
  ;((o1&1 (0 1 2 3 4))
  ; (o2&1 (0 1 2 3 4))
  ; (1&o1 (0 1 2 3 4))
  ; (1&o2 (0 1 2 3 4)))

│     │o1          │o2          │
│right│1 0 1 0 1   │1 0 _1 _2 _3│
│left │1 0 _1 _2 _3│_1 0 1 2 3  │

I would describe these results as goofy, if not entirely impossible to
understand (though I challenge the reader to do so). However, none of
them really seem right, in my opinion.

I would argue that one of two possibilities would make some sense.

  1. (1&-)&1 -> 0 (eg, 1-1)
  2. (1&-)&1 -> 0″_ (that is, the constant function returning 0)

That many of these combinations evaluate to o1 or o2 is doubly
confusing because it ignores a value AND because we can denote
constant functions (via the rank conjunction), as in the expression


What this is all about is that J doesn’t handle the idea of a
function very well. Instead of having a single, unified abstraction
representing operations on things, it has a variety of different ideas
that are function-like (verbs, conjuctions, adverbs, hooks, forks,
gerunds) which in a way puts it ahead of a lot of old-timey languages
like Java 7 without first order functions, but ultimately this
handful of disparate techniques fails to acheive the conceptual unity
of first order functions with lexical scope.

Furthermore, I suggest that nothing whatsoever would be lost (except
J‘s interesting historical development) by collapsing these ideas
into the more typical idea of closure capturing functions.

Other Warts

Weird Block Syntax

Getting top-level2 semantics right is hard in any
language. Scheme is famously ambiguous on the subject, but at
least for most practical purposes it is comprehensible. Top-level has
the same syntax and semantics as any other body of code in scheme
(with some restrictions about where define can be evaluated) but in
J neither is the same.

We may write block strings in J like so:

blockString =: 0 : 0 
Everything in here is a block string.       

When the evaluator reads 0:0 it switches to sucking up characters
into a string until it encounters a line with a ) as its first
character. The verb 0:3 does the same except the resulting string is
turned into a verb.

plus =: 3 : 0

However, we can’t nest this syntax, so we can’t define non-tacit
functions inside non-tacit functions. That is, this is illegal:

plus =: 3 : 0
  plusHelper =. 3 : 0
  x plusHelper y

This forces to the programmer to do a lot of lambda lifting
manually, which also forces them to bump into the restrictions on
function arity and their poor interaction with rank behavior, for if
we wish to capture parts of the private environment, we are forced to
pass those parts of the environment in as an argument, forcing us to
give up rank behavior or forcing us to jump up a level to verb


Of course, you can define local functions if you do it tacitly:

plus =: 3 : 0
    plusHelper =. +
    x plusHelper y   

But, even if you are defining a conjunction or an adverb, whence you
are able to “return” a verb, you can’t capture any local functions –
they disappear as soon as execution leaves the conjunction or adverb

That is because J is dynamically scoped, so any capture has to be
handled manually, using things like adverbs, conjunctions, or the good
old fashioned fix f., which inserts values from the current scope
directly into the representation of a function. Essentially all modern
languages use lexical scope, which is basically a rule which says: the
value of a variable is exactly what it looks like from reading the
program. Dynamic scope says: the valuable of the variable is whatever
its most recent binding is.


The straight dope, so to speak, is that J is great for a lot of
reasons (terseness, rank) but also a lot of irregular language
features (adverbs, conjunctions, hooks, forks, etc) which could be
folded all down into regular old functions without harming the
benefits of the language, and simplifying it enormously.

If you don’t believe that regular old first order functions with
lexical scope can get us where we need to go, check out my
tacit-programming libraries in R and Javascript. I
even wrote a complete, if ridiculously slow implementation of J‘s
rank feature, literate-style, here.


1 It bears noting that ; in an expression like (a;b;c)
is not a syntactic element, but a semantic one. That is, it is the
verb called “link” which has the effect of linking its arguments into
a boxed list. It is evaluated like this:


(a;b;c) is nice looking but a little strange: In an expression
(x;y) the effect depends on y is boxed already or not: x is always boxed regardless, but y is boxed only if it wasn’t boxed before.

2 Top level? Top-level is the context where everything
“happens,” if anything happens at all. Tricky things about top-level
are like: can functions refer to functions which are not yet defined,
if you read a program from top to bottom? What about values? Can you
redefine functions, and if so, how do the semantics work? Do functions
which call the redefined function change their behavior, or do they
continue to refer to the old version? What if the calling interface
changes? Can you check types if you imagine that functions might be
redefined at any time? If your language has classes, what about
instances created before a change in the class definition. Believe or
not, Common Lisp tries to let you do this – and its confusing!

On the opposite end of the spectrum are really static languages like
Haskell, wherein type enforcement and purity ensure that the top-level
is only meaningful as a monolith, for the most part.

Aping J’s Verb Rank in Puff

This blog post will sketch out some thoughts relating to Puff, a function level programming language I am embedding in Javascript and J‘s notion of operator rank.

Rank as it pertains to nouns in J is fairly easy to understand: it is just the number of dimensions of an array. Scalars, like 10, have rank 0, the empty value (denotable as 0 $ 0) has rank 0, a simple vector (0 0 0) has rank 1, a 2d matrix rank 2, etc.

But J also has rank for verbs. Consider the verb +.

(1 2 3) + (4 5 6)
-> 5 7 9

(For J tyros: + is called a verb in J and furthermore we use it in its dyadic sense, which is to say we pass it arguments on the left and the right.)

Informally we understand from this that in J + operates element-wise on both its left and right operands. This means its left and right rank are both zero and it operates, then, on the rank zero elements of its arguments: the individual scalar values.

But there is more to the story. As a side note, We can denote multi-dimensional arrays in J like so:

]example =. 2 3 $ 1 2 3 4 5 6 
1 2 3
4 5 6

(For the curious, that is “change the shape ($) for array 1 2 3 4 5 6 so that it is 2 3)

J has a box operator which is handy for demonstrating rank issues. It is denoted < and wraps any value into a box, which is a special scalar value type which holds something else.


Operators in J have rank and the rank of < is infinite. This means that it always operates on its argument as a whole.

<(1 2 3 4 5 6)
│1 2 3 4 5 6│

But the smart thing about J is that you can modify verbs with adverbs one of which returns a new verb with a different rank. See if you can guess what all this means:

<"0(1 2 3 4 5 6)

The array denotation 1 2 3 4 5 6 is the same as before, but now we have written <"0 instead of <. " is an adverb which modified its right hand arguments’ rank such that it is the left hand value. The result of<"0 then is a verb with the same meaning as < except that it has rank 0. Verbs with rank 0 operate on the smallest possible cells of the array, so that

<"0(3 2 $ 1 2 3 4 5 6)

each element of the input is boxed regardless of the incoming arrays shape or rank.

If we use a different rank:

<"1(3 2 $ 1 2 3 4 5 6)
│1 2│3 4│5 6│

We get a different result. One-ranked verbs operate 1-cells (that is, the elements of rank 1) of the incoming array, in this case the arrays 1 2, 3 4, and 5 6.

The rules work for dyadic values too – each argument of the verb has a rank (a right rank and a left rank) which determines how the underlying logic represented by the verb is used to join elements from the right and left arguments.

By modifying verb rank you can custom tailor your iteration through argument arrays and avoid most explicit looping.


Puff is mostly aping the function level semantics of J but we can analogize verb rank too. Consider the Puff function map, which has a single argument meaning:

var plus1 = _p(plus,1);
map(plus1)([1,2,3]) -> [2,3,4]

plus1 above would have, in J an infinite rank: it always applies to its whole argument. When we say map(plus1) we have a new function which applies to the N-cells of its argument (in this case integers). In other words, map creates a new function which peels off one layer of its input and applies the original function, collecting the outputs.

What, then, is

var mm_plus1 = map(map(plus1))


(NB, we can denote this in Puff via rep(map,2,plus1))

Here is a hint:

mm_plus1([[1,2,3],[4,5,6]]) -> [[2,3,4],[5,6,7]]

Now we have a function operating on the N-2 cells of the input. Rank in J typically operates bottom up: we start at rank 0 operating on the 0 cells, and increasing rank operates on larger and larger chunks of the input array. In contrast, iterative application of map in Puff moves us from the larger chunks to smaller and smaller chunks, until a number of applications equal to the array rank has us operating on individual items.

J being J we can mimic this behavior using negative rank.

<"_2(3 2 $ 1 2 3 4 5 6)

(_2 denotes the number -2 in J for possibly obscure reasons to do with making the parser simpler.)

Given that 3 2 $ 1 2 3 4 5 6 has rank 2, the verb <"_2 must operate on the 2-2=0 cells.

The J approach of, by default, thinking about rank from 0-cells up works well for that language because matrices in J are regular and they keep track of their rank. If we represent matrices as nested arrays in Javascript (this is not the only option, but it is the most idiomatic) then the real rank of a matrix cannot be known without a full traversal, which is prohibitive.

I might, one day, integrate a multidimensional matrix implementation into Puff and then enable rank modifying functions to work on that representation, but for now I want to focus on the successive use ofmap to simulate ranking down a function from infinite rank.

Consider Rank

Consider the following definition:

function rankedCall(f,n,a){
        return rep(map, n, f)(a);
    } else {
        throw new Error("Positive ranks not yet supported.");

var rank = _c(rankedCall);

Such that:

rank(plusOne,1)([1,2,3]) -> [2,3,4]

Cute. This gets us somewhere. But what really makes rank useful is that each argument carries its own rank and the system resolves the looping for you. In J operators have at most two arguments (to which rank applies, simulating more arguments with lists of boxes bypasses ranked dispatch).

Dealing with multiple argument functions is tricky. Let’s start with two.


// Puff provides a plus function
plus(1,3) -> 4
// but it doesn't work on arrays
plus([1,2,3],[4,5,6]) -> '1,2,34,5,6'

That last line is because Javascript idiotically interprets [1,2,3]+[4,5,6] to mean [1,2,3].toString()+[4,5,6].toString().

For these one dimensional arrays, we can get what we want with map which applies a function f of arity n to the items of n arrays.

map(plus,[1,2,3],[4,5,6]) -> [5,7,9]

(NB. In Puff we can also have said map(plus)([1,2,3],[4,5,6]))

What if we have [1,2,3] and [[4],[5],[6]], that is, the second argument is rank two?

Put aside questions of efficiency for a moment and consider the function:

function nextCellIndex(a, indexes){
    var indexes =; // copy the array
    var delta = indexes.length-1;
    var subIndex = indexes.slice(0,delta);
    var indexedArray = index.apply(null, [a].concat(subIndex));
    var done = indexes[delta]+1 < indexedArray.length;
      delta = delta -1;
          return null;
      } else {
          indexedArray = index.apply(null, [a].concat(indexes.slice(0,delta)));
          done = indexes[delta]+1 < indexedArray.length;

    indexes[delta] = indexes[delta]+1;
    for(var i = delta+1; i<indexes.length; i = i + 1){
      indexes[i] = 0;
    return indexes;

This function takes an array and an array of indexes and finds the next valid index into that array by incrementing the innermost index, checking whether that is in bounds, stopping if it is, or incrementing the next innermost and so on. If there is no valid next index, then null is returned.

If we want what J would call the -2 cells of an array a, we iteratively apply this function to a two element index vector.

var a = [[1],[2,3],[4]]
var indexes = repeatAccumulate(_p(nextCellIndex,a),3,[0,0])

Evaluating to:

[ [ 0, 0 ], [ 1, 0 ], [ 1, 1 ], [ 2, 0 ] ]

that is, the indexes of the -2 cells. We can get these by, for instance,

index.apply(null, a, indexes[0])

Note that a is not a regular matrix (the second item of a has a different length than the first and third – it has no obvious rank, but we can talk about its n-cells if we talk about them from outside in. We can write a function to give us these cells:

function cells(n, a){
      var nn = -n;
      var out = [];
      var indexes = initArray(nn,0);
          out.push(index.apply(null, [a].concat(indexes)));
          indexes = nextCellIndex(a,indexes)
      return out;
    } else if (n===0){
      return a;
    } else {
      throw new Error("Positive cells not yet supported.");

We can then just fall back onto map with the appropriate applications of cells:

-> [ 2, 4, 6 ]

Conceptually we’ve done well for ourselves: we’ve reproduced J‘s ability to change the way that functions join elements of arrays of arbitrary dimension. On top of that, by virtue of the arity of map, which can apply a function of any arity to any number of arrays, we have extended this idea to operators of any number of arguments (J is limited to monadic and dyadic verbs.)

In addition, Puff allows us to write the above function in a point free fashion:

var ex = f(2,au(map, al(plus), n0, r(n1,_p(cells, 2))));
-> [2, 4, 6]

(NB. al returns a function which always returns the argument to al, short for always, n0 returns the first element of a thing, n1 the second, etc. f (short for lambda) returns a function which first collects its arguments into an array and then passes them through its subsequent arguments as if via r (rCompose). Finally, au (short for augment) takes a set of functions and returns a new function which transforms its inputs via functions 1..n and applies function 0 to the that list.)

Positive Ranks

Using negative ranks is much more in-line with idiomatic javascript, since there are no native multidimensional arrays. We can produce a simple implementation of positive ranks if we make a few simple assumptions about usage. First consider:

function guessRank(a){
    var rank = 0;
    var done = false;
        if(typeof a['length'] === 'number'){
          rank = rank + 1;
          a = a[0];
        } else {
          done = true;
    return rank;

Working like:

:Puff:> guessRank(10)
:Puff:> guessRank([1,2,3])
:Puff:> guessRank([[1,2],[2,3],[3,4]])

The assumption we are making here is that the rank of sub-elements is homogeneous (and hence, the first element is a sufficient indicator). Now that we can guess the rank of an array, we can fill in the positive rank branch of our cells function:

function cells(n, a){
      var nn = -n;
      var out = [];
      var indexes = initArray(nn,0);
          out.push(index.apply(null, [a].concat(indexes)));
          indexes = nextCellIndex(a,indexes)
      return out;
    } else {
      var rank = n-guessRank(a);
      return cells(rank, a);

Now we can finally write our implementation of J‘s conjunction ". Our version of " will be called rank and will take a function and a list of ranks and return a new function with the appropriate rank behavior.

function rank(f){
    var ranks =,1,arguments.length);
    return function(){
    return map.apply(null,[f].concat(map(cells, 

We can now say:


And get [5,7,9]. Just like J. Of course, as we’ve written the code here we won’t be anywhere near the efficiency of J – in particular we iterate over each argument array separately, where we could combine all those loops into just one. But performance isn’t everything and we can always optimize the Puff implementation as needed. Rewriting the approprite sequence functions (map,mapcat,crossMap) to handle lazy versions of the sequences and introducing a lazy cells operator would be the most elegant solution. I’m sure I’ll get there eventually.

In the meantime, I hope I’ve at least helped the reader understand J‘s rank concept in greater depth and also showed off some of the nice ways Puff can simulate J style while staying entirely in Javascript.

Compose is Better than Dot (or: Function Level Programming In Javascript)

That great minds think alike is a reflection of the fact that certain ideas have an appeal that is, if not innate, then compelling in context. Hence the use of dot notation in the creation of domain specific languages in Javascript: dot is a handy way of succinctly representing a computation with a context. This is closely related to monads, and many javascript libraries relying heavily on dot for syntactic sugar are very closely related to one monad or another (eg _ to the sequence monad, promises to a sort of continuation monad, etc).

(NB. Much of the code here is inspired by a pointfree/function level programming library I am building for Javascript called puff, available here);

What I aim to prove today is that function composition subsumes the functionality of dot in this regard and that we can embed in Javascript a powerful, succinct function-level programming language in the vein of APL, J, or K/Q based primarily on function composition.

First of all, the traditional definition of function composition:

/** compose functions
 *  return a new function which applies the last argument to compose
 *  to the values passed in and then applies each previous function
 *  to the previous result.  The final result is returned.
function compose(){
    var fs =, 0, arguments.length);
    return function(){
    var inArgs =, 0, arguments.length);
    var i = fs.length-1;
    var res = fs[i].apply(null, inArgs);
    i = i - 1;
    for(; i>=0; i = i - 1){
        res = fs[i](res);
    return res;

This version of compose allows the programmer to pass in an unlimited number of functions and returns a new function which takes any number of arguments and then threads the result through all the previous functions, transforming it each time, starting with the last function and ending with the first.

This order of composition is inspired by the fact that the function in a traditional application expression precedes the argument (on the left), transforming:


“naturally” to


or, if succinctness is of interest:

var c = compose;

In this way we drop a few parentheses and hopefully express more clearly our intent.

Of course we might not wish to apply our composition immediately: we can produce useful functions via composition, after all.

var plusOne(x){ return x + 1 };
var timesTen(x){ return x*10 };

var plusOneTimesTen = c(timesTen, plusOne);

We can now apply plusOneTimesTen to as many values as we wish. Handy. However, now our intuition about naming and the order of the arguments to c are at odds. Hence, we introduce:

function rCompose(){
   return compose.apply(null,, 0, arguments.length).reverse());
var r = rCompose;

So that the above looks a bit nicer:

var plusOne(x){ return x + 1 };
var timesTen(x){ return x*10 };

var plusOneTimesTen = r(plusOne, timesTen);

This reverse composition operator is similar in many respects to dot in javascript except we have abstracted away this, to which each method invocation is implicitly addressed in a dot chain. In addition, instead of simple method names, each element in our r list can be any Javascript expression which evaluates to a function. This means that we can denote any sequence of operations this way without worrying whether or not they have been associated with any particular Javascript object.

With the right set of primitives and combinators, r forms the basis of a powerful, succinct function level programming language in which we can build, among other things, expressive domain specific languages. Or with which we can rapidly denote complex operations in a very small number of characters.

Well, first of all we want the bevy of basic functions:

plus, minus, div, times, split, join, toString, index, call,
apply, etc, array

These behave as you might expect, more or less (eg, some plausible implementations to give you the spirit of the idea):

function plus(){
   var out = arguments[0];, 1, arguments.length)
        out = out + x;
   return out;

function index(o, i){
  return o[i];

function call(f){
  return f.apply(null,, 0, arguments.length));

You get the idea.

Astute readers may realize that our composition function seems to only work with functions of a single argument. Remedying this will be a matter of some interest. The simplest approach is to provide for partial application:

/** partially fix arguments to f (on the right)
function partialRight(f /*... fixedArgs */){
    var fixedArgs =, 1, arguments.length);
    var out = function(/*... unfixedArgs */){
    var unfixedArgs =, 0, arguments.length);
    return f.apply(null,unfixedArgs.concat(fixedArgs));
    out.toString = function(){
    return "partialRight("+f.toString()+","+fixedArgs.join(",")+")";
    return out;

/** partially fix arguments to f (on the left)
function partialLeft(f /*... fixedArgs */){
    var fixedArgs =, 1, arguments.length);
    var out = function(/*... unfixedArgs */){
    var unfixedArgs =, 0, arguments.length);
    return f.apply(null,fixedArgs.concat(unfixedArgs));
    out.toString = function(){
    return "partialLeft("+f.toString()+","+fixedArgs.join(",")+")";
    return out;

These functions (they might be adverbs in J) take a function and a set of values and return a new function, “fixing” the arguments to the left or right of the argument list, depending on whether we’ve usedpartialLeft or partialRight.

Its handy to introduce the following bindings:

var p_ = partialRight;
var _p = partialLeft;

I hope these are relatively mnemonic (Javascript unfortunately isn’t an ideal environment for very short, expressive names).

We can get a surprising amount of mileage out of these ingredients already. For instance, a function to remove break tags from a string and replace them with newlines (sort of contrived):

var remBreaks = r(p_(split,'<br>'),p_(join,'\n'));

compared to

function remBreaks(s){
   return s.split('<br>').join('\n');

(NB. If split and join are written as curried functions, as they are in puff, the above is a little shorter:

var remBreaks = r(split('<br>'),join('\n'));

Providing a meaningful default currying (which args should be applied first) is a little tricky, though.)

Going Further

The above example demonstrates that we can do some handy work with r as long as it involves simply transforming a single value through a set of functions. What may not be obvious here is that a determined programmer can denote any computation whatsoever this way, even if multiple values might need to be kept around and transformed.

Consider the function which I will call augment or au:

/** given a function f and a additional functions gs
 *  return a new function h which applies each g
 *  to its single argument and then applies f to the 
 *  resulting list of values 
function augment(f /*... gs*/){
    var gs =, 1, arguments.length);
    var out = function(a){
    return f.apply(null,{
        return g(a);
    out.toString = function(){
    return "augment("+f.toString()+","", ")+")";
    return out;

And a nice default currying of index:

function index(o, ix){
   if(typeof ix === "undefined"){
     var realIx = o;
     return function(o){
       return o[realIx];
   } else {
     return o[ix];
var ix = index;


var fullName = augment(join(' '), ix('first'), ix('last'));

Such that:

fullName({first:'Ronald',last:'Reagan'}) -> "Ronald Reagan"

What have we just accomplished? We’ve demonstrated that we can take an arbitrary function of any arity and automatically transform it into a function which reads its input arguments from a single object. The sort of single object which we might be passing from function to function via r.

Putting it Together

To go further we need to add just one more utility function: cleave

function cleave(v /*... fs*/){
    var fs =, 1, arguments.length);
    return f(v);

and the shortcut:

function cl_(){
  return function(f){
    return cleave.apply(null, [f].concat(,0,arguments.length)));

(This is just cleave curried on the right, eg in puff: c_(cleave).)

// replaceMiddleName name newMiddleName -> transformedName
var replaceMiddleName = r(args(2),
                          cl_(r(first, split(' ')), second),
                          cl_(r(first,first),second,r(first, third)),
                          au(join(' '), first, second, third));

Let’s go nuts with some of the extra utility functions in puff:

var replaceMiddleName = f(2,
                          cl_(r(n0,split(' ')),n1),
                          cl_(n00, n1, n02),
                          join(' '));

Puff lets you do all of this insanity quite easily. There are browser and nodejs builds. You use it by saying, in node:


Which pollutes the global namespace with the puff functions. Since terseness is the rule I can’t imagine why you’d do otherwise, but you can also say:

var puff = require('puff');

In the browser, add:

<script src="./build/puff-browser.js"></script>

To your head. This will define a global puff object, which you can then use directly or say:


Spectral Clustering in J

With Examples from Quantitative Neuroscience

In my never ending quest to understand hard, but useful stuff, I’ve been learning the Programming Language J. J is an array oriented language in the APL family. Matlab is a related language, but its APL-ness is quite attenuated compared to J. I know Matlab already, having used it quote extensively as a research scientist, so J is not utterly alien, but unlike Matlab, which presents a recognizable syntactic face to the user, J retains an APL-like syntax.

As you will see, this presents challenges. The best way to learn to use a tool is to use it, so I thought I would take the opportunity to investigate a clustering technique which would have been useful to me as a grad student (had I known about it): Spectral Clustering.

Spectral Clustering1 refers to a family of clustering approaches which operate on the assumption that local information around data points is more useful for the purposes of assigning those points clusters than is their global arrangement. Typical clustering techniques, like k-means, make the assumption “points in a cluster will be near to eachother.” Spectral clustering relaxes that assumption to “points in a cluster will be near to other points in their cluster.”

We will develop the spectral clustering algorithm and then apply it to a series of more interesting data sets.

Part 1: 1d Gaussian Clusters

We are going to work on exploring spectral clustering. In order to do that we need something to cluster. Our first data set will be trivial: made by combining two sets drawn from different normal distributions2.

load 'utils.ijs'
load 'km.ijs'
require 'graphics/plot'
require 'numeric'
require 'viewmat'
require 'math/lapack'
require 'math/lapack/geev'

clusterSize1 =: 100
e1d1 =: (0 0.01) gausian clusterSize1
e1d2 =: (13 0.01) gausian clusterSize1

e1labeled =:  (((clusterSize1$0),(clusterSize1$1))) ,: (e1d1, e1d2)
e1data =: 1 { e1labeled
e1labels =: 0 { e1labeled

NB. show the histogram of the data
NB. (_6 6 200) histPlot e1data

The commented line produces the following plot:

Example One Data Histogram

This kind of data is trivial to cluster with an algorithm like k-means3.

kmClustering  =: 2 km (((# , 1:) e1data) $ e1data)

+/ (kmClustering = e1labels)

Should give ~ 200, unless something unusual happens.

As I suggested in the introductory section, the magic of spectral clustering is that it works on a representation of data which favors local information about the relationships between objects. Hence we will need to create some sort of connectivity graph for our data points. There are lots of approaches, but the upshot is that i,j in our connectivity matrix must tell us whether it is likely that point i and point j are in the same cluster.

For this simple example, we will just use the Euclidean distance between points and a threshold: anything lower than the threshold is connected and anything higher is not.

NB. d is the distance between two vectors of equal dimension
d =: %:@:+/@:*:@:([ - ])

NB. ds is the table of all pairwise distances where each argument is a
NB. list of vectors.
ds =: d"(1 1)/

NB. convert a list of points to a list of 1d vectors
l2v =: (#@:],1:) $ ]

NB. ds for lists of points
ds1d =: l2v@:] ds l2v@:]

NB. the table of all distances of x
dof =: ] ds ]

NB. the same except for a list of points
dof1d =: ] ds1d ] 

connections1d =: [ <~ dof1d@:]

e1connections =: 1.0 connections1d e1data

NB. viewmat e1connections

The connections matrix looks like:

Example 1 connections matrix

Spectral clustering operates on a Matrix called the Graph Laplacian which is related to the connectivity matrix. To get the laplacian matrix L we subtract the connectivity matrix from something called the degree matrix. Explained clearly but without interpretation, the degree matrix is just the sum of all connections for a given vertex in our graph, placed on the diagonal.

calcDegreeMat =:  (eye@:#) * +/"1 

The degree matrix just tells you how connected a given vertex is. To get the Laplacian Matrix you subtract the connectivity matrix from the degree matrix. This is conveniently defined in J:

calcLap =: calcDegreeMat - ]

For our first data set, this Laplacian looks like:

Example 1 Laplacian Matrix

Brief meditation will explain why the components of this matrix representing inter-object connections are so faint compared to those on the diagonal: the rank of each vertex in the graph is always larger than its connections, being the sum thereof.

Now that we have the Laplacian I will perform:

Spectral Clustering

The paper A Tutorial Introduction to Spectral Clustering, describes a family of spectral clustering strategies. We will implement just the first.

It involves the steps:

  1. Find the graph Laplacian for your system
  2. Find its eigenvalues and eigenvectors
  3. Sort the latter by the former, in ascending order
  4. Arrange the vectors as columns of a matrix
  5. truncate that matrix to a small number of columns, corresponding to the smaller eigenvalues.
  6. treat the rows of the resulting matrix as vectors.
  7. perform clustering, say k-means, on those vectors.

We can denote these succinctly in J:

eigensystem =: monad define 
  raw =. geev_jlapack_ y
  ii =. /:@:>@:(1&{) raw
  lvecs =. |: (ii { |: (0 unboxFrom raw))
  vals =. ii { (1 unboxFrom raw)
  rvecs =. |: (ii { :| (2 unboxFrom raw))
  lvecs ; vals ; rvecs  

eigenvectors =: >@:(0&{@:eigensystem)
eigenvalues =: >@:(1&{@:eigensystem)

takeEigenvectors =: |:@([ {. |:@:])

e1clustering =: 2 km takeEigenvectors eigenvectors e1Lap

(+/ e1clustering = e1labels) NB. should be about 200

Example 1 Clustered

A boring, if resounding, success.

Example 2: Concentric Clusters

Consider the following data:

rTheta =: (([ * cos@:]) , ([ * sin@:]) )"(0 0)
e2clusterSize =: 100
e2d1 =: ((1,0.1) gausian e2clusterSize) rTheta ((0,2p1) uniform     e2clusterSize)
e2d2 =: ((4,0.1) gausian (2*e2clusterSize)) rTheta ((0,2p1) uniform     (2*e2clusterSize))
e2data =: e2d1, e2d2

Concentric Clusters

This is the sort of data set which is pathological for clustering procedures which work in the original coordinates of the data. Even traditional dimension reducing techniques like principal component analysis will fail to improve the performance (in fact, the principal components of this data set are analogous to x and y, modulo a rotation).

We could, of course, realize by inspection that this problem reduces to the first example if the angular component is thrown away and we cluster on the radial component only.

However, in general, we are working with data in n dimensions and such simplifications may not be apparent to us. Hence, we will proceed with this example set.

Again, we produce the connectivity matrix by simple ci,j = di,j < epsilon.

e2connections =: 1 connections e2data 
e2Lap =: calcLap e2connections
e2ids =:  2 km 2 takeEigenvectors eigenvectors e2Lap

e2 connectivity matrix

As we physicists like to say: now we just turn the crank:

e2Lap =: calcLap e2connections
e2ids =:  2 km 2 takeEigenvectors eigenvectors e2Lap

And we can plot these points labeled by ID:

e2 data clustered

Pretty neat - we have successfully clustered a data set which traditional techniques could not have dealt without, and without manually figuring out a transform the simplified the data set.

Example 3: Neural Spike Trains

My interest in spectral clustering goes back to my graduate research, which involved, among other things, the automatic clustering of neural data. Neurons, in animals like us, anyway, communicate with one another using brief, large depolarizations of the voltage across their cell membranes. Neuroscientists call these events spikes and they literally look like large (~80 mV) spikes of voltage in the recording of the voltage across a cell membrane. These spikes travel down the neuron's axon and are transmitted through synapses to other neurons. There are other forms of communication between neurons, but it is generally assumed that spikes underly much of the computations that neural networks perform.

Spikes are short (1-2 ms) and firing rates are generally low on this time scale, with a typical inter-spike period of 10-100 ms (depending a great deal on the inputs to the neuron). In order to understand how neurons work, neuroscientist repeat a stimulus over and over again and record the spike times relative to the beginning of the experiment.

This results in a data set like:

33.0  55.2  80.0
30.0  81.1
55.2  85.9

Where rows are trials and each number is a spike time for that trial. In J we can read this data as padded arrays:

spikes =: asArray 0 : 0
33.0  55.2  80.0
30.0  81.1
55.2  85.9

And J will pad it out with zeros.

 33   55.2   80
 30   81.1   0
 55.2 85.9   0

Under certain circumstances, during such experiments, neurons generate a small number of distinct spiking patterns. Assuming that these patterns may be of interest, much of my research focused on extracting them automatically. That is: given a data set of spike trains, extract the clusters of similar trains.

Unfortunately, spike trains are not vectors and so there is no convenient euclidean distance measure one can easily apply to them. Luckily Victor and Purpura (1996) developed an edit-distance metric on spike trains by analogy to similar metrics other, non-vector, data (like gene sequences).

The Victor-Purpura Metric defines the distance between two spike trains as the minimum cost of distorting one spike train into the other using just two operations: sliding spikes in time to line up or deleting/adding spikes. The cost of sliding spikes in time is has the parameterized cost q*dt where dt is the time difference, and the cost of adding or removing a spike is one.

The algorithm to find the minimum such distortion can be easily implemented:

vpNaive =: adverb define 
  'q' =. m
  's1' =. nonzeros x 
  's2' =. nonzeros y
  'n1' =. #s1
  'n2' =. #s2 
  if. n1 = 0 do.
  elseif. n2 = 0 do.
  elseif. 1 do.
    'h1' =. {. s1
    'h2' =. {. s2
    'movecost' =. q * abs (h1-h2)
    if. movecost < 1.0 do.
      'restcost' =. ((}. s1) (q vpNaive) (}. s2))
      restcost + movecost 
      'restcost' =. ((}. s1) (q vpNaive) (}. s2))
      restcost + 1

although much more efficient implementations are possible. For our purposes, this will suffice.

With this definition, we can produce the similarity matrix for a given q using this J adverb:

spikeDistances =: adverb define   
  y ((m vpNaive)"(1 1) /) y

The choice of q is an important issue which we will neglect for now. See my doctoral thesis for insight into this question. The one sentence upshot is that q should generally be chosen to maximize the information content in the resulting similarity matrix.

Now let's create an example data set consisting of two distinct firing patterns. A firing pattern consists of a mean time for each "event" in the pattern, plus that events "reliability," which is the probability that a spike actually appears in an event. An event may be localized at a very specific time, but spikes may only sometimes appear on a given trial in that event.


NB. This produces a fake spike data set of Y trials with events
NB. specified by means, stds, and reliabilities, passed in as a
NB. rectangular matrix.
makeSpikes =: dyad define 
  'means stds reliabilities' =. x
  'n' =. y

  spikes =. (n,#means) $ means 
  spikes =. spikes + ((n,#stds) $ stds)*(rnorm (n,#stds))
  spikes =. spikes * ((runif (n,#reliabilities))<((n,#reliabilities) $   reliabilities))
  nonzeros"1 spikes

This function takes a description of the pattern, like:

e3pattern1 =: asArray 0 : 0 
  10 20 30
  1  1.5  1.5
  0.9 0.9 0.6

e3pattern2 =: asArray 0 : 0
  15 40
  1  1
  0.8 0.8

And returns a data set of N trials:

NB. Now we use makeSpikes to generate the actual data sets.
e3spikes1 =: e3pattern1 makeSpikes 50
e3spikes2 =: e3pattern2 makeSpikes 50

NB. and the utility `concatSpikes` to combine them.  This just makes
NB. sure the zero padding works.
e3spikes =: e3spikes1 concatSpikes e3spikes2

We can plot this data in the traditional fashion, as a "rastergram," where each spike is plotted with its time as the x-coordinate and its trial as its y-coordinate:

rastergram e3spikes

Example 3 Rastergram

It is interesting to note how difficult it can be to see these patterns by visual inspection if the trials are not sorted by pattern:

rastergram shuffle e3spikes

Example 3 Rastergram shuffled

Now let's cluster them:

NB. We produce the distance matrix
e3distances =: (0.8 spikeDistances) e3spikes

We use thresholding to generate our connectivity matrix:

e3connections =: e3distances < 2

And we turn the crank:

e3Lap =: calcLap e3connections
e3evs =: eigenvectors e3Lap
e3clustering =: 2 km (2 takeEigenvectors e3evs)

Finally, we can color code our rastergram by clustering:

pd 'reset'
pd 'type dot'
pd 'color blue'
pd 'title trial clustering'
pd spikesForPlotting (e3clustering = 0) # e3spikes

pd 'color red'
pd (#((e3clustering=0)#e3spikes)) spikesForPlotting ((e3clustering =   1) # e3spikes)

pd 'xcaption time'
pd 'ycaption trial'

pd 'show'

Example 3 Color Coded Rastergram



A Note On Simpler Clustering Algorithms

One can appreciate a family of clustering approaches by starting with Expectation Maximization: a two step iterative process which attempts to find the parameters of a probability distribution assumed to underly a given data set. When using this approach for clustering, one assumes that the data has hidden or missing dimensions: in particular, a missing label assigning each point to a cluster. Given some guess at those labels, you can calculate the parameters of the distribution. Using this estimate, you can modify the guess about the hidden labels. Repeat until convergence.

In practice, simplification over the general case can be found by assuming that the distribution is a "Gaussian Mixture Model," just a weighted combination of n-dimensional Gaussian distributions. Further simplifications manifest by applying ad-hoc measures of cluster "belonging" based on just the mean of each cluster (see fuzzy clustering) and finally just by assuming that objects belong to the cluster to which they are closest (as in k-means).

In all of these simplifications clustering refers to some sort of centroid of the clusters as a way of estimating which object belongs to which clusters. But some data sets have clusters which are concentric. For instance, if one cluster is points between 10 and 20 units from the origin and another is points between 50 and 60 units from the origin.

In this case, all such attempts will fail unless the space is transformed. Spectral clustering transforms the space by considering topological information: that is, what objects are near or connected to what other objects, and throwing away large scale information.

In the concentric example above, the thing that all points in each cluster have in common is they are near other points in their cluster. When our distributions have the property that the distance between some two points in cluster A may be greater than the distance between some point in A and some point in B, the spectral clustering gives a nice mechanism for identifying the clusters.



Throughout the above I refer to the occasional utility. Here are the definitions.

load 'stats/distribs'
load 'numeric'
load 'graphics/plot'

gausian =: dyad define
 'm s' =. x
 m + (s*(rnorm y))

uniform =: dyad define 
 'mn mx' =. x
 mn + ((mx-mn)*(runif y))

shuffle =: (#@:] ? #@:]) { ] 

hist =: dyad define 
  'min max steps' =. x
  'data' =. ((y<min) * (y>max)) # y
  'quantized' =. >. (steps * ((data - min) % (max-min)))
  'indexes' =. i. steps
  'stepSize' =. (max-min)%steps
  1 -~ ((indexes, quantized) #/. (indexes, quantized))

curtail =: }: 

histPlot =: dyad define 
  h =. x hist y 
  'min max nsteps' =. x 
  halfStep =. (max-min)%(2*nsteps)
  'plotXs' =. halfStep + steps (min, max, nsteps)
  'xcaption position; ycaption count; title example 1 data' plot (curtail plotXs) ; h 

unboxFrom =: >@:([ { ])

dot =: +/ . *

dir =: monad define 
  list (4 !: 1) 0 1 2 3

xyForPlotting =: (0&{ ; 1&{)@|:

ten =: monad define 



abs =: (+ ` - @. (< & 0))"0

nonzeros =: ((>&0@:abs@:]) # ])

asArray =:  ". ;. _2

concatSpikes =: dyad define
  maxSpikes =. (1 { ($ x)) >. (1{($y))
  pad =. (maxSpikes&{.)"1
  (pad x), (pad y)

spikeIndexes =: |:@:(( 1&{@:$@:], #@:[) $ (i.@:#@:]) )
spikesForPlotting =: verb define 
  0 spikesForPlotting y
  spikes =. ,y
  ii =:  ,spikeIndexes y
  nzii =. -. (spikes = 0)
  (nzii # spikes) ; (x+(nzii # ii))

rastergram =: monad define
  pd 'reset'
  pd 'type dot'
  pd 'color red'
  pd 'title rastergram'
  pd 'xcaption time'
  pd 'ycaption trial number'
  pd spikesForPlotting y
  pd 'show'

offsetY =: ] + ((#@:] , 1&{@:$@:]) $ (0: , [))

eye =: (],]) $ ((1: + ]) take 1:)

asOperator =: adverb define 
  m dot y



An implementation of k-means in J is comically terse, even in an exploded, commented form:

NB. This file contains an implementation of k-means, a clustering
NB. algorithm which takes a set of vectors and a number of clusters
NB. and assigns each vector to a cluster.
NB. Example Data
NB. v1 =: ?  10 2 $ 10 
NB. v2 =: ?  3 2 $ 10
NB. v3 =: ? 100 2 $ 50
NB. trivialV =: 2 2 $ 10
NB. Examples
NB. 3 km v3

NB. produce a permutation of y.
permutation =: (#@[ ? #@[)

NB. shuffle list
NB. shuffle the list
shuffle =: {~ permutation

NB. nClusters drawIds vectors
NB. given a list of vectors and a number of clusters
NB. assign each vector to a cluster randomly
NB. with the constraint that all clusters are
NB. roughly equal in size.
drawIds =: shuffle@(#@] $ i.@[)

NB. distance between two vectors as lists
vd =: +/@:(*:@-)

NB. Give the distance between all vectors in x and y
distances =: (vd"1 1)/

NB. return the list of indices of the minimum values of the rows.
minI =: i. <./

NB. Given x a list of centers and y a list of vectors,
NB. return the labeling of the vectors to each center.
reId =: ((minI"1)@distances)~

NB. canonicalize labels 
NB. given a list of labels, re-label the labels so
NB. that 0 appears first, 1 second, etc
canonicalize =: ] { /:@~.

NB. (minI"1 v1 distances v2 ) 

vsum =: +/"1@|:
vm =: (# %~ (+/"1@|:))

calcCenters =: (/:@~.@[) { (vm/.)

NB. ids kmOnce vectors This performs the heavy lifting for K-means.
NB. Given a list of integral ids, one for each vector in the N x D
NB. vectors, this verb calculates the averages of the groups implied
NB. by those labels and re-assigns each vector to one of those
NB. averages, whichever it is closest to.

kmOnce =: dyad define
  ids =. x 
  vectors =. y
  centers =. ids calcCenters vectors
  centers reId vectors

NB. Use the power adverb to iterate kmOnce until the labels converge
kmConverge =: ((kmOnce~)^:_)~

NB. nClusters km dataset k means.  Given a cluster count on the left
NB. and a vector data set on the right, return an array of IDs
NB. assigning each vector to a cluster.
NB. One can use `calcCenters` to recover the cluster centers.

km =: canonicalize@((drawIds) kmConverge ])

NB. Example Data
NB. v1 =: ?  10 2 $ 10 
NB. v2 =: ?  3 2 $ 10
NB. v3 =: ? 100 2 $ 50
NB. trivialV =: 2 2 $ 10

NB. Examples
NB. 3 km v3

The Mythical Man Moth

As if rot and verdure weren’t enough,
I then became the mythical man moth,
and I fluttered around, that summer, not moon,
but incandescent after incandescent,
each honey-colored orb in the warm night,
each thrumming with the wings of my cohorts,
all in paroxysm for those faux-moons,
encased in glass, or perspex, and humming
with their own dead, but electric, heartbeat.

Elaborations on “You Aren’t Gonna Need It”

The Cunningham & Cunningham Wiki is a wonderful place to get lost in, and it is so (chaotically) packed with useful programming lore that you are bound to come out of a dive a bit more enlightened about what it is programmers actually do.

One of my favorite pages is You Aren’t Gonna Need It from which I pull the following quotation:

Always implement things when you actually need them, never when you just foresee that you need them.

The justification for this is pretty straightforward: adding things you don’t need takes time and energy, plus generates more code, which means more potential bugs and cognitive load for future development. Since your job is to deliver software that actually does something which you presumably have at least a provisional understanding of, speculative development is an obvious waste of time.

To this basic justification I add only the following small elaboration: If you don’t need it now, you probably don’t understand it anyway. Anything you implement speculatively is very likely to be wrong as well as useless.

Why Software Is Hard

There are a lot of reasons software engineering is hard. Probably the primary reason it is hard is that we do not yet have a complete understanding of why it is so hard in the first place. Richard P Gabriel, a software philosopher and progenitor of the Worse is Better meme observes, probably correctly, that one reason for this fundamental ignorance is that software engineering is a comparative tyro among engineering disciplines: it is both extremely young (beginning only in 1945, approximately) and subject to radical change. A developer in 1945 would find the contemporary development environment utterly alien and vice versa, a state of affairs not afflicting, for instance, stone masons, who have, arguably, thousands of years of more or less dependable tradition to inform their work.

With characteristic insight, Dr Gabriel also observes that software engineering is also difficult because it isn’t physically constrained1, which means that humans, the product of 3.5 billion years of evolution in a physical environment, but not a computational one, have very little experience upon which to rely as they build objects in the space of computable functions.

Suffer me a brief, digressive analogy: Hyper Rogue III is a game which takes place in a two-dimensional hyperbolic plane. One implication of such a space is that it is very unlikely that a wanderer will ever return to a particular position unless she almost exactly follows her own trail of bread crumbs. Exploring the space of computable functions is similarly dangerous to wanderers, except more so: we are not well equipped to even identify the hills, valleys, walls and contours of this space.

Hence my elaboration: the wanderer in the landscape of programming is very likely to lose his way. He has neither physical constraints nor well developed intuition to guide him. And despite the existence of tools like git, which promise us the ability to backtrack and refactor with some confidence, software inevitably ossifies as it grows.

Hence my elaboration: you don’t build things you don’t need because, if you don’t need them, you probably don’t really understand what they should be. If you don’t understand what they should be, you’ll probably wander off base as you build them, and you probably won’t even notice that you have wandered off until something concrete impinges upon that code. When that happens, you might find that refactoring to solve the problem you have now is prohibitively difficult, because that feature you thought you would need has subtly impinged on other parts of your code.

Explicit, concrete requirements are the closest thing you have to the physical constraints which make stone masonry a more reliable discipline than software engineering, and nothing but rote experience will ever give you the physical intuition that stone masons can rely on.

So don’t wander off: You Aren’t Gonna Need It anyway. At least until you do.

1: Well, the relationship between the physical constraints on software and the software is, at any rate, not entirely trivial or transparent.

Watch as I Liveblog “Death of the Corpse Wizard” development.

I’ve always wanted to make videogames. I’ve been programming in one way or another for nearly half my life, so you’d think I would have created at least one so far, but usually my scope gets too big and I end grinding to a halt or getting distracted by something else. Today I’m trying something different. For the rest of the day I’ll be liveblogging, at this post, my development of a small game called `Death of the Corpse Wizard.` I’ll be using HTML5/Canvas, Javascript, an Oryx Tileset I bought, and a game design I thought of in the shower yesterday. Here is the inaugural image:

Hello World

Hello World


Very briefly the game intends to be an “Arena Roguelike”.  Your character sits in the center of the screen and enemies approach from all sides.  If an enemy bumps into you, you lose one vitality.  If you bump into an enemy you take one vitality from it (most monsters have only one vitality).  You may also choose, on your turn, to use a vitality to build or reinforce a wall, which costs one vitality.   Monsters can attack walls and when the wall’s vitality is reduced to zero, it disappears.  The goal of the game is to survive as long as possible. 

For now, those are the only rules.

Watch this space for updates.

Update 11:30 am, Strasbourg Time

Tile Sheet Support

Tile Sheet Support

I’ve finished support for Tile Sheets and loaded the Oryx sheet.  Named a few tiles.  

I’m working on the visual aspects first because they provide pleasing feedback.  I will now start the game engine.  

161 LOC so far, not counting libraries.

Update 12:53 pm, Strasbourg Time




I’ve added the entities and systems required for drawing the player, set up the user controls, and implemented Tweening so that the player sprite moves smoothly.  Since this is now “playable”, I also uploaded the game here, so you can play it.

Some tidbits about the development process: I’m using Kran, an awesome, lightweight entity-component system which deserves more attention.  It is the same system I used to generate my clocks.

3:30 pm, Strasbourg Time

Timers, Spawners

Timers, Spawners

I have added turn-based timer logic, spawners, and the ability to create walls.  Plus an HUD that shows the player’s vitality.

You can play the most recent version of the game here.

5:12 pm, Strasbourg Time

The game is now playable and you can even lose, sort of: monsters have AI and can attack you, you can kill them, and they can reduce your health to zero.



6:04 pm, Strasbourg Time

Death Screen

Death Screen

Added a death screen and death condition!  It will probably be the start screen too, at some point.


6:53 pm, Strasbourg Time



Death of the Corpse Wizard is now entirely playable: it has a goal, challenged, and failure conditions.  I’m going to declare my first ever single-day Game Jam a success!