Why your REPL experience sucks

If you’ve been programming in Clojure for any amount of time, especially if you’re working with webservices, Discord bots, or any other long-running process that you may want to update the behavior of without restarting it, you’ve probably had an experience where you re-evaluated some code, tried to run it again, and it acts like it did before. You can evaluate that code as many times as you want to and it just never runs the updated code.

Early on you may have just restarted your REPL every time this happened, and that was frustrating, especially if it takes 20-30 seconds to start up every time! Eventually, you probably figured out that something short of restarting the whole REPL would let you run your new code, whether that was restarting your Component system map, shutting down your Ring server and starting it back up, or anything else that bounces a long-running process, and that was slightly less annoying.

Then you learned about the var quote, and how it can solve this problem. You put that funny little #' right where the Stack Overflow question said you should, and it worked! But not every time. Some things reloaded perfectly, and other things didn’t. Objectively this is better than when you had to bounce your service every time, but it feels way worse, because you know that it could all just reload the way you want it to, but it’s completely unclear when and why it happens.

In this article, I will do my best to demystify the when and why of the var quote when writing reloadable code for long-running processes.

1. The Setup

In order to illustrate the problem, let’s write a tiny, fictional webservice that just hosts static HTML files out of a directory. We’ll assume another namespace, myapp.util, has been written to handle this basic stuff and now we’re writing the entrypoint of the application in myapp.core.

(ns myapp.core
  (:require
   [myapp.util :as util]
   [reitit.ring :as ring]
   [ring.adapter.jetty :refer [run-jetty]])
  (:gen-class))

(defn file-handler
  [file-path]
  (if-let [html-body (util/read-html file-path)]
    {:status 200
     :body html-body}
    {:status 404
     :body (util/read-html "not-found.html")}))

(defn wrap-path-param
  [handler param]
  (fn [request]
    (handler (get-in request [:path-params param]))))

(def router
  (ring/ring-handler
   (ring/router
    [["/not-found" {:get (fn [_request] (assoc (file-handler "not-found.html") :status 404))}]
     ["/:file" {:get (wrap-path-param file-handler :file)}]]
    {:conflicts nil})))

(def server (atom nil))

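(defn stop-server!
  []
  ;; Stop outside of swap!, whose function may be retried and should be free of side effects.
  (when-let [jetty @server]
    (.stop jetty)
    (reset! server nil)))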

(defn -main
  []
  (reset! server (run-jetty router {:port 8080 :join? false})))

For this application we’re using a few pretty common libraries: Ring’s Jetty adapter for the HTTP server, reitit for routing, and reitit-ring to make it easy to put the two together.

What this does is pretty simple, but I’m going to go over it anyway to ensure everybody’s on the same page.

To start with, the -main function just starts up an HTTP server that uses router as the handler function for each request. router itself has just two routes: /not-found will read not-found.html and return it with a 404 status code, and /:file will take any other path provided and read it in as an HTML file, returning a 404 if it fails to read any HTML for whatever reason.

There’s also the stop-server! function paired with the server atom which is used to provide a little REPL convenience to shut things down as needed.

This is the start of our application, but the application isn’t done yet. Now that we have a starting point, let’s fire up a REPL, evaluate this code, start the server, and make some changes.

2. The Problem

So far this webservice has only hosted the static HTML files for our articles, but you’ve just had a brilliant idea to add user profiles so that you can have comments on your articles. That means we’ll need a new URL scheme so that files and users don’t share the same namespace. So let’s put the files under an /articles prefix to get ready for the /users prefix we’ll have to add later.

This is quite an easy change. Let’s just update our router!

(def router
  (ring/ring-handler
   (ring/router
    [["/not-found" {:get (fn [_request] (assoc (file-handler "not-found.html") :status 404))}]
     ["/articles/:file" {:get (wrap-path-param file-handler :file)}]])))

And look at that, we don’t even have to allow conflicts anymore! This is a real improvement. Let’s re-evaluate that code, and then make a request to the new URL in our browser.

It gave us an internal server error, even though we’ve re-evaluated the router!

3. The Solution

Well unfortunately, in order to progress from here, we need to stop the server using that handy stop-server! function we made earlier. Fortunately, once we’ve done that, making the router reloadable is easy. We just add a var quote to the router when we start the server.

(defn -main
  []
  (reset! server (run-jetty #'router {:port 8080 :join? false})))

This is a very simple change, and once we’ve done it we can change the router to our hearts’ content and it will reflect the changes as soon as we re-evaluate it.

4. Why Though?

If you’re new to Clojure, or even if you’re not but haven’t had a chance to carefully study its evaluation model, this may seem a bit mysterious. Why should putting two little characters in front of the function name suddenly mean that the HTTP server will be updated when we re-evaluate the definition of the router?

In order to answer this question, we’re going to go back to the basics of Clojure, and dive a little deeper. When we evaluate the form (+ 2 2), what happens? In Clojure (and indeed all Lisps I am aware of), lists evaluate to some kind of operation call. It may be a function, it may be a macro, it may be a special form. In order to determine this, first Clojure evaluates the first element of the list, in this case the symbol +.

When Clojure evaluates a symbol in function position like this, it first checks if it is a special form, like if, let*, or similar. If it is, it allows that special form to take over. If not, it checks whether the symbol is a local variable, and if so it uses that value as a function. If it doesn’t refer to a local variable, then Clojure looks up the symbol in the current namespace to determine what var it refers to. Once it finds that var, it dereferences it to get the value and checks whether that value is a function or a macro. In either case it saves the function object to complete the evaluation, and if it’s a function it moves on to evaluate the arguments to the call.

In this simple case of (+ 2 2) all the hard work is done because numbers just evaluate to themselves, and then the saved function object is called and we get the result 4.
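If you’d like to watch those steps happen, you can perform them by hand at the REPL. This is only a sketch of the conceptual model using functions from clojure.core; the names resolved-var and plus-fn are made up for this illustration.

```clojure
;; Step 1: look the symbol up in the current namespace.
;; The result is the var, not the function.
(def resolved-var (resolve '+))

;; Step 2: dereference the var to get the function object it currently holds.
(def plus-fn (deref resolved-var))

(var? resolved-var) ; => true
(fn? plus-fn)       ; => true

;; Step 3: call the saved function object with the evaluated arguments.
(plus-fn 2 2)       ; => 4
```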

This may seem like quite the digression, but let’s now turn our attention to the offending function call. (run-jetty router {:port 8080 :join? false}) is evaluated in exactly the same manner as the addition was, but something slightly more interesting happens when it evaluates the first argument.

When Clojure evaluates the symbol router here, it goes through almost the exact same process as it did for the symbol +, but without checking if it’s a special form. It looks for the var in the current namespace that maps to the symbol router, dereferences it, and saves the function object it retrieves as the first argument before evaluating the second argument, and then calling the function that run-jetty evaluated to.

run-jetty in turn takes that function object and starts up its server. How it does this is more or less irrelevant, but somewhere inside it ends up calling the function object you passed with the request object.

Now imagine we just evaluated some changes to router. Maybe we added that /users/:id route to be able to view a user profile. This constructs a brand new function object that will handle the new route, and then takes the existing var associated with the symbol router and updates it to point at this new function object.

Now think about what happens with run-jetty. It already has a function object that was passed to it, and it never knew about the var associated with the symbol router in the first place. There’s no way that it could know that there’s a new function object it should be using instead. If only there were a way that we could pass a function that would look up the current value of the var before calling it with its arguments!
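Before getting to the fix, the stale behavior described above can be reproduced without starting Jetty at all. Here is a minimal sketch with a toy stand-in for run-jetty. The names start-toy-server, toy-router, and toy-server are hypothetical and exist only for this illustration; a real server does far more, but the part that matters here is that it keeps the function object it was handed.

```clojure
;; Hypothetical stand-in for run-jetty: it keeps whatever handler it was
;; given and returns a function that sends requests to that handler.
(defn start-toy-server
  [handler]
  (fn [request] (handler request)))

(defn toy-router
  [_request]
  {:status 200 :body "old routes"})

;; The symbol toy-router is evaluated here, once. The var is dereferenced and
;; the function object it holds at this moment is what gets passed along.
(def toy-server (start-toy-server toy-router))

;; Re-evaluating the definition points the same var at a new function object.
(defn toy-router
  [_request]
  {:status 200 :body "new routes"})

(:body (toy-server {:uri "/"})) ; => "old routes"
(:body (toy-router {:uri "/"})) ; => "new routes"
```

The toy server keeps answering with the old routes because it only ever saw the old function object, not the var.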

As it turns out, the Clojure devs foresaw this need, and vars implement the IFn interface doing exactly that! So if we passed a var to run-jetty, every time it tried to call the var as a function it would first dereference itself, assume the contained object is a function object, and then call it with the same arguments the var was called with.

Now that we know vars can do this, we just need to know how to pass the var object itself to run-jetty instead of the function object. This is what the #' syntax means in Clojure, and it’s equivalent to calling the var special form on a symbol.
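Here is a minimal sketch of that behavior, using a toy stand-in for run-jetty rather than a real server. The names start-toy-server, toy-router, and toy-server are hypothetical and only exist for this illustration.

```clojure
;; Hypothetical stand-in for run-jetty: it keeps whatever handler it was given.
(defn start-toy-server
  [handler]
  (fn [request] (handler request)))

(defn toy-router
  [_request]
  {:status 200 :body "old routes"})

;; Pass the var itself instead of the function object it currently holds.
;; #'toy-router is read as (var toy-router).
(def toy-server (start-toy-server #'toy-router))

;; Re-evaluating the definition updates the same var in place.
(defn toy-router
  [_request]
  {:status 200 :body "new routes"})

;; Calling the var dereferences it first, so the new function object runs.
(:body (toy-server {:uri "/"}))            ; => "new routes"

(ifn? #'toy-router)                        ; => true
(identical? (var toy-router) #'toy-router) ; => true
```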

5. Gaining Intuition

Now that we’ve used the var quote (the name for the #' syntax) on the router, we should be home free, right? Not quite. Let’s say that we need to modify file-handler. We’ve determined we’re vulnerable to directory traversal attacks because we’re not validating the path before we read it from disk. Somebody else has already made a handy function to handle these cases, called util/safe-path?, and it returns a truthy value if it’s safe to read the given path as HTML.

(defn file-handler
  [file-path]
  (if (util/safe-path? file-path)
    (if-let [html-body (util/read-html file-path)]
      {:status 200
       :body html-body}
      {:status 404
       :body (util/read-html "not-found.html")})
    {:status 400
     :body (util/read-html "invalid-path.html")}))

If the file path is safe, we happily try to read it, returning a 404 status if it’s not found. If it’s not a safe path, we return a 400 Bad Request.

Once we evaluate this function we test our routes again and find some very strange behavior. If we make a request to /articles/../admin-ui/index.html, it happily returns this file! This is very bad. Let’s check the other routes that use file-handler.

The person who wrote util/safe-path? did some thinking about 404 errors and similar and decided that not-found.html isn’t a safe path because it wouldn’t make sense to return a 200 status code when you’re trying to get the 404 error page.

So now we make a request to /not-found… and it returns a 400 with the text from invalid-path.html! You should really talk to that coworker who thinks that the util/safe-path? code should worry about response code semantics despite it not being a part of the request handler functions.

Questionable choices about path validation aside, why does one route have the updated code for file-handler and the other doesn’t? Neither one of them is using the var quote, so it seems like both of them should be using the old code if one is.

Let’s take another look at our router definition and think about the evaluation model again.

(def router
  (ring/ring-handler
   (ring/router
    [["/not-found" {:get (fn [_request] (assoc (file-handler "not-found.html") :status 404))}]
     ["/articles/:file" {:get (wrap-path-param file-handler :file)}]])))

When Clojure evaluates the arguments to ring/router, it evaluates the vector, which in turn evaluates each of its elements before returning itself. This happens recursively with each of the routes. The strings return themselves unaltered, and the maps evaluate all of their keys and values before returning themselves. The keywords return themselves unaltered, and now we get to the interesting bit: the values.

Let’s start with the value for the article endpoint. It’s a list, so it evaluates to a function call. It calls the wrap-path-param function with the result of evaluating each of the arguments. The first argument is file-handler, and that works just like it did when we passed router to run-jetty. It looks up the var, gets the function object out, and uses that as the argument to wrap-path-param. If we use a var quote on file-handler, wrap-path-param receives the var instead, and every call goes through the var to the newest function object, the same way #'router did with run-jetty.

So that explains why the articles endpoint used the old code, but why did the /not-found endpoint use the new code? The value in the map is an anonymous function, and here we find our answer. Anonymous functions don’t evaluate their bodies when they are evaluated; they return a function object. That function object, when called, will evaluate its body. So when the router is called and the /not-found endpoint is reached, it calls this function object, and only then does the symbol file-handler get evaluated, its associated var dereferenced, and the returned function object called. And because the body is evaluated again every time the handler function object is called, file-handler is looked up each time, getting the new value.
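The difference between the two routes can be reproduced in a few lines with no router at all. This sketch reuses wrap-path-param from the setup; toy-handler, toy-routes, and request are hypothetical names for this illustration, and the :var entry shows what the var quote would do in the same position.

```clojure
(defn wrap-path-param
  [handler param]
  (fn [request]
    (handler (get-in request [:path-params param]))))

(defn toy-handler [file-path] (str "old:" file-path))

(def toy-routes
  {;; Argument position: toy-handler is dereferenced once, when this map is built.
   :eager   (wrap-path-param toy-handler :file)
   ;; Anonymous function: its body, including the lookup of toy-handler,
   ;; runs again on every call.
   :delayed (fn [request] (toy-handler (get-in request [:path-params :file])))
   ;; Var quote: the var is passed along and looks up its value on every call.
   :var     (wrap-path-param #'toy-handler :file)})

;; Re-evaluate the handler without touching toy-routes.
(defn toy-handler [file-path] (str "new:" file-path))

(def request {:path-params {:file "a.html"}})

((:eager toy-routes) request)   ; => "old:a.html"
((:delayed toy-routes) request) ; => "new:a.html"
((:var toy-routes) request)     ; => "new:a.html"
```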

This means that we have to pay attention not just to references to different functions, but we have to pay attention to when those references are evaluated, and that will tell us whether or not we need to use a var quote.

6. When This Applies

Something that you might not have noticed just yet but that may seem obvious when I point it out is that every circumstance where we see stale code being used is the result of a function being used as a value in a context that is only evaluated once. The file-handler function was passed as an argument inside a def for the router, and the router itself was used inside the body of the -main function that you called once to start up the server and ideally would not call again.

This pattern is not a coincidence. Any code that is called repeatedly over the course of the runtime of your program or REPL session will reflect new code the next time it runs after the evaluation takes place. This means you don’t have to worry about this inside the bodies of most functions besides initialization functions and application loops.

This is also why many types of applications you may work on don’t suffer from reloadability problems at all, and only the types of programs I called out at the beginning of this post are affected.

In general, you will need to ask yourself when writing a piece of code how often that code will be executed. If it will be executed only a small number of times at the start of your application or during re-evaluation and holds onto function objects as we’ve seen in the examples in this article, then you will have to consider where to apply the var quote.
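As a rough illustration of that question, here is a sketch of an initialization function that runs once and holds onto a function object, next to one that holds onto the var. The names process, start-worker-stale, start-worker-fresh, stale-worker, and fresh-worker are all hypothetical.

```clojure
(defn process [job] [:v1 job])

(defn start-worker-stale
  "Initialization code: each call evaluates process once and keeps that function object."
  []
  (partial mapv process))

(defn start-worker-fresh
  "Same, but keeps the var, which is dereferenced every time it is called."
  []
  (partial mapv #'process))

;; Both workers are started exactly once, like a server or an application loop.
(def stale-worker (start-worker-stale))
(def fresh-worker (start-worker-fresh))

;; Re-evaluate process while the workers are "running".
(defn process [job] [:v2 job])

(stale-worker [1 2])       ; => [[:v1 1] [:v1 2]]
(fresh-worker [1 2])       ; => [[:v2 1] [:v2 2]]

;; Running the initialization code again evaluates its body again,
;; so even the stale version picks up the new function.
((start-worker-stale) [1]) ; => [[:v2 1]]
```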

7. Caveats

While everything said above is approximately correct, it’s been framed in terms of the way a Lisp interpreter would work, and not in terms of how a compiler, like Clojure’s, would actually resolve this. The actual semantics should match entirely, but it’s important to know that “evaluation” in Clojure is mostly a conceptual framework that we impose on the language because it matches how Lisp interpreters work, and that the real version works slightly differently. If you’d like to read further about how these things work, you can consult the official documentation on evaluation and vars.

8. Conclusion

Congratulations on making it to the end of my first blog post! I hope you understand the difference between using a var or a function object a little better, and that you now know enough to go and make your existing software more reloadable! If you just use Clojure yourself, I wish you well and I hope to see you back to read more posts! If, however, you use Clojure as a teaching tool, especially in time-constrained environments or to complete beginners, read on to see a magical way to bypass this problem entirely, at the cost of your code becoming somewhat more mysterious.

9. A Teaching Solution

Unfortunately, you have to know a lot about Clojure’s evaluation model before it makes sense when to use a bare symbol and when to use a var-quoted symbol, and that makes this a real tripping hazard for beginner to intermediate programmers who just want a reasonable reloading experience. I recommend learning this to anybody who wants to advance their Clojure knowledge, but for someone just learning Clojure from scratch, it might be too much information to dump on them right at the beginning just so they can experience how fun it is to program with a REPL.

In cases where it’s important to be able to work with the full power of the REPL, but it’s not reasonable to dive this deep into the evaluation model, like in an hour-long coding camp or a tutorial for complete beginners with Clojure who want to write real software as they learn, it could be worthwhile to introduce a construct which allows you to use function references everywhere and simply not worry about reloadability.

For exactly this purpose, I’ve designed a macro (which you are free to copy and use as you will, consider it to be under an MIT license) which acts like defn but which will always run the latest version of the body that has been evaluated, no matter if you have var-quoted it or not.

(require '[clojure.spec.alpha :as s])

(s/def ::defreloadable-args
  (s/cat :name simple-symbol?
         :doc (s/? string?)
         :attr-map (s/? map?)
         :fn-tails (s/+ any?)))

(defmacro defreloadable
  "Defines a new function as [[defn]], but old references will refer to new versions when reloaded.

  This will construct a phantom var that's used for the lookup, so calls to
  functions defined with this macro will have an additional layer of
  indirection as compared to normal functions. This should also work in
  production environments compiled with direct linking turned on.
  I do not recommend using this macro, but it can be useful for beginners
  who are learning how to write webservers or other persistent applications
  and don't want to worry about having a bad reloadability experience.
  Instead of using this, I recommend learning about Clojure's evaluation
  model, which will allow you to have the same benefits as using this
  macro, but without any magic."
  [& args]
  (let [args (s/conform ::defreloadable-args args)]
    `(let [v# (or (when-let [fn# (binding [*ns* ~*ns*]
                                   (resolve '~(:name args)))]
                    (-> (meta fn#) ::impl))
                  (with-local-vars [v# nil] v#))]
       (alter-var-root v# (constantly (fn ~(:name args) ~@(:fn-tails args))))
       (doto (def ~(:name args) (fn [~'& args#] (apply @v# args#)))
         (alter-meta! merge (assoc (merge {:doc ~(:doc args)}
                                          ~(:attr-map args))
                                   ::impl v#))))))
(s/fdef defreloadable
  :args ::defreloadable-args)

I won’t go into detail about how the internals of this macro work, but I’d be happy to make another post about it if it’s requested.

I also, as mentioned in the docstring, do not recommend that you use this for any code that matters. For one thing it’s less performant, but the far more important thing, I think, is that anyone who does understand the Clojure evaluation model as described in this article comes to rely on Clojure’s normal behavior. For them, usage of the above macro will make code more confusing, with behavior changing at unexpected times during re-evaluation of code.

Author: Joshua Suskalo

Created: 2022-11-28 Mon 16:49