Functional programming without feeling stupid, part 3: Higher-order functions

Welcome to the third installment of “Functional programming without feeling stupid”! I originally started to describe my own learnings about FP in general, and Clojure in particular, and soon found myself writing a kind of Clojure tutorial or introduction. It may not be as comprehensive as others out there, and I still don’t think of it as a tutorial — it’s more like a description of a process, and the documented evolution of a tool.

I wanted to use Clojure “in anger”, and found out that I was learning new and interesting stuff quickly. I wanted to share what I’ve learned in the hope that others may find it useful.

Some of the stuff I have done and described here might not be the most optimal, but I see nothing obviously wrong with my approach. Maybe you do; if that is the case, tell me about it in the comments, or contact me otherwise. But please be nice and constructive, because…

…in Part 0 I wrote about how some people may feel put off by the air of “smarter than thou” that sometimes floats around functional programming. I’m hoping to present the subject in a friendly way, because much of the techniques are not obvious to someone (like me) conditioned with a couple of decades of imperative, object-oriented programming. Not nearly as funny as Learn You a Haskell For Great Good, and not as zany as Clojure for the Brave and True — just friendly, and hopefully lucid.

xkcd 1270: Functional
xkcd 1270: Functional. Licensed under Creative Commons Attribution-Non-Commercial License. This is a company blog, so it is kind of commercial by definition. Is that a problem?

In Part 1 we played around with the Clojure REPL, and in Part 2 we started making definitions and actually got some useful results. In this third part we’re going to take a look at Clojure functions and how to use them, and create our own — because that’s what functional programming is all about.

We already have a function that turns a map value containing an offset and a character into a string representing a line of output:

(defn character-line [pair]
  (format "%08d: U+%06X %s"
    (:offset pair) (int (:character pair))
    (character-name (:character pair))))

How do we apply that function to each character in the input string? In the imperative style you would whip up a for loop at this point (or maybe a list comprehension in Python).

Clojure does have a for macro (very much like the Python list comprehension), but what we want to achieve is easily done using a higher-order function. This is common to many things that traditionally are done using a for loop.

In functional programming, functions are values just like anything else. You can create them, store them, apply them and pass them around. This opens up a whole new world of flexibility in program construction. Particularly, you can reduce many problems that require “doing something” to every member of a collection into a mapping, where you apply a function that does the “something”. Clojure has a map function which does exactly that:

(def short-test-str "Naïve")
#'user/short-test-str
user=> (map int short-test-str)
(78 97 239 118 101)

We just applied Clojure’s int function to all the characters in the short test string. The result is a sequence of numbers representing the code points of the characters in the string. (For a comparison between for and map, see FOR vs MAP by John Lawrence Aspden.)

Can we do this with one of our functions? Yes we can:

user=> (map character-name short-test-str)
("LATIN CAPITAL LETTER N" "LATIN SMALL LETTER A" 
"LATIN SMALL LETTER I WITH DIAERESIS" "LATIN SMALL LETTER V" 
"LATIN SMALL LETTER E")

Since our character-name function returns a string, we get back a sequence of strings representing the names of the characters in short-test-str.

Clojure’s map function always returns a sequence. (If you look up the documentation using (doc map), you will see that it actually returns a lazy sequence, but we’ll happily gloss over the laziness for now, mainly because I’m also lazy.) A sequence, or seq for short, is actually an abstraction which provides us with a sequential view over some collection. It is quite widely used in Clojure programming: Clojure strings are actually sequences, and you can turn a vector into a sequence with the seq function.

Can we use the map function with our character-line function Well, that function takes a map value, so it’s not going to be that straight-forward. Also, there is a high potential for confusion here: on one hand, we’re working with the map function, but on the other, we’re working with map literals. You need to get these concepts straight in your head!

So, this probably won’t work, but let’s try it anyway:

user=> (map character-line short-test-str)
NullPointerException clojure.lang.RT.intCast (RT.java:1087)

I don’t even know what that means. Well, OK, I sort of know — since the character-line function expects a map, and we’re giving it a string (which is a sequence, really), it’s trying to pull out the offset using the keyword function :offset, and there is no way that will succeed.

But if I could create a map value for each of the characters in the string, put them in a collection, and then map the character-line function to each one of them, I would be much closer to my solution. At least I think so.

By using the Clojure map function together with an anonymous function, I can quickly generate the offset/character pairs I want. But what will the anonymous function do, and how can I define it?

Clojure’s into function creates a new collection by conjoining items from two collections. So if I have two maps — an empty map, and a map with the offset and character values — I can conjoin them like this:

user=> (into {} {:offset 0 :character \N})
{:offset 0, :character \N}

That doesn’t seem like much, but I have neglected to mention something quite important: the map function can take more than one collection. So, if I have two collections, one for the offsets and one for the characters, I can traverse them at the same time, and apply a function which takes the corresponding value from both collections.

Since we still don’t have the real offsets of the characters, let’s construct a collection of dummy offsets:

user=> (repeat (count short-test-str) 0)
(0 0 0 0 0)

This piece of code uses two simple Clojure functions, repeat and count. Whereas count simply counts the number of elements in a collection, repeat creates a new sequence (a lazy one — there’s that word again) with the specified number of given elements. So we get back as many zeros as there are characters in short-test-str. We can use this as the collection of dummy offsets since it has one item for each character.

The two collections we want to traverse at the same time are the dummy offsets and the characters of the string. Our usage of the map function would be something like:

(map something (repeat (count short-test-str) 0) 
  short-test-str)

What is that elusive “something“? It needs to be a function which takes the current items from both collections and conjoins them into a map:

#(into {} {:offset %1 :character %2})

The #() notation stands for an anonymous function, and the %1 and %2 are placeholders for the items from the two collections. When we put this together, we have:

(map #(into {} {:offset %1 :character %2}) 
  (repeat (count short-test-str) 0) short-test-str)

And if we actually execute this in the REPL, here’s what we’ll see:

user=> (map #(into {} {:offset %1 :character %2}) 
  (repeat (count short-test-str) 0) short-test-str)
({:offset 0, :character \N} {:offset 0, :character \a} 
{:offset 0, :character \ï} {:offset 0, :character \v} 
{:offset 0, :character \e})

The offsets are still zeros, but it seems to be working! What if… yes, let’s do it:

user=> (map character-line (map #(into {} {:offset %1 :character %2}) 
  (repeat (count short-test-str) 0) short-test-str))
("00000000: U+00004E LATIN CAPITAL LETTER N" 
"00000000: U+000061 LATIN SMALL LETTER A" 
"00000000: U+0000EF LATIN SMALL LETTER I WITH DIAERESIS" 
"00000000: U+000076 LATIN SMALL LETTER V" 
"00000000: U+000065 LATIN SMALL LETTER E")

Well, well — that’s what we have been looking for: mapping the character-line function to all the characters in the string! But we’re not done yet, oh no.

I bet your head is spinning from the nested functions and the parentheses. I know mine is, but turning this into a function will alleviate that condition. It is also a great opportunity to introduce Clojure’s local bindings.

Our first stab at the function could be something like this:

(defn character-lines [s]
  (map character-line (map #(into {} {:offset %1 :character %2}) 
    (repeat (count s) 0) s)))

How is that better? Well, I’ve made this piece of code reusable for any string, not just the short-test-str defined earlier. Now any string that will be passed in will be known as s inside the function. But we can do better. The dummy offsets are easy to pull out of the mess, and it will be easy to replace it with the actual offsets, if we ever get around to actually computing them… For this we can use a local binding, achieved with let:

(defn character-lines [s]
  (let [offsets (repeat (count s) 0)]
    (map character-line (map #(into {} {:offset %1 :character %2}) 
      offsets s))))

Local bindings in Clojure are vectors, containing pairs of items. The first item in the pair is the name that will be used inside the function, and the second item is whatever the name stands for. So in the local binding above, we say that we want the name offsets to refer to a collection generated using the repeat function — we have bound the name to the value locally, and that’s why it’s called a local binding.

If you throw this function at the REPL, you should be able to use it with whatever string you want:

user=> (defn character-lines [s]
#_=> (let [offsets (repeat (count s) 0)]
#_=> (map character-line (map #(into {} {:offset %1 :character %2}) 
#_=> offsets s))))
#'user/character-lines
user=> (character-lines "Yay!")
("00000000: U+000059 LATIN CAPITAL LETTER Y" 
"00000000: U+000061 LATIN SMALL LETTER A" 
"00000000: U+000079 LATIN SMALL LETTER Y" 
"00000000: U+000021 EXCLAMATION MARK")

We can make this even cleaner by introducing another local binding for the offset/character pairs. Notice that you can just redefine the function in the REPL by pasting it in. The new definition will replace the old one.

(defn character-lines [s]
  (let [offsets (repeat (count s) 0)
        pairs (map #(into {} {:offset %1 :character %2}) 
          offsets s)]
    (map character-line pairs)))

Now there are two local bindings, offsets and pairs, and the actual body of the function is reduced to (map character-line pairs).

By the way, we can make our original character-line function a little better by using a local binding. We have (:character pair) twice in there, so let’s make it once only:

(defn character-line [pair]
  (let [ch (:character pair)]
    (format "%08d: U+%06X %s" (:offset pair) (int ch) 
      (character-name ch))))

Or is that better? I’ll let you decide.

After that, all that is really missing are the offsets, but this is more than enough for one post.

UPDATE 2014-11-21: If you want to read more about Clojure’s higher-order functions or “HOFs”, try Christopher Maier’s Writing Elegant Clojure Code Using Higher-Order Functions.

Also, earlier I seem to have occasionally used a misleading term out of negligence: higher-order functions or HOFs are not “high-order” functions (those would be functions with a high order, I guess). They are indeed higher-order functions, or “functions of a higher order”.