Learning Clojure

About one year ago I wrote a multi-part tutorial on Clojure programming, describing how I wrote a small utility called ucdump (available on GitHub).

Here are links to all the parts:

However, Carin Meier’s Living Clojure is excellent in many ways. Get it from O’Reilly (we’re an affiliate):
Living Clojure

My little tutorial started with part zero, in which I lamented how functional programming is made to appear unlearnable by mere mortals, and it kind of snowballed from there. Hope you like it and/or find it useful!

 

Functional programming without feeling stupid, part 5: Project

In the last four installments of Functional programming without feeling stupid I’ve slowly built up a small utility called ucdump with Clojure. Experimentiing and developing with the Clojure REPL is fun, but now it’s time to give some structure to the utility. I’ll package it up as a Leiningen project and create a standalone JAR for executing with the Java runtime.

Creating a new project with Leiningen

You can use Leiningen to create a skeleton project quickly. In my project’s root directory, I’ll say:

lein new app ucdump

Leiningen will respond with:

Generating a project called ucdump based on the 'app' template.

The result is a directory called ucdump, which contains:

.gitignore   README.md    project.clj  src/
LICENSE      doc/         resources/   test/

For now I’m are most interested in the project file, project.clj, which is actually a Clojure source file, and the src directory, which is intended for the app’s actual source files.

Leiningen creates a directory called src/ucdump and seeds it with a core.clj file, but that’s not what actually what I want, for two reasons:

  • I want ucdump to be a good Clojure citizen, so I’m going to put it in a namespace
    called com.coniferproductions.ucdump.
  • My Git repository for ucdump also contains the original Python version of the application, which is in <project-root>/python, and I want the Clojure version to live in <project-root>/clojure.

Continue reading

Functional programming without feeling stupid, part 4: Logic

In the previous parts of “Functional programming without feeling stupid” we have slowly been building ucdump, a utility program for listing the Unicode codepoints and character names of characters in a string. In actual use, the string will be read from a UTF-8 encoded text file.

We don’t know yet how to read a text file in Clojure (well, you may know, but I only have a foggy idea), so we have been working with a single string. This is what we have so far:

(def test-str 
  "Na\u00EFve r\u00E9sum\u00E9s... for 0 \u20AC? Not bad!")
(def test-ch { :offset 0 :character \u20ac })
(def short-test-str "Na\u00EFve")

(defn character-name [x]
  (java.lang.Character/getName (int x)))

(defn character-line [pair]
  (let [ch (:character pair)]
    (format "%08d: U+%06X %s"
      (:offset pair) (int ch)
      (character-name ch))))
    
(defn character-lines [s]
  (let [offsets (repeat (count s) 0)
        pairs (map #(into {} {:offset %1 :character %2}) 
          offsets s)]
    (map character-line pairs)))

I’ve reformatted the code a bit to keep the lines short. You can copy and paste all of that in the Clojure REPL, and start looking at some strings in a new way:

user=> (character-lines "résumé")
("00000000: U+000072 LATIN SMALL LETTER R" 
"00000000: U+0000E9 LATIN SMALL LETTER E WITH ACUTE" 
"00000000: U+000073 LATIN SMALL LETTER S" 
"00000000: U+000075 LATIN SMALL LETTER U" 
"00000000: U+00006D LATIN SMALL LETTER M" 
"00000000: U+0000E9 LATIN SMALL LETTER E WITH ACUTE")

But we are still missing the actual offsets. Let’s fix that now.

Continue reading

Functional programming without feeling stupid, part 3: Higher-order functions

Welcome to the third installment of “Functional programming without feeling stupid”! I originally started to describe my own learnings about FP in general, and Clojure in particular, and soon found myself writing a kind of Clojure tutorial or introduction. It may not be as comprehensive as others out there, and I still don’t think of it as a tutorial — it’s more like a description of a process, and the documented evolution of a tool.

I wanted to use Clojure “in anger”, and found out that I was learning new and interesting stuff quickly. I wanted to share what I’ve learned in the hope that others may find it useful.

Some of the stuff I have done and described here might not be the most optimal, but I see nothing obviously wrong with my approach. Maybe you do; if that is the case, tell me about it in the comments, or contact me otherwise. But please be nice and constructive, because…

…in Part 0 I wrote about how some people may feel put off by the air of “smarter than thou” that sometimes floats around functional programming. I’m hoping to present the subject in a friendly way, because much of the techniques are not obvious to someone (like me) conditioned with a couple of decades of imperative, object-oriented programming. Not nearly as funny as Learn You a Haskell For Great Good, and not as zany as Clojure for the Brave and True — just friendly, and hopefully lucid.

xkcd 1270: Functional
xkcd 1270: Functional. Licensed under Creative Commons Attribution-Non-Commercial License. This is a company blog, so it is kind of commercial by definition. Is that a problem?

In Part 1 we played around with the Clojure REPL, and in Part 2 we started making definitions and actually got some useful results. In this third part we’re going to take a look at Clojure functions and how to use them, and create our own — because that’s what functional programming is all about.

Continue reading

Functional programming without feeling stupid, part 2: Definitions

In this installment of “Functional programming without feeling stupid” I would like to show you how to define things in Clojure. Values and function applications are all well and good, but if we can’t give them symbolic names, we need to keep repeating them over and over.

Before we start naming things, let’s have a look at how Clojure integrates with Java. I’m assuming you are still in the REPL, or have started it again with lein repl.

Track 1: “If anyone should ask / We are mated”

As it happens, Clojure’s core library is lean and focused on manipulating the data structures of the language, so many things are deferred to the underlying Java machinery as a rule. For example, mathematical computation is typically done using the static methods in the java.lang.Math class:

user=> (java.lang.Math/sqrt 5.0)
2.23606797749979

As you can see, this is a function application like we have already seen in Part 1, but this time the function we are using is the sqrt static method in the java.lang.Math class.

Java 7 acquired Unicode character names, and they are accessed through
the getName method in the java.lang.Character class. This is no mean feat, since there are over 110,000 characters in the Unicode standard, and each of them has a name (although some of them are algorithmically generated). To find out the canonical character name of a Unicode character, such as the euro currency symbol, you would use the getName static method:

user=> (java.lang.Character/getName \u20ac)
ClassCastException java.lang.Character cannot be cast to java.lang.Number user/eval703 (NO_SOURCE_FILE:1)

Hey, what’s wrong? Well, if you look up the documentation of java.lang.Character.getName, you will find out that it takes an int value as an argument, not a character. You can actually do this check from inside the REPL:

user=> (javadoc java.lang.Character)
true

The REPL doesn’t seem to do much, but you should now have a new web browser window or tab open, with the JavaDoc of the java.lang.Character class loaded up. That’s what the REPL meant when it said

Javadoc: (javadoc java-object-or-class-here)

when it started up. The getName method does need an int value, so let’s try something else:

user=> (java.lang.Character/getName (int \u20ac))
"EURO SIGN"

All right! How about another one:

user=> (java.lang.Character/getName 67)
"LATIN CAPITAL LETTER C"

Well, some say that Clojure is the new C.

Continue reading

Functional programming without feeling stupid, part 1: The Clojure REPL

In my recent post Functional Programming Without Feeling Stupid I took a quick look at how functional programming can be a little off-putting for the non-initiated. I promised to provide some examples of my own first steps with FP, and now I would like to present some to you.

Advocates of functional programming often refer to increased programmer productivity. At least some of that can be attributed to the REPL, or the Read-Evaluate-Print Loop. We are basically talking about an environment which accepts and parses any code you type in, and gives you a place to experiment and see results quickly. Before interpreted or semi-interpreted languages like Python, Java and JavaScript became mainstream, the typical repeating cycle in software development was Compile-Link-Execute, and debugging meant observing special output on the console. In the 1990s integrated debuggers with watches and breakpoints became the norm, but long before that Lisp-like languages already had a REPL, and Python also acquired one.

If you are thinking about getting intimate with Clojure, you will need to get to know the REPL. It is your playground, and will always be, even if you later start packaging and organizing your code.

Clojure depends on the Java Virtual Machine (JVM) and is actually distributed as a normal JAR file (Java ARchive), like most Java libraries are. You can start Clojure from the JAR, but you will save yourself some trouble and prepare for the future if you install Leiningen, the dependency management tool for Clojure. It is simple to install and run, and I will assume that you will follow the instructions on the Leiningen web site sooner or later. Now would be a good time.

When you’re done with the installation, you only need to say

lein repl

to start a Clojure REPL. I’m using OS X, so what I describe here was done from Terminal. You don’t need to create a project with Leiningen if you just want to play around in the REPL.

Of course, if you don’t have Java installed, you need to get it first. Refer to the Java web site of Oracle for details as necessary. Furthermore, some of the things I will describe require Java version 7 or later.

Continue reading

The FIFA World Cup is not internationalized

At the time of this writing, the FIFA World Cup is almost at its final stage. Throughout the whole month of the tournament I have been amazed to find out that while UEFA has consistently made an effort to have the players’ names written as authentically as possible, FIFA hasn’t. Information conveyed to viewers in televised matches is transliterated, making many players’ names appear irritatingly different than their national, conventional spellings.

It can be argued that FIFA has a tougher job with various languages and characters. After all, the 2014 World Cup had Japan, South Korea, Iran, Russia, and Greece, among others. These countries and their languages alone represent half a dozen character sets. There is a point in trying to achieve some sort of baseline transliteration, so that all names are expressed in the Latin character set. In practice this means that Russian, Greek, and other names are transliterated by default. However, names which can be perfectly written in extended Latin should not be transliterated. Now it seems like everything has been folded down to basic ASCII.

There is no technical reason to limit the displays on television screens to what is essentially a 7-bit character set, but this is what the result amounts to. For example, one of the finalists in this World Cup is Germany, and many of the players have characters with umlauts in their names. Here are a few examples:

Oezil should be Özil
Mueller should be Müller
Goetze should be Götze
Hoewedes should be Höwedes

EDIT: Also, the German coach is Joachim Löw, not Loew (or Low, as the content on the official FIFA apps would have us believe).

The players’ jerseys have the truth anyway, and it’s in direct contrast with what you see when somebody scores a goal, gets booked or gets sent off.

So please, FIFA, take a leaf from UEFA’s book and find out how to use modern broadcasting technology to the advantage of all football fans around the world (even if they might call it soccer). You might also start with a good book like “Unicode Explained” by Jukka K. Korpela.

Unicode character dump in Python

Sometimes you just need to see what characters are lurking inside a Unicode encoded text file. Your garden variety dump utility (like the venerable od in UNIX systems and the Windows standard hex dump (though I don’t think there is one) only shows you the plain bytes, so you have to head over to unicode.org to find out what they mean. But first you need to decode UTF-8 to get the actual code points, or grok UTF-16 LE or BE, and so on. It’s fun, but it’s not for everyone.

The udump utility shows you a nice list of character names, together with their offsets in the file. Currently it only handles UTF-8, so the offset is calculated based on the UTF-8 length of the character.

Continue reading

Unicode is in Your Now

If this blog entry was written 10–15 years ago, the title would have been “Unicode is in Your Future“. Luckily, the Unicode standard has been widely adopted during the last decade, so much so that it has almost become a part of the process and not something that you need to expend very much extra effort on. It is here Now, and has been for some time now.

However, Unicode still isn’t quite as widely understood as it needs to be, and it is often adopted as a black box that nobody can really fix when something goes wrong. Therefore it is not at all bad to try and bring it into perspective.

You need to understand at least why Unicode should be used, and how not to make it more complicated than it is (even though it can still be quite complicated). So keep reading.

Continue reading