Functional programming without feeling stupid, part 5: Project

In the last four installments of “Functional programming without feeling stupid” I’ve slowly built up a small utility called ucdump with Clojure. Experimenting and developing with the Clojure REPL is fun, but now it’s time to give some structure to the utility. I’ll package it up as a Leiningen project and create a standalone JAR for executing with the Java runtime.

Creating a new project with Leiningen

You can use Leiningen to create a skeleton project quickly. In my project’s root directory, I’ll say:

lein new app ucdump

Leiningen will respond with:

Generating a project called ucdump based on the 'app' template.

The result is a directory called ucdump, which contains:

.gitignore   README.md    project.clj  src/
LICENSE      doc/         resources/   test/

For now I’m most interested in the project file, project.clj, which is actually a Clojure source file, and in the src directory, which is intended for the app’s actual source files.
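To make the “project.clj is a Clojure source file” point concrete, here is roughly the shape of what the app template produces. Treat it as a sketch: the version numbers, description and URL placeholders are assumptions and will differ depending on your Leiningen version.

```clojure
;; Sketch of a generated project.clj; the exact values are assumptions.
;; The whole file is a single Clojure form: a call to defproject.
(defproject ucdump "0.1.0-SNAPSHOT"
  :description "FIXME: write description"
  :url "http://example.com/FIXME"
  ;; The Clojure version here is a placeholder, not what the template pins.
  :dependencies [[org.clojure/clojure "1.8.0"]]
  ;; :main names the namespace that contains the -main function.
  :main ^:skip-aot ucdump.core
  :target-path "target/%s"
  ;; The uberjar profile compiles everything ahead of time for the standalone JAR.
  :profiles {:uberjar {:aot :all}})
```

The interesting key for what follows is :main, because it has to match whatever namespace the application ends up in.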

Leiningen creates a directory called src/ucdump and seeds it with a core.clj file, but that’s not actually what I want, for two reasons:

  • I want ucdump to be a good Clojure citizen, so I’m going to put it in a namespace called com.coniferproductions.ucdump.

  • My Git repository for ucdump also contains the original Python version of the application, which is in <project-root>/python, and I want the Clojure version to live in <project-root>/clojure.
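As a rough illustration of the first point, a namespace under com.coniferproductions.ucdump could be declared like this. The trailing core segment, the file path in the comment and the body of -main are assumptions made for the sketch, not necessarily the layout the project ends up with.

```clojure
;; Hypothetical file: src/com/coniferproductions/ucdump/core.clj
;; Each dot in the namespace name corresponds to a directory level under src.
(ns com.coniferproductions.ucdump.core
  (:gen-class))   ; generate a Java class so the JAR has an entry point

;; Placeholder entry point; the real one would do the actual dumping.
(defn -main [& args]
  (println "ucdump"))
```

With a layout like this, the :main entry in project.clj would need to name com.coniferproductions.ucdump.core instead of ucdump.core.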


Functional programming without feeling stupid, part 4: Logic

In the previous parts of “Functional programming without feeling stupid” we have slowly been building ucdump, a utility program for listing the Unicode codepoints and character names of characters in a string. In actual use, the string will be read from a UTF-8 encoded text file.

We don’t know yet how to read a text file in Clojure (well, you may know, but I only have a foggy idea), so we have been working with a single string. This is what we have so far:

(def test-str 
  "Na\u00EFve r\u00E9sum\u00E9s... for 0 \u20AC? Not bad!")
(def test-ch { :offset 0 :character \u20ac })
(def short-test-str "Na\u00EFve")

(defn character-name [x]
  (java.lang.Character/getName (int x)))

(defn character-line [pair]
  (let [ch (:character pair)]
    (format "%08d: U+%06X %s"
      (:offset pair) (int ch)
      (character-name ch))))
    
(defn character-lines [s]
  (let [offsets (repeat (count s) 0)
        pairs (map #(into {} {:offset %1 :character %2}) 
          offsets s)]
    (map character-line pairs)))

I’ve reformatted the code a bit to keep the lines short. You can copy and paste all of that in the Clojure REPL, and start looking at some strings in a new way:

user=> (character-lines "résumé")
("00000000: U+000072 LATIN SMALL LETTER R" 
"00000000: U+0000E9 LATIN SMALL LETTER E WITH ACUTE" 
"00000000: U+000073 LATIN SMALL LETTER S" 
"00000000: U+000075 LATIN SMALL LETTER U" 
"00000000: U+00006D LATIN SMALL LETTER M" 
"00000000: U+0000E9 LATIN SMALL LETTER E WITH ACUTE")

But we are still missing the actual offsets. Let’s fix that now.
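One possible way to do it is sketched below. It assumes that an offset simply means the position of the character within the string; if the offset should count bytes in a UTF-8 file instead, the calculation would be different. The name character-lines-indexed is made up for this sketch, and the two helper functions are repeated so that it can be pasted into a fresh REPL.

```clojure
;; Repeated from above so the sketch is self-contained.
(defn character-name [x]
  (java.lang.Character/getName (int x)))

(defn character-line [pair]
  (let [ch (:character pair)]
    (format "%08d: U+%06X %s"
      (:offset pair) (int ch)
      (character-name ch))))

;; Hypothetical variant of character-lines. map-indexed passes the
;; position of each character along with the character itself, so
;; the list of zeros is no longer needed.
;; Assumption: offset = character index, not byte offset.
(defn character-lines-indexed [s]
  (map-indexed
    (fn [i ch] (character-line {:offset i :character ch}))
    s))
```

Calling (character-lines-indexed "Na\u00EFve") should then give lines whose offsets run from 00000000 to 00000004, with the third one reading "00000002: U+0000EF LATIN SMALL LETTER I WITH DIAERESIS".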
