Goodbye vCard, hello jCard!

The vCard format has served us well for encoding and exchanging contact information, but there is a better alternative – jCard. In this post I’ll describe why I think jCard can be better than vCard, and should be adopted by every software vendor who deals with contacts.

Both vCard and jCard are text-based formats, but whereas vCard has a unique legacy grammar, jCard is based on JSON. They both represent essentially the same data model that describes contact information: name, phone number, address, e-mail, and so on.

While vCard is widely supported, it is difficult to deal with. Being JSON-based, jCard has the advantage of being ready for use as soon as you parse it. You can even pluck values straight from the JSON object tree you get from the parser. Like vCard versions 3.0 and 4.0, jCard is an IETF proposed standard, RFC 7095.

Both of them represent a very complex data model, and both are semantically quite complicated. Still, I maintain that it is time to leave vCard behind and move on, because… well, read on.

vCard – hard to parse, hard to generate, almost easy to read

On the surface, vCard seems simple enough. It is a line-based format, where each line has a delimited format. Here is a simple sample of a vCard from randomuser.me, converted to vCard 3.0 by hand:

BEGIN:VCARD
VERSION:3.0
FN:Marilou Lam
N:Lam;Marilou;;Ms;
ADR:;;3892 Duke St;Oakville;NJ;79279;U.S.A.
EMAIL:marilou.lam@example.com
BDAY:1967-06-09T06:59:48-05:00
END:VCARD

Doesn’t seem to bad, does it? Well, I’ve written my share of ad hoc vCard parsers and generators in the past (in Objective-C, Java, and C#), and never went all the way, because when you start looking into the corner cases and the special handling required for many fields, you want to run away and hide.

First of all, the lines are not just lines. They could continue on to the next, and the process of continuation is complex, but not as complex as joining them back together, so most naïve vCard handlers just ignore continuation. By the way, according to the specification the line delimiter is a CR-LF pair.

The properties are separated from the values with a colon. Some properties may have multiple fields, so those need to be separated with a semicolon. Some properties may have multiple values, so those need to be separated with a comma. The fun begins if you need to have any of those separators in a value (it could happen, you know). They need to be escaped with a backslash. If you need to have a backslash, it’s time for another backslash. And newlines in values need to be expressed just like in many programming languages, with \n. It’s complicated.

So vCard may be quite human-readable, but that is neither here nor there, because vCards are not meant for eyeballing. They are meant for information exchange between computer systems. That also brings up the not insignificant issue of character encodings.

Y U NO I18N?!

There are several versions of vCard out there. The latest version, 4.0 (which jCard is also based on), sensibly mandates UTF-8 as the character encoding of the vCard. Unfortunately nobody seems to support vCard 4.0.

Version 3.0 of the vCard specification says the character encoding must be specified in the Content-Type MIME header field, using the charset parameter. That is fine, unless you don’t use HTTP to transmit the vCard, at which point you either agree between parties or guess.

Poor old version 2.1 from 1996, still in active duty, has the ASCII character encoding as the default, unless you specify something else with the CHARSET parameter. For every property. As you can imagine, nobody does that, and then we get stuff like “Käpyaho” for my last name (and I really resent that).

The default character encoding for JSON is UTF-8, which is really nice. If you don’t trust your endpoints, you can always \uXXXX escape everything beyond ASCII in your JSON string values.

jCard – easy to parse, fairly easy to generate, hard to read (except when pretty-printed)

Consider the jCard equivalent of the vCard above. Because it is JSON, it has a uniform syntax, so that you can feed it into any JSON parser and reap the rewards. It looks something like this:

["vcard",
  [
    ["version", {}, "text", "4.0"],
    ["fn", {}, "text", "Marilou Lam"],
    ["n",
      {},
      "text",
      ["Lam", "Marilou", "", "", "Ms"]
    ],
    ["adr", {}, "text",
      ["", "", "3892 Duke St", "Oakville", "NJ", "79279", "U.S.A."]
    ],
    ["email", { }, "text", "marilou.lam@example.com"],
    ["bday", {}, "date-and-or-time", "1967-06-09T06:59:48-05:00"]
  ]
]

(Sorry if the formatting is a mess, I’m trying to figure out how to present it. If you see code tags, they are not part of the jCard.)

As you can see, the jCard has a standard JSON syntax, but the content itself is not typical JSON. It is a polymorphic array, containing string and arrays. The top-level object is an array, so that you can have multiple contacts in one payload.

All the vCard properties are arrays where the first item is a string identifying the property. The rest are objects, simple values or arrays, based on the property. The jCard specification goes into great detail explaining how they are constructed.

The JSON example above is verbose, because it has been pretty-printed to be eyeballed by a human. If you were to condense it for online transmission, it would be like this:

["vcard",[["version",{},"text","4.0"],["fn",{},"text","Marilou Lam"],
["n",{},"text",["Lam","Marilou","","","Ms"]],["adr",{},"text",
["","","3892 Duke St","Oakville","NJ","79279","U.S.A."]],
["email",{},"text","marilou.lam@example.com"],["bday",{},
"date-and-or-time","1967-06-09T06:59:48-05:00"]]]

I only cheated a little by inserting a couple of newlines, but otherwise it’s tight, no-nonsense JSON.

The devil is in the details

So, vCard is difficult to parse, but jCard is not, at least for a JSON parser. However, semantically jCard is still just as complicated.

There are obviously inherent difficulties in representing contact information, and while jCard sidesteps the obvious issues of parsing and generating, the semantics are still there to be dealt with. Some might argue that it would have been better to create a whole new data model for jCard, instead of defining a JSON-based equivalent, but I don’t think it would have made much difference, going by the low adoption rate of jCard by major vendors.

Depending on the programming language, it is relatively easy or quite painful to deal with the results from a parsed jCard. If you have dynamic typing, you can just skate away. But if you have a strongly typed language like Java or Swift, you will need to be creative.

I want/need to parse jCards in an iOS application that I’m in the process of converting to Swift from Objective-C. Last time I got all the way to Swift 2.2, but then the work on the app stalled for a while, and in the meantime, Swift 3 happened. Earlier I had written some helpers to get from jCard to the iOS Contacts framework, but they did some naughty things which are not allowed in Swift 3 anymore.

So I decided to bite the bullet and create a proper library for dealing with jCards in Swift, called ContactCard. It is open sourced under the MIT License, and is up on GitHub if you would like to have a look or contribute. It’s very early alpha, so I expect it to improve a lot in a short time as I prepare it for the app. I intend to eventually publish it on CocoaPods too, and I hope to write some more about how I intend to solve the problems of polymorphic values in jCard properties.

 

Functional programming without feeling stupid, part 2: Definitions

In this installment of “Functional programming without feeling stupid” I would like to show you how to define things in Clojure. Values and function applications are all well and good, but if we can’t give them symbolic names, we need to keep repeating them over and over.

Before we start naming things, let’s have a look at how Clojure integrates with Java. I’m assuming you are still in the REPL, or have started it again with lein repl.

Track 1: “If anyone should ask / We are mated”

As it happens, Clojure’s core library is lean and focused on manipulating the data structures of the language, so many things are deferred to the underlying Java machinery as a rule. For example, mathematical computation is typically done using the static methods in the java.lang.Math class:

user=> (java.lang.Math/sqrt 5.0)
2.23606797749979

As you can see, this is a function application like we have already seen in Part 1, but this time the function we are using is the sqrt static method in the java.lang.Math class.

Java 7 acquired Unicode character names, and they are accessed through
the getName method in the java.lang.Character class. This is no mean feat, since there are over 110,000 characters in the Unicode standard, and each of them has a name (although some of them are algorithmically generated). To find out the canonical character name of a Unicode character, such as the euro currency symbol, you would use the getName static method:

user=> (java.lang.Character/getName \u20ac)
ClassCastException java.lang.Character cannot be cast to java.lang.Number user/eval703 (NO_SOURCE_FILE:1)

Hey, what’s wrong? Well, if you look up the documentation of java.lang.Character.getName, you will find out that it takes an int value as an argument, not a character. You can actually do this check from inside the REPL:

user=> (javadoc java.lang.Character)
true

The REPL doesn’t seem to do much, but you should now have a new web browser window or tab open, with the JavaDoc of the java.lang.Character class loaded up. That’s what the REPL meant when it said

Javadoc: (javadoc java-object-or-class-here)

when it started up. The getName method does need an int value, so let’s try something else:

user=> (java.lang.Character/getName (int \u20ac))
"EURO SIGN"

All right! How about another one:

user=> (java.lang.Character/getName 67)
"LATIN CAPITAL LETTER C"

Well, some say that Clojure is the new C.

Continue reading

Functional programming without feeling stupid, part 1: The Clojure REPL

In my recent post Functional Programming Without Feeling Stupid I took a quick look at how functional programming can be a little off-putting for the non-initiated. I promised to provide some examples of my own first steps with FP, and now I would like to present some to you.

Advocates of functional programming often refer to increased programmer productivity. At least some of that can be attributed to the REPL, or the Read-Evaluate-Print Loop. We are basically talking about an environment which accepts and parses any code you type in, and gives you a place to experiment and see results quickly. Before interpreted or semi-interpreted languages like Python, Java and JavaScript became mainstream, the typical repeating cycle in software development was Compile-Link-Execute, and debugging meant observing special output on the console. In the 1990s integrated debuggers with watches and breakpoints became the norm, but long before that Lisp-like languages already had a REPL, and Python also acquired one.

If you are thinking about getting intimate with Clojure, you will need to get to know the REPL. It is your playground, and will always be, even if you later start packaging and organizing your code.

Clojure depends on the Java Virtual Machine (JVM) and is actually distributed as a normal JAR file (Java ARchive), like most Java libraries are. You can start Clojure from the JAR, but you will save yourself some trouble and prepare for the future if you install Leiningen, the dependency management tool for Clojure. It is simple to install and run, and I will assume that you will follow the instructions on the Leiningen web site sooner or later. Now would be a good time.

When you’re done with the installation, you only need to say

lein repl

to start a Clojure REPL. I’m using OS X, so what I describe here was done from Terminal. You don’t need to create a project with Leiningen if you just want to play around in the REPL.

Of course, if you don’t have Java installed, you need to get it first. Refer to the Java web site of Oracle for details as necessary. Furthermore, some of the things I will describe require Java version 7 or later.

Continue reading

The Magic of Replaceable Parameters

Apart from hard-coding “Hello, world!”, another bad habit you may have picked up when you learned programming is constructing user-visible messages from parts: strings, numbers and other data, concatenated together.

For example, say that you had to show the user how many unread messages there are in a given mailbox. Let’s assume that mailboxName contains the name of the mailbox, and messageCount holds the number of unread messages. In Java, you might be tempted to whip up a user-visible message like this:

String message = "There are " + messageCount + " unread messages in mailbox '" + mailboxName + "'";

This is not the way to do it in an international application. Let’s find out why, and have a look at a better and much more future-proof way of doing it.
Continue reading

Introduction to Programming, Revisited

Many introductory programming courses start with a very simple first program, the traditional “hello world”. It is intended to teach how to edit, compile and run a program in a new programming environment using a simplified context, and it serves its purpose quite well. Unfortunately, the simple practice used in storing and displaying the text is quite contrary to a basic principle of software internationalization. Continue reading