Goodbye vCard, hello jCard!

The vCard format has served us well for encoding and exchanging contact information, but there is a better alternative – jCard. In this post I’ll describe why I think jCard can be better than vCard, and should be adopted by every software vendor who deals with contacts.

Both vCard and jCard are text-based formats, but whereas vCard has a unique legacy grammar, jCard is based on JSON. They both represent essentially the same data model that describes contact information: name, phone number, address, e-mail, and so on.

While vCard is widely supported, it is difficult to deal with. Being JSON-based, jCard has the advantage of being ready for use as soon as you parse it. You can even pluck values straight from the JSON object tree you get from the parser. Like vCard versions 3.0 and 4.0, jCard is an IETF proposed standard, RFC 7095.

Both of them represent a very complex data model, and both are semantically quite complicated. Still, I maintain that it is time to leave vCard behind and move on, because… well, read on.

vCard – hard to parse, hard to generate, almost easy to read

On the surface, vCard seems simple enough. It is a line-based format, where each line has a delimited format. Here is a simple sample of a vCard from randomuser.me, converted to vCard 3.0 by hand:

BEGIN:VCARD
VERSION:3.0
FN:Marilou Lam
N:Lam;Marilou;;Ms;
ADR:;;3892 Duke St;Oakville;NJ;79279;U.S.A.
EMAIL:marilou.lam@example.com
BDAY:1967-06-09T06:59:48-05:00
END:VCARD

Doesn’t seem too bad, does it? Well, I’ve written my share of ad hoc vCard parsers and generators in the past (in Objective-C, Java, and C#), and never went all the way, because when you start looking into the corner cases and the special handling required for many fields, you want to run away and hide.

First of all, the lines are not just lines. They could continue on to the next, and the process of continuation is complex, but not as complex as joining them back together, so most naïve vCard handlers just ignore continuation. By the way, according to the specification the line delimiter is a CR-LF pair.
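To make the continuation rule concrete, here is a minimal Python sketch of unfolding, assuming the usual rule that a line break followed by a single space or tab marks a continued line. The function names `unfold` and `logical_lines` are made up for this illustration, and the sketch tolerates bare LF line endings, which is an assumption about real-world files rather than something the specification asks for.

```python
import re

def unfold(text):
    """Join folded lines: a line break followed by one space or tab
    is a continuation marker, so both are removed."""
    # Accept bare LF as well as CR-LF (an assumption for leniency).
    return re.sub(r'\r?\n[ \t]', '', text)

def logical_lines(text):
    """Return the non-empty content lines after unfolding."""
    return [line for line in unfold(text).splitlines() if line]

# The FN line is folded; only the first leading space is the fold marker.
folded = "BEGIN:VCARD\r\nFN:Marilou\r\n  Lam\r\nEND:VCARD\r\n"
print(logical_lines(folded))
```

This only covers the unfolding step. Generating folded output has its own pitfalls, such as not splitting in the middle of a multi-byte character.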

The properties are separated from the values with a colon. Some properties may have multiple fields, so those need to be separated with a semicolon. Some properties may have multiple values, so those need to be separated with a comma. The fun begins if you need to have any of those separators in a value (it could happen, you know). They need to be escaped with a backslash. If you need to have a backslash, it’s time for another backslash. And newlines in values need to be expressed just like in many programming languages, with \n. It’s complicated.
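The escaping rules can be sketched in a few lines of Python. This is only an illustration under some assumptions: the helper names are hypothetical, the sketch escapes backslash, newline, comma and semicolon only, and the "Suite 4" address is a made-up variation of the sample above.

```python
def split_unescaped(value, sep):
    """Split value on sep, skipping separators preceded by a backslash."""
    parts, current, i = [], [], 0
    while i < len(value):
        ch = value[i]
        if ch == '\\' and i + 1 < len(value):
            # Keep the escape pair intact; unescape() decodes it later.
            current.append(value[i:i + 2])
            i += 2
        elif ch == sep:
            parts.append(''.join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    parts.append(''.join(current))
    return parts

def unescape(text):
    """Decode backslash escapes in a single value."""
    out, i = [], 0
    while i < len(text):
        if text[i] == '\\' and i + 1 < len(text):
            nxt = text[i + 1]
            out.append('\n' if nxt in 'nN' else nxt)
            i += 2
        else:
            out.append(text[i])
            i += 1
    return ''.join(out)

def escape(text):
    """Escape a value for output; the backslash must be handled first."""
    return (text.replace('\\', '\\\\').replace('\n', '\\n')
                .replace(',', '\\,').replace(';', '\\;'))

# Hypothetical address with a semicolon inside the street field.
raw = r'ADR:;;3892 Duke St\; Suite 4;Oakville;NJ;79279;U.S.A.'
name, _, value = raw.partition(':')
fields = [unescape(f) for f in split_unescaped(value, ';')]
print(name, fields)
```

Splitting has to happen before unescaping. Otherwise the escaped semicolon in the street field would be mistaken for a field separator.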

So vCard may be quite human-readable, but that is neither here nor there, because vCards are not meant for eyeballing. They are meant for information exchange between computer systems. That also brings up the not insignificant issue of character encodings.

Y U NO I18N?!

There are several versions of vCard out there. The latest version, 4.0 (which jCard is also based on), sensibly mandates UTF-8 as the character encoding of the vCard. Unfortunately nobody seems to support vCard 4.0.

Version 3.0 of the vCard specification says the character encoding must be specified in the Content-Type MIME header field, using the charset parameter. That is fine, unless you don’t use HTTP to transmit the vCard, at which point you either agree between parties or guess.

Poor old version 2.1 from 1996, still in active duty, has the ASCII character encoding as the default, unless you specify something else with the CHARSET parameter. For every property. As you can imagine, nobody does that, and then we get stuff like “Käpyaho” for my last name (and I really resent that).
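A small Python sketch shows one common way such garbling happens. It assumes the sender wrote UTF-8 bytes and the receiver guessed Latin-1. Other wrong guesses produce different garbage.

```python
name = "K\u00e4pyaho"  # "Käpyaho", written with an escape to be unambiguous

# The sender writes the name as UTF-8 bytes...
utf8_bytes = name.encode("utf-8")

# ...and the receiver wrongly guesses Latin-1 when decoding them.
garbled = utf8_bytes.decode("latin-1")
print(garbled)

# A strict ASCII reader cannot decode the bytes at all.
try:
    utf8_bytes.decode("ascii")
except UnicodeDecodeError as error:
    print("not ASCII:", error.reason)
```

The single letter ä becomes two unrelated characters, because its UTF-8 encoding is two bytes long and Latin-1 treats every byte as a character of its own.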

The default character encoding for JSON is UTF-8, which is really nice. If you don’t trust your endpoints, you can always \uXXXX escape everything beyond ASCII in your JSON string values.
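As an illustration, Python's standard `json` module escapes everything beyond ASCII by default, and the escaped and unescaped forms decode to the same string:

```python
import json

name = "K\u00e4pyaho"  # "Käpyaho"

# ensure_ascii=True is the default, so non-ASCII becomes \uXXXX escapes.
escaped = json.dumps(name)
print(escaped)

# With ensure_ascii=False the character is written out as-is.
raw = json.dumps(name, ensure_ascii=False)
print(raw)

# Both forms parse back to the original string.
assert json.loads(escaped) == json.loads(raw) == name
```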

jCard – easy to parse, fairly easy to generate, hard to read (except when pretty-printed)

Consider the jCard equivalent of the vCard above. Because it is JSON, it has a uniform syntax, so that you can feed it into any JSON parser and reap the rewards. It looks something like this:

["vcard",
  [
    ["version", {}, "text", "4.0"],
    ["fn", {}, "text", "Marilou Lam"],
    ["n",
      {},
      "text",
      ["Lam", "Marilou", "", "Ms", ""]
    ],
    ["adr", {}, "text",
      ["", "", "3892 Duke St", "Oakville", "NJ", "79279", "U.S.A."]
    ],
    ["email", { }, "text", "marilou.lam@example.com"],
    ["bday", {}, "date-and-or-time", "1967-06-09T06:59:48-05:00"]
  ]
]

(Sorry if the formatting is a mess, I’m trying to figure out how to present it. If you see code tags, they are not part of the jCard.)

As you can see, the jCard has a standard JSON syntax, but the content itself is not typical JSON. It is a polymorphic array, containing strings and arrays. The top-level object is an array, so that you can have multiple contacts in one payload.

All the vCard properties are arrays where the first item is a string identifying the property. The rest are objects, simple values or arrays, based on the property. The jCard specification goes into great detail explaining how they are constructed.
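Here is a minimal Python sketch of plucking values out of a parsed jCard, using a shortened version of the sample. The helpers `properties` and `first_value` are hypothetical names for this illustration, and the sketch assumes every property has the shape name, parameters, type, then one or more values.

```python
import json

JCARD = '''
["vcard", [
  ["version", {}, "text", "4.0"],
  ["fn", {}, "text", "Marilou Lam"],
  ["adr", {}, "text",
    ["", "", "3892 Duke St", "Oakville", "NJ", "79279", "U.S.A."]],
  ["email", {}, "text", "marilou.lam@example.com"]
]]
'''

def properties(jcard, name):
    """Yield (params, value_type, value) for each property called name."""
    tag, props = jcard
    if tag != "vcard":
        raise ValueError("not a jCard")
    for prop_name, params, value_type, *values in props:
        if prop_name == name:
            # One trailing item is the value itself; several trailing
            # items mean the property is multi-valued.
            yield params, value_type, values[0] if len(values) == 1 else values

def first_value(jcard, name, default=None):
    """Return the value of the first matching property, or default."""
    for _, _, value in properties(jcard, name):
        return value
    return default

card = json.loads(JCARD)
print(first_value(card, "fn"))
print(first_value(card, "adr")[3])   # the locality field of the address
print(first_value(card, "tel"))      # missing properties give the default
```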

The JSON example above is verbose, because it has been pretty-printed to be eyeballed by a human. If you were to condense it for online transmission, it would be like this:

["vcard",[["version",{},"text","4.0"],["fn",{},"text","Marilou Lam"],
["n",{},"text",["Lam","Marilou","","Ms",""]],["adr",{},"text",
["","","3892 Duke St","Oakville","NJ","79279","U.S.A."]],
["email",{},"text","marilou.lam@example.com"],["bday",{},
"date-and-or-time","1967-06-09T06:59:48-05:00"]]]

I only cheated a little by inserting a couple of newlines, but otherwise it’s tight, no-nonsense JSON.
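Condensing does not have to be done by hand. As a sketch, Python's standard `json` module can re-serialize a parsed jCard without any optional whitespace. The two-property card here is a shortened stand-in for the full sample.

```python
import json

pretty = '''
["vcard",
  [
    ["fn", {}, "text", "Marilou Lam"],
    ["email", {}, "text", "marilou.lam@example.com"]
  ]
]
'''

# Compact separators drop the spaces after commas and colons.
compact = json.dumps(json.loads(pretty), separators=(",", ":"))
print(compact)
```

Whitespace inside string values, such as the space in the name, is data and is preserved.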

The devil is in the details

So, vCard is difficult to parse, but jCard is not, at least for a JSON parser. However, semantically jCard is still just as complicated.

There are obviously inherent difficulties in representing contact information, and while jCard sidesteps the obvious issues of parsing and generating, the semantics are still there to be dealt with. Some might argue that it would have been better to create a whole new data model for jCard, instead of defining a JSON-based equivalent, but I don’t think it would have made much difference, going by the low adoption rate of jCard by major vendors.

Depending on the programming language, it is relatively easy or quite painful to deal with the results from a parsed jCard. If you have dynamic typing, you can just skate away. But if you have a statically typed language like Java or Swift, you will need to be creative.

I want/need to parse jCards in an iOS application that I’m in the process of converting to Swift from Objective-C. Last time I got all the way to Swift 2.2, but then the work on the app stalled for a while, and in the meantime, Swift 3 happened. Earlier I had written some helpers to get from jCard to the iOS Contacts framework, but they did some naughty things which are not allowed in Swift 3 anymore.

So I decided to bite the bullet and create a proper library for dealing with jCards in Swift, called ContactCard. It is open sourced under the MIT License, and is up on GitHub if you would like to have a look or contribute. It’s very early alpha, so I expect it to improve a lot in a short time as I prepare it for the app. I intend to eventually publish it on CocoaPods too, and I hope to write some more about how I intend to solve the problems of polymorphic values in jCard properties.

 

Book review: Data Science at the Command Line

No matter how handy graphical user interfaces are, the good old command line remains a useful tool for performing various low-level data manipulation and system administration tasks. It is the fallback when you need to do something that has no way of graphical control. Being much more expressive and open-ended than a predefined set of controls, the command shell is the ultimate control environment for your computer.

Data science has become one of the most intensely practised computer applications, so it is no wonder that it also benefits greatly from the hands-on control approach of the command line shell. Data scientist Jeroen Janssens has had the foresight to combine the fundamental operations of data science and the most suitable command line tools into a book that collects many useful practices, tips and tricks for processing and preparing data, called “Data Science at the Command Line” (O’Reilly, 2014).


At its highest abstraction levels, data science involves using models and machine learning to extract patterns from data and extrapolate results from data sets that are often much larger than fits in memory at any one time. At a lower level, it involves multiple file formats and just plain hard work to get the data in a fit shape to be analysed, and this is where the command line comes in.

There is only so much you can do with canned tools like text editors, but a world of possibilities opens for you when you have the power to chain simple commands together, forming pipelines of data where one command’s output becomes another one’s input. You can also redirect input from a file to a command, and output from a command to a file.

Linux and macOS installations come with various command shells, but instead of relying on the defaults, Janssens shows you how to use a set of tools called the Data Science Toolbox, which uses VirtualBox or Vagrant to plant a self-contained GNU/Linux environment with Python, R and various other tools of the trade on your local machine, without disturbing the host operating system too much.

With real-life examples, Janssens shows you how to use classic Linux command line tools like cut, grep, tr, uniq and sort to your advantage. You will also learn how to get data from the Internet, from databases and even Microsoft Excel spreadsheets, where most of the world’s operational data lies hidden from plain sight.

From this book I learned completely new and interesting ways to work with CSV (Comma-Separated Values) files, and it introduced me to the excellent csvkit, with its collection of power tools to cut, merge and reorder columns in CSV files, perform SQL-style queries on the lines, and grep through them.

Among other things you get information on Drake, described as “make for data” – which, if you’re familiar with the classic software development tool make (and of course you are) should whet your appetite. There is also a chapter about how to make your data pipelines run faster by parallelising them and running commands on remote machines.

Scrubbing the data is less than half the fun, but usually more than half of the work in data science. You will learn to write executable scripts in Python and R with their comprehensive data science and statistics libraries, and learn to explore your data using visualisations that consist of statistical diagrams like bar charts and box plots. So the command line is not just text; even though the images are generated using commands, they are obviously shown in a GUI window.

Finally, there is a chapter on modelling data using both supervised and unsupervised learning methods, which serves as a cursory introduction to machine learning, although you are referred to more comprehensive texts on the algorithms involved.

At the back of the book there is a handy reference for all the commands discussed in the book, which include many of the old UNIX stalwarts found in Linux, but also newer tools like jq for processing JSON.

If you need to do data preparation for a data science project, you owe it to yourself to become good friends with the command line, and utilise the many tools described in Janssens’ book in your daily work. Even if you don’t “automate all the things”, you will benefit from the pipeline approach to data processing.

Buy the e-book at the O’Reilly web shop:
Data Science at the Command Line

The book also has a website, datascienceatthecommandline.com, where you can preview some of its content.

For the history and philosophy of the command line, you should read Neal Stephenson’s In the Beginning Was the Command Line.

Conifer Coaching – new careers for coders

Is programming second nature to you, but your skills don’t quite match today’s needs right now? In the software industry technologies come and go, but you can always get back on board.

Let the oracle of the Conifer Coaching service tell you what is hip and hot right now, and what you can actually use to build:

  • Web services
  • Mobile applications
  • Server software
  • Data analysis
  • Internet of Things

– in other words, What a Software Maker Should Learn Right Now, And How!

No hype, just answers to real needs. Employability based on your CV and your interests. You could even call it career counselling. With a technologist’s long experience and formidable insight, just for you.

Are you ready to point your career in a new direction? Ask for more by e-mail: info (at) coniferproductions (dot) com

#ohjelmistokehitys #Microsoft #Nokia #muuntokoulutus #urakehitys #rekry #softa #neuvonta #coaching #valmennus #devaaja #developer #kehittäjä

C# and F# on the Mac with Mono

Mono is the open source .NET runtime for Windows, Linux, and OS X. It consists of the Mono runtime environment, libraries, and C# and F# compilers. Recently Mono has gained extra popularity due to Microsoft’s purchase of Xamarin, the makers of a cross-platform toolkit of the same name.

If you just want to create command-line .NET applications on the Mac, and don’t need Xamarin.Forms or the mobile tools, you can just install Mono and start hacking away.

The Mono Project home page advises you to download and install Mono as a Mac package, but you can also do a Homebrew-based installation. If you don’t yet have Homebrew (“the missing package manager for OS X”), install it by following the instructions on its home page.

Once you have Homebrew installed, you can install Mono:

$ brew install mono
==> Downloading https://homebrew.bintray.com/bottles/mono-4.2.3.4.yosemite.bottl
######################################################################## 100.0%
==> Pouring mono-4.2.3.4.yosemite.bottle.tar.gz
==> Caveats
To use the assemblies from other formulae you need to set:
export MONO_GAC_PREFIX="/usr/local"
Note that the 'mono' formula now includes F#. If you have
the 'fsharp' formula installed, remove it with 'brew uninstall fsharp'.
==> Summary
🍺 /usr/local/Cellar/mono/4.2.3.4: 1,280 files, 205.2M

You can probably pick up that I’m still using OS X Yosemite on this machine, but there shouldn’t be any difference with El Capitan. If you upgraded from Yosemite to El Capitan, and had Homebrew installed, you may have run into an issue with the OS X security restrictions – read the solution.

C# support for Visual Studio Code

Microsoft Visual Studio Code, or VSCode for short, is a relatively new programmer’s text editor, but already quite mature. Typically I use it for Python, Clojure and JavaScript. Now I wanted to use it to edit C# source files on the Mac, but surprisingly it does not have C# syntax highlighting support out of the box. You need to install an extension and restart VSCode.

Hello, .NET world!

Just a simple C# source file to get you started:

using System;

namespace Hello
{
    class Hello
    {
        static void Main(string[] args)
        {
            Console.WriteLine("Hello, .NET world!");
        }
    }
}

Save it as Hello.cs. Compile with:

mcs Hello.cs

Here, mcs is the Mono C# compiler. You should get a file named Hello.exe, but you can’t execute it directly. Instead, use the Mono runtime:

mono Hello.exe

You should see the greeting printed out by Console.WriteLine.

Why C#?

I’m dusting off the C# tools on my Mac because I envision that C# and .NET will become more important on OS X because of the Xamarin acquisition. I like C#, sometimes better than Java, and have programmed many applications for Windows Phone with it.

Why F#?

F# intrigues me as a language that embraces many of the good things about functional programming, but lets you leverage the .NET ecosystem. I’ve started to learn F# in earnest several times during the last few years, but have not made a concentrated attempt yet. Hopefully soon.

Semi-Autonomous, Programmable Drones Incoming

Drones, or Unmanned Aerial Vehicles (UAVs), be they quadcopters or other types of flyer, will become more “intelligent” on their own or by forming swarms, as this TED Talk by Vijay Kumar at the University of Pennsylvania shows:

My interest in drones lies not in flying them myself live, because I’m a lousy pilot and don’t play games much anyway, but in making them follow a predetermined route and return back to the starting point – for example, surveying an object or estate, or even carrying cargo between waypoints. The gorgeous aerial shots you get with many drones these days are great, of course, but I’ll let others play the director, and instead concentrate on the programming.

I recently got a Parrot AR.Drone 2.0 Elite Edition, mostly because it was the cheapest quadcopter that has an SDK, allowing you to create your own applications on top of it, or extend and customise some sample applications. (AR.Drone 2.0 SDK)

I did some web searches on the programmability of the AR.Drone, and it seems that the biggest craze has faded a little bit. Many of the libraries for Python and Node.js are not seeing as active development as I would have thought, and groups like NodeCopter are not too active either.

It also seems that some active members have moved on to do greater things, like Fleye, a personal flying robot – the result of work by Laurent Eschenauer and Dimitri Arendt:

The Fleye Kickstarter campaign is still ongoing, with delivery scheduled for September 2016.

Eschenauer is the author of the Node.js library ardrone-autonomy, which itself is based on node-ar-drone by Felix Geisendörfer.

There is also the python-ardrone library for Python, which I would prefer over Node.js.

I have tested both node-ar-drone and python-ardrone quickly with the AR.Drone 2.0, and it is an exhilarating experience to see your quadcopter come to life and rise up to hover, just by entering a few commands in the Node or Python REPL. (Just make sure you can quickly call the land() function, especially if you are experimenting indoors.)

There are also some Clojure libraries for controlling the AR.Drone, such as clj-drone and turboshrimp, but I’m not sure if I would want to add JVM to the mix.

My inspiration for programming drones actually got sparked by the O’Reilly Programming Newsletter, which featured a recent article by Greg on The Yhat Blog titled “Building a (semi) Autonomous Drone with Python“. It had a lot of tips about how to start with this kind of activity, and extending it to involve computer vision using OpenCV.

I intend to develop some applications that fly the AR.Drone automatically along the perimeter of a large object, such as a house, or along some predetermined line, like the side of a field. I hope to document some of the results in this blog.

If you’re interested in programming semi-autonomous drones, drop me a line with any ideas, tips, questions, or collaborations.

LCD-like banners in Python

Back in 1998 or so, I wrote a CD player application for Microsoft Windows in Borland Delphi. It was for a magazine tutorial article, and I wanted a cool LCD-like display to show track elapsed and remaining time. There was a good one available for Delphi, called LCDLabel, written by Peter Czidlina (if you’re reading this, thanks once more for your cooperation).

I’ve been thinking about doing a modern version of the LCD display component several times over the years, and I even got pretty far with one for OS X in 2010, but then abandoned it because of other projects. A few years ago I did some experiments with the LCD font file and wrote a small Python app to test it.

My most recent idea involving simulated LCD displays is to create a custom component for iOS and OS X in Swift. For that, I dug up the most recent Python project and tried to nail down the LCD font file format, so that I could later use it in Swift. I decided to use JSON.

The LCD font consists of character matrices, typically 5 columns by 7 rows, each describing a character on the LCD panel. The value of a matrix cell is one if the dot should be on, and zero if it should be off. I decided to store each cell value as an integer, even if it is a bit wasteful – but it is easy to maintain, and if you squint a bit, you can see the shape of the LCD character.

So the digit zero would be represented as a 2-dimensional matrix like this:

[
[0,1,1,1,0],
[1,0,0,0,1],
[1,0,0,1,1],
[1,0,1,0,1],
[1,1,0,0,1],
[1,0,0,0,1],
[0,1,1,1,0]
]

The font consists of as many characters as you like, but you need to identify them somehow. In JSON, you can do this by using one-character strings as the keys, each written as the Unicode escape of the character’s code point. So, with some additional useful information, a font with just the numeric digits 0, 1, and 2 would be represented in JSON like this:

{
"name": "Hitachi",
"columncount": 5,
"rowcount": 7,
"characters": {
"\u0030": [
[0,1,1,1,0],
[1,0,0,0,1],
[1,0,0,1,1],
[1,0,1,0,1],
[1,1,0,0,1],
[1,0,0,0,1],
[0,1,1,1,0]
],
"\u0031": [
[0,0,1,0,0],
[0,1,1,0,0],
[0,0,1,0,0],
[0,0,1,0,0],
[0,0,1,0,0],
[0,0,1,0,0],
[0,1,1,1,0]
],
"\u0032": [
[0,1,1,1,0],
[1,0,0,0,1],
[0,0,0,0,1],
[0,0,0,1,0],
[0,0,1,0,0],
[0,1,0,0,0],
[1,1,1,1,1]
]
}
}
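Since the font file is maintained by hand, a quick consistency check can catch a matrix with a missing row or a stray value. The following is a rough sketch, assuming the key names shown above. The `validate_font` function and the one-glyph test font are made up for this illustration.

```python
def validate_font(font):
    """Return a list of problems; an empty list means the font is consistent."""
    problems = []
    rows, cols = font["rowcount"], font["columncount"]
    for ch, matrix in font["characters"].items():
        if len(ch) != 1:
            problems.append(f"key {ch!r} is not a single character")
        if len(matrix) != rows:
            problems.append(f"{ch!r}: expected {rows} rows, got {len(matrix)}")
        for r, row in enumerate(matrix):
            if len(row) != cols:
                problems.append(f"{ch!r} row {r}: expected {cols} columns")
            elif any(cell not in (0, 1) for cell in row):
                problems.append(f"{ch!r} row {r}: cells must be 0 or 1")
    return problems

# A deliberately tiny 3x2 font with one good and one broken glyph.
font = {
    "name": "Tiny",
    "columncount": 2,
    "rowcount": 3,
    "characters": {
        "-": [[0, 0], [1, 1], [0, 0]],
        "|": [[1, 0], [1, 2]],
    },
}
for problem in validate_font(font):
    print(problem)
```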

With the font coming along nicely, I wrote a Python script to exercise it, by printing a banner-like message:

import json

def banner(message):
    # Look up the dot matrix for each character of the message.
    mats = [characters[ch] for ch in message]

    output = ''
    num_rows = len(mats[0])
    num_cols = len(mats[0][0])

    # Build the banner one text row at a time, across all characters.
    for r in range(num_rows):
        for m in mats:
            for c in range(num_cols):
                # A cell value of 1 means the dot is on.
                if m[r][c] == 1:
                    output += 'X'
                else:
                    output += '.'
            output += ' '
        output += '\n'

    return output

# Load the font and keep the character matrices for banner() to use.
with open('lcd-font-hitachi.json') as json_file:
    font_data = json.load(json_file)
    characters = font_data['characters']

print(banner('012'))

Running this Python script would print out a banner like this one:

.XXX. ..X.. .XXX.
X...X .XX.. X...X
X..XX ..X.. ....X
X.X.X ..X.. ...X.
XX..X ..X.. ..X..
X...X ..X.. .X...
.XXX. .XXX. XXXXX

By adding characters to the JSON font file it becomes possible to print text messages instead of just numbers:

X...X ..... .XX.. .XX.. ..... ..X..
X...X ..... ..X.. ..X.. ..... ..X..
X...X .XXX. ..X.. ..X.. .XXX. ..X..
XXXXX X...X ..X.. ..X.. X...X ..X..
X...X XXXXX ..X.. ..X.. X...X ..X..
X...X X.... ..X.. ..X.. X...X .....
X...X .XXX. .XXX. .XXX. .XXX. ..X..

But I think that a custom control for iOS in Swift would see most use in games or applications displaying numeric parameters like volume level, geographical coordinates or RPM.

If you want to learn Python, here is a good book:

Learning Python
by Mark Lutz

Learning Clojure

About one year ago I wrote a multi-part tutorial on Clojure programming, describing how I wrote a small utility called ucdump (available on GitHub).

Here are links to all the parts:

However, Carin Meier’s Living Clojure is excellent in many ways. Get it from O’Reilly (we’re an affiliate):
Living Clojure

My little tutorial started with part zero, in which I lamented how functional programming is made to appear unlearnable by mere mortals, and it kind of snowballed from there. Hope you like it and/or find it useful!

 

The future of the BusMonTRE application

BusMonTRE is a bus stop assistant app for smartphones. It was made to offer an easy way to find out when a bus departs from a stop near the user within the operating area of Tampere public transport (which includes several surrounding municipalities in addition to Tampere). For this purpose it uses the open data provided by the City of Tampere and the blue virtual monitor maintained by Tampere public transport. BusMonTRE is currently available for iOS, Android, and Windows Phone smartphones. In this post I’ll describe some of the background of the app and its future development prospects.

A virtual monitor for my own needs

The blue virtual monitor, which works as a web page service, has been in use in Tampere for a long time. Department stores and smaller shops have used it on their premises, for example by showing their customers information on buses departing from the nearest stop on a large television screen. Every stop in Tampere public transport has both a name and a numeric code, and the virtual monitor can easily be made to show one or more chosen stops using this numeric code.

I got the idea for BusMonTRE as early as 2011, but at that point I did not yet start designing an application; instead I used a few pre-baked web addresses on my own devices to conveniently get the monitor display I wanted in a web browser. In 2012-2013 I took part in several open data events in Tampere, while also being otherwise active in the Tampere startup community. At one of these events a representative of Tampere public transport gave a strong signal that they have no interest in developing applications themselves, but that they offer the data free of charge for application developers to use.


Functional programming without feeling stupid, part 5: Project

In the last four installments of Functional programming without feeling stupid I’ve slowly built up a small utility called ucdump with Clojure. Experimenting and developing with the Clojure REPL is fun, but now it’s time to give some structure to the utility. I’ll package it up as a Leiningen project and create a standalone JAR for executing with the Java runtime.

Creating a new project with Leiningen

You can use Leiningen to create a skeleton project quickly. In my project’s root directory, I’ll say:

lein new app ucdump

Leiningen will respond with:

Generating a project called ucdump based on the 'app' template.

The result is a directory called ucdump, which contains:

.gitignore   README.md    project.clj  src/
LICENSE      doc/         resources/   test/

For now I’m most interested in the project file, project.clj, which is actually a Clojure source file, and the src directory, which is intended for the app’s actual source files.

Leiningen creates a directory called src/ucdump and seeds it with a core.clj file, but that’s not actually what I want, for two reasons:

  • I want ucdump to be a good Clojure citizen, so I’m going to put it in a namespace called com.coniferproductions.ucdump.
  • My Git repository for ucdump also contains the original Python version of the application, which is in <project-root>/python, and I want the Clojure version to live in <project-root>/clojure.
