Hashtables in Mathematica

I fondly think of Mathematica as a "kitchen sink language": other than that proverbial kitchen sink, it has functions for pretty much anything you can think of.

Why then does it not have a hashtable data type?

It turns out that it doesn't need one. Hashtables are built into the language at a fundamental level. Just start typing:

 h[foo] = 1;
 h[bar] = 2;
 h[foo]
 1
 h[bar]
 2

 

And you have a hashtable!
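To see what those assignments actually created, it can help to inspect the symbol directly. The following is a minimal sketch that uses only the built-in DownValues and Unset; the key baz is a made-up example of a key that was never assigned.

```mathematica
Clear[h];
h[foo] = 1;
h[bar] = 2;

(* Each assignment is stored as a rewrite rule attached to h *)
DownValues[h]
(* {HoldPattern[h[bar]] :> 2, HoldPattern[h[foo]] :> 1} *)

(* A key that was never assigned matches no rule,
   so the expression comes back unevaluated *)
h[baz]
(* h[baz] *)

(* Unset removes a single entry *)
h[foo] =.
DownValues[h]
(* {HoldPattern[h[bar]] :> 2} *)
```

Note that looking up a missing key does not fail; it simply returns the expression unchanged, so code that needs to tell "absent" from "present" has to check for that.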

It's not quite that simple though. What if you want to list all the keys used in the hashtable? Each assignment is stored as a rule in DownValues[h], so this function (from a handy Stack Overflow answer) pulls the key out of each rule:

 keys = DownValues[#][[All, 1, 1, 1]] &;
 keys[h]
 {bar, foo}
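The part specification [[All, 1, 1, 1]] is easier to follow when applied to a single rule. The sketch below walks through it one level at a time; `rule` and `values` are hypothetical names introduced only for this illustration.

```mathematica
Clear[h];
h[foo] = 1;
h[bar] = 2;

rule = First[DownValues[h]]
(* HoldPattern[h[bar]] :> 2 *)

rule[[1]]        (* HoldPattern[h[bar]]: the left-hand side of the rule *)
rule[[1, 1, 1]]  (* bar: the first argument inside h[...] *)

(* The same idea with part 2 of each rule collects the stored values *)
values = DownValues[#][[All, 2]] &;
values[h]
(* {2, 1} *)
```

Going straight to [[1, 1, 1]] matters: stopping at rule[[1, 1]] would extract h[bar], which immediately evaluates to its stored value 2.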

 

Recently I was playing in Mathematica with NOAA earthquake data, provided as a TSV (tab-separated values) file. Mathematica easily parses it into a list of rows:

 ed = Import[NotebookDirectory[] <> "EarthquakeData", "TSV"];
 Take[ed, 5] // TableForm
 ID    FLAG_TSUNAMI  YEAR  MONTH  DAY  ...
 8204  Tsu           2009  1      3    ...
 8211  Tsu           2009  1      3    ...
 8210                2009  1      8    ...
 8250  Tsu           2009  1      15   ...
 ...
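For readers without the NOAA file at hand, the shape of the imported data can be reproduced from a string. This is only an illustrative sketch: the three-line sample is made up from a few columns of the listing above, and `text` and `sample` are hypothetical names.

```mathematica
(* Three tab-separated lines; "\t" is a tab and "\n" a newline *)
text = "ID\tYEAR\tMONTH\n8204\t2009\t1\n8211\t2009\t1";
sample = ImportString[text, "TSV"];

First[sample]       (* the header row: {"ID", "YEAR", "MONTH"} *)
Rest[sample]        (* the data rows *)
Dimensions[sample]  (* {3, 3} *)
```

The result is an ordinary nested list, one inner list per line, with the column headers as the first row.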

 

This was a good start, but the data wasn't in a very useful form. What I wanted was to address the data by column name and row number, so I wrote this helper function:

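 MakeHash[hash_, a_] := Module[
  {keys = First[a]}, (* column names come from the header row *)
  Clear[hash]; (* drop definitions left over from an earlier call *)
  (* store every cell under (column name, data row), counting rows from 1 *)
  Do[
   hash[keys[[i]], j - 1] = a[[j, i]],
   {i, 1, Length[keys]}, {j, 2, Length[a]}];
  (* convenience entries: dimensions and column names *)
  hash[Dim] = {Length[a] - 1, Length[keys]};
  hash[Rows] = Length[a] - 1;
  hash[Cols] = Length[keys];
  hash[Keys] = keys;
  ]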

 

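The first parameter is the name of the hash to create; the second is the array to parse, whose first row is assumed to hold the column headers. The name must be a symbol with no value of its own, since MakeHash attaches the definitions to it. It's now easy to access the elements you want: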

 MakeHash[ehash, ed]
 ehash["ID",2]
 8211
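The same pattern can be tried without the NOAA file. The sketch below repeats MakeHash so that it runs on its own and applies it to a small hand-made table; `sample` and `shash` are hypothetical names, and the values are borrowed from the listing above.

```mathematica
(* MakeHash as defined in the text *)
MakeHash[hash_, a_] := Module[
  {keys = First[a]},
  Clear[hash];
  Do[
   hash[keys[[i]], j - 1] = a[[j, i]],
   {i, 1, Length[keys]}, {j, 2, Length[a]}];
  hash[Dim] = {Length[a] - 1, Length[keys]};
  hash[Rows] = Length[a] - 1;
  hash[Cols] = Length[keys];
  hash[Keys] = keys;
  ]

(* Hypothetical stand-in for the imported data:
   a header row plus three data rows *)
sample = {{"ID", "DAY"}, {8204, 3}, {8211, 3}, {8210, 8}};
MakeHash[shash, sample]

shash["ID", 2]   (* 8211 *)
shash[Dim]       (* {3, 2} *)

(* A whole column: look up every row for one column name *)
Table[shash["DAY", r], {r, shash[Rows]}]
(* {3, 3, 8} *)

(* A whole row: look up every column name for one row *)
Table[shash[k, 3], {k, shash[Keys]}]
(* {8210, 8} *)
```

The Rows and Keys entries are what make these loops possible without going back to the original list.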

 

You'll notice that MakeHash adds some convenience entries to the hashtable. I even included one for the keys, despite the function we defined earlier. This keeps MakeHash self-contained and also works around a limitation of the keys function as it stands: it extracts only the first argument of each definition. In a two-dimensional hashtable every entry (ID,1 and ID,2, etc.) is a separate definition, so each column name comes back once per row and the list is far too long.

 Length[ehash[Keys]]
 46 (* expected *)
 Length[keys[ehash]]
 2074 (* whoa! *)
 (* Let's fix this by eliminating dupes with Union *)
 keys = Union[DownValues[#][[All, 1, 1, 1]]] &;
 Length[keys[ehash]]
 50

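Why 50 keys and not 46? Because MakeHash added four more: Dim, Rows, Cols, and Keys. The same four explain the earlier count: 2074 = 46 × 45 + 4, which implies 45 data rows in this file.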
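If only the column names are wanted, one possible refinement is to filter the de-duplicated keys by type. This is a sketch under the assumption that column headers are strings while the convenience entries are symbols; the table `t` and the name `firstArgs` are made up for the example.

```mathematica
Clear[t];
(* A tiny two-dimensional table plus two convenience entries *)
t["ID", 1] = 8204;   t["ID", 2] = 8211;
t["YEAR", 1] = 2009; t["YEAR", 2] = 2009;
t[Rows] = 2;
t[Cols] = 2;

(* First argument of every definition: one entry per cell, plus the extras *)
firstArgs = DownValues[t][[All, 1, 1, 1]];
Length[firstArgs]          (* 6 *)

(* Union removes the duplicates but keeps Rows and Cols *)
Length[Union[firstArgs]]   (* 4 *)

(* Keeping only strings leaves the column names *)
Select[Union[firstArgs], StringQ]
(* {"ID", "YEAR"} *)
```

Union also sorts its result, so the original column order is lost; DeleteDuplicates could be used instead where that order matters.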