Mathematica, Pi, Mandelbrot, OpenCL, and More

As usual I was looking for something completely different (pages devoted to the game Hex, but that's another story) and came across this intriguing finding by Dave Brol.

Dave was investigating whether the "neck" of the Mandelbrot set was infinitely thin, and in the process realized that pi was showing up in the number of iterations he was performing. What the?!

Curious, I whipped up a quick Mathematica function to test his observation (and as another excuse to play with Mathematica :-)

First I define Mandelbrot's function, here called mandelIter. Nice, I can actually use complex numbers in Mathematica. This function is usually written to include a stopping condition based on a max number of iterations. I chose to omit it here since the points we're investigating are supposed to eventually escape the set.

The point in question is (-0.75 + x i) where x -> 0. The table shows what occurs for ever more "precise" (i.e. closer to 0) values of x. The right hand column lists pairs of numbers. The left one is the time it took to execute mandelIter in seconds. The right one is the number of iterations before escaping the set.

Notice anything special regarding those iteration counts? Yup, we're re-discovering pi.

The other thing you may have noticed is that it's going to take a long long time to calculate pi this way. Each subsequent finding takes 10 times longer to compute.

Naturally, I wondered how to make mandelIter faster.

Mathematica includes a Compile[] function, which basically turns a function into an optimized version, though it's still in pcode, not machine code.

My first attempt was just to compile the Abs[] and z = z^2 + c functions, but this resulted in trivial speed improvements. The solution was to let Mathematica compile the whole function.

Much faster! Unfortunately, it runs into problems and never returns when I go to 10^-8:

My assumption is that compilation uses machine-level precision instead of the arbitrary precision that's Mathematica's default. Since we're squaring very small numbers, we're rapidly exceeding this precision. It's pretty cool that Mathematica realizes this and switches back.

Still, I was hoping for more.

One of the features new to Mathematica 8 is the ability to leverage your computer's GPU, i.e. your graphics processor. Modern graphics chips actually have a massive amount of computing power and are highly parallelized. You typically program these chips with either CUDA (which is NVidia specific) or OpenCL (an industry standard). As of Snow Leopard, Apple includes OpenCL drivers with its Macs.

Re-implementing the algorithm was pretty easy: you're writing C code with an OpenCL flavor. Previous versions calculated each row sequentially. This version calculates them in parallel. After all, there are 48 (!) cores on my laptop's GPU :-)

mandelCLsrc takes 5 arguments: two input vectors, one output vector, the length of the vectors, and the max iterations we want to compute. Then I compile it with OpenCLFunctionLoad[]. Notice the "8" parameter: that means we want to leverage 8 cores to process this function.

If you're doing any OpenCL programming, definitely use OpenCLFunctionLoad[]'s ShellOutputFunction -> Print option, it'll help you find errors. While I'm on the subject: if Mathematica stops compiling OpenCL functions or behaves strangely, just restart it. OpenCL gives you very low level memory access, it's not hard to corrupt things.

So how did this solution fare? Disappointingly :-(

I define my two input vectors, xt and yt, and one output vector nt. Sadly, while the first few iterations are promising, it appears that we quickly run out of precision here too. With one thousandth as input, the function should return in roughly 31,420 iterations. Clearly it does not.

I tried switching from floats to double, even though my chip apparently doesn't support them, this was no help.

Where to go from here? Mathematica's ParallelEvaluation[] function, which splits jobs across multiple Mathematica kernel, won't really help. Another option could be to switch to integer arithmetic for Mandelbrot's function though I'm not convinced this would improve Mathematica's speed. There's always a chance that I may have done something wrong in my OpenCL code, or that further optimizations are possible in the Mathematica code. A final option that I couldn't get to work was running the OpenCL code on my CPU instead of my GPU. This is theoretically possible and would leverage the CPU's ability to handle higher precision doubles.

Implementation in a different language may be the ultimate solution. C, C++, or Java make most sense to me given they have excellent optimizing compilers but, just for kicks, I picked one of my favorite programming languages: ruby.

Here are the results running on ruby 1.9.2 (ruby 1.8.7 is about 1/2 as fast):

1,          3,        2.0e-05
1/10,       33,       4.4e-05
1/100,      315,      0.000312
1/1000,     3143,     0.003746
1/10000,    31417,    0.031083
1/100000,   314160,   0.300709
1/1000000,  3141593,  3.02154
1/10000000, 31415927, 30.608127

To my surprise modern ruby is significantly slower than compiled Mathematica at this task. I would have expected ruby to fare better.

Interestingly, both ruby and Mathematica bog down significantly when computing 10^-8. They don't return in a meaningful timeframe, i.e. I killed both calculations after 30 minutes. Presumably this is the point where both switch to arbitrary length floating point routines in software.