Transforming the CLR to numerical applications, and the nullspace that results.

Monday, January 21, 2008

Swapping byte encoding order of multi-byte numeric types in .Net (C# and VB.Net) to convert from little-endian to big-endian and back

I was looking to replace the mechanism of the swapByteOrder(Double) method in GeoAPI, since it used the BitConverter.GetBytes()/Array.Reverse()/BitConverter.ToDouble() approach. I had a feeling that this was horribly slow, given that the Array.Reverse() made me take a GC hit in a tight loop (which is how this function is often used).

Having just come off of reworking some NTS code which manipulated Double values using the bits in the number, I had gotten a much better feel of the IEEE 754 double-precision 64 bit layout. I could use BitConverter.DoubleToInt64Bits() without going pale. So, my first step was to do this - convert the Double to an Int64. I was going to use the approach I employed swapping the bytes of an Int32 (bit-shifting and or'ing) to handle Int64 values, when I had the idea that I'd check if anyone had a better idea of reversing the bytes of an Int64.

So, I was Live Searching1 for various methods of doing byte-order reversal in multi-byte numeric types, and came across this post: It surprised me that people were using pointers and Array.Reverse() over bit-shifting. Someone noted that the pointer version would be more highly performing, which was likely quite true, but that forces you to add the /unsafe switch to the C# compiler, which turns off verification checking and throws partial trust out the window.

So, I decided to test it.

The Methods

I found 4 methods to swap bytes in pure managed code, and created one function per method which, when given an Int64, returns that Int64 with the bytes order reversed. These might not be the best methods for all data, but it is a good sampling of what is out there.

The 4 methods and corresponding functions are:

Bit ShiftingC#:
private static Int64 swapBitShift(Int64 value)
UInt64 uvalue = (UInt64)value;
UInt64 swapped = ((0x00000000000000FF) & (uvalue >> 56)
(0x000000000000FF00) & (uvalue >> 40)
(0x0000000000FF0000) & (uvalue >> 24)
(0x00000000FF000000) & (uvalue >> 8)
(0x000000FF00000000) & (uvalue << 8)
(0x0000FF0000000000) & (uvalue << 24)
(0x00FF000000000000) & (uvalue << 40)
(0xFF00000000000000) & (uvalue << 56));
return (Int64)swapped;
Function swapBitShift(ByVal value As Int64) As Int64
Dim uvalue As UInt64 = CULng(value)
Dim swapped As UInt64 = ((&HFF) And (uvalue >> 56) _
Or (&HFF00) And (uvalue >> 40) _
Or (&HFF0000) And (uvalue >> 24) _
Or (&HFF000000) And (uvalue >> 8) _
Or (&HFF00000000) And (uvalue << 8) _
Or (&HFF0000000000) And (uvalue << 24) _
Or (&HFF000000000000) And (uvalue << 40) _
Or (&HFF00000000000000) And (uvalue << 56))
Return CLng(swapped)
End Function
Pointer Byte Swapping
private static Int64 swapPointerLoop(Int64 value)
Int64 newValue = 0;
Byte* from = (byte*)&value;
Byte* to = (byte*)&newValue;

Int32 len = sizeof(Int64);

for (int i = 0; i < len; i++)
to[len - i] = from[i];

return newValue;


private static Int64 swapIpAddress(Int64 value)
return System.Net.IPAddress.HostToNetworkOrder(value);
Function swapIpAddress(ByVal value As Int64) As Int64
Return System.Net.IPAddress.HostToNetworkOrder(value)
End Function
private static Int64 swapArrayReverse(Int64 value)
Byte[] bytes = BitConverter.GetBytes(value);
return BitConverter.ToInt64(bytes, 0);
Function swapArrayReverse(ByVal value As Int64) As Int64
Dim bytes As Byte() = BitConverter.GetBytes(value)
Return BitConverter.ToInt64(bytes, 0)
End Function

I ran each test 20 times and averaged the data. I was using the test computer throughout which simulates a system experiencing taxed resources2.

The Data

1,000 Int64s graphed on a linear scale 1,000 Int64s graphed on a logarithmic scale
10,000 Int64s graphed on a linear scale 10,000 Int64s graphed on a logarithmic scale
100,000 Int64s graphed on a linear scale 100,000 Int64s graphed on a logarithmic scale

Some Conclusions

It looks like bit shifting wins out, although it's probably negligibly more efficient than IPAddress.HostToNetworkOrder. The performance of this method surprised me - not that I thought that it was implemented less efficiently, but that the method call would cost more than it does. I was further surprised to see that IPAddress.HostToNetworkOrder(Int64) makes 6 method calls per invocation, in order to delegate flipping the bytes to IPAddress.HostToNetworkOrder(Int32) twice which further delegates to IPAddress.HostToNetworkOrder(Int16) twice each. I'm not sure how much of this the JIT compiler inlines, as there is some difference between the x86 code which it generates for IPAddress.HostToNetworkOrder and the bit shifting implementation - but whatever the difference, it isn't much.

Another surprise was the pointer method. It was much slower than I thought. Perhaps it was the implementation (which, granted, is naive when we know we are looking at arrays of bytes). Perhaps it was some overhead which the C# compiler puts in when running with the usual debug settings of Visual Studio. Perhaps it was the permission asserts which the security portion of the runtime has to put on the stack when running unsafe code. If anything, since I've considered using unsafe code to speed up numeric operations, this exercise was helpful in showing me that there are some pitfalls in using it which should be investigated beforehand.

Finally, the Array.Reverse method was much worse in terms of performance that I thought it was going to be. It was a full order of magnitude worse. With the two best methods - bit shifting and IPAddress.HostToNetworkOrder so accessible, I can think of no reason to continue to use this method.

1) Yes, I'm giving it another shot, although so far, it's... not bad, but not great.

2) Ok, it doesn't simulate it exactly, but I didn't want to stop using the computer. The biggest impact will be memory usage, and make Array.Reverse() look worse than if I hadn't been using it. This is probably closer to what we want to see, since memory usage from other processes on a machine is rarely zero.

Wednesday, January 16, 2008

SharpMap v2.0, GeoAPI v2.0, Proj.Net and NTS v2.0 progress

The next release of SharpMap is taking longer than imagined, since it wasn't quite as easy to transform all those JTS javaisms to .Netisms, especially when trying to be considerate of the whole functional-construct infusion that is LINQ, IronPython and F#. There is quite a legacy in numerical computing looking back through the annals of Python as well as the functional language side of the house, so no doubt ideas (and code) will step over from all that work into what could be fairly decent geometry stack in .Net: SharpMap, GeoAPI, Proj.Net and NTS.

However, recent check-ins to the v2.0 branches of GeoAPI.Net and NTS have begun to compile again, and the basic functionality should be up and running. SharpMap v2.0 Beta 2 will be released once all tests are passing, code coverage is up to 60% or so.

Tuesday, January 15, 2008

Quick Intro

This is codekaizen. I work on SharpMap, GeoAPI.Net, NPack, and, in order to support SharpMap, a (temporary?) fork of Proj.Net.

I have had to learn a lot about linear algebra, numerical methods and their implementations, computational geometry and associated algorithms, spatial data structures and indexing, limits of machine representation of vector components and real numbers, and the architecture of "vector processors" like the SSE instruction set or GPUs to speed up, sometimes by an order of magnitude, the core computation primitives in reasoning and displaying spatial data.

With all this swirling around in my head, I need an outlet. So it is upon you, oh gentle reader, that I sling the clay of awareness and hope to mold it into the art of understanding and craft of proficiency. I'll admit that I don't really have a deep and abiding grasp of the math. I also don't have breadth in the application of the algorithms and data structures to a variety of situations. Nonetheless, I do have a firm grasp of the Microsoft CLR v2.0 and the underlying machine, and I need to explore how to apply the CLR to the problems of spatial reasoning, data visualization, and various numerical and heuristic methods to assist in finding patterns in data and helping to arrange and render the visualizations. As the graphics capability of commodity hardware becomes powerful enough to easily render any 2D and most 3D scenes, and processing becomes more distributed and abundant, real-time, everywhere-available applications which help us see and interpret our work and our lives more efficiently, more comprehensively and with greater insights. Computation is a natural extension of our minds, and we are compelled to greater heights of understanding - our .Net applications must, and will, accommodate. I hope to fill in the gaps between numerical and algorithmic theory and practice on the way.