There are 10 types of developers; those who work in a higher level and depend on abstractions, and those who work with the bits and bytes alongside the bare metal.
As a C# developer, I am one of the former, and I usually try to abstract everything I do. A few weeks ago, I figured out that I can use some of these abstraction techniques to solve “low-level” challenges too. To make a long story short, I was able to utilize LINQ in order to eliminate a rather very complicated bitwise operation.
The mission
A framework for communicating with FPGA cards (Field-Programmable Gate Array), where it sends and receives messages to and from those cards.
One of the framework’s abilities is to serialize and deserialize messages that were sent or received from a FPGA card to and from byte arrays, according to an application-specific protocol.
The structure and the parts each message is composed of vary, and are only determined at runtime.
This looks easy enough
Each message is composed of different parts with different lengths.
By serializing a message, we concatenate each part’s value according to its length in bits.
The following is an example of a message that is composed of 3 parts.
Message | ||
Part 0 | Part 1 | Part 2 |
001 | 1010 | 00111110000 |
3 Bits | 4 Bits | 11 Bits |
In this example, “Part 0” has the value of 0x1, “Part 1” has the value of “0xA” and “Part 2” has the value of “0x1F0”.
The framework doesn’t really care about the content of the messages, but rather cares for their structure.
When deserializing a message, we only need the structure or the schema of the message. For each message-part we need to specify its name, index (its order in the message) and length in bits.
I have therefore introduced the following class:
public class LayoutPart { public string Name { get; set; } public int Index { get; set; } public int BitCount { get; set; } public LayoutPart(string name, int index, int bitCount) { Name = name; Index = index; BitCount = bitCount; } }
When serializing a message, each message-part has to specify four properties. The three properties of LayoutPart, in addition to the Value of that part.
Here are the classes I have added:
public class MessagePart : LayoutPart { public int Value { get; set; } public MessagePart(string name, int index, int bitCount, int value) : base(name, index, bitCount) { Value = value } }
In reality, the content’s type wasn’t known beforehand, and it surly wasn’t always an int, but I decided to use an int in all of my examples to keep them simple.
The relevant API is really simple, and it has two methods; Serialize and Deserialize:
public interface IMessageSerializer { byte[] Serialize(IEnumerable<MessagePart> messageValues); IEnumerable<MessagePart> Deserialize(IEnumerabe<LayoutPart> layout, byte[] bytes); }
Bitwise operations: This is really ugly
So far so good 😀
After defining the API, I went on and started implementing it.
The following is an example of serializing the message from the example above
Name | BitCount | Value | Value in Bits | Value in Bytes |
---|---|---|---|---|
Part 0 | 3 | 1 | 001 | 00000001 |
Part 1 | 4 | 0xA | 1010 | 00001010 |
Part 2 | 11 | 0x1F0 | 00111110000 | 00000001 11110000 |
The byte array should look like this
Byte 2 | Byte 1 | Byte 0 |
---|---|---|
00000000 | 11111000 | 01010001 |
The first byte is composed of Part 0 (3 bits), Part 1 (4 bits) and the first bit of Part 2 (11 bits). The second byte is composed of bits 1-8 of Part 2, and the third byte is composed of bits 9-10 of Part 2 with 6 more trailing zeros to complete a byte of 8 bits.
I approached it using the standard-naive way: bitwise operations.
I needed the ability to extract bits from different bytes and bit indexes, then concatenate them together for composing a byte array.
Assuming we are working with System.Int32, how would I extract the n first bits? Easy enough!
value & ((1 << n) - 1)
And to extract the n last bits?
(value >> (32 - n)) & ((1 << n) - 1)
What about extracting n bits that are in the “middle”? This is starting to be more complicated.
((value >> startIndex) & ((1 << n) - 1));
I did a little refactoring and defined them in methods.
long ExtractBits(uint value, int n, int startIndex) => ((value >> startIndex) & ((1 << n) - 1)); long ExtractFirstBits(uint value, int n) => ExtractBits(value, n, 0); long ExtractLastBits(uint value, int n) => ExtractBits(value, n, 32 - n);
This looks a little bit better. I had to use System.Int64 and system.UInt32 for overflow/underflow safety when performing bitwise operations.
Most importantly, this is an over simplified version of what I needed to do. I also had to consider:
- Concatenating the results
- Messaging parts of length exceeding 8 bytes (sizeof(long)). Current implementation won’t work
- Transfer the results into a byte array
- Implementing deserialization, which is even harder
I went on and finished the rest of the implementation. It took me a full day of work (implementing + testing), and it wasn’t easy, at least not as I thought it would be. I kept telling myself that there has to be a better way.
The code wasn’t maintainable, even with the 90% code coverage I had of tests. Not by the version of me from next month or even from next week, and of course not by another dev from my team. This code will become an unmaintainable, ‘don’t touch’, legacy code as soon as I’ll push the changes to the repo and mark the task as ‘Done’.
This code is ugly.
Oh, much better
I took a step back and tackled the problem from a different angle. I figured that my goal is pretty simple; I have a collection of values and all I have to do for the Serialize method is to:
- Project each message part into bits
- Concatenate the bits of all values
- Project the result into a byte array
And all I have to do for the deserialize method is to:
- Project the byte array into bits.
- Slice them according to the length of each message part
- Project each slice into the corresponding message part object
I quickly thought about LINQ! Isn’t this one of the scenarios that are easily solved by LINQ? I went on and explored this idea further.
I decided to use the Type-Safe Enum Pattern to represent a Bit (Bit.cs). I also used some LINQ operations from Jon Skeet’s MoreLinq library.
For the Serialize method I started by writing the following extension methods:
//Projects a byte into IEnumerable<Bit> public static IEnumerable<Bit> ToBits(this byte @byte) { for(var index = 0; index < 8; ++index, @byte >>= 1) { yield return 1 == (@byte & 1); } } //Projects IEnumerable<Bit> into a byte public static byte ToByte(this IEnumerable<Bit> bits) => bits.Reverse().Aggregate((byte)0, (currentValue, bit) => (byte)((currentValue << 1 | bit) )); //Projects a byte[] into IEnumerable<Bit> public static IEnumerable<Bit> ToBits(this IEnumerable<byte> bytes) => bytes.SelectMany(ToBits);
The following are the steps for the Serialize method:
- Project each value into bits, according to the BitCount property of its message-part object
- Project the value into bytes
- Project the bytes into bits
- Group the bits into a group of 8-bit each
- Project each group into a byte
- Project the IEnumerable of bytes into a byte array
byte[] Serialize(IEnumerable<MessagePart> messageParts) { return messageParts.OrderBy(part => part.Index) .SelectMany(part => BitConverter.GetBytes(part.Value) .ToBits()) .Batch(8) .Select(slice => slice.ToByte()) .ToArray(); }
(Did you notice the bug? I’ll discuss it later)
For implementing the deserialize method, I created the following extension methods:
//Projects an array of 4 bytes into an int public static int ToInt(this byte[] bytes) => BitConverter.ToInt32(bytes, 0); //Projects a bunch of 8-bit IEnumerables into an IEnumerable of bytes public static IEnumerable<byte> ToBytes(this IEnumerable<IEnumerable<Bit>> bits) => bits.Select(ToByte);
The following are the steps for the deserialize method:
- Project the byte array into IEnumerable of bits
- Slice the bits according to each message-part length and order
- Project each slice into 4 bytes (sizeof(System.Int32))
- Project each 4 bytes into a System.Int32
IEnumerable<MessagePart> Deserialize(IEnumerable<LayoutPart> layout, byte[] bytes) { var bits = bytes.ToBits(); return layout.OrderBy(part => part.Index).Select(part => { var slice = bits .Take(part.BitCount) .Batch(8) .ToBytes() .Pad(sizeof(int), (byte)0) .Take(sizeof(int)) .ToArray() .ToInt(); bits = bits.Skip(part.BitCount); int value = BitConverter.ToInt32(slice, 0); return new MessagePart(part.Name, value); }).ToArray(); }
Looks better, Eh?
Testing the implementation
It isn’t always easy to debug LINQ operations, thus in order to debug, test and visualize my examples, I used OzCode which is an awesome debugging extension for Visual Studio. I installed the EAP (Early Access Program) edition, which includes amazing LINQ debugging capabilities, and put it to work.
This is the code according to the example above
var layout = new[] { new LayoutPart("Part 0", 0, 3), new LayoutPart("Part 1", 1, 4), new LayoutPart("Part 2", 2, 11), }; var message = new[] { new MessagePart("Part 0", 0, 3, 1), new MessagePart("Part 1", 1, 4, 0xA), new MessagePart("Part 2", 2, 11, 0x1F0), }; var serializer = new MessageSerializer(); var bytes = serializer.Serialize(message); var messageParts = serializer.Deserialize(layout, bytes);
I used OzCode’s LINQ analyser for visualizing and tracking the LINQ operations.
Here we can see what the actual bits are, and their order in the concatenated message-parts.
But according to the BitCount property of Part 0, its value should be projected only to 3 bits.
Fixing the bug:
byte[] Serialize(IEnumerable<MessagePart> messageParts) { return messageParts.OrderBy(part => part.Index) .SelectMany(part => BitConverter.GetBytes(part.Value) .ToBits() .Pad(part.BitCount, Bit.Off) //Added .Take(part.BitCount)) //Added .Batch(8) .Select(slice => slice.ToByte()) .ToArray(); }
I added Pad and Take to be sure I project exactly the needed count of bits.
This looks much better.
Here we can see the value of each byte:
Testing the Deserialize method, we can see the value of each message part:
Summary
Even though actions that require bitwise operations are usually considered “low-level”, I was able to use LINQ, which is considered a “high-level” API, to perform the same thing. This kind of abstraction saved me time and effort, and made my code more maintainable, readable and clean. I suggest to use abstractions whenever you can, and whenever it make sense.
What I really learnt though, is that we can use skills we acquire in one domain to solve challenges in another.
There is a saying that polyglot programmers write better code than those who specializes only in one language. I think it is yet another example of the same idea – each skill you acquire, will make you better at what you do, regardless how unrelated that skill might look like at that time.
Note: this post is published also at OzCode’s blog.