Protocol Design

Mike Radin, 2014/06/19

Introduction

Bashing other people's ideas and ideals is always fun, but not very productive. This article will contribute constructive thoughts on binary protocol design. In the long-standing tradition of "not invented here," we'll roll our own⁷ binary messaging protocol. It'll be amusing and educational along the way.

Sweet Nothings

The network listener will be the first piece of code an incoming message will encounter, for performance reasons a separate, likely multi-threaded, parser will manage conversion of incoming bytes into usable message objects. Listening for incoming traffic is a synchronous operation, to make it efficient, the critical section should be short. Especially with streaming parsers, message size is the primary piece of the data that should be sent. Let the first eight bytes¹¹ be the size and type of the message. With just this information the listener can hand off a chunk of the buffer to a parser without doing significant processing itself. Feel free to use a short instead if it will be adequate. For the first take, this is not a bad header. These 8 or 4 bytes will then be followed by the message body. Remember to include the header size on the overall message length.

Let's consider simple messages that are just collections of primitives. This will require nothing more than a few function calls in java.nio.ByteBuffer or System.IO.BinaryWriter & BinaryReader in C#.

using System;
using System.IO;

namespace com.projecttree.quick {
    public class Message {
        protected byte[] _bytes;
        protected MemoryStream _wrapper;
        protected BinaryReader _reader;
        protected BinaryWriter _writer;
        protected int _type = -1;

        public Message(int size, int type){ // serializing constructor
           _bytes = new byte[size];
           _wrapper = new MemoryStream(bytes);
           _reader = new BinaryReader(wrapper, new System.Text.ASCIIEncoding());
           _writer = new BinaryWriter(wrapper, new System.Text.ASCIIEncoding());
           _writer.BaseStream.Seek(0, SeekOrigin.Begin);
           _writer.Write(size);
           _writer.BaseStream.Seek(4, SeekOrigin.Begin);
           _writer.Write(type);
        }

        public Message(byte[] msg) { // deserializing constructor
           _bytes = new byte[msg.Length];
           Array.Copy(msg, 0, _bytes, 0, msg.Length);
           _wrapper = new MemoryStream(bytes);
           _reader = new BinaryReader(wrapper, new System.Text.ASCIIEncoding());
           _writer = new BinaryWriter(wrapper, new System.Text.ASCIIEncoding());
        }

        public int Size { get { return _bytes.Length; } }
        public int Type {
            get {
               if(type < 0) {
                  _reader.BaseStream.Seek(4, SeekOrigin.Begin);
                  type = _reader.ReadInt32();
               }
               return type;
            }
        }
    }
}

package com.projecttree.quick;
public class Message {
   private byte[] bytes;
   private java.nio.ByteBuffer wrapper;
   private int _type = -1;

   public Message(int size, int type) {
       bytes = new byte[size];
       wrapper = ByteBuffer.wrap(bytes);
       wrapper.putInt(0, size);
       wrapper.putInt(4, type);
   }

   public Message(byte[] msg) {
       bytes = new byte[msg.length];
       System.arraycopy(msg, 0, bytes, 0, msg.length);
   }

   public int getSize() { return bytes.length; }
   public int getType() {
      if(_type < 0) { _type = wrapper.getInt(4); }
      return _type;
}

The first thing to get out of the way is arraycopy call in the constructor. Remember, arrays are pointers and we might want to preserve the actual message. This call can be avoided depending on the way the network code reads incoming data before it attempts to parse it into messages. Furthermore, such a constructor facilitates lazy decoding. There are likely to be multiple threads processing messages, there is no reason to burden the network handling thread with parsing. All that thread needs to do is produce byte arrays as messages into some message queue as quickly as possible. At some later point when the message is received by code that cares about the contents, individual fields will be decoded on demand and cached for future use if needed. If a message is only read once, then there is no reason to do the assignment you see in the Type accessor above. At the same time, fields that are present, but not used will never be read and there will be no translation overhead for them. Lazy-loading an int might not improve performance much, but with long strings it will. For that matter, if a micro-optimization is desired, it may make sense to pre-load all primitive numeric fields in the constructor rather than paying for the if on every read.

While this column does not explicitly discuss serializer design, it is apparent from the code that messages are not meant to be deserialized by external entities. There are different implementations of messaging libraries that use the Factory pattern for (de)serializing and parsing.

Binary math

Only rarely does the content of any particular computer science class from college come in handy, this is one of those times¹. Decoding boolean and integer values is straight forward as is translating standard decimals. Below is a sample using binary shift operations, ByteBuffer and BinaryReader & BinaryWriter would accomplish the same thing. This kind of fields can be sent as is once converted into binary representation. With complex fields things are slightly more complicated.

protected static int toInt(byte[] bytes){
    if(bytes.length != 4){ return -1; }
    int value = (int)(0xFF & bytes[3]);
    value = (int)(((0xFF & bytes[2]) << 8) | value);
    value = (int)(((0xFF & bytes[1]) << 16) | value);
    value = (int)(((0xFF & bytes[0]) << 24) | value);
    return value;
}

protected static byte[] toBytes(int value){
    byte[] bytes = new byte[4];
    bytes[0] = (byte)(0xFF & (value >> 24));
    bytes[1] = (byte)(0xFF & (value >> 16));
    bytes[2] = (byte)(0xFF & (value >> 8));
    bytes[3] = (byte)(0xFF & (value));
    return bytes;
}

Dates require several considerations. The simple solution for sending YYYY/MM/DD would be four shorts. Compressing a date into a single unsigned int is also a possibility where the upper limit of 30,000-odd¹² years is not a concern, December 31, 32,767 is 80-00-04-CF. Time can be encoded the same way as long as there is agreement on resolution be it seconds, 100ns ticks or anything else. Lastly, time zones, it is increasingly difficult to avoid accounting for time zones in connected applications. Whether it's a toy order that came from across borders or a stock trade on a foreign exchange, GMT offset is needed¹⁰.

protected static byte[] toBytes(short year, byte month, byte day){
   byte[] bytes = new byte[4];
   bytes[0] = (byte)(0xFF & (year >> 8));
   bytes[1] = (byte)(0xFF & year);
   bytes[2] = (byte)(0xFF & --month); // Java months are 0-indexed, December = 11
   bytes[3] = (byte)(0xFF & day);
   return bytes;
}

protected static java.util.Date toDate(byte[] bytes){
    if(bytes.length != 4){ return null; }
    byte day = (byte)(0xFF & bytes[3]);
    byte month = (byte)(0xFF & bytes[2]);
    short year = (short)(0xFF & bytes[1]);
    year = (short)(((0xFF & bytes[0]) << 8) | year);

    java.util.Calendar c = java.util.Calendar.getInstance();
    c.set((int)year, (int)month, (int)day); return c.getTime();
}

Strings present two problems. Length should be encoded in the first field, it could be an int or a short, depending on what the author of the protocol considers to be reasonable. Please use uint and ushort, text with fewer than zero characters is not readable on modern computer systems. Following the size should be the contents of the text encoded in some reasonable fashion. "Reasonable" probably means ASCII or something similar based on locale, for example, KOI8-R. UTF8 is unnecessary, unless it is. If using ASCII and feeling especially hardcore, rob the least-significant bit for something, just like paper tape⁵ or steganography⁶.

protected static byte[] toBytes(String text){
    byte[] textbytes = text.getBytes(java.nio.charset.Charset.forName("US-ASCII"));
    byte[] size = toBytes((short)textbytes.length);
    byte[] bytes = new byte[textbytes.length + 2];
    bytes[0] = size[0];
    bytes[1] = size[1];
    System.arraycopy(textbytes, 0, bytes, 2, textbytes.length);
    return bytes;
}

protected static String toString(byte[] bytes){ return null; }

Lists work in the same general way as strings, the first field is size, followed by the stream of content. Note that if the size field has value zero then there is no reason to reserve any space for content, the message deserializer will just skip it. This also holds true for lists of complex objects.

enums, easy - just send a short. The designer of the protocol is in complete control of the how the message is interpreted. In the end these are all just bytes, the message structure defines how they are to be interpreted.

The special cases of null and invalid values just need to be agreed upon. There is no need for a general solution, just the one that works in the given case. Primitives aren't even allowed null values, so developers can agree that when sending an order size, -1 means it's missing. If we're talking about stock orders, negative values could mean a short, but it's also possible to send trade direction in a separate field, so why not.

Cheap things are not free

Some C# developers will argue that structs should be used for messages. They live in the stack³ so they are cheap to create, they are passed by copy so you don't have to deal with concurrency-related synchronization issues. That's all great, until the whole "passed by copy" part. Memory operations are the second slowest action a computer can perform, followed by disk, network, and disc operations. Copying a couple of ints is one thing, a full blown message is quite another. A meaningful message can be large, copying kilobytes every time something has to be done to it will become expensive very quickly. In addition, consider that anything except the basic primitives will go on the heap anyway, like arrays of primitives, unless it is specifically forced not to with the fixed keyword. That of course requires the use of unmanaged code and while the author agrees that all software should be written in assembly or better yet in microcode², it's just unrealistic as it would dramatically limit the pool of employable programmers. Strings of course are another issue and should be interned⁸ anyway, but maybe not everywhere⁹. Retaining things in the stack is senseless since there are limits to how much memory is available in that space¹³. Past that, there is auto-boxing which will load the object into the heap anyway unless the code is really careful to avoid such a possibility. Debugging threading issues in a DI⁴ application is fun, debugging manual memory allocation is probably the close second.

Depending on the type of serializer, using the stack to store messages immediately prior integration into the application data layer is reasonable. Specifically, when the message comes in from the wire this serializer will read each field and set it in the instance of a struct the message represents. There is no arraycopy step. After this the message struct can be enqueued for some message processor to read. That will in turn update the application state before discarding the message.

There is a different possible optimization using static methods for read-once messages.

package com.projecttree.quick;
    public static class SomeReadOnceMessage : BaseStaticMessage {
        public static int getSomeField(final byte[] data){ return getInt(data, 8);
    }
 }

The above assumes some function that will return an int value from an array using the provided offset. Structure like this allows the individual message implementation to be very light decoration on top of the byte array that represents them. The only instance variable is the byte array, all functions are static. If a message will be read more than once, it starts to make sense to cache the value as demonstrated earlier.

I!=O

It is instrumental to delineate the difference between messages and commands. Messages are received, deserialized; responses, commands or requests are sent, serialized. Because of this difference objects on either side only need to include serialization or deserialization code. This also creates a cleaner, object graph and simpler use-case diagrams.

Addendum

Having a neat message structure and a well-organized class inheritance tree is one thing. Achieving high throughput on the wire is quite another. An article concerning serializer design is next. In the meantime, find the source to all the code discussed here. Interesting contributions to the code will be integrated into the library, snarky remarks will be reposted and commented on.

Protocol design methods described here will result in a fairly rigid message structure that does not lend itself to easy portability or extensibility that XML messaging might provide. There is a cost associated with designing something like this &endash; modifying the client and server libraries when changes are made. Protocol versioning may be possible depending on validation rules the messages implement. For example if a specific message does not require the incoming byte array to be of a set size then new fields can be added to the end of the message without modifying legacy libraries where new functionality is not required. The speed and simplicity of these protocols however far outweighs any theoretical benefit that extensibility might provide.

Finally it would be inappropriate not to mention reflection. There is a large cadre of developers who believe that using reflection in messaging libraries is acceptable. If there was such a thing as a programmers license, it would get revoked for such things. There is example of an optimized dynamic message processor. WCF generates code to handle XML messages the first time they are see, but even here the suggestion is to pre-compile this code.

References

OpenCourseWare can help you brush up.
For some people, assembly is a high-level programming language.
The structs live in the stack; these developers live in lala land.
Dependency injection represents the death of computer programming as a science.
Used for parity.
Used for security.
On the other hand, there is the Protocol Buffers project, but whatever.
In Java and C#.
Reading the "Performance Considerations" section of the C# link in reference 8 brings up an interesting point about how interned strings may stick around in memory for a long time. This will cost memory, but in modern systems it should be a reasonable tradeoff.
This is for illustrative purposes only, there should be a class-level instance of Calendar to deal with such things.
Here and elsewhere in this series of articles, unless otherwise specified, int is considered to be 4 bytes in size.
No unsigned types in Java, until Java 8, so 32,767, not 65535.
As of .net4, the default stack size if 1MB per thread.