Objects from C# to Python [3/3]

Protocol Buffers (or Protobuf) is a serialization format developed by Google. Like ZeroMQ, one of its key advantages is that it is available in almost any language.

To serialize structured data in this format, you need to start by defining a message type. This is usually done in a .proto file, but in C# it is also possible to simply add attributes to an existing class, similarly to an XML serializer.

In this post we will cover a simple data structure with three fields:

  • Timestamp: a 64-bit integer used to store the date and time as ticks
  • Key: a string used to describe the content of the message
  • Data: a double storing the data itself

It is easy to see how this can be used to stream financial market data for example, with only a few changes. Protocol Buffers also support more complex message types, such as lists and messages of messages.
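Since the timestamp travels as .NET ticks, it helps to see what that number means on the Python side. A tick is 100 nanoseconds, counted from midnight on January 1st of year 1. The sketch below converts in both directions; the function and constant names are hypothetical, chosen only for this illustration, and sub-microsecond precision is dropped because Python's datetime does not store it.

```python
from datetime import datetime, timedelta

DOTNET_EPOCH = datetime(1, 1, 1)   # corresponds to DateTime.MinValue in .NET
TICKS_PER_MICROSECOND = 10         # one tick is 100 nanoseconds


def ticks_to_datetime(ticks):
    """Convert .NET ticks to a naive datetime (truncated to microseconds)."""
    return DOTNET_EPOCH + timedelta(microseconds=ticks // TICKS_PER_MICROSECOND)


def datetime_to_ticks(dt):
    """Convert a naive datetime to .NET ticks using integer arithmetic only."""
    delta = dt - DOTNET_EPOCH
    microseconds = (delta.days * 86400 + delta.seconds) * 1000000 + delta.microseconds
    return microseconds * TICKS_PER_MICROSECOND


ticks = datetime_to_ticks(datetime(2000, 1, 1))
print(ticks)                      # 630822816000000000
print(ticks_to_datetime(ticks))   # 2000-01-01 00:00:00
```

Integer division is used on purpose: ticks for present-day dates are too large to be represented exactly as a float.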

Publishing in C#

For the C# serialization, we will be using the protobuf-net library. Build it and add the full .NET 3.0 build as a reference to your project.

Then let’s define our class:

using System;
using ProtoBuf;

namespace CSPublish
{
 [ProtoContract]
 class TimestampData
 {
  [ProtoMember(1)]
  public Int64 Timestamp { get; set; }

  [ProtoMember(2)]
  public string Key { get; set; }

  [ProtoMember(3)]
  public double Data { get; set; }
 }
}

The only constraint is that each field has a unique number: these numbers, not the property names, identify the fields in the serialized message.
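To see why the field numbers matter, here is a minimal sketch of how this particular message is laid out on the wire, written by hand in Python with no Protobuf library. It follows the Protocol Buffers encoding rules (each field starts with a tag made of the field number and a wire type), but the helper names are hypothetical. It assumes a non-negative timestamp and that protobuf-net uses its default varint encoding for Int64. It is an illustration, not a replacement for the real serializer.

```python
import struct

WIRE_VARINT = 0   # int64 and other integer types
WIRE_64BIT = 1    # double
WIRE_LENGTH = 2   # string, bytes, nested messages


def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)   # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_tag(field_number, wire_type):
    """A tag is the field number shifted left by 3 bits, plus the wire type."""
    return encode_varint((field_number << 3) | wire_type)


def encode_timestamp_data(timestamp, key, data):
    """Hand-encode the three fields of TimestampData in field-number order."""
    key_bytes = key.encode("utf-8")
    return b"".join([
        encode_tag(1, WIRE_VARINT), encode_varint(timestamp),
        encode_tag(2, WIRE_LENGTH), encode_varint(len(key_bytes)), key_bytes,
        encode_tag(3, WIRE_64BIT), struct.pack("<d", data),
    ])


payload = encode_timestamp_data(1, "SYM", 0.5)
print(payload.hex())   # 0801120353594d19000000000000e03f
```

The names Timestamp, Key and Data never appear in the payload. Only the numbers 1, 2 and 3 do, which is why they must be unique and must match on both sides.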

After updating our previous ZeroMQ publish code, we can send a serialized class rather than a string:

using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

using ZeroMQ;
using ProtoBuf;

namespace CSPublish
{
 class Program
 {
  static void Main(string[] args)
  {
   bool publish = true;
   Task.Run(async () =>
   {
    Random random = new Random();
    using (var context = ZmqContext.Create())
    {
     using (var socket = context.CreateSocket(SocketType.PUB))
     {
      socket.Bind("tcp://127.0.0.1:5000");
      using (var stream = new MemoryStream())
      {
       while (publish)
       {
        var data = new TimestampData
        {
         Timestamp = DateTime.Now.Ticks,
         Key = "SYM",
         Data = random.NextDouble()
        };
        Serializer.Serialize(stream, data);
        byte[] bytes = stream.ToArray();
        socket.Send(bytes);

        stream.SetLength(0);
        await Task.Delay(1);
       }
      }
     }
    }
   });
   Console.ReadLine();
   publish = false;
  }
 }
}

We use a random generator for the data, and serialization is done as usual in a memory stream, which is emptied with SetLength(0) after each send so that it can be reused for the next message.

Subscribing in Python

In Python we cannot simply decorate a structure or class: we need to define a .proto file. Following our C# class definition, let’s create one as follows in a TimestampData.proto file:

package pysub;

message TimestampData {
	required int64 Timestamp = 1;
	required string Key = 2;
	required double Data = 3;
}

The package name can be anything. Here we use pysub, as our message will be used in the Python subscriber.

We can now compile this message using protoc, which you can download from https://code.google.com/p/protobuf/downloads/list (I use the win32 binary). Open a command prompt in your working directory and enter:

path\to\protoc.exe --python_out=. TimestampData.proto

This will produce a TimestampData_pb2.py file, which contains the class definitions and methods needed to deserialize the message. If you open it, you will see which modules it requires to use Protocol Buffers in Python. As described in part 1, simply install protobuf.

Note: if you see TimestampData.proto:1:1: Expected top-level statement (e.g. "message"), this probably means your file is not in ANSI, which might happen if you created it with Visual Studio. You can change the encoding with Notepad++.
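If you would rather fix the file from a script, one likely culprit is a UTF-8 byte order mark at the very start of the file, which is invisible in most editors. The sketch below assumes that this is the cause; the strip_utf8_bom helper is hypothetical and does nothing if the file has no such mark.

```python
import codecs


def strip_utf8_bom(path):
    """Rewrite the file without a leading UTF-8 BOM.

    Returns True if a BOM was found and removed, False otherwise.
    """
    with open(path, "rb") as f:
        content = f.read()
    if not content.startswith(codecs.BOM_UTF8):
        return False
    with open(path, "wb") as f:
        f.write(content[len(codecs.BOM_UTF8):])
    return True
```

Calling strip_utf8_bom("TimestampData.proto") from the working directory and then running protoc again should show whether the encoding was the problem.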

Now that our modules are installed and our message is properly compiled, we can update our Python Subscriber to read our serialized data:

import datetime
import zmq
import TimestampData_pb2

context = zmq.Context()
socket = context.socket(zmq.SUB)
socket.connect("tcp://127.0.0.1:5000")
# Subscribe to every message (empty prefix); pyzmq expects bytes here
socket.setsockopt(zmq.SUBSCRIBE, b"")

while True:
    rawdata = socket.recv()
    data = TimestampData_pb2.TimestampData()
    data.ParseFromString(rawdata)
    # .NET ticks are 100 ns intervals since 0001-01-01, so 10 ticks = 1 microsecond
    ts = datetime.datetime(1, 1, 1) + datetime.timedelta(microseconds=data.Timestamp // 10)
    print(ts, data.Key, data.Data)

You can now run the C# Publisher and the Python Subscriber, and see data flow…

All the source files in this post are available on GitHub.

Objects from C# to Python [0/3]

There are multiple ways to integrate C# and Python. The easiest of all is by far IronPython, but it has limitations: some libraries have not been ported yet, and both languages must run in the same process. To make things more portable and scalable, the most logical path is using ZeroMQ and Protocol Buffers.

Part 1 covers the installation of ZeroMQ and Protocol Buffers.
Part 2 shows how to write a string publisher in C# and a string subscriber in Python.
Part 3 puts everything together and demonstrates how to serialize structured data to exchange objects between the publisher and the subscriber.