Objects from C# to Python [3/3]

Protocol Buffers (or Protobuf) is a serialization format developed by Google. Like ZeroMQ, one of its key advantages is that implementations are available for almost any language.

To serialize structured data in this format, you need to start by defining a message type. This is usually done in a .proto file, but in C# it is also possible to simply add attributes to an existing class, similarly to an XML serializer.

In this post we will cover a simple data structure with three fields:

  • Timestamp: a 64-bit integer used to store the date and time as ticks
  • Key: a string used to describe the content of the message
  • Data: a double storing the data itself

It is easy to see how this can be used to stream financial market data for example, with only a few changes. Protocol Buffers also support more complex message types, such as lists and messages of messages.

Publishing in C#

For the C# serialization, we will be using the protobuf-net library. Build it and add the full .NET 3.0 version of the assembly as a reference in your project.

Then let’s define our class:

using System;
using ProtoBuf;

namespace CSPublish
{
 [ProtoContract]
 class TimestampData
 {
  [ProtoMember(1)]
  public Int64 Timestamp { get; set; }

  [ProtoMember(2)]
  public string Key { get; set; }

  [ProtoMember(3)]
  public double Data { get; set; }
 }
}

The only constraint is to have a unique number for each field.
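
To see why the numbers matter more than the property names, it helps to look at what ends up on the wire: each field is written as a tag (the field number combined with a wire type) followed by its value, and the names never appear. The Python 3 sketch below hand-encodes one message with the same three fields, following the published Protocol Buffers encoding rules. The helper names (encode_varint, encode_tag, encode_timestamp_data) and the sample values are invented for this illustration; protobuf-net does this work for you, so treat it as a reading aid only.

```python
import struct

# Wire types defined by the Protocol Buffers encoding
WIRETYPE_VARINT = 0   # used for int64
WIRETYPE_FIXED64 = 1  # used for double
WIRETYPE_LENGTH = 2   # used for string


def encode_varint(value):
    """Encode an integer as a base-128 varint (7 bits per byte, low bits first)."""
    value &= (1 << 64) - 1  # negative int64 values travel as 64-bit two's complement
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_tag(field_number, wire_type):
    """A tag is the field number shifted left by 3, OR-ed with the wire type."""
    return encode_varint((field_number << 3) | wire_type)


def encode_timestamp_data(timestamp, key, data):
    """Hypothetical helper: encode the three fields of TimestampData by hand."""
    key_bytes = key.encode("utf-8")
    return b"".join([
        encode_tag(1, WIRETYPE_VARINT), encode_varint(timestamp),
        encode_tag(2, WIRETYPE_LENGTH), encode_varint(len(key_bytes)), key_bytes,
        encode_tag(3, WIRETYPE_FIXED64), struct.pack("<d", data),
    ])


payload = encode_timestamp_data(300, "SYM", 0.5)
print(payload.hex())  # 08ac02120353594d19000000000000e03f
```

The bytes 08, 12 and 19 in the output are the tags for field numbers 1, 2 and 3 combined with their wire types, which is why the numbers and types have to agree between the publisher and the subscriber.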

After updating our previous ZeroMQ publish code, we can send a serialized class rather than a string:

using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

using ZeroMQ;
using ProtoBuf;

namespace CSPublish
{
 class Program
 {
  static void Main(string[] args)
  {
   bool publish = true;
   Task.Run(async () =>
   {
    Random random = new Random();
    using (var context = ZmqContext.Create())
    {
     using (var socket = context.CreateSocket(SocketType.PUB))
     {
      socket.Bind("tcp://127.0.0.1:5000");
      using (var stream = new MemoryStream())
      {
       while (publish)
       {
        var data = new TimestampData
        {
         Timestamp = DateTime.Now.Ticks,
         Key = "SYM",
         Data = random.NextDouble()
        };
        Serializer.Serialize(stream, data);
        byte[] bytes = stream.ToArray();
        socket.Send(bytes);

        stream.SetLength(0);
        await Task.Delay(1);
       }
      }
     }
    }
   });
   Console.ReadLine();
   publish = false;
  }
 }
}

We use a random generator for the data, and serialization is done as usual in a memory stream.

Subscribing in Python

In Python we cannot simply decorate a structure or class: we need to define a .proto file. Following our C# class definition, let’s create one as follows in a TimestampData.proto file:

package pysub;

message TimestampData {
	required int64 Timestamp = 1;
	required string Key = 2;
	required double Data = 3;
}

The package name can be almost anything. Here we use pysub because the message will be used in the Python subscriber.

We can now compile this message using protoc, which you can download from https://code.google.com/p/protobuf/downloads/list (I use the win32 binary). Open a command prompt in your working directory and enter:

path\to\protoc.exe --python_out=. TimestampData.proto

This will produce a TimestampData_pb2.py file which contains the class definition and the methods needed to deserialize the message. If you open it, you will see which modules are required to use Protocol Buffers in Python. As described in part 1, simply install protobuf.
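
As a rough, hand-written illustration of what deserializing this message involves (it is not the code protoc generates), the Python 3 sketch below walks a sample payload tag by tag. The payload was built by hand for Timestamp = 300, Key = "SYM" and Data = 0.5; the function names are hypothetical, and unexpected fields are rejected rather than skipped to keep the example short.

```python
import struct


def read_varint(buf, pos):
    """Return (value, next_pos) for the base-128 varint starting at buf[pos]."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: this was the last byte
            return result, pos
        shift += 7


def decode_timestamp_data(buf):
    """Decode the three known fields of TimestampData into a dict."""
    fields = {}
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if field_number == 1 and wire_type == 0:      # int64 Timestamp
            value, pos = read_varint(buf, pos)
            if value >= 1 << 63:                      # restore the sign of a negative int64
                value -= 1 << 64
            fields["Timestamp"] = value
        elif field_number == 2 and wire_type == 2:    # string Key
            length, pos = read_varint(buf, pos)
            fields["Key"] = buf[pos:pos + length].decode("utf-8")
            pos += length
        elif field_number == 3 and wire_type == 1:    # double Data (8 bytes, little-endian)
            fields["Data"] = struct.unpack_from("<d", buf, pos)[0]
            pos += 8
        else:
            raise ValueError("unexpected field %d (wire type %d)" % (field_number, wire_type))
    return fields


sample = bytes.fromhex("08ac02120353594d19000000000000e03f")
print(decode_timestamp_data(sample))
# {'Timestamp': 300, 'Key': 'SYM', 'Data': 0.5}
```

In real code, the generated TimestampData_pb2 module and its ParseFromString method replace all of this.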

Note: if protoc reports TimestampData.proto:1:1: Expected top-level statement (e.g. “message”), your file is probably not saved as ANSI, which can happen if you created it with Visual Studio. You can change the encoding with Notepad++.

Now that our modules are installed and our message is properly compiled, we can update our Python Subscriber to read our serialized data:

import sys
import zmq
import datetime
import TimestampData_pb2

context = zmq.Context()
socket = context.socket(zmq.SUB)
socket.connect("tcp://127.0.0.1:5000")
socket.setsockopt(zmq.SUBSCRIBE, b"")  # empty prefix: receive every message

while True:
    rawdata = socket.recv()
    data = TimestampData_pb2.TimestampData()
    data.ParseFromString(rawdata)
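    # .NET ticks are 100 ns units counted from 0001-01-01; 10 ticks make one microsecond
    ts = datetime.datetime(1, 1, 1) + datetime.timedelta(microseconds = data.Timestamp // 10)
    print("%s %s %s" % (ts, data.Key, data.Data))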

You can now run the C# Publisher and the Python Subscriber, and see data flow…
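
The timestamp conversion used by the subscriber can be checked on its own, without any sockets. The sketch below assumes only what the code above already relies on: a .NET tick is 100 nanoseconds, counted from 0001-01-01 00:00:00. The function and constant names are made up for the example, and the tick value used for the check is the well-known tick count of the Unix epoch.

```python
import datetime

DOTNET_EPOCH = datetime.datetime(1, 1, 1)  # DateTime ticks count from 0001-01-01 00:00:00
TICKS_PER_MICROSECOND = 10                 # one tick is 100 nanoseconds


def ticks_to_datetime(ticks):
    """Convert .NET ticks to a naive datetime, dropping the sub-microsecond remainder."""
    return DOTNET_EPOCH + datetime.timedelta(microseconds=ticks // TICKS_PER_MICROSECOND)


def datetime_to_ticks(value):
    """Inverse conversion, using integer arithmetic to avoid float rounding."""
    delta = value - DOTNET_EPOCH
    microseconds = (delta.days * 86400 + delta.seconds) * 1000000 + delta.microseconds
    return microseconds * TICKS_PER_MICROSECOND


unix_epoch_ticks = 621355968000000000
print(ticks_to_datetime(unix_epoch_ticks))  # 1970-01-01 00:00:00
```

Because the publisher sends DateTime.Now, the resulting datetime is in the publisher's local time and carries no time zone information.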

All the source files in this post are available on GitHub.

Objects from C# to Python [2/3]

At this point, if you are not familiar with ZeroMQ (also written zmq or 0MQ), I suggest you read the zguide, which includes detailed explanations and samples in various languages covering usage and common patterns.

Let’s start by creating a new Visual Studio console project called CSharpStringPublisher, and use the code below to publish the time at regular intervals:

using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using System.Text;
using ZeroMQ;

namespace CSharpStringPublisher
{
    class Program
    {
        private static ZmqContext context_ = null;
        private static ZmqSocket socket_ = null;

        static void Main(string[] args)
        {
            context_ = ZmqContext.Create();
            socket_ = context_.CreateSocket(SocketType.PUB);
            socket_.Bind("tcp://127.0.0.1:5000");

            bool publish = true;
            Task.Run(() => {
                while (publish)
                {
                    string timestring = DateTime.Now.ToString("u");
                    Console.WriteLine("Sending '{0}' to subscribers", timestring);
                    socket_.Send(timestring, ASCIIEncoding.ASCII);
                    Thread.Sleep(1000);
                }
            });
            Console.ReadLine();
            publish = false;
        }
    }
}

We now have a simple publisher (SocketType.PUB) which will be able to publish byte arrays to any zeromq subscriber, regardless of the language you are subscribing with.
If you have ever used raw sockets in the past, one thing is noticeable as soon as you run this sample: you do not have to worry whether there are subscribers or not. At this point, regardless of the performance, this is in my opinion a huge gain already: focus on the data you want to send and let the framework handle the connections.

Let’s now write a very basic subscriber in Python. Open your favorite text editor to create PythonStringSubscriber.py and paste the following code:

import zmq

context = zmq.Context()
socket = context.socket(zmq.SUB)

# An empty prefix subscribes to every message
socket.setsockopt(zmq.SUBSCRIBE, b"")
socket.connect("tcp://127.0.0.1:5000")

while True:
    timestring = socket.recv_string()
    print(timestring)

If you now run both the CSharpStringPublisher and the PythonStringSubscriber, the subscriber will print every string received from the C# program. The interesting part is that it does not matter in which order you run the samples, whether you start the subscriber before the publisher, or whether you run multiple subscribers: data will just flow.
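
The empty string passed to zmq.SUBSCRIBE in the subscriber is what makes it receive everything: ZeroMQ filters messages on a SUB socket by prefix, and the empty prefix matches any message. The snippet below is only a pure-Python model of that matching rule, written to make the behaviour visible; the matches function is hypothetical and is not part of pyzmq.

```python
def matches(subscriptions, message):
    """Return True if the message starts with any subscribed prefix.

    This models the documented SUB filtering rule only; it is not pyzmq code.
    """
    return any(message.startswith(prefix) for prefix in subscriptions)


message = b"2014-01-01 12:00:00Z"

# The empty prefix used by the subscriber above matches every message.
print(matches([b""], message))       # True
# A non-empty prefix only lets through messages that begin with it.
print(matches([b"2014"], message))   # True
print(matches([b"2015"], message))   # False
# With no subscription at all, a SUB socket receives nothing.
print(matches([], message))          # False
```

A subscriber that never sets zmq.SUBSCRIBE falls into the last case, which is a common reason for a SUB socket that connects but stays silent.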

[Image: string_publish_subscribe]

The code used in this post is available on the git repository.

The next part of the tutorial shows how to use Google’s protocol buffers to stream objects from C# to Python.

Objects from C# to Python [1/3]

Setting up the development environment is fairly straightforward under Windows 8.
We will not cover the installation of each component as I did not find anything special worth mentioning.
So here is my shopping list:

IDEs

For Python, I am using the Python(x,y) distribution which includes Numpy, Scipy, Matplotlib, Qt amongst others, under the Spyder GUI which I find more lightweight than Eclipse.
For C#, Microsoft Visual Studio 2012 Express for Windows Desktop (or higher) is more than enough if you do not own a full license.

Libraries

Protocol Buffers

For Python, simply run

pip install protobuf

For C#, download protobuf-net.

ZeroMQ

Install the ZeroMQ binaries: this will provide libzmq.dll which will need to be in the path when running ZeroMQ from Python or C#.

Then for Python, run

pip install pyzmq

Finally, for C#, just download clrzmq.
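
Once everything is installed, a quick check from Python can confirm that both packages are visible to the interpreter. The sketch below uses only the standard library and assumes the import names zmq (for pyzmq) and google.protobuf (for protobuf); the helper names are made up for this example. It is written for Python 3 and only checks that the packages can be located, so it does not prove that libzmq.dll itself loads correctly.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate the named module."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. "google") is itself missing.
        return False


def check_environment(modules=("zmq", "google.protobuf")):
    """Map each module name to whether it was found."""
    return {name: is_installed(name) for name in modules}


for name, found in check_environment().items():
    print("%-16s %s" % (name, "ok" if found else "MISSING"))
```

If both lines report ok, running import zmq in an interpreter is the next step, since that is the point where a missing libzmq.dll would show up.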

Objects from C# to Python [0/3]

There are multiple ways to integrate C# and Python. The easiest of all is by far IronPython, but it has limitations: some libraries have not been ported yet, and both languages must run in the same process. To make things more portable and scalable, the most logical path is using ZeroMQ and Protocol Buffers.

Part 1 covers the installation of ZeroMQ and Protocol Buffers.
Part 2 shows how to write a string publisher in C# and a string subscriber in Python.
Part 3 puts everything together and demonstrates how to serialize structured data to exchange objects between the publisher and the subscriber.