
Protocol Buffers

Subtitle: The good, the bad, and the… no, wait; this is a Google project.

XML and Java have the same sort of flavor to them: they’re reasonably good and very widely used; they’re the sort of product that design committees everywhere aspire to create. Their flaws only really become visible after something better comes along. In Java’s case, Python demonstrated that a whole lot of the structure and required text that gives Java code its rigidity can be stripped away, leaving a language that’s a joy to develop in. However, there hasn’t been an analogous improvement on XML.

Until yesterday.

Protocol Buffers have a non-descriptive name; I had no idea what to expect when I clicked the link to the announcement that Google put out. As it turns out, they’re a generic data serialization format (much like XML), except without all the human-readability business that so bloats actual XML. From the announcement:

Protocol Buffers allow you to define simple data structures in a special definition language, then compile them to produce classes to represent those structures in the language of your choice. These classes come complete with heavily-optimized code to parse and serialize your message in an extremely compact format. Best of all, the classes are easy to use: each field has simple “get” and “set” methods, and once you’re ready, serializing the whole thing to – or parsing it from – a byte array or an I/O stream just takes a single method call.

In case you missed that, all you have to write is the schema. All the encoding and decoding crap that you have to wade through in XML has already been abstracted away; they generate classes to do that for you. This is, in fact, cooler than sliced bread.
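To get a feel for the kind of encoding work the generated classes take off your hands, and for why the result is so much smaller than XML, here is a hand-rolled sketch of the wire format for two fields. The message layout (a string name in field 1, an integer id in field 2), the helper names, and the XML equivalent are all assumptions made up for illustration; in real use you would never write this yourself, since the generated classes do it for you.

```python
WIRE_VARINT = 0            # wire type used for integer fields
WIRE_LENGTH_DELIMITED = 2  # wire type used for strings and bytes


def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low_bits = value & 0x7F
        value >>= 7
        if value:
            out.append(low_bits | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low_bits)
            return bytes(out)


def encode_key(field_number, wire_type):
    """Each field starts with a key combining its number and wire type."""
    return encode_varint((field_number << 3) | wire_type)


def encode_int_field(field_number, value):
    return encode_key(field_number, WIRE_VARINT) + encode_varint(value)


def encode_string_field(field_number, text):
    data = text.encode("utf-8")
    return (encode_key(field_number, WIRE_LENGTH_DELIMITED)
            + encode_varint(len(data)) + data)


# Hypothetical message: a string "name" in field 1, an integer "id" in field 2.
encoded = encode_string_field(1, "Ann") + encode_int_field(2, 150)
xml = "<person><name>Ann</name><id>150</id></person>"

print(encoded.hex())           # 0a03416e6e109601
print(len(encoded), len(xml))  # 8 45
```

Note that the field names never appear in the encoded bytes, only the field numbers. That is where most of the savings come from, and also why the bytes are meaningless without the schema.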

Of course, there do exist times when XML might better serve your needs:

However, protocol buffers are not always a better solution than XML – for instance, protocol buffers would not be a good way to model a text-based document with markup (e.g. HTML), since you cannot easily interleave structure with text. In addition, XML is human-readable and human-editable; protocol buffers, at least in their native format, are not. XML is also – to some extent – self-describing. A protocol buffer is only meaningful if you have the message definition (the .proto file).

In my experience, the human-readability and self-documentation inherent in XML have always been bonus features, not essential to the core mission of getting data from Point A to Point B. Meanwhile, I’ve spent countless hours wrangling DOM and SAX just to get data into and out of that intermediate form.

There is one wart that I noticed: you still have to build and read Messages separately from your own native class structure. The natural thing to do, if you want to use this to serialize and deserialize a class, would be to put all the members into the Message definition and put the methods into a subclass of the generated class. However, that is expressly forbidden. All is not lost, though: all you really need, at simplest, is a pair of methods like this:

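class AClass(object):
     # Names of members to leave out of the Message. The double underscore
     # means the mangled name is itself skipped by the '__' check below.
     __excludeFromSerialize = ()

     def toPBuff(self):
          out = AClassPBuff()
          for member in dir(self):
               # Test the attribute itself; callable(member) would only test the name string.
               if not (callable(getattr(self, member)) or '__' in member or member in self.__excludeFromSerialize):
                    setattr(out, member, getattr(self, member))
          return out

     @classmethod
     def fromPBuff(cls, pBuff):
          out = cls()
          for member in dir(out):
               if not (callable(getattr(out, member)) or '__' in member or member in cls.__excludeFromSerialize):
                    setattr(out, member, getattr(pBuff, member))
          return out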

In short, even if only in terms of making efficient use of developer time, this is already an awesome project. Once you factor in that it is also faster and slimmer than the alternatives, it becomes astonishingly cool. Expect it to make appearances in my code from now on.

Comment by Nat
2008-07-10 06:50:07

In Python, I wouldn’t even copy fields to/from your generated Message class. I’d just make your application class contain the Message object, with transparent delegation via the __getattr__() method.

This article talks about simple delegation. Its rationale is a bit dated, but the approach is still very current.

(The Python Cookbook is a great resource. I even read the printed copy: it was my intro book to Python.)

Comment by coriolinus
2008-07-10 20:59:03

I like that idea. You’d probably still want a classmethod in place to construct an instance when given a Message, but it does make things simpler otherwise.
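
A minimal sketch of the delegation idea from this thread, combined with the classmethod constructor, might look like the following. FakePersonMessage is a hypothetical stand-in for a generated Message class so that the example runs on its own; Person, from_message, and greeting are likewise made-up names. Forwarding writes through __setattr__ goes a step beyond the __getattr__ suggestion, and is included on the assumption that you want assignments to land on the Message too.

```python
class FakePersonMessage(object):
    """Hypothetical stand-in for a protoc-generated Message class."""

    def __init__(self, name="", id=0):
        self.name = name
        self.id = id


class Person(object):
    """Application class that wraps a Message instead of copying its fields."""

    def __init__(self, message=None):
        if message is None:
            message = FakePersonMessage()
        # Bypass our own __setattr__ so the wrapped object lives on the wrapper.
        object.__setattr__(self, "_message", message)

    @classmethod
    def from_message(cls, message):
        """Build an instance around an existing (e.g. freshly parsed) Message."""
        return cls(message)

    def __getattr__(self, name):
        # Only called when normal lookup fails, so methods defined here win.
        if name == "_message":
            raise AttributeError(name)  # avoid infinite recursion if unset
        return getattr(self._message, name)

    def __setattr__(self, name, value):
        # Forward every write to the wrapped Message.
        setattr(self._message, name, value)

    def greeting(self):
        return "Hello, %s (#%d)" % (self.name, self.id)


msg = FakePersonMessage(name="Ann", id=150)
person = Person.from_message(msg)
print(person.greeting())  # Hello, Ann (#150)
person.name = "Bob"       # the write lands on the wrapped message
print(msg.name)           # Bob
```

Since the wrapper holds the Message itself, serializing is just a matter of handing the wrapped object to whatever serialization call the generated class provides; no field copying is needed in either direction.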

Comment by Chris
2008-07-14 18:03:11

You should check out Thrift… it’s an open-source Facebook project, and pretty much does the same thing. It’s been around for over a year and has more community development behind it.

