[serialization] Biggest differences of Thrift vs Protocol Buffers?

What are the biggest pros and cons of Apache Thrift vs Google's Protocol Buffers?

This question is related to serialization protocol-buffers thrift

The answer is


There are some excellent points here and I'm going to add another one in case someone's path crosses here.

Thrift gives you the option to choose between the thrift-binary and thrift-compact (de)serializers: thrift-binary has excellent performance but a bigger packet size, while thrift-compact gives you good compression but needs more processing power. This is handy because you can always switch between these two modes as easily as changing a line of code (heck, even make it configurable). So if you are not sure how much your application should be optimized for packet size versus processing power, Thrift can be an interesting choice.
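
For illustration, here is a minimal Java sketch of that switch. It assumes a hypothetical Thrift-generated struct named MyStruct; only the protocol factory changes between the two wire formats:

    // A minimal sketch, assuming a hypothetical Thrift-generated struct "MyStruct".
    // Switching between the binary and compact wire formats is just a matter of
    // picking a different protocol factory.
    import org.apache.thrift.TDeserializer;
    import org.apache.thrift.TSerializer;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.protocol.TCompactProtocol;

    public class ProtocolSwitchDemo {
        public static void main(String[] args) throws Exception {
            MyStruct value = new MyStruct();   // hypothetical generated struct

            // Faster to encode/decode, but larger on the wire:
            TSerializer binary = new TSerializer(new TBinaryProtocol.Factory());
            // Smaller on the wire, but needs a bit more CPU:
            TSerializer compact = new TSerializer(new TCompactProtocol.Factory());

            byte[] binaryBytes = binary.serialize(value);
            byte[] compactBytes = compact.serialize(value);
            System.out.printf("binary=%d bytes, compact=%d bytes%n",
                    binaryBytes.length, compactBytes.length);

            // Reading back only requires the matching protocol factory.
            TDeserializer reader = new TDeserializer(new TCompactProtocol.Factory());
            MyStruct roundTrip = new MyStruct();
            reader.deserialize(roundTrip, compactBytes);
        }
    }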

PS: See this excellent benchmark project by thekvs which compares many serializers including thrift-binary, thrift-compact, and protobuf: https://github.com/thekvs/cpp-serializers

PS: There is another serializer named YAS which gives this option too, but it is schema-less; see the link above.


It's also important to note that not all supported languages compare consistently between Thrift and Protobuf. At this point it's a matter of the module's implementation in addition to the underlying serialization. Take care to check benchmarks for whatever language you plan to use.


One obvious thing not yet mentioned, which can be both a pro and a con (and is the same for both), is that they are binary protocols. This allows for a more compact representation and possibly better performance (pros), but with reduced readability (or rather, debuggability), a con.

Also, both have a bit less tool support than standard formats like XML (and maybe even JSON).

(EDIT) Here's an interesting comparison that tackles both size and performance differences, and includes numbers for some other formats (XML, JSON) as well.


Protocol Buffers is FASTER.
There is a nice benchmark here:
http://code.google.com/p/thrift-protobuf-compare/wiki/Benchmarking

You might also want to look into Avro, as Avro is even faster.
Microsoft has a package here:
http://www.nuget.org/packages/Microsoft.Hadoop.Avro

By the way, the fastest I've ever seen is Cap'n Proto;
a C# implementation can be found in Marc Gravell's GitHub repository.


I think the basic data structures are different:

  1. Protocol Buffers uses variable-length integers (varints): a variable-length encoding that turns a fixed-length number into a variable-length one to save space (see the sketch after this list).
  2. Thrift offers different serialization formats (called "protocols"). In fact, Thrift has two different JSON encodings and no fewer than three different binary encodings.
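
For point 1, here is an illustrative Java sketch of varint packing. It only shows the idea of the encoding; it is not the protobuf library API:

    // An illustrative sketch of protobuf-style varint encoding: each byte carries
    // 7 bits of payload, and the high bit says whether more bytes follow.
    // This shows the idea only; it is not the protobuf library API.
    import java.util.Arrays;

    public class VarintSketch {
        static byte[] encodeVarint(long value) {
            byte[] buf = new byte[10];                      // a 64-bit value needs at most 10 bytes
            int i = 0;
            while ((value & ~0x7FL) != 0) {
                buf[i++] = (byte) ((value & 0x7F) | 0x80);  // lower 7 bits, continuation bit set
                value >>>= 7;
            }
            buf[i++] = (byte) value;                        // final byte, continuation bit clear
            return Arrays.copyOf(buf, i);
        }

        public static void main(String[] args) {
            System.out.println(encodeVarint(1).length);           // 1 byte instead of 8
            System.out.println(encodeVarint(300).length);         // 2 bytes
            System.out.println(encodeVarint(1_000_000).length);   // 3 bytes
        }
    }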

In conclusion, these two libraries are quite different. Thrift is like a one-stop shop, giving you an entire integrated RPC framework and many options (with cross-language support), while Protocol Buffers is more inclined to "just do one thing and do it well".


I was able to get better performance with a text-based protocol compared to protobuf in Python. However, it has no type checking or other fancy UTF-8 conversion, etc., which protobuf offers.

So, if serialization/deserialization is all you need, then you can probably use something else.

http://dhruvbird.blogspot.com/2010/05/protocol-buffers-vs-http.html


They both offer many of the same features; however, there are some differences:

  • Thrift supports 'exceptions'
  • Protocol Buffers have much better documentation/examples
  • Thrift has a built-in Set type
  • Protocol Buffers allow "extensions" - you can extend an external proto to add extra fields, while still allowing external code to operate on the values. There is no way to do this in Thrift.
  • I find Protocol Buffers much easier to read

Basically, they are fairly equivalent (with Protocol Buffers slightly more efficient from what I have read).


I was able to get better performance with a text based protocol as compared to protobuff on python. However, no type checking or other fancy utf8 conversion, etc... which protobuff offers.

So, if serialization/deserialization is all you need, then you can probably use something else.

http://dhruvbird.blogspot.com/2010/05/protocol-buffers-vs-http.html


I think the basic data structure is different

  1. Protocol Buffer use variable-length integee which refers to variable-length digital encoding, turning a fixed-length number into a variable-length number to save space.
  2. Thrift proposed different types of serialization formats (called "protocols"). In fact, Thrift has two different JSON encodings, and no less than three different binary encoding methods.

In conclusion,these two libraries are completely different. Thrift likes a one-stop shop, giving you the entire integrated RPC framework and many options (supporting cross-language), while Protocol Buffers is more inclined to "just do one thing and do it well".


I think most of these points have missed the basic fact that Thrift is an RPC framework which happens to have the ability to serialize data using a variety of methods (binary, JSON, etc).

Protocol Buffers is designed purely for serialization; it's not a framework like Thrift.
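
As a rough Java sketch of that serialization-only role (assuming a hypothetical protobuf-generated message class named Person), the generated code gives you builders and byte-array round-trips, and nothing about transport:

    // A minimal sketch, assuming a hypothetical protobuf-generated message "Person"
    // with string fields "name" and "email". Protobuf only covers building,
    // serializing, and parsing messages; any RPC layer is separate.
    import com.google.protobuf.InvalidProtocolBufferException;

    public class ProtobufSerializeDemo {
        public static void main(String[] args) throws InvalidProtocolBufferException {
            Person person = Person.newBuilder()       // generated builder API
                    .setName("Ada")
                    .setEmail("ada@example.com")
                    .build();

            byte[] bytes = person.toByteArray();      // serialize to the wire format
            Person parsed = Person.parseFrom(bytes);  // parse it back
            System.out.println(parsed.getName());
        }
    }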


As I've said in the "Thrift vs Protocol buffers" topic:

Referring to the Thrift vs Protobuf vs JSON comparison:

Additionally, there are plenty of interesting additional tools available for those solutions, which might help you decide. Here are examples for Protobuf: protobuf-wireshark, protobufeditor.


  • Protobuf-serialized objects are about 30% smaller than Thrift's.
  • Most actions you may want to do with protobuf objects (create, serialize, deserialize) are much slower than Thrift unless you turn on option optimize_for = SPEED.
  • Thrift has richer data structures (Map, Set).
  • The Protobuf API looks cleaner, though the generated classes are all packed as inner classes, which is not so nice.
  • Thrift enums are not real Java enums, i.e. they are just ints. Protobuf has real Java enums.

For a closer look at the differences, check out the source code diffs at this open source project.


For one, protobuf isn't a full RPC implementation. It requires something like gRPC to go with it.

gRPC is very slow compared to Thrift:

http://szelei.me/rpc-benchmark-part1/
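
For context, a typical pairing on the Java side looks something like this sketch (assuming the gRPC-generated stub GreeterGrpc and the HelloRequest/HelloReply messages from the gRPC hello-world sample); protobuf defines the messages, gRPC provides the channel and stubs:

    // A minimal sketch, assuming a gRPC-generated stub "GreeterGrpc" and a
    // protobuf-generated request/response pair "HelloRequest"/"HelloReply"
    // (as in the gRPC hello-world sample). Protobuf alone does not provide this.
    import io.grpc.ManagedChannel;
    import io.grpc.ManagedChannelBuilder;

    public class GrpcClientSketch {
        public static void main(String[] args) {
            ManagedChannel channel = ManagedChannelBuilder
                    .forAddress("localhost", 50051)
                    .usePlaintext()                   // no TLS, for local testing only
                    .build();

            GreeterGrpc.GreeterBlockingStub stub = GreeterGrpc.newBlockingStub(channel);
            HelloReply reply = stub.sayHello(
                    HelloRequest.newBuilder().setName("world").build());
            System.out.println(reply.getMessage());

            channel.shutdown();
        }
    }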


And according to the wiki, the Thrift runtime doesn't run on Windows.


Protocol Buffers seems to have a more compact representation, but that's only an impression I get from reading the Thrift whitepaper. In their own words:

We decided against some extreme storage optimizations (i.e. packing small integers into ASCII or using a 7-bit continuation format) for the sake of simplicity and clarity in the code. These alterations can easily be made if and when we encounter a performance-critical use case that demands them.

Also, it may just be my impression, but Protocol Buffers seems to have some thicker abstractions around struct versioning. Thrift does have some versioning support, but it takes a bit of effort to make it happen.


Another important difference is the languages supported by default.

  • Protocol Buffers: Java, Android Java, C++, Python, Ruby, C#, Go, Objective-C, Node.js
  • Thrift: Java, C++, Python, Ruby, C#, Go, Objective-C, JavaScript, Node.js, Erlang, PHP, Perl, Haskell, Smalltalk, OCaml, Delphi, D, Haxe

Both could be extended to other platforms, but these are the language bindings available out of the box.


RPC is another key difference. Thrift generates code to implement RPC clients and servers, whereas Protocol Buffers seems mostly designed as a data-interchange format alone.
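
As a rough Java sketch (assuming a hypothetical Thrift-generated service named Calculator with an add method, as in the Thrift tutorial), the generated client makes the remote call look like a local method call:

    // A minimal sketch, assuming a Thrift-generated service "Calculator" with an
    // "add" method (as in the Thrift tutorial). The transport, protocol, and the
    // generated Client/Processor classes are all part of the Thrift framework.
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;

    public class ThriftRpcClientSketch {
        public static void main(String[] args) throws Exception {
            TTransport transport = new TSocket("localhost", 9090);
            transport.open();

            Calculator.Client client =
                    new Calculator.Client(new TBinaryProtocol(transport));
            int sum = client.add(1, 2);   // runs on the server, looks like a local call
            System.out.println(sum);

            transport.close();
        }
    }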