Increase ZeroMQ performance by up to 2400%

This is a simple proof of concept (PoC) that compares ZeroMQ performance with default settings and shows how software acceleration of communications can improve performance by up to 2400%, transparently and without any ad-hoc optimization.


ØMQ (also spelled ZeroMQ, 0MQ or ZMQ) is a high-performance asynchronous messaging library aimed at use in scalable distributed or concurrent applications. It provides a message queue, but unlike message-oriented middleware, a ØMQ system can run without a dedicated message broker. The library is designed to have a familiar socket-style API.


The ØMQ API provides sockets (a kind of generalization over the traditional IP and Unix domain sockets), each of which can represent a many-to-many connection between endpoints. Operating with a message-wise granularity, they require that a messaging pattern be used, and are particularly optimized for that kind of pattern. The basic ØMQ patterns are:


  • Request-reply: Connects a set of clients to a set of services.
  • Publish-subscribe: Connects a set of publishers to a set of subscribers.
  • Push-pull (pipeline): Connects nodes in a fan-out / fan-in pattern that can have multiple steps, and loops.
  • Exclusive pair: Connects two sockets in an exclusive pair.
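To make the difference between the distribution patterns concrete, here is a minimal, idealized model in plain Python (no libzmq involved) of how push-pull and publish-subscribe hand out messages. The function names `push_distribute` and `pub_distribute` are hypothetical. Real sockets also apply high-water marks and per-connection queues, which this sketch ignores.

```python
from itertools import cycle
from typing import Dict, List, Sequence

# Idealized, in-memory model of two ØMQ distribution strategies.
# It does not use libzmq; it only shows which peer receives which message.


def push_distribute(messages: Sequence[str], workers: Sequence[str]) -> Dict[str, List[str]]:
    """Push-pull: each message goes to exactly one worker, in round-robin order."""
    inboxes: Dict[str, List[str]] = {w: [] for w in workers}
    targets = cycle(workers)
    for msg in messages:
        inboxes[next(targets)].append(msg)
    return inboxes


def pub_distribute(messages: Sequence[str], subscribers: Sequence[str]) -> Dict[str, List[str]]:
    """Publish-subscribe (no filter): every subscriber gets a copy of every message."""
    return {s: list(messages) for s in subscribers}


if __name__ == "__main__":
    jobs = ["job-1", "job-2", "job-3", "job-4"]
    print(push_distribute(jobs, ["w1", "w2"]))  # work is split between the workers
    print(pub_distribute(jobs, ["s1", "s2"]))   # data is copied to every subscriber
```

Push-pull divides the work, so it suits pipelines. Publish-subscribe copies the data, so it suits distribution trees.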


Each pattern defines a particular network topology. Request-reply defines a so-called “service bus”, publish-subscribe a “data distribution tree”, and push-pull a “parallelised pipeline”. All the patterns are deliberately designed to be infinitely scalable and thus usable at Internet scale.


Any message sent through a socket is treated as an opaque blob of data. Delivery to a subscriber can be filtered automatically by the leading bytes (prefix) of the blob. Available message transports include TCP, PGM (reliable multicast), inter-process communication (IPC) and inter-thread communication (ITC).
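Subscription filtering can be illustrated with a small local sketch. It assumes ZeroMQ's documented behaviour for subscriber sockets: a message is delivered if it starts with any subscribed prefix, and an empty prefix matches everything. The helpers `matches` and `deliver` are hypothetical and run in plain Python, not inside libzmq.

```python
from typing import List, Sequence

# Hypothetical local model of subscriber-side prefix filtering (not libzmq code).


def matches(message: bytes, subscriptions: Sequence[bytes]) -> bool:
    """True if any subscribed prefix is a prefix of the message.

    An empty subscription (b"") is a prefix of every message, so it matches all;
    with no subscriptions at all, nothing matches.
    """
    return any(message.startswith(prefix) for prefix in subscriptions)


def deliver(messages: Sequence[bytes], subscriptions: Sequence[bytes]) -> List[bytes]:
    """Keep only the messages this subscriber would receive, in order."""
    return [m for m in messages if matches(m, subscriptions)]


if __name__ == "__main__":
    feed = [b"weather.paris 21", b"sports.final 2-1", b"weather.oslo 3"]
    print(deliver(feed, [b"weather."]))  # only the two weather updates
```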


The ØMQ core library performs very well due to its internal threading model, and can outperform conventional TCP applications in terms of throughput by utilizing an automatic message batching technique.
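The idea behind automatic message batching can be sketched as follows: instead of issuing one network write per small message, messages are accumulated and flushed as one larger write. `BatchingWriter` and the 4096-byte threshold in the example are hypothetical and only model the principle; they do not reproduce libzmq's internal batching logic.

```python
from typing import List


class BatchingWriter:
    """Hypothetical writer that coalesces small messages into larger writes.

    Each entry in `writes` stands in for one send() system call on the socket.
    """

    def __init__(self, batch_bytes: int) -> None:
        self.batch_bytes = batch_bytes
        self.pending = bytearray()
        self.writes: List[bytes] = []

    def send(self, payload: bytes) -> None:
        # Queue the message; only hit the "network" once enough bytes are pending.
        self.pending += payload
        if len(self.pending) >= self.batch_bytes:
            self.flush()

    def flush(self) -> None:
        # Emit everything queued so far as a single write.
        if self.pending:
            self.writes.append(bytes(self.pending))
            self.pending.clear()


if __name__ == "__main__":
    writer = BatchingWriter(batch_bytes=4096)
    for _ in range(1000):
        writer.send(b"x" * 10)  # 1000 small 10-byte messages
    writer.flush()              # push out whatever is still pending
    print(len(writer.writes), "writes instead of 1000")
```

Fewer, larger writes amortize per-call overhead, which is why batching helps throughput. The trade-off is that a message may wait in the buffer, so a real implementation must also flush when no more messages are immediately available.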


ØMQ implements ZMTP, the ZeroMQ Message Transfer Protocol. ZMTP defines rules for backward interoperability, extensible security mechanisms, command and message framing, connection metadata, and other transport-level functionality. A growing number of projects implement ZMTP directly as an alternative to using the full ØMQ implementations.
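As an illustration of ZMTP's message framing, the sketch below encodes and decodes frames following the layout described in the ZMTP 3.x specification. Each frame has a flags octet (MORE, LONG and COMMAND bits), then a 1-octet size for bodies up to 255 bytes or an 8-octet big-endian size otherwise, and then the body. It covers framing only. The greeting, handshake and security mechanisms are out of scope, and the function names are hypothetical.

```python
import struct
from typing import Tuple

# Flag bits of a ZMTP 3.x frame header.
MORE, LONG, COMMAND = 0x01, 0x02, 0x04


def encode_frame(body: bytes, more: bool = False, command: bool = False) -> bytes:
    """Encode one frame: flags octet, size (1 or 8 octets), then the body."""
    if command and more:
        raise ValueError("command frames never carry the MORE flag")
    flags = (MORE if more else 0) | (COMMAND if command else 0)
    if len(body) <= 255:
        # Short frame: the size fits in a single octet.
        return bytes([flags, len(body)]) + body
    # Long frame: LONG flag set, size as an 8-octet unsigned big-endian integer.
    return bytes([flags | LONG]) + struct.pack(">Q", len(body)) + body


def decode_frame(data: bytes) -> Tuple[bytes, bool, bool, bytes]:
    """Decode the first frame in data; return (body, more, command, remaining bytes)."""
    flags = data[0]
    if flags & LONG:
        (size,) = struct.unpack(">Q", data[1:9])
        start = 9
    else:
        size = data[1]
        start = 2
    end = start + size
    if len(data) < end:
        raise ValueError("incomplete frame")
    return data[start:end], bool(flags & MORE), bool(flags & COMMAND), data[end:]


if __name__ == "__main__":
    # A two-part message: the first frame sets MORE, the last one does not.
    wire = encode_frame(b"topic", more=True) + encode_frame(b"payload")
    while wire:
        body, more, _, wire = decode_frame(wire)
        print(body, "more" if more else "last")
```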

Time to test ZeroMQ performance

Nowadays, when we deploy a cluster, we have access to a wide variety of hardware configurations with different features, which forces us to analyze the available options in detail to find the solution with the best performance/price ratio.


Since we are testing a solution in which the lowest possible latency is crucial, we will focus our analysis on the different communication solutions available on the market (1GbE, InfiniBand, Universal Fast Sockets, etc.).


To carry out the performance tests, we will use the latency and throughput benchmarks included with the ZeroMQ source (local_lat/remote_lat and local_thr/remote_thr).
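For reference, here is how the reported figures are derived, as we understand the libzmq perf tools. The latency test times a number of round trips and halves the per-round-trip time to get the one-way latency. The throughput test divides the message count by the elapsed time. The helper names are hypothetical, and the inputs in the example are illustrative, not measured values.

```python
# Hypothetical helpers mirroring how latency/throughput figures are derived.


def one_way_latency_us(elapsed_us: float, roundtrip_count: int) -> float:
    """Average one-way latency: each round trip consists of two one-way trips."""
    return elapsed_us / (roundtrip_count * 2)


def throughput_msgs_per_s(elapsed_us: float, message_count: int) -> float:
    """Messages received per second over the timed interval."""
    return message_count * 1_000_000 / elapsed_us


def throughput_mbps(msgs_per_s: float, message_size: int) -> float:
    """Payload bandwidth in megabits per second (message_size in bytes)."""
    return msgs_per_s * message_size * 8 / 1_000_000


if __name__ == "__main__":
    # Illustrative inputs only; 50000 matches the message/round-trip count used here.
    count = 50_000
    print(one_way_latency_us(elapsed_us=1_000_000, roundtrip_count=count))
    rate = throughput_msgs_per_s(elapsed_us=2_000_000, message_count=count)
    print(rate, throughput_mbps(rate, message_size=125))
```

Because latency is reported per one-way trip, halving the round-trip time assumes both directions take roughly the same time.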


Latency performance


Latency performance results (µs)

Throughput performance


Throughput performance results (messages/s)

Test setup used for both benchmarks:

  • Message count / round-trip count: 50000
  • CPU model: 2 x Intel Xeon E5-2660 Sandy Bridge-EP 2.20 GHz
  • RAM: 64 GB DDR3 1600 MHz
  • Network: Mellanox Technologies MT27500 Family [ConnectX-3] and Intel Corporation I350 Gigabit Network Connection (rev 01)


As can be observed, communications are a bottleneck in ZeroMQ’s performance. Therefore, to increase performance we can use software acceleration of communications such as Torus software, which integrates seamlessly, requires no ad-hoc optimization and increases performance by up to 2400%. This lets us leverage the infrastructure we already have more efficiently, improve response times and, ultimately, provide a higher quality of service.
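Because latency and throughput improve in opposite directions, it helps to be explicit about what an improvement "by up to 2400%" means. The sketch below assumes the usual relative-increase convention, under which 2400% corresponds to a factor of 25. For latency (lower is better), the factor is computed from the ratio baseline/improved. The post does not state which convention its figures use, the helper names are hypothetical, and the inputs are illustrative only.

```python
# Hypothetical helpers for reading "improved by N%" figures.


def percent_increase(baseline: float, improved: float) -> float:
    """Relative increase for higher-is-better metrics such as throughput."""
    return (improved - baseline) / baseline * 100


def latency_improvement_percent(baseline_us: float, improved_us: float) -> float:
    """Relative speed-up for lower-is-better metrics such as latency."""
    return (baseline_us / improved_us - 1) * 100


def percent_to_factor(percent: float) -> float:
    """An increase of N% multiplies the baseline by (1 + N/100)."""
    return 1 + percent / 100


if __name__ == "__main__":
    # Illustrative numbers, not results from the benchmarks in this post.
    print(percent_increase(1_000.0, 25_000.0))     # messages/s
    print(latency_improvement_percent(50.0, 2.0))  # microseconds
    print(percent_to_factor(2400.0))
```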

Jose Manuel Santorum