After sharing feedback on our MQTT messaging architecture in a previous article (Feedback on a critical messaging architecture with MQTT), it is now time to choose and validate an efficient broker. Of the candidates we considered, only two major open source projects were retained: Apache ActiveMQ and RabbitMQ. A list of brokers is available on the mqtt.org website: http://mqtt.org/wiki/doku.php/brokers
We need a robust solution, known for its quality, with an active community, that can be managed with clustering tools and, if possible, is accessible through a Java API. The aim is to handle between 5000 and 10000 devices at a global rate of 180 messages per second and to observe how the broker reacts to different kinds of usage.
We want to test these solutions against the following parameters:
- Number of clients
- QoS (0, 1 and 2)
- Disconnection between sends
- Session Management
- Payload size
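As an illustration only, the parameters above can be combined into a scenario matrix. The sketch below is not the tool used for this study: the class and field names are hypothetical, "session management" is assumed to mean the MQTT clean-session flag, and the payload sizes are invented (the client counts come from the 5000 to 10000 range mentioned earlier).

```java
import java.util.ArrayList;
import java.util.List;

final class ScenarioMatrix {

    /** MQTT quality-of-service levels 0, 1 and 2. */
    enum Qos { AT_MOST_ONCE, AT_LEAST_ONCE, EXACTLY_ONCE }

    /** One combination of test parameters; names are illustrative. */
    static final class Scenario {
        final int clients;
        final Qos qos;
        final boolean disconnectBetweenSends;
        final boolean cleanSession; // assumed meaning of "session management"
        final int payloadBytes;

        Scenario(int clients, Qos qos, boolean disconnectBetweenSends,
                 boolean cleanSession, int payloadBytes) {
            this.clients = clients;
            this.qos = qos;
            this.disconnectBetweenSends = disconnectBetweenSends;
            this.cleanSession = cleanSession;
            this.payloadBytes = payloadBytes;
        }

        @Override
        public String toString() {
            return clients + " clients, " + qos + ", disconnect=" + disconnectBetweenSends
                    + ", cleanSession=" + cleanSession + ", payload=" + payloadBytes + "B";
        }
    }

    /** Full cross product: client counts x QoS levels x disconnect x session x payload sizes. */
    static List<Scenario> build(int[] clientCounts, int[] payloadSizes) {
        List<Scenario> scenarios = new ArrayList<>();
        boolean[] flags = { false, true };
        for (int clients : clientCounts) {
            for (Qos qos : Qos.values()) {
                for (boolean disconnect : flags) {
                    for (boolean clean : flags) {
                        for (int payload : payloadSizes) {
                            scenarios.add(new Scenario(clients, qos, disconnect, clean, payload));
                        }
                    }
                }
            }
        }
        return scenarios;
    }

    public static void main(String[] args) {
        // Payload sizes (64 and 1024 bytes) are made-up values for the example.
        List<Scenario> all = build(new int[] { 5000, 10000 }, new int[] { 64, 1024 });
        System.out.println(all.size() + " scenarios, first: " + all.get(0));
    }
}
```

Even with two values per numeric parameter, the full matrix already contains 48 scenarios, which is one reason to pick a few representative ones rather than run them all.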
To build test scenarios, we quickly checked whether any existing tool was available (JMeter, Gatling …). In the end, the simplest option was to write a tool that launches several MQTT clients (based on the Fuse API). This way we control every client’s behavior directly at the protocol level. The hardest part was collecting the test data, then computing and displaying useful reports. Because of the number of devices to simulate, the test tools can’t run on a single computer. We needed a distributed test solution to drive and observe thousands of devices.
Heavy load testing solution
The solution is composed of 4 parts:
- Server: host for the broker
- Sender: host simulating the devices that send messages
- Receiver: host simulating the devices that receive messages
- Reporter: host gathering the test data
To build such a solution, we used Hazelcast (a great in-memory data grid technology). It is the backbone of our solution, for three reasons:
- no need to spend time learning a new complex technology
- easy integration in our Java application
- it doesn’t pollute the main MQTT communication
“As easy as using a simple queue”: that’s what we needed. Hazelcast delivered, and helped us handle the huge amount of data from our device simulation (I will write an article about how we used Hazelcast).
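A minimal sketch of that queue idea (not the tool’s actual code): simulation nodes push small events to a queue and the reporter drains it. It is written against java.util.concurrent.BlockingQueue with an in-JVM LinkedBlockingQueue so that it runs without a cluster. Hazelcast’s IQueue implements BlockingQueue, so a distributed queue obtained from a HazelcastInstance with getQueue(...) could presumably be passed to the same method. The Event class, its fields and the drain method are assumptions made for the example.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

final class ReportQueueSketch {

    /** Hypothetical event published by a sender or receiver node. */
    static final class Event {
        final String messageId;
        final boolean sent;      // true = sent by a device, false = received
        final long timestampMs;

        Event(String messageId, boolean sent, long timestampMs) {
            this.messageId = messageId;
            this.sent = sent;
            this.timestampMs = timestampMs;
        }
    }

    /**
     * Reporter side: consumes events until the queue stays empty for
     * idleTimeoutMs, then returns {sentCount, receivedCount}.
     */
    static int[] drain(BlockingQueue<Event> queue, long idleTimeoutMs) {
        int sent = 0;
        int received = 0;
        try {
            Event e;
            while ((e = queue.poll(idleTimeoutMs, TimeUnit.MILLISECONDS)) != null) {
                if (e.sent) {
                    sent++;
                } else {
                    received++;
                }
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt(); // stop draining, keep the interrupt flag
        }
        return new int[] { sent, received };
    }

    public static void main(String[] args) {
        // Local stand-in for the distributed queue.
        BlockingQueue<Event> queue = new LinkedBlockingQueue<>();
        queue.offer(new Event("m-1", true, 0L));
        queue.offer(new Event("m-1", false, 120L));
        queue.offer(new Event("m-2", true, 40L));

        int[] counts = drain(queue, 50);
        System.out.println("sent=" + counts[0] + " received=" + counts[1]);
    }
}
```

With a distributed queue, the event type would also have to be serializable so that it can travel between nodes; that detail is left out here.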
The following setup was used:
- Broker host (RabbitMQ and ActiveMQ server): CentOS Linux x64 2.6 / 4 CPU – 4 GB RAM / Java 7u51 x64 / Erlang compiled for RabbitMQ
- Simulation host (Sender/Receiver/Reporter Hazelcast nodes): Ubuntu Linux x64 3.11 / Core i7 (8 CPU) – 6 GB RAM / Java 7u51 x64
The section below presents each scenario:
- the left graph shows send and receive times (in milliseconds)
- the right graph shows the number of messages sent/received
For each scenario:
- 7200 clients each send a message every 40 seconds (7200 / 40 = 180 messages per second on average)
- 10 clients receive the messages asynchronously
- the reporting system checks each sent and received message (messages are acked or expired) and computes statistics in real time
- each test lasts 5 minutes (about 54,000 messages at that rate)
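The reporting step can be sketched as follows. This is an assumption about how such a tracker might look, not the code used in the study: each sent message is remembered by its id, a matching reception marks it as acked and yields a latency, and anything still pending after a timeout is counted as expired. The class name, the method names and the 60-second expiry are hypothetical.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

final class DeliveryTracker {
    private final long expiryMs;
    private final Map<String, Long> pending = new HashMap<>(); // message id -> send time
    private long acked;
    private long expired;
    private long totalLatencyMs;
    private long maxLatencyMs;

    DeliveryTracker(long expiryMs) {
        this.expiryMs = expiryMs;
    }

    /** Records that a simulated device sent the message at nowMs. */
    void onSent(String id, long nowMs) {
        pending.put(id, nowMs);
    }

    /** Matches a reception with its send; unknown or duplicate ids are ignored. */
    void onReceived(String id, long nowMs) {
        Long sentAt = pending.remove(id);
        if (sentAt == null) {
            return; // duplicate delivery (possible at QoS 1) or already expired
        }
        long latency = nowMs - sentAt;
        acked++;
        totalLatencyMs += latency;
        maxLatencyMs = Math.max(maxLatencyMs, latency);
    }

    /** Counts as expired every message still pending after expiryMs. */
    void expire(long nowMs) {
        Iterator<Map.Entry<String, Long>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            if (nowMs - it.next().getValue() >= expiryMs) {
                it.remove();
                expired++;
            }
        }
    }

    long acked() { return acked; }
    long expired() { return expired; }
    long maxLatencyMs() { return maxLatencyMs; }
    double averageLatencyMs() { return acked == 0 ? 0.0 : (double) totalLatencyMs / acked; }

    public static void main(String[] args) {
        DeliveryTracker tracker = new DeliveryTracker(60_000); // hypothetical 60 s expiry
        tracker.onSent("m-1", 0);
        tracker.onReceived("m-1", 400);
        tracker.onSent("m-2", 1_000);
        tracker.onReceived("m-2", 5_000);
        tracker.onSent("m-3", 2_000); // never received
        tracker.expire(70_000);
        System.out.println("acked=" + tracker.acked() + " expired=" + tracker.expired()
                + " avg=" + tracker.averageLatencyMs() + "ms max=" + tracker.maxLatencyMs() + "ms");
    }
}
```

The sketch is single-threaded on purpose: it assumes one reporter consuming events in order, so no synchronization is shown.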
The test results below are just an extract from our complete study.
#1 – Send and receive with MQTT QoS 1
At QoS 1, every message must be delivered at least once. On ActiveMQ, the first wave of messages was received within a maximum of 35 seconds, whereas RabbitMQ received it within a maximum of 4 seconds. The average send time is under 5 seconds for ActiveMQ and under 500 milliseconds for RabbitMQ.
The second type of graph shows how many messages are received over time. For ActiveMQ, the graph is more scattered, showing that the broker has some difficulty handling the overall message volume. RabbitMQ is not disturbed by this volume and absorbs the load more easily.
#2 – Send and disconnect, receive with MQTT QoS 0
At QoS 0, every message is simply sent, with no acknowledgement. After each send, the sender disconnects from the broker. On ActiveMQ, the first wave of messages was received within a maximum of 40 seconds, whereas RabbitMQ received it within a maximum of 16 seconds. The average send time is under 5 seconds for ActiveMQ and under 2 seconds for RabbitMQ.
Again, the second type of graph shows how many messages are received over time. Here the trends are closer for the two brokers, even though we can see a big spike for RabbitMQ.
Once again, RabbitMQ seems to perform better in this situation.
As I said before, these results are just an extract of our full study. These graphs show an important point: RabbitMQ handles heavy load better and responds well under this kind of situation. Both scenarios (QoS 1 and disconnection of senders) give comparable results.
In our global study, we observed the same tendency between the two solutions. But we also observed some regular, puzzling hangs on ActiveMQ.
Lastly, a few words about what we had to do before running these tests:
- configure the Linux kernel (Ubuntu server) to allow high I/O and a large number of open files
- activate the ActiveMQ NIO connector for MQTT, for better performance
- choose representative test scenarios
The most important part of this experiment was taking the time to adjust our focus and find the real value of our benchmark tests (QoS, network quality drop …). You can spend hours testing brokers without any real need and without getting any usable data from the tests.
Even if our results show a better tendency for RabbitMQ, the choice is not that easy. Performance is not the only criterion for choosing a solution; others (such as the technologies used, documentation, support …) must be taken into account.