[rsyslog] ElasticSearch Bulk Indexing (was: Load balancing for rsyslog aggregators)

Vlad Grigorescu vladg at illinois.edu
Wed Feb 8 16:03:23 CET 2012


Hello,

After the recent discussion of rsyslog sending logs to ElasticSearch using the bulk indexing API, I experimented with the current plugin. First, let me just say that I really appreciate the work that Nathan did on the omelasticsearch plugin, and that it will work fine in many use cases. However, there are a few fundamental limitations with the current omelasticsearch/rsyslog integration:

- omelasticsearch uses curl to make the API calls to ES. The downside of this is that you have to specify a hostname. ES supports auto-discovering a cluster, as well as fail-over. If the host omelasticsearch is using goes down, the cluster may still be fully functional, but omelasticsearch won't be able to find it. Of course, you could go in and add other cluster members as failover actions, but this would mean a config change every time you change your ES topology.

- curl by default hands back at most 16KB of the HTTP response at a time (CURL_MAX_WRITE_SIZE). This response contains the information about which messages were successfully inserted into ES and which failed. For a large batch of messages, one could easily get a response over the 16KB limit. Reading it in one piece would require running a custom-compiled version of curl that raises this limit.

- "Pushing" to ES seems to work much less reliably than having ES "pull" messages. For similarly small-sized batches (~250 messages), ES would often take 6-8ms for the bulk insert. However, it would occasionally spike up to 6000ms, which would cause quite a backlog in the queue. Having ES "pull" messages instead (more on this later) seemed to work much more consistently.

- Finally, I'm a bit confused about how rsyslog receives commit errors with the new transactional plugin system. If there's a batch of 5 messages, and only message 4 is successfully committed during endTransaction, how would one convey that information back to rsyslog? I know Radu mentioned calling a program with omprog and sending messages to ES from there, but in my setup data integrity is paramount, and I don't want to re-implement rsyslog's reliable message delivery and failover systems.
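
To make the partial-failure question concrete, here is a minimal Python sketch of building a bulk request body and working out which positions in a batch failed. It is not how omelasticsearch does it. The index name "logs" is hypothetical, and the response shape (an "items" list with one single-key entry per action, each carrying an "error" field and/or an HTTP-style "status") is an assumption that should be checked against the ES version in use.

```python
import json


def build_bulk_body(index, messages):
    """Build a newline-delimited bulk body: one action line plus one source line per message."""
    lines = []
    for msg in messages:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(msg))                           # document source line
    # The bulk body has to end with a newline.
    return "\n".join(lines) + "\n"


def failed_positions(bulk_response):
    """Return the batch positions whose item reports an error or a non-2xx status.

    Assumption: items come back in the same order as the actions were sent.
    """
    failed = []
    for pos, item in enumerate(bulk_response.get("items", [])):
        # Each item is a one-key dict keyed by the action name ("index", "create", ...).
        result = next(iter(item.values()))
        status = result.get("status", 200)  # treat a missing status as success
        if "error" in result or not 200 <= status < 300:
            failed.append(pos)
    return failed


if __name__ == "__main__":
    body = build_bulk_body("logs", [{"msg": "a"}, {"msg": "b"}])
    print(body, end="")
    # Hand-written response, for illustration only.
    response = {"items": [
        {"index": {"status": 201}},
        {"index": {"status": 400, "error": "mapping failure"}},
    ]}
    print(failed_positions(response))
```

A list of failed positions like this is the information that would have to be conveyed back to rsyslog for the batch, which is the part I don't see a path for.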

The method that I'm currently stress-testing uses an ElasticSearch River[1] of the RabbitMQ[2] type. With this setup, rsyslog sends messages to a RabbitMQ queue. ElasticSearch is configured with the queue's information, and then it periodically pulls messages from that queue. Once it has the messages, it proceeds to bulk index them. If the master ES node goes down, the new master starts pulling messages from the queue. Overall, it seems to work well, and the indexing throughput seems higher, because messages aren't pushed to ES while it's very busy.
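
The pull model can be sketched in a few lines of Python. This is only an illustration of the idea, not the river's actual code: an in-memory queue.Queue stands in for the RabbitMQ queue, and pull_batch, run_consumer, and index_batch are hypothetical names. The point is that the consumer decides when to take the next batch, so a slow bulk insert delays the next pull rather than having more batches pushed at it.

```python
import queue


def pull_batch(q, max_size):
    """Take up to max_size messages without blocking; return whatever is available."""
    batch = []
    while len(batch) < max_size:
        try:
            batch.append(q.get_nowait())
        except queue.Empty:
            break
    return batch


def run_consumer(q, index_batch, max_size=250):
    """Pull and index batches until the queue is empty; return the number of messages handled.

    index_batch is a caller-supplied function standing in for the bulk insert.
    The next pull only happens after index_batch returns.
    """
    total = 0
    while True:
        batch = pull_batch(q, max_size)
        if not batch:
            return total
        index_batch(batch)
        total += len(batch)


if __name__ == "__main__":
    q = queue.Queue()
    for i in range(600):
        q.put({"msg": "message %d" % i})
    sizes = []
    handled = run_consumer(q, lambda batch: sizes.append(len(batch)))
    print(handled, sizes)
```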

Unfortunately, I can't find any rsyslog plugin for RabbitMQ, so I'm currently bouncing my messages through a logstash[3] server. Does anyone know of such a plugin? I suspect the zeromq plugins might be a good starting point, but I'm not sure how much would have to be rewritten to send to RabbitMQ instead.

Those were my experiences - I hope some of that proves useful to others looking into ElasticSearch.

--
Vlad Grigorescu | IT Security Engineer
University of Illinois at Urbana-Champaign
Office of Privacy and Information Assurance

[1] - <http://www.elasticsearch.org/guide/reference/river/>
[2] - <https://github.com/elasticsearch/elasticsearch-river-rabbitmq>
[3] - <http://logstash.net/docs/1.1.0/outputs/elasticsearch_river>


