[rsyslog] rsyslog performance as receiver, heavily using regex in templates
bbradleyuk at gmail.com
Thu Jan 31 17:34:31 CET 2013
On Thu, 31 Jan 2013 13:44:03 +0000
Rainer Gerhards <rgerhards at hq.adiscon.com> wrote:
> On Thu, 2013-01-31 at 14:51 +0200, Radu Gheorghe wrote:
> > Hi Ben,
> > 2013/1/31 Ben Bradley <bbradleyuk at gmail.com>
> > > Hi everyone
> > >
> > > I'm currently using logstash as the log collector from a few rsyslog
> > > sender clients. I'd like to use rsyslog to receive the remote logs instead
> > > of logstash. This means I'm keeping things simple and can possibly also use
> > > RELP.
> > >
> > > If the rsyslog receiver is doing a lot of regex parsing on each message
> > > received (i.e. parsing Apache logs into ElasticSearch fields) at what sort
> > > of volume of log messages would I start to notice performance problems?
> > >
> > > Eventually I'm expecting about 5-10GB per day to be received by our
> > > centralised rsyslog log server.
> > >
> > I guess it all comes down to performance testing, but 10GB would probably
> > mean ~20M logs or something like that. If the majority of those will be
> > sent during the day (say 10 hours), my poor math says if you handle 500-600
> > logs/sec you should be fine.
> Seeing that number, I'd say it takes quite a few regexes to get
> rsyslog to sweat. HOWEVER... do we really need regexes? Can you post a
> couple of samples?
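The estimate quoted above can be checked with a few lines of Python. This is only a rough sketch: it takes Radu's figures as given (10GB per day, about 20M messages, a 10-hour busy window) and assumes decimal gigabytes and an even spread of traffic across the window. The function name is made up for illustration.

```python
SECONDS_PER_HOUR = 3600


def logs_per_second(total_logs, active_hours):
    """Average message rate if all logs arrive evenly within the active window."""
    return total_logs / (active_hours * SECONDS_PER_HOUR)


# Figures from the thread; decimal gigabytes are an assumption.
daily_bytes = 10 * 10**9
daily_logs = 20_000_000

avg_bytes_per_log = daily_bytes / daily_logs
rate = logs_per_second(daily_logs, 10)

print(f"average message size: {avg_bytes_per_log:.0f} bytes")
print(f"required rate: {rate:.0f} logs/sec")
```

That works out to about 500 bytes per message and roughly 556 messages per second, which sits inside the 500-600 logs/sec range mentioned. Real traffic is bursty, so peaks will be higher than this average.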
On a slightly related note: with regular expressions, is there a way to extract the sub-matches into separate positions within the template?
For example, here's my test template to use with omelasticsearch (broken onto new lines for readability)...
If you look at the http.usec and http.vhost fields in the template, is there a way I can have a single regex with submatch 2 going into http.usec and submatch 1 going into http.vhost, and submatches 3, 4, 5, 6, etc. going into their own fields in the JSON output of the template?
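To make the question concrete, here is a small Python sketch of the behaviour being asked for: one regex is run once per message, and each numbered submatch lands in its own JSON field. This is not rsyslog template syntax, just an illustration of the desired result. The log line layout and the field names other than http.vhost and http.usec are hypothetical.

```python
import json
import re

# Hypothetical access-log layout: vhost, request time in usec, method, path, status.
LINE_RE = re.compile(r"^(\S+) (\d+) (\S+) (\S+) (\d{3})$")

# Submatch 1 goes to http.vhost, submatch 2 to http.usec, and so on.
# Only http.vhost and http.usec come from the question; the rest are made up.
FIELDS = ("http.vhost", "http.usec", "http.method", "http.path", "http.status")


def parse_line(line):
    """Run the regex once and map each submatch to its field name."""
    match = LINE_RE.match(line)
    if match is None:
        return None  # line does not fit the assumed layout
    return dict(zip(FIELDS, match.groups()))


sample = "www.example.com 1234 GET /index.html 200"
print(json.dumps(parse_line(sample)))
```

The point of the sketch is that the expression is evaluated a single time per message, rather than once per output field.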