[rsyslog] rsyslog performance as receiver, heavily using regex in templates

Ben Bradley bbradleyuk at gmail.com
Thu Jan 31 17:34:31 CET 2013


On Thu, 31 Jan 2013 13:44:03 +0000
Rainer Gerhards <rgerhards at hq.adiscon.com> wrote:

> On Thu, 2013-01-31 at 14:51 +0200, Radu Gheorghe wrote:
> > Hi Ben,
> > 
> > 2013/1/31 Ben Bradley <bbradleyuk at gmail.com>
> > 
> > > Hi everyone
> > >
> > > I'm currently using logstash as the log collector from a few rsyslog
> > > sender clients. I'd like to use rsyslog to receive the remote logs instead
> > > of logstash. This means I'm keeping things simple and can possibly also use
> > > RELP.
> > >
> > > If the rsyslog receiver is doing a lot of regex parsing on each message
> > > received (i.e. parsing Apache logs into ElasticSearch fields) at what sort
> > > of volume of log messages would I start to notice performance problems?
> > >
> > > Eventually I'm expecting about 5-10GB per day to be received by our
> > > centralised rsyslog log server.
> > >
> > 
> > I guess it all comes down to performance testing, but 10GB would probably
> > mean ~20M logs or something like that. If the majority of those will be
> > sent during the day (say 10 hours), my poor math says if you handle 500-600
> > logs/sec you should be fine.
> 
> seeing that number, I'd say it requires quite some regexpes to get
> rsyslog to sweat. HOWEVER... do we really need regexpes? Can you post a
> couple of samples?
> 
> Rainer
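
For reference, Radu's estimate above can be checked with a few lines of Python. This is only a rough sketch: it assumes his figures (about 10 GB and about 20M messages per day, spread evenly over 10 hours), and the variable names are mine.

```python
# Figures quoted above: ~10 GB/day, ~20M messages, sent over ~10 hours.
bytes_per_day = 10 * 10**9
messages_per_day = 20_000_000
busy_seconds = 10 * 3600

# Average message rate during the busy window.
rate = messages_per_day / busy_seconds
# Average message size implied by the two daily totals.
avg_size = bytes_per_day / messages_per_day

print(f"{rate:.0f} msgs/sec, {avg_size:.0f} bytes/msg")
```

That lands inside the 500-600 logs/sec range he mentioned, with an implied average of roughly 500 bytes per message.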

On a slightly related note: with regular expressions, is there a way to extract the sub-matches into separate fields within the template?

For example, here's my test template for use with omelasticsearch (split across lines for readability):
$template ApacheAccessElasticSearch,"{
\"msg\":\"%msg:::json%\",
\"sysloghost\":\"%HOSTNAME:::json%\",
\"syslogip\":\"%fromhost-ip%\",
\"syslogfacility\":\"%syslogfacility-text%\",
\"syslogpri\":\"%pri%\",
\"syslogseverity\":\"%syslogseverity-text%\",
\"program\":\"%programname%\",
\"syslogtime\":\"%timereported:1:19:date-rfc3339%.%timereported:1:3:date-subseconds%\",
\"syslogtag\":\"%syslogtag:::json%\",
\"http.usec\":\"%msg:R,ERE,1,BLANK:([0-9]+)$--end%\",
\"http.vhost\":\"%msg:R,ERE,1,BLANK:([a-z0-9\-\.]+) [0-9]+$--end%\"
}"


If you look at the http.usec and http.vhost fields in the template, is there a way to use a single regex, with submatch 1 going into http.vhost and submatch 2 going into http.usec?
And likewise submatches 3, 4, 5, 6 and so on, each going into its own field in the template's JSON output?
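
To make the question concrete, here is a rough Python sketch of the behaviour I'm after. It is not rsyslog syntax, just an illustration: the sample log line, the `FIELDS` mapping and the `extract` helper are made up, and I'm assuming the vhost and the request time in usec are the last two space-separated fields, as the two regexes in the template imply.

```python
import json
import re

# Hypothetical access-log line (made up for illustration).
SAMPLE = ('192.0.2.10 - - [31/Jan/2013:17:34:31 +0100] '
          '"GET /index.html HTTP/1.1" 200 1234 www.example.com 45012')

# One expression, two capture groups: 1 = vhost, 2 = usec.
TAIL = re.compile(r"([a-z0-9\-.]+) ([0-9]+)$")

# Output field for each submatch number (field names from the template).
FIELDS = {1: "http.vhost", 2: "http.usec"}


def extract(msg):
    """Run the regex once and place each submatch in its own field."""
    m = TAIL.search(msg)
    if m is None:
        # Similar in spirit to BLANK in the template: empty values on no match.
        return {name: "" for name in FIELDS.values()}
    return {name: m.group(idx) for idx, name in FIELDS.items()}


doc = extract(SAMPLE)
print(json.dumps(doc))
```

The point is that the expression is evaluated once per message and each numbered submatch ends up in its own JSON field, rather than running one regex per field.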

Cheers, Ben


