[Lognorm] regex engine for lognorm

Fri Apr 25 22:50:00 CEST 2014

Hi,

I have been looking for a log normalization engine for logs captured
via syslog. The output is to be fed into apps/platforms that like
key/value pairs: Hadoop, Lucene-based search tools, and some kind of
stream processor (Storm/Spark).

An easy way to feed rules to this engine would be in this format:

[descriptor/label]
REGEX=someRegexThatExtractsMultipleGroups(1....n)
output=$1:time, $2:host, $3:tag, $4:group4, $5:group5 ........ $n:groupn
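
To make the idea concrete, here is a rough Python sketch of how rules in
this format might be parsed and applied. It is only an illustration of
the request, not an existing lognorm feature. The function names
(parse_rules, normalize), the rule label, the regex, the field names and
the sample log line are all made up for this example.

```python
import re

def parse_rules(text):
    """Parse '[label] / REGEX= / output=' blocks into a list of rule dicts."""
    rules = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            # A new rule starts at every [descriptor/label] line.
            current = {"label": line[1:-1], "regex": None, "fields": {}}
            rules.append(current)
            continue
        if current is None:
            raise ValueError("setting before any [label]: %r" % line)
        # Split on the first '=' only, so the regex itself may contain '='.
        key, _, value = line.partition("=")
        key = key.strip().lower()
        if key == "regex":
            current["regex"] = re.compile(value.strip())
        elif key == "output":
            # "$1:time, $2:host" -> {1: "time", 2: "host"}
            for item in value.split(","):
                group, _, name = item.strip().partition(":")
                current["fields"][int(group.lstrip("$"))] = name.strip()
    return rules

def normalize(rule, line):
    """Return a dict of named fields, or None if the rule does not match."""
    m = rule["regex"].match(line)
    if m is None:
        return None
    return {name: m.group(idx) for idx, name in rule["fields"].items()}

# Hypothetical rule and log line, for illustration only.
RULES = r"""
[sshd-failed-login]
REGEX=^(\S+ +\d+ [\d:]+) (\S+) (\w+)\[\d+\]: Failed password for (\S+) from (\S+)
output=$1:time, $2:host, $3:tag, $4:user, $5:src_ip
"""

SAMPLE = ("Apr 25 22:50:00 gw1 sshd[4242]: "
          "Failed password for root from 10.0.0.5 port 22 ssh2")

rules = parse_rules(RULES)
event = normalize(rules[0], SAMPLE)
```

With the sample above, event is a flat dictionary of key/value pairs
(time, host, tag, user, src_ip) that could be serialized to JSON or
whatever the downstream system expects.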

It should be possible to specify multiple regex rules in this format,
which would be evaluated one after the other until one matches.
Preferably, the engine would internally rank the regex list from most
used to least used, so that frequently matching rules are tried first.
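
The ranking idea could look roughly like the Python sketch below. It
assumes "first matching rule wins" and keeps a hit counter per rule,
moving a rule forward whenever it has matched more often than the rule
ahead of it. The class name RankedMatcher and its methods are
hypothetical, chosen only for this example.

```python
import re

class RankedMatcher:
    """Try rules in order; keep the list sorted by how often each rule hit."""

    def __init__(self, rules):
        # rules: iterable of (label, pattern) pairs.
        # Each entry is stored as [hit_count, label, compiled_regex].
        self._rules = [[0, label, re.compile(pat)] for label, pat in rules]

    def match(self, line):
        """Return (label, groups) for the first matching rule, else None."""
        for i, entry in enumerate(self._rules):
            m = entry[2].match(line)
            if m is None:
                continue
            entry[0] += 1
            # Move the rule forward while it has more hits than its
            # predecessor, so the most used rules are tried first.
            while i > 0 and self._rules[i - 1][0] < entry[0]:
                self._rules[i - 1], self._rules[i] = (
                    self._rules[i], self._rules[i - 1])
                i -= 1
            return entry[1], m.groups()
        return None

    def order(self):
        """Current evaluation order, as a list of rule labels."""
        return [label for _, label, _ in self._rules]

# Hypothetical rules, for illustration only.
matcher = RankedMatcher([
    ("rule-a", r"a(\d+)"),
    ("rule-b", r"b(\d+)"),
])
first = matcher.match("b1")       # rule-b now has 1 hit, rule-a has 0
order_after_first = matcher.order()
matcher.match("a1")               # tie at 1 hit each, order unchanged
matcher.match("a2")               # rule-a has 2 hits and moves to the front
order_after_third = matcher.order()
```

One caveat with this approach: if two regexes can match the same line,
reordering changes which one wins, so automatic ranking is only safe
when the rules are mutually exclusive.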

Is this something the liblognorm project can take on? Combined with the
capabilities of rsyslog, this would be an enormously scalable and
powerful tool for log analysis because it would allow people to
maintain a single data dictionary across multiple analysis engines such
as Hadoop, search tools and others.

Similar functionality exists in Flume/Logstash via grok, but Java
simply sucks when it comes to regex parsing.

Thanks,

- Xuri