[rsyslog] Unicode & rsyslog - was: RE: PostgreSQL: Problems with character encoding
david at lang.hm
Fri Jan 22 19:19:25 CET 2010
On Fri, 22 Jan 2010, Rainer Gerhards wrote:
> However, even then I need to have a build time switch to turn this on/off,
> because rsyslog in Unicode mode will take not only considerably more space
> (especially with larger in-memory queues), it will also considerably affect
> its performance (in terms of bytes, the memory transfer rate is effectively
> cut in half, as most data in syslog is character-based - also think about the
> effects on cache performance).
If the code uses UTF-8 throughout, this doesn't make sense. Assuming the
input is plain ASCII, UTF-8 strings and ASCII strings are the same size
(byte-for-byte identical, in fact). There are some additional CPU cycles
involved in figuring out the length in characters for any output routines
that grab substrings, but that should be all.
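A minimal Python sketch of the point above (illustration only: rsyslog itself is written in C, and the helper name utf8_char_count and the sample message are hypothetical). Pure-ASCII text produces identical bytes in ASCII and UTF-8; the only added cost is a scan over the bytes when a routine wants a length in characters rather than in bytes.

```python
def utf8_char_count(data: bytes) -> int:
    """Count code points in well-formed UTF-8 bytes.

    Continuation bytes have the bit pattern 10xxxxxx; every other byte
    starts a new character. This per-byte scan is the extra work needed
    when a routine wants a length in characters rather than in bytes.
    """
    return sum(1 for b in data if (b & 0xC0) != 0x80)


# Hypothetical sample message, ASCII only.
msg = "sshd[1234]: Accepted publickey for root"

# For pure-ASCII input the ASCII and UTF-8 encodings are byte-for-byte
# identical, so byte length and character length agree.
assert msg.encode("ascii") == msg.encode("utf-8")
print(len(msg.encode("utf-8")), utf8_char_count(msg.encode("utf-8")))

# With non-ASCII text, byte length and character length diverge.
word = "Grüße".encode("utf-8")
print(len(word), utf8_char_count(word))  # 7 5
```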
The only way things would take double the space (and therefore halve the
memory transfer rate) is if it converts everything to UTF-16 strings
internally. This is a bad idea to start with, as UTF-16 cannot represent
every character in a single 16-bit unit (characters outside the Basic
Multilingual Plane need a two-unit surrogate pair, which is why the
fixed-width UTF-32 exists as well), but also because UTF-16 is significantly
more expensive to store, copy, etc. than UTF-8 for the common case where
most of the characters are ASCII.
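To put rough numbers on the size difference, here is a small Python sketch using the standard codecs (the helper names encoded_sizes and utf16_code_units and the sample message are hypothetical). For ASCII-only text UTF-16 is exactly twice the size of UTF-8, and a character outside the Basic Multilingual Plane still takes two UTF-16 code units, so UTF-16 is variable-width too.

```python
def encoded_sizes(text: str) -> dict:
    """Return the storage size in bytes of `text` in several Unicode encodings."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # The '-le' codecs are used so that no byte-order mark is prepended.
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


def utf16_code_units(text: str) -> list:
    """Return the UTF-16 code units of `text` as integers."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


line = "kernel: eth0: link up"  # hypothetical ASCII-only syslog message
print(encoded_sizes(line))  # {'utf-8': 21, 'utf-16': 42, 'utf-32': 84}

# U+1F600 lies outside the Basic Multilingual Plane: one character,
# but two UTF-16 code units (a surrogate pair).
print([hex(u) for u in utf16_code_units("\U0001F600")])  # ['0xd83d', '0xde00']
```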
It may be that you have picked the wrong string library to use. In the
early days of Unicode, 'Unicode' and 16-bit characters (UCS-2, later UTF-16)
were basically synonymous, and a _lot_ of string libraries have been written
with this assumption (converting everything to UTF-16 on input and to
whatever encoding is needed on output). If you can find one that handles the
strings as UTF-8 internally, it should be able to all but eliminate the
overhead.
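As a sketch of what handling strings as UTF-8 internally can mean, the hypothetical helper below extracts a substring by character position directly from UTF-8 bytes, with no conversion to a wider encoding. It assumes well-formed input and is not the API of any particular string library.

```python
def utf8_substring(data: bytes, start: int, length: int) -> bytes:
    """Return `length` characters starting at character index `start`,
    sliced directly out of UTF-8 bytes without decoding them.

    Character boundaries are found by skipping continuation bytes
    (bit pattern 10xxxxxx). Assumes `data` is well-formed UTF-8 and that
    `start` and `length` are non-negative.
    """

    def byte_offset(char_index: int) -> int:
        count = 0
        for i, b in enumerate(data):
            if (b & 0xC0) != 0x80:  # lead byte: a new character starts here
                if count == char_index:
                    return i
                count += 1
        return len(data)  # indexes at or past the end clamp to the end

    return data[byte_offset(start):byte_offset(start + length)]


word = "Grüße".encode("utf-8")
print(utf8_substring(word, 2, 2).decode("utf-8"))  # üß
```

The scan is linear in the byte length, which is the CPU cost mentioned earlier for routines that grab substrings; in exchange the data is never copied into a wider encoding.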
David Lang