From singh.janmejay at gmail.com Tue Dec 2 11:26:03 2014
From: singh.janmejay at gmail.com (singh.janmejay)
Date: Tue, 2 Dec 2014 15:56:03 +0530
Subject: [Lognorm] Patches for 'recursive' field type
Message-ID:

Hi,

Here are the code changes for the 'recursive' field-type:
https://github.com/janmejay/liblognorm/compare/janmejay:master...recursive?expand=1

I wanted to put it out for review and discussion, hence this link.

I'll create a pull request once the already-open pull request (for regex
support) is wrapped up. GitHub tries to club old and new changes into one
pull request, which mixes up multiple changes, so I'm trying to avoid that.

--
Regards,
Janmejay
http://codehunk.wordpress.com

From singh.janmejay at gmail.com Tue Dec 2 11:28:34 2014
From: singh.janmejay at gmail.com (singh.janmejay)
Date: Tue, 2 Dec 2014 15:58:34 +0530
Subject: [Lognorm] Patches for 'recursive' field type
In-Reply-To:
References:
Message-ID:

We also have a json-comparer as part of this patch-set. It allows lognorm
tests to compare json-objects for equality, rather than having to work
with fragments of json in order to avoid the key-ordering problem.

--
Regards,
Janmejay
http://codehunk.wordpress.com

From singh.janmejay at gmail.com Tue Dec 2 12:03:38 2014
From: singh.janmejay at gmail.com (singh.janmejay)
Date: Tue, 2 Dec 2014 16:33:38 +0530
Subject: [Lognorm] Pull request for tokenized and regex field types
In-Reply-To:
References:
Message-ID:

Does it look ok?

On Tue, Nov 25, 2014 at 5:50 PM, singh.janmejay wrote:

> Hi,
>
> These patch-sets complete tokenized field-type and regex field-type
> support + the rscript side features required for effective use of
> json-arrays in rulesets.
>
> Some other changes include fixes for a memory leak and invalid memory
> access in liblognorm in non-happy-path flows + a testing setup for
> liblognorm (with optional and transparent valgrind support).
>
> Summary:
> - tokenized field-type (integration-tests with rsyslog, documentation etc)
> - regex support (tests, integration-tests, documentation)
> - memory access/leak bug fixes
> - testing env setup for liblognorm
> - rscript support for json-array subscripting and 'foreach' loop
> - rscript support for 'reset' statement (which, as opposed to 'set',
>   always overwrites the old value, regardless of the type)
> - dedicated page for rscript control-structures
> - detailed documentation around behaviour of rscript 'set', 'unset' and
>   'reset'
>
> The patch-sets go in the following order:
> https://github.com/rsyslog/liblognorm/pull/9
> https://github.com/rsyslog/rsyslog/pull/149
> https://github.com/rsyslog/rsyslog-doc/pull/98
>
> --
> Regards,
> Janmejay
> http://codehunk.wordpress.com

--
Regards,
Janmejay
http://codehunk.wordpress.com

From pavel at levshin.spb.ru Tue Dec 2 13:22:59 2014
From: pavel at levshin.spb.ru (Pavel Levshin)
Date: Tue, 02 Dec 2014 15:22:59 +0300
Subject: [Lognorm] Pull request for tokenized and regex field types
In-Reply-To:
References:
Message-ID: <547DAF23.4000406@levshin.spb.ru>

It looks... complex.

I think the liblognorm part will be merged this week. Sorry for the delay.

--
Pavel

From rgerhards at hq.adiscon.com Tue Dec 2 13:59:44 2014
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Tue, 2 Dec 2014 13:59:44 +0100
Subject: [Lognorm] Pull request for tokenized and regex field types
In-Reply-To: <547DAF23.4000406@levshin.spb.ru>
References: <547DAF23.4000406@levshin.spb.ru>
Message-ID:

2014-12-02 13:22 GMT+01:00 Pavel Levshin:

> It looks... complex.
>
> I think liblognorm part will be merged this week. Sorry for the delay.

Thanks for looking at this. Slightly off-topic: now that rsyslog 8.6.0 is
released, I will also try to find time to look at the other patches.
Looking at the volume and size, this may also take a while. But at least I
am trying to get at it. The goal is to have them in the January release,
or March at the latest.

Rainer

From singh.janmejay at gmail.com Sat Dec 6 22:03:40 2014
From: singh.janmejay at gmail.com (singh.janmejay)
Date: Sun, 7 Dec 2014 02:33:40 +0530
Subject: [Lognorm] Race condition in deletion of json-objects with 'event.tags'
Message-ID:

This is a bit of an elaborate story.

I have been fighting a segfault for the last few hours. It happens while
freeing a msg. Here is the backtrace (it was the same in many core dumps):

(gdb) bt
#0  0x0000000100000019 in ?? ()
#1  0x00007fa23ee9d536 in lh_table_free (t=0x7fa22a28bc70) at linkhash.c:116
#2  0x00007fa23ee9a6fd in json_object_object_delete (jso=0x7fa22967bf50) at json_object.c:273
#3  0x00007fa23ee9d536 in lh_table_free (t=0x7fa2009ac960) at linkhash.c:116
#4  0x00007fa23ee9a6fd in json_object_object_delete (jso=0x7fa22dcd8a70) at json_object.c:273
#5  0x000000000041bef7 in msgDestruct (ppThis=0x7fa2317f9b48) at msg.c:928
#6  0x0000000000441bd4 in DeleteProcessedBatch (pThis=0xa7eb90, pBatch=0xac35a8) at queue.c:1589
#7  0x0000000000441c9e in DequeueConsumableElements (pThis=0xa7eb90, pWti=0xac3570, piRemainingQueueSize=0x7fa2317f9bd8) at queue.c:1626
#8  0x0000000000441e6d in DequeueConsumable (pThis=0xa7eb90, pWti=0xac3570) at queue.c:1680
#9  0x000000000044226c in DequeueForConsumer (pThis=0xa7eb90, pWti=0xac3570) at queue.c:1821
#10 0x0000000000442502 in ConsumerReg (pThis=0xa7eb90, pWti=0xac3570) at queue.c:1875
#11 0x000000000043d95a in wtiWorker (pThis=0xac3570) at wti.c:334
#12 0x000000000043c51f in wtpWorker (arg=0xac3570) at wtp.c:389
#13 0x00007fa23f6b6b50 in start_thread (arg=) at pthread_create.c:304
#14 0x00007fa23e7de7bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#15 0x0000000000000000 in ?? ()

It turns out that the thing lh_table_free is trying to free (frame #1) is
an object with several keys, and whenever it failed, it failed on the
last one. How do I know that?

(gdb) up 1
#1  0x00007fa23ee9d536 in lh_table_free (t=0x7fa22a28bc70) at linkhash.c:116
116     linkhash.c: No such file or directory.
(gdb) p *c
$1 = {k = 0x7fa22dcd8a50, v = 0xa6bd20, next = 0x0, prev = 0x7fa22e93d490}

Here next = 0x0, so c is the last entry in the list.

Of the 2 rulebases in use (both handling events at roughly the same
frequency), every single segfault was for the rulebase that used tags on
its rules.

The rulebase in question had 2 rules, matching several fields, and both
rules had tags applied.

Note that ln_normalize applies event.tags only when tags are present, and
that inserting a new key into json-c's linked hash map appends it to the
end of the linked list (check lh_table_insert).

So this key-value pair was supposed to be 'event.tags' and the
corresponding tags. The data that c was pointing to was garbled, so I
couldn't tell what the key actually was, but I can conclude it was
'event.tags' based on the size of the objects. The garbling may be because
the memory allocated for the key was reused (note that the key is freed
before the value, check json_object_lh_entry_free). Considering that it is
allocated in a different thread (when the message is parsed by
mmnormalize), this is possible (as opposed to an object allocated in the
same thread, where the same virtual-address range is used from one thread
only).

Now, what makes it a race condition? Note that the event.tags object in
ln_normalize is not duplicated; instead, the same object is handed out
with an incremented reference count. Note also that the freeing routine
checks the counter without locking or atomic operations (check
json_object_put).

The only lock msgDestruct holds is msg->mut, which belongs to the
individual message, so different messages being processed at the same
time can each take their own lock and concurrently try to free the shared
'event.tags' object.

With the right timing, several of them may decrement the counter
concurrently and one of them may free the object. A segfault can then
happen either because someone accesses it or because someone tries to
free it again.

For me, the segfault reproduced every few hours.

The solution is simple: we need to duplicate the event.tags object
instead of passing it on with an incremented reference count.

I'll fix it, but before jumping to the fix I need another pair (or a few
pairs) of eyes to check the reasoning, point out any mistakes in the
analysis, and make sure I haven't overlooked anything or misdiagnosed the
problem.

Can someone please help validate/correct?

--
Regards,
Janmejay
http://codehunk.wordpress.com

From pavel at levshin.spb.ru Mon Dec 8 23:50:14 2014
From: pavel at levshin.spb.ru (Pavel Levshin)
Date: Tue, 09 Dec 2014 01:50:14 +0300
Subject: [Lognorm] Race condition in deletion of json-objects with 'event.tags'
In-Reply-To:
References:
Message-ID: <54862B26.50206@levshin.spb.ru>

Your analysis is most probably correct. As far as I remember, this object
reuse was confusing to me. But there were no segfaults at the time.

Object reuse is intended to reduce memory allocations/frees, which have a
big impact on performance. On the other hand, locks are expensive too,
and even unsuitable for this case.

Can you confirm that just duplicating event.tags solves your segfaults?

--
Pavel

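The trade-off discussed in this thread (sharing one object through a
reference count versus giving each message its own copy) can be sketched
in a few lines of C. This is only an illustration under stated
assumptions: struct tags and the tags_* functions are hypothetical
stand-ins, not json-c or liblognorm code. The only property taken from
the thread is that the counter is a plain integer changed without locks
or atomic operations.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a json-c object: the reference count is a
 * plain int that is changed without a lock and without atomics. */
struct tags {
    int   refcount;
    char *value;
};

/* Create a new object owned by the caller (refcount 1); NULL on failure. */
static struct tags *tags_new(const char *value)
{
    struct tags *t = malloc(sizeof *t);
    if (t == NULL)
        return NULL;
    t->value = malloc(strlen(value) + 1);
    if (t->value == NULL) {
        free(t);
        return NULL;
    }
    strcpy(t->value, value);
    t->refcount = 1;
    return t;
}

/* Sharing: every message gets the same pointer, only the counter changes. */
static struct tags *tags_share(struct tags *t)
{
    assert(t != NULL);
    t->refcount++;              /* plain read-modify-write, not atomic */
    return t;
}

/* Release one reference; returns 1 if the object was freed.
 * If two threads run this on the same object at once, both can read the
 * same old count, so the object may be freed twice or while still in use. */
static int tags_put(struct tags *t)
{
    assert(t != NULL);
    if (--t->refcount > 0)
        return 0;
    free(t->value);
    free(t);
    return 1;
}

/* The proposed fix: each message gets a private deep copy, so no counter
 * is ever touched by more than one thread. */
static struct tags *tags_dup(const struct tags *t)
{
    assert(t != NULL);
    return tags_new(t->value);
}
```

With tags_share, two worker threads that destroy their messages at the
same time both call tags_put on one object. With tags_dup, each message
releases only its own copy, at the cost of one extra allocation per
message.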
From singh.janmejay at gmail.com Tue Dec 9 00:19:47 2014
From: singh.janmejay at gmail.com (singh.janmejay)
Date: Tue, 9 Dec 2014 04:49:47 +0530
Subject: [Lognorm] Race condition in deletion of json-objects with 'event.tags'
In-Reply-To: <54862B26.50206@levshin.spb.ru>
References: <54862B26.50206@levshin.spb.ru>
Message-ID:

Sorry, I forgot to post that once I removed the tags from each rule, the
segfaults stopped. It ran for more than 24 hours, after which it was
restarted deliberately for a deployment and config change.

I'll duplicate the object, deploy the new version and post what happens.

--
Regards,
Janmejay

PS: Please blame the typos in this mail on my phone's uncivilized soft
keyboard sporting its not-so-smart-assist technology.

From singh.janmejay at gmail.com Tue Dec 9 00:23:12 2014
From: singh.janmejay at gmail.com (singh.janmejay)
Date: Tue, 9 Dec 2014 04:53:12 +0530
Subject: [Lognorm] Race condition in deletion of json-objects with 'event.tags'
In-Reply-To:
References: <54862B26.50206@levshin.spb.ru>
Message-ID:

+ rsyslog-users (FYI)

--
Regards,
Janmejay

PS: Please blame the typos in this mail on my phone's uncivilized soft
keyboard sporting its not-so-smart-assist technology.

From singh.janmejay at gmail.com Mon Dec 15 11:56:50 2014
From: singh.janmejay at gmail.com (singh.janmejay)
Date: Mon, 15 Dec 2014 16:26:50 +0530
Subject: [Lognorm] Patches for 'recursive' field type
In-Reply-To:
References:
Message-ID:

Created the PR: https://github.com/rsyslog/liblognorm/pull/11

It also adds a 'descent' field type which, like 'recursive', parses parts
of the message using a top-level parse tree but, unlike 'recursive', uses
a different rulebase.

I added type interpretation too, which comes in handy when the type of
values inside tokenized or recursively parsed objects needs to be
specified (it identifies int, base16-int, double and boolean as of now).
I am calling it 'interpret' and not 'type-interpret' because it can later
be used for other (non-type) things too.

--
Regards,
Janmejay
http://codehunk.wordpress.com
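As an illustration of what type interpretation of a parsed token could
look like, here is a minimal sketch that classifies a string as boolean,
base16-int, int or double. It is a guess at the general idea only, not
the code from the pull request: the enum, the struct and the function
names are hypothetical, and the matching rules (a "0x" prefix for base16,
the literals "true"/"false" for boolean) are assumptions.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

enum kind { K_STRING, K_BOOL, K_INT, K_BASE16_INT, K_DOUBLE };

struct value {
    enum kind kind;
    long long i;    /* set for K_BOOL (0 or 1), K_INT and K_BASE16_INT */
    double    d;    /* set for K_DOUBLE */
};

/* Return 1 if s is non-empty and consists only of hex digits. */
static int all_hex_digits(const char *s)
{
    if (*s == '\0')
        return 0;
    for (; *s != '\0'; s++)
        if (!isxdigit((unsigned char)*s))
            return 0;
    return 1;
}

/* Classify a token; anything that is not recognised stays a string. */
static struct value interpret(const char *s)
{
    struct value v = { K_STRING, 0, 0.0 };
    char *end;

    if (strcmp(s, "true") == 0 || strcmp(s, "false") == 0) {
        v.kind = K_BOOL;
        v.i = (s[0] == 't');
        return v;
    }
    /* strtoll and strtod skip leading whitespace; keep such input a string. */
    if (s[0] == '\0' || isspace((unsigned char)s[0]))
        return v;

    /* Assumed convention: base16 integers carry a 0x prefix. */
    if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) {
        if (all_hex_digits(s + 2)) {
            errno = 0;
            long long n = strtoll(s + 2, NULL, 16);
            if (errno == 0) {       /* ERANGE means it does not fit */
                v.kind = K_BASE16_INT;
                v.i = n;
            }
        }
        return v;
    }

    /* Decimal integer: the whole token must be consumed. */
    errno = 0;
    long long n = strtoll(s, &end, 10);
    if (errno == 0 && end != s && *end == '\0') {
        v.kind = K_INT;
        v.i = n;
        return v;
    }

    /* Double: note that strtod also accepts "inf" and "nan". */
    errno = 0;
    double d = strtod(s, &end);
    if (errno == 0 && end != s && *end == '\0') {
        v.kind = K_DOUBLE;
        v.d = d;
    }
    return v;
}
```

Checking int before double keeps "42" an integer, while "3.5" and "1e3"
fall through to the double branch. A token such as "12abc" matches
neither and is left as a string.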