Bug 1222746

Summary: rsyslog crashes while writing to disk queue (MsgSerialize)
Product: Red Hat Enterprise Linux 7 Reporter: Susant Sahani <ssahani>
Component: rsyslogAssignee: Radovan Sroka <rsroka>
Status: CLOSED ERRATA QA Contact: Stefan Dordevic <sdordevi>
Severity: high Docs Contact:
Priority: high    
Version: 7.1CC: cww, moagrawa, pvrabec, rsroka, rvdwees, ssahani
Target Milestone: rcKeywords: Patch
Target Release: 7.2   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-11-04 05:35:39 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1203710, 1295396, 1296594, 1313485    
Attachments:
Description Flags
01446126-Conf
none
Full rsyslog configuration is attached
none
Reproducer
none
patch
none
updated patch
none
Script to reproduce the crash none

Description Susant Sahani 2015-05-19 04:20:54 UTC
Description of problem:

rsyslog crashes while writing to disk queue (MsgSerialize)
Due to double free.

Version-Release number of selected component (if applicable):
rsyslog-debuginfo-7.4.7-7.el7_0.x86_64
rsyslog-7.4.7-7.el7_0.x86_64

json-c-0.11-4.el7_0.x86_64


How reproducible:

At customer side.

Actual results:
rsyslog crashes

Expected results:

rsyslog should not crash

Additional info:

coredump and sos

Comment 1 Susant Sahani 2015-05-19 04:21:25 UTC
(gdb) bt
#0  0x00007f3b370475d7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007f3b37048cc8 in __GI_abort () at abort.c:90
#2  0x00007f3b38698a08 in sigsegvHdlr (signum=6) at debug.c:849
#3  <signal handler called>
#4  0x00007f3b370475d7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#5  0x00007f3b37048cc8 in __GI_abort () at abort.c:90
#6  0x00007f3b37087e07 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7f3b371908c8 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:196
#7  0x00007f3b3708f1fd in malloc_printerr (ptr=<optimized out>, str=0x7f3b371909a0 "double free or corruption (!prev)", action=3) at malloc.c:4972


Double free. The malloc chunk is corrupted now.

#8  _int_free (av=0x7f3b373cc760 <main_arena>, p=<optimized out>, have_lock=1) at malloc.c:3804
#9  0x00007f3b37091a50 in _int_realloc (av=av@entry=0x7f3b373cc760 <main_arena>, oldp=oldp@entry=0x7f3b0c001a30, oldsize=oldsize@entry=400, nb=nb@entry=752) at malloc.c:4295
#10 0x00007f3b37092702 in __GI___libc_realloc (oldmem=0x7f3b0c001a40, bytes=bytes@entry=737) at malloc.c:2998

We got a abort on frame 10 on glibc


#11 0x00007f3b377f58a0 in printbuf_extend (min_size=<optimized out>, p=0x7f3b0c001a20) at printbuf.c:72

We trying to extend the buffer which essentially calls to realloc


#12 printbuf_memappend (p=p@entry=0x7f3b0c001a20, buf=buf@entry=0x7f3b31bdea30 "\"", size=size@entry=1) at printbuf.c:82
#13 0x00007f3b377f5a56 in sprintbuf (p=p@entry=0x7f3b0c001a20, msg=msg@entry=0x7f3b377f5c79 "\"") at printbuf.c:175
#14 0x00007f3b377f194d in json_object_string_to_json_string (jso=0x7f3b2c1bf9b0, pb=0x7f3b0c001a20, level=<optimized out>, flags=<optimized out>) at json_object.c:622
#15 0x00007f3b377f1ebd in json_object_object_to_json_string (jso=<optimized out>, pb=0x7f3b0c001a20, level=0, flags=1) at json_object.c:325
#16 0x00007f3b377f1d5c in json_object_to_json_string_ext (jso=0x7f3b2c0d9440, flags=flags@entry=1) at json_object.c:270
#17 0x00007f3b377f1daa in json_object_to_json_string (jso=<optimized out>) at json_object.c:280
#18 0x00007f3b377f25a5 in json_object_get_string (jso=<optimized out>) at json_object.c:665

#19 0x00007f3b38690e60 in MsgSerialize (pThis=0x7f3b2c1bffc0, pStrm=0x7f3b3a0d2e60) at msg.c:1068 <====== 

rsyslog is trying  to write to disk queue here. It does use the json libray to do so.

#20 0x00007f3b386a230c in qAddDisk (pThis=0x7f3b3a0d2b10, pMsg=0x7f3b2c1bffc0) at queue.c:902
#21 0x00007f3b386a5eee in qqueueAdd (pMsg=<optimized out>, pThis=0x7f3b3a0d2b10) at queue.c:1024
#22 doEnqSingleObj (pMsg=0x7f3b2c1bffc0, flowCtlType=eFLOWCTL_NO_DELAY, pThis=0x7f3b3a0d2b10) at queue.c:2637
#23 qqueueEnqMsg (pThis=0x7f3b3a0d2b10, flowCtlType=flowCtlType@entry=eFLOWCTL_NO_DELAY, pMsg=<optimized out>) at queue.c:2738
#24 0x00007f3b386a6155 in ConsumerDA (pThis=0x7f3b3a0c40d0, pWti=0x7f3b3a0d21d0) at queue.c:1922
#25 0x00007f3b386a1ad8 in wtiWorker (pThis=pThis@entry=0x7f3b3a0d21d0) at wti.c:313
#26 0x00007f3b386a0d61 in wtpWorker (arg=0x7f3b3a0d21d0) at wtp.c:388
#27 0x00007f3b38010df5 in start_thread (arg=0x7f3b31bdf700) at pthread_create.c:308
#28 0x00007f3b371081ad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Comment 5 Tomas Heinrich 2015-10-27 14:09:47 UTC
Hello Susant,

thanks for you help.

The daemon definitely shouldn't crash but if it doesn't happen often, it will be hard to pinpoint the underlying issue.

We don't need backtraces as much as we need a reliable reproducer. Do we have access to their configuration?

Comment 6 Susant Sahani 2015-10-28 03:13:23 UTC
Created attachment 1087135 [details]
01446126-Conf

Comment 9 Mohit Agrawal 2015-11-24 04:29:53 UTC
Created attachment 1097995 [details]
Full rsyslog configuration is attached

Comment 10 Radovan Sroka 2016-02-08 12:56:50 UTC
Created attachment 1122166 [details]
Reproducer

Here is the reproducer.

Comment 11 Radovan Sroka 2016-02-10 16:14:22 UTC
Created attachment 1122840 [details]
patch

This patch introduces new mutex inside of message structure to handle totally thread unsafe library json-c and locks suspicious json function calls.

Comment 13 Radovan Sroka 2016-02-15 14:07:22 UTC
Created attachment 1127297 [details]
updated patch

changed type from int to rsRetVal

Comment 19 Tomas Heinrich 2016-07-18 17:24:17 UTC
Created attachment 1181194 [details]
Script to reproduce the crash

Comment 26 errata-xmlrpc 2016-11-04 05:35:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2401.html