RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 1549706 - rsyslog SEGV with disk-assisted queues, possible corruption
Summary: rsyslog SEGV with disk-assisted queues, possible corruption
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: rsyslog
Version: 7.4
Hardware: x86_64
OS: Linux
high
urgent
Target Milestone: rc
: ---
Assignee: Jiří Vymazal
QA Contact: Dalibor Pospíšil
URL:
Whiteboard:
Depends On:
Blocks: 1477664 1738213
TreeView+ depends on / blocked
 
Reported: 2018-02-27 16:59 UTC by Peter Portante
Modified: 2023-09-07 19:04 UTC (History)
14 users (show)

Fixed In Version: rsyslog-8.24.0-48.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1738213 (view as bug list)
Environment:
Last Closed: 2020-03-31 19:10:09 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
rsyslogd killed by SIGSEGV (180.17 KB, text/plain)
2018-02-27 16:59 UTC, Peter Portante
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Github https://github.com/rsyslog rsyslog pull 1647 0 None None None 2020-09-24 16:16:58 UTC
Red Hat Bugzilla 1511485 1 None None None 2021-12-10 15:23:59 UTC
Red Hat Knowledge Base (Solution) 4393091 0 None None None 2019-09-17 10:06:11 UTC
Red Hat Product Errata RHSA-2020:1000 0 None None None 2020-03-31 19:10:48 UTC

Internal Links: 1511485

Description Peter Portante 2018-02-27 16:59:56 UTC
Created attachment 1401462 [details]
rsyslogd killed by SIGSEGV

With rsyslog-8.24.0-12.el7_4.1 (the customer support exception image), we need repeatedly encountered SEGV related to the on-disk queues configured.

See the attached SEGV abrt report.

We put this version of rsyslog onto our 12 production machines and we have encountered this crash on 2 of the nodes.  We don't think it is related to the patch this image was generated for (duplicate logs), as we encounter the same stack track when we use the regular version of rsyslog on RHEL 7.4.

[root@gprfc053 ~]# gdb /usr/sbin/rsyslogd
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-100.el7_4.1
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/sbin/rsyslogd...Reading symbols from /usr/lib/debug/usr/sbin/rsyslogd.debug...done.
done.
(gdb) run -n
Starting program: /usr/sbin/rsyslogd -n
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffed86d2700 (LWP 45053)]
[New Thread 0x7ffed7ed1700 (LWP 45054)]
[New Thread 0x7ffed60ca700 (LWP 45055)]
[New Thread 0x7ffed58c9700 (LWP 45056)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffed86d2700 (LWP 45053)]
rsCStrDestruct (ppThis=ppThis@entry=0x7ffed00168f0) at stringbuf.c:216
216		free(pThis->pBuf);
(gdb) list
211
212	void rsCStrDestruct(cstr_t **ppThis)
213	{
214		cstr_t *pThis = *ppThis;
215
216		free(pThis->pBuf);
217		RSFREEOBJ(pThis);
218		*ppThis = NULL;
219	}
220
(gdb) up
#1  0x000055555558a468 in varDestruct (ppThis=0x7ffed86d1bc0) at var.c:72
72				rsCStrDestruct(&pThis->val.pStr);
(gdb) list
67	CODESTARTobjDestruct(var)
68		if(pThis->pcsName != NULL)
69			rsCStrDestruct(&pThis->pcsName);
70		if(pThis->varType == VARTYPE_STR) {
71			if(pThis->val.pStr != NULL)
72				rsCStrDestruct(&pThis->val.pStr);
73		}
74	ENDobjDestruct(var)
75
76
(gdb) up
#2  0x0000555555576314 in MsgDeserialize (pMsg=0x7ffed0019680, pStrm=pStrm@entry=0x555555826ed0) at msg.c:1405
1405			var.Destruct(&pVar);
(gdb) list
1400		if(!isProp("offMSG"))
1401			ABORT_FINALIZE(RS_RET_DS_PROP_SEQ_ERR);
1402		MsgSetMSGoffs(pMsg, pVar->val.num);
1403	finalize_it:
1404		if(pVar != NULL)
1405			var.Destruct(&pVar);
1406		RETiRet;
1407	}
1408	#undef isProp
1409
(gdb) where
#0  rsCStrDestruct (ppThis=ppThis@entry=0x7ffed00168f0) at stringbuf.c:216
#1  0x000055555558a468 in varDestruct (ppThis=0x7ffed86d1bc0) at var.c:72
#2  0x0000555555576314 in MsgDeserialize (pMsg=0x7ffed0019680, pStrm=pStrm@entry=0x555555826ed0) at msg.c:1405
#3  0x00005555555822a9 in objDeserializeWithMethods (ppObj=0x7ffed86d1cf0, pszTypeExpected=pszTypeExpected@entry=0x5555555bbfe6 "msg", lenTypeExpected=lenTypeExpected@entry=3, pStrm=0x555555826ed0, fFixup=fFixup@entry=0x0, pUsr=pUsr@entry=0x0, objConstruct=0x555555574100 <msgConstructForDeserializer>, objConstructFinalize=objConstructFinalize@entry=0x0, objDeserialize=0x5555555762b0 <MsgDeserialize>) at obj.c:915
#4  0x000055555558c8b8 in qDeqDisk (pThis=<optimized out>, ppMsg=<optimized out>) at queue.c:988
#5  0x000055555558e741 in qqueueDeq (ppMsg=0x7ffed86d1cf0, pThis=0x555555823080) at queue.c:1115
#6  DequeueConsumableElements (pSkippedMsgs=<optimized out>, piRemainingQueueSize=<synthetic pointer>, pWti=<optimized out>, pThis=0x555555823080) at queue.c:1713
#7  DequeueConsumable (pThis=pThis@entry=0x555555823080, pWti=pWti@entry=0x5555558235d0, pSkippedMsgs=pSkippedMsgs@entry=0x7ffed86d1d44) at queue.c:1767
#8  0x000055555558ea92 in DequeueForConsumer (pSkippedMsgs=0x7ffed86d1d44, pWti=0x5555558235d0, pThis=0x555555823080) at queue.c:1919
#9  ConsumerReg (pThis=0x555555823080, pWti=0x5555558235d0) at queue.c:1975
#10 0x000055555558b66e in wtiWorker (pThis=pThis@entry=0x5555558235d0) at wti.c:365
#11 0x000055555558a7dd in wtpWorker (arg=0x5555558235d0) at wtp.c:410
#12 0x00007ffff79b0e25 in start_thread (arg=0x7ffed86d2700) at pthread_create.c:308
#13 0x00007ffff6aa934d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Comment 3 Peter Portante 2018-02-27 19:38:28 UTC
Could this be the same as https://access.redhat.com/solutions/882893?  I would think not, but it does match the stack trace.

Comment 4 Peter Portante 2018-02-28 00:00:59 UTC
Perhaps another clue: when I deleted the first queue file, and edited the .qi file to reference the next queue file, it processed many more queue files until is SEGV'd again.  When I then deleted that queue file, it continued until a 3rd one.  Finally is processed the remaining few hundred of them (1 GB max on disk, with 1 MB files) until it caught up.

I saved the state of the /var/lib/rsyslog directory before making these changes.  There are 1,026 files in /var/lib/rsyslog:

[/var/lib/rsyslog.bz1549706]# ls -1 | wc -l
1026
[/var/lib/rsyslog.bz1549706]# ls -l | head -n 3
total 1052060
-rw-------. 1 root root     126 Feb 23 01:27 imjournal.state
-rw-------. 1 root root 1051359 Jan 30 13:10 ViaQ_Collector_01.00055319
[/var/lib/rsyslog.bz1549706]# ls -l | tail -n 3
-rw-------. 1 root root 1051129 Feb 22 19:53 ViaQ_Collector_01.00056341
-rw-------. 1 root root  354308 Feb 23 01:25 ViaQ_Collector_01.00056342
-rw-------. 1 root root     572 Feb 22 19:53 ViaQ_Collector_01.qi
[/var/lib/rsyslog.bz1549706]# cat ViaQ_Collector_01.qi
<OPB:1:qqueue:1:
+iQueueSize:2:6:323723:
+tVars.disk.sizeOnDisk:2:10:1073394353:
>End
.
<Obj:1:strm:1:
+iCurrFNum:2:5:56342:
+pszFName:1:17:ViaQ_Collector_01:
+iMaxFiles:2:8:10000000:
+bDeleteOnClose:2:1:0:
+sType:2:1:1:
+tOperationsMode:2:1:2:
+tOpenMode:2:3:384:
+iCurrOffs:2:4:2447:
+inode:2:1:0:
+bPrevWasNL:2:1:0:
>End
.
<Obj:1:strm:1:
+iCurrFNum:2:5:55319:
+pszFName:1:17:ViaQ_Collector_01:
+iMaxFiles:2:8:10000000:
+bDeleteOnClose:2:1:1:
+sType:2:1:1:
+tOperationsMode:2:1:1:
+tOpenMode:2:3:384:
+iCurrOffs:2:6:432905:
+inode:2:7:1060046:
+bPrevWasNL:2:1:0:
>End
.

Below is our rsyslog configuration:

[/var/lib/rsyslog.bz1549706]# cat /etc/rsyslog.conf
# rsyslog v7 for RHEL 7 configuration file

# For more information see /usr/share/doc/rsyslog-*/rsyslog_conf.html
# If you experience problems, see http://www.rsyslog.com/doc/troubleshoot.html

#### MODULES ####

# The imjournal module below is now used as a message source instead of imuxsock,
# providing access to the systemd journal and /dev/log messages.
$ModLoad imjournal


#### GLOBAL DIRECTIVES ####

# Where to place auxiliary files
$WorkDirectory /var/lib/rsyslog

# Use updated timestamp format for local files
$ActionFileDefaultTemplate RSYSLOG_FileFormat

# We want fully qualified domain names for common logging
$PreserveFQDN on

# File to store the position in the journal
$IMJournalStateFile imjournal.state
$IMJournalPersistStateInterval 10000
$IMJournalRatelimitInterval 1
$IMJournalRatelimitBurst 100000000

# Include all config files in /etc/rsyslog.d/
$IncludeConfig /etc/rsyslog.d/*.conf


#### RULES ####

# Log anything (except mail) of level info or higher.
# Don't log private authentication messages!
*.info;mail.none;authpriv.none;cron.none                /var/log/messages

# The authpriv file has restricted access.
authpriv.*                                              /var/log/secure

# Log all the mail messages in one place.
mail.*                                                  -/var/log/maillog

# Log cron stuff
cron.*                                                  /var/log/cron

# Save news errors of level crit and higher in a special file.
uucp,news.crit                                          /var/log/spooler

# Save boot messages also to boot.log
local7.*                                                /var/log/boot.log
[/var/lib/rsyslog.bz1549706]# ls /etc/rsyslog.d/
00-viaq-tags.conf           01-viaq-forwarding.conf     99-example-forwarding.conf
[/var/lib/rsyslog.bz1549706]# cat /etc/rsyslog.d/00-viaq-tags.conf
# A simple string of "tags" that system administrators can use to identify the
# various roles a system can take on, or categories to consider it under.
#
# E.g.
# set $!tags = "EXAMPLETAG EXAMPLETAGTWO EXAMPLETAGTHREE";
set $!tags = "BAGL perf-dept OSE";
[/var/lib/rsyslog.bz1549706]# cat /etc/rsyslog.d/01-viaq-forwarding.conf
# This is basically the RSYSLOG_SyslogProtocol23Format, which is RFC 5424 on
# the wire, but with the message payload a CEE/Lumberjack JSON document.
template(name="ViaQ_SyslogProtocol23Format" type="string"
    string="<%PRI%>1 %TIMESTAMP:::date-rfc3339% %HOSTNAME% %APP-NAME% %PROCID% %MSGID% %STRUCTURED-DATA% @cee:%$!%\n")

# Some log messages have not been stored in the JSON variables, so check and
# add them here.
if strlen($!msg) < 1 then {
    set $!msg = $msg;
}

# rsyslog instance in the Integration Lab over the 1Gb public lab switch
$ActionQueueType LinkedList             # run asynchronously
$ActionQueueFileName ViaQ_Collector_01  # unique name prefix for spool files
$ActionQueueMaxDiskSpace 1g             # 1gb space limit (use as much as possible)
$ActionQueueSaveOnShutdown on           # save messages to disk on shutdown
$ActionResumeRetryCount -1              # infinite retries if host is down
*.*  @@172.17.40.134:514;ViaQ_SyslogProtocol23Format
[/var/lib/rsyslog.bz1549706]# cat /etc/rsyslog.d/99-example-forwarding.conf
# ### begin forwarding rule ###
# The statement between the begin ... end define a SINGLE forwarding
# rule. They belong together, do NOT split them. If you create multiple
# forwarding rules, duplicate the whole block!
# Remote Logging (we use TCP for reliable delivery)
#
# An on-disk queue is created for this action. If the remote host is
# down, messages are spooled to disk and sent when it is up again.
#$ActionQueueFileName fwdRule1 # unique name prefix for spool files
#$ActionQueueMaxDiskSpace 1g   # 1gb space limit (use as much as possible)
#$ActionQueueSaveOnShutdown on # save messages to disk on shutdown
#$ActionQueueType LinkedList   # run asynchronously
#$ActionResumeRetryCount -1    # infinite retries if host is down
# remote host is: name/ip:port, e.g. 192.168.0.1:514, port optional
#*.* @@remote-host:514
# ### end of the forwarding rule ###

Comment 9 Peter Portante 2018-03-01 20:05:31 UTC
Well, the bug is with the queue files, and it seems to be because they are corrupted.  The previous bug, 1511485, does not seem to be the problem.

Here is an example queue file line:

/var/lib/rsyslog/ViaQ_Collector_01.00055319 (line 5137)
+<Obj:1:msg:1:

Full record:

.
<Obj:1:msg:1:
+iProtocolVersion:2:1:0:
+iSeverity:2:1:6:
+iFacility:2:1:1:
+msgFlags:2:1:0:
+ttGenTime:2:10:1517316316:
+<Obj:1:msg:1:
+iProtocolVersion:2:1:0:
+iSeverity:2:1:6:
+iFacility:2:1:1:
+msgFlags:2:1:0:
+ttGenTime:2:10:1517316316:
+tRcvdAt:3:35:2:2018:1:30:12:45:16:668346:6:+:0:0:
+tTIMESTAMP:3:35:2:2018:1:30:12:45:16:668346:6:+:0:0:
+pszTAG:1:8:journal::
+pszRawMsg:1:1423:... removed for anonymity ...:
+pszHOSTNAME:1:15:foo.example.com:
+pszInputName:1:9:imjournal:
+pszRcvFrom:1:15:foo.example.com:
+pszRcvFromIP:1:9:127.0.0.1:
+json:1:4587:... removed for anonymity ...:
+offMSG:2:1:0:
>End
.

Comment 20 Jiří Vymazal 2019-07-31 13:27:00 UTC
This issue was not selected to be included in Red Hat Enterprise Linux 7.7 because it is seen either as low or moderate impact to a small number of use-cases. This release is in Maintenance Support 1 Phase, which means that qualified Critical and Important Security errata advisories (RHSAs) and Urgent Priority Bug Fix errata advisories (RHBAs) may be released as they become available. We will now close this issue, but if you believe that it qualifies for the Maintenance Support 1 Phase, please re-open; otherwise, we recommend moving the request to Red Hat Enterprise Linux 8 if applicable.

Comment 21 Christophe Besson 2019-09-04 07:43:44 UTC
Finally succeeded to reproduce the issue in my environment with the latest rsyslog package:
rsyslog-8.24.0-41.el7_7

It happens on RHEL 7.6 as well as RHEL 7.7. The backtrace shows exactly the same:

#0  rsCStrDestruct (ppThis=ppThis@entry=0x7fd874004250) at stringbuf.c:216
#1  0x00005579594a7a08 in varDestruct (ppThis=0x7fd87bbf6bc0) at var.c:72
#2  0x00005579594933c4 in MsgDeserialize (pMsg=0x7fd874004590, pStrm=pStrm@entry=0x55795b4e1710) at msg.c:1407
#3  0x000055795949f439 in objDeserializeWithMethods (ppObj=0x7fd87bbf6cf0, pszTypeExpected=pszTypeExpected@entry=0x5579594d962f "msg", lenTypeExpected=lenTypeExpected@entry=3, 
    pStrm=0x55795b4e1710, fFixup=fFixup@entry=0x0, pUsr=pUsr@entry=0x0, objConstruct=0x5579594911b0 <msgConstructForDeserializer>, objConstructFinalize=objConstructFinalize@entry=0x0, 
    objDeserialize=0x557959493360 <MsgDeserialize>) at obj.c:915
#4  0x00005579594a9e58 in qDeqDisk (pThis=<optimized out>, ppMsg=<optimized out>) at queue.c:988
#5  0x00005579594abd21 in qqueueDeq (ppMsg=0x7fd87bbf6cf0, pThis=0x55795b4da900) at queue.c:1115
#6  DequeueConsumableElements (pSkippedMsgs=<optimized out>, piRemainingQueueSize=<synthetic pointer>, pWti=<optimized out>, pThis=0x55795b4da900) at queue.c:1714
#7  DequeueConsumable (pThis=pThis@entry=0x55795b4da900, pWti=pWti@entry=0x55795b4dada0, pSkippedMsgs=pSkippedMsgs@entry=0x7fd87bbf6d44) at queue.c:1770
#8  0x00005579594ac092 in DequeueForConsumer (pSkippedMsgs=0x7fd87bbf6d44, pWti=0x55795b4dada0, pThis=0x55795b4da900) at queue.c:1922
#9  ConsumerReg (pThis=0x55795b4da900, pWti=0x55795b4dada0) at queue.c:1978
#10 0x00005579594a8c0e in wtiWorker (pThis=pThis@entry=0x55795b4dada0) at wti.c:365
#11 0x00005579594a7d7d in wtpWorker (arg=0x55795b4dada0) at wtp.c:410
#12 0x00007fd87de44ea5 in start_thread (arg=0x7fd87bbf7700) at pthread_create.c:307
#13 0x00007fd87cf388cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

I provide the needed data to reproduce this issue in a private attachment (rsyslog.conf and a set of queue files).

Comment 27 Jiří Vymazal 2019-11-04 14:53:23 UTC
Found the fix (see linked upstream PR)

Comment 32 errata-xmlrpc 2020-03-31 19:10:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:1000


Note You need to log in before you can comment on or make changes to this bug.