Bug 1549706
Summary: | rsyslog SEGV with disk-assisted queues, possible corruption | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Peter Portante <pportant> | ||||
Component: | rsyslog | Assignee: | Jiří Vymazal <jvymazal> | ||||
Status: | CLOSED ERRATA | QA Contact: | Dalibor Pospíšil <dapospis> | ||||
Severity: | urgent | Docs Contact: | |||||
Priority: | high | ||||||
Version: | 7.4 | CC: | amitkuma, cbesson, cww, dapospis, dkopecek, jvymazal, mgrepl, mmatsuya, nhosoi, pportant, pvrabec, rmetrich, rsroka, sbroz | ||||
Target Milestone: | rc | Keywords: | Reopened, Triaged | ||||
Target Release: | --- | ||||||
Hardware: | x86_64 | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | rsyslog-8.24.0-48.el7 | Doc Type: | If docs needed, set a value | ||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | |||||||
: | 1738213 (view as bug list) | Environment: | |||||
Last Closed: | 2020-03-31 19:10:09 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | |||||||
Bug Blocks: | 1477664, 1738213 | ||||||
Attachments: |
|
Description
Peter Portante
2018-02-27 16:59:56 UTC
Could this be the same as https://access.redhat.com/solutions/882893? I would think not, but it does match the stack trace. Perhaps another clue: when I deleted the first queue file, and edited the .qi file to reference the next queue file, it processed many more queue files until is SEGV'd again. When I then deleted that queue file, it continued until a 3rd one. Finally is processed the remaining few hundred of them (1 GB max on disk, with 1 MB files) until it caught up. I saved the state of the /var/lib/rsyslog directory before making these changes. There are 1,026 files in /var/lib/rsyslog: [/var/lib/rsyslog.bz1549706]# ls -1 | wc -l 1026 [/var/lib/rsyslog.bz1549706]# ls -l | head -n 3 total 1052060 -rw-------. 1 root root 126 Feb 23 01:27 imjournal.state -rw-------. 1 root root 1051359 Jan 30 13:10 ViaQ_Collector_01.00055319 [/var/lib/rsyslog.bz1549706]# ls -l | tail -n 3 -rw-------. 1 root root 1051129 Feb 22 19:53 ViaQ_Collector_01.00056341 -rw-------. 1 root root 354308 Feb 23 01:25 ViaQ_Collector_01.00056342 -rw-------. 1 root root 572 Feb 22 19:53 ViaQ_Collector_01.qi [/var/lib/rsyslog.bz1549706]# cat ViaQ_Collector_01.qi <OPB:1:qqueue:1: +iQueueSize:2:6:323723: +tVars.disk.sizeOnDisk:2:10:1073394353: >End . <Obj:1:strm:1: +iCurrFNum:2:5:56342: +pszFName:1:17:ViaQ_Collector_01: +iMaxFiles:2:8:10000000: +bDeleteOnClose:2:1:0: +sType:2:1:1: +tOperationsMode:2:1:2: +tOpenMode:2:3:384: +iCurrOffs:2:4:2447: +inode:2:1:0: +bPrevWasNL:2:1:0: >End . <Obj:1:strm:1: +iCurrFNum:2:5:55319: +pszFName:1:17:ViaQ_Collector_01: +iMaxFiles:2:8:10000000: +bDeleteOnClose:2:1:1: +sType:2:1:1: +tOperationsMode:2:1:1: +tOpenMode:2:3:384: +iCurrOffs:2:6:432905: +inode:2:7:1060046: +bPrevWasNL:2:1:0: >End . Below is our rsyslog configuration: [/var/lib/rsyslog.bz1549706]# cat /etc/rsyslog.conf # rsyslog v7 for RHEL 7 configuration file # For more information see /usr/share/doc/rsyslog-*/rsyslog_conf.html # If you experience problems, see http://www.rsyslog.com/doc/troubleshoot.html #### MODULES #### # The imjournal module below is now used as a message source instead of imuxsock, # providing access to the systemd journal and /dev/log messages. $ModLoad imjournal #### GLOBAL DIRECTIVES #### # Where to place auxiliary files $WorkDirectory /var/lib/rsyslog # Use updated timestamp format for local files $ActionFileDefaultTemplate RSYSLOG_FileFormat # We want fully qualified domain names for common logging $PreserveFQDN on # File to store the position in the journal $IMJournalStateFile imjournal.state $IMJournalPersistStateInterval 10000 $IMJournalRatelimitInterval 1 $IMJournalRatelimitBurst 100000000 # Include all config files in /etc/rsyslog.d/ $IncludeConfig /etc/rsyslog.d/*.conf #### RULES #### # Log anything (except mail) of level info or higher. # Don't log private authentication messages! *.info;mail.none;authpriv.none;cron.none /var/log/messages # The authpriv file has restricted access. authpriv.* /var/log/secure # Log all the mail messages in one place. mail.* -/var/log/maillog # Log cron stuff cron.* /var/log/cron # Save news errors of level crit and higher in a special file. uucp,news.crit /var/log/spooler # Save boot messages also to boot.log local7.* /var/log/boot.log [/var/lib/rsyslog.bz1549706]# ls /etc/rsyslog.d/ 00-viaq-tags.conf 01-viaq-forwarding.conf 99-example-forwarding.conf [/var/lib/rsyslog.bz1549706]# cat /etc/rsyslog.d/00-viaq-tags.conf # A simple string of "tags" that system administrators can use to identify the # various roles a system can take on, or categories to consider it under. # # E.g. # set $!tags = "EXAMPLETAG EXAMPLETAGTWO EXAMPLETAGTHREE"; set $!tags = "BAGL perf-dept OSE"; [/var/lib/rsyslog.bz1549706]# cat /etc/rsyslog.d/01-viaq-forwarding.conf # This is basically the RSYSLOG_SyslogProtocol23Format, which is RFC 5424 on # the wire, but with the message payload a CEE/Lumberjack JSON document. template(name="ViaQ_SyslogProtocol23Format" type="string" string="<%PRI%>1 %TIMESTAMP:::date-rfc3339% %HOSTNAME% %APP-NAME% %PROCID% %MSGID% %STRUCTURED-DATA% @cee:%$!%\n") # Some log messages have not been stored in the JSON variables, so check and # add them here. if strlen($!msg) < 1 then { set $!msg = $msg; } # rsyslog instance in the Integration Lab over the 1Gb public lab switch $ActionQueueType LinkedList # run asynchronously $ActionQueueFileName ViaQ_Collector_01 # unique name prefix for spool files $ActionQueueMaxDiskSpace 1g # 1gb space limit (use as much as possible) $ActionQueueSaveOnShutdown on # save messages to disk on shutdown $ActionResumeRetryCount -1 # infinite retries if host is down *.* @@172.17.40.134:514;ViaQ_SyslogProtocol23Format [/var/lib/rsyslog.bz1549706]# cat /etc/rsyslog.d/99-example-forwarding.conf # ### begin forwarding rule ### # The statement between the begin ... end define a SINGLE forwarding # rule. They belong together, do NOT split them. If you create multiple # forwarding rules, duplicate the whole block! # Remote Logging (we use TCP for reliable delivery) # # An on-disk queue is created for this action. If the remote host is # down, messages are spooled to disk and sent when it is up again. #$ActionQueueFileName fwdRule1 # unique name prefix for spool files #$ActionQueueMaxDiskSpace 1g # 1gb space limit (use as much as possible) #$ActionQueueSaveOnShutdown on # save messages to disk on shutdown #$ActionQueueType LinkedList # run asynchronously #$ActionResumeRetryCount -1 # infinite retries if host is down # remote host is: name/ip:port, e.g. 192.168.0.1:514, port optional #*.* @@remote-host:514 # ### end of the forwarding rule ### Well, the bug is with the queue files, and it seems to be because they are corrupted. The previous bug, 1511485, does not seem to be the problem.
Here is an example queue file line:
/var/lib/rsyslog/ViaQ_Collector_01.00055319 (line 5137)
+<Obj:1:msg:1:
Full record:
.
<Obj:1:msg:1:
+iProtocolVersion:2:1:0:
+iSeverity:2:1:6:
+iFacility:2:1:1:
+msgFlags:2:1:0:
+ttGenTime:2:10:1517316316:
+<Obj:1:msg:1:
+iProtocolVersion:2:1:0:
+iSeverity:2:1:6:
+iFacility:2:1:1:
+msgFlags:2:1:0:
+ttGenTime:2:10:1517316316:
+tRcvdAt:3:35:2:2018:1:30:12:45:16:668346:6:+:0:0:
+tTIMESTAMP:3:35:2:2018:1:30:12:45:16:668346:6:+:0:0:
+pszTAG:1:8:journal::
+pszRawMsg:1:1423:... removed for anonymity ...:
+pszHOSTNAME:1:15:foo.example.com:
+pszInputName:1:9:imjournal:
+pszRcvFrom:1:15:foo.example.com:
+pszRcvFromIP:1:9:127.0.0.1:
+json:1:4587:... removed for anonymity ...:
+offMSG:2:1:0:
>End
.
This issue was not selected to be included in Red Hat Enterprise Linux 7.7 because it is seen either as low or moderate impact to a small number of use-cases. This release is in Maintenance Support 1 Phase, which means that qualified Critical and Important Security errata advisories (RHSAs) and Urgent Priority Bug Fix errata advisories (RHBAs) may be released as they become available. We will now close this issue, but if you believe that it qualifies for the Maintenance Support 1 Phase, please re-open; otherwise, we recommend moving the request to Red Hat Enterprise Linux 8 if applicable. Finally succeeded to reproduce the issue in my environment with the latest rsyslog package: rsyslog-8.24.0-41.el7_7 It happens on RHEL 7.6 as well as RHEL 7.7. The backtrace shows exactly the same: #0 rsCStrDestruct (ppThis=ppThis@entry=0x7fd874004250) at stringbuf.c:216 #1 0x00005579594a7a08 in varDestruct (ppThis=0x7fd87bbf6bc0) at var.c:72 #2 0x00005579594933c4 in MsgDeserialize (pMsg=0x7fd874004590, pStrm=pStrm@entry=0x55795b4e1710) at msg.c:1407 #3 0x000055795949f439 in objDeserializeWithMethods (ppObj=0x7fd87bbf6cf0, pszTypeExpected=pszTypeExpected@entry=0x5579594d962f "msg", lenTypeExpected=lenTypeExpected@entry=3, pStrm=0x55795b4e1710, fFixup=fFixup@entry=0x0, pUsr=pUsr@entry=0x0, objConstruct=0x5579594911b0 <msgConstructForDeserializer>, objConstructFinalize=objConstructFinalize@entry=0x0, objDeserialize=0x557959493360 <MsgDeserialize>) at obj.c:915 #4 0x00005579594a9e58 in qDeqDisk (pThis=<optimized out>, ppMsg=<optimized out>) at queue.c:988 #5 0x00005579594abd21 in qqueueDeq (ppMsg=0x7fd87bbf6cf0, pThis=0x55795b4da900) at queue.c:1115 #6 DequeueConsumableElements (pSkippedMsgs=<optimized out>, piRemainingQueueSize=<synthetic pointer>, pWti=<optimized out>, pThis=0x55795b4da900) at queue.c:1714 #7 DequeueConsumable (pThis=pThis@entry=0x55795b4da900, pWti=pWti@entry=0x55795b4dada0, pSkippedMsgs=pSkippedMsgs@entry=0x7fd87bbf6d44) at queue.c:1770 #8 0x00005579594ac092 in DequeueForConsumer (pSkippedMsgs=0x7fd87bbf6d44, pWti=0x55795b4dada0, pThis=0x55795b4da900) at queue.c:1922 #9 ConsumerReg (pThis=0x55795b4da900, pWti=0x55795b4dada0) at queue.c:1978 #10 0x00005579594a8c0e in wtiWorker (pThis=pThis@entry=0x55795b4dada0) at wti.c:365 #11 0x00005579594a7d7d in wtpWorker (arg=0x55795b4dada0) at wtp.c:410 #12 0x00007fd87de44ea5 in start_thread (arg=0x7fd87bbf7700) at pthread_create.c:307 #13 0x00007fd87cf388cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 I provide the needed data to reproduce this issue in a private attachment (rsyslog.conf and a set of queue files). Found the fix (see linked upstream PR) Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:1000 |