Red Hat Bugzilla – Bug 983986
Rsyslog hangs on startup when failover remote hosts are configured
Last modified: 2016-09-20 00:51:28 EDT
Description of problem:
Configuring failover remote hosts in rsyslog.conf causes rsyslog to hang on startup.
7775.447762377:7f124a8b5700: deque option i, optarg '/var/run/syslogd.pid'
7775.447771451:7f124a8b5700: Checking pidfile.
(it creates pidfile normally)
and hangs for 5 minutes and dies without any output.
Version-Release number of selected component (if applicable):
Repeatable every time
Steps to Reproduce:
1. Stop rsyslog
2. Configure a failover remote syslog host as per http://wiki.rsyslog.com/index.php/FailoverSyslogServer
3. Start syslog
Normal operation with failover on remote server down when using TCP
This is likely a duplicate of bug #927405.
The perceived hang is the parent process waiting for a child that has crashed due to the bug mentioned above.
It is trigger by a configuration like this:
Such as the one on the wiki.
When you run the daemon in the debug mode (options '-dn'), it should crash with a segfault. If you can confirm this, please close it as a duplicate of the bug above.
The bit in my config that seems relevant is the same as you say:
The debug output is as follows:
8867.112337781:7f22a0132700: template: 'RSYSLOG_TraditionalFileFormat' assigned
8867.112355976:7f22a0132700: action 7 queue: save on shutdown 1, max disk space allowed 0
8867.112378862:7f22a0132700: action 7 queue: type 3, enq-only 0, disk assisted 0, maxFileSz 1048576, lqsize 0, pqsize 0, child 0, full delay 970, light delay 700, deq batch size 16 starting
8867.112401191:7f22a0132700: Action 0x7f22a21e5050: queue 0x7f22a21e5190 created
8867.112424688:7f22a0132700: cfline: '*.* @@primary-host:514'
8867.112441543:7f22a0132700: selector line successfully processed
8867.112458281:7f22a0132700: - traditional PRI filter
8867.112475197:7f22a0132700: symbolic name: * ==> 255
8867.112497279:7f22a0132700: tried selector action for builtin-file: -2001
8867.112514208:7f22a0132700: tried selector action for builtin-pipe: -2001
8867.112534192:7f22a0132700: caller requested object 'netstrms', not found (iRet -3003)
8867.112551968:7f22a0132700: Requested to load module 'lmnetstrms'
8867.112570693:7f22a0132700: loading module '/lib64/rsyslog/lmnetstrms.so'
8867.113614949:7f22a0132700: doing nsselClassInit
8867.113728003:7f22a0132700: doing nspollClassInit
8867.113750403:7f22a0132700: module of type 2 being loaded.
8867.113767893:7f22a0132700: entry point 'isCompatibleWithFeature' not present in module
8867.113788756:7f22a0132700: source file omfwd.c requested reference for module 'lmnetstrms', reference count now 1
8867.113816191:7f22a0132700: source file omfwd.c requested reference for module 'lmnetstrms', reference count now 2
8867.113836116:7f22a0132700: caller requested object 'tcpclt', not found (iRet -3003)
8867.113854394:7f22a0132700: Requested to load module 'lmtcpclt'
8867.113873671:7f22a0132700: loading module '/lib64/rsyslog/lmtcpclt.so'
8867.115955319:7f22a0132700: module of type 2 being loaded.
8867.115978109:7f22a0132700: entry point 'isCompatibleWithFeature' not present in module
8867.115997611:7f22a0132700: source file omfwd.c requested reference for module 'lmtcpclt', reference count now 1
8867.116020922:7f22a0132700: tried selector action for builtin-fwd: 0
8867.116040389:7f22a0132700: Module builtin-fwd processed this config line.
8867.116063625:7f22a0132700: template: 'RSYSLOG_TraditionalForwardFormat' assigned
8867.116087323:7f22a0132700: action 8 queue: save on shutdown 1, max disk space allowed 0
8867.116113529:7f22a0132700: action 8 queue: type 3, enq-only 0, disk assisted 0, maxFileSz 1048576, lqsize 0, pqsize 0, child 0, full delay 970, light delay 700, deq batch size 16 starting
8867.116136860:7f22a0132700: Action 0x7f22a21e6590: queue 0x7f22a21e66b0 created
8867.116157502:7f22a0132700: cfline: '$ActionExecOnlyWhenPreviousIsSuspended on'
8867.116175588:7f22a0132700: selector line successfully processed
8867.116202063:7f22a0132700: cfline: '&@@seconday-host:514'
8867.116220885:7f22a0132700: tried selector action for builtin-file: -2001
8867.116238454:7f22a0132700: tried selector action for builtin-pipe: -2001
8867.116257508:7f22a0132700: tried selector action for builtin-fwd: 0
8867.116275322:7f22a0132700: Module builtin-fwd processed this config line.
8867.116294831:7f22a0132700: template: 'RSYSLOG_TraditionalForwardFormat' assigned
8867.116325535:7f22a0132700: action 9 queue: save on shutdown 1, max disk space allowed 0
8867.116349828:7f22a0132700: action 9 queue: type 3, enq-only 0, disk assisted 0, maxFileSz 1048576, lqsize 0, pqsize 0, child 0, full delay 970, light delay 700, deq batch size 16 starting
8867.116372773:7f22a0132700: Action 0x7f22a21e6b20: queue 0x7f22a21e6ec0 created
Signal 11 (SIGSEGV) occured, execution must be terminated.
8867.116440497:7f22a0132700: Mutex log for all known mutex operations:
8867.116457711:7f22a0132700: If the call trace is empty, you may want to ./configure --enable-rtinst
To submit bug reports, visit http://www.rsyslog.com/bugs
To submit bug reports, visit http://www.rsyslog.com/bugs
Looks like it may be a duplicate, cheers for the other bug info, do you know if a fix is on the way or a fixed stable version I can add to my local repo to use with 6.4 ?
You can get rid of the issue if you drop this patch from the spec file:
A different problem will reappear upon dropping it but not as severe as the one you're currently having.
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.
Recently re visited this issue as we have upgraded recently. I can now confirm this is fixed in rsyslog-5.8.10-8.el6.x86_64 :)
rsyslogd now takes the config without choking! I haven't as yet tested the FO functionality but fingers crossed it shall work as described.
(In reply to Alister Burrell from comment #6)
> Recently re visited this issue as we have upgraded recently. I can now
> confirm this is fixed in rsyslog-5.8.10-8.el6.x86_64 :)
> rsyslogd now takes the config without choking! I haven't as yet tested the
> FO functionality but fingers crossed it shall work as described.
Any news on this?
Sorry for the huge delay on this one.
Testing with rsyslog-5.8.10-8.el6.x86_64:
Using `/sbin/iptables -I OUTPUT -p tcp --destination logstash_srv1_ip/32 -j REJECT` there appears to be no immediate flip on adding this rule.
On restart it tries the now unavailable ip and then connects to the second:
tcp 0 4469 self_ip:41311 logstash_srv1_ip:514 FIN_WAIT1 0 0 -
tcp 0 0 self_ip:34976 logstash_srv2_ip:514 ESTABLISHED 0 70256902 6188/rsyslogd
On removing the REJECT rule it reconnects the the first server but still keeps the other connection active:
cp 0 0 self_ip:34976 logstash_srv2_ip:514 ESTABLISHED 0 70256902 6188/rsyslogd
tcp 0 0 self_ip:41456 logstash_srv1_ip:514 ESTABLISHED 0 70257181 6188/rsyslogd
tcpdump shows only traffic to and from the first server so it did revert on it returning but just didn't close the other connection.
So for new nodes this seems to work well for FO and back. For current nodes seems they would need a restart to flip across? I'm going to leave a node in a bad state overnight to see if it flips eventually without a restart.
It has just flipped over without a restart and reverted on restore fine. Many thanks for the fixes everyone :)
(In reply to Alister Burrell from comment #9)
> It has just flipped over without a restart and reverted on restore fine.
> Many thanks for the fixes everyone :)
Thanks for getting back. I'll close this as a duplicate then.
*** This bug has been marked as a duplicate of bug 927405 ***