Bug 983986 - Rsyslog hangs on startup when failover remote hosts are configured
Rsyslog hangs on startup when failover remote hosts are configured
Status: CLOSED DUPLICATE of bug 927405
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: rsyslog (Show other bugs)
6.4
Unspecified Unspecified
unspecified Severity unspecified
: rc
: ---
Assigned To: Tomas Heinrich
BaseOS QE Security Team
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2013-07-12 09:15 EDT by Alister Burrell
Modified: 2016-09-20 00:51 EDT (History)
3 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-09-10 11:20:03 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)


External Trackers
Tracker ID Priority Status Summary Last Updated
CentOS 6372 None None None Never

  None (edit)
Description Alister Burrell 2013-07-12 09:15:31 EDT
Description of problem:

Configuring failover remote hosts in rsyslog.conf causes rsyslog to hang on startup.

Debug shows:

7775.447762377:7f124a8b5700: deque option i, optarg '/var/run/syslogd.pid'
7775.447771451:7f124a8b5700: Checking pidfile.
(it creates pidfile normally)
and hangs for 5 minutes and dies without any output.

Version-Release number of selected component (if applicable):

rsyslog-5.8.10-6.el6.x86_64

How reproducible:

Repeatable every time

Steps to Reproduce:
1. Stop rsyslog
2. Configure a failover remote syslog host as per http://wiki.rsyslog.com/index.php/FailoverSyslogServer
3. Start syslog

Actual results:

Hangs

Expected results:

Normal operation with failover on remote server down when using TCP

Additional info:
Comment 2 Tomas Heinrich 2013-07-12 09:58:03 EDT
This is likely a duplicate of bug #927405.

The perceived hang is the parent process waiting for a child that has crashed due to the bug mentioned above.

It is trigger by a configuration like this:

[filter] [action]
$[directive]
& [action]

Such as the one on the wiki.

When you run the daemon in the debug mode (options '-dn'), it should crash with a segfault. If you can confirm this, please close it as a duplicate of the bug above.
Comment 3 Alister Burrell 2013-07-12 10:32:28 EDT
The bit in my config that seems relevant is the same as you say:
*.* @@primary-host:514
$ActionExecOnlyWhenPreviousIsSuspended on
&@@secondary-host:514
$ActionExecOnlyWhenPreviousIsSuspended off


The debug output is as follows:

8867.112337781:7f22a0132700: template: 'RSYSLOG_TraditionalFileFormat' assigned
8867.112355976:7f22a0132700: action 7 queue: save on shutdown 1, max disk space allowed 0
8867.112378862:7f22a0132700: action 7 queue: type 3, enq-only 0, disk assisted 0, maxFileSz 1048576, lqsize 0, pqsize 0, child 0, full delay 970, light delay 700, deq batch size 16 starting
8867.112401191:7f22a0132700: Action 0x7f22a21e5050: queue 0x7f22a21e5190 created
8867.112424688:7f22a0132700: cfline: '*.* @@primary-host:514'
8867.112441543:7f22a0132700: selector line successfully processed
8867.112458281:7f22a0132700:  - traditional PRI filter
8867.112475197:7f22a0132700: symbolic name: * ==> 255
8867.112497279:7f22a0132700: tried selector action for builtin-file: -2001
8867.112514208:7f22a0132700: tried selector action for builtin-pipe: -2001
8867.112534192:7f22a0132700: caller requested object 'netstrms', not found (iRet -3003)
8867.112551968:7f22a0132700: Requested to load module 'lmnetstrms'
8867.112570693:7f22a0132700: loading module '/lib64/rsyslog/lmnetstrms.so'
8867.113614949:7f22a0132700: doing nsselClassInit
8867.113728003:7f22a0132700: doing nspollClassInit
8867.113750403:7f22a0132700: module of type 2 being loaded.
8867.113767893:7f22a0132700: entry point 'isCompatibleWithFeature' not present in module
8867.113788756:7f22a0132700: source file omfwd.c requested reference for module 'lmnetstrms', reference count now 1
8867.113816191:7f22a0132700: source file omfwd.c requested reference for module 'lmnetstrms', reference count now 2
8867.113836116:7f22a0132700: caller requested object 'tcpclt', not found (iRet -3003)
8867.113854394:7f22a0132700: Requested to load module 'lmtcpclt'
8867.113873671:7f22a0132700: loading module '/lib64/rsyslog/lmtcpclt.so'
8867.115955319:7f22a0132700: module of type 2 being loaded.
8867.115978109:7f22a0132700: entry point 'isCompatibleWithFeature' not present in module
8867.115997611:7f22a0132700: source file omfwd.c requested reference for module 'lmtcpclt', reference count now 1
8867.116020922:7f22a0132700: tried selector action for builtin-fwd: 0
8867.116040389:7f22a0132700: Module builtin-fwd processed this config line.
8867.116063625:7f22a0132700: template: 'RSYSLOG_TraditionalForwardFormat' assigned
8867.116087323:7f22a0132700: action 8 queue: save on shutdown 1, max disk space allowed 0
8867.116113529:7f22a0132700: action 8 queue: type 3, enq-only 0, disk assisted 0, maxFileSz 1048576, lqsize 0, pqsize 0, child 0, full delay 970, light delay 700, deq batch size 16 starting
8867.116136860:7f22a0132700: Action 0x7f22a21e6590: queue 0x7f22a21e66b0 created
8867.116157502:7f22a0132700: cfline: '$ActionExecOnlyWhenPreviousIsSuspended on'
8867.116175588:7f22a0132700: selector line successfully processed
8867.116202063:7f22a0132700: cfline: '&@@seconday-host:514'
8867.116220885:7f22a0132700: tried selector action for builtin-file: -2001
8867.116238454:7f22a0132700: tried selector action for builtin-pipe: -2001
8867.116257508:7f22a0132700: tried selector action for builtin-fwd: 0
8867.116275322:7f22a0132700: Module builtin-fwd processed this config line.
8867.116294831:7f22a0132700: template: 'RSYSLOG_TraditionalForwardFormat' assigned
8867.116325535:7f22a0132700: action 9 queue: save on shutdown 1, max disk space allowed 0
8867.116349828:7f22a0132700: action 9 queue: type 3, enq-only 0, disk assisted 0, maxFileSz 1048576, lqsize 0, pqsize 0, child 0, full delay 970, light delay 700, deq batch size 16 starting
8867.116372773:7f22a0132700: Action 0x7f22a21e6b20: queue 0x7f22a21e6ec0 created
8867.116415893:7f22a0132700: 



Signal 11 (SIGSEGV) occured, execution must be terminated.



8867.116440497:7f22a0132700: Mutex log for all known mutex operations:
8867.116457711:7f22a0132700: If the call trace is empty, you may want to ./configure --enable-rtinst
8867.116475273:7f22a0132700: 

To submit bug reports, visit http://www.rsyslog.com/bugs

8867.116494822:7f22a0132700: 

To submit bug reports, visit http://www.rsyslog.com/bugs

Aborted


Looks like it may be a duplicate, cheers for the other bug info, do you know if a fix is on the way or a fixed stable version I can add to my local repo to use with 6.4 ?
Comment 4 Tomas Heinrich 2013-07-15 05:20:44 EDT
You can get rid of the issue if you drop this patch from the spec file:

Patch5: rsyslog-5.8.10-bz847568.patch

A different problem will reappear upon dropping it but not as severe as the one you're currently having.
Comment 5 RHEL Product and Program Management 2013-10-13 23:09:47 EDT
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.
Comment 6 Alister Burrell 2014-05-16 10:15:00 EDT
Recently re visited this issue as we have upgraded recently.  I can now confirm this is fixed in rsyslog-5.8.10-8.el6.x86_64 :)

rsyslogd now takes the config without choking! I haven't as yet tested the FO functionality but fingers crossed it shall work as described.
Comment 7 Tomas Heinrich 2014-06-25 16:43:39 EDT
(In reply to Alister Burrell from comment #6)
> Recently re visited this issue as we have upgraded recently.  I can now
> confirm this is fixed in rsyslog-5.8.10-8.el6.x86_64 :)

Great.

> rsyslogd now takes the config without choking! I haven't as yet tested the
> FO functionality but fingers crossed it shall work as described.

Any news on this?
Comment 8 Alister Burrell 2014-09-09 11:55:47 EDT
Sorry for the huge delay on this one.

Testing with rsyslog-5.8.10-8.el6.x86_64:

Using `/sbin/iptables -I OUTPUT -p tcp --destination logstash_srv1_ip/32 -j REJECT` there appears to be no immediate flip on adding this rule.  

On restart it tries the now unavailable ip and then connects to the second:

tcp        0   4469 self_ip:41311        logstash_srv1_ip:514          FIN_WAIT1   0          0          -                   
tcp        0      0 self_ip:34976        logstash_srv2_ip:514          ESTABLISHED 0          70256902   6188/rsyslogd       

On removing the REJECT rule it reconnects the the first server but still keeps the other connection active:

cp        0      0 self_ip:34976        logstash_srv2_ip:514          ESTABLISHED 0          70256902   6188/rsyslogd       
tcp        0      0 self_ip:41456        logstash_srv1_ip:514          ESTABLISHED 0          70257181   6188/rsyslogd

tcpdump shows only traffic to and from the first server so it did revert on it returning but just didn't close the other connection.


So for new nodes this seems to work well for FO and back.  For current nodes seems they would need a restart to flip across?  I'm going to leave a node in a bad state overnight to see if it flips eventually without a restart.
Comment 9 Alister Burrell 2014-09-09 12:14:16 EDT
It has just flipped over without a restart and reverted on restore fine.  Many thanks for the fixes everyone :)
Comment 10 Tomas Heinrich 2014-09-10 11:20:03 EDT
(In reply to Alister Burrell from comment #9)
> It has just flipped over without a restart and reverted on restore fine. 
> Many thanks for the fixes everyone :)

Thanks for getting back. I'll close this as a duplicate then.

*** This bug has been marked as a duplicate of bug 927405 ***

Note You need to log in before you can comment on or make changes to this bug.