Description of problem: Customer noticed that when jabberd service was shutdown, connection refused messages would appear in /var/log/messages Oct 11 11:23:50 gss-sat jabberd/c2s[3249]: attempting connection to router at 127.0.0.1, port=5347 Oct 11 11:23:50 gss-sat jabberd/c2s[3249]: [5] [router] write error: Connection refused (111) Oct 11 11:23:50 gss-sat jabberd/c2s[3249]: connection to router closed Oct 11 11:23:50 gss-sat jabberd/c2s[3249]: attempting reconnect (2 left) Oct 11 11:23:50 gss-sat jabberd/s2s[3256]: attempting connection to router at 127.0.0.1, port=5347 Oct 11 11:23:50 gss-sat jabberd/s2s[3256]: [6] [router] write error: Connection refused (111) Oct 11 11:23:50 gss-sat jabberd/s2s[3256]: connection to router closed Version-Release number of selected component (if applicable): jabberd-2.2.8-12.el6sat.x86_64 jabberd-2.2.8-12.el5sat How reproducible: Always Steps to Reproduce: 1. Shutdown jabberd 2. Observe [router] write error: Connection refused (111) in /var/log/messages 3. No such errors appear on startup Actual results: [router] write error: Connection refused (111) messages Expected results: No error messages Additional info: Observing the process shutdown sequence: Terminating jabberd processes ... Stopping router: [ OK ] Stopping sm: [ OK ] Stopping c2s: [ OK ] Stopping s2s: [ OK ] ... and stracing the router and s2s processes showed that the router process was terminated before the s2s and c2s processes. It appeared that the s2s and c2s processes were trying to write data to the router process, but it had already been terminated. # strace -ttT -f -p $(pidof router) -v -s2048 Process 8787 attached - interrupt to quit ... 11:42:45.475703 close(8) = 0 <0.000040> 11:42:45.475831 close(4) = 0 <0.000023> 11:42:45.476011 close(3) = 0 <0.000025> 11:42:45.476297 exit_group(0) = ? Process 8787 detached # strace -ttT -f -p $(pidof s2s) -v -s2048 Process 8810 attached - interrupt to quit ... 11:42:47.476619 sendto(3, "<157>Oct 11 11:42:47 jabberd/s2s[8810]: attempting connection to router at 127.0.0.1, port=5347", 95, MSG_NOSIGNAL, NULL, 0) = 95 <0.000109> 11:42:47.477831 socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 6 <0.000169> 11:42:47.478058 fcntl(6, F_GETFL) = 0x2 (flags O_RDWR) <0.000020> 11:42:47.478132 fcntl(6, F_SETFL, O_RDWR|O_NONBLOCK) = 0 <0.000020> 11:42:47.478202 connect(6, {sa_family=AF_INET, sin_port=htons(5347), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress) <0.000058> 11:42:47.478426 fcntl(6, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK) <0.000019> 11:42:47.478495 fcntl(6, F_SETFL, O_RDWR|O_NONBLOCK) = 0 <0.000020> 11:42:47.479061 sendto(6, "<?xml version='1.0'?><stream:stream xmlns:stream='http://etherx.jabber.org/streams' version='1.0'>", 98, 0, NULL, 0) = -1 ECONNREFUSED (Connection refused) <0.000020> 11:42:47.479186 sendto(3, "<157>Oct 11 11:42:47 jabberd/s2s[8810]: [6] [router] write error: Connection refused (111)", 90, MSG_NOSIGNAL, NULL, 0) = 90 <0.000089>
Created attachment 527818 [details] Patch to jabberd init.d script to reverse the order in which the jabber processes are shutdown Testing the patch [root@gss-sat ~]# service jabberd stop Terminating jabberd processes ... Stopping router: [ OK ] Stopping sm: [ OK ] Stopping c2s: [ OK ] Stopping s2s: [ OK ] ... router process is stopped before c2s and s2s and error messages appear in /var/log/messages *** apply patch *** [root@gss-sat ~]# service jabberd start Initializing jabberd processes ... Starting router: [ OK ] Starting sm: [ OK ] Starting c2s: [ OK ] Starting s2s: [ OK ] [root@gss-sat ~]# service jabberd stop Terminating jabberd processes ... Stopping s2s: [ OK ] Stopping c2s: [ OK ] Stopping sm: [ OK ] Stopping router: [ OK ] [root@gss-sat ~]# Processes shutdown in the reverse order now and no connection refused errors in /var/log/messages: Oct 11 12:10:01 gss-sat jabberd/s2s[16347]: shutting down Oct 11 12:10:01 gss-sat jabberd/s2s[16347]: [8] [10.65.211.14, port=5269] disconnect, packets: 2 Oct 11 12:10:01 gss-sat jabberd/s2s[16347]: [9] [10.65.211.14, port=52533] disconnect, packets: 3 Oct 11 12:10:01 gss-sat jabberd/s2s[16347]: connection to router closed Oct 11 12:10:01 gss-sat jabberd/router[16326]: [127.0.0.1, port=37406] disconnect Oct 11 12:10:01 gss-sat jabberd/router[16326]: [s2s] default route offline Oct 11 12:10:01 gss-sat jabberd/router[16326]: [s2s] offline Oct 11 12:10:03 gss-sat jabberd/c2s[16340]: shutting down Oct 11 12:10:03 gss-sat jabberd/c2s[16340]: [7] [10.65.208.11, port=56650] disconnect jid=rhn-dispatcher-sat.pnq.redhat.com/superclient, packets: 30 Oct 11 12:10:03 gss-sat jabberd/router[16326]: [127.0.0.1, port=37405] disconnect Oct 11 12:10:03 gss-sat jabberd/router[16326]: [c2s] offline Oct 11 12:10:03 gss-sat jabberd/sm[16333]: session ended: jid=rhn-dispatcher-sat.pnq.redhat.com/superclient Oct 11 12:10:04 gss-sat jabberd/sm[16333]: shutting down Oct 11 12:10:04 gss-sat jabberd/router[16326]: [127.0.0.1, port=37404] disconnect Oct 11 12:10:04 gss-sat jabberd/router[16326]: [gss-sat.gsslab.pnq.redhat.com] offline Oct 11 12:10:05 gss-sat jabberd/router[16326]: shutting down
Created attachment 528096 [details] This patch doesn't modify the ${progs} variable, so a 'service jabberd restart' works correctly now
Any chance we could get acknowledgement from engineering that this is planned to be in the next release of satellite? Its not a serious problem, but one we see pop up quite frequently and it can be disconcerting for customers and support staff alike. No-one likes to see [FAILED] when services are shutting down, like so: root@sat1us2 ~]# service jabberd stop Terminating jabberd processes ... Stopping router: [ OK ] Stopping sm: [ OK ] Stopping c2s: [ OK ] Stopping s2s: [FAILED] Reversing the shutdown order of the jabberd processes fixes this.
Patch from Comment#2 look good.
Patch from Comment#2 committed to internal branch: 238592f188d707c998a9a4310e5f6ca2aa0751ad We do not have this scheduled for a erratum yet but the fix has been committed so it can easily be pulled into one.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2012-0697.html