Bug 745684 - [router] write error: Connection refused (111) messages when shuttting down jabberd
[router] write error: Connection refused (111) messages when shuttting down j...
Status: CLOSED ERRATA
Product: Red Hat Satellite 5
Classification: Red Hat
Component: Provisioning (Show other bugs)
541
All Linux
medium Severity medium
: ---
: ---
Assigned To: Stephen Herr
Lukas Pramuk
: EasyFix, Patch
Depends On:
Blocks: 465198 sat541-triage 834373
  Show dependency treegraph
 
Reported: 2011-10-12 23:20 EDT by Mark Huth
Modified: 2012-06-26 05:51 EDT (History)
6 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 834373 (view as bug list)
Environment:
Last Closed: 2012-05-29 09:48:34 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:


Attachments (Terms of Use)
Patch to jabberd init.d script to reverse the order in which the jabber processes are shutdown (399 bytes, patch)
2011-10-12 23:27 EDT, Mark Huth
no flags Details | Diff
This patch doesn't modify the ${progs} variable, so a 'service jabberd restart' works correctly now (422 bytes, patch)
2011-10-13 16:02 EDT, Mark Huth
no flags Details | Diff


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Knowledge Base (Legacy) 64876 None None None Never

  None (edit)
Description Mark Huth 2011-10-12 23:20:51 EDT
Description of problem:
Customer noticed that when jabberd service was shutdown, connection refused messages would appear in /var/log/messages

Oct 11 11:23:50 gss-sat jabberd/c2s[3249]: attempting connection to router at 127.0.0.1, port=5347
Oct 11 11:23:50 gss-sat jabberd/c2s[3249]: [5] [router] write error: Connection refused (111)
Oct 11 11:23:50 gss-sat jabberd/c2s[3249]: connection to router closed
Oct 11 11:23:50 gss-sat jabberd/c2s[3249]: attempting reconnect (2 left)
Oct 11 11:23:50 gss-sat jabberd/s2s[3256]: attempting connection to router at 127.0.0.1, port=5347
Oct 11 11:23:50 gss-sat jabberd/s2s[3256]: [6] [router] write error: Connection refused (111)
Oct 11 11:23:50 gss-sat jabberd/s2s[3256]: connection to router closed

Version-Release number of selected component (if applicable):
jabberd-2.2.8-12.el6sat.x86_64
jabberd-2.2.8-12.el5sat

How reproducible:
Always

Steps to Reproduce:
1. Shutdown jabberd 
2. Observe [router] write error: Connection refused (111) in /var/log/messages
3. No such errors appear on startup
 
Actual results:
[router] write error: Connection refused (111) messages

Expected results:
No error messages

Additional info:
Observing the process shutdown sequence:
Terminating jabberd processes ...
Stopping router:                                           [  OK  ]
Stopping sm:                                               [  OK  ]
Stopping c2s:                                              [  OK  ]
Stopping s2s:                                              [  OK  ]

... and stracing the router and s2s processes showed that the router process was terminated before the s2s and c2s processes.  It appeared that the s2s and c2s processes were trying to write data to the router process, but it had already been terminated.

# strace -ttT -f -p $(pidof router) -v -s2048
Process 8787 attached - interrupt to quit
...
11:42:45.475703 close(8)                = 0 <0.000040>
11:42:45.475831 close(4)                = 0 <0.000023>
11:42:45.476011 close(3)                = 0 <0.000025>
11:42:45.476297 exit_group(0)           = ?
Process 8787 detached

# strace -ttT -f -p $(pidof s2s) -v -s2048
Process 8810 attached - interrupt to quit
...
11:42:47.476619 sendto(3, "<157>Oct 11 11:42:47 jabberd/s2s[8810]: attempting connection to router at 127.0.0.1, port=5347", 95, MSG_NOSIGNAL, NULL, 0) = 95 <0.000109>
11:42:47.477831 socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 6 <0.000169>
11:42:47.478058 fcntl(6, F_GETFL)       = 0x2 (flags O_RDWR) <0.000020>
11:42:47.478132 fcntl(6, F_SETFL, O_RDWR|O_NONBLOCK) = 0 <0.000020>
11:42:47.478202 connect(6, {sa_family=AF_INET, sin_port=htons(5347), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress) <0.000058>
11:42:47.478426 fcntl(6, F_GETFL)       = 0x802 (flags O_RDWR|O_NONBLOCK) <0.000019>
11:42:47.478495 fcntl(6, F_SETFL, O_RDWR|O_NONBLOCK) = 0 <0.000020>
11:42:47.479061 sendto(6, "<?xml version='1.0'?><stream:stream xmlns:stream='http://etherx.jabber.org/streams' version='1.0'>", 98, 0, NULL, 0) = -1 ECONNREFUSED (Connection refused) <0.000020>
11:42:47.479186 sendto(3, "<157>Oct 11 11:42:47 jabberd/s2s[8810]: [6] [router] write error: Connection refused (111)", 90, MSG_NOSIGNAL, NULL, 0) = 90 <0.000089>
Comment 1 Mark Huth 2011-10-12 23:27:23 EDT
Created attachment 527818 [details]
Patch to jabberd init.d script to reverse the order in which the jabber processes are shutdown

Testing the patch

[root@gss-sat ~]# service jabberd stop
Terminating jabberd processes ...
Stopping router:                                           [  OK  ]
Stopping sm:                                               [  OK  ]
Stopping c2s:                                              [  OK  ]
Stopping s2s:                                              [  OK  ]
... router process is stopped before c2s and s2s and error messages appear in /var/log/messages


*** apply patch ***


[root@gss-sat ~]# service jabberd start
Initializing jabberd processes ...
Starting router:                                           [  OK  ]
Starting sm:                                               [  OK  ]
Starting c2s:                                              [  OK  ]
Starting s2s:                                              [  OK  ]
[root@gss-sat ~]# service jabberd stop
Terminating jabberd processes ...
Stopping s2s:                                              [  OK  ]
Stopping c2s:                                              [  OK  ]
Stopping sm:                                               [  OK  ]
Stopping router:                                           [  OK  ]
[root@gss-sat ~]# 

Processes shutdown in the reverse order now and no connection refused errors in /var/log/messages:

Oct 11 12:10:01 gss-sat jabberd/s2s[16347]: shutting down
Oct 11 12:10:01 gss-sat jabberd/s2s[16347]: [8] [10.65.211.14, port=5269] disconnect, packets: 2
Oct 11 12:10:01 gss-sat jabberd/s2s[16347]: [9] [10.65.211.14, port=52533] disconnect, packets: 3
Oct 11 12:10:01 gss-sat jabberd/s2s[16347]: connection to router closed
Oct 11 12:10:01 gss-sat jabberd/router[16326]: [127.0.0.1, port=37406] disconnect
Oct 11 12:10:01 gss-sat jabberd/router[16326]: [s2s] default route offline
Oct 11 12:10:01 gss-sat jabberd/router[16326]: [s2s] offline
Oct 11 12:10:03 gss-sat jabberd/c2s[16340]: shutting down
Oct 11 12:10:03 gss-sat jabberd/c2s[16340]: [7] [10.65.208.11, port=56650] disconnect jid=rhn-dispatcher-sat@gss-sat.gsslab.pnq.redhat.com/superclient, packets: 30
Oct 11 12:10:03 gss-sat jabberd/router[16326]: [127.0.0.1, port=37405] disconnect
Oct 11 12:10:03 gss-sat jabberd/router[16326]: [c2s] offline
Oct 11 12:10:03 gss-sat jabberd/sm[16333]: session ended: jid=rhn-dispatcher-sat@gss-sat.gsslab.pnq.redhat.com/superclient
Oct 11 12:10:04 gss-sat jabberd/sm[16333]: shutting down
Oct 11 12:10:04 gss-sat jabberd/router[16326]: [127.0.0.1, port=37404] disconnect
Oct 11 12:10:04 gss-sat jabberd/router[16326]: [gss-sat.gsslab.pnq.redhat.com] offline
Oct 11 12:10:05 gss-sat jabberd/router[16326]: shutting down
Comment 2 Mark Huth 2011-10-13 16:02:53 EDT
Created attachment 528096 [details]
This patch doesn't modify the ${progs} variable, so a 'service jabberd restart' works correctly now
Comment 3 Mark Huth 2012-02-28 17:55:24 EST
Any chance we could get acknowledgement from engineering that this is planned to be in the next release of satellite?  Its not a serious problem, but one we see pop up quite frequently and it can be disconcerting for customers and support staff alike.  No-one likes to see [FAILED] when services are shutting down, like so:

root@sat1us2 ~]# service jabberd stop
Terminating jabberd processes ...
Stopping router:                                           [  OK  ]
Stopping sm:                                               [  OK  ]
Stopping c2s:                                              [  OK  ]
Stopping s2s:                                              [FAILED]

Reversing the shutdown order of the jabberd processes fixes this.
Comment 4 Tomas Lestach 2012-02-29 03:21:42 EST
Patch from Comment#2 look good.
Comment 5 Stephen Herr 2012-03-28 13:28:44 EDT
Patch from Comment#2 committed to internal branch: 238592f188d707c998a9a4310e5f6ca2aa0751ad

We do not have this scheduled for a erratum yet but the fix has been committed so it can easily be pulled into one.
Comment 9 errata-xmlrpc 2012-05-29 09:48:34 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0697.html

Note You need to log in before you can comment on or make changes to this bug.