Bug 745684

Summary: [router] write error: Connection refused (111) messages when shuttting down jabberd
Product: Red Hat Satellite 5 Reporter: Mark Huth <mhuth>
Component: ProvisioningAssignee: Stephen Herr <sherr>
Status: CLOSED ERRATA QA Contact: Lukas Pramuk <lpramuk>
Severity: medium Docs Contact:
Priority: medium    
Version: 541CC: adellape, cperry, jpazdziora, kgay, lpramuk, xdmoon
Target Milestone: ---Keywords: EasyFix, Patch
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 834373 (view as bug list) Environment:
Last Closed: 2012-05-29 13:48:34 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 465198, 677498, 834373    
Attachments:
Description Flags
Patch to jabberd init.d script to reverse the order in which the jabber processes are shutdown
none
This patch doesn't modify the ${progs} variable, so a 'service jabberd restart' works correctly now none

Description Mark Huth 2011-10-13 03:20:51 UTC
Description of problem:
Customer noticed that when jabberd service was shutdown, connection refused messages would appear in /var/log/messages

Oct 11 11:23:50 gss-sat jabberd/c2s[3249]: attempting connection to router at 127.0.0.1, port=5347
Oct 11 11:23:50 gss-sat jabberd/c2s[3249]: [5] [router] write error: Connection refused (111)
Oct 11 11:23:50 gss-sat jabberd/c2s[3249]: connection to router closed
Oct 11 11:23:50 gss-sat jabberd/c2s[3249]: attempting reconnect (2 left)
Oct 11 11:23:50 gss-sat jabberd/s2s[3256]: attempting connection to router at 127.0.0.1, port=5347
Oct 11 11:23:50 gss-sat jabberd/s2s[3256]: [6] [router] write error: Connection refused (111)
Oct 11 11:23:50 gss-sat jabberd/s2s[3256]: connection to router closed

Version-Release number of selected component (if applicable):
jabberd-2.2.8-12.el6sat.x86_64
jabberd-2.2.8-12.el5sat

How reproducible:
Always

Steps to Reproduce:
1. Shutdown jabberd 
2. Observe [router] write error: Connection refused (111) in /var/log/messages
3. No such errors appear on startup
 
Actual results:
[router] write error: Connection refused (111) messages

Expected results:
No error messages

Additional info:
Observing the process shutdown sequence:
Terminating jabberd processes ...
Stopping router:                                           [  OK  ]
Stopping sm:                                               [  OK  ]
Stopping c2s:                                              [  OK  ]
Stopping s2s:                                              [  OK  ]

... and stracing the router and s2s processes showed that the router process was terminated before the s2s and c2s processes.  It appeared that the s2s and c2s processes were trying to write data to the router process, but it had already been terminated.

# strace -ttT -f -p $(pidof router) -v -s2048
Process 8787 attached - interrupt to quit
...
11:42:45.475703 close(8)                = 0 <0.000040>
11:42:45.475831 close(4)                = 0 <0.000023>
11:42:45.476011 close(3)                = 0 <0.000025>
11:42:45.476297 exit_group(0)           = ?
Process 8787 detached

# strace -ttT -f -p $(pidof s2s) -v -s2048
Process 8810 attached - interrupt to quit
...
11:42:47.476619 sendto(3, "<157>Oct 11 11:42:47 jabberd/s2s[8810]: attempting connection to router at 127.0.0.1, port=5347", 95, MSG_NOSIGNAL, NULL, 0) = 95 <0.000109>
11:42:47.477831 socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 6 <0.000169>
11:42:47.478058 fcntl(6, F_GETFL)       = 0x2 (flags O_RDWR) <0.000020>
11:42:47.478132 fcntl(6, F_SETFL, O_RDWR|O_NONBLOCK) = 0 <0.000020>
11:42:47.478202 connect(6, {sa_family=AF_INET, sin_port=htons(5347), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress) <0.000058>
11:42:47.478426 fcntl(6, F_GETFL)       = 0x802 (flags O_RDWR|O_NONBLOCK) <0.000019>
11:42:47.478495 fcntl(6, F_SETFL, O_RDWR|O_NONBLOCK) = 0 <0.000020>
11:42:47.479061 sendto(6, "<?xml version='1.0'?><stream:stream xmlns:stream='http://etherx.jabber.org/streams' version='1.0'>", 98, 0, NULL, 0) = -1 ECONNREFUSED (Connection refused) <0.000020>
11:42:47.479186 sendto(3, "<157>Oct 11 11:42:47 jabberd/s2s[8810]: [6] [router] write error: Connection refused (111)", 90, MSG_NOSIGNAL, NULL, 0) = 90 <0.000089>

Comment 1 Mark Huth 2011-10-13 03:27:23 UTC
Created attachment 527818 [details]
Patch to jabberd init.d script to reverse the order in which the jabber processes are shutdown

Testing the patch

[root@gss-sat ~]# service jabberd stop
Terminating jabberd processes ...
Stopping router:                                           [  OK  ]
Stopping sm:                                               [  OK  ]
Stopping c2s:                                              [  OK  ]
Stopping s2s:                                              [  OK  ]
... router process is stopped before c2s and s2s and error messages appear in /var/log/messages


*** apply patch ***


[root@gss-sat ~]# service jabberd start
Initializing jabberd processes ...
Starting router:                                           [  OK  ]
Starting sm:                                               [  OK  ]
Starting c2s:                                              [  OK  ]
Starting s2s:                                              [  OK  ]
[root@gss-sat ~]# service jabberd stop
Terminating jabberd processes ...
Stopping s2s:                                              [  OK  ]
Stopping c2s:                                              [  OK  ]
Stopping sm:                                               [  OK  ]
Stopping router:                                           [  OK  ]
[root@gss-sat ~]# 

Processes shutdown in the reverse order now and no connection refused errors in /var/log/messages:

Oct 11 12:10:01 gss-sat jabberd/s2s[16347]: shutting down
Oct 11 12:10:01 gss-sat jabberd/s2s[16347]: [8] [10.65.211.14, port=5269] disconnect, packets: 2
Oct 11 12:10:01 gss-sat jabberd/s2s[16347]: [9] [10.65.211.14, port=52533] disconnect, packets: 3
Oct 11 12:10:01 gss-sat jabberd/s2s[16347]: connection to router closed
Oct 11 12:10:01 gss-sat jabberd/router[16326]: [127.0.0.1, port=37406] disconnect
Oct 11 12:10:01 gss-sat jabberd/router[16326]: [s2s] default route offline
Oct 11 12:10:01 gss-sat jabberd/router[16326]: [s2s] offline
Oct 11 12:10:03 gss-sat jabberd/c2s[16340]: shutting down
Oct 11 12:10:03 gss-sat jabberd/c2s[16340]: [7] [10.65.208.11, port=56650] disconnect jid=rhn-dispatcher-sat.pnq.redhat.com/superclient, packets: 30
Oct 11 12:10:03 gss-sat jabberd/router[16326]: [127.0.0.1, port=37405] disconnect
Oct 11 12:10:03 gss-sat jabberd/router[16326]: [c2s] offline
Oct 11 12:10:03 gss-sat jabberd/sm[16333]: session ended: jid=rhn-dispatcher-sat.pnq.redhat.com/superclient
Oct 11 12:10:04 gss-sat jabberd/sm[16333]: shutting down
Oct 11 12:10:04 gss-sat jabberd/router[16326]: [127.0.0.1, port=37404] disconnect
Oct 11 12:10:04 gss-sat jabberd/router[16326]: [gss-sat.gsslab.pnq.redhat.com] offline
Oct 11 12:10:05 gss-sat jabberd/router[16326]: shutting down

Comment 2 Mark Huth 2011-10-13 20:02:53 UTC
Created attachment 528096 [details]
This patch doesn't modify the ${progs} variable, so a 'service jabberd restart' works correctly now

Comment 3 Mark Huth 2012-02-28 22:55:24 UTC
Any chance we could get acknowledgement from engineering that this is planned to be in the next release of satellite?  Its not a serious problem, but one we see pop up quite frequently and it can be disconcerting for customers and support staff alike.  No-one likes to see [FAILED] when services are shutting down, like so:

root@sat1us2 ~]# service jabberd stop
Terminating jabberd processes ...
Stopping router:                                           [  OK  ]
Stopping sm:                                               [  OK  ]
Stopping c2s:                                              [  OK  ]
Stopping s2s:                                              [FAILED]

Reversing the shutdown order of the jabberd processes fixes this.

Comment 4 Tomas Lestach 2012-02-29 08:21:42 UTC
Patch from Comment#2 look good.

Comment 5 Stephen Herr 2012-03-28 17:28:44 UTC
Patch from Comment#2 committed to internal branch: 238592f188d707c998a9a4310e5f6ca2aa0751ad

We do not have this scheduled for a erratum yet but the fix has been committed so it can easily be pulled into one.

Comment 9 errata-xmlrpc 2012-05-29 13:48:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0697.html