745684 – [router] write error: Connection refused (111) messages when shuttting down jabberd

Bug 745684 - [router] write error: Connection refused (111) messages when shuttting down jabberd

Summary: [router] write error: Connection refused (111) messages when shuttting down j...

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Satellite 5
Classification:	Red Hat
Component:	Provisioning
Sub Component:
Version:	541
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Stephen Herr
QA Contact:	Lukas Pramuk
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	465198 sat541-triage 834373
TreeView+	depends on / blocked

Reported:	2011-10-13 03:20 UTC by Mark Huth
Modified:	2018-11-28 21:27 UTC (History)
CC List:	6 users (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Clones:	834373 (view as bug list)
Environment:
Last Closed:	2012-05-29 13:48:34 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)
Patch to jabberd init.d script to reverse the order in which the jabber processes are shutdown (399 bytes, patch) 2011-10-13 03:27 UTC, Mark Huth	no flags	Details \| Diff
This patch doesn't modify the ${progs} variable, so a 'service jabberd restart' works correctly now (422 bytes, patch) 2011-10-13 20:02 UTC, Mark Huth	no flags	Details \| Diff
Show Obsolete (1) View All

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Knowledge Base (Legacy)	64876	0	None	None	None	Never
Red Hat Product Errata	RHBA-2012:0697	0	normal	SHIPPED_LIVE	Red Hat Network Satellite server bug fix update	2012-05-29 17:45:19 UTC

Description Mark Huth 2011-10-13 03:20:51 UTC

Description of problem:
Customer noticed that when jabberd service was shutdown, connection refused messages would appear in /var/log/messages

Oct 11 11:23:50 gss-sat jabberd/c2s[3249]: attempting connection to router at 127.0.0.1, port=5347
Oct 11 11:23:50 gss-sat jabberd/c2s[3249]: [5] [router] write error: Connection refused (111)
Oct 11 11:23:50 gss-sat jabberd/c2s[3249]: connection to router closed
Oct 11 11:23:50 gss-sat jabberd/c2s[3249]: attempting reconnect (2 left)
Oct 11 11:23:50 gss-sat jabberd/s2s[3256]: attempting connection to router at 127.0.0.1, port=5347
Oct 11 11:23:50 gss-sat jabberd/s2s[3256]: [6] [router] write error: Connection refused (111)
Oct 11 11:23:50 gss-sat jabberd/s2s[3256]: connection to router closed

Version-Release number of selected component (if applicable):
jabberd-2.2.8-12.el6sat.x86_64
jabberd-2.2.8-12.el5sat

How reproducible:
Always

Steps to Reproduce:
1. Shutdown jabberd 
2. Observe [router] write error: Connection refused (111) in /var/log/messages
3. No such errors appear on startup
 
Actual results:
[router] write error: Connection refused (111) messages

Expected results:
No error messages

Additional info:
Observing the process shutdown sequence:
Terminating jabberd processes ...
Stopping router:                                           [  OK  ]
Stopping sm:                                               [  OK  ]
Stopping c2s:                                              [  OK  ]
Stopping s2s:                                              [  OK  ]

... and stracing the router and s2s processes showed that the router process was terminated before the s2s and c2s processes.  It appeared that the s2s and c2s processes were trying to write data to the router process, but it had already been terminated.

# strace -ttT -f -p $(pidof router) -v -s2048
Process 8787 attached - interrupt to quit
...
11:42:45.475703 close(8)                = 0 <0.000040>
11:42:45.475831 close(4)                = 0 <0.000023>
11:42:45.476011 close(3)                = 0 <0.000025>
11:42:45.476297 exit_group(0)           = ?
Process 8787 detached

# strace -ttT -f -p $(pidof s2s) -v -s2048
Process 8810 attached - interrupt to quit
...
11:42:47.476619 sendto(3, "<157>Oct 11 11:42:47 jabberd/s2s[8810]: attempting connection to router at 127.0.0.1, port=5347", 95, MSG_NOSIGNAL, NULL, 0) = 95 <0.000109>
11:42:47.477831 socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 6 <0.000169>
11:42:47.478058 fcntl(6, F_GETFL)       = 0x2 (flags O_RDWR) <0.000020>
11:42:47.478132 fcntl(6, F_SETFL, O_RDWR|O_NONBLOCK) = 0 <0.000020>
11:42:47.478202 connect(6, {sa_family=AF_INET, sin_port=htons(5347), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress) <0.000058>
11:42:47.478426 fcntl(6, F_GETFL)       = 0x802 (flags O_RDWR|O_NONBLOCK) <0.000019>
11:42:47.478495 fcntl(6, F_SETFL, O_RDWR|O_NONBLOCK) = 0 <0.000020>
11:42:47.479061 sendto(6, "<?xml version='1.0'?><stream:stream xmlns:stream='http://etherx.jabber.org/streams' version='1.0'>", 98, 0, NULL, 0) = -1 ECONNREFUSED (Connection refused) <0.000020>
11:42:47.479186 sendto(3, "<157>Oct 11 11:42:47 jabberd/s2s[8810]: [6] [router] write error: Connection refused (111)", 90, MSG_NOSIGNAL, NULL, 0) = 90 <0.000089>

Comment 1 Mark Huth 2011-10-13 03:27:23 UTC

Created attachment 527818 [details]
Patch to jabberd init.d script to reverse the order in which the jabber processes are shutdown

Testing the patch

[root@gss-sat ~]# service jabberd stop
Terminating jabberd processes ...
Stopping router:                                           [  OK  ]
Stopping sm:                                               [  OK  ]
Stopping c2s:                                              [  OK  ]
Stopping s2s:                                              [  OK  ]
... router process is stopped before c2s and s2s and error messages appear in /var/log/messages


*** apply patch ***


[root@gss-sat ~]# service jabberd start
Initializing jabberd processes ...
Starting router:                                           [  OK  ]
Starting sm:                                               [  OK  ]
Starting c2s:                                              [  OK  ]
Starting s2s:                                              [  OK  ]
[root@gss-sat ~]# service jabberd stop
Terminating jabberd processes ...
Stopping s2s:                                              [  OK  ]
Stopping c2s:                                              [  OK  ]
Stopping sm:                                               [  OK  ]
Stopping router:                                           [  OK  ]
[root@gss-sat ~]# 

Processes shutdown in the reverse order now and no connection refused errors in /var/log/messages:

Oct 11 12:10:01 gss-sat jabberd/s2s[16347]: shutting down
Oct 11 12:10:01 gss-sat jabberd/s2s[16347]: [8] [10.65.211.14, port=5269] disconnect, packets: 2
Oct 11 12:10:01 gss-sat jabberd/s2s[16347]: [9] [10.65.211.14, port=52533] disconnect, packets: 3
Oct 11 12:10:01 gss-sat jabberd/s2s[16347]: connection to router closed
Oct 11 12:10:01 gss-sat jabberd/router[16326]: [127.0.0.1, port=37406] disconnect
Oct 11 12:10:01 gss-sat jabberd/router[16326]: [s2s] default route offline
Oct 11 12:10:01 gss-sat jabberd/router[16326]: [s2s] offline
Oct 11 12:10:03 gss-sat jabberd/c2s[16340]: shutting down
Oct 11 12:10:03 gss-sat jabberd/c2s[16340]: [7] [10.65.208.11, port=56650] disconnect jid=rhn-dispatcher-sat.pnq.redhat.com/superclient, packets: 30
Oct 11 12:10:03 gss-sat jabberd/router[16326]: [127.0.0.1, port=37405] disconnect
Oct 11 12:10:03 gss-sat jabberd/router[16326]: [c2s] offline
Oct 11 12:10:03 gss-sat jabberd/sm[16333]: session ended: jid=rhn-dispatcher-sat.pnq.redhat.com/superclient
Oct 11 12:10:04 gss-sat jabberd/sm[16333]: shutting down
Oct 11 12:10:04 gss-sat jabberd/router[16326]: [127.0.0.1, port=37404] disconnect
Oct 11 12:10:04 gss-sat jabberd/router[16326]: [gss-sat.gsslab.pnq.redhat.com] offline
Oct 11 12:10:05 gss-sat jabberd/router[16326]: shutting down

Comment 2 Mark Huth 2011-10-13 20:02:53 UTC

Created attachment 528096 [details]
This patch doesn't modify the ${progs} variable, so a 'service jabberd restart' works correctly now

Comment 3 Mark Huth 2012-02-28 22:55:24 UTC

Any chance we could get acknowledgement from engineering that this is planned to be in the next release of satellite?  Its not a serious problem, but one we see pop up quite frequently and it can be disconcerting for customers and support staff alike.  No-one likes to see [FAILED] when services are shutting down, like so:

root@sat1us2 ~]# service jabberd stop
Terminating jabberd processes ...
Stopping router:                                           [  OK  ]
Stopping sm:                                               [  OK  ]
Stopping c2s:                                              [  OK  ]
Stopping s2s:                                              [FAILED]

Reversing the shutdown order of the jabberd processes fixes this.

Comment 4 Tomas Lestach 2012-02-29 08:21:42 UTC

Patch from Comment#2 look good.

Comment 5 Stephen Herr 2012-03-28 17:28:44 UTC

Patch from Comment#2 committed to internal branch: 238592f188d707c998a9a4310e5f6ca2aa0751ad

We do not have this scheduled for a erratum yet but the fix has been committed so it can easily be pulled into one.

Comment 9 errata-xmlrpc 2012-05-29 13:48:34 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0697.html

Note You need to log in before you can comment on or make changes to this bug.