-- Description of the problem --

Client system : RHEL5.2 with osad-5.1.0-7.el5 installed.
Satellite : RHN5.1 Satellite running on a RHEL4.7 with jabberpy-0.5-0.12.rhn.rhel4 installed.

When osad is running, there is an open connection to the Satellite on port 5222, in the ESTABLISHED state. If for any reason the client cannot communicate with the Satellite on port 5222, the connection goes into the CLOSE_WAIT state. When the network connection is back, osad opens a new connection to jabberd on port 5222 and leaves the CLOSE_WAIT connection as it is.

-- How to reproduce --

- register a RHEL5 client to a RHN Satellite.
- make sure that osad is running on the client and jabberd is running on the Satellite.
- check if there's an open connection on port 5222
    # netstat -anp | grep 5222
    tcp 0 0 <client IP>:34269 <Sat IP>:5222 ESTABLISHED 2748/python
- switch off jabberd service
    # service jabberd stop
- check the network connection state :
    # netstat -anp | grep 5222
    tcp 1 0 <client IP>:34269 <Sat IP>:5222 CLOSE_WAIT 2748/python
- restart jabberd service and check for connections on port 5222 :
    # netstat -anp | grep 5222
    tcp 0 0 <client IP>:39093 <Sat IP>:5222 ESTABLISHED 1807/python
    tcp 0 0 <client IP>:34238 <Sat IP>:5222 CLOSE_WAIT 1807/python

We can see that there are two connections to the Satellite on port 5222. Each time jabberd is restarted, a new connection is opened and added to the existing ones. Restarting the osad service on the client system gets rid of these connections.

-- Expected results --

These CLOSE_WAIT connections should be closed, leaving only one open.

This event sent from IssueTracker by mpoole [Support Engineering Group]
issue 217483

--- Additional comment from mpoole on 2008-09-05 10:16:36 EDT ---

The problem appears to be the lack of exception handling in the jabber_lib.py code in the osad package.
In the process method there is the following:

    log_debug(5, "Reading %s bytes from ssl socket" % self.BLOCK_SIZE)
    try:
        data = self._read(self.BLOCK_SIZE)
    except SSL.SSL.SysCallError, e:
        raise SSLError("OpenSSL error; will retry", str(e))

When I set the log level to 5 and changed the above to the following

    log_debug(5, "Reading %s bytes from ssl socket" % self.BLOCK_SIZE)
    try:
        data = self._read(self.BLOCK_SIZE)
    except:
        log_debug(5, "Error caught:")
        log_debug(5, extract_traceback())

the following is noted in the osad.log file:

    2008-09-02 18:18:54 jabber_lib.process: before select(); timeout None
    2008-09-02 18:18:54 jabber_lib.process: select() returned
    2008-09-02 18:18:54 jabber_lib.process: Reading 1024 bytes from ssl socket
    2008-09-02 18:18:54 jabber_lib.process: Error caught:
    2008-09-02 18:18:54 jabber_lib.process: Traceback (most recent call last):
      File "/usr/share/rhn/osad/jabber_lib.py", line 1037, in process
        data = self._read(self.BLOCK_SIZE)
    SysCallError: (-1, 'Unexpected EOF')

On changing the code to be

    log_debug(5, "Reading %s bytes from ssl socket" % self.BLOCK_SIZE)
    try:
        data = self._read(self.BLOCK_SIZE)
    except:
        log_debug(5, "Closing socket")
        self._non_ssl_sock.close()

we get

    2008-09-02 18:50:28 jabber_lib.process: Reading 1024 bytes from ssl socket
    2008-09-02 18:50:28 jabber_lib.process: Closing socket
    2008-09-02 18:50:28 jabber_lib.main: Sleeping 100 seconds
    2008-09-02 18:52:08 osad.setup_config: Skipping config setup; counter=1; interval=65
    2008-09-02 18:52:08 jabber_lib.setup_connection: Connecting to dhcp-1-221.fab.redhat.com

and the connection restarts cleanly with no lost socket hanging around.
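For anyone wanting to see the mechanism in isolation: a socket sits in CLOSE_WAIT when the peer has closed its side and the local process has not yet called close() on its own end. The following is only a rough sketch of the same idea using plain sockets in Python 3; it is not the committed change and does not use the SSL wrapper that jabber_lib.py reads from. The helper name read_or_close is hypothetical, and BLOCK_SIZE is just set to the 1024 bytes seen in the log above.

```python
import socket

BLOCK_SIZE = 1024  # assumption: same read size as reported in osad.log


def read_or_close(sock, block_size=BLOCK_SIZE):
    """Read up to block_size bytes from sock (hypothetical helper).

    If the read fails or the peer has closed the connection (empty read),
    close the local socket so it does not linger in CLOSE_WAIT.
    Returns the bytes read, or None if the socket was closed.
    """
    try:
        data = sock.recv(block_size)
    except OSError:
        # Treat any socket-level failure like an EOF from the peer.
        data = b""
    if not data:
        sock.close()
        return None
    return data


if __name__ == "__main__":
    # A connected pair stands in for the osad <-> jabberd connection.
    client, server = socket.socketpair()
    server.sendall(b"hello")
    print(read_or_close(client))  # data is returned while the peer is up
    server.close()                # simulates "service jabberd stop"
    print(read_or_close(client))  # peer is gone, local end is closed
    print(client.fileno())        # a closed socket has no file descriptor
```

The caller can then treat a None return as the signal to sleep and reconnect, which matches the "Closing socket" / "Sleeping 100 seconds" / "Connecting to ..." sequence in the log above.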
Change committed to git, commit f140305fcbcfed8a8a343972bbe7a33eee096712.
Spacewalk has been released for some time.