Bug 463143 - Osad leaves connections in CLOSE_WAIT state.
Summary: Osad leaves connections in CLOSE_WAIT state.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Spacewalk
Classification: Community
Component: Clients
Version: 0.2
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Jan Pazdziora (Red Hat)
QA Contact: Red Hat Satellite QA List
URL:
Whiteboard:
Depends On:
Blocks: space03
Reported: 2008-09-22 08:27 UTC by Jan Pazdziora (Red Hat)
Modified: 2009-09-17 07:01 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2009-09-17 07:01:49 UTC
Embargoed:



Description Jan Pazdziora (Red Hat) 2008-09-22 08:27:21 UTC
-- Description of the problem --

Client system : RHEL5.2 with osad-5.1.0-7.el5 installed.

Satellite : RHN5.1 Satellite running on RHEL4.7 with jabberpy-0.5-0.12.rhn.rhel4 installed.

When osad is running, it holds an open connection to the Satellite on port 5222, in the ESTABLISHED state. If for any reason the client can no longer communicate with the Satellite on port 5222, the connection goes into the CLOSE_WAIT state. When connectivity returns, osad opens a new connection to jabberd on port 5222 and leaves the CLOSE_WAIT connection as it is.
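
For context, CLOSE_WAIT means the remote end has closed its side and the local application has not yet called close() on its socket. The following is a minimal sketch of that situation using plain TCP sockets on localhost. It is not osad code, and the function name demo_half_closed is made up for illustration.

```python
import socket


def demo_half_closed():
    """Show that a peer close leaves our socket open until we close it."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    client.settimeout(5)
    server_side, _ = listener.accept()

    # The "server" goes away, as jabberd does when it is stopped.
    server_side.close()
    listener.close()

    # The client sees end-of-stream as an empty read ...
    data = client.recv(1024)
    # ... but its descriptor stays allocated until close() is called.
    # During this window the kernel reports the socket as CLOSE_WAIT.
    still_open = client.fileno() != -1

    client.close()
    closed_now = client.fileno() == -1
    return data, still_open, closed_now
```

The socket only leaves CLOSE_WAIT once the application closes it, which is why connections accumulate if the read path never does so.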

-- How to reproduce --

- register a RHEL5 client to a RHN Satellite.

- make sure that osad is running on the client and jabberd is running on the Satellite.

- check whether there is an open connection on port 5222:

# netstat -anp | grep 5222
tcp        0      0 <client IP>:34269            <Sat IP>:5222            ESTABLISHED 2748/python

- stop the jabberd service on the Satellite:

# service jabberd stop

- check the network connection state :

# netstat -anp | grep 5222
tcp        1      0 <client IP>:34269            <Sat IP>:5222            CLOSE_WAIT  2748/python

- restart jabberd service and check for connections on port 5222 :

# netstat -anp | grep 5222
tcp        0      0 <client IP>:39093            <Sat IP>:5222            ESTABLISHED 1807/python         
tcp        0      0 <client IP>:34238            <Sat IP>:5222            CLOSE_WAIT  1807/python

We can see that there are now two connections to the Satellite on port 5222. Each time jabberd is restarted, a new connection is opened and the stale ones pile up.
Restarting the osad service on the client system gets rid of these connections.
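
The check in the steps above can be scripted. The following is a minimal sketch that counts connections per state from `netstat -anp` style output, assuming the usual column order (proto, recv-q, send-q, local address, foreign address, state, pid/program). The function name count_by_state and the sample addresses (taken from the 192.0.2.0/24 documentation range) are made up for illustration.

```python
def count_by_state(netstat_output, port=5222):
    """Count TCP connections to the given remote port, grouped by state."""
    counts = {}
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, blank lines, and non-TCP entries.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        foreign, state = fields[4], fields[5]
        # The port is whatever follows the last colon of the foreign address.
        if foreign.rsplit(":", 1)[-1] != str(port):
            continue
        counts[state] = counts.get(state, 0) + 1
    return counts


# Illustrative input only; the addresses are placeholders.
SAMPLE = """\
tcp        0      0 192.0.2.10:39093   192.0.2.1:5222   ESTABLISHED 1807/python
tcp        1      0 192.0.2.10:34269   192.0.2.1:5222   CLOSE_WAIT  1807/python
tcp        0      0 192.0.2.10:22      192.0.2.50:51000 ESTABLISHED 900/sshd
"""

print(count_by_state(SAMPLE))
```

A CLOSE_WAIT count that grows after each jabberd restart would indicate the behaviour described in this report.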

-- Expected results --

These CLOSE_WAIT connections should be closed, leaving only one open connection.
This event sent from IssueTracker by mpoole  [Support Engineering Group]
 issue 217483

--- Additional comment from mpoole on 2008-09-05 10:16:36 EDT ---

The problem appears to be the lack of exception handling in the jabber_lib.py code in the osad package.

In the process method there is the following,

    log_debug(5, "Reading %s bytes from ssl socket" % self.BLOCK_SIZE)
    try:
        data = self._read(self.BLOCK_SIZE)
    except SSL.SSL.SysCallError, e:
        raise SSLError("OpenSSL error; will retry", str(e))

When I set the log level to 5 and changed the above to the following,

    log_debug(5, "Reading %s bytes from ssl socket" % self.BLOCK_SIZE)
    try:
        data = self._read(self.BLOCK_SIZE)
    except:
        log_debug(5, "Error caught:")
        log_debug(5, extract_traceback())

the following is noted in the osad.log file

2008-09-02 18:18:54 jabber_lib.process: before select(); timeout None
2008-09-02 18:18:54 jabber_lib.process: select() returned
2008-09-02 18:18:54 jabber_lib.process: Reading 1024 bytes from ssl socket
2008-09-02 18:18:54 jabber_lib.process: Error caught:
2008-09-02 18:18:54 jabber_lib.process: Traceback (most recent call last):
  File "/usr/share/rhn/osad/jabber_lib.py", line 1037, in process
    data = self._read(self.BLOCK_SIZE)
SysCallError: (-1, 'Unexpected EOF')


on changing the code to be

    log_debug(5, "Reading %s bytes from ssl socket" % self.BLOCK_SIZE)
    try:
        data = self._read(self.BLOCK_SIZE)
    except:
        log_debug(5, "Closing socket")
        self._non_ssl_sock.close()

we get

2008-09-02 18:50:28 jabber_lib.process: Reading 1024 bytes from ssl socket
2008-09-02 18:50:28 jabber_lib.process: Closing socket
2008-09-02 18:50:28 jabber_lib.main: Sleeping 100 seconds
2008-09-02 18:52:08 osad.setup_config: Skipping config setup; counter=1; interval=65
2008-09-02 18:52:08 jabber_lib.setup_connection: Connecting to dhcp-1-221.fab.redhat.com

and the connection restarts cleanly with no lost socket hanging around.
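
The experiment above suggests a general pattern: close the underlying socket on a failed read, then signal the caller to reconnect. The following is a minimal Python 3 sketch of that pattern, not the change that was actually committed. The Connection and ReadFailed names are hypothetical, and a plain recv() stands in for the SSL read in jabber_lib.py.

```python
class ReadFailed(Exception):
    """Hypothetical error type telling the caller to reconnect."""


class Connection:
    BLOCK_SIZE = 1024

    def __init__(self, sock):
        self._sock = sock

    def process(self):
        try:
            data = self._sock.recv(self.BLOCK_SIZE)
        except OSError as e:
            # Close before re-raising so the descriptor is not leaked;
            # in Python 3, ssl.SSLError is a subclass of OSError.
            self._sock.close()
            raise ReadFailed("read error; will reconnect") from e
        if not data:
            # An empty read means the peer closed its side.
            self._sock.close()
            raise ReadFailed("peer closed the connection")
        return data
```

Unlike the bare except in the experiment, this catches only socket-level errors and still raises, so the caller's retry logic keeps working.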

Comment 1 Jan Pazdziora (Red Hat) 2008-09-22 08:30:17 UTC
Change committed to git, commit f140305fcbcfed8a8a343972bbe7a33eee096712.

Comment 3 Miroslav Suchý 2009-09-17 07:01:49 UTC
Spacewalk has been released for some time.

