Bug 461274 - Osad leaves connections in CLOSE_WAIT state.
Osad leaves connections in CLOSE_WAIT state.
Status: CLOSED CURRENTRELEASE
Product: Red Hat Satellite 5
Classification: Red Hat
Component: Other (Show other bugs)
511
All Linux
medium Severity medium
: ---
: ---
Assigned To: Jan Pazdziora
Garik Khachikyan
:
Depends On:
Blocks: 456985 461291
  Show dependency treegraph
 
Reported: 2008-09-05 10:03 EDT by Issue Tracker
Modified: 2015-01-04 16:56 EST (History)
6 users (show)

See Also:
Fixed In Version: sat530
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2009-09-10 16:27:55 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description Issue Tracker 2008-09-05 10:03:40 EDT
Escalated to Bugzilla from IssueTracker
Comment 1 Issue Tracker 2008-09-05 10:03:42 EDT
Hello,

This ticket is a follow up of IT#99018.

-- Description of the problem --

Client system : RHEL5.2 with osad-5.1.0-7.el5 installed.

Satellite : RHN5.1 Satellite running on a RHEL4.7 with  jabberpy-0.5-0.12.rhn.rhel4 installed.

When osad is running, there's an opened connection to Satellite on port 5222. State of this connection is ESTABLISHED. If for any reason client cannot communicate with Satellite on port 5222, then the connection goes into CLOSE_WAIT state. When the network connection is back, osad opens a new connection with jabberd on port 5222, leaving the CLOSE_WAIT connection as it is.

-- How to reproduce --

- register a RHEL5 client to a RHN Satellite.

- make sure that osad is running on the client and jabberd is running on the Satellite.

- check if there's an opened connection on port 5222

# netstat -anp | grep 5222
tcp        0      0 <client IP>:34269            <Sat IP>:5222            ESTABLISHED 2748/python

- switch off jabberd service

#service jabberd stop

- check the network connection state :

# netstat -anp | grep 5222
tcp        1      0 <client IP>:34269            <Sat IP>:5222            CLOSE_WAIT  2748/python

- restart jabberd service and check for connections on port 5222 :

# netstat -anp | grep 5222
tcp        0      0 <client IP>:39093            <Sat IP>:5222            ESTABLISHED 1807/python         
tcp        0      0 <client IP>:34238            <Sat IP>:5222            CLOSE_WAIT  1807/python

We can see that there are two connections to Satellite on port 5222. Each time jabberd will be restarted a new connection will be opened, stacked with the existing ones.
Restarting osad service on client system helps to get rid on these connections.

-- Expected results --

These CLOSE_WAIT connections should be closed, leaving only one opened.
This event sent from IssueTracker by mpoole  [Support Engineering Group]
 issue 217483
Comment 2 Martin Poole 2008-09-05 10:16:36 EDT
Follow-on from BZ#201166

The problem appears to be the lack of exception handling in the jabber_lib.py code in the osad package.

In the process method there is the following,

    log_debug(5, "Reading %s bytes from ssl socket" % self.BLOCK_SIZE)
    try:
        data = self._read(self.BLOCK_SIZE)
    except SSL.SSL.SysCallError, e:
        raise SSLError("OpenSSL error; will retry", str(e))

when I set the log level to 5 and changed the above to the following

    log_debug(5, "Reading %s bytes from ssl socket" % self.BLOCK_SIZE)
    try:
        data = self._read(self.BLOCK_SIZE)
    except:
        log_debug(5, "Error caught:")
        log_debug(5, extract_traceback())

the following is noted in the osad.log file

2008-09-02 18:18:54 jabber_lib.process: before select(); timeout None
2008-09-02 18:18:54 jabber_lib.process: select() returned
2008-09-02 18:18:54 jabber_lib.process: Reading 1024 bytes from ssl socket
2008-09-02 18:18:54 jabber_lib.process: Error caught:
2008-09-02 18:18:54 jabber_lib.process: Traceback (most recent call last):
  File "/usr/share/rhn/osad/jabber_lib.py", line 1037, in process
    data = self._read(self.BLOCK_SIZE)
SysCallError: (-1, 'Unexpected EOF')


on changing the code to be

    log_debug(5, "Reading %s bytes from ssl socket" % self.BLOCK_SIZE)
    try:
        data = self._read(self.BLOCK_SIZE)
    except:
        log_debug(5, "Closing socket")
        self._non_ssl_sock.close()

we get

2008-09-02 18:50:28 jabber_lib.process: Reading 1024 bytes from ssl socket
2008-09-02 18:50:28 jabber_lib.process: Closing socket
2008-09-02 18:50:28 jabber_lib.main: Sleeping 100 seconds
2008-09-02 18:52:08 osad.setup_config: Skipping config setup; counter=1; interval=65
2008-09-02 18:52:08 jabber_lib.setup_connection: Connecting to dhcp-1-221.fab.redhat.com

and the connection restarts cleanly with no lost socket hanging around.
Comment 3 Clifford Perry 2008-09-09 11:15:36 EDT
Adding to Sep Customer backlog tracking bug.
Comment 4 Jan Pazdziora 2008-09-22 04:28:56 EDT
Filed upstream (Spacewalk) bug 463143.
Comment 5 Jan Pazdziora 2008-09-22 04:31:14 EDT
Code change committed in upstream (bug 463143).
Comment 6 Jan Pazdziora 2008-12-02 14:11:38 EST
With

/mnt/redhat/devel/candidate-trees/Satellite-5.3.0-RHEL5-re20081128.0
/mnt/redhat/devel/candidate-trees/Satellite-5.3.0-RHEL4-re20081128.0

available, moving ON_QA per Cliff's advice.
Comment 7 Jeff Browning 2009-04-27 17:32:13 EDT
Need more information to retest this in 530.

After installing osad on the client and jabberd on the satellite, I followed the instructions in the reproduction steps.

Even though both services were running, I could not get the 5222 port to display any status other than LISTENING.
Comment 8 Jan Pazdziora 2009-04-28 02:34:43 EDT
(In reply to comment #7)
> 
> Even though both services were running, I could not get the 5222 port to
> display any status other than LISTENING.  

LISTENING is weird. LISTENING means the connection was not made at all. So it looks like you do not have the thing configured / setup yet, to even start the reproduction steps.

Please check logs or anything of interest on both ends.
Comment 9 Garik Khachikyan 2009-06-19 04:54:19 EDT
# REOPEN for Satellite-5.3.0-RHEL4-20090616.1

The issue is fixed for a server RHEL5.3 but still exists for server RHEL4.8 installed with a Satellite-5.3.0-RHEL4-20090616.1/tree-x86_64
(the CLOSE_WAIT exists after jabberd restart).

How to reproduce:
1. stop osad on client side.
2. stop jabberd on server side.
3. register client to server.
4. start osad on client side.
5. start jabberd on server side.
6. wait approx. 2 minutes (follow by server netstat-s to see connection established to client)
7. restart jabberd on server side.
8. wait approx. 2 minutes (follow by server netstat-s to see a new connection re-established to client)
9. !!! - on client side two connections are there, one is on CLOSE_WAIT state.

I still have the servers / client to reproduce. Please feel free to refer for accessing to them (avail till next Mon I think).

Versions:
server RHEL5.3 - jabberpy-0.5-0.15.el5 
server RHEL4.8 - jabberpy-0.5-0.15.el4 
client RHEL5.2 - osad-5.1.0-7.el5
Comment 10 Jan Pazdziora 2009-06-19 05:14:00 EDT
The machines are:

sun-v60x-01.rhts.bos.redhat.com - server RHEL4.8
pogo-cn1100-01.rhts.bos.redhat.com - server RHEL 5.3
dell-pem710-01.rhts.eng.bos.redhat.com - client with RHEL Server 5.2 version
Comment 11 Jan Pazdziora 2009-06-19 05:26:03 EDT
Garik, the fix went into the *client* package, osad. The machine dell-pem710-01 still has the original version installed, osad-5.1.0-7.el5. The latest version in RHN Tools channels on webqa is osad-5.9.10-4.el5sat.noarch and they contain the code change. Please test with latest packages, on client as well. Thank you.
Comment 12 Garik Khachikyan 2009-06-19 11:48:06 EDT
# VERIFIED

The osad version osad-5.9.10-4.el5sat.noarch performs quite well with connections closing in jabberd restarts (i.e. no more CLOSE_WAIT-s).

Checked on: RHEL 5.3 (x86_64) and RHEL 4.8 (i386)
Comment 13 Petr Sklenar 2009-08-06 12:10:00 EDT
checked again with

client: x86_64 rhel54-20090805
[root@nec-em14 ~]# netstat -anp | grep 5222
tcp        0      0 10.16.64.156:54457          10.16.65.162:5222           ESTABLISHED 26470/python        

>>satellite: jabberd manually stopped

[root@nec-em14 ~]# netstat -anp | grep 5222
tcp        1      0 10.16.64.156:54457          10.16.65.162:5222           CLOSE_WAIT  26470/python        
[root@nec-em14 ~]# netstat -anp | grep 5222

>>satellite: jabberd manually started

[root@nec-em14 ~]# netstat -anp | grep 5222
tcp      548      0 10.16.64.156:49151          10.16.65.162:5222           ESTABLISHED 26470/python        
[root@nec-em14 ~]# rpm -q osad
osad-5.9.10-5.el5sat

--
>> Osa status ping works
Comment 15 Brandon Perkins 2009-09-10 16:27:55 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-1434.html

Note You need to log in before you can comment on or make changes to this bug.