Hi,

RHN osad does not seem to clean up connections/sockets properly after the connection gets aborted for some reason; it does not clean up before opening a new connection.

The versions used for this:
  osad-0.9-2.rhel3 (also tested with osad-0.9-5.rhel3)
  jabberpy-0.5-0.7.rhn.rhel3

If we get disconnected from the network a few times we will get:

[root@lthpjstst rhn]# netstat -np | grep :5222
tcp        0      0 10.230.244.51:32955    10.230.52.10:5222      CLOSE_WAIT  27607/python
tcp        0      0 10.230.244.51:32953    10.230.52.10:5222      CLOSE_WAIT  27607/python
tcp        0      0 10.230.244.51:32959    10.230.52.10:5222      ESTABLISHED 27607/python
tcp        0      0 10.230.244.51:32957    10.230.52.10:5222      CLOSE_WAIT  27607/python
[root@lthpjstst rhn]# ps -ef | grep osad
root     27607     1  0 13:15 pts/0    00:00:00 python /usr/sbin/osad --pid-file /var/run/osad.pid
root     28713 15029  0 13:23 pts/0    00:00:00 grep osad
[root@lthpjstst rhn]# rpm -q osad
osad-0.9-5.rhel3
[root@lpgace11a root]# uname -a
Linux lpgace11a 2.4.21-32.0.1.ELsmp #1 SMP Tue May 17 17:52:23 EDT 2005 i686 i686 i386 GNU/Linux
[root@lpgace11a root]#

The problem is always reproducible. To reproduce the issue we just need to restart the services a few times, leaving some time in between to allow osad to sleep and retry the jabber connection.

Please let us know if you require more information.
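For illustration, here is a minimal, hypothetical Python sketch of the leak pattern being reported; the class and names are made up for this example, not taken from the osad source. A client that keeps a reference to its old socket and reconnects without closing it leaves the server-closed descriptor stuck in CLOSE_WAIT:

import socket
import time

JABBER_HOST = "10.230.52.10"   # server address taken from the netstat output above
JABBER_PORT = 5222

class LeakyClient:
    # Stands in for the jabber client; it keeps references to its old
    # sockets (as the real client effectively does), so Python never
    # garbage-collects and closes them.
    def __init__(self):
        self.sock = None
        self._stale = []

    def connect(self):
        if self.sock is not None:
            # BUG: the previous socket is neither shut down nor closed.
            # Once the server drops its end, this fd sits in CLOSE_WAIT
            # for the life of the process.
            self._stale.append(self.sock)
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.sock.connect((JABBER_HOST, JABBER_PORT))

client = LeakyClient()
client.connect()
while True:
    try:
        if not client.sock.recv(4096):       # empty read: peer closed the stream
            raise socket.error("peer closed")
    except socket.error:
        time.sleep(150)                      # osad's sleep-and-retry interval
        client.connect()                     # reconnect without any cleanup

Each pass through the except branch adds one CLOSE_WAIT socket, which matches the netstat output above: one ESTABLISHED connection plus one stale entry per reconnect.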
*** Bug 203731 has been marked as a duplicate of this bug. ***
Was able to get the osad client to drop the zombie socket when we run out of servers to connect to, but it kills the service as well, introduces a dangling pid file, and can't reconnect since it's dead. Need some more time on this one. Moving to sat510 triage due to time constraints.
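For what it's worth, a hedged sketch of the shape such a fix could take; the run(), connect(), process(), and disconnect() names here are assumptions for this example, not the shipped patch. The idea is to keep the retry loop alive after dropping the stale socket, and to remove the pid file only when the process genuinely exits:

import atexit
import os
import time

PID_FILE = "/var/run/osad.pid"   # path taken from the ps output above

def _remove_pid_file():
    try:
        os.unlink(PID_FILE)      # no dangling pid file once we really exit
    except OSError:
        pass

atexit.register(_remove_pid_file)

def run(client, servers, retry_sleep=150):
    # 'client' is any object with connect/process/disconnect methods.
    while True:                  # never let "out of servers" kill the daemon
        for server in servers:
            try:
                client.connect(server)
                client.process()          # blocks until the connection drops
            except Exception:
                pass                      # treat any failure as "try the next server"
            finally:
                client.disconnect()       # always release the socket before moving on
        time.sleep(retry_sleep)  # exhausted the list: sleep, then start over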
This is what is going on: when osad gets into this state, it calls jabber.Client.disconnected(self), which in turn calls xmlstream.Client.disconnect. The disconnect method tries to close the connection and then the socket, but only if the process is not alive. In our case the process is always alive, since we go into the sleep state instead, so the older ports are left in CLOSE_WAIT.

client># while true; do echo; date; netstat -npt | grep -i 5222; sleep 150; done

Wed Oct 3 14:19:42 EDT 2007
tcp        0      0 10.10.76.162:33133     10.10.76.168:5222      ESTABLISHED 30378/python
tcp        0      0 10.10.76.162:33111     10.10.76.168:5222      TIME_WAIT   -

Wed Oct 3 14:22:12 EDT 2007
tcp        0      0 10.10.76.162:33133     10.10.76.168:5222      ESTABLISHED 30378/python

Wed Oct 3 14:24:42 EDT 2007
tcp        0      0 10.10.76.162:33133     10.10.76.168:5222      ESTABLISHED 30378/python

Wed Oct 3 14:27:12 EDT 2007
tcp        0      0 10.10.76.162:33133     10.10.76.168:5222      ESTABLISHED 30378/python
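A minimal sketch of the cleanup this analysis points to, assuming a jabberpy-style client that holds its transport in a _sock attribute (the names are illustrative, not the actual patch): close the socket on disconnect regardless of whether the process is still alive.

import socket

class Client:
    def __init__(self):
        self._sock = None

    def disconnect(self):
        # The buggy path skipped the close whenever the process was still
        # alive, which is exactly the state osad sleeps in between retries.
        # Close unconditionally instead.
        if self._sock is not None:
            try:
                self._sock.shutdown(socket.SHUT_RDWR)   # signal the peer we are done
            except socket.error:
                pass                                    # peer may already be gone
            self._sock.close()                          # always release the fd
            self._sock = None                           # drop the last reference

Shutting down before closing lets the peer see an orderly FIN, and releasing the descriptor on every disconnect keeps the retry loop from accumulating CLOSE_WAIT sockets.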
Forgot to mention: the netstat loop shown in the previous comment was run after adding the fix. As we can see, there are no CLOSE_WAIT connections.
verified build 47

[root@rlx-3-18 ~]# netstat -an | grep 5222
tcp        0      0 0.0.0.0:5222           0.0.0.0:*              LISTEN
tcp        0      0 10.10.76.189:5222      10.10.76.189:32789     ESTABLISHED
tcp        0      0 10.10.76.189:32789     10.10.76.189:5222      ESTABLISHED
tcp        0      0 10.10.76.189:5222      10.10.76.182:42740     ESTABLISHED
[root@rlx-3-18 ~]# /etc/init.d/rhn-satellite stop
Shutting down rhn-satellite...
Stopping rhn-search...
Stopped rhn-search.
Stopping satellite-httpd: audit(1200931575.079:14): avc: denied { unlink } for pid=2720 comm="httpd" name="jk-runtime-status.2720.lock" dev=dm-0 ino=6357181 scontext=user_u:system_r:httpd_t tcontext=user_u:object_r:httpd_log_t tclass=file
[  OK  ]
waiting for processes to exit
waiting for processes to exit
Stopping RHN Taskomatic...
Stopped RHN Taskomatic.
Shutting down osa-dispatcher:  [  OK  ]
Shutting down rhn-database:    [  OK  ]
Shutting down Jabber router:   [  OK  ]
Done.
[root@rlx-3-18 ~]# netstat -an | grep 5222
tcp        0      0 10.10.76.189:32789     10.10.76.189:5222      TIME_WAIT
tcp        0      0 10.10.76.189:5222      10.10.76.182:42740     FIN_WAIT2
[root@rlx-3-18 ~]# netstat -an | grep 5222
tcp        0      0 10.10.76.189:32789     10.10.76.189:5222      TIME_WAIT
tcp        0      0 10.10.76.189:5222      10.10.76.182:42740     FIN_WAIT2
[root@rlx-3-18 ~]# netstat -an | grep 5222
tcp        0      0 10.10.76.189:32789     10.10.76.189:5222      TIME_WAIT
tcp        0      0 10.10.76.189:5222      10.10.76.182:42740     FIN_WAIT2
[root@rlx-3-18 ~]# netstat -an | grep 5222
tcp        0      0 10.10.76.189:32789     10.10.76.189:5222      TIME_WAIT
tcp        0      0 10.10.76.189:5222      10.10.76.182:42740     FIN_WAIT2
[root@rlx-3-18 ~]#
[root@rlx-3-18 ~]# netstat -an | grep 5222
tcp        0      0 10.10.76.189:32789     10.10.76.189:5222      TIME_WAIT
tcp        0      0 10.10.76.189:5222      10.10.76.182:42740     FIN_WAIT2
[root@rlx-3-18 ~]# netstat -an | grep 5222
tcp        0      0 10.10.76.189:32789     10.10.76.189:5222      TIME_WAIT
tcp        0      0 10.10.76.189:5222      10.10.76.182:42740     FIN_WAIT2
[root@rlx-3-18 ~]# netstat -an | grep 5222
tcp        0      0 10.10.76.189:32789     10.10.76.189:5222      TIME_WAIT
tcp        0      0 10.10.76.189:5222      10.10.76.182:42740     FIN_WAIT2
[root@rlx-3-18 ~]# netstat -an | grep 5222
tcp        0      0 10.10.76.189:32789     10.10.76.189:5222      TIME_WAIT
tcp        0      0 10.10.76.189:5222      10.10.76.182:42740     FIN_WAIT2
[root@rlx-3-18 ~]# netstat -an | grep 5222
tcp        0      0 10.10.76.189:32789     10.10.76.189:5222      TIME_WAIT
tcp        0      0 10.10.76.189:5222      10.10.76.182:42740     FIN_WAIT2
[root@rlx-3-18 ~]# netstat -an | grep 5222
tcp        0      0 10.10.76.189:32789     10.10.76.189:5222      TIME_WAIT
tcp        0      0 10.10.76.189:5222      10.10.76.182:42740     FIN_WAIT2
[root@rlx-3-18 ~]# netstat -an | grep 5222
tcp        0      0 10.10.76.189:32789     10.10.76.189:5222      TIME_WAIT
tcp        0      0 10.10.76.189:5222      10.10.76.182:42740     FIN_WAIT2
[root@rlx-3-18 ~]# netstat -an | grep 5222
[root@rlx-3-18 ~]# netstat -an | grep 5222
[root@rlx-3-18 ~]#
Looks good. Tested by using the netstat commands above and bringing down the rhn-satellite service; no CLOSE_WAIT states appear.
Satellite 5.1 is GA, so Closed for Current Release.