Bug 688095

Summary:	rhn_check hangs forever when sat not available
Product:	Red Hat Enterprise Linux 6	Reporter:	Miroslav Suchý <msuchy>
Component:	rhnlib	Assignee:	Milan Zázrivec <mzazrivec>
Status:	CLOSED ERRATA	QA Contact:	Martin Minar <mminar>
Severity:	low	Docs Contact:
Priority:	low
Version:	6.2	CC:	jhutar, luc, mkoci, mminar, msuchy, msvoboda
Target Milestone:	rc
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:	rhnlib-2.5.22-11.el6	Doc Type:	Bug Fix
Doc Text:	Due to an error in the rhnlib code, network operations would have become unresponsive when an HTTP connection to Red Hat Network (RHN) or RHN Satellite became idle. The code has been modified to use timeout for HTTP connections. Network operations are now terminated after predefined time interval and can be restarted.	Story Points:	---
Clone Of:	630875	Environment:
Last Closed:	2011-12-06 16:50:07 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	630875
Bug Blocks:

Description Miroslav Suchý 2011-03-16 10:20:17 UTC

+++ This bug was initially created as a clone of Bug #630875 +++

Description of problem:
rhn_check, which is triggered by rhnsd, hangs forever if the Satellite server crashes.

Version-Release number of selected component (if applicable):
0.4.20-33.el5_5.2

How reproducible:
Not sure, I'm trying to provoke it again

Steps to Reproduce:
1. Crash the satellite server
2. Wait until the systems show the status "inactive"
3. Start up Satellite server again
  
Actual results:
rhn_check process hangs and does nothing


Expected results:
rhn_check should terminate after a timeout to give rhnsd the chance to start rhn_check again -> Systems will get state active again.

Additional info:
The crash of the satellite server was a strange one, the system was pingable, but access to rhn satellite was not possible anymore, same applies to ssh etc.

After restarting the Satellite server, running lsof on the PID of rhn_check shows an established HTTPS connection to the Satellite.

--- Additional comment from jhutar on 2011-01-31 22:10:03 EST ---

QA: This will need more testing. Please make sure you find a way to reproduce this on the OLD version, as this might be important (a system can be stuck in the "inactive" state forever because of a Satellite crash/restart).

--- Additional comment from luc on 2011-02-10 07:51:11 EST ---

Hi Jan,

It is quite hard to reproduce this. Maybe the best way is to set off a fork bomb like
":(){ :|:& };:" on the Satellite; rhn_check then hangs.

I don't think that it is bound to a specific phase of rhn_check.

On a clean shutdown rhn_check bails out with an error message:

* Satellite shut down after rhn_check was started:
server:~# rhn_check 
Error: Server Unavailable. Please try later.

* rhn_check started after the shutdown:
server:~# rhn_check 
Could not retrieve action from <RetryServer for sat.example.com/XMLRPC>.
Possible networking problem?

Thanks,

Luc

--- Additional comment from msuchy on 2011-03-16 06:18:27 EDT ---

Steps to reproduce:
1. Shut down the Satellite.
2. In place of the Satellite, run:
 nc -l 0.0.0.0 80
or 
 nc -l 0.0.0.0 443
3. on client run:
 rhn_check

rhn_check will be stuck forever, waiting for a response.
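
The reproducer above can also be sketched in Python. This is a minimal sketch, not the rhnlib code: it assumes Python 3, where httplib is named http.client, and the helper names start_silent_listener and request_with_timeout are made up for illustration. The listener plays the role of "nc -l": it accepts the connection and never answers, so a client without a timeout would block in getresponse() indefinitely.

```python
import http.client
import socket
import threading


def start_silent_listener():
    """Listen on an ephemeral localhost port, accept one connection, never reply."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    accepted = []  # keep the accepted socket referenced so the connection stays open

    def accept_once():
        conn, _ = srv.accept()
        accepted.append(conn)

    threading.Thread(target=accept_once, daemon=True).start()
    return srv, accepted


def request_with_timeout(port, timeout):
    """Send one request to the silent listener and report how the call ended."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=timeout)
    try:
        conn.request("POST", "/XMLRPC", body=b"<methodCall/>")
        conn.getresponse()  # blocks here; with timeout=None it would never return
        return "response"
    except socket.timeout:
        return "timed out"
    finally:
        conn.close()
```

Calling request_with_timeout(srv.getsockname()[1], 0.5) against the listener should come back after roughly half a second instead of hanging.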

For the connection we use httplib.HTTPConnection from Python. It accepts a timeout parameter, which will solve this problem. But this parameter was added in Python 2.6, whereas RHEL5 has Python 2.4.

So I'm afraid I cannot fix it in RHEL5. I will close this bug and clone it to RHEL6, where a fix is possible.

One note on the fix: it will only help in the situation where the Satellite dies completely but does not close the connection. If it is "just" under heavy load (as suggested in #3) and sends at least one byte before the timeout expires, then httplib will not time out.
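
The caveat in the note above can be illustrated with a short sketch. It assumes Python 3 (http.client instead of httplib), and the names start_trickling_server and timed_request are hypothetical. The socket timeout applies to each individual read, not to the request as a whole, so a server that keeps sending single bytes can hold the client for longer than the configured timeout.

```python
import http.client
import socket
import threading
import time


def start_trickling_server(reply, delay):
    """Accept one connection and send `reply` one byte at a time, `delay` seconds apart."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)

    def serve():
        conn, _ = srv.accept()
        with conn:
            conn.recv(65536)  # consume the request; its content is ignored
            for i in range(len(reply)):
                conn.sendall(reply[i:i + 1])
                time.sleep(delay)

    threading.Thread(target=serve, daemon=True).start()
    return srv


def timed_request(port, timeout):
    """Return (HTTP status, elapsed seconds) for one GET with the given timeout."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=timeout)
    started = time.monotonic()
    try:
        conn.request("GET", "/")
        status = conn.getresponse().status
    finally:
        conn.close()
    return status, time.monotonic() - started
```

With a 38-byte reply sent at 0.02 s per byte and a 0.5 s timeout, the request is expected to succeed even though it takes longer than 0.5 s in total, because no single read waits that long.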

Comment 1 Miroslav Suchý 2011-03-16 10:25:38 UTC

Note for developer:
The change will be here:
--- /usr/lib/python2.6/site-packages/rhn/connections.py.orig    2011-03-16 11:37:41.369889498 +0100
+++ /usr/lib/python2.6/site-packages/rhn/connections.py 2011-03-16 11:24:46.604918969 +0100
@@ -64,7 +64,7 @@
     response_class = HTTPResponse
     
     def __init__(self, host, port=None):
-        httplib.HTTPConnection.__init__(self, host, port)
+        httplib.HTTPConnection.__init__(self, host, port, timeout=30)
         self._cb_rs = []
         self._cb_ws = []
         self._cb_ex = []

The change must be made in all classes in this module. And of course, it would be nice to be able to set the timeout in the /etc/sysconfig/rhn/up2date config file. However, the rhnlib package has no way to read this file, so the timeout value will need to be propagated from code in the rhn-client-tools packages, which use the rhn.connections module (possibly through several layers).
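
One way the propagation described above could look is sketched below. It is not the code that was committed: it assumes Python 3 (http.client), a plain key=value config format, and a hypothetical "timeout" key; read_timeout and TimeoutHTTPConnection are illustrative names. The idea is that the caller reads the value and passes it down as a constructor argument, so the connection class itself never touches the config file.

```python
import http.client

DEFAULT_TIMEOUT = 30  # seconds; the value hard-coded in the diff above


def read_timeout(config_text, key="timeout", default=DEFAULT_TIMEOUT):
    """Return the timeout from key=value config text, or `default` if absent or invalid.

    The "timeout" key is an assumption made for this sketch.
    """
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            try:
                return float(value.strip())
            except ValueError:
                return default
    return default


class TimeoutHTTPConnection(http.client.HTTPConnection):
    """Connection class that takes its timeout from the caller instead of hard-coding it."""

    def __init__(self, host, port=None, timeout=DEFAULT_TIMEOUT):
        http.client.HTTPConnection.__init__(self, host, port, timeout=timeout)
```

A caller would then do something like TimeoutHTTPConnection("sat.example.com", 80, timeout=read_timeout(text)), where text is the content of the config file it already knows how to read.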

Comment 2 Milan Zázrivec 2011-08-08 11:22:53 UTC

spacewalk.git master: 6c2a93bcb7efa9873aee956f0cf7355177d4cc59
satellite.git CLIENT-RHEL-6: ae670a56be85d4ef50720d829dda71066640e88c

Comment 4 Milan Zázrivec 2011-08-08 15:07:47 UTC

    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause: A bug in rhnlib code.

Consequence: Network operations would hang forever in cases when connection to RHN / RHN Satellite would be established but idle.

Fix: Establish a timeout for HTTP connections to RHN / RHN Satellite.

Result: Idle HTTP connections would timeout after a predefined time interval.

Comment 5 Martin Minar 2011-08-09 12:28:49 UTC

Verified with rhnlib-2.5.22-11.el6.

Notes:
1. Problem is only with http (port 80) version.
2. Used "nc -l 0.0.0.0 80" reproducer.
3. Old version didn't time out.
4. New version:
[root@XYZ ~]# time rhn_check -vv
Could not retrieve action from <RetryServer for dell-pe-sc1435-02.rhts.englab.brq.redhat.com/XMLRPC>.
Possible networking problem?

real	2m0.382s
user	0m0.110s
sys	0m0.030s

Comment 6 Miroslav Svoboda 2011-08-26 11:51:15 UTC

    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,7 +1 @@
-Cause: A bug in rhnlib code.
+Due to an error in the rhnlib code, network operations would have become unresponsive when an HTTP connection to Red Hat Network (RHN) or RHN Satellite became idle. The code has been modified to use timeout for HTTP connections. Network operations are now terminated after predefined time interval and can be restarted.
-
-Consequence: Network operations would hang forever in cases when connection to RHN / RHN Satellite would be established but idle.
-
-Fix: Establish a timeout for HTTP connections to RHN / RHN Satellite.
-
-Result: Idle HTTP connections would timeout after a predefined time interval.

Comment 7 errata-xmlrpc 2011-12-06 16:50:07 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1665.html