
Bug 630875

Summary: rhn_check hangs forever when sat not available
Product: Red Hat Enterprise Linux 5
Reporter: Luc de Louw <luc>
Component: rhn-client-tools
Assignee: Miroslav Suchý <msuchy>
Status: CLOSED CANTFIX
QA Contact: Red Hat Satellite QA List <satqe-list>
Severity: low
Docs Contact:
Priority: low
Version: 5.5
CC: jhutar, msuchy
Target Milestone: rc
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 688095 (view as bug list) Environment:
Last Closed: 2011-03-16 10:18:27 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 688095

Description Luc de Louw 2010-09-07 08:54:45 UTC
Description of problem:
rhn_check, which is triggered by rhnsd, hangs forever if the Satellite server crashes.

Version-Release number of selected component (if applicable):
0.4.20-33.el5_5.2

How reproducible:
Not sure, I'm trying to provoke it again

Steps to Reproduce:
1. Crash the satellite server
2. Wait until the systems get the status "inactive"
3. Start up Satellite server again
  
Actual results:
rhn_check process hangs and does nothing


Expected results:
rhn_check should terminate after a timeout to give rhnsd the chance to start rhn_check again, so that the systems return to the "active" state.

Additional info:
The crash of the Satellite server was a strange one: the system was still pingable, but access to the RHN Satellite was no longer possible. The same applied to ssh etc.

After restarting the Satellite server, running lsof on the PID of rhn_check shows an established HTTPS connection to the Satellite.

Comment 1 Jan Hutař 2011-01-31 09:49:28 UTC
Hello. Do you have any idea in which phase of rhn_check the Satellite became inaccessible? This would be important for reproducing the issue.

Thanks in advance,
Jan

Comment 3 Luc de Louw 2011-02-10 12:51:11 UTC
Hi Jan,

It is quite hard to reproduce this. Maybe the best way is to set off a fork bomb like
":(){ :|:& };:" on the satellite, then rhn_check hangs.

I don't think that it is bound to a specific phase of rhn_check.

On a clean shutdown rhn_check bails out with an error message:

* Satellite shut down after firing rhn_check:
server:~# rhn_check 
Error: Server Unavailable. Please try later.

* Fire rhn_check after the shutdown:
server:~# rhn_check 
Could not retrieve action from <RetryServer for sat.example.com/XMLRPC>.
Possible networking problem?

Thanks,

Luc

Comment 4 Miroslav Suchý 2011-03-16 10:18:27 UTC
Steps to reproduce:
1. Shut down the Satellite.
2. In place of the Satellite, run:
 nc -l 0.0.0.0 80
or
 nc -l 0.0.0.0 443
3. On the client, run:
 rhn_check

rhn_check will get stuck forever, waiting for a response.
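
For anyone without nc at hand, the same situation can be sketched in Python 3. This is only an illustration, not rhn_check code: the helper names `start_silent_listener` and `probe` are made up, and the listener binds to a free local port rather than 80 or 443. It shows that a freshly connected socket has no timeout by default, which is why a read against a silent peer blocks indefinitely.

```python
import socket
import threading


def start_silent_listener():
    """Accept one TCP connection and never send anything, like `nc -l`."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)

    def accept_and_stall():
        conn, _ = listener.accept()
        threading.Event().wait()  # keep `conn` open forever, never reply

    threading.Thread(target=accept_and_stall, daemon=True).start()
    return listener.getsockname()[1]


def probe(port, probe_timeout=0.5):
    """Connect, then report the default timeout and whether any data arrived."""
    client = socket.create_connection(("127.0.0.1", port))
    default_timeout = client.gettimeout()  # None means "block indefinitely"
    client.settimeout(probe_timeout)       # set only so this demo can return
    try:
        data = client.recv(1)
    except socket.timeout:
        data = None                        # nothing was sent within the window
    finally:
        client.close()
    return default_timeout, data


if __name__ == "__main__":
    print(probe(start_silent_listener()))
```

Without the `settimeout` call, the `recv` in `probe` would wait for as long as the listener keeps the connection open.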

For the connection we use httplib.HTTPConnection from Python. It accepts a timeout parameter, which would solve this problem. But this parameter was added in Python 2.6, whereas RHEL5 has Python 2.4.
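
A rough sketch of what the timeout parameter does, written for Python 3, where httplib is named http.client. It is not the actual rhn_check code: the helper names, the GET request and the local silent listener are assumptions made for illustration only.

```python
import http.client
import socket
import threading


def start_silent_listener():
    """Stand-in for a dead server: accepts a connection, never answers."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)

    def accept_and_stall():
        conn, _ = listener.accept()
        threading.Event().wait()  # keep `conn` open forever, never reply

    threading.Thread(target=accept_and_stall, daemon=True).start()
    return listener.getsockname()[1]


def check_in(port, timeout):
    """Send one request and report whether the server answered in time."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=timeout)
    try:
        conn.request("GET", "/")   # hypothetical request, not the XML-RPC call
        conn.getresponse()         # blocks here until data or timeout
        return "answered"
    except socket.timeout:
        return "timed out"
    finally:
        conn.close()


if __name__ == "__main__":
    print(check_in(start_silent_listener(), timeout=0.5))
```

With `timeout` left out, `getresponse()` would block for as long as the listener holds the connection open, which matches the hang described in this bug.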

So I'm afraid I cannot fix it in RHEL5. I will close this bug and clone it to RHEL6, where a fix is possible.

One note on the fix: it will only help in the situation where the Satellite has died completely but did not close the connection. If it is "just" under heavy load (as suggested in comment #3) and sends at least one byte before the timeout expires, then httplib will not time out.
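
This limitation can be illustrated with a small Python 3 sketch (http.client is the Python 3 name for httplib). The dripping server, the helper names and the delay values are assumptions chosen for the demonstration. The timeout applies to each individual socket read, so a server that keeps trickling bytes never triggers it, even when the whole exchange takes much longer than the timeout.

```python
import http.client
import socket
import threading
import time

RESPONSE = b"HTTP/1.0 200 OK\r\nContent-Length: 2\r\n\r\nok"


def start_dripping_server(delay):
    """Hypothetical overloaded server: replies one byte every `delay` seconds."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)

    def serve():
        conn, _ = listener.accept()
        conn.recv(65536)  # read and ignore the request
        for i in range(len(RESPONSE)):
            time.sleep(delay)
            conn.sendall(RESPONSE[i:i + 1])
        conn.close()
        listener.close()

    threading.Thread(target=serve, daemon=True).start()
    return listener.getsockname()[1]


def fetch_slowly(delay=0.03, timeout=0.5):
    """Return (status, body, elapsed seconds) for one slow request."""
    port = start_dripping_server(delay)
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=timeout)
    started = time.monotonic()
    conn.request("GET", "/")
    response = conn.getresponse()
    body = response.read()
    conn.close()
    return response.status, body, time.monotonic() - started


if __name__ == "__main__":
    status, body, elapsed = fetch_slowly()
    print(status, body, "took %.1fs with a 0.5s timeout" % elapsed)
```

Bounding the total time of a check would need a separate overall deadline on top of the per-read timeout.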