Bug 679174

Summary: netstat loop fixes needed
Product: Red Hat Enterprise Linux 6
Reporter: Matthew Harmsen <mharmsen>
Component: pki-core
Assignee: Matthew Harmsen <mharmsen>
Status: CLOSED ERRATA
QA Contact: Chandrasekar Kannan <ckannan>
Severity: unspecified
Priority: unspecified
Version: 6.1
CC: alee, benl, jdennis, jgalipea, shaines
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Fixed In Version: pki-core-9.0.3-3.el6
Doc Type: Bug Fix
Clone Of: 678715
Last Closed: 2011-05-19 13:44:05 UTC
Bug Depends On: 678715
Attachments: better netstat loop behaviour (flags: awnuk: review+)

Description Matthew Harmsen 2011-02-21 19:46:07 UTC
+++ This bug was initially created as a clone of Bug #678715 +++

In the start_instance() function in file pki/base/common/scripts/functions, a loop was recently added to wait for the instance to fully initialize by using netstat to check socket availability.

There are a couple of minor problems which need fixing.

1) No check for previous status

The instance is started like this:

    $PKI_INSTANCE_INITSCRIPT start
    rv=$?

then the loop is entered. But if the initscript failed ($rv -ne 0), there is no point in looping for 30 seconds waiting for it to come up. The function should immediately return the failure code.

2) If the loop is exhausted and no sockets were detected, the function should return a failure code.
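
The two fixes above can be sketched roughly as follows. This is a minimal illustration, not the code from scripts/functions or from the attached patch: the function name wait_for_start, its arguments, and the defaults are hypothetical, and the socket check is passed in as a command so the control flow can be shown on its own.

```shell
#!/bin/sh
# Sketch only. wait_for_start and its parameters are hypothetical names
# used for illustration; they do not appear in pki/base/common/scripts/functions.
#
# usage: wait_for_start START_CMD PROBE_CMD [MAX_TRIES] [DELAY_SECONDS]
wait_for_start() {
    start_cmd=$1          # e.g. "$PKI_INSTANCE_INITSCRIPT start"
    probe_cmd=$2          # command that succeeds once the socket is listening
    max_tries=${3:-30}    # assumed default: 30 tries
    delay=${4:-1}         # assumed default: 1 second between tries

    # Fix 1: check the initscript status before entering the loop.
    $start_cmd
    rv=$?
    if [ "$rv" -ne 0 ]; then
        return "$rv"
    fi

    # Poll until the probe succeeds or the tries run out.
    tries=0
    while [ "$tries" -lt "$max_tries" ]; do
        if $probe_cmd; then
            return 0
        fi
        sleep "$delay"
        tries=$((tries + 1))
    done

    # Fix 2: the loop was exhausted without seeing a socket, so report failure.
    return 1
}
```

With this shape, a failed start returns its own exit status without waiting, and a start that never opens a socket returns 1 once the tries are used up.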

--- Additional comment from jdennis on 2011-02-18 18:54:53 EST ---

Created attachment 479632 [details]
better netstat loop behaviour

Comment 1 Matthew Harmsen 2011-02-22 00:34:07 UTC
Created attachment 480025 [details]
better netstat loop behaviour

Comment 2 Matthew Harmsen 2011-02-22 00:39:23 UTC
IPA_v2_RHEL_6_1_ERRATA_BRANCH:

# cd pki

# svn status | grep -v ^$ | grep -v ^P | grep -v ^X | grep -v ^?
M       base/common/scripts/functions

# svn commit
Sending        base/common/scripts/functions
Transmitting file data .
Committed revision 1862.

Resolves #679174 - netstat loop fixes needed

Comment 4 Jenny Severance 2011-04-18 17:54:26 UTC
Can you please add steps to verify this issue? thanks

Comment 5 John Dennis 2011-04-18 18:09:58 UTC
You need to get the CA into a state where it can't initialize and can't come up, such that it reports an initialization failure. Previously the initscript would wait a very long time before it decided it wasn't going to happen; with the fix in place the initscript reports the failure immediately. Sorry, I can't help you with how to break the CA so badly that it won't initialize, but somewhere along the way we did have such a situation.

Comment 6 John Dennis 2011-04-18 18:31:47 UTC
The tomcat6 initscript returns immediately after launching the JVM. As long as the JVM starts it reports success, but that does not mean the tomcat instance hosted by the JVM fully initialized. To discover whether the instance fully initialized and was ready to serve, we used the existence of a listening port as evidence. Therefore we added a loop which used netstat to check for the port. It looped, pausing each second, until it saw the port. It did this for 30 seconds; if no port appeared after 30 seconds it reported failure. But the tomcat6 initscript can immediately report failure in some circumstances. In that case there is no point in looping for 30 seconds for a port you know for a fact will never appear because the JVM launch failed. We should skip the loop and report the failure immediately instead of waiting 30 seconds.

One way you might immediately cause a failure of the JVM launch is by removing the pkiuser from the system. The tomcat6 initscript is instructed to run the JVM as that user, so if the user does not exist it should fail immediately.
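
For illustration, the netstat port check described above might look something like this. It is a sketch under assumptions, not the code in the functions file: the helper name has_listener is hypothetical, the port number in the usage note is a placeholder, and the netstat flags shown (-l listening, -t TCP, -n numeric) are one plausible choice rather than the ones the script actually uses. The helper reads netstat output on stdin so it can be tried against saved output.

```shell
#!/bin/sh
# Sketch only. has_listener is a hypothetical helper; it reads
# "netstat -ltn" style output on stdin and succeeds if some TCP socket
# is in LISTEN state on the given local port.
has_listener() {
    port=$1
    awk -v port="$port" '
        # Consider only TCP rows whose last column is LISTEN.
        $1 ~ /^tcp/ && $NF == "LISTEN" {
            # Column 4 is the local address, e.g. 0.0.0.0:9180 or :::9443;
            # the port is whatever follows the last colon.
            n = split($4, parts, ":")
            if (parts[n] == port) found = 1
        }
        END { exit !found }
    '
}
```

A probe built on it could then be run as `netstat -ltn | has_listener 9180`, where 9180 stands in for whatever port the instance is configured to use.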

Comment 7 Jenny Severance 2011-04-18 19:02:22 UTC
After discussion with John, an easy way to verify this issue is to stop the ipa services, remove the pkiuser and start the services again. Starting the CA should fail immediately, rather than keep trying for 30 seconds.

Starting the CA immediately returned:

Starting CA Service
Starting pki-ca: chown: invalid user: `pkiuser:pkiuser'
chown: invalid user: `pkiuser:pkiuser'
Error code 4                                               [FAILED]
Failed to start CA Service


version
pki-ca-9.0.3-10.el6.noarch
ipa-server-2.0.0-21.el6.x86_64

Comment 8 errata-xmlrpc 2011-05-19 13:44:05 UTC
An advisory has been issued which should resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2011-0627.html