Bug 673199

Summary: init script returns control before web apps have started
Product: [Fedora] Fedora
Reporter: Rob Crittenden <rcritten>
Component: pki-ca
Assignee: Matthew Harmsen <mharmsen>
Status: CLOSED WONTFIX
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 14
CC: awnuk, dennis, jdennis, kwright, mharmsen, msauton
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-08-16 22:25:57 UTC
Type: ---
Attachments:
Description: Fix for startup of Tomcat instances . . .
Flags: awnuk: review+

Description Rob Crittenden 2011-01-27 17:26:57 UTC
Description of problem:

When the CA is started or restarted with pki-cad, the script only handles restarting tomcat6; it does not track the web applications that tomcat6 starts. Control may return to the user before the web apps have started.

This is causing problems in IPA, where we need to restart the server to do things like disable nonces and then continue with a pkisilent invocation. What we're seeing is that the restart succeeds, then pkisilent fails with Connection Refused because the CA is not available yet.

Version-Release number of selected component (if applicable):

pki-ca-9.0.1-2.svn.1762M.20110121T1347z.fc14.noarch

Steps to Reproduce:
1. service pki-cad restart
2. curl -v http://localhost:9180
* About to connect() to localhost port 9180 (#0)
*   Trying ::1... Connection refused
*   Trying 127.0.0.1... Connection refused
* couldn't connect to host
* Closing connection #0
curl: (7) couldn't connect to host

IPA ticket https://fedorahosted.org/freeipa/ticket/835

Comment 1 Marc Sauton 2011-02-01 02:03:01 UTC
Fedora release 14 (Laughlin)
Linux ipaserver1.example.com 2.6.35.10-74.fc14.x86_64 #1 SMP Thu Dec 23 16:04:50 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux

tomcatjss-2.1.0-1.fc14.noarch
jss-4.2.6-12.fc14.x86_64
package redhat-ds-base is not installed
pki-ca-9.0.1-2.fc14.noarch
osutil-2.0.1-1.fc14.x86_64

hostname
ipaserver1.example.com

cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
10.14.5.16  ipaserver1.example.com ipaserver1

getenforce
Permissive

ifconfig
eth0      Link encap:Ethernet  HWaddr 54:52:00:1A:BD:69
          inet addr:10.14.5.16  Bcast:10.14.7.255  Mask:255.255.252.0
          inet6 addr: 3ffe:1111:2222:1000:5652:ff:fe1a:bd69/64 Scope:Global
          inet6 addr: fe80::5652:ff:fe1a:bd69/64 Scope:Link

pkicreate -pki_instance_root=/var/lib -pki_instance_name=pki-ca \
 -subsystem_type=ca  -agent_secure_port=9443  -ee_secure_port=9444  -ee_secure_client_auth_port=9446 \
 -admin_secure_port=9445  -unsecure_port=9180  -tomcat_server_port=9701  -user=pkiuser

web wizard

/sbin/service pki-cad status
pki-ca (pid 5167) is running...                            [  OK  ]
    Unsecure Port       = http://ipaserver1.example.com:9180/ca/ee/ca
    Secure Agent Port   = https://ipaserver1.example.com:9443/ca/agent/ca
    Secure EE Port      = https://ipaserver1.example.com:9444/ca/ee/ca
    Secure Admin Port   = https://ipaserver1.example.com:9445/ca/services
    EE Client Auth Port = https://ipaserver1.example.com:9446/ca/eeca/ca
    PKI Console Port    = pkiconsole https://ipaserver1.example.com:9445/ca
    Tomcat Port         = 9701 (for shutdown)

    PKI Instance Name:   pki-ca

    PKI Subsystem Type:  Root CA (Security Domain)

    Registered PKI Security Domain Information:
    ==========================================================================
    Name:  Example Domain
    URL:   https://ipaserver1.example.com:9445
    ==========================================================================
[root@ipaserver1 ~]#


[root@ipaserver1 ~]# /sbin/service pki-cad start; curl -v http://localhost:9180
Starting pki-ca:                                           [  OK  ]
* About to connect() to localhost port 9180 (#0)
*   Trying ::1... Connection refused
*   Trying 127.0.0.1... Connection refused
* couldn't connect to host
* Closing connection #0
curl: (7) couldn't connect to host
[root@ipaserver1 ~]#


But if I wait 1 or 2 seconds more:

[root@ipaserver1 ~]# curl -v http://localhost:9180
* About to connect() to localhost port 9180 (#0)
*   Trying ::1... connected
* Connected to localhost (::1) port 9180 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.21.0 (x86_64-redhat-linux-gnu) libcurl/7.21.0 NSS/3.12.8.0 zlib/1.2.5 libidn/1.18 libssh2/1.2.4
> Host: localhost:9180
> Accept: */*
>
< HTTP/1.1 302 Moved Temporarily
< Server: Apache-Coyote/1.1
< Set-Cookie: JSESSIONID=B0198DEA772E316FDF913592F6091001; Path=/
< Location: http://localhost:9180/ca/ee/ca
< Content-Type: text/html
< Content-Length: 0
< Date: Mon, 31 Jan 2011 23:21:43 GMT
<
* Connection #0 to host localhost left intact
* Closing connection #0
[root@ipaserver1 ~]#
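
As a caller-side workaround for the race shown above, a bounded retry loop can stand in for a fixed sleep. This is only a sketch: the helper name wait_until, the attempt count and the delay are illustrative assumptions, not part of pki-cad or IPA.

```shell
# Retry a probe command until it succeeds or the attempts run out.
# Usage: wait_until TRIES DELAY COMMAND [ARGS...]
# (wait_until is a hypothetical helper, not something shipped with pki-cad.)
wait_until() {
    tries=$1
    delay=$2
    shift 2
    count=0
    while [ "$count" -lt "$tries" ]; do
        # The probe's exit status decides; its output is discarded.
        if "$@" > /dev/null 2>&1; then
            return 0
        fi
        count=$((count + 1))
        sleep "$delay"
    done
    return 1
}

# Example (not executed here): restart, then poll the unsecure port
# for up to roughly 30 seconds before giving up.
#   service pki-cad restart && wait_until 30 1 curl -s http://localhost:9180
```

A successful TCP connection only shows that the connector is listening; whether the CA web application itself is ready is a separate question discussed later in this bug.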

Comment 2 Matthew Harmsen 2011-02-02 03:36:09 UTC
Created attachment 476514 [details]
Fix for startup of Tomcat instances . . .

Comment 3 Matthew Harmsen 2011-02-02 03:46:31 UTC
# cd pki

# svn stat
M       base/common/scripts/functions

# svn commit
Sending        base/common/scripts/functions
Transmitting file data .
Committed revision 1807.

Comment 4 Matthew Harmsen 2011-02-02 04:30:27 UTC
Changed "netstat -an" to "netstat -antl":

Index: base/common/scripts/functions
===================================================================
--- base/common/scripts/functions	(revision 1807)
+++ base/common/scripts/functions	(working copy)
@@ -702,7 +702,7 @@
             port=`grep '^pkicreate.unsecure_port=' ${pki_instance_configuration_file} | cut -b25- -`
             while [ $count -lt $tries ]
             do
-                netstat -an | grep ${port} > /dev/null
+                netstat -antl | grep ${port} > /dev/null
                 netrv=$?
                 if [ $netrv -eq 0 ] ; then
                     break;

# cd pki

# svn status | grep -v ^$ | grep -v ^P | grep -v ^X | grep -v ^?
M       base/common/scripts/functions

# svn commit
Sending        base/common/scripts/functions
Transmitting file data .
Committed revision 1809.
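
One thing to keep in mind with a plain "grep ${port}" over netstat output is that it matches the digits anywhere on the line, for example a longer port number or a remote port of an established connection. The following sketch shows a stricter match. It assumes the Linux net-tools column layout (local address in column 4, state in column 6), and the function name port_is_listening is made up for this example; it is not what was committed.

```shell
# Read "netstat -antl" style output on stdin and succeed only if some
# socket is in LISTEN state with a local address ending in ":PORT".
# Assumes column 4 is the local address and column 6 is the state.
port_is_listening() {
    awk -v port="$1" '
        $6 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit(found ? 0 : 1) }
    '
}

# Example (not executed here):
#   netstat -antl | port_is_listening 9180
```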

Comment 5 Rob Crittenden 2011-02-02 14:23:28 UTC
I'd have probably used cut -d= -f2 but this works.
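
For comparison, here are the two forms side by side on a sample configuration line (the value 9180 is taken from the pkicreate invocation earlier in this bug). The byte-offset form depends on the key being exactly 24 bytes long including the "="; the delimiter form does not.

```shell
line='pkicreate.unsecure_port=9180'

# Byte-offset form from the patch: "pkicreate.unsecure_port=" is 24 bytes,
# so the value starts at byte 25.
by_offset=$(printf '%s\n' "$line" | cut -b25-)

# Delimiter form: take the second "="-separated field.
by_field=$(printf '%s\n' "$line" | cut -d= -f2)

echo "offset: $by_offset"
echo "field:  $by_field"
```

If a value could itself contain "=", "cut -d= -f2-" would keep the remainder of the line rather than just the second field.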

Comment 6 John Dennis 2011-02-07 18:48:33 UTC
There was an email I wrote to Rob and Ade which captured the issues before Matt jumped in and provided a fix. Although this bug is now closed, I thought I would paste the email in here to capture some information for historical purposes. I'm also adding a response from Ade.

--- jdennis 1/26/2010 ---

So I've done some research and thought I would share what I found.

The tomcat6 initscript starts the jvm in the background and immediately
writes the pid from the background process into the pidfile. It does not
wait: as long as the jvm starts you'll have a valid pid (for some
duration) and the init script will consider this success. FWIW this is
backward from how many daemons work. Most daemons write the pid file
themselves once they've successfully initialized. It's not normal for
the initscript to write the pid file, for the obvious reason that the
initscript can't know what the process is doing.
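
The pattern described above can be reduced to a few lines. This is an illustration only, not the actual tomcat6 initscript: "sleep 2" stands in for the jvm, and the function name start_and_record_pid is hypothetical.

```shell
# Start a stand-in "daemon" in the background and record its pid at once.
# The pidfile exists as soon as the process is forked, long before a real
# server would have finished initializing.
start_and_record_pid() {
    pidfile=$1
    sleep 2 &                 # stand-in for the jvm
    echo $! > "$pidfile"      # written immediately by the launcher
}
```

Any check that only tests for the existence of the pidfile therefore passes right away, whether or not the service is able to answer requests.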

I did remove the "wait" code from pki-cad, mostly because it was silly:
it was testing the existence of the pidfile, which (from above) is
instantly created when the jvm is invoked. In other words it wasn't
doing anything useful.

I checked the tomcat6 BootStrap code which loads the instance; there
does not seem to be any support for either writing the pid file or
providing any indication of start up completion :-(

I checked our initialization code (CMSEngine.java). There is a flag
(isStarted) returned by the method isInRunningState(). It's used
internally only and its state is not exported into the environment.

However, I'm not 100% sure that isInRunningState() is actually accurate
to an external process, because the tomcat instance might need to
complete other initialization after invoking our start up. Logically,
though, I would expect that by the time it invokes our start up code it's
ready to serve. So I think it's probably an O.K. indicator (if it were
visible).

At this point I have two suggestions (based on the premise we don't want 
to sleep a fixed amount of time).

1) When CMSEngine thinks it's fully initialized it could write an 
"alternative" pid file. The external test to see if the server is up 
would be to check if the alternative pid file is present and contains 
the same pid as the normal pid file and the process is still running of 
course.

2) Try to connect to one of the connector sockets; if the connection
succeeds we're up. Or maybe even have a servlet whose only purpose is to
provide status (I didn't check, maybe we already have such a servlet, do
we?). Rob, is that what you were trying with curl?
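
Suggestion 1 could be checked from an init script roughly as follows. This is a sketch under assumptions: no such "alternative" pid file exists today, the function name and file paths are hypothetical, and "kill -0" can only probe the process when run as root or as the process owner.

```shell
# Succeed only if both pid files exist, hold the same pid, and that
# process is still alive. The "ready" pid file is the hypothetical file
# CMSEngine would write once it considers itself fully initialized.
is_fully_started() {
    pidfile=$1
    ready_pidfile=$2
    [ -r "$pidfile" ] && [ -r "$ready_pidfile" ] || return 1
    pid=$(cat "$pidfile")
    ready_pid=$(cat "$ready_pidfile")
    [ -n "$pid" ] && [ "$pid" = "$ready_pid" ] || return 1
    # Signal 0 sends nothing; it only tests that the process exists.
    kill -0 "$pid" 2> /dev/null
}
```

Comparing the two pids guards against a stale "ready" file left behind by a previous run of the server.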

--- alee 1/26/2010 ---

I like #2. And I believe there is already a servlet that can be used for
this purpose.

The servlet getStatus.java - accessible from the admin port as:

https://dhcp231-121.rdu.redhat.com:9305/ca/admin/ca/getStatus

returns the following:

<XMLResponse>
<State>1</State>
<Type>CA</Type>
</XMLResponse>

Type is the type of the subsystem (CA, KRA etc.) and State is whether
the configuration has been successfully completed or not (1 means
configured, 0 means not). 

It does not return the value of isInRunningState() - although it could
easily be modified to do so, and it is likely that you would not be able
to reach this servlet unless initialization was complete.
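
A script could pull the State value out of that response along these lines. This is a sketch, assuming the response has the shape shown above; the function name pki_state_from_xml is made up, and the "-k" in the usage comment (skip certificate verification) is an assumption for a test instance with a self-signed certificate.

```shell
# Read a getStatus XML response on stdin and print the <State> value.
# Prints nothing if no <State> element with a numeric value is found.
pki_state_from_xml() {
    sed -n 's:.*<State>\([0-9][0-9]*\)</State>.*:\1:p'
}

# Example (not executed here):
#   curl -s -k https://dhcp231-121.rdu.redhat.com:9305/ca/admin/ca/getStatus \
#       | pki_state_from_xml
```

As noted above, State reports whether configuration has completed, not the value of isInRunningState(), so a caller would be relying on the servlet being reachable at all as the readiness signal.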

There are also some servlets which check the value of isInRunningState()
and exit out if we are not yet ready.  We could use one of those
servlets - or add that check to getStatus.

Examples of those servlets include:

https://dhcp231-121.rdu.redhat.com:9304/ca/ee/dynamicVars.js (ee port)
https://dhcp231-121.rdu.redhat.com:9303/ca/agent/dynamicVars.js (agent port)
https://dhcp231-121.rdu.redhat.com:9304/ca/admin/dynamicVars.js (admin port)

Comment 7 Fedora End Of Life 2012-08-16 22:26:00 UTC
This message is a notice that Fedora 14 is now at end of life. Fedora 
has stopped maintaining and issuing updates for Fedora 14. It is 
Fedora's policy to close all bug reports from releases that are no 
longer maintained.  At this time, all open bugs with a Fedora 'version'
of '14' have been closed as WONTFIX.

(Please note: Our normal process is to give advance warning of this 
occurring, but we forgot to do that. A thousand apologies.)

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, feel free to reopen 
this bug and simply change the 'version' to a later Fedora version.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we were unable to fix it before Fedora 14 reached end of life. If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora, you are encouraged to click on 
"Clone This Bug" (top right of this page) and open it against that 
version of Fedora.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping