Description of problem:

When starting or restarting the CA with pki-cad the script just handles restarting tomcat6, not managing the applications that it starts. Control may return to the user before the web apps have started.

This is causing problems in IPA where we need to restart the server to do things like disable nonces, then continue with a pkisilent invocation. What we're seeing is the restart is successful then pkisilent fails with a Connection Refused because the CA is not available yet.

Version-Release number of selected component (if applicable):

pki-ca-9.0.1-2.svn.1762M.20110121T1347z.fc14.noarch

Steps to Reproduce:
1. service pki-cad restart
2. curl -v http://localhost:9180

* About to connect() to localhost port 9180 (#0)
*   Trying ::1... Connection refused
*   Trying 127.0.0.1... Connection refused
* couldn't connect to host
* Closing connection #0
curl: (7) couldn't connect to host

IPA ticket https://fedorahosted.org/freeipa/ticket/835
Fedora release 14 (Laughlin)
Linux ipaserver1.example.com 2.6.35.10-74.fc14.x86_64 #1 SMP Thu Dec 23 16:04:50 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux

tomcatjss-2.1.0-1.fc14.noarch
jss-4.2.6-12.fc14.x86_64
package redhat-ds-base is not installed
pki-ca-9.0.1-2.fc14.noarch
osutil-2.0.1-1.fc14.x86_64

hostname
ipaserver1.example.com

cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.14.5.16 ipaserver1.example.com ipaserver1

getenforce
Permissive

ifconfig
eth0  Link encap:Ethernet HWaddr 54:52:00:1A:BD:69
      inet addr:10.14.5.16 Bcast:10.14.7.255 Mask:255.255.252.0
      inet6 addr: 3ffe:1111:2222:1000:5652:ff:fe1a:bd69/64 Scope:Global
      inet6 addr: fe80::5652:ff:fe1a:bd69/64 Scope:Link

pkicreate -pki_instance_root=/var/lib -pki_instance_name=pki-ca \
    -subsystem_type=ca -agent_secure_port=9443 -ee_secure_port=9444 -ee_secure_client_auth_port=9446 \
    -admin_secure_port=9445 -unsecure_port=9180 -tomcat_server_port=9701 -user=pkiuser

web wizard

/sbin/service pki-cad status
pki-ca (pid 5167) is running... [ OK ]

    Unsecure Port       = http://ipaserver1.example.com:9180/ca/ee/ca
    Secure Agent Port   = https://ipaserver1.example.com:9443/ca/agent/ca
    Secure EE Port      = https://ipaserver1.example.com:9444/ca/ee/ca
    Secure Admin Port   = https://ipaserver1.example.com:9445/ca/services
    EE Client Auth Port = https://ipaserver1.example.com:9446/ca/eeca/ca
    PKI Console Port    = pkiconsole https://ipaserver1.example.com:9445/ca
    Tomcat Port         = 9701 (for shutdown)

    PKI Instance Name:   pki-ca

    PKI Subsystem Type:  Root CA (Security Domain)

    Registered PKI Security Domain Information:
    ==========================================================================
    Name:  Example Domain
    URL:   https://ipaserver1.example.com:9445
    ==========================================================================
[root@ipaserver1 ~]#

[root@ipaserver1 ~]# /sbin/service pki-cad start; curl -v http://localhost:9180
Starting pki-ca: [ OK ]
* About to connect() to localhost port 9180 (#0)
*   Trying ::1... Connection refused
*   Trying 127.0.0.1... Connection refused
* couldn't connect to host
* Closing connection #0
curl: (7) couldn't connect to host
[root@ipaserver1 ~]#

but if I wait 1 or 2 seconds more...:

[root@ipaserver1 ~]# curl -v http://localhost:9180
* About to connect() to localhost port 9180 (#0)
*   Trying ::1... connected
* Connected to localhost (::1) port 9180 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.21.0 (x86_64-redhat-linux-gnu) libcurl/7.21.0 NSS/3.12.8.0 zlib/1.2.5 libidn/1.18 libssh2/1.2.4
> Host: localhost:9180
> Accept: */*
>
< HTTP/1.1 302 Moved Temporarily
< Server: Apache-Coyote/1.1
< Set-Cookie: JSESSIONID=B0198DEA772E316FDF913592F6091001; Path=/
< Location: http://localhost:9180/ca/ee/ca
< Content-Type: text/html
< Content-Length: 0
< Date: Mon, 31 Jan 2011 23:21:43 GMT
<
* Connection #0 to host localhost left intact
* Closing connection #0
[root@ipaserver1 ~]#
Created attachment 476514 [details] Fix for startup of Tomcat instances . . .
# cd pki
# svn stat
M       base/common/scripts/functions
# svn commit
Sending        base/common/scripts/functions
Transmitting file data .
Committed revision 1807.
Changed "netstat -an" to "netstat -antl":

Index: base/common/scripts/functions
===================================================================
--- base/common/scripts/functions	(revision 1807)
+++ base/common/scripts/functions	(working copy)
@@ -702,7 +702,7 @@
 	port=`grep '^pkicreate.unsecure_port=' ${pki_instance_configuration_file} | cut -b25- -`
 	while [ $count -lt $tries ]
 	do
-		netstat -an | grep ${port} > /dev/null
+		netstat -antl | grep ${port} > /dev/null
 		netrv=$?
 		if [ $netrv -eq 0 ] ; then
 			break;

# cd pki
# svn status | grep -v ^$ | grep -v ^P | grep -v ^X | grep -v ^?
M       base/common/scripts/functions
# svn commit
Sending        base/common/scripts/functions
Transmitting file data .
Committed revision 1809.
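For readers who have not seen the full function, the diff context suggests a bounded polling loop: check whether the port is listening, stop on success, give up after a fixed number of tries. The following is only a sketch of that pattern, not the committed code. The names wait_for_port and port_is_listening, the default try count and delay, and the stricter grep pattern are all assumptions made for illustration.

```shell
# Hypothetical probe: succeeds when something is listening on TCP port $1.
# netstat flags: -a all sockets, -n numeric, -t TCP only, -l listening only.
# The pattern is anchored on ":PORT " so that 9180 does not also match 19180
# (the patch itself greps for the bare port number).
port_is_listening() {
    netstat -antl 2>/dev/null | grep -Eq ":$1[[:space:]]"
}

# Hypothetical helper: wait_for_port PORT [TRIES] [DELAY] [PROBE]
# Calls PROBE up to TRIES times, sleeping DELAY seconds after each failure.
# Returns 0 as soon as the probe succeeds, 1 if every try fails.
wait_for_port() {
    local port=$1 tries=${2:-30} delay=${3:-1} probe=${4:-port_is_listening}
    local count=0
    while [ "$count" -lt "$tries" ]; do
        if "$probe" "$port"; then
            return 0
        fi
        count=$((count + 1))
        sleep "$delay"
    done
    return 1
}
```

An init script could call something like "wait_for_port 9180 30 1" after launching Tomcat and report failure if it returns non-zero. Note that a listening socket only shows the connector is bound, which is weaker than knowing the CA web application has finished initializing.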
I'd have probably used cut -d= -f2 but this works.
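To make the comparison concrete, here is a small sketch that runs both forms on a sample line. The sample line is an assumption built from the grep pattern in the patch and the port number in the report; the real configuration file is not shown in this bug.

```shell
# Assumed sample line in the format the patch greps for.
line='pkicreate.unsecure_port=9180'

# Byte-offset form from the patch: "pkicreate.unsecure_port=" is 24 bytes,
# so the value starts at byte 25. This breaks if the key name ever changes.
by_offset=$(printf '%s\n' "$line" | cut -b25-)

# Delimiter form from the comment: second field when splitting on '='.
by_field=$(printf '%s\n' "$line" | cut -d= -f2)

echo "$by_offset $by_field"   # prints "9180 9180"
```

Both give the same result for a port number. The delimiter form does not depend on the length of the key, though it would truncate a value that itself contained '='.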
There was an email I wrote to Rob and Ade which captured the issues before Matt jumped in and provided a fix. So although this bug is now closed I thought I would paste the email in here just to capture some information for historical purposes. I'm also adding a response from Ade.

--- jdennis 1/26/2010 ---

So I've done some research and thought I would share what I found.

The tomcat6 initscript starts the jvm in the background and immediately writes the pid from the background process into the pidfile. It does not wait; as long as the jvm starts you'll have a valid pid (for some duration) and the init script will consider this success.

FWIW this is backward from how many daemons work. Most daemons write the pid file themselves once they've successfully initialized. It's not normal for the initscript to write the pid file, for the obvious reason that the initscript can't know what the process is doing.

I did remove the "wait" code from pki-cad, mostly because it was silly: it was testing the existence of the pidfile which (from above) is instantly created when the jvm is invoked. In other words it wasn't doing anything useful.

I checked the tomcat6 BootStrap code which loads the instance. There does not seem to be any support for either writing the pid file or providing any indication of start up completion :-(

I checked our initialization code (CMSEngine.java). There is a flag (isStarted) returned by the method isInRunningState(). It's used internally only and its state is not exported into the environment. However I'm not 100% sure that isInRunningState() is actually accurate to an external process, because the tomcat instance might need to complete other initialization after invoking our start up. But logically I would expect that by the time it invokes our start up code it's ready to serve, so I think it's probably an O.K. indicator (if it were visible).

At this point I have two suggestions (based on the premise we don't want to sleep a fixed amount of time).
1) When CMSEngine thinks it's fully initialized it could write an "alternative" pid file. The external test to see if the server is up would be to check if the alternative pid file is present and contains the same pid as the normal pid file and the process is still running of course.

2) Try to connect to one of the connector sockets, if the connection succeeds we're up. Or maybe even have a servlet whose only purpose is to provide status (I didn't check, maybe we already have such a servlet, do we?). Rob, is that what you were trying with curl?

--- alee 1/26/2010 ---

I like #2. And I believe there is already a servlet that can be used for this purpose.

The servlet getStatus.java - accessible from the admin port as:

https://dhcp231-121.rdu.redhat.com:9305/ca/admin/ca/getStatus

returns the following:

<XMLResponse>
  <State>1</State>
  <Type>CA</Type>
</XMLResponse>

Type is the type of the subsystem (CA, KRA etc.) and State is whether the configuration has been successfully completed or not (1 means configured, 0 means not).

It does not return the value of isInRunningState() - although it could easily be modified to do so, and it is likely that you would not be able to reach this servlet unless initialization was complete.

There are also some servlets which check the value of isInRunningState() and exit out if we are not yet ready. We could use one of those servlets - or add that check to getStatus. Examples of those servlets include:

https://dhcp231-121.rdu.redhat.com:9304/ca/ee/dynamicVars.js (ee port)
https://dhcp231-121.rdu.redhat.com:9303/ca/agent/dynamicVars.js (agent port)
https://dhcp231-121.rdu.redhat.com:9304/ca/admin/dynamicVars.js (admin port)
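Suggestion #2 combined with the getStatus servlet could look roughly like the sketch below: poll the URL until a response with a State element comes back. This is an illustration only, not what was committed for this bug. The function names parse_state, fetch_status, and wait_for_ca, the retry defaults, and the use of curl -k (which assumes a self-signed certificate on the admin port) are all assumptions.

```shell
# Extract the text of the first <State> element from a getStatus response
# read on stdin. Prints nothing if no such element is present.
parse_state() {
    sed -n 's:.*<State>\([^<]*\)</State>.*:\1:p' | head -n 1
}

# Hypothetical fetcher: -s silences progress output, -k skips certificate
# verification (assumed self-signed cert), --max-time bounds each attempt.
fetch_status() {
    curl -sk --max-time 5 "$1"
}

# Hypothetical helper: wait_for_ca URL [TRIES] [DELAY] [FETCH]
# Polls until the servlet answers with a State value, prints that value and
# returns 0. Returns 1 if no answer arrives within TRIES attempts.
wait_for_ca() {
    local url=$1 tries=${2:-30} delay=${3:-1} fetch=${4:-fetch_status}
    local count=0 state
    while [ "$count" -lt "$tries" ]; do
        state=$("$fetch" "$url" | parse_state)
        if [ -n "$state" ]; then
            echo "$state"
            return 0
        fi
        count=$((count + 1))
        sleep "$delay"
    done
    return 1
}
```

As Ade notes, State reports whether configuration is complete, not the value of isInRunningState(), so the sketch treats any parsable answer as "the servlet is reachable" and leaves interpreting 0 versus 1 to the caller.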
This message is a notice that Fedora 14 is now at end of life. Fedora has stopped maintaining and issuing updates for Fedora 14. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At this time, all open bugs with a Fedora 'version' of '14' have been closed as WONTFIX.

(Please note: Our normal process is to give advance warning of this occurring, but we forgot to do that. A thousand apologies.)

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, feel free to reopen this bug and simply change the 'version' to a later Fedora version.

Bug Reporter: Thank you for reporting this issue and we are sorry that we were unable to fix it before Fedora 14 reached end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to click on "Clone This Bug" (top right of this page) and open it against that version of Fedora.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping