Bug 1005446
| Summary: | ipa-ca-install fails with CA not starting after 120 seconds. | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Michael Gregg <mgregg> | ||||||
| Component: | ipa | Assignee: | Martin Kosek <mkosek> | ||||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Namita Soman <nsoman> | ||||||
| Severity: | unspecified | Docs Contact: | |||||||
| Priority: | high | ||||||||
| Version: | 7.0 | CC: | alee, mkosek, nkinder, nsoman, pviktori, rcritten, spoore | ||||||
| Target Milestone: | rc | ||||||||
| Target Release: | --- | ||||||||
| Hardware: | Unspecified | ||||||||
| OS: | Unspecified | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | ipa-3.3.2-2.el7 | Doc Type: | Bug Fix | ||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2014-06-13 10:15:21 UTC | Type: | Bug | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Attachments: |
|
||||||||
Created attachment 795046 [details]
client install log from replica
As you can see, the ca install log ends with a python exception that looks like it was caused when the CA does in fact not return:
2013-09-07T02:29:06Z DEBUG request headers {'date': 'Sat, 07 Sep 2013 02:29:06 GMT', 'content-length': '219', 'content-type': 'text/html; charset=iso-8859-1', 'server': 'Apache/2.4.6 (Red Hat) mod_auth_kerb/5.4 mod_nss/2.4.4 NSS/3.15 mod_wsgi/3.4 Python/2.7.5'}
2013-09-07T02:29:06Z DEBUG request body '<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">\n<html><head>\n<title>404 Not Found</title>\n</head><body>\n<h1>Not Found</h1>\n<p>The requested URL /ca/admin/ca/getStatus was not found on this server.</p>\n</body></html>\n'
2013-09-07T02:29:06Z DEBUG The CA status is: check interrupted
2013-09-07T02:29:06Z DEBUG Waiting for CA to start...
2013-09-07T02:29:07Z DEBUG File "/usr/lib/python2.7/site-packages/ipaserver/install/installutils.py", line 622, in run_script
return_value = main_function()
File "/usr/sbin/ipa-ca-install", line 186, in main
config, dogtag_master_ds_port, postinstall=True)
File "/usr/lib/python2.7/site-packages/ipaserver/install/cainstance.py", line 1680, in install_replica_ca
subject_base=config.subject_base)
File "/usr/lib/python2.7/site-packages/ipaserver/install/cainstance.py", line 478, in configure_instance
self.start_creation(runtime=210)
File "/usr/lib/python2.7/site-packages/ipaserver/install/service.py", line 364, in start_creation
method()
File "/usr/lib/python2.7/site-packages/ipaserver/install/cainstance.py", line 484, in __start
self.start()
File "/usr/lib/python2.7/site-packages/ipaserver/install/service.py", line 268, in start
self.service.start(instance_name, capture_output=capture_output, wait=wait)
File "/usr/lib/python2.7/site-packages/ipapython/platform/fedora16/service.py", line 170, in start
self.__wait_until_running()
File "/usr/lib/python2.7/site-packages/ipapython/platform/fedora16/service.py", line 164, in __wait_until_running
raise RuntimeError('CA did not start in %ss' % timeout)
2013-09-07T02:29:07Z DEBUG The ipa-ca-install command failed, exception: RuntimeError: CA did not start in 120s
What looks suspicious to me is the discrepancy between checked ports & the port where the real check is happening: 2013-09-07T02:27:05Z DEBUG args=/bin/systemctl is-active pki-tomcatd.target 2013-09-07T02:27:05Z DEBUG Process finished, return code=0 2013-09-07T02:27:05Z DEBUG stdout=active 2013-09-07T02:27:05Z DEBUG stderr= 2013-09-07T02:27:05Z DEBUG wait_for_open_ports: localhost [8080, 8443] timeout 120 2013-09-07T02:27:07Z DEBUG Waiting until the CA is running 2013-09-07T02:27:07Z DEBUG request 'https://qe-blade-05.testrelm.com:443/ca/admin/ca/getStatus' 2013-09-07T02:27:07Z DEBUG request body '' 2013-09-07T02:27:07Z DEBUG request status 404 2013-09-07T02:27:07Z DEBUG request reason_phrase u'Not Found' 2013-09-07T02:27:07Z DEBUG request headers {'date': 'Sat, 07 Sep 2013 02:27:07 GMT', 'content-length': '219', 'content-type': 'text/html; charset=iso-8859-1', 'server': 'Apache/2.4.6 (Red Hat) mod_auth_kerb/5.4 mod_nss/2.4.4 NSS/3.15 mod_wsgi/3.4 Python/2.7.5'} 2013-09-07T02:27:07Z DEBUG request body '<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">\n<html><head>\n<title>404 Not Found</title>\n</head><body>\n<h1>Not Found</h1>\n<p>The requested URL /ca/admin/ca/getStatus was not found on this server.</p>\n</body></html>\n' 2013-09-07T02:27:07Z DEBUG The CA status is: check interrupted Petr, is this by design? 2013-09-09T11:20:17Z DEBUG stdout=Loading deployment configuration from /tmp/tmpDxJPWq. Installing CA into /var/lib/pki/pki-tomcat. Storing deployment configuration into /etc/sysconfig/pki/tomcat/pki-tomcat/ca/deployment.cfg. ========================================================================== INSTALLATION SUMMARY ========================================================================== Administrator's username: admin Administrator's PKCS #12 file: /root/.dogtag/pki-tomcat/ca_admin_cert.p12 Administrator's certificate nickname: ipa-ca-agent Administrator's certificate database: /tmp/tmp-KNcbaH To check the status of the subsystem: systemctl status pki-tomcatd\@pki-tomcat.service To restart the subsystem: systemctl restart pki-tomcatd\@pki-tomcat.service The URL for the subsystem is: https://vm-052.idm.lab.bos.redhat.com:8443/ca ========================================================================== 2013-09-09T11:20:17Z DEBUG stderr=pkispawn : WARNING ....... unable to validate security domain user/password through REST interface. Interface not available 2013-09-09T11:20:17Z DEBUG completed creating ca instance 2013-09-09T11:20:17Z DEBUG duration: 79 seconds 2013-09-09T11:20:17Z DEBUG [3/18]: stopping certificate server instance to update CS.cfg 2013-09-09T11:20:17Z DEBUG Starting external process 2013-09-09T11:20:17Z DEBUG args=/bin/systemctl stop pki-tomcatd.target 2013-09-09T11:20:19Z DEBUG Process finished, return code=0 2013-09-09T11:20:19Z DEBUG stdout= 2013-09-09T11:20:19Z DEBUG stderr= 2013-09-09T11:20:19Z DEBUG duration: 1 seconds 2013-09-09T11:20:19Z DEBUG [4/18]: disabling nonces 2013-09-09T11:20:19Z DEBUG duration: 0 seconds 2013-09-09T11:20:19Z DEBUG [5/18]: set up CRL publishing 2013-09-09T11:20:19Z DEBUG Starting external process 2013-09-09T11:20:19Z DEBUG args=/usr/sbin/selinuxenabled 2013-09-09T11:20:19Z DEBUG Process finished, return code=0 2013-09-09T11:20:19Z DEBUG stdout= 2013-09-09T11:20:19Z DEBUG stderr= 2013-09-09T11:20:19Z DEBUG Starting external process 2013-09-09T11:20:19Z DEBUG args=/usr/sbin/restorecon /var/lib/ipa/pki-ca/publish 2013-09-09T11:20:19Z DEBUG Process finished, return code=0 2013-09-09T11:20:19Z DEBUG stdout= 2013-09-09T11:20:19Z DEBUG stderr= 2013-09-09T11:20:19Z DEBUG duration: 0 seconds 2013-09-09T11:20:19Z DEBUG [6/18]: starting certificate server instance 2013-09-09T11:20:19Z DEBUG Starting external process 2013-09-09T11:20:19Z DEBUG args=/bin/systemctl start pki-tomcatd.target 2013-09-09T11:20:19Z DEBUG Process finished, return code=0 2013-09-09T11:20:19Z DEBUG stdout= 2013-09-09T11:20:19Z DEBUG stderr= 2013-09-09T11:20:19Z DEBUG Starting external process 2013-09-09T11:20:19Z DEBUG args=/bin/systemctl is-active pki-tomcatd.target 2013-09-09T11:20:19Z DEBUG Process finished, return code=0 2013-09-09T11:20:19Z DEBUG stdout=active 2013-09-09T11:20:19Z DEBUG stderr= 2013-09-09T11:20:19Z DEBUG wait_for_open_ports: localhost [8080, 8443] timeout 120 2013-09-09T11:22:19Z DEBUG File "/usr/lib/python2.7/site-packages/ipaserver/install/installutils.py", line 622, in run_script return_value = main_function() File "/usr/sbin/ipa-ca-install", line 186, in main config, dogtag_master_ds_port, postinstall=True) Ade, I just tried installing a 7.0 replica for RHEL-6.5 master as above and it seems to me that CA is not running: # ps ax | grep tomcat 16883 pts/0 S+ 0:00 grep --color=auto tomcat Even manual startup seems non-functional: # /bin/systemctl start pki-tomcatd.target # /bin/systemctl is-active pki-tomcatd.target active # /bin/systemctl status pki-tomcatd.target pki-tomcatd.target - PKI Tomcat Server Loaded: loaded (/usr/lib/systemd/system/pki-tomcatd.target; disabled) Active: active since Mon 2013-09-09 07:20:19 EDT; 16min ago Sep 09 07:20:19 vm-052.idm.lab.bos.redhat.com systemd[1]: Starting PKI Tomcat Server. Sep 09 07:20:19 vm-052.idm.lab.bos.redhat.com systemd[1]: Reached target PKI Tomcat Server. Sep 09 07:35:29 vm-052.idm.lab.bos.redhat.com systemd[1]: Reached target PKI Tomcat Server. Sep 09 07:36:42 vm-052.idm.lab.bos.redhat.com systemd[1]: Reached target PKI Tomcat Server. # ps ax | grep tomcat 16883 pts/0 S+ 0:00 grep --color=auto tomcat Also this line looked suspicious: 2013-09-09T11:20:17Z DEBUG stderr=pkispawn : WARNING ....... unable to validate security domain user/password through REST interface. Interface not available Can it be related? Can you please advise what could be the reason of these replica install failures? Martin, those are two different checks; ports [8080, 8443] are the internal Dogtag ports, on port 443 (HTTPS) we then ask for status via our Apache proxy. The bug happens because the pki service is not properly set up to always start when the target is started. (Usually this manifests itself when the service fails, which happens often because it can return a non-zero exit code that systemd is not set up to recognize.) This is already fixed in Dogtag upstream as https://fedorahosted.org/pki/ticket/716 Thanks for info. I suspected that this is a systemd issue. Moving to pki-ca so that they can close the bug when the upstream fix gets to 7.0. Ade, I would ask for this to be done rather sooner than later so that testing is not blocked. Upstream ticket: https://fedorahosted.org/pki/ticket/740 Still failing despite new builds: ipa-server-3.3.1-3.el7.x86_64 pki-ca-10.0.5-1.el7.noarch [root@hp-sl230sgen8-01 ~]# ipa-ca-install /tmp/replica-info-hp-sl230sgen8-01.testrelm.com.gpg Directory Manager (existing master) password: Run connection check to master Check connection from replica to remote master 'ipaqa64vmg.testrelm.com': Directory Service: Unsecure port (389): OK Directory Service: Secure port (636): OK Kerberos KDC: TCP (88): OK Kerberos Kpasswd: TCP (464): OK HTTP Server: Unsecure port (80): OK HTTP Server: Secure port (443): OK The following list of ports use UDP protocol and would need to be checked manually: Kerberos KDC: UDP (88): SKIPPED Kerberos Kpasswd: UDP (464): SKIPPED Connection from replica to master is OK. Start listening on required ports for remote master check Get credentials to log in to remote master admin password: Check SSH connection to remote master Execute check on remote master Check connection from master to remote replica 'hp-sl230sgen8-01.testrelm.com': Directory Service: Unsecure port (389): OK Directory Service: Secure port (636): OK Kerberos KDC: TCP (88): OK Kerberos KDC: UDP (88): WARNING Kerberos Kpasswd: TCP (464): OK Kerberos Kpasswd: UDP (464): WARNING HTTP Server: Unsecure port (80): OK HTTP Server: Secure port (443): OK The following UDP ports could not be verified as open: 88, 464 This can happen if they are already bound to an application and ipa-replica-conncheck cannot attach own UDP responder. Connection from master to replica is OK. Connection check OK Configuring certificate server (pki-tomcatd): Estimated time 3 minutes 30 seconds [1/18]: creating certificate server user [2/18]: configuring certificate server instance [3/18]: stopping certificate server instance to update CS.cfg [4/18]: disabling nonces [5/18]: set up CRL publishing [6/18]: starting certificate server instance Your system may be partly configured. Run /usr/sbin/ipa-server-install --uninstall to clean up. CA did not start in 120s mkosek, Your original suspicion was well founded. I have reproduced the problem and found the following: 1. dogtag does in fact come up. You can see this by doing ps -ef |grep java 2. You can also try to get to the getStatus servlets as follows: wget https://localhost:8443/ca/admin/ca/getStatus --no-check-certificate or use a browser. 3. Trying to get to the servlet through port 443, though, gives a 404. The reason for this is that the file ipa-pki-proxy.conf needs to be copied to /etc/httpd/conf.d first and httpd restarted. And there may be other httpd config to be performed -- certs to be imported? The point though is that you are trying to confirm dogtag status through the apache port when apache has not yet been configured to pass requests to dogtag. Switching bug component back to IPA. Upstream ticket: https://fedorahosted.org/freeipa/ticket/3964 Fixed upstream: master: https://fedorahosted.org/freeipa/changeset/dbfa7157ccf0c467e33f742176fc82a789c72e2e ipa-3-3: https://fedorahosted.org/freeipa/changeset/75fe2f675e24228ba21b9c9a043b51902bb44860 Verified against ipa-server-3.3.3-4.el7.x86_64 On master: Setup complete Next steps: 1. You must make sure these network ports are open: TCP Ports: * 80, 443: HTTP/HTTPS * 389, 636: LDAP/LDAPS * 88, 464: kerberos * 53: bind UDP Ports: * 88, 464: kerberos * 53: bind * 123: ntp 2. You can now obtain a kerberos ticket using the command: 'kinit admin' This ticket will allow you to use the IPA tools (e.g., ipa user-add) and the web user interface. Be sure to back up the CA certificate stored in /root/cacert.p12 This file is required to create replicas. The password for this file is the Directory Manager password [root@qe-blade-01 ~]# ipa-replica-prepare --ip-address=10.16.4.19 mgmt9.testrelm.com Directory Manager (existing master) password: On slave: [root@mgmt9 rhqa_ipa]# ipa-ca-install -p Secret123 -w Secret123 --skip-conncheck --unattended /tmp/replica-info-mgmt9.testrelm.com.gpg Configuring certificate server (pki-tomcatd): Estimated time 3 minutes 30 seconds [1/18]: creating certificate server user [2/18]: configuring certificate server instance [3/18]: stopping certificate server instance to update CS.cfg [4/18]: disabling nonces [5/18]: set up CRL publishing [6/18]: starting certificate server instance [7/18]: importing CA chain to RA certificate database [8/18]: fixing RA database permissions [9/18]: setting up signing cert profile [10/18]: set certificate subject base [11/18]: enabling Subject Key Identifier [12/18]: enabling CRL and OCSP extensions for certificates [13/18]: setting audit signing renewal to 2 years [14/18]: configuring certificate server to start on boot [15/18]: configure certmonger for renewals [16/18]: configure clone certificate renewals [17/18]: configure Server-Cert certificate renewal [18/18]: Configure HTTP to proxy connections Done configuring certificate server (pki-tomcatd). Restarting the directory and certificate servers [root@mgmt9 rhqa_ipa]# This request was resolved in Red Hat Enterprise Linux 7.0. Contact your manager or support representative in case you have further questions about the request. |
Created attachment 795045 [details] ipa ca install log file Description of problem: install is one master, and one replica After the master is installed, attempting to run ipa-ca-install on replica fails. Version-Release number of selected component (if applicable): ipa-server-3.3.1-2.el7.x86_64 How reproducible: always Steps to Reproduce: # On master: 1. /usr/sbin/ipa-server-install --setup-dns --forwarder=10.11.5.19 --hostname=qe-blade-04.testrelm.com -r TESTRELM.COM -n testrelm.com -p Secret123 -P Secret123 -a Secret123 -U 2. ipa-replica-prepare --ip-address=10.16.76.36 qe-blade-05.testrelm.com 3. copy gpg file to Replica # On Replica 4. ipa-replica-install --setup-dns /tmp/replica-info-qe-blade-05.testrelm.com.gpg 5. ipa-ca-install -p Secret123 -w Secret123 --skip-conncheck --unattended /tmp/replica-info-qe-blade-05.testrelm.com.gpg Actual results: [root@qe-blade-05 ~]# ipa-ca-install -p Secret123 -w Secret123 --skip-conncheck --unattended /tmp/replica-info-qe-blade-05.testrelm.com.gpg Configuring certificate server (pki-tomcatd): Estimated time 3 minutes 30 seconds [1/18]: creating certificate server user [2/18]: configuring certificate server instance [3/18]: stopping certificate server instance to update CS.cfg [4/18]: disabling nonces [5/18]: set up CRL publishing [6/18]: starting certificate server instance Your system may be partly configured. Run /usr/sbin/ipa-server-install --uninstall to clean up. CA did not start in 120s Expected results: Additional info: