Created attachment 1779495 [details]
Logs and command outputs
Description of problem:
When ipaupgrade runs, PKI Tomcat's CA tries to connect to 389 Directory Server's ns-slapd (LDAP) and fails with connection refused because ns-slapd happens to be restarting in that 4 second time window. CA dies at this moment. Later on, ipaupgrade tries to connect to CA and fails with internal server error, and errors out.
Version-Release number of selected component (if applicable):
ipa --version: VERSION: 4.8.7, API_VERSION: 2.239
Red Hat Enterprise Linux 8.3 (Ootpa)
Happens every time I try to run "ipactl restart". Haven't tried this on a fresh system.
Steps to Reproduce:
1. ipactl restart
Failure reported and all services are stopped.
Success reported and all services are running.
I had need to run "ipactl restart", and got this message:
IPA version error: data needs to be upgraded (expected version '4.8.7-16.module+el8.3.0+10289+b6566038', current version '4.8.7-13.module+el8.3.0+8376+0bba7131')
Automatically running upgrade, for details see /var/log/ipaupgrade.log
Be patient, this may take a few minutes.
After a little time, this completes with failure:
[Migrating certificate profiles to LDAP]
IPA server upgrade failed: Inspect /var/log/ipaupgrade.log and run command ipa-server-upgrade manually.
Unexpected error - see /var/log/ipaupgrade.log for details:
RemoteRetrieveError: Failed to authenticate to CA REST API
2021-05-04T18:17:39Z DEBUG SUCCESS: port: 8443
2021-05-04T18:17:39Z DEBUG Start of email@example.com complete
2021-05-04T18:17:40Z DEBUG args=['/bin/systemctl', 'stop', 'dirsrv@SEMI-CASHSTAR-NET.service']
2021-05-04T18:17:42Z DEBUG args=['/bin/systemctl', 'start', 'dirsrv@SEMI-CASHSTAR-NET.service']
2021-05-04T18:17:45Z DEBUG args=['/bin/systemctl', 'is-active', 'dirsrv@SEMI-CASHSTAR-NET.service']
2021-05-04T18:17:55Z DEBUG request GET https://ipa01.semi.cashstar.net:8443/ca/rest/account/login
2021-05-04T18:17:55Z DEBUG request body ''
2021-05-04T18:17:55Z DEBUG response status 500
CA subsystem unavailable. Check CA debug log.
2021-05-04 18:17:43 [main] INFO: Initializing CA subsystem
2021-05-04 18:17:43 [main] FINE: LdapBoundConnection: Connecting to ipa01.semi.cashstar.net:636 with client cert auth
2021-05-04 18:17:43 [main] SEVERE: Unable to create socket: java.net.ConnectException: Connection refused (Connection refused)
2021-05-04 18:17:43 [main] SEVERE: Shutting down.
2021-05-04 18:17:43 [main] INFO: Shutting down CA subsystem
journalctl --unit dirsrv@SEMI-CASHSTAR-NET.service
May 04 18:17:42 ipa01.semi.cashstar.net systemd: Starting 389 Directory Server SEMI-CASHSTAR-NET....
May 04 18:17:44 ipa01.semi.cashstar.net ns-slapd: [04/May/2021:18:17:44.029650333 +0000] - INFO - slapd_extract_cert - CA CERT NAME: SEMI.CASHSTAR.NET IPA CA
I ran this to verify:
while true; do date; netstat -pan | egrep "636|389" | grep " LISTEN "; sleep 1; done
Tue May 4 18:17:40 UTC 2021
tcp6 0 0 :::636 :::* LISTEN 893452/ns-slapd
tcp6 0 0 :::389 :::* LISTEN 893452/ns-slapd
Tue May 4 18:17:41 UTC 2021
Tue May 4 18:17:42 UTC 2021
Tue May 4 18:17:43 UTC 2021
Tue May 4 18:17:44 UTC 2021
Tue May 4 18:17:45 UTC 2021
tcp6 0 0 :::636 :::* LISTEN 893792/ns-slapd
tcp6 0 0 :::389 :::* LISTEN 893792/ns-slapd
The LDAP directory server ns-slapd is not listening on port 636 at 18:17:43 because dirsrv@SEMI-CASHSTAR-NET.service is still starting up at that time. And at that moment, CA tries to connect to ns-slapd on port 636, the connection is refused, and CA shuts down. And then 12 seconds later, ipa-server-upgrade tries to connect to CA and gets an Internal Server Error response. I think PKI Tomcat, which got started by ipa-server-upgrade at 18:17:39, just so happened to start CA at 18:17:43, which is right in the middle of the wrong time. This looks like a race condition. A race condition which is reliably happening every time for me.
I figured out a workaround that works on my system.
Edit this file:
Near the top, add this:
Find this line:
logger.info('[Verifying that KDC configuration is using ipa-kdb backend]')
And add this right afterword:
logger.info('But first, I sleep so that I don\'t stop 389 Directory Server LDAP ns-slapd while PKI Tomcat CA is trying to use it.')
logger.info('I have woken.')
Save your changes, and run "ipactl restart" (or maybe ipa-server-upgrade would work).
On my system, 20 seconds worked, and 4 was not enough. The idea is that "systemctl stop dirsrv@SEMI-CASHSTAR-NET.service" should not run until "[main] INFO: CA subsystem started" shows up in the ca debug log. And waiting 20 seconds is one dirty way to get it done.