Bug 1956991 - During ipa-server-upgrade, CA tries to connect to ns-slapd right after directory server is restarted, and fails
Summary: During ipa-server-upgrade, CA tries to connect to ns-slapd right after direct...
Keywords:
Status: NEW
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: ipa
Version: 8.3
Hardware: x86_64
OS: Linux
unspecified
high
Target Milestone: beta
: ---
Assignee: Thomas Woerner
QA Contact: ipa-qe
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2021-05-04 19:49 UTC by Paul Nickerson
Modified: 2021-05-18 01:14 UTC (History)
13 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Target Upstream Version:


Attachments (Terms of Use)
Logs and command outputs (304.35 KB, application/x-xz)
2021-05-04 19:49 UTC, Paul Nickerson
no flags Details

Description Paul Nickerson 2021-05-04 19:49:48 UTC
Created attachment 1779495 [details]
Logs and command outputs

Description of problem:
When ipaupgrade runs, PKI Tomcat's CA tries to connect to 389 Directory Server's ns-slapd (LDAP) and fails with connection refused because ns-slapd happens to be restarting in that 4 second time window. CA dies at this moment. Later on, ipaupgrade tries to connect to CA and fails with internal server error, and errors out.

Version-Release number of selected component (if applicable):
ipa --version: VERSION: 4.8.7, API_VERSION: 2.239
Red Hat Enterprise Linux 8.3 (Ootpa)

How reproducible:
Happens every time I try to run "ipactl restart". Haven't tried this on a fresh system.

Steps to Reproduce:
1. ipactl restart

Actual results:
Failure reported and all services are stopped.

Expected results:
Success reported and all services are running.

Additional info:

I had need to run "ipactl restart", and got this message:
IPA version error: data needs to be upgraded (expected version '4.8.7-16.module+el8.3.0+10289+b6566038', current version '4.8.7-13.module+el8.3.0+8376+0bba7131')
Automatically running upgrade, for details see /var/log/ipaupgrade.log
Be patient, this may take a few minutes.

After a little time, this completes with failure:
[Migrating certificate profiles to LDAP]
IPA server upgrade failed: Inspect /var/log/ipaupgrade.log and run command ipa-server-upgrade manually.
Unexpected error - see /var/log/ipaupgrade.log for details:
RemoteRetrieveError: Failed to authenticate to CA REST API

/var/log/ipaupgrade.log
2021-05-04T18:17:39Z DEBUG SUCCESS: port: 8443
2021-05-04T18:17:39Z DEBUG Start of pki-tomcatd@pki-tomcat.service complete
2021-05-04T18:17:40Z DEBUG args=['/bin/systemctl', 'stop', 'dirsrv@SEMI-CASHSTAR-NET.service']
2021-05-04T18:17:42Z DEBUG args=['/bin/systemctl', 'start', 'dirsrv@SEMI-CASHSTAR-NET.service']
2021-05-04T18:17:45Z DEBUG args=['/bin/systemctl', 'is-active', 'dirsrv@SEMI-CASHSTAR-NET.service']
2021-05-04T18:17:55Z DEBUG request GET https://ipa01.semi.cashstar.net:8443/ca/rest/account/login
2021-05-04T18:17:55Z DEBUG request body ''
2021-05-04T18:17:55Z DEBUG response status 500
CA subsystem unavailable. Check CA debug log.

/var/log/pki/pki-tomcat/ca/debug.2021-05-04.log
2021-05-04 18:17:43 [main] INFO: Initializing CA subsystem
2021-05-04 18:17:43 [main] FINE: LdapBoundConnection: Connecting to ipa01.semi.cashstar.net:636 with client cert auth
2021-05-04 18:17:43 [main] SEVERE: Unable to create socket: java.net.ConnectException: Connection refused (Connection refused)
2021-05-04 18:17:43 [main] SEVERE: Shutting down.
2021-05-04 18:17:43 [main] INFO: Shutting down CA subsystem

journalctl --unit dirsrv@SEMI-CASHSTAR-NET.service
May 04 18:17:42 ipa01.semi.cashstar.net systemd[1]: Starting 389 Directory Server SEMI-CASHSTAR-NET....
May 04 18:17:44 ipa01.semi.cashstar.net ns-slapd[893792]: [04/May/2021:18:17:44.029650333 +0000] - INFO - slapd_extract_cert - CA CERT NAME: SEMI.CASHSTAR.NET IPA CA

I ran this to verify:
while true; do date; netstat -pan | egrep "636|389" | grep " LISTEN "; sleep 1; done
Tue May  4 18:17:40 UTC 2021
tcp6       0      0 :::636                  :::*                    LISTEN      893452/ns-slapd     
tcp6       0      0 :::389                  :::*                    LISTEN      893452/ns-slapd     
Tue May  4 18:17:41 UTC 2021
Tue May  4 18:17:42 UTC 2021
Tue May  4 18:17:43 UTC 2021
Tue May  4 18:17:44 UTC 2021
Tue May  4 18:17:45 UTC 2021
tcp6       0      0 :::636                  :::*                    LISTEN      893792/ns-slapd     
tcp6       0      0 :::389                  :::*                    LISTEN      893792/ns-slapd     

The LDAP directory server ns-slapd is not listening on port 636 at 18:17:43 because dirsrv@SEMI-CASHSTAR-NET.service is still starting up at that time. And at that moment, CA tries to connect to ns-slapd on port 636, the connection is refused, and CA shuts down. And then 12 seconds later, ipa-server-upgrade tries to connect to CA and gets an Internal Server Error response. I think PKI Tomcat, which got started by ipa-server-upgrade at 18:17:39, just so happened to start CA at 18:17:43, which is right in the middle of the wrong time. This looks like a race condition. A race condition which is reliably happening every time for me.

Comment 1 Paul Nickerson 2021-05-04 20:34:03 UTC
I figured out a workaround that works on my system.

Edit this file:
/usr/lib/python3.6/site-packages/ipaserver/install/server/upgrade.py

Near the top, add this:
import time

Find this line:
    logger.info('[Verifying that KDC configuration is using ipa-kdb backend]')
And add this right afterword:
    logger.info('But first, I sleep so that I don\'t stop 389 Directory Server LDAP ns-slapd while PKI Tomcat CA is trying to use it.')
    time.sleep(20)
    logger.info('I have woken.')

Save your changes, and run "ipactl restart" (or maybe ipa-server-upgrade would work).

On my system, 20 seconds worked, and 4 was not enough. The idea is that "systemctl stop dirsrv@SEMI-CASHSTAR-NET.service" should not run until "[main] INFO: CA subsystem started" shows up in the ca debug log. And waiting 20 seconds is one dirty way to get it done.


Note You need to log in before you can comment on or make changes to this bug.