RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 1956991 - During ipa-server-upgrade, CA tries to connect to ns-slapd right after directory server is restarted, and fails
Summary: During ipa-server-upgrade, CA tries to connect to ns-slapd right after direct...
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: ipa
Version: 8.3
Hardware: x86_64
OS: Linux
unspecified
high
Target Milestone: beta
: ---
Assignee: Florence Blanc-Renaud
QA Contact: ipa-qe
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2021-05-04 19:49 UTC by Paul Nickerson
Modified: 2022-11-04 07:27 UTC (History)
16 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-11-04 07:27:47 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
Logs and command outputs (304.35 KB, application/x-xz)
2021-05-04 19:49 UTC, Paul Nickerson
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker FREEIPA-7246 0 None None None 2021-11-08 13:17:45 UTC

Description Paul Nickerson 2021-05-04 19:49:48 UTC
Created attachment 1779495 [details]
Logs and command outputs

Description of problem:
When ipaupgrade runs, PKI Tomcat's CA tries to connect to 389 Directory Server's ns-slapd (LDAP) and fails with connection refused because ns-slapd happens to be restarting in that 4 second time window. CA dies at this moment. Later on, ipaupgrade tries to connect to CA and fails with internal server error, and errors out.

Version-Release number of selected component (if applicable):
ipa --version: VERSION: 4.8.7, API_VERSION: 2.239
Red Hat Enterprise Linux 8.3 (Ootpa)

How reproducible:
Happens every time I try to run "ipactl restart". Haven't tried this on a fresh system.

Steps to Reproduce:
1. ipactl restart

Actual results:
Failure reported and all services are stopped.

Expected results:
Success reported and all services are running.

Additional info:

I had need to run "ipactl restart", and got this message:
IPA version error: data needs to be upgraded (expected version '4.8.7-16.module+el8.3.0+10289+b6566038', current version '4.8.7-13.module+el8.3.0+8376+0bba7131')
Automatically running upgrade, for details see /var/log/ipaupgrade.log
Be patient, this may take a few minutes.

After a little time, this completes with failure:
[Migrating certificate profiles to LDAP]
IPA server upgrade failed: Inspect /var/log/ipaupgrade.log and run command ipa-server-upgrade manually.
Unexpected error - see /var/log/ipaupgrade.log for details:
RemoteRetrieveError: Failed to authenticate to CA REST API

/var/log/ipaupgrade.log
2021-05-04T18:17:39Z DEBUG SUCCESS: port: 8443
2021-05-04T18:17:39Z DEBUG Start of pki-tomcatd complete
2021-05-04T18:17:40Z DEBUG args=['/bin/systemctl', 'stop', 'dirsrv']
2021-05-04T18:17:42Z DEBUG args=['/bin/systemctl', 'start', 'dirsrv']
2021-05-04T18:17:45Z DEBUG args=['/bin/systemctl', 'is-active', 'dirsrv']
2021-05-04T18:17:55Z DEBUG request GET https://ipa01.semi.cashstar.net:8443/ca/rest/account/login
2021-05-04T18:17:55Z DEBUG request body ''
2021-05-04T18:17:55Z DEBUG response status 500
CA subsystem unavailable. Check CA debug log.

/var/log/pki/pki-tomcat/ca/debug.2021-05-04.log
2021-05-04 18:17:43 [main] INFO: Initializing CA subsystem
2021-05-04 18:17:43 [main] FINE: LdapBoundConnection: Connecting to ipa01.semi.cashstar.net:636 with client cert auth
2021-05-04 18:17:43 [main] SEVERE: Unable to create socket: java.net.ConnectException: Connection refused (Connection refused)
2021-05-04 18:17:43 [main] SEVERE: Shutting down.
2021-05-04 18:17:43 [main] INFO: Shutting down CA subsystem

journalctl --unit dirsrv
May 04 18:17:42 ipa01.semi.cashstar.net systemd[1]: Starting 389 Directory Server SEMI-CASHSTAR-NET....
May 04 18:17:44 ipa01.semi.cashstar.net ns-slapd[893792]: [04/May/2021:18:17:44.029650333 +0000] - INFO - slapd_extract_cert - CA CERT NAME: SEMI.CASHSTAR.NET IPA CA

I ran this to verify:
while true; do date; netstat -pan | egrep "636|389" | grep " LISTEN "; sleep 1; done
Tue May  4 18:17:40 UTC 2021
tcp6       0      0 :::636                  :::*                    LISTEN      893452/ns-slapd     
tcp6       0      0 :::389                  :::*                    LISTEN      893452/ns-slapd     
Tue May  4 18:17:41 UTC 2021
Tue May  4 18:17:42 UTC 2021
Tue May  4 18:17:43 UTC 2021
Tue May  4 18:17:44 UTC 2021
Tue May  4 18:17:45 UTC 2021
tcp6       0      0 :::636                  :::*                    LISTEN      893792/ns-slapd     
tcp6       0      0 :::389                  :::*                    LISTEN      893792/ns-slapd     

The LDAP directory server ns-slapd is not listening on port 636 at 18:17:43 because dirsrv is still starting up at that time. And at that moment, CA tries to connect to ns-slapd on port 636, the connection is refused, and CA shuts down. And then 12 seconds later, ipa-server-upgrade tries to connect to CA and gets an Internal Server Error response. I think PKI Tomcat, which got started by ipa-server-upgrade at 18:17:39, just so happened to start CA at 18:17:43, which is right in the middle of the wrong time. This looks like a race condition. A race condition which is reliably happening every time for me.

Comment 1 Paul Nickerson 2021-05-04 20:34:03 UTC
I figured out a workaround that works on my system.

Edit this file:
/usr/lib/python3.6/site-packages/ipaserver/install/server/upgrade.py

Near the top, add this:
import time

Find this line:
    logger.info('[Verifying that KDC configuration is using ipa-kdb backend]')
And add this right afterword:
    logger.info('But first, I sleep so that I don\'t stop 389 Directory Server LDAP ns-slapd while PKI Tomcat CA is trying to use it.')
    time.sleep(20)
    logger.info('I have woken.')

Save your changes, and run "ipactl restart" (or maybe ipa-server-upgrade would work).

On my system, 20 seconds worked, and 4 was not enough. The idea is that "systemctl stop dirsrv" should not run until "[main] INFO: CA subsystem started" shows up in the ca debug log. And waiting 20 seconds is one dirty way to get it done.

Comment 6 sjansen 2021-12-16 17:50:23 UTC
I can confirm this issue on several installations, after latest IPA server update (RHEL 8.5 and also on some clones) the CA is dead after an upgrade. Web GUI "show Unable to communicate with CMS (403)"

Comment 7 Florence Blanc-Renaud 2022-04-06 17:31:04 UTC
(In reply to sjansen from comment #6)
> I can confirm this issue on several installations, after latest IPA server
> update (RHEL 8.5 and also on some clones) the CA is dead after an upgrade.
> Web GUI "show Unable to communicate with CMS (403)"

In your specific case, the error may be related to a different issue. Can you check which version of pki-servlet-engine is installed on your system + the content of /etc/pki/pki-tomcat/server.xml? You will find more information in https://access.redhat.com/solutions/6632811, but in short the update to RHEL 8.5 may have updated the file /etc/pki/pki-tomcat/server.xml while it should not.
With pki-servlet-engine-9.0.30 and below: /etc/pki/pki-tomcat/server.xml should contain requiredSecret=xxx in the section defining the AJP connector.
With pki-servlet-engine-9.0.31 and above: /etc/pki/pki-tomcat/server.xml should contain secret=xxx
This is already tracked in BZ #2029023 and BZ #2061458.

The issue described in this bugzilla is different and related to the startup of services and timeouts.

Comment 8 sjansen 2022-04-22 10:10:12 UTC
(In reply to Florence Blanc-Renaud from comment #7)

> In your specific case, the error may be related to a different issue. Can
> you check which version of pki-servlet-engine is installed on your system +
> the content of /etc/pki/pki-tomcat/server.xml? You will find more
> information in https://access.redhat.com/solutions/6632811, but in short the
> update to RHEL 8.5 may have updated the file /etc/pki/pki-tomcat/server.xml
> while it should not.
> With pki-servlet-engine-9.0.30 and below: /etc/pki/pki-tomcat/server.xml
> should contain requiredSecret=xxx in the section defining the AJP connector.
> With pki-servlet-engine-9.0.31 and above: /etc/pki/pki-tomcat/server.xml
> should contain secret=xxx
> This is already tracked in BZ #2029023 and BZ #2061458.
> 
> The issue described in this bugzilla is different and related to the startup
> of services and timeouts.

Hi Florance,

i am suprised i got tagged for this issue because its quite some time ago. Looks like this issue got moved around a bit, my orignal issue was in 2029023. From my side the issue got solved in 2029023 by editing server.xml to remove auth, it never reappeared on any of my freeipa servers or i never got another pki update since i changed server.xml.

Comment 10 RHEL Program Management 2022-11-04 07:27:47 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.


Note You need to log in before you can comment on or make changes to this bug.