Bug 1980810

Summary: glibc: Trigger systemd re-exec on updates to help CI test systems
Product: Red Hat Enterprise Linux 9 Reporter: Kaleem <ksiddiqu>
Component: glibc Assignee: glibc team <glibc-bugzilla>
Status: CLOSED CURRENTRELEASE QA Contact: Sergey Kolosov <skolosov>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 9.0 CC: ashankar, codonell, dj, edewata, frenaud, fweimer, glibc-bugzilla, jorton, mnewsome, pfrankli, sipoyare, skolosov
Target Milestone: beta Keywords: FutureFeature, ImprovesTestability, ManyUsersImpacted, Triaged
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glibc-2.33.9000-42.el9 Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-12-07 21:42:00 UTC Type: Enhancement
Bug Depends On: 1980039    
Bug Blocks:    

Description Kaleem 2021-07-09 15:12:56 UTC
Description of problem:
With the latest IPA/PKI-core builds, Dogtag instance creation fails during IPA install.

Configuring certificate server (pki-tomcatd). Estimated time: 3 minutes
  [1/28]: configuring certificate server instance
Failed to configure CA instance
See the installation logs and the following files/directories for more information:
  /var/log/pki/pki-tomcat
  [error] RuntimeError: CA configuration failed.
CA configuration failed.
The ipa-server-install command failed. See /var/log/ipaserver-install.log for more information
[root@master ~]#


Version-Release number of selected component (if applicable):
[root@master ~]# rpm -q ipa-server pki-ca
ipa-server-4.9.6-1.el9.x86_64
pki-ca-11.0.0-0.3.alpha1.el9.noarch
[root@master ~]#

How reproducible:
Always

Steps to Reproduce:
1. IPA install with latest nightly build

Actual results:
IPA install fails

Expected results:
IPA install should be successful

Additional info:

[root@master pki]# systemctl status pki-tomcatd
× pki-tomcatd - PKI Tomcat Server pki-tomcat
     Loaded: loaded (/usr/lib/systemd/system/pki-tomcatd@.service; enabled; vendor preset: disabled)
     Active: failed (Result: exit-code) since Fri 2021-07-09 10:46:12 EDT; 8min ago
    Process: 24185 ExecStartPre=/usr/sbin/pki-server upgrade pki-tomcat (code=exited, status=217/USER)
        CPU: 2ms

Jul 09 10:46:12 master.testrealm.test systemd[1]: Starting PKI Tomcat Server pki-tomcat...
Jul 09 10:46:12 master.testrealm.test systemd[24185]: pki-tomcatd: Failed to determine user credentials: No such process
Jul 09 10:46:12 master.testrealm.test systemd[24185]: pki-tomcatd: Failed at step USER spawning /usr/sbin/pki-server: No such process
Jul 09 10:46:12 master.testrealm.test systemd[1]: pki-tomcatd: Control process exited, code=exited, status=217/USER
Jul 09 10:46:12 master.testrealm.test systemd[1]: pki-tomcatd: Failed with result 'exit-code'.
Jul 09 10:46:12 master.testrealm.test systemd[1]: Failed to start PKI Tomcat Server pki-tomcat.
[root@master pki]#

@fcami pointed me to https://www.dogtagpki.org/wiki/Systemd, but this did not help because the failure happens during IPA install.

Comment 1 Florence Blanc-Renaud 2021-07-09 15:36:51 UTC
The installation was succeeding with RHEL-9.0.0-20210707.2 but is failing with RHEL-9.0.0-20210708.3

The updated packages between the 2 versions are:
crypto-policies
dnf
filesystem
glibc
libnl3
libtirpc
numactl-libs
perl-Text-Tabs+Wrap
python3-dnf
python3-dnf-plugins-core
slapi-nis
yum
yum-utils

Hope this helps narrow down the investigation.

Comment 2 Florence Blanc-Renaud 2021-07-09 19:34:22 UTC
The issue comes from the latest glibc version (glibc-2.33.9000-36.el9.x86_64).
Downgrading to glibc-common-2.33-14.el9.x86_64 fixes the installation issue.

Comment 3 Florian Weimer 2021-07-09 19:47:05 UTC
Could you provide more information on what actually fails? How can we reproduce this? Thanks.

Comment 4 Florence Blanc-Renaud 2021-07-09 20:36:19 UTC
The failure happens during IPA server installation. In order to reproduce:
dnf install ipa-server ipa-server-dns
ipa-server-install --domain ipa.test --realm IPA.TEST --setup-dns --auto-forwarder -a Secret123 -p Secret123 -U

The installation fails while configuring the Certificate server. ipa-server-install is internally calling pkispawn, that sets up the Certificate server and restarts it with systemctl start pki-tomcatd@pki-tomcat.

Edewata may provide more details on the exact step failing when starting pki, based on the journal logs:
Jul 09 10:46:12 master.testrealm.test systemd[24185]: pki-tomcatd: Failed to determine user credentials: No such process
Jul 09 10:46:12 master.testrealm.test systemd[24185]: pki-tomcatd: Failed at step USER spawning /usr/sbin/pki-server: No such process

Unfortunately I’m not familiar enough with pki internals to point to the exact call that is failing.

Comment 5 Endi Sukma Dewata 2021-07-09 20:43:53 UTC
I don't have any info about this particular installation scenario.
Could someone provide the pkispawn log and CA/KRA debug log?

Comment 6 Carlos O'Donell 2021-07-09 21:22:58 UTC
(In reply to Florence Blanc-Renaud from comment #4)
> The failure happens during IPA server installation. In order to reproduce:
> dnf install ipa-server ipa-server-dns
> ipa-server-install --domain ipa.test --realm IPA.TEST --setup-dns
> --auto-forwarder -a Secret123 -p Secret123 -U
> 
> The installation fails while configuring the Certificate server.
> ipa-server-install is internally calling pkispawn, that sets up the
> Certificate server and restarts it with systemctl start
> pki-tomcatd@pki-tomcat.

With the -36 glibc...

rpm -qa | grep glibc
glibc-langpack-en-2.33.9000-36.el9.x86_64
glibc-common-2.33.9000-36.el9.x86_64
glibc-2.33.9000-36.el9.x86_64
glibc-gconv-extra-2.33.9000-36.el9.x86_64

And with compose RHEL-9.0.0-20210708.3 the commands you state work.

ipa-server-install --domain ipa.test --realm IPA.TEST --setup-dns --auto-forwarder -a Secret123 -p Secret123 -U
...
Configuring certificate server (pki-tomcatd). Estimated time: 3 minutes
  [1/28]: configuring certificate server instance
  [2/28]: stopping certificate server instance to update CS.cfg
  [3/28]: backing up CS.cfg
  [4/28]: Add ipa-pki-wait-running
  [5/28]: secure AJP connector
  [6/28]: reindex attributes
  [7/28]: exporting Dogtag certificate store pin
  [8/28]: disabling nonces
  [9/28]: set up CRL publishing
  [10/28]: enable PKIX certificate path discovery and validation
  [11/28]: authorizing RA to modify profiles
  [12/28]: authorizing RA to manage lightweight CAs
  [13/28]: Ensure lightweight CAs container exists
  [14/28]: starting certificate server instance
  [15/28]: configure certmonger for renewals
  [16/28]: requesting RA certificate from CA
  [17/28]: publishing the CA certificate
  [18/28]: adding RA agent as a trusted user
  [19/28]: configure certificate renewals
  [20/28]: Configure HTTP to proxy connections
  [21/28]: updating IPA configuration
  [22/28]: enabling CA instance
  [23/28]: importing IPA certificate profiles
  [24/28]: migrating certificate profiles to LDAP
  [25/28]: adding default CA ACL
  [26/28]: adding 'ipa' CA entry
  [27/28]: configuring certmonger renewal for lightweight CAs
  [28/28]: deploying ACME service
Done configuring certificate server (pki-tomcatd).
...
==============================================================================
Setup complete

Next steps:
	1. You must make sure these network ports are open:
		TCP Ports:
		  * 80, 443: HTTP/HTTPS
		  * 389, 636: LDAP/LDAPS
		  * 88, 464: kerberos
		  * 53: bind
		UDP Ports:
		  * 88, 464: kerberos
		  * 53: bind
		  * 123: ntp

	2. You can now obtain a kerberos ticket using the command: 'kinit admin'
	   This ticket will allow you to use the IPA tools (e.g., ipa user-add)
	   and the web user interface.

Be sure to back up the CA certificates stored in /root/cacert.p12
These files are required to create replicas. The password for these
files is the Directory Manager password
The ipa-server-install command was successful
[root@intel-chiefriver-04 ~]# echo $?
0

I don't have any experience with this software, so we're relying on you to provide guidance.

Comment 7 Florian Weimer 2021-07-09 21:28:50 UTC
Actual command line: ipa-server-install --domain ipa.test --realm IPA.TEST --setup-dns --auto-forwarder -a Secret123 -p Secret123 -U

Same for me: installation succeeded on s390x, and three times on x86-64 (on different systems).

Comment 8 Florence Blanc-Renaud 2021-07-11 13:37:55 UTC
Here are my steps:
1minutetip 1MT-RHEL-9.0.0-20210706.3-Beta

# dnf install -y ipa-server ipa-server-dns
# ipa-server-install --domain ipa.test --realm IPA.TEST --setup-dns --auto-forwarder -a Secret123 -p Secret123 -U
=> installation was successful

# ipa-server-install --uninstall -U

Add repo for RHEL-9.0.0-20210708.3-Beta:
# cat /etc/yum.repos.d/faulty.repo
[rhel-latest]
name=rhel-latest
baseurl=http://download.eng.rdu2.redhat.com/rhel-9/nightly/RHEL-9-Beta/RHEL-9.0.0-20210708.3/compose/BaseOS/x86_64/os/
enabled=1
gpgcheck=0

[rhel-AppStream-latest]
name=rhel-AppStream-latest
baseurl=http://download.eng.rdu2.redhat.com/rhel-9/nightly/RHEL-9-Beta/RHEL-9.0.0-20210708.3/compose/AppStream/x86_64/os/
enabled=1
gpgcheck=0

# dnf update -y glibc
[...]
Upgraded:
  glibc-2.33.9000-36.el9.x86_64                          glibc-common-2.33.9000-36.el9.x86_64                          glibc-langpack-en-2.33.9000-36.el9.x86_64                         
Installed:
  glibc-gconv-extra-2.33.9000-36.el9.x86_64                                                                                                                                                

Complete!
# ipa-server-install --domain ipa.test --realm IPA.TEST --setup-dns --auto-forwarder -a Secret123 -p Secret123 -U
[...]
Configuring certificate server (pki-tomcatd). Estimated time: 3 minutes
  [1/28]: configuring certificate server instance
Failed to configure CA instance
See the installation logs and the following files/directories for more information:
  /var/log/pki/pki-tomcat
  [error] RuntimeError: CA configuration failed.
CA configuration failed.
The ipa-server-install command failed. See /var/log/ipaserver-install.log for more information


Uploading the log files (ipaserver-install.log, pki-ca-spawn.log and journal)

Comment 10 Florian Weimer 2021-07-11 14:01:54 UTC
(In reply to Florence Blanc-Renaud from comment #8)
> # dnf update -y glibc

> # ipa-server-install --domain ipa.test --realm IPA.TEST --setup-dns

Is there a reboot between the two steps?

If not, this is probably the cause. It's conceptually similar to bug 1927040. However, we are not treating this instance here as a bug because you are upgrading across different major releases of glibc, which is currently not supportable without a reboot.
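
One way to answer the reboot question on a test system is to compare the glibc install time with the boot time. The following is only a minimal sketch, assuming an RPM-based system with /proc mounted; the function names are hypothetical and not part of any package.

```shell
# Succeeds (exit 0) if the package was installed after the system booted,
# i.e. no reboot has happened since the update.
# Arguments: install time and boot time, both in seconds since the epoch.
updated_since_boot() {
    [ "$1" -gt "$2" ]
}

# Install time of the most recently installed glibc package
# (uses the rpm INSTALLTIME query tag).
glibc_install_time() {
    rpm -q --qf '%{INSTALLTIME}\n' glibc | sort -n | tail -n 1
}

# Boot time taken from the "btime" line of /proc/stat.
boot_time() {
    awk '$1 == "btime" { print $2 }' /proc/stat
}

# Example use (not run here):
#   if updated_since_boot "$(glibc_install_time)" "$(boot_time)"; then
#       echo "glibc was updated after boot; long-running processes still use the old libc"
#   fi
```

A positive result only says that no reboot followed the update; it does not show whether any particular process was restarted or re-executed in the meantime.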

Comment 11 Joe Orton 2021-07-12 09:12:58 UTC
We are seeing a significant number of tests failing in CI across our team with similar systemd errors to comment 4 at the moment.

Comment 13 Pavel Zhukov 2021-07-12 09:20:49 UTC
(In reply to Florian Weimer from comment #10)
> (In reply to Florence Blanc-Renaud from comment #8)
> > # dnf update -y glibc
> 
> > # ipa-server-install --domain ipa.test --realm IPA.TEST --setup-dns
> 
> Is there a reboot between the two steps?
> 
> If not, this is probably the cause. It's conceptually similar to bug
> 1927040. However, we are not treating this instance here as a bug because
> you are upgrading across different major releases of glibc, which is
> currently not supportable without a reboot.

The problem is that glibc is pulled in as a dependency by packages in CI without a reboot, which breaks all CI tests (for example, cyrus-imapd).

Comment 14 Florian Weimer 2021-07-12 09:39:29 UTC
I was under the impression that buildroot and compose were now in sync, so if the buildroot contains the new glibc, the compose should as well.

The new glibc is actually in some of the composes (I checked a nightly compose recently and it was already included). Which CI systems suffer from this, and how can we make sure they use a current compose?

Comment 21 Florian Weimer 2021-07-12 12:10:00 UTC
Thanks for reporting back. I'm closing this bug because it's not a glibc issue, it's an infrastructure problem.

Comment 22 Florian Weimer 2021-07-13 12:16:24 UTC
We can call systemctl daemon-reexec during package updates. This should help a little bit.
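
A rough sketch of what such a step could look like as shell is shown here. This is an illustration under stated assumptions, not the actual glibc packaging change: the function name and the SYSTEMD_RUNDIR and SYSTEMCTL override variables are hypothetical and exist only so the logic can be exercised without touching a live system.

```shell
# Re-execute the systemd manager, but only when systemd is the running init.
# SYSTEMD_RUNDIR and SYSTEMCTL are hypothetical overrides for testing.
reexec_systemd() {
    rundir=${SYSTEMD_RUNDIR:-/run/systemd/system}
    systemctl=${SYSTEMCTL:-systemctl}
    # /run/systemd/system exists only when the system was booted with
    # systemd (the check described in sd_booted(3)).
    if [ -d "$rundir" ]; then
        # Ignore failures so that a failed re-exec cannot break the
        # package transaction.
        "$systemctl" daemon-reexec || :
    fi
}
```

The directory check keeps the step a no-op in containers and build roots where systemd is not PID 1.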

Comment 24 Florian Weimer 2021-07-13 13:47:57 UTC
Note to QE: Successful re-exec can be verified by checking the inode number in “lsof -p 1” against “ls -li /usr/lib64/libc.so.6”. They should match.
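
The same comparison can be scripted. The sketch below reads /proc/1/maps instead of lsof output, which is a substitution on my part and needs root for PID 1; the function names are hypothetical. It also assumes that the mapping is named libc.so.6, as on the builds discussed here.

```shell
# Print the inode of the libc.so.6 mapping found in a /proc/<pid>/maps
# listing read from standard input. Prints nothing if there is no match.
# maps columns: address perms offset dev inode pathname
libc_inode_from_maps() {
    awk '$6 ~ /\/libc\.so\.6$/ { print $5; exit }'
}

# Compare the libc inode mapped by a process with the inode on disk.
# Arguments: PID (default 1), path to libc (default /usr/lib64/libc.so.6).
# Returns 0 on match, 1 on mismatch, 2 if either inode cannot be read.
check_reexec() {
    pid=${1:-1}
    libc=${2:-/usr/lib64/libc.so.6}
    mapped=$(libc_inode_from_maps < "/proc/$pid/maps") || return 2
    ondisk=$(stat -L -c %i "$libc") || return 2
    if [ -n "$mapped" ] && [ "$mapped" = "$ondisk" ]; then
        echo "match: inode $ondisk"
    else
        echo "mismatch: mapped=${mapped:-none} on-disk=$ondisk"
        return 1
    fi
}
```

Running check_reexec with no arguments as root checks PID 1. A process that has not been re-executed still maps the old file, which the kernel lists with a "(deleted)" suffix and a different inode number.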