Red Hat Bugzilla – Bug 1596467
IPA upgrade fails for latest ipa package
Last modified: 2018-10-30 06:15:20 EDT
Description of problem: IPA upgrade fails for latest ipa package, when upgraded from RHEL74z (ipa-server-4.5.0-22.el7_4.x86_64) Version-Release number of selected component (if applicable): ipa-server-4.5.0-22.el7_4.x86_64 kernel-3.10.0-693.35.1.el7.x86_64 How reproducible: Always Steps to Reproduce: 1. Setup IPA-server at RHEL74z version 2. Setup repo for RHEL76 and upgrade ipa server using command "yum -y update ipa-server* sssd*" (in my case ipa-server-4.6.4-2.el7.x86_64) 3. Check ipa server logs at /var/log/ipaupgrade.log. Actual results: IPA upgrade fails # yum -y update ipa-server* sssd* (console) Job for ipa.service failed because the control process exited with error code. See "systemctl status ipa.service" and "journalctl -xe" for details. warning: %posttrans(ipa-server-4.6.4-2.el7.x86_64) scriptlet failed, exit status 1 Non-fatal POSTTRANS scriptlet failure in rpm package ipa-server-4.6.4-2.el7.x86_64 # tail /var/log/ipaupgrade.log 2018-06-29T04:27:44Z DEBUG The CA status is: check interrupted due to error: Retrieving CA status failed with status 500 2018-06-29T04:27:44Z DEBUG Waiting for CA to start... 2018-06-29T04:27:45Z ERROR IPA server upgrade failed: Inspect /var/log/ipaupgrade.log and run command ipa-server-upgrade manually. 2018-06-29T04:27:45Z DEBUG File "/usr/lib/python2.7/site-packages/ipapython/admintool.py", line 174, in execute return_value = self.run() File "/usr/lib/python2.7/site-packages/ipaserver/install/ipa_server_upgrade.py", line 54, in run raise admintool.ScriptError(str(e)) 2018-06-29T04:27:45Z DEBUG The ipa-server-upgrade command failed, exception: ScriptError: CA did not start in 300.0s 2018-06-29T04:27:45Z ERROR CA did not start in 300.0s 2018-06-29T04:27:45Z ERROR The ipa-server-upgrade command failed. See /var/log/ipaupgrade.log for more information [root@idm-qe-02 ca]# # tail /var/log/pki/ pki-ca-spawn.20180628235426.log pki-server-upgrade-10.5.9.log pki-upgrade-10.5.9.log pki-server-upgrade-10.4.1.log pki-tomcat/ server/ [root@idm-qe-02 ca]# tail /var/log/pki/pki-tomcat/ca/debug at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:652) at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:679) at org.apache.catalina.startup.HostConfig$DeployDescriptor.run(HostConfig.java:1966) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) [29/Jun/2018:00:22:47][localhost-startStop-1]: CMS.start(): shutdown server [29/Jun/2018:00:22:47][localhost-startStop-1]: CMSEngine.shutdown() # [root@idm-qe-02 ca]# journalctl -xe Jun 29 00:32:07 idm-qe-02.testrelm.test server[28432]: at com.netscape.cms.tomcat.ProxyRealm.backgroundProcess(ProxyRea Jun 29 00:32:07 idm-qe-02.testrelm.test server[28432]: at org.apache.catalina.core.ContainerBase.backgroundProcess(Cont Jun 29 00:32:07 idm-qe-02.testrelm.test server[28432]: at org.apache.catalina.core.StandardContext.backgroundProcess(St Jun 29 00:32:07 idm-qe-02.testrelm.test server[28432]: at org.apache.catalina.core.ContainerBase$ContainerBackgroundPro Jun 29 00:32:07 idm-qe-02.testrelm.test server[28432]: at org.apache.catalina.core.ContainerBase$ContainerBackgroundPro Jun 29 00:32:07 idm-qe-02.testrelm.test server[28432]: at org.apache.catalina.core.ContainerBase$ContainerBackgroundPro Jun 29 00:32:07 idm-qe-02.testrelm.test server[28432]: at org.apache.catalina.core.ContainerBase$ContainerBackgroundPro Jun 29 00:32:07 idm-qe-02.testrelm.test server[28432]: at java.lang.Thread.run(Thread.java:748) Jun 29 00:32:17 idm-qe-02.testrelm.test server[28432]: WARNING: Exception processing realm com.netscape.cms.tomcat.Prox Jun 29 00:32:17 idm-qe-02.testrelm.test server[28432]: javax.ws.rs.ServiceUnavailableException: Subsystem unavailable Jun 29 00:32:17 idm-qe-02.testrelm.test server[28432]: at com.netscape.cms.tomcat.ProxyRealm.backgroundProcess(ProxyRea Jun 29 00:32:17 idm-qe-02.testrelm.test server[28432]: at org.apache.catalina.core.ContainerBase.backgroundProcess(Cont Jun 29 00:32:17 idm-qe-02.testrelm.test server[28432]: at org.apache.catalina.core.StandardContext.backgroundProcess(St Jun 29 00:32:17 idm-qe-02.testrelm.test server[28432]: at org.apache.catalina.core.ContainerBase$ContainerBackgroundPro Jun 29 00:32:17 idm-qe-02.testrelm.test server[28432]: at org.apache.catalina.core.ContainerBase$ContainerBackgroundPro Jun 29 00:32:17 idm-qe-02.testrelm.test server[28432]: at org.apache.catalina.core.ContainerBase$ContainerBackgroundPro Jun 29 00:32:17 idm-qe-02.testrelm.test server[28432]: at org.apache.catalina.core.ContainerBase$ContainerBackgroundPro Jun 29 00:32:17 idm-qe-02.testrelm.test server[28432]: at java.lang.Thread.run(Thread.java:748) Jun 29 00:32:27 idm-qe-02.testrelm.test server[28432]: WARNING: Exception processing realm com.netscape.cms.tomcat.Prox Jun 29 00:32:27 idm-qe-02.testrelm.test server[28432]: javax.ws.rs.ServiceUnavailableException: Subsystem unavailable Jun 29 00:32:27 idm-qe-02.testrelm.test server[28432]: at com.netscape.cms.tomcat.ProxyRealm.backgroundProcess(ProxyRea Jun 29 00:32:27 idm-qe-02.testrelm.test server[28432]: at org.apache.catalina.core.ContainerBase.backgroundProcess(Cont Jun 29 00:32:27 idm-qe-02.testrelm.test server[28432]: at org.apache.catalina.core.StandardContext.backgroundProcess(St Jun 29 00:32:27 idm-qe-02.testrelm.test server[28432]: at org.apache.catalina.core.ContainerBase$ContainerBackgroundPro Jun 29 00:32:27 idm-qe-02.testrelm.test server[28432]: at org.apache.catalina.core.ContainerBase$ContainerBackgroundPro Jun 29 00:32:27 idm-qe-02.testrelm.test server[28432]: at org.apache.catalina.core.ContainerBase$ContainerBackgroundPro Jun 29 00:32:27 idm-qe-02.testrelm.test server[28432]: at org.apache.catalina.core.ContainerBase$ContainerBackgroundPro Jun 29 00:32:27 idm-qe-02.testrelm.test server[28432]: at java.lang.Thread.run(Thread.java:748) lines 1117-1144/1144 (END) Expected results: ipa-upgrade should be successful and no error should be observed Additional info: The ipa upgrade is successful from: 1. RHEL 71z > RHEL 76 2. RHEL 72z > RHEL 76 3. RHEL 73z > RHEL 76 4. RHEL 75z > RHEL 76
Similar issue is observed when upgraded from RHEL 70 (ipa-server-3.3.3-28.el7.x86_64) > RHEL 76 ---------------------- Job for ipa.service failed because the control process exited with error code. See "systemctl status ipa.service" and "journalctl -xe" for details. warning: %posttrans(ipa-server-4.6.4-2.el7.x86_64) scriptlet failed, exit status 1 Non-fatal POSTTRANS scriptlet failure in rpm package ipa-server-4.6.4-2.el7.x86_64 --------------------------
Same failure observed in linear upgrade: Path : 70z > 71z >72z >73z > 740-day > 74update1 > 74update2 > 74z > 75 > 75z> 76 Where upgrading from 75z to 76 fails.
Hi, the upgrade seems to fail because PKI did not restart. The debug log shows authentication issues from Dogtag to 389-ds: Could not connect to LDAP server host idm-qe-02.testrelm.test port 636 Error netscape.ldap.LDAPException: Authentication failed (48) Can you check the steps detailed in https://floblanc.wordpress.com/2017/09/11/troubleshooting-freeipa-pki-tomcatd-fails-to-start/ or provide access to the machine?
Created attachment 1459265 [details] stacktrace of wedged ns-slapd process Looks like ns-slapd is wedged. The process has been running since Jul 9, the date of the bug report. Fortunately systemctl stop dirsrv@TESTRELM-TEST failed to kill the process so I was able to get a trace.
- At some point CA failed to start 2018-07-09T13:59:38Z DEBUG Waiting for CA to start... 2018-07-09T13:59:39Z DEBUG Waiting for CA to start... ... 2018-07-09T14:04:29Z ERROR CA did not start in 300.0s - Because being unable to establish a SSL connection [09/Jul/2018:19:29:37.529233748 +051800] conn=4 fd=64 slot=64 SSL connection from 10.65.206.145 to 10.65.206.145 [09/Jul/2018:19:29:37.562014317 +051800] conn=4 SSL failed to obtain channel info; Netscape Portable Runtime error -8187 (security library: invalid arguments.) [09/Jul/2018:19:29:37.608394481 +051800] conn=4 op=0 BIND dn="" method=sasl version=3 mech=EXTERNAL [09/Jul/2018:19:29:37.609818574 +051800] conn=4 op=0 RESULT err=48 tag=97 nentries=0 etime=0.0071245337 - The version of NSS is nss-3.28.4-15.el7_4.x86_64 - DS 1.3.8 requires NSS >= 3.34 But 3.28 installed nss-3.28.4-15.el7_4.x86_64 389-ds-base-1.3.8.4-3.el7.x86_64 - It is looking like it is hitting https://bugzilla.redhat.com/show_bug.cgi?id=1538740 Next steps: - need to check how 389-ds-base-1.3.8.4-3.el7.x86_64 can install without updating nss version to >= 3.34
This bug is fixed in 389-ds-base-1.3.8.4-5.el7 (https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=726515) There is an added dependency on NSS >= 3.34
Actually 389-ds-base-1.3.8.4-5.el7 introduced a regression for IPA because of unhashed#user#password being no longer generated/logged. The build containing the fix for NSS >= 3.34 is available without 1.3.8.4-5.el7
Verified. Version :: Starting: [root@rhel7-1 ~]# rpm -q ipa-server 389-ds-base ipa-server-4.5.0-22.el7_4.x86_64 389-ds-base-1.3.6.1-29.el7_4.x86_64 Ending: Results :: Started with 7.4z. [root@rhel7-1 ~]# ipa-server-install --setup-dns --forwarder=192.168.122.1 --reverse-zone=122.168.192.in-addr.arpa. --allow-zone-overlap --hostname=rhel7-1.example.com --ip-address=192.168.122.71 --ds-password=Secret123 --admin-password=Secret123 --domain=example.com --realm=example.com --unattended ... Setup complete Next steps: 1. You must make sure these network ports are open: TCP Ports: * 80, 443: HTTP/HTTPS * 389, 636: LDAP/LDAPS * 88, 464: kerberos * 53: bind UDP Ports: * 88, 464: kerberos * 53: bind * 123: ntp 2. You can now obtain a kerberos ticket using the command: 'kinit admin' This ticket will allow you to use the IPA tools (e.g., ipa user-add) and the web user interface. Be sure to back up the CA certificates stored in /root/cacert.p12 These files are required to create replicas. The password for these files is the Directory Manager password [root@rhel7-1 ~]# rpm -q ipa-server 389-ds-base ipa-server-4.5.0-22.el7_4.x86_64 389-ds-base-1.3.6.1-29.el7_4.x86_64 [root@rhel7-1 ~]# yum -y update ... Complete! Will attach full yum update output separately. [root@rhel7-1 ~]# tail /var/log/ipaupgrade.log 2018-08-08T19:16:40Z DEBUG Process finished, return code=0 2018-08-08T19:16:40Z DEBUG stdout= 2018-08-08T19:16:40Z DEBUG stderr= 2018-08-08T19:16:40Z DEBUG Stop of pki-tomcatd@pki-tomcat.service complete 2018-08-08T19:16:40Z INFO The IPA services were upgraded 2018-08-08T19:16:40Z DEBUG Loading StateFile from '/var/lib/ipa/sysupgrade/sysupgrade.state' 2018-08-08T19:16:40Z DEBUG Saving StateFile to '/var/lib/ipa/sysupgrade/sysupgrade.state' 2018-08-08T19:16:40Z DEBUG Loading StateFile from '/var/lib/ipa/sysupgrade/sysupgrade.state' 2018-08-08T19:16:40Z DEBUG Saving StateFile to '/var/lib/ipa/sysupgrade/sysupgrade.state' 2018-08-08T19:16:40Z INFO The ipa-server-upgrade command was successful [root@rhel7-1 ~]# tail /var/log/pki/pki-tomcat/ca/debug [08/Aug/2018:14:16:52][Timer-0]: masterConn is connected: true [08/Aug/2018:14:16:52][Timer-0]: getConn: conn is connected true [08/Aug/2018:14:16:52][Timer-0]: getConn: mNumConns now 2 [08/Aug/2018:14:16:52][RequestRecoverThread]: returnConn: mNumConns now 5 [08/Aug/2018:14:16:52][Timer-0]: SecurityDomainSessionTable: No active sessions. [08/Aug/2018:14:16:52][Timer-0]: returnConn: mNumConns now 3 [08/Aug/2018:14:16:52][http-bio-8080-exec-17]: according to ccMode, authorization for servlet: caGetStatus is LDAP based, not XML {1}, use default authz mgr: {2}. [08/Aug/2018:14:16:52][http-bio-8080-exec-17]: CMSServlet:service() uri = /ca/admin/ca/getStatus [08/Aug/2018:14:16:52][http-bio-8080-exec-17]: CMSServlet: caGetStatus start to service. [08/Aug/2018:14:16:52][http-bio-8080-exec-17]: CMSServlet: curDate=Wed Aug 08 14:16:52 CDT 2018 id=caGetStatus time=79 [root@rhel7-1 ~]# kinit admin Password for admin@EXAMPLE.COM: [root@rhel7-1 ~]# ipa user-find -------------- 1 user matched -------------- User login: admin Last name: Administrator Home directory: /home/admin Login shell: /bin/bash Principal alias: admin@EXAMPLE.COM UID: 1931000000 GID: 1931000000 Account disabled: False ---------------------------- Number of entries returned 1 ---------------------------- [root@rhel7-1 ~]# ipa user-show admin User login: admin Last name: Administrator Home directory: /home/admin Login shell: /bin/bash Principal alias: admin@EXAMPLE.COM UID: 1931000000 GID: 1931000000 Account disabled: False Password: True Member of groups: admins, trust admins Kerberos keys available: True [root@rhel7-1 ~]# ipa host-find -------------- 1 host matched -------------- Host name: rhel7-1.example.com Principal name: host/rhel7-1.example.com@EXAMPLE.COM Principal alias: host/rhel7-1.example.com@EXAMPLE.COM SSH public key fingerprint: SHA256:f7uOSukkkDnSdASX2A7wo6lhVjs4c6RYxkCVuC/lHVQ (ssh-rsa), SHA256:RZUPKIVDKtXM1I08DFixCL8xPgsxz/cazqQg+2attG4 (ecdsa-sha2-nistp256), SHA256:WkNPdmezBW2YEHPBKM+z9xbvEHbP6oLaFw+XnXETdG8 (ssh-ed25519) ---------------------------- Number of entries returned 1 ---------------------------- [root@rhel7-1 ~]# ipa dnszone-find Zone name: 122.168.192.in-addr.arpa. Active zone: TRUE Authoritative nameserver: rhel7-1.example.com. Administrator e-mail address: hostmaster.example.com. SOA serial: 1533755804 SOA refresh: 3600 SOA retry: 900 SOA expire: 1209600 SOA minimum: 3600 Allow query: any; Allow transfer: none; Zone name: example.com. Active zone: TRUE Authoritative nameserver: rhel7-1.example.com. Administrator e-mail address: hostmaster.example.com. SOA serial: 1533755804 SOA refresh: 3600 SOA retry: 900 SOA expire: 1209600 SOA minimum: 3600 Allow query: any; Allow transfer: none; ---------------------------- Number of entries returned 2 ---------------------------- [root@rhel7-1 ~]# rpm -q ipa-server 389-ds-base ipa-server-4.6.4-3.el7.x86_64 389-ds-base-1.3.8.4-9.el7.x86_64
The ticket also needs to be verified for upgrade path from 7.0 to 7.6. As per comment#5. Hence moving in to ON_QA.
Tested the upgrade path from RHEL7.0 to 7.6 and the upgrade fails. Hence marking the bug as FAILEDQA. Attaching the logs for reference. For the record upgrade is successful for the below path. - RHEL 70z > RHEL 76 (FAILED) - RHEL 71z > RHEL 76 (PASSED) - RHEL 72z > RHEL 76 (PASSED) - RHEL 73z > RHEL 76 (PASSED) - RHEL 74z > RHEL 76 (PASSED) - RHEL 75z > RHEL 76 (PASSED)
Since the original issue has been fixed and the failure seen during upgrade from 7.0 ---> 7.6 is a different bug, closing this bug as VERIFIED and logging a new bug for the issue mentioned in comment #22 - RHEL 71z > RHEL 76 (PASSED) - RHEL 72z > RHEL 76 (PASSED) - RHEL 73z > RHEL 76 (PASSED) - RHEL 74z > RHEL 76 (PASSED) - RHEL 75z > RHEL 76 (PASSED)
For the upgrade failure for RHEL7.0 --> RHEL7.6 a new bug has been logged. https://bugzilla.redhat.com/show_bug.cgi?id=1615305
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:3127