Bug 1850290
Summary: | FreeIPA server upgrade from Fedora 31/32 to Rawhide (33) often fails | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Adam Williamson <awilliam> | ||||||
Component: | java-1.8.0-openjdk | Assignee: | Deepak Bhole <dbhole> | ||||||
Status: | CLOSED WORKSFORME | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||||
Severity: | high | Docs Contact: | |||||||
Priority: | unspecified | ||||||||
Version: | rawhide | CC: | abokovoy, ahughes, dbhole, jerboaa, jvanek, msrb, mvala, robatino, sgehwolf | ||||||
Target Milestone: | --- | Keywords: | Reopened | ||||||
Target Release: | --- | ||||||||
Hardware: | All | ||||||||
OS: | Linux | ||||||||
Whiteboard: | openqa | ||||||||
Fixed In Version: | Doc Type: | If docs needed, set a value | |||||||
Doc Text: | Story Points: | --- | |||||||
Clone Of: | Environment: | ||||||||
Last Closed: | 2020-07-18 01:18:16 UTC | Type: | Bug | ||||||
Regression: | --- | Mount Type: | --- | ||||||
Documentation: | --- | CRM: | |||||||
Verified Versions: | Category: | --- | |||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||
Embargoed: | |||||||||
Bug Depends On: | |||||||||
Bug Blocks: | 1766775 | ||||||||
Attachments: |
|
Description
Adam Williamson
2020-06-23 23:20:31 UTC
This seems to be caused by tomcat@pki failing to start. Exception I see in the log is: 2020-06-23 11:09:33 [https-jsse-nio-8443-exec-6] WARNING: Invalid base64: [ipara]: java.lang.IllegalArgumentException: Last unit does not have enough valid bits java.lang.IllegalArgumentException: Last unit does not have enough valid bits at java.util.Base64$Decoder.decode0(Base64.java:734) at java.util.Base64$Decoder.decode(Base64.java:526) at java.util.Base64$Decoder.decode(Base64.java:549) at org.mozilla.jss.netscape.security.util.Utils.base64decode(Utils.java:417) at org.dogtagpki.server.authentication.AuthToken.getInByteArray(AuthToken.java:113) at org.dogtagpki.server.authentication.AuthToken.getInStringArray(AuthToken.java:198) at com.netscape.cms.servlet.cert.CertProcessor.setAuthTokenIntoRequest(CertProcessor.java:132) at com.netscape.cms.servlet.cert.CertProcessor.populateRequests(CertProcessor.java:345) at com.netscape.cms.servlet.cert.EnrollmentProcessor.processEnrollment(EnrollmentProcessor.java:189) at com.netscape.cms.servlet.cert.EnrollmentProcessor.processEnrollment(EnrollmentProcessor.java:97) at com.netscape.cms.servlet.profile.ProfileSubmitServlet.processEnrollment(ProfileSubmitServlet.java:276) at com.netscape.cms.servlet.profile.ProfileSubmitServlet.process(ProfileSubmitServlet.java:130) at com.netscape.cms.servlet.base.CMSServlet.service(CMSServlet.java:493) at javax.servlet.http.HttpServlet.service(HttpServlet.java:741) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.catalina.security.SecurityUtil$1.run(SecurityUtil.java:282) at org.apache.catalina.security.SecurityUtil$1.run(SecurityUtil.java:279) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAsPrivileged(Subject.java:549) at org.apache.catalina.security.SecurityUtil.execute(SecurityUtil.java:314) at org.apache.catalina.security.SecurityUtil.doAsPrivilege(SecurityUtil.java:170) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:225) at org.apache.catalina.core.ApplicationFilterChain.access$000(ApplicationFilterChain.java:47) at org.apache.catalina.core.ApplicationFilterChain$1.run(ApplicationFilterChain.java:149) at org.apache.catalina.core.ApplicationFilterChain$1.run(ApplicationFilterChain.java:145) at java.security.AccessController.doPrivileged(Native Method) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:144) at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:53) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.catalina.security.SecurityUtil$1.run(SecurityUtil.java:282) at org.apache.catalina.security.SecurityUtil$1.run(SecurityUtil.java:279) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAsPrivileged(Subject.java:549) at org.apache.catalina.security.SecurityUtil.execute(SecurityUtil.java:314) at org.apache.catalina.security.SecurityUtil.doAsPrivilege(SecurityUtil.java:253) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:191) at org.apache.catalina.core.ApplicationFilterChain.access$000(ApplicationFilterChain.java:47) at org.apache.catalina.core.ApplicationFilterChain$1.run(ApplicationFilterChain.java:149) at org.apache.catalina.core.ApplicationFilterChain$1.run(ApplicationFilterChain.java:145) at java.security.AccessController.doPrivileged(Native Method) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:144) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:202) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96) at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:541) at com.netscape.cms.tomcat.ExternalAuthenticationValve.invoke(ExternalAuthenticationValve.java:82) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:139) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:92) at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:690) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:74) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:343) at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:373) at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:65) at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:868) at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1590) at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) at java.lang.Thread.run(Thread.java:748) From /var/log/messages: Jun 23 11:31:50 ipa001 ipactl[717]: CalledProcessError: CalledProcessError(Command ['/bin/systemctl', 'start', 'pki-tomcatd'] returned non-zero exit status 1: 'Job for pki-tomcatd failed because a timeout was exceeded.\nSee "systemctl status pki-tomcatd" and "journalctl -xe" for details.\n') And from /var/log/ipaupgrade.log 2020-06-23T15:28:46Z DEBUG Starting external process 2020-06-23T15:28:46Z DEBUG args=['/usr/sbin/tomcat', 'version'] 2020-06-23T15:28:47Z DEBUG Process finished, return code=0 2020-06-23T15:28:47Z DEBUG stdout=Server version: Apache Tomcat/9.0.36 Server built: Dec 12 1969 11:45:33 UTC Server number: 9.0.36.0 OS Name: Linux OS Version: 5.8.0-0.rc2.20200622git625d3449788f.1.fc33.x86_64 Architecture: amd64 JVM Version: 1.8.0_262-ea-b01 JVM Vendor: Oracle Corporation 2020-06-23T15:28:47Z DEBUG stderr= 2020-06-23T15:28:47Z DEBUG Starting external process 2020-06-23T15:28:47Z DEBUG args=['/bin/systemctl', 'start', 'pki-tomcatd'] 2020-06-23T15:31:50Z DEBUG Process finished, return code=1 2020-06-23T15:31:50Z DEBUG stdout= 2020-06-23T15:31:50Z DEBUG stderr=Job for pki-tomcatd failed because a timeout was exceeded. See "systemctl status pki-tomcatd" and "journalctl -xe" for details. 2020-06-23T15:31:50Z ERROR IPA server upgrade failed: Inspect /var/log/ipaupgrade.log and run command ipa-server-upgrade manually. I also see this in /var/log/pki/pki-tomcat/ca/debug*.log: 2020-06-23 11:07:16 [https-jsse-nio-8443-exec-3] SEVERE: Object certificate not found. Error Certificate not found: caSigningCert cert-pki-ca: Certificate not found: caSigningCert cert-pki-ca Certificate not found: caSigningCert cert-pki-ca: Certificate not found: caSigningCert cert-pki-ca at com.netscape.ca.SigningUnit.init(SigningUnit.java:163) at com.netscape.ca.CertificateAuthority.initSigUnit(CertificateAuthority.java:1741) at com.netscape.ca.CertificateAuthority.init(CertificateAuthority.java:540) at com.netscape.cmscore.apps.CMSEngine.reinit(CMSEngine.java:1014) at org.dogtagpki.server.ca.CAConfigurator.reinitSubsystems(CAConfigurator.java:170) at com.netscape.cms.servlet.csadmin.Configurator.setupDatabase(Configurator.java:447) at org.dogtagpki.server.ca.CAConfigurator.setupDatabase(CAConfigurator.java:76) at org.dogtagpki.server.rest.SystemConfigService.setupDatabase(SystemConfigService.java:137) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.jboss.resteasy.core.MethodInjectorImpl.invoke(MethodInjectorImpl.java:140) at org.jboss.resteasy.core.ResourceMethodInvoker.invokeOnTarget(ResourceMethodInvoker.java:295) at org.jboss.resteasy.core.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:249) at org.jboss.resteasy.core.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:236) at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:406) at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:213) at org.jboss.resteasy.plugins.server.servlet.ServletContainerDispatcher.service(ServletContainerDispatcher.java:228) at org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(HttpServletDispatcher.java:56) at org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(HttpServletDispatcher.java:51) at javax.servlet.http.HttpServlet.service(HttpServlet.java:741) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.catalina.security.SecurityUtil$1.run(SecurityUtil.java:282) at org.apache.catalina.security.SecurityUtil$1.run(SecurityUtil.java:279) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAsPrivileged(Subject.java:549) at org.apache.catalina.security.SecurityUtil.execute(SecurityUtil.java:314) at org.apache.catalina.security.SecurityUtil.doAsPrivilege(SecurityUtil.java:170) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:225) at org.apache.catalina.core.ApplicationFilterChain.access$000(ApplicationFilterChain.java:47) at org.apache.catalina.core.ApplicationFilterChain$1.run(ApplicationFilterChain.java:149) at org.apache.catalina.core.ApplicationFilterChain$1.run(ApplicationFilterChain.java:145) at java.security.AccessController.doPrivileged(Native Method) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:144) at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:53) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.catalina.security.SecurityUtil$1.run(SecurityUtil.java:282) at org.apache.catalina.security.SecurityUtil$1.run(SecurityUtil.java:279) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAsPrivileged(Subject.java:549) at org.apache.catalina.security.SecurityUtil.execute(SecurityUtil.java:314) at org.apache.catalina.security.SecurityUtil.doAsPrivilege(SecurityUtil.java:253) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:191) at org.apache.catalina.core.ApplicationFilterChain.access$000(ApplicationFilterChain.java:47) at org.apache.catalina.core.ApplicationFilterChain$1.run(ApplicationFilterChain.java:149) at org.apache.catalina.core.ApplicationFilterChain$1.run(ApplicationFilterChain.java:145) at java.security.AccessController.doPrivileged(Native Method) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:144) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:202) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96) at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:541) at com.netscape.cms.tomcat.ExternalAuthenticationValve.invoke(ExternalAuthenticationValve.java:82) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:139) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:92) at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:690) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:74) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:343) at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:373) at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:65) at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:868) at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1590) at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) at java.lang.Thread.run(Thread.java:748) Caused by: org.mozilla.jss.crypto.ObjectNotFoundException: Certificate not found: caSigningCert cert-pki-ca at org.mozilla.jss.CryptoManager.findCertByNicknameNative(Native Method) at org.mozilla.jss.CryptoManager.findCertByNickname(CryptoManager.java:865) at com.netscape.ca.SigningUnit.init(SigningUnit.java:159) ... 72 more I don't see anything attributing this to OpenJDK. There seem to be failures throughout the system e.g. Jun 23 11:09:46 ipa001 certmonger[10065]: 2020-06-23 11:09:46 [10065] error:0D06407A:asn1 encoding routines:a2d_ASN1_OBJECT:first num too large Jun 23 11:09:53 ipa001 certmonger[10105]: Error obtaining initial credentials: Generic error (see e-text). Jun 23 11:09:53 ipa001 certmonger[10105]: Error setting up ccache at the client: Generic error (see e-text). Jun 23 11:11:53 ipa001 dnf[11039]: - Curl error (6): Couldn't resolve host name for https://dl.fedoraproject.org/pub/fedora/linux/releases/32/Modular/x86_64/os/repodata/repomd.xml [Could not resolve host: dl.fedoraproject.org] Jun 23 11:16:37 ipa001 named-pkcs11[10722]: LDAP error: Can't contact LDAP server: ldap_sync_poll() failed Jun 23 11:16:37 ipa001 named-pkcs11[10722]: ldap_syncrepl will reconnect in 60 seconds Jun 23 11:28:15 ipa001 setroubleshootd[797]: error: Unable to open sqlite database /var/lib/rpm/rpmdb.sqlite: unable to open database file Hum. The test passed again today. It's definitely strange that it failed twice in a row exactly the same way with yesterday's compose, but with today's it worked, at least. I'll keep an eye on it and investigate more if it breaks again. FWIW, today's test did log a *ton* of AVCs, which may have been related to the failure. I'll file a separate selinux-policy bug for these, but you can see them at https://openqa.fedoraproject.org/tests/620050/file/_console_avc_crash-avcs.txt . Well, since I last updated this, the test has failed again every time it's run: https://openqa.fedoraproject.org/tests/621392#next_previous ditto a nearly-identical test which upgrades from F31 (rather than F32): https://openqa.fedoraproject.org/tests/621355#next_previous so there is definitely still some kind of problem here :( Each test failure has its logs attached on the Logs & Assets tab, so we can poke through to try and find out some common thread. Note the logs will get garbage collected after a couple of weeks, if we don't have this figured out soon I'll try and remember to attach a few more samples. I tried a run with SELinux in permissive mode, but still failed: https://openqa.fedoraproject.org/tests/621579 Really having trouble getting a handle on this. It seems like the problem is tomcat, but I can't really find anything to say *why*. It starts to start up, then...just...doesn't: [adamw@adam tmp]$ journalctl --file var-new/log/journal/153ded1a9d5a467b846b7bbd9d256fa9/system.journal -b-1 -u pki-tomcatd --no-pager -- Logs begin at Tue 2020-06-30 13:15:34 PDT, end at Tue 2020-06-30 13:50:01 PDT. -- Jun 30 13:40:55 ipa001.domain.local systemd[1]: Starting PKI Tomcat Server pki-tomcat... Jun 30 13:40:57 ipa001.domain.local server[13559]: Java virtual machine used: /usr/lib/jvm/jre-1.8.0-openjdk/bin/java Jun 30 13:40:57 ipa001.domain.local server[13559]: classpath used: /usr/share/tomcat/bin/bootstrap.jar:/usr/share/tomcat/bin/tomcat-juli.jar:/usr/lib/java/commons-daemon.jar Jun 30 13:40:57 ipa001.domain.local server[13559]: main class used: org.apache.catalina.startup.Bootstrap Jun 30 13:40:57 ipa001.domain.local server[13559]: flags used: Jun 30 13:40:57 ipa001.domain.local server[13559]: options used: -Dcatalina.base=/var/lib/pki/pki-tomcat -Dcatalina.home=/usr/share/tomcat -Djava.endorsed.dirs= -Djava.io.tmpdir=/var/lib/pki/pki-tomcat/temp -Djava.util.logging.config.file=/var/lib/pki/pki-tomcat/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djava.security.manager -Djava.security.policy==/var/lib/pki/pki-tomcat/conf/catalina.policy Jun 30 13:40:57 ipa001.domain.local server[13559]: arguments used: start Jun 30 13:40:57 ipa001.domain.local ipa-pki-wait-running[13560]: pki.client: /usr/libexec/ipa/ipa-pki-wait-running:60: The subsystem in PKIConnection.__init__() has been deprecated (https://www.dogtagpki.org/wiki/PKI_10.8_Python_Changes). Jun 30 13:40:57 ipa001.domain.local ipa-pki-wait-running[13560]: ipa-pki-wait-running: Created connection http://ipa001.domain.local:8080/ca Jun 30 13:40:57 ipa001.domain.local ipa-pki-wait-running[13560]: ipa-pki-wait-running: Connection failed: HTTPConnectionPool(host='ipa001.domain.local', port=8080): Max retries exceeded with url: /ca/admin/ca/getStatus (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f2546d9caf0>: Failed to establish a new connection: [Errno 111] Connection refused')) Jun 30 13:40:58 ipa001.domain.local java[13559]: usr/lib/api/apiutil.c Could not open /run/lock/opencryptoki/LCK..APIlock Jun 30 13:40:58 ipa001.domain.local ipa-pki-wait-running[13560]: ipa-pki-wait-running: Connection failed: HTTPConnectionPool(host='ipa001.domain.local', port=8080): Max retries exceeded with url: /ca/admin/ca/getStatus (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f2546d9cf70>: Failed to establish a new connection: [Errno 111] Connection refused')) Jun 30 13:40:59 ipa001.domain.local ipa-pki-wait-running[13560]: ipa-pki-wait-running: Connection failed: HTTPConnectionPool(host='ipa001.domain.local', port=8080): Max retries exceeded with url: /ca/admin/ca/getStatus (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f2546d57370>: Failed to establish a new connection: [Errno 111] Connection refused')) eventually ipa-pki-wait-running gets bored and times out. Hum, well, here's a thing: looking more closely at the history of the two relevant tests, I think this started happening earlier than I thought. The test failed twice in Fedora-Rawhide-20200601.n.1 in a way that looks very similar: https://openqa.fedoraproject.org/tests/609532 https://openqa.fedoraproject.org/tests/609584 and there are periodic failures since then. The F31 -> Rawhide test failed in the same module in Fedora-Rawhide-20200529.n.2 and even in Fedora-Rawhide-20200516.n.0 ; those logs have been garbage collected now so I can't tell for sure if they were the same. I'll attach the logs from the 2020-06-01 failure before they get GCed... Created attachment 1699394 [details]
tarball of /var/log from the failed test on 2020-06-01
so the test has now passed every day for the last week. I officially give up. If it keeps passing next week I guess I'll just close the bug again. This kept passing for a while, now it's failing again, but it's because of a known bug https://bugzilla.redhat.com/show_bug.cgi?id=1857043 . So I'll close this again for now. |