Bug 1850290 - FreeIPA server upgrade from Fedora 31/32 to Rawhide (33) often fails
Summary: FreeIPA server upgrade from Fedora 31/32 to Rawhide (33) often fails
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Fedora
Classification: Fedora
Component: java-1.8.0-openjdk
Version: rawhide
Hardware: All
OS: Linux
unspecified
high
Target Milestone: ---
Assignee: Deepak Bhole
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: openqa
Depends On:
Blocks: F33BetaBlocker
TreeView+ depends on / blocked
 
Reported: 2020-06-23 23:20 UTC by Adam Williamson
Modified: 2020-07-18 01:18 UTC (History)
9 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-18 01:18:16 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
tarball of /var/log from the failed test (4.75 MB, application/gzip)
2020-06-23 23:20 UTC, Adam Williamson
no flags Details
tarball of /var/log from the failed test on 2020-06-01 (4.61 MB, application/octet-stream)
2020-06-30 22:37 UTC, Adam Williamson
no flags Details

Description Adam Williamson 2020-06-23 23:20:31 UTC
Created attachment 1698514 [details]
tarball of /var/log from the failed test

We have a test in openQA which does this:

1. Boot the previous release to the one being tested (in this case, Fedora 32) and deploy the system as a FreeIPA server
2. Similarly boot a second system to the previous release and deploy it as a FreeIPA client
3. Upgrade the server system to the release being tested (in this case, Rawhide); check that IPA is still running after the upgrade
4. Upgrade the client as well
5. Run a series of checks to make sure everything works after the upgrade

Up till Fedora-Rawhide-20200622.n.0 this test was passing. Since Fedora-Rawhide-20200623.n.0 today, it's failing. It fails at step 3: the system upgrade of the server fails, but ipa.service then fails to start. The system logs show that ipa-server-upgrade fails, apparently due to issues with the certificate server (dogtag) and/or the LDAP server (389-ds). I'm attaching a tarball of /var/log .

We're filing this against Java because of all the things that changed in the Fedora-Rawhide-20200623.n.0 compose it seems like the most likely suspect. There were a *lot* of at least *potentially* related things that changed, though. Aside from java-1.8.0-openjdk, other things that changed include bind, glibc, the kernel and openssl. Having looked at them, we don't *think* those changes are likely to be the cause, though.

This violates the Beta criteria: "It must be possible to successfully complete a direct upgrade from a fully updated installation of each of the last two stable Fedora Server releases with the system configured as a FreeIPA domain controller or postgresql server as specified in the relevant criteria. The upgraded system must meet all relevant release criteria, including criteria relating to functionality of the server software" - https://fedoraproject.org/wiki/Fedora_33_Beta_Release_Criteria#Server_upgrade_requirements , so proposing as a Beta blocker.

Comment 1 Severin Gehwolf 2020-06-24 18:50:35 UTC
This seems to be caused by tomcat@pki failing to start. Exception I see in the log is:

2020-06-23 11:09:33 [https-jsse-nio-8443-exec-6] WARNING: Invalid base64: [ipara]: java.lang.IllegalArgumentException: Last unit does not have enough valid bits
java.lang.IllegalArgumentException: Last unit does not have enough valid bits
	at java.util.Base64$Decoder.decode0(Base64.java:734)
	at java.util.Base64$Decoder.decode(Base64.java:526)
	at java.util.Base64$Decoder.decode(Base64.java:549)
	at org.mozilla.jss.netscape.security.util.Utils.base64decode(Utils.java:417)
	at org.dogtagpki.server.authentication.AuthToken.getInByteArray(AuthToken.java:113)
	at org.dogtagpki.server.authentication.AuthToken.getInStringArray(AuthToken.java:198)
	at com.netscape.cms.servlet.cert.CertProcessor.setAuthTokenIntoRequest(CertProcessor.java:132)
	at com.netscape.cms.servlet.cert.CertProcessor.populateRequests(CertProcessor.java:345)
	at com.netscape.cms.servlet.cert.EnrollmentProcessor.processEnrollment(EnrollmentProcessor.java:189)
	at com.netscape.cms.servlet.cert.EnrollmentProcessor.processEnrollment(EnrollmentProcessor.java:97)
	at com.netscape.cms.servlet.profile.ProfileSubmitServlet.processEnrollment(ProfileSubmitServlet.java:276)
	at com.netscape.cms.servlet.profile.ProfileSubmitServlet.process(ProfileSubmitServlet.java:130)
	at com.netscape.cms.servlet.base.CMSServlet.service(CMSServlet.java:493)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:741)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.catalina.security.SecurityUtil$1.run(SecurityUtil.java:282)
	at org.apache.catalina.security.SecurityUtil$1.run(SecurityUtil.java:279)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAsPrivileged(Subject.java:549)
	at org.apache.catalina.security.SecurityUtil.execute(SecurityUtil.java:314)
	at org.apache.catalina.security.SecurityUtil.doAsPrivilege(SecurityUtil.java:170)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:225)
	at org.apache.catalina.core.ApplicationFilterChain.access$000(ApplicationFilterChain.java:47)
	at org.apache.catalina.core.ApplicationFilterChain$1.run(ApplicationFilterChain.java:149)
	at org.apache.catalina.core.ApplicationFilterChain$1.run(ApplicationFilterChain.java:145)
	at java.security.AccessController.doPrivileged(Native Method)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:144)
	at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:53)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.catalina.security.SecurityUtil$1.run(SecurityUtil.java:282)
	at org.apache.catalina.security.SecurityUtil$1.run(SecurityUtil.java:279)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAsPrivileged(Subject.java:549)
	at org.apache.catalina.security.SecurityUtil.execute(SecurityUtil.java:314)
	at org.apache.catalina.security.SecurityUtil.doAsPrivilege(SecurityUtil.java:253)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:191)
	at org.apache.catalina.core.ApplicationFilterChain.access$000(ApplicationFilterChain.java:47)
	at org.apache.catalina.core.ApplicationFilterChain$1.run(ApplicationFilterChain.java:149)
	at org.apache.catalina.core.ApplicationFilterChain$1.run(ApplicationFilterChain.java:145)
	at java.security.AccessController.doPrivileged(Native Method)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:144)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:202)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96)
	at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:541)
	at com.netscape.cms.tomcat.ExternalAuthenticationValve.invoke(ExternalAuthenticationValve.java:82)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:139)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:92)
	at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:690)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:74)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:343)
	at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:373)
	at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:65)
	at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:868)
	at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1590)
	at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
	at java.lang.Thread.run(Thread.java:748)

Comment 2 Severin Gehwolf 2020-06-24 19:11:22 UTC
From /var/log/messages:

Jun 23 11:31:50 ipa001 ipactl[717]: CalledProcessError: CalledProcessError(Command ['/bin/systemctl', 'start', 'pki-tomcatd'] returned non-zero exit status 1: 'Job for pki-tomcatd failed because a timeout was exceeded.\nSee "systemctl status pki-tomcatd" and "journalctl -xe" for details.\n')

And from /var/log/ipaupgrade.log

2020-06-23T15:28:46Z DEBUG Starting external process
2020-06-23T15:28:46Z DEBUG args=['/usr/sbin/tomcat', 'version']
2020-06-23T15:28:47Z DEBUG Process finished, return code=0
2020-06-23T15:28:47Z DEBUG stdout=Server version: Apache Tomcat/9.0.36
Server built:   Dec 12 1969 11:45:33 UTC
Server number:  9.0.36.0
OS Name:        Linux
OS Version:     5.8.0-0.rc2.20200622git625d3449788f.1.fc33.x86_64
Architecture:   amd64
JVM Version:    1.8.0_262-ea-b01
JVM Vendor:     Oracle Corporation

2020-06-23T15:28:47Z DEBUG stderr=
2020-06-23T15:28:47Z DEBUG Starting external process
2020-06-23T15:28:47Z DEBUG args=['/bin/systemctl', 'start', 'pki-tomcatd']
2020-06-23T15:31:50Z DEBUG Process finished, return code=1
2020-06-23T15:31:50Z DEBUG stdout=
2020-06-23T15:31:50Z DEBUG stderr=Job for pki-tomcatd failed because a timeout was exceeded.
See "systemctl status pki-tomcatd" and "journalctl -xe" for details.

2020-06-23T15:31:50Z ERROR IPA server upgrade failed: Inspect /var/log/ipaupgrade.log and run command ipa-server-upgrade manually.



I also see this in /var/log/pki/pki-tomcat/ca/debug*.log:

2020-06-23 11:07:16 [https-jsse-nio-8443-exec-3] SEVERE: Object certificate not found. Error Certificate not found: caSigningCert cert-pki-ca: Certificate not found: caSigningCert cert-pki-ca
Certificate not found: caSigningCert cert-pki-ca: Certificate not found: caSigningCert cert-pki-ca
	at com.netscape.ca.SigningUnit.init(SigningUnit.java:163)
	at com.netscape.ca.CertificateAuthority.initSigUnit(CertificateAuthority.java:1741)
	at com.netscape.ca.CertificateAuthority.init(CertificateAuthority.java:540)
	at com.netscape.cmscore.apps.CMSEngine.reinit(CMSEngine.java:1014)
	at org.dogtagpki.server.ca.CAConfigurator.reinitSubsystems(CAConfigurator.java:170)
	at com.netscape.cms.servlet.csadmin.Configurator.setupDatabase(Configurator.java:447)
	at org.dogtagpki.server.ca.CAConfigurator.setupDatabase(CAConfigurator.java:76)
	at org.dogtagpki.server.rest.SystemConfigService.setupDatabase(SystemConfigService.java:137)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.jboss.resteasy.core.MethodInjectorImpl.invoke(MethodInjectorImpl.java:140)
	at org.jboss.resteasy.core.ResourceMethodInvoker.invokeOnTarget(ResourceMethodInvoker.java:295)
	at org.jboss.resteasy.core.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:249)
	at org.jboss.resteasy.core.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:236)
	at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:406)
	at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:213)
	at org.jboss.resteasy.plugins.server.servlet.ServletContainerDispatcher.service(ServletContainerDispatcher.java:228)
	at org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(HttpServletDispatcher.java:56)
	at org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(HttpServletDispatcher.java:51)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:741)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.catalina.security.SecurityUtil$1.run(SecurityUtil.java:282)
	at org.apache.catalina.security.SecurityUtil$1.run(SecurityUtil.java:279)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAsPrivileged(Subject.java:549)
	at org.apache.catalina.security.SecurityUtil.execute(SecurityUtil.java:314)
	at org.apache.catalina.security.SecurityUtil.doAsPrivilege(SecurityUtil.java:170)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:225)
	at org.apache.catalina.core.ApplicationFilterChain.access$000(ApplicationFilterChain.java:47)
	at org.apache.catalina.core.ApplicationFilterChain$1.run(ApplicationFilterChain.java:149)
	at org.apache.catalina.core.ApplicationFilterChain$1.run(ApplicationFilterChain.java:145)
	at java.security.AccessController.doPrivileged(Native Method)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:144)
	at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:53)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.catalina.security.SecurityUtil$1.run(SecurityUtil.java:282)
	at org.apache.catalina.security.SecurityUtil$1.run(SecurityUtil.java:279)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAsPrivileged(Subject.java:549)
	at org.apache.catalina.security.SecurityUtil.execute(SecurityUtil.java:314)
	at org.apache.catalina.security.SecurityUtil.doAsPrivilege(SecurityUtil.java:253)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:191)
	at org.apache.catalina.core.ApplicationFilterChain.access$000(ApplicationFilterChain.java:47)
	at org.apache.catalina.core.ApplicationFilterChain$1.run(ApplicationFilterChain.java:149)
	at org.apache.catalina.core.ApplicationFilterChain$1.run(ApplicationFilterChain.java:145)
	at java.security.AccessController.doPrivileged(Native Method)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:144)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:202)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96)
	at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:541)
	at com.netscape.cms.tomcat.ExternalAuthenticationValve.invoke(ExternalAuthenticationValve.java:82)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:139)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:92)
	at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:690)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:74)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:343)
	at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:373)
	at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:65)
	at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:868)
	at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1590)
	at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.mozilla.jss.crypto.ObjectNotFoundException: Certificate not found: caSigningCert cert-pki-ca
	at org.mozilla.jss.CryptoManager.findCertByNicknameNative(Native Method)
	at org.mozilla.jss.CryptoManager.findCertByNickname(CryptoManager.java:865)
	at com.netscape.ca.SigningUnit.init(SigningUnit.java:159)
	... 72 more

Comment 3 Andrew John Hughes 2020-06-24 19:55:39 UTC
I don't see anything attributing this to OpenJDK. There seem to be failures throughout the system e.g.

Jun 23 11:09:46 ipa001 certmonger[10065]: 2020-06-23 11:09:46 [10065] error:0D06407A:asn1 encoding routines:a2d_ASN1_OBJECT:first num too large
Jun 23 11:09:53 ipa001 certmonger[10105]: Error obtaining initial credentials: Generic error (see e-text).
Jun 23 11:09:53 ipa001 certmonger[10105]: Error setting up ccache at the client: Generic error (see e-text).
Jun 23 11:11:53 ipa001 dnf[11039]:  - Curl error (6): Couldn't resolve host name for https://dl.fedoraproject.org/pub/fedora/linux/releases/32/Modular/x86_64/os/repodata/repomd.xml [Could not resolve host: dl.fedoraproject.org]
Jun 23 11:16:37 ipa001 named-pkcs11[10722]: LDAP error: Can't contact LDAP server: ldap_sync_poll() failed
Jun 23 11:16:37 ipa001 named-pkcs11[10722]: ldap_syncrepl will reconnect in 60 seconds
Jun 23 11:28:15 ipa001 setroubleshootd[797]: error: Unable to open sqlite database /var/lib/rpm/rpmdb.sqlite: unable to open database file

Comment 4 Adam Williamson 2020-06-24 21:45:43 UTC
Hum. The test passed again today. It's definitely strange that it failed twice in a row exactly the same way with yesterday's compose, but with today's it worked, at least. I'll keep an eye on it and investigate more if it breaks again.

Comment 5 Adam Williamson 2020-06-24 21:47:12 UTC
FWIW, today's test did log a *ton* of AVCs, which may have been related to the failure. I'll file a separate selinux-policy bug for these, but you can see them at https://openqa.fedoraproject.org/tests/620050/file/_console_avc_crash-avcs.txt .

Comment 6 Adam Williamson 2020-06-30 19:30:40 UTC
Well, since I last updated this, the test has failed again every time it's run:

https://openqa.fedoraproject.org/tests/621392#next_previous

ditto a nearly-identical test which upgrades from F31 (rather than F32):

https://openqa.fedoraproject.org/tests/621355#next_previous

so there is definitely still some kind of problem here :( Each test failure has its logs attached on the Logs & Assets tab, so we can poke through to try and find out some common thread. Note the logs will get garbage collected after a couple of weeks, if we don't have this figured out soon I'll try and remember to attach a few more samples.

Comment 7 Adam Williamson 2020-06-30 20:52:16 UTC
I tried a run with SELinux in permissive mode, but still failed:

https://openqa.fedoraproject.org/tests/621579

Comment 8 Adam Williamson 2020-06-30 22:03:13 UTC
Really having trouble getting a handle on this. It seems like the problem is tomcat, but I can't really find anything to say *why*. It starts to start up, then...just...doesn't:

[adamw@adam tmp]$ journalctl --file var-new/log/journal/153ded1a9d5a467b846b7bbd9d256fa9/system.journal -b-1 -u pki-tomcatd --no-pager
-- Logs begin at Tue 2020-06-30 13:15:34 PDT, end at Tue 2020-06-30 13:50:01 PDT. --
Jun 30 13:40:55 ipa001.domain.local systemd[1]: Starting PKI Tomcat Server pki-tomcat...
Jun 30 13:40:57 ipa001.domain.local server[13559]: Java virtual machine used: /usr/lib/jvm/jre-1.8.0-openjdk/bin/java
Jun 30 13:40:57 ipa001.domain.local server[13559]: classpath used: /usr/share/tomcat/bin/bootstrap.jar:/usr/share/tomcat/bin/tomcat-juli.jar:/usr/lib/java/commons-daemon.jar
Jun 30 13:40:57 ipa001.domain.local server[13559]: main class used: org.apache.catalina.startup.Bootstrap
Jun 30 13:40:57 ipa001.domain.local server[13559]: flags used:
Jun 30 13:40:57 ipa001.domain.local server[13559]: options used: -Dcatalina.base=/var/lib/pki/pki-tomcat -Dcatalina.home=/usr/share/tomcat -Djava.endorsed.dirs= -Djava.io.tmpdir=/var/lib/pki/pki-tomcat/temp -Djava.util.logging.config.file=/var/lib/pki/pki-tomcat/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager     -Djava.security.manager     -Djava.security.policy==/var/lib/pki/pki-tomcat/conf/catalina.policy
Jun 30 13:40:57 ipa001.domain.local server[13559]: arguments used: start
Jun 30 13:40:57 ipa001.domain.local ipa-pki-wait-running[13560]: pki.client: /usr/libexec/ipa/ipa-pki-wait-running:60: The subsystem in PKIConnection.__init__() has been deprecated (https://www.dogtagpki.org/wiki/PKI_10.8_Python_Changes).
Jun 30 13:40:57 ipa001.domain.local ipa-pki-wait-running[13560]: ipa-pki-wait-running: Created connection http://ipa001.domain.local:8080/ca
Jun 30 13:40:57 ipa001.domain.local ipa-pki-wait-running[13560]: ipa-pki-wait-running: Connection failed: HTTPConnectionPool(host='ipa001.domain.local', port=8080): Max retries exceeded with url: /ca/admin/ca/getStatus (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f2546d9caf0>: Failed to establish a new connection: [Errno 111] Connection refused'))
Jun 30 13:40:58 ipa001.domain.local java[13559]: usr/lib/api/apiutil.c Could not open /run/lock/opencryptoki/LCK..APIlock
Jun 30 13:40:58 ipa001.domain.local ipa-pki-wait-running[13560]: ipa-pki-wait-running: Connection failed: HTTPConnectionPool(host='ipa001.domain.local', port=8080): Max retries exceeded with url: /ca/admin/ca/getStatus (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f2546d9cf70>: Failed to establish a new connection: [Errno 111] Connection refused'))
Jun 30 13:40:59 ipa001.domain.local ipa-pki-wait-running[13560]: ipa-pki-wait-running: Connection failed: HTTPConnectionPool(host='ipa001.domain.local', port=8080): Max retries exceeded with url: /ca/admin/ca/getStatus (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f2546d57370>: Failed to establish a new connection: [Errno 111] Connection refused'))

eventually ipa-pki-wait-running gets bored and times out.

Comment 9 Adam Williamson 2020-06-30 22:36:38 UTC
Hum, well, here's a thing: looking more closely at the history of the two relevant tests, I think this started happening earlier than I thought. The test failed twice in Fedora-Rawhide-20200601.n.1 in a way that looks very similar:

https://openqa.fedoraproject.org/tests/609532
https://openqa.fedoraproject.org/tests/609584

and there are periodic failures since then. The F31 -> Rawhide test failed in the same module in Fedora-Rawhide-20200529.n.2 and even in Fedora-Rawhide-20200516.n.0 ; those logs have been garbage collected now so I can't tell for sure if they were the same.

I'll attach the logs from the 2020-06-01 failure before they get GCed...

Comment 10 Adam Williamson 2020-06-30 22:37:22 UTC
Created attachment 1699394 [details]
tarball of /var/log from the failed test on 2020-06-01

Comment 11 Adam Williamson 2020-07-11 01:50:36 UTC
so the test has now passed every day for the last week. I officially give up. If it keeps passing next week I guess I'll just close the bug again.

Comment 12 Adam Williamson 2020-07-18 01:18:16 UTC
This kept passing for a while, now it's failing again, but it's because of a known bug https://bugzilla.redhat.com/show_bug.cgi?id=1857043 . So I'll close this again for now.


Note You need to log in before you can comment on or make changes to this bug.