In Fedora-Rawhide-20200714.n.0, FreeIPA server deployment failed in all openQA tests that attempt it. This worked fine in 20200713.n.0. Failure appears to be this: Jul 14 06:44:39 localhost systemd[1]: Starting PKI Tomcat Server pki-tomcat... Jul 14 06:44:41 localhost audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=pki-tomcatd@pki-tomcat comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' Jul 14 06:44:41 localhost systemd[1]: Started PKI Tomcat Server pki-tomcat. Jul 14 06:44:41 localhost server[29213]: Java virtual machine used: /usr/lib/jvm/jre-1.8.0-openjdk/bin/java Jul 14 06:44:41 localhost server[29213]: classpath used: /usr/share/tomcat/bin/bootstrap.jar:/usr/share/tomcat/bin/tomcat-juli.jar:/usr/lib/java/commons-daemon.jar Jul 14 06:44:41 localhost server[29213]: main class used: org.apache.catalina.startup.Bootstrap Jul 14 06:44:41 localhost server[29213]: flags used: Jul 14 06:44:41 localhost server[29213]: options used: -Dcatalina.base=/var/lib/pki/pki-tomcat -Dcatalina.home=/usr/share/tomcat -Djava.endorsed.dirs= -Djava.io.tmpdir=/var/lib/pki/pki-tomcat/temp -Djava.util.logging.config.file=/var/lib/pki/pki-tomcat/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djava.security.manager -Djava.security.policy==/var/lib/pki/pki-tomcat/conf/catalina.policy Jul 14 06:44:41 localhost server[29213]: arguments used: start Jul 14 06:44:42 localhost server[29213]: SEVERE: Error loading descriptors from [jar:file:/usr/share/java/tomcat/catalina.jar!/org/apache/catalina/core/mbeans-descriptors.xml] Jul 14 06:44:42 localhost server[29213]: SEVERE: Error loading descriptors from [jar:file:/usr/share/java/tomcat/catalina.jar!/org/apache/catalina/mbeans-descriptors.xml] Jul 14 06:44:42 localhost server[29213]: SEVERE: Error registering MBean Jul 14 06:44:42 localhost server[29213]: java.lang.ClassNotFoundException: org.apache.tomcat.util.modeler.modules.MbeansDescriptorsIntrospectionSource Jul 14 06:44:42 localhost server[29213]: #011at java.net.URLClassLoader.findClass(URLClassLoader.java:382) Jul 14 06:44:42 localhost server[29213]: #011at java.lang.ClassLoader.loadClass(ClassLoader.java:418) Jul 14 06:44:42 localhost server[29213]: #011at java.lang.ClassLoader.loadClass(ClassLoader.java:351) Jul 14 06:44:42 localhost server[29213]: #011at java.lang.Class.forName0(Native Method) Jul 14 06:44:42 localhost server[29213]: #011at java.lang.Class.forName(Class.java:264) Jul 14 06:44:42 localhost server[29213]: #011at org.apache.tomcat.util.modeler.Registry.getModelerSource(Registry.java:737) Jul 14 06:44:42 localhost server[29213]: #011at org.apache.tomcat.util.modeler.Registry.load(Registry.java:611) Jul 14 06:44:42 localhost server[29213]: #011at org.apache.tomcat.util.modeler.Registry.findManagedBean(Registry.java:518) Jul 14 06:44:42 localhost server[29213]: #011at org.apache.tomcat.util.modeler.Registry.registerComponent(Registry.java:641) Jul 14 06:44:42 localhost server[29213]: #011at org.apache.catalina.util.LifecycleMBeanBase.register(LifecycleMBeanBase.java:158) Jul 14 06:44:42 localhost server[29213]: #011at org.apache.catalina.util.LifecycleMBeanBase.initInternal(LifecycleMBeanBase.java:59) Jul 14 06:44:42 localhost server[29213]: #011at org.apache.catalina.core.StandardServer.initInternal(StandardServer.java:1007) Jul 14 06:44:42 localhost server[29213]: #011at org.apache.catalina.util.LifecycleBase.init(LifecycleBase.java:136) Jul 14 06:44:42 localhost server[29213]: #011at org.apache.catalina.startup.Catalina.load(Catalina.java:690) Jul 14 06:44:42 localhost server[29213]: #011at org.apache.catalina.startup.Catalina.load(Catalina.java:712) Jul 14 06:44:42 localhost server[29213]: #011at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) Jul 14 06:44:42 localhost server[29213]: #011at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) Jul 14 06:44:42 localhost server[29213]: #011at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) Jul 14 06:44:42 localhost server[29213]: #011at java.lang.reflect.Method.invoke(Method.java:498) Jul 14 06:44:42 localhost server[29213]: #011at org.apache.catalina.startup.Bootstrap.load(Bootstrap.java:302) Jul 14 06:44:42 localhost server[29213]: #011at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:472) Jul 14 06:44:42 localhost server[29213]: WARNING: Failed to register object [StandardServer[8005]] with name [Catalina:type=Server] during component initialisation ...blahblahblah, like all Java tracebacks it goes on forever and ever. I'll attach the whole /var/log tarball from openQA, errors are visible in var/log/messages . Proposing as a Beta blocker, violates Basic criterion "It must be possible to configure a Fedora Server system installed according to the above criteria as a FreeIPA domain controller, using the official deployment tools provided in the distribution FreeIPA packages..." - https://fedoraproject.org/wiki/Basic_Release_Criteria#FreeIPA_server_requirements
Created attachment 1701140 [details] /var/log tarball from failed test
Yeah, pretty sure this is tomcat, because new tomcat builds got submitted for F31 and F32 as well, and failed their openQA tests too: https://openqa.fedoraproject.org/tests/overview?distri=fedora&version=31&build=Update-FEDORA-2020-ca61f96f84&groupid=2 https://openqa.fedoraproject.org/tests/overview?distri=fedora&version=32&build=Update-FEDORA-2020-e8a57db181&groupid=2
Boo. Why are all the logfiles empty in the attachment? The class still lives in the tomcat-coyote.jar and there's nothing I see to help me narrow down what the problem could be :(
"Boo. Why are all the logfiles empty in the attachment?" That's just how it comes. The test isn't doing anything special to cause that to happen, it's just booting a clean system and doing a typical deployment.
One thing I can do is some manual bisecting with scratch builds. it would help if we can identify a few suspect commits, ideally ones that can be cleanly reverted for testing. On a quick look at the changelog, I'm thinking possibly the fixes for https://bz.apache.org/bugzilla/show_bug.cgi?id=64506 and https://bz.apache.org/bugzilla/show_bug.cgi?id=64541 may be suspects?
I tried reverting a few likely-looking upstream commits - c4019272d , 6efae2f5c , 15377cbe5 , 5efa602b0a - but none of those did the trick. Others that I might want to try don't revert cleanly and I don't want to do all the work to unpick them just on a guess. Gonna leave this for more knowledgeable folks for now.
This issue is caused by changes made to the OSGi manifests upstream so that the packages aren't included in the tomcat-coyote.jar. The commit is https://github.com/apache/tomcat/commit/393c022c87e5cbebf1b96c3e1e7aa3b2ab4d5b74#diff-84aafb5950b47b3aa681bdf5cd1ebb6e. If the issue isn't resolved upstream I can produce a patch that should resolve the issue.
aha, I think that one was in my "things I wanted to try reverting but which didn't revert cleanly" bucket :) thanks! A patch would definitely be appreciated if there isn't a prompt upstream reversion, we would like this fixed in Rawhide ASAP.
er, I meant 'upstream resolution', not 'upstream reversion'.
Discussed during the 2020-07-20 blocker review meeting: [0] The decision to classify this bug as an "AcceptedBlocker" was made as it violates the following Basic criterion: "It must be possible to configure a Fedora Server system installed according to the above criteria as a FreeIPA domain controller, using the official deployment tools provided in the distribution FreeIPA packages." [0] https://meetbot.fedoraproject.org/fedora-blocker-review/2020-07-20/f33-blocker-review.2020-07-20-16.18.txt
FEDORA-2020-f897a68801 has been pushed to the Fedora 33 stable repository. If problem still persists, please make note of it in this bug report.
Looks like it's not entirely fixed, we get different class not found errors now: Jul 21 14:19:14 localhost pki-server[29251]: [java.lang.ClassNotFoundException: com.sun.xml.internal.bind.v2.ContextFactory] Jul 21 14:19:14 localhost pki-server[29251]: Caused by: java.lang.ClassNotFoundException: com.sun.xml.internal.bind.v2.ContextFactory Jul 21 14:19:18 localhost server[29370]: Caused by: java.lang.ClassNotFoundException: org.apache.tomcat.util.http.fileupload.impl.InvalidContentTypeException
Created attachment 1701978 [details] /var/log tarball from failed test (updated 2020-07-21, after initial fix) sorry, that was an extract: [adamw@adam tmp]$ grep ClassNotFound var/log/messages Jul 21 14:19:14 localhost pki-server[29251]: [java.lang.ClassNotFoundException: com.sun.xml.internal.bind.v2.ContextFactory] Jul 21 14:19:14 localhost pki-server[29251]: Caused by: java.lang.ClassNotFoundException: com.sun.xml.internal.bind.v2.ContextFactory Jul 21 14:19:18 localhost server[29370]: Caused by: java.lang.ClassNotFoundException: org.apache.tomcat.util.http.fileupload.impl.InvalidContentTypeException I'm attaching the full log.
I verified that the classes aren't present in the when I build while having the aqute-bnd and aqute-bndlib packages installed from Fedora (version 3.5), however when I remove those and use the aqute-bnd version (5.1) that Tomcat downloads, they are present which leads me to believe that the issue is caused by some bug in the older version. I saw that there's a PR to update Fedora's bnd library to 4.3 (https://src.fedoraproject.org/rpms/aqute-bnd/pull-request/6) and that version seems to fix the issue for me locally :)
Unfortunately that PR isn't mergeable right now...will ping Fabio to rebase it...thanks.
Yeah, I looked at the Tomcat build failure and noted that there are no classes in the biz.aQute.bnd.jar, just the META-INF...maybe the suspected OSGi bug got them too! I commented on the PR, but given the age of the build maybe we should look for an alternative to address this particular problem? I could just not run the addOSGi step in the ant tasks that build the Tomcat jars temporarily until we can consume the upgrade. The OSGi step was only added a few releases ago, so it shouldn't break anything. Thoughts?
My thoughts, I'm afraid, are "I don't know anywhere near enough about Java to have any useful thoughts" :)
FEDORA-2020-3c8c0646b9 has been pushed to the Fedora ELN stable repository. If problem still persists, please make note of it in this bug report.
Reopening since there is no reason to believe that the ELN build fixes the problem in any way better than the rawhide build.
FYI I submitted a new build earlier (tomcat-9.0.37-3) which should contain all the classes that were there before the update. I removed all of the OSGi metadata steps in ant so that it doesn't remove any classes. If the aqute-bnd update to 4.3 is successful, then I can add the OSGi step back in. Otherwise I'll have to manually add the OSGi metadata back to the jars if someone is using them and the latest build breaks their install.
awesome, thanks for the work, I appreciate you taking the trouble.
No problem. If you can confirm when the issue is resolved then I'll port the changes back to fc32. Let's hope that nobody is utilizing the jars as OSGi bundles :)
Well...we can wait for the next Rawhide compose, but Java 11 has been merged now, so we might be broken in other ways. If so, I'll see if I can figure out another way to test. Actually, easiest thing to do might be to test an F32 scratch build - I have a mechanism for doing that.
So yeah, more or less what I predicted seems to have come to pass - deployment is now failing on a thing that looks like incompatibility with Java 11: https://bugzilla.redhat.com/show_bug.cgi?id=1866570 so basically we can assume this is fixed and close it, or wait till the Java 11 incompatibility is resolved somehow to make sure.
Hum. Current F32 -> Rawhide upgrade tests are still showing this error when booted after upgrade: Aug 07 12:20:31 ipa001.domain.local pki-server[1298]: [java.lang.ClassNotFoundException: com.sun.xml.internal.bind.v2.ContextFactory] Aug 07 12:20:31 ipa001.domain.local pki-server[1298]: Caused by: java.lang.ClassNotFoundException: com.sun.xml.internal.bind.v2.ContextFactory full log tarball: https://openqa.fedoraproject.org/tests/636836/file/role_deploy_domain_controller_check-var_log.tar.gz
This bug appears to have been reported against 'rawhide' during the Fedora 33 development cycle. Changing version to 33.
OK, openQA tests show FreeIPA deployment works on current F33 composes, so this is definitely resolved. Closing.