Bug 1857043

Summary: FreeIPA server deployment fails in Fedora-Rawhide-20200714.n.0 due to pki-tomcat failing to run with "java.lang.ClassNotFoundException: org.apache.tomcat.util.modeler.modules.MbeansDescriptorsIntrospectionSource"
Product: [Fedora] Fedora Reporter: Adam Williamson <awilliam>
Component: tomcatAssignee: Ivan Afonichev <ivan.afonichev>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high Docs Contact:
Priority: unspecified    
Version: 33CC: abokovoy, alee, coolsvap, csutherl, gmarr, huwang, ivan.afonichev, java-sig-commits, jpazdziora, krzysztof.daniel, robatino
Target Milestone: ---Keywords: Reopened
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard: openqa AcceptedBlocker
Fixed In Version: tomcat-9.0.37-2.fc33 tomcat-9.0.37-2.eln103 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-08-24 17:19:23 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1766775    
Attachments:
Description Flags
/var/log tarball from failed test
none
/var/log tarball from failed test (updated 2020-07-21, after initial fix) none

Description Adam Williamson 2020-07-15 02:03:56 UTC
In Fedora-Rawhide-20200714.n.0, FreeIPA server deployment failed in all openQA tests that attempt it. This worked fine in 20200713.n.0. Failure appears to be this:

Jul 14 06:44:39 localhost systemd[1]: Starting PKI Tomcat Server pki-tomcat...
Jul 14 06:44:41 localhost audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=pki-tomcatd@pki-tomcat comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jul 14 06:44:41 localhost systemd[1]: Started PKI Tomcat Server pki-tomcat.
Jul 14 06:44:41 localhost server[29213]: Java virtual machine used: /usr/lib/jvm/jre-1.8.0-openjdk/bin/java
Jul 14 06:44:41 localhost server[29213]: classpath used: /usr/share/tomcat/bin/bootstrap.jar:/usr/share/tomcat/bin/tomcat-juli.jar:/usr/lib/java/commons-daemon.jar
Jul 14 06:44:41 localhost server[29213]: main class used: org.apache.catalina.startup.Bootstrap
Jul 14 06:44:41 localhost server[29213]: flags used:
Jul 14 06:44:41 localhost server[29213]: options used: -Dcatalina.base=/var/lib/pki/pki-tomcat -Dcatalina.home=/usr/share/tomcat -Djava.endorsed.dirs= -Djava.io.tmpdir=/var/lib/pki/pki-tomcat/temp -Djava.util.logging.config.file=/var/lib/pki/pki-tomcat/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager     -Djava.security.manager     -Djava.security.policy==/var/lib/pki/pki-tomcat/conf/catalina.policy
Jul 14 06:44:41 localhost server[29213]: arguments used: start
Jul 14 06:44:42 localhost server[29213]: SEVERE: Error loading descriptors from [jar:file:/usr/share/java/tomcat/catalina.jar!/org/apache/catalina/core/mbeans-descriptors.xml]
Jul 14 06:44:42 localhost server[29213]: SEVERE: Error loading descriptors from [jar:file:/usr/share/java/tomcat/catalina.jar!/org/apache/catalina/mbeans-descriptors.xml]
Jul 14 06:44:42 localhost server[29213]: SEVERE: Error registering MBean
Jul 14 06:44:42 localhost server[29213]: java.lang.ClassNotFoundException: org.apache.tomcat.util.modeler.modules.MbeansDescriptorsIntrospectionSource
Jul 14 06:44:42 localhost server[29213]: #011at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
Jul 14 06:44:42 localhost server[29213]: #011at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
Jul 14 06:44:42 localhost server[29213]: #011at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
Jul 14 06:44:42 localhost server[29213]: #011at java.lang.Class.forName0(Native Method)
Jul 14 06:44:42 localhost server[29213]: #011at java.lang.Class.forName(Class.java:264)
Jul 14 06:44:42 localhost server[29213]: #011at org.apache.tomcat.util.modeler.Registry.getModelerSource(Registry.java:737)
Jul 14 06:44:42 localhost server[29213]: #011at org.apache.tomcat.util.modeler.Registry.load(Registry.java:611)
Jul 14 06:44:42 localhost server[29213]: #011at org.apache.tomcat.util.modeler.Registry.findManagedBean(Registry.java:518)
Jul 14 06:44:42 localhost server[29213]: #011at org.apache.tomcat.util.modeler.Registry.registerComponent(Registry.java:641)
Jul 14 06:44:42 localhost server[29213]: #011at org.apache.catalina.util.LifecycleMBeanBase.register(LifecycleMBeanBase.java:158)
Jul 14 06:44:42 localhost server[29213]: #011at org.apache.catalina.util.LifecycleMBeanBase.initInternal(LifecycleMBeanBase.java:59)
Jul 14 06:44:42 localhost server[29213]: #011at org.apache.catalina.core.StandardServer.initInternal(StandardServer.java:1007)
Jul 14 06:44:42 localhost server[29213]: #011at org.apache.catalina.util.LifecycleBase.init(LifecycleBase.java:136)
Jul 14 06:44:42 localhost server[29213]: #011at org.apache.catalina.startup.Catalina.load(Catalina.java:690)
Jul 14 06:44:42 localhost server[29213]: #011at org.apache.catalina.startup.Catalina.load(Catalina.java:712)
Jul 14 06:44:42 localhost server[29213]: #011at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
Jul 14 06:44:42 localhost server[29213]: #011at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
Jul 14 06:44:42 localhost server[29213]: #011at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
Jul 14 06:44:42 localhost server[29213]: #011at java.lang.reflect.Method.invoke(Method.java:498)
Jul 14 06:44:42 localhost server[29213]: #011at org.apache.catalina.startup.Bootstrap.load(Bootstrap.java:302)
Jul 14 06:44:42 localhost server[29213]: #011at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:472)
Jul 14 06:44:42 localhost server[29213]: WARNING: Failed to register object [StandardServer[8005]] with name [Catalina:type=Server] during component initialisation

...blahblahblah, like all Java tracebacks it goes on forever and ever. I'll attach the whole /var/log tarball from openQA, errors are visible in var/log/messages .

Proposing as a Beta blocker, violates Basic criterion "It must be possible to configure a Fedora Server system installed according to the above criteria as a FreeIPA domain controller, using the official deployment tools provided in the distribution FreeIPA packages..." - https://fedoraproject.org/wiki/Basic_Release_Criteria#FreeIPA_server_requirements

Comment 1 Adam Williamson 2020-07-15 02:04:52 UTC
Created attachment 1701140 [details]
/var/log tarball from failed test

Comment 2 Adam Williamson 2020-07-15 15:35:51 UTC
Yeah, pretty sure this is tomcat, because new tomcat builds got submitted for F31 and F32 as well, and failed their openQA tests too:

https://openqa.fedoraproject.org/tests/overview?distri=fedora&version=31&build=Update-FEDORA-2020-ca61f96f84&groupid=2
https://openqa.fedoraproject.org/tests/overview?distri=fedora&version=32&build=Update-FEDORA-2020-e8a57db181&groupid=2

Comment 3 Coty Sutherland 2020-07-15 15:50:43 UTC
Boo. Why are all the logfiles empty in the attachment? The class still lives in the tomcat-coyote.jar and there's nothing I see to help me narrow down what the problem could be :(

Comment 4 Adam Williamson 2020-07-15 16:00:09 UTC
"Boo. Why are all the logfiles empty in the attachment?"

That's just how it comes. The test isn't doing anything special to cause that to happen, it's just booting a clean system and doing a typical deployment.

Comment 5 Adam Williamson 2020-07-15 16:04:48 UTC
One thing I can do is some manual bisecting with scratch builds. it would help if we can identify a few suspect commits, ideally ones that can be cleanly reverted for testing. On a quick look at the changelog, I'm thinking possibly the fixes for https://bz.apache.org/bugzilla/show_bug.cgi?id=64506 and https://bz.apache.org/bugzilla/show_bug.cgi?id=64541 may be suspects?

Comment 6 Adam Williamson 2020-07-15 19:51:18 UTC
I tried reverting a few likely-looking upstream commits - c4019272d , 6efae2f5c , 15377cbe5 , 5efa602b0a - but none of those did the trick. Others that I might want to try don't revert cleanly and I don't want to do all the work to unpick them just on a guess. Gonna leave this for more knowledgeable folks for now.

Comment 7 Coty Sutherland 2020-07-20 18:20:27 UTC
This issue is caused by changes made to the OSGi manifests upstream so that the packages aren't included in the tomcat-coyote.jar. The commit is https://github.com/apache/tomcat/commit/393c022c87e5cbebf1b96c3e1e7aa3b2ab4d5b74#diff-84aafb5950b47b3aa681bdf5cd1ebb6e. If the issue isn't resolved upstream I can produce a patch that should resolve the issue.

Comment 8 Adam Williamson 2020-07-20 18:25:55 UTC
aha, I think that one was in my "things I wanted to try reverting but which didn't revert cleanly" bucket :) thanks! A patch would definitely be appreciated if there isn't a prompt upstream reversion, we would like this fixed in Rawhide ASAP.

Comment 9 Adam Williamson 2020-07-20 18:26:12 UTC
er, I meant 'upstream resolution', not 'upstream reversion'.

Comment 10 Geoffrey Marr 2020-07-20 19:01:17 UTC
Discussed during the 2020-07-20 blocker review meeting: [0]

The decision to classify this bug as an "AcceptedBlocker" was made as it violates the following Basic criterion:

"It must be possible to configure a Fedora Server system installed according to the above criteria as a FreeIPA domain controller, using the official deployment tools provided in the distribution FreeIPA packages."

[0] https://meetbot.fedoraproject.org/fedora-blocker-review/2020-07-20/f33-blocker-review.2020-07-20-16.18.txt

Comment 11 Fedora Update System 2020-07-20 20:17:48 UTC
FEDORA-2020-f897a68801 has been pushed to the Fedora 33 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 12 Adam Williamson 2020-07-21 18:25:06 UTC
Looks like it's not entirely fixed, we get different class not found errors now:

Jul 21 14:19:14 localhost pki-server[29251]: [java.lang.ClassNotFoundException: com.sun.xml.internal.bind.v2.ContextFactory]
Jul 21 14:19:14 localhost pki-server[29251]: Caused by: java.lang.ClassNotFoundException: com.sun.xml.internal.bind.v2.ContextFactory
Jul 21 14:19:18 localhost server[29370]: Caused by: java.lang.ClassNotFoundException: org.apache.tomcat.util.http.fileupload.impl.InvalidContentTypeException

Comment 13 Adam Williamson 2020-07-21 18:25:49 UTC
Created attachment 1701978 [details]
/var/log tarball from failed test (updated 2020-07-21, after initial fix)

sorry, that was an extract:

[adamw@adam tmp]$ grep ClassNotFound var/log/messages 
Jul 21 14:19:14 localhost pki-server[29251]: [java.lang.ClassNotFoundException: com.sun.xml.internal.bind.v2.ContextFactory]
Jul 21 14:19:14 localhost pki-server[29251]: Caused by: java.lang.ClassNotFoundException: com.sun.xml.internal.bind.v2.ContextFactory
Jul 21 14:19:18 localhost server[29370]: Caused by: java.lang.ClassNotFoundException: org.apache.tomcat.util.http.fileupload.impl.InvalidContentTypeException

I'm attaching the full log.

Comment 14 Coty Sutherland 2020-07-21 20:21:29 UTC
I verified that the classes aren't present in the when I build while having the aqute-bnd and aqute-bndlib packages installed from Fedora (version 3.5), however when I remove those and use the aqute-bnd version (5.1) that Tomcat downloads, they are present which leads me to believe that the issue is caused by some bug in the older version. I saw that there's a PR to update Fedora's bnd library to 4.3 (https://src.fedoraproject.org/rpms/aqute-bnd/pull-request/6) and that version seems to fix the issue for me locally :)

Comment 15 Adam Williamson 2020-07-21 20:36:00 UTC
Unfortunately that PR isn't mergeable right now...will ping Fabio to rebase it...thanks.

Comment 16 Coty Sutherland 2020-07-21 20:47:47 UTC
Yeah, I looked at the Tomcat build failure and noted that there are no classes in the biz.aQute.bnd.jar, just the META-INF...maybe the suspected OSGi bug got them too! I commented on the PR, but given the age of the build maybe we should look for an alternative to address this particular problem? I could just not run the addOSGi step in the ant tasks that build the Tomcat jars temporarily until we can consume the upgrade. The OSGi step was only added a few releases ago, so it shouldn't break anything. Thoughts?

Comment 17 Adam Williamson 2020-07-21 21:51:56 UTC
My thoughts, I'm afraid, are "I don't know anywhere near enough about Java to have any useful thoughts" :)

Comment 18 Fedora Update System 2020-07-24 13:38:09 UTC
FEDORA-2020-3c8c0646b9 has been pushed to the Fedora ELN stable repository.
If problem still persists, please make note of it in this bug report.

Comment 19 Jan Pazdziora 2020-07-24 17:28:03 UTC
Reopening since there is no reason to believe that the ELN build fixes the problem in any way better than the rawhide build.

Comment 20 Coty Sutherland 2020-07-24 18:09:36 UTC
FYI I submitted a new build earlier (tomcat-9.0.37-3) which should contain all the classes that were there before the update. I removed all of the OSGi metadata steps in ant so that it doesn't remove any classes. If the aqute-bnd update to 4.3 is successful, then I can add the OSGi step back in. Otherwise I'll have to manually add the OSGi metadata back to the jars if someone is using them and the latest build breaks their install.

Comment 21 Adam Williamson 2020-07-24 19:01:36 UTC
awesome, thanks for the work, I appreciate you taking the trouble.

Comment 22 Coty Sutherland 2020-07-24 20:22:29 UTC
No problem. If you can confirm when the issue is resolved then I'll port the changes back to fc32. Let's hope that nobody is utilizing the jars as OSGi bundles :)

Comment 23 Adam Williamson 2020-07-24 20:50:41 UTC
Well...we can wait for the next Rawhide compose, but Java 11 has been merged now, so we might be broken in other ways. If so, I'll see if I can figure out another way to test.

Actually, easiest thing to do might be to test an F32 scratch build - I have a mechanism for doing that.

Comment 24 Adam Williamson 2020-08-06 00:08:45 UTC
So yeah, more or less what I predicted seems to have come to pass - deployment is now failing on a thing that looks like incompatibility with Java 11:

https://bugzilla.redhat.com/show_bug.cgi?id=1866570

so basically we can assume this is fixed and close it, or wait till the Java 11 incompatibility is resolved somehow to make sure.

Comment 25 Adam Williamson 2020-08-07 21:22:59 UTC
Hum. Current F32 -> Rawhide upgrade tests are still showing this error when booted after upgrade:

Aug 07 12:20:31 ipa001.domain.local pki-server[1298]: [java.lang.ClassNotFoundException: com.sun.xml.internal.bind.v2.ContextFactory]
Aug 07 12:20:31 ipa001.domain.local pki-server[1298]: Caused by: java.lang.ClassNotFoundException: com.sun.xml.internal.bind.v2.ContextFactory

full log tarball: https://openqa.fedoraproject.org/tests/636836/file/role_deploy_domain_controller_check-var_log.tar.gz

Comment 26 Ben Cotton 2020-08-11 15:21:48 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 33 development cycle.
Changing version to 33.

Comment 27 Adam Williamson 2020-08-24 17:19:23 UTC
OK, openQA tests show FreeIPA deployment works on current F33 composes, so this is definitely resolved. Closing.