Description of problem:
OpenJDK 8, 11, 17 builds started to fail on rawhide x86_64 when they hit a virtualized builder (RHEL 8.5 with RHEL 8.5 qemu-kvm/libvirt on the host; F34 as the guest VM). See also this fedora-infra issue: https://pagure.io/fedora-infrastructure/issue/10348

Version-Release number of selected component (if applicable):
# rpm -q libvirt
libvirt-6.0.0-37.module+el8.5.0+12162+40884dd2.x86_64

How reproducible:
100% if run on a virtualized VM builder.

Steps to Reproduce:
1. Set up a RHEL 8.5 host on an Intel Xeon Gold (Cascadelake)
2. Install virt tools for hosting VMs (from RHEL 8.5)
3. Install F34 in a guest VM
4. Try to do a mock build of java-1.8.0-openjdk for rawhide on the F34 guest VM (fedpkg clone -a java-1.8.0-openjdk && cd java-1.8.0-openjdk && fedpkg mockbuild --no-cleanup-after)

Actual results:
Random, cryptic build failures. Some example(s):

ERROR: compileproperties: IO error writing to file /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.fc36.x86_64/bootbuild/jdk8.build-slowdebug/jdk/gensrc/sun/util/resources/cs/LocaleNames_cs.java
EXCEPTION: java.io.FileNotFoundException: /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.fc36.x86_64/bootbuild/jdk8.build-slowdebug/jdk/gensrc/sun/util/resources/cs/LocaleNames_cs.java (No such file or directory)
java.io.FileNotFoundException: /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.fc36.x86_64/bootbuild/jdk8.build-slowdebug/jdk/gensrc/sun/util/resources/cs/LocaleNames_cs.java (No such file or directory)
	at java.io.FileOutputStream.open0(Native Method)
	at java.io.FileOutputStream.open(FileOutputStream.java:270)
	at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
	at java.io.FileOutputStream.<init>(FileOutputStream.java:101)
	at build.tools.compileproperties.CompileProperties.createFile(CompileProperties.java:269)
	at build.tools.compileproperties.CompileProperties.main(CompileProperties.java:195)

and/or:

/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.fc36.x86_64/openjdk/hotspot/make/linux/makefiles/rules.make:149: Building os_perf_linux.o (from /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.fc36.x86_64/openjdk/hotspot/src/os/linux/vm/os_perf_linux.cpp) (/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.fc36.x86_64/openjdk/hotspot/src/os/linux/vm/os_perf_linux.cpp newer)
gmake[6]: *** No rule to make target 's.os_posix.cpp', needed by 'os_posix.cpp'. Stop.
gmake[6]: *** Waiting for unfinished jobs....

Expected results:
Successful build of rawhide in mock.

Additional info:
It doesn't reproduce on a Fedora host (F34) and Fedora 34 guest. It doesn't seem to be qemu related (of the host). It doesn't reproduce when the build happens on a physical machine (not virtualized). Failures have been seen with qemu-kvm 6.0.0 and 4.2.0. It might be xfs or libvirt storage related, as the failures vary from run to run and are about files not being present where they should be.
*** Bug 2026398 has been marked as a duplicate of this bug. ***
It's unlikely that this would be caused by libvirt. But would you please attach the guest XML, libvirt debug logs (https://libvirt.org/kbase/debuglogs.html), possibly also the domain log, usually located in /var/log/libvirt/qemu directory? It would be good to narrow down the reproducer to some simple use case. For example you indicated that it might be related to xfs. Have you tried some xfstests tool? Or other fs stress tool? Thanks.
(In reply to Jaroslav Suchanek from comment #3)
> It's unlikely that this would be caused by libvirt.

We are only able to reproduce in a virtualized setup. The one described in comment 0. So if not reproducible on a physical machine, where should we be looking?

> But would you please
> attach the guest XML, libvirt debug logs
> (https://libvirt.org/kbase/debuglogs.html), possibly also the domain log,
> usually located in /var/log/libvirt/qemu directory?

OK, thanks. I'll gather this info.

> It would be good to narrow down the reproducer to some simple use case. For
> example you indicated that it might be related to xfs. Have you tried some
> xfstests tool? Or other fs stress tool?

What exactly do you have in mind? I know little about those.
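For illustration only, one rough idea of what a small "fs stress" reproducer could look like. This is not xfstests and not something anyone in this bug has run; it is a minimal Python sketch under the assumption that the symptom is "a file that was just written is missing or has wrong content". The names `write_one` and `stress`, the directory layout and the sizes are all made up for the example. It writes many small files from several threads using a write-then-rename pattern, then checks that each one exists and hashes to the expected value.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor


def write_one(root, index, size):
    """Write one file with deterministic content; return (path, sha256 hex)."""
    subdir = os.path.join(root, f"d{index % 16:02d}")
    os.makedirs(subdir, exist_ok=True)
    # Deterministic payload derived from the index, cut to the requested size.
    payload = (hashlib.sha256(str(index).encode()).digest() * (size // 32 + 1))[:size]
    final = os.path.join(subdir, f"f{index:06d}.dat")
    temp = final + ".temp"
    with open(temp, "wb") as handle:
        handle.write(payload)
    os.replace(temp, final)  # write to a temp name, then rename into place
    return final, hashlib.sha256(payload).hexdigest()


def stress(root, count=2000, size=4096, workers=8):
    """Create `count` files concurrently, then verify them. Returns problems found."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        expected = list(pool.map(lambda i: write_one(root, i, size), range(count)))
    problems = []
    for path, digest in expected:
        if not os.path.exists(path):
            problems.append((path, "missing"))
            continue
        with open(path, "rb") as handle:
            if hashlib.sha256(handle.read()).hexdigest() != digest:
                problems.append((path, "content mismatch"))
    return problems
```

Calling `stress("/some/scratch/dir")` inside the guest (and inside the mock chroot) and getting an empty list back would suggest that plain file creation is fine and that the problem lies elsewhere. A non-empty list would point at storage or the filesystem.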
Created attachment 1843456 [details] f34 guest xml
Created attachment 1843457 [details] F34 guest qemu cmd log
Created attachment 1843461 [details] libvirt debug log
Note that libvirt doesn't do much besides instructing qemu to use the disk image that is configured in the XML in terms of storage handling.

Also it's not really clear from this BZ or the linked pagure issue, where multiple build failures are linked, what the underlying issue is. Namely I wasn't able to find the Java error reported above (some of the logs are no longer present), and I've seen only gmake errors or some internal test errors:

Examples:

https://kojipkgs.fedoraproject.org//work/tasks/7748/78757748/build.log

gmake[3]: *** No rule to make target '/builddir/build/BUILD/java-17-openjdk-17.0.1.0.12-8.rolling.fc36.x86_64/build/jdk17.build-slowdebug-main/hotspot/variant-server/gensrc/adfiles/ad_x86_gen.cpp', needed by '/builddir/build/BUILD/java-17-openjdk-17.0.1.0.12-8.rolling.fc36.x86_64/build/jdk17.build-slowdebug-main/hotspot/variant-server/libjvm/objs/_build-info.marker'. Stop.

In the above example it's a make error. The missing file seems to be generated by an internal tool; unfortunately there's nothing which would hint why. On other platforms it seems to be generating other platform specific code and that works. Note that other platforms seem to be using virt as well for build.

https://kojipkgs.fedoraproject.org//work/tasks/7368/78877368/build.log

gmake[3]: *** No rule to make target '/builddir/build/BUILD/java-17-openjdk-17.0.1.0.12-7.rolling.fc36.x86_64/build/jdk17.build-slowdebug-main/hotspot/variant-server/libjvm/objs/classLoaderDataGraph.d'. Stop.

https://kojipkgs.fedoraproject.org//work/tasks/6045/78836045/build.log

gmake[3]: *** No rule to make target '/builddir/build/BUILD/java-17-openjdk-17.0.1.0.12-7.rolling.fc36.x86_64/build/jdk17.build-slowdebug-main/hotspot/variant-server/libjvm/objs/classLoaderDataGraph.d'. Stop.

Both are the same. I wasn't able to find any mention of an error regarding that file.
https://kojipkgs.fedoraproject.org//work/tasks/7925/78767925/build.log

/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/jdk8.build-slowdebug/jdk/classes/_the.BUILD_JDK_batch.tmp
/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/jdk8.build-slowdebug/jdk/gensrc/com/sun/swing/internal/plaf/metal/resources/metal_zh_HK.java:1: error: illegal character: '#'
# This properties file is used to create a PropertyResourceBundle
^
/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/jdk8.build-slowdebug/jdk/gensrc/com/sun/swing/internal/plaf/metal/resources/metal_zh_HK.java:1: error: class, interface, or enum expected
# This properties file is used to create a PropertyResourceBundle
^
/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/jdk8.build-slowdebug/jdk/gensrc/com/sun/swing/internal/plaf/metal/resources/metal_zh_HK.java:2: error: illegal character: '#'
# It contains Locale specific strings used be the Synth Look and Feel.
^
/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/jdk8.build-slowdebug/jdk/gensrc/com/sun/swing/internal/plaf/metal/resources/metal_zh_HK.java:3: error: illegal character: '#'
# Currently, the following components need this for support:
^
/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/jdk8.build-slowdebug/jdk/gensrc/com/sun/swing/internal/plaf/metal/resources/metal_zh_HK.java:4: error: illegal character: '#'
#
^
/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/jdk8.build-slowdebug/jdk/gensrc/com/sun/swing/internal/plaf/metal/resources/metal_zh_HK.java:5: error: illegal character: '#'
# FileChooser
^
/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/jdk8.build-slowdebug/jdk/gensrc/com/sun/swing/internal/plaf/metal/resources/metal_zh_HK.java:6: error: illegal character: '#'
#
^
/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/jdk8.build-slowdebug/jdk/gensrc/com/sun/swing/internal/plaf/metal/resources/metal_zh_HK.java:7: error: illegal character: '#'
# When this file is read in, the strings are put into the
^
/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/jdk8.build-slowdebug/jdk/gensrc/com/sun/swing/internal/plaf/metal/resources/metal_zh_HK.java:8: error: illegal character: '#'
# defaults table. This is an implementation detail of the current
^
/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/jdk8.build-slowdebug/jdk/gensrc/com/sun/swing/internal/plaf/metal/resources/metal_zh_HK.java:9: error: illegal character: '#'
# workings of Swing. DO NOT DEPEND ON THIS.
^
/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/jdk8.build-slowdebug/jdk/gensrc/com/sun/swing/internal/plaf/metal/resources/metal_zh_HK.java:10: error: illegal character: '#'
# This may change in future versions of Swing as we improve localization
^
/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/jdk8.build-slowdebug/jdk/gensrc/com/sun/swing/internal/plaf/metal/resources/metal_zh_HK.java:11: error: illegal character: '#'
# support.
^
/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/jdk8.build-slowdebug/jdk/gensrc/com/sun/swing/internal/plaf/metal/resources/metal_zh_HK.java:12: error: illegal character: '#'
#

This one seems to be some bug in input files?

Either way, neither the linked cases nor the description in this bug have anything which would even hint at a virtualization problem here. Even if there is something wrong in the storage layer in virt, it's at block level, so it wouldn't show up as individual files vanishing but rather as block errors, which are unlikely to manifest without filesystem corruption.

For now it IMO makes no sense to reassign this to any other component in the virt layer until it's clear what the root of the problem is, and this BZ is not doing that sufficiently.

If you encounter a filesystem or block error hint, e.g. in the guest's kernel log, or anything which wouldn't also be explainable by a build system failure, please attach it here.

Please also reassign this BZ to the JDK component for now, at least until the root cause is known.
(In reply to Peter Krempa from comment #8) > Note that libvirt doesn't do much besides instructing qemu to use the disk > image that is configured in the XML in terms of storage handling. > > Also it's not really clear from this BZ or the linked pagure issue where > multiple build failures are linked what the underlying issue is. Your guess is as good as mine as to what the underlying issue is. Either way, we - the JDK team - would need to figure it out. Very little to go on by, though. On the JDK side, nothing changed. See also this for a bit of history: https://koschei.fedoraproject.org/package/java-1.8.0-openjdk?collection=f36 Seems like since the update of builders (Nov 8) it fails on virtualized build VMs. > Namely I > wasn't able to find the JAVA error reported above (some of the logs are no > longer present), and I've seen only gmake errors or some internal test > errors: Right, the thing is failures are fairly random. They don't necessarily look alike from one affected system to the next. The only consistency was build failure :-/ > > Examples: > > > https://kojipkgs.fedoraproject.org//work/tasks/7748/78757748/build.log > > gmake[3]: *** No rule to make target > '/builddir/build/BUILD/java-17-openjdk-17.0.1.0.12-8.rolling.fc36.x86_64/ > build/jdk17.build-slowdebug-main/hotspot/variant-server/gensrc/adfiles/ > ad_x86_gen.cpp', needed by > '/builddir/build/BUILD/java-17-openjdk-17.0.1.0.12-8.rolling.fc36.x86_64/ > build/jdk17.build-slowdebug-main/hotspot/variant-server/libjvm/objs/_build- > info.marker'. Stop. > > In the above example it's a make error. The missing file seems to be > generated by an internal tool, unfortunately there's nothing which would > hint why. On other platforms it seems to be generating other platform > specific code and that works. Note that other platforms seem to be using > virt as well for build. 
> > https://kojipkgs.fedoraproject.org//work/tasks/7368/78877368/build.log > > gmake[3]: *** No rule to make target > '/builddir/build/BUILD/java-17-openjdk-17.0.1.0.12-7.rolling.fc36.x86_64/ > build/jdk17.build-slowdebug-main/hotspot/variant-server/libjvm/objs/ > classLoaderDataGraph.d'. Stop. > > https://kojipkgs.fedoraproject.org//work/tasks/6045/78836045/build.log > > gmake[3]: *** No rule to make target > '/builddir/build/BUILD/java-17-openjdk-17.0.1.0.12-7.rolling.fc36.x86_64/ > build/jdk17.build-slowdebug-main/hotspot/variant-server/libjvm/objs/ > classLoaderDataGraph.d'. Stop. > > Both are the same, I wasn't able to find any mention of error regarding to > that file. > > > > > https://kojipkgs.fedoraproject.org//work/tasks/7925/78767925/build.log > > /build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/jdk8. > build-slowdebug/jdk/classes/_the.BUILD_JDK_batch.tmp > /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/ > jdk8.build-slowdebug/jdk/gensrc/com/sun/swing/internal/plaf/metal/resources/ > metal_zh_HK.java:1: error: illegal character: '#' > # This properties file is used to create a PropertyResourceBundle > ^ > /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/ > jdk8.build-slowdebug/jdk/gensrc/com/sun/swing/internal/plaf/metal/resources/ > metal_zh_HK.java:1: error: class, interface, or enum expected > # This properties file is used to create a PropertyResourceBundle > ^ > /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/ > jdk8.build-slowdebug/jdk/gensrc/com/sun/swing/internal/plaf/metal/resources/ > metal_zh_HK.java:2: error: illegal character: '#' > # It contains Locale specific strings used be the Synth Look and Feel. 
> ^ > /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/ > jdk8.build-slowdebug/jdk/gensrc/com/sun/swing/internal/plaf/metal/resources/ > metal_zh_HK.java:3: error: illegal character: '#' > # Currently, the following components need this for support: > ^ > /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/ > jdk8.build-slowdebug/jdk/gensrc/com/sun/swing/internal/plaf/metal/resources/ > metal_zh_HK.java:4: error: illegal character: '#' > # > ^ > /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/ > jdk8.build-slowdebug/jdk/gensrc/com/sun/swing/internal/plaf/metal/resources/ > metal_zh_HK.java:5: error: illegal character: '#' > # FileChooser > ^ > /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/ > jdk8.build-slowdebug/jdk/gensrc/com/sun/swing/internal/plaf/metal/resources/ > metal_zh_HK.java:6: error: illegal character: '#' > # > ^ > /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/ > jdk8.build-slowdebug/jdk/gensrc/com/sun/swing/internal/plaf/metal/resources/ > metal_zh_HK.java:7: error: illegal character: '#' > # When this file is read in, the strings are put into the > ^ > /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/ > jdk8.build-slowdebug/jdk/gensrc/com/sun/swing/internal/plaf/metal/resources/ > metal_zh_HK.java:8: error: illegal character: '#' > # defaults table. This is an implementation detail of the current > ^ > /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/ > jdk8.build-slowdebug/jdk/gensrc/com/sun/swing/internal/plaf/metal/resources/ > metal_zh_HK.java:9: error: illegal character: '#' > # workings of Swing. DO NOT DEPEND ON THIS. 
> ^
> /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/
> jdk8.build-slowdebug/jdk/gensrc/com/sun/swing/internal/plaf/metal/resources/
> metal_zh_HK.java:10: error: illegal character: '#'
> # This may change in future versions of Swing as we improve localization
> ^
> /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/
> jdk8.build-slowdebug/jdk/gensrc/com/sun/swing/internal/plaf/metal/resources/
> metal_zh_HK.java:11: error: illegal character: '#'
> # support.
> ^
> /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/
> jdk8.build-slowdebug/jdk/gensrc/com/sun/swing/internal/plaf/metal/resources/
> metal_zh_HK.java:12: error: illegal character: '#'
> #
>
> This one seems to be some bug in input files?
>
> Either way, neither the linked cases nor the description in this bug have
> anything which would even hint to a virtualization problem here. Even if
> there is something wrong in the storage layer in virt, it's at block level,
> so it wouldn't impact individual files vanishing but rather block errors
> which are unlikely to manifest without filesystem corruption.
>
> For now it IMO makes no sense to reasign this to any other component in the
> virt layer until it's clear what the root of the problem is and this BZ is
> not doing that sufficiently.
>
> In case you'll encounter a filesystem or block error hint e.g. in the
> guest's kernel log or anything which wouldn't be also explainable by a build
> system failure, please attach it here.
>
> Please also reassign this BZ to the JDK component for now at least until the
> root cause is know.

We were unable to reproduce in the following environments:
- Physical host, perform build on x86_64 (via 'fedpkg mockbuild')
- Virtualized environment with: Host F34, guest F34, mockbuild in guest F34 VM.

We were able to reproduce in the following environment:
- Virtualized environment with: Host RHEL 8.5, guest F34, mockbuild in guest F34 VM.
https://pagure.io/fedora-infrastructure/issue/10348#comment-762507 mentions Koji builders got updated on November 8. After consultation with Fedora infra folks, I was told builders updated to RHEL 8.5. This is when the failures started happening for us.

Also, for java-1.8.0-openjdk we've observed a build fail (when we happened to get a virtualized builder in koji):
https://koji.fedoraproject.org/koji/taskinfo?taskID=78504433

A more recent one is:
https://koji.fedoraproject.org/koji/taskinfo?taskID=79226399

The actual error for this was:

+ sed 's/\(separated by \)[;:]/\1:/g' /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.fc36.x86_64/openjdk/hotspot/src/share/vm/Xusage.txt
Makefile:576: Building /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.fc36.x86_64/bootbuild/jdk8.build-slowdebug/hotspot/dist/jre/lib/amd64/server/Xusage.txt (from /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.fc36.x86_64/openjdk/hotspot/src/share/vm/Xusage.txt) (/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.fc36.x86_64/openjdk/hotspot/src/share/vm/Xusage.txt newer)
mv /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.fc36.x86_64/bootbuild/jdk8.build-slowdebug/hotspot/dist/jre/lib/amd64/server/Xusage.txt.temp /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.fc36.x86_64/bootbuild/jdk8.build-slowdebug/hotspot/dist/jre/lib/amd64/server/Xusage.txt
+ mv /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.fc36.x86_64/bootbuild/jdk8.build-slowdebug/hotspot/dist/jre/lib/amd64/server/Xusage.txt.temp /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.fc36.x86_64/bootbuild/jdk8.build-slowdebug/hotspot/dist/jre/lib/amd64/server/Xusage.txt
gmake[3]: *** No rule to make target '/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.fc36.x86_64/bootbuild/jdk8.build-slowdebug/hotspot/dist/jre/lib/amd64/server/libjvm.so', needed by 'generic_export'. Stop.
gmake[3]: Leaving directory '/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.fc36.x86_64/openjdk/hotspot/make'
gmake[2]: *** [Makefile:300: export_debug] Error 2
gmake[2]: Leaving directory '/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.fc36.x86_64/openjdk/hotspot/make'
gmake[1]: *** [HotspotWrapper.gmk:45: /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.fc36.x86_64/bootbuild/jdk8.build-slowdebug/hotspot/_hotspot.timestamp] Error 2
gmake[1]: Leaving directory '/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.fc36.x86_64/openjdk/make'
make: *** [/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.fc36.x86_64/openjdk//make/Main.gmk:110: hotspot-only] Error 2

The exact same NVR passed when built on a physical machine:
https://koji.fedoraproject.org/koji/buildinfo?buildID=1853322

The thing is, failures are absolutely random. It's different from JDK version to JDK version, all of them rather cryptic and unclear why they're happening. They don't happen on physical machines.

My hope was people from the virtualization team could help us narrow it down... I have *very* little experience debugging virtualization problems.
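Since the failures differ from build to build, it may help to pull the one recurring signature ("No rule to make target") out of each build.log so the affected targets can be compared across builds. A minimal Python sketch, assuming the log text has already been downloaded into a string; the function name `find_missing_targets` is made up for this example, and the pattern is written only against the gmake messages quoted in this bug, not against every message make can emit.

```python
import re

# Matches e.g.:
#   gmake[6]: *** No rule to make target 's.os_posix.cpp', needed by 'os_posix.cpp'. Stop.
#   gmake[3]: *** No rule to make target '/some/path/file.d'. Stop.
NO_RULE_RE = re.compile(
    r"g?make(?:\[\d+\])?: \*\*\* No rule to make target "
    r"'(?P<target>[^']+)'"
    r"(?:, needed by '(?P<needed_by>[^']+)')?"
    r"\.\s+Stop\."
)


def find_missing_targets(log_text):
    """Return a list of (target, needed_by) tuples; needed_by is None if absent."""
    return [
        (match.group("target"), match.group("needed_by"))
        for match in NO_RULE_RE.finditer(log_text)
    ]
```

Running this over a handful of failing logs and comparing the results would show whether the same kinds of targets (generated sources, `.d` dependency files, freshly linked libraries) keep turning up.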
So a few items to note:

Fedora builders are all Fedora 34. Some of them are VMs, some of them are bare hardware nodes. All of them are F34.

The builder nodes that are VMs are mostly running with RHEL 8.5 as the L0 hypervisor (in the case of x86_64, ppc64le, aarch64 and armv7), with just s390x using a Fedora 35 L0.

On Nov 8th/9th we upgraded the RHEL hypervisors to 8.5 and applied other updates to the F34 builders.

This sure sounds like a RHEL 8.5 kernel or qemu bug.

Currently we are using the normal RHEL 8.5 qemu... it was suggested we should use the virt specific one instead. I am hoping to work on that soon. I don't know if that will affect/fix this issue or not.

Also, we are planning on moving builders to F35 soon. I don't know if that will affect this issue or not. This will happen in the next week or so.

Happy to let you know when I make changes so you can try or gather more info from the L0 / L1 instances.

But also see the related bug that already has most of this info: https://bugzilla.redhat.com/show_bug.cgi?id=2022075
(In reply to Kevin Fenzi from comment #10) > Happy to let you know when I make changes so you can try or gather more info > from the L0 / L1 instances. Yes, please!
Whatever the bug is, it looks like it only triggers with glibc >= 2.34.9000-15.fc36 in mock in the L1 VMs. I'll try to confirm this, but something in newer glibc seems to trigger it. So this gets even weirder. At least that seems to explain why it would not fail builds for F35 on the same virtualized builders (RHEL 8.5 hosts with F34 VMs).

The question is how to trigger a build on koji.stg.fedoraproject.org, which, so I'm told, is still on the old setup. That would clarify whether this is a glibc issue or a virt issue. It appears to be an issue on the virtualization layer somehow though, as newer glibc alone isn't enough to trigger the issue (builds pass on physical machines with newer glibc).
I'm not sure what "old setup" you mean? Staging is set up just like prod... although we are starting to work on deploying F35 builders there to test things out before rolling that out to prod.
ok. I have updated qemu on all the x86 builder hosts and rebooted them. All of buildvm-x86-* should now be running with the qemu from advanced virt. Did you see this issue only on x86_64? or was it also on other arches?
(In reply to Kevin Fenzi from comment #14)
> ok. I have updated qemu on all the x86 builder hosts and rebooted them. All
> of buildvm-x86-* should now be running with the qemu from advanced virt.
>
> Did you see this issue only on x86_64? or was it also on other arches?

x86_64 & rawhide on the Cascadelake VM setup only.

This scratch build just failed: https://koji.fedoraproject.org/koji/taskinfo?taskID=79491184

The exact same package succeeded on non-VM hardware on rawhide a while ago: https://koji.fedoraproject.org/koji/buildinfo?buildID=1853322

As Severin says, it seems to only happen when we have the perfect storm of this virtualised hardware and the newer rawhide buildroot. The same packages are building fine for F35 (which is where we've effectively had to move our work for now).
(In reply to Kevin Fenzi from comment #13) > I'm not sure what "old setup" you mean? staging is setup just like prod... > although we are starting working on deploying f35 builders there to test > things out before rolling that to prod now. Mikolaj told me that koji.stg.fedoraproject.org is still on RHEL 8.4 (doesn't have the November 8/9 update yet). Trying to build on that setup as we speak...
(In reply to Severin Gehwolf from comment #16)
> (In reply to Kevin Fenzi from comment #13)
> > I'm not sure what "old setup" you mean? staging is setup just like prod...
> > although we are starting working on deploying f35 builders there to test
> > things out before rolling that to prod now.
>
> Mikolaj told me that koji.stg.fedoraproject.org is still on RHEL 8.4
> (doesn't have the November 8/9 update yet). Trying to build on that setup as
> we speak...

Unfortunately, it won't let me log in there for some reason (my Fedora user is 'jerboaa').
<mock-chroot> sh-5.1# rpm -qa | grep glibc
glibc-common-2.34.9000-15.fc36.x86_64
glibc-gconv-extra-2.34.9000-15.fc36.x86_64
glibc-minimal-langpack-2.34.9000-15.fc36.x86_64
glibc-2.34.9000-15.fc36.x86_64
glibc-headers-x86-2.34.9000-15.fc36.noarch
glibc-devel-2.34.9000-15.fc36.x86_64

With these glibc packages in a custom mock, java-1.8.0-openjdk builds fine on this virt setup, while with 2.34.9000-16 and later it fails.

The question is what changed in 2.34.9000-16 that might trigger this bug on Cascadelake virtualized systems.
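To check quickly whether a given buildroot falls on the passing or failing side of that boundary, something like the following Python sketch could be run against the `rpm -qa | grep glibc` output. It is only a simplified numeric comparison for the main `glibc` package name format shown above, not RPM's own version comparison algorithm. The helper names `parse_glibc` and `in_failing_range` are invented for the example, and the threshold encodes only the observation from this comment (-15 passes, -16 and later fail).

```python
import re

# First release observed to fail in this bug: glibc 2.34.9000, release 16.
FIRST_FAILING = ((2, 34, 9000), 16)

# Only matches the main package, e.g. "glibc-2.34.9000-15.fc36.x86_64";
# subpackages such as "glibc-common-..." are deliberately not matched.
NVR_RE = re.compile(r"^glibc-(\d+(?:\.\d+)*)-(\d+)\.")


def parse_glibc(package):
    """Return ((major, minor, ...), release) or None if not the main glibc package."""
    match = NVR_RE.match(package)
    if match is None:
        return None
    version = tuple(int(part) for part in match.group(1).split("."))
    return version, int(match.group(2))


def in_failing_range(package):
    """True if the package is at or above the first release observed to fail."""
    parsed = parse_glibc(package)
    if parsed is None:
        raise ValueError(f"not a main glibc package name: {package!r}")
    return parsed >= FIRST_FAILING
```

For the package listed above, `in_failing_range("glibc-2.34.9000-15.fc36.x86_64")` is false, which matches the passing build.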
Upstream commits that went into glibc-2.34.9000-16:

- Auto-sync with upstream branch master, commit 79d0fc65395716c1d95931064c7bf37852203c66.
- benchtests: Add acosf function to bench-math
- benchtests: Improve bench-memcpy-random
- Disable -Waggressive-loop-optimizations warnings in tst-dynarray.c
- Fix compiler issue with mmap_internal
- Check if linker also support -mtls-dialect=gnu2
- Fix LIBC_PROG_BINUTILS for -fuse-ld=lld
- elf: Disable ifuncmain{1,5,5pic,5pie} when using LLD
- Handle NULL input to malloc_usable_size [BZ #28506]
- x86_64: Add memcmpeq.S to fix disable-multi-arch build
- login: Add back libutil as an empty library
- riscv: Fix incorrect jal with HIDDEN_JUMPTARGET
- x86_64: Add evex optimized __memcmpeq in memcmpeq-evex.S
- x86_64: Add avx2 optimized __memcmpeq in memcmpeq-avx2.S
- x86_64: Add sse2 optimized __memcmpeq in memcmp-sse2.S
- x86_64: Add support for __memcmpeq using sse2, avx2, and evex
- Benchtests: Add benchtests for __memcmpeq
- String: Add __memcmpeq as build target
- NEWS: Add item for __memcmpeq
- String: Add tests for __memcmpeq
- String: Add hidden defs for __memcmpeq() to enable internal usage
- String: Add support for __memcmpeq() ABI on all targets
- configure: Don't check LD -v --help for LIBC_LINKER_FEATURE
- elf: Make global.out depend on reldepmod4.so [BZ #28457]
- x86: Replace sse2 instructions with avx in memcmp-evex-movbe.S
- bench-math: Sort and put each bench per line
- x86_64: Add missing libmvec ABI tests
- elf: Fix e6fd79f379 build with --enable-tunables=no
- elf: Fix slow DSO sorting behavior in dynamic loader (BZ #17645)
- elf: Testing infrastructure for ld.so DSO sorting (BZ #17645)
- iconv: Use TIMEOUTFACTOR for iconv test timeout
- posix: Remove alloca usage for internal fnmatch implementation
- Add alloc_align attribute to memalign et al
- linux: Fix a possibly non-constant expression in _Static_assert
- x86-64: Add sysdeps/x86_64/fpu/Makeconfig

Since it's x86-64-specific, I'd focus there.
__memcmpeq is a new API, so it's not going to affect existing packages. That leaves “x86: Replace sse2 instructions with avx in memcmp-evex-movbe.S” or perhaps a bug in the __memcmpeq integration (it's a new function, but maybe memcmp was changed inadvertently).
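One cheap way to probe the memcmp hypothesis from inside the affected chroot is to call the C library's `memcmp` directly and compare the sign of its result with an independent byte-wise comparison. The following is a rough Python sketch using `ctypes`, not a substitute for glibc's own string tests. It assumes a Linux process where `ctypes.CDLL(None)` exposes libc symbols. It exercises whichever implementation the dynamic loader selected on that machine. It does not control buffer alignment or page boundaries, so a clean run does not rule out a bug. The function names are invented for the example.

```python
import ctypes
import random

# Handle to the symbols already loaded into this process, including libc's memcmp.
libc = ctypes.CDLL(None)
libc.memcmp.argtypes = (ctypes.c_char_p, ctypes.c_char_p, ctypes.c_size_t)
libc.memcmp.restype = ctypes.c_int


def sign(value):
    """Reduce an integer to -1, 0 or 1."""
    return (value > 0) - (value < 0)


def reference_compare(a, b):
    """Sign of a lexicographic unsigned-byte comparison of two equal-length buffers."""
    return (a > b) - (a < b)


def check_memcmp(rounds=2000, max_len=512, seed=0):
    """Compare libc memcmp against the reference; return a list of mismatches."""
    rng = random.Random(seed)
    mismatches = []
    for _ in range(rounds):
        length = rng.randrange(max_len + 1)
        a = bytes(rng.getrandbits(8) for _ in range(length))
        b = bytearray(a)
        if length and rng.random() < 0.7:
            # Flip one byte most of the time so both equal and unequal cases occur.
            b[rng.randrange(length)] = rng.randrange(256)
        b = bytes(b)
        got = sign(libc.memcmp(a, b, length))
        want = reference_compare(a, b)
        if got != want:
            mismatches.append((a, b, got, want))
    return mismatches
```

An empty list from `check_memcmp()` on a failing builder would make a plain memcmp miscompare less likely, though not impossible. Any entry in the list would be a very strong lead.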
Based on the information above I'll reassign this to qemu for now as they will certainly have a better understanding what's going on.
Can we get

  /lib64/ld-linux-x86-64.so.2 --list-diagnostics

output from the affected machine? I want to double-check which memcmp implementation is selected. Thanks.
(In reply to Florian Weimer from comment #21)
> Can we get
>
> /lib64/ld-linux-x86-64.so.2 --list-diagnostics
>
> output from the affected machine? I want to double-check which memcmp
> implementation is selected. Thanks.

Sorry, to clarify, that has to be done in the guest, but it can be in a chroot (rawhide chroot would actually be preferred).
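The `--list-diagnostics` output is a long list of `key=value` lines, where values are either hex numbers or double-quoted strings. To compare a passing and a failing machine it can be easier to load it into a dictionary first. A minimal Python sketch follows. The function names are made up, and it assumes one `key=value` pair per line with no handling of escape sequences inside quoted strings. It does not try to work out which memcmp variant glibc would select; it only surfaces the `x86.cpu_features.preferred.*` flags that feed into that decision.

```python
def parse_diagnostics(text):
    """Parse `ld.so --list-diagnostics` style output into a dict.

    Quoted values become str, numeric values become int, anything else stays raw.
    """
    result = {}
    for line in text.splitlines():
        key, separator, raw = line.strip().partition("=")
        if not separator:
            continue  # not a key=value line
        if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
            result[key] = raw[1:-1]
        else:
            try:
                result[key] = int(raw, 0)  # base 0 accepts the 0x... form
            except ValueError:
                result[key] = raw
    return result


def preferred_flags(diagnostics):
    """Return the x86 'preferred' tuning flags as a name -> bool mapping."""
    prefix = "x86.cpu_features.preferred."
    return {
        key[len(prefix):]: bool(value)
        for key, value in diagnostics.items()
        if key.startswith(prefix)
    }
```

Diffing the two dictionaries from a good and a bad builder (for example with a set difference on `dict.items()`) would show at a glance which CPU features or preferences differ.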
We're hitting very similar sounding problems in F36 rawhide with EDK2 builds, where we get bizarre/inexplicable errors from make when trying to resolve targets.

make: *** No rule to make target '/builddir/build/BUILD/edk2-e1999b264f1f/Build/Ovmf3264/DEBUG_GCC5/X64/CryptoPkg/Library/OpensslLib/OpensslLib/OUTPUT/openssl/crypto/evp/e_des.obj', needed by '/builddir/build/BUILD/edk2-e1999b264f1f/Build/Ovmf3264/DEBUG_GCC5/X64/CryptoPkg/Library/OpensslLib/OpensslLib/OUTPUT/OpensslLib.lib'. Stop.

despite the rules for that target clearly existing when dumping the makefile contents and all pre-requisites existing too.

Only reproduces for me inside Koji, never any local machine or VM I have access to, and only in the 6 CPU koji VMs.

(In reply to Florian Weimer from comment #21)
> Can we get
>
> /lib64/ld-linux-x86-64.so.2 --list-diagnostics
>
> output from the affected machine? I want to double-check which memcmp
> implementation is selected. Thanks.

On the assumption that my EDK2 issue is likely the same bug as seen with openjdk, I added this command to the edk2.spec %build. You can see the output in this build log from a failing build:

https://kojipkgs.fedoraproject.org//work/tasks/2306/79512306/build.log
(In reply to Florian Weimer from comment #21) > Can we get > > /lib64/ld-linux-x86-64.so.2 --list-diagnostics > > output from the affected machine? I want to double-check which memcmp > implementation is selected. Thanks. Here you go (that's from a rawhide chroot on the guest F34 VM): dl_discover_osversion=0x50b0c dl_dst_lib="lib64" dl_hwcap=0x6 dl_hwcap_important=0x6 dl_hwcap2=0x2 dl_hwcaps_subdirs="x86-64-v4:x86-64-v3:x86-64-v2" dl_hwcaps_subdirs_active=0x7 dl_osversion=0x0 dl_pagesize=0x1000 dl_platform="haswell" dl_profile_output="/var/tmp" dl_string_platform=0x32 dso.ld="ld-linux-x86-64.so.2" dso.libc="libc.so.6" env_filtered[0x0]="SHELL" env_filtered[0x1]="HISTCONTROL" env_filtered[0x2]="HISTSIZE" env_filtered[0x3]="HOSTNAME" env_filtered[0x4]="container_host_version_id" env_filtered[0x5]="PWD" env_filtered[0x6]="LOGNAME" env_filtered[0x7]="container" env_filtered[0x8]="HOME" env[0x9]="LANG=C.UTF-8" env_filtered[0xa]="LS_COLORS" env_filtered[0xb]="PROMPT_COMMAND" env_filtered[0xc]="TERM" env_filtered[0xd]="USER" env_filtered[0xe]="NOTIFY_SOCKET" env_filtered[0xf]="SHLVL" env_filtered[0x10]="container_host_id" env_filtered[0x11]="PS1" env_filtered[0x12]="DEBUGINFOD_URLS" env_filtered[0x13]="which_declare" env_filtered[0x14]="container_host_variant_id" env[0x15]="PATH=/usr/local/sbin:/usr/bin:/bin:/usr/sbin:/sbin" env_filtered[0x16]="MAIL" env_filtered[0x17]="container_uuid" env_filtered[0x18]="BASH_FUNC_which%%" env_filtered[0x19]="_" path.prefix="/usr" path.rtld="/lib64/ld-linux-x86-64.so.2" path.sysconfdir="/etc" path.system_dirs[0x0]="/lib64/" path.system_dirs[0x1]="/usr/lib64/" version.release="development" version.version="2.34.9000" auxv[0x0].a_type=0x21 auxv[0x0].a_val=0x7ffe9f38b000 auxv[0x1].a_type=0x10 auxv[0x1].a_val=0xf8bfbff auxv[0x2].a_type=0x6 auxv[0x2].a_val=0x1000 auxv[0x3].a_type=0x11 auxv[0x3].a_val=0x64 auxv[0x4].a_type=0x3 auxv[0x4].a_val=0x7fc98a7a6040 auxv[0x5].a_type=0x4 auxv[0x5].a_val=0x38 auxv[0x6].a_type=0x5 auxv[0x6].a_val=0xb 
auxv[0x7].a_type=0x7 auxv[0x7].a_val=0x0 auxv[0x8].a_type=0x8 auxv[0x8].a_val=0x0 auxv[0x9].a_type=0x9 auxv[0x9].a_val=0x7fc98a7a7090 auxv[0xa].a_type=0xb auxv[0xa].a_val=0x0 auxv[0xb].a_type=0xc auxv[0xb].a_val=0x0 auxv[0xc].a_type=0xd auxv[0xc].a_val=0x0 auxv[0xd].a_type=0xe auxv[0xd].a_val=0x0 auxv[0xe].a_type=0x17 auxv[0xe].a_val=0x0 auxv[0xf].a_type=0x19 auxv[0xf].a_val=0x7ffe9f265c09 auxv[0x10].a_type=0x1a auxv[0x10].a_val=0x2 auxv[0x11].a_type=0x1f auxv[0x11].a_val="/lib64/ld-linux-x86-64.so.2" auxv[0x12].a_type=0xf auxv[0x12].a_val="x86_64" uname.sysname="Linux" uname.nodename="6fac65290c9e478583f88d92dee1be0c" uname.release="5.11.12-300.fc34.x86_64" uname.version="#1 SMP Wed Apr 7 16:31:13 UTC 2021" uname.machine="x86_64" uname.domainname="(none)" x86.cpu_features.basic.kind=0x1 x86.cpu_features.basic.max_cpuid=0xd x86.cpu_features.basic.family=0x6 x86.cpu_features.basic.model=0x55 x86.cpu_features.basic.stepping=0x4 x86.cpu_features.features[0x0].cpuid[0x0]=0x50654 x86.cpu_features.features[0x0].cpuid[0x1]=0x800 x86.cpu_features.features[0x0].cpuid[0x2]=0xfffab223 x86.cpu_features.features[0x0].cpuid[0x3]=0xf8bfbff x86.cpu_features.features[0x0].active[0x0]=0x0 x86.cpu_features.features[0x0].active[0x1]=0x0 x86.cpu_features.features[0x0].active[0x2]=0x7ed83203 x86.cpu_features.features[0x0].active[0x3]=0x7888110 x86.cpu_features.features[0x1].cpuid[0x0]=0x0 x86.cpu_features.features[0x1].cpuid[0x1]=0xd19f0fbb x86.cpu_features.features[0x1].cpuid[0x2]=0x1c x86.cpu_features.features[0x1].cpuid[0x3]=0xac000400 x86.cpu_features.features[0x1].active[0x0]=0x0 x86.cpu_features.features[0x1].active[0x1]=0xd18f0b38 x86.cpu_features.features[0x1].active[0x2]=0x18 x86.cpu_features.features[0x1].active[0x3]=0x0 x86.cpu_features.features[0x2].cpuid[0x0]=0x50654 x86.cpu_features.features[0x2].cpuid[0x1]=0x0 x86.cpu_features.features[0x2].cpuid[0x2]=0x121 x86.cpu_features.features[0x2].cpuid[0x3]=0x2c100800 x86.cpu_features.features[0x2].active[0x0]=0x0 
x86.cpu_features.features[0x2].active[0x1]=0x0 x86.cpu_features.features[0x2].active[0x2]=0x121 x86.cpu_features.features[0x2].active[0x3]=0x8000000 x86.cpu_features.features[0x3].cpuid[0x0]=0xf x86.cpu_features.features[0x3].cpuid[0x1]=0x988 x86.cpu_features.features[0x3].cpuid[0x2]=0x0 x86.cpu_features.features[0x3].cpuid[0x3]=0x0 x86.cpu_features.features[0x3].active[0x0]=0x7 x86.cpu_features.features[0x3].active[0x1]=0x0 x86.cpu_features.features[0x3].active[0x2]=0x0 x86.cpu_features.features[0x3].active[0x3]=0x0 x86.cpu_features.features[0x4].cpuid[0x0]=0x0 x86.cpu_features.features[0x4].cpuid[0x1]=0x0 x86.cpu_features.features[0x4].cpuid[0x2]=0x0 x86.cpu_features.features[0x4].cpuid[0x3]=0x0 x86.cpu_features.features[0x4].active[0x0]=0x0 x86.cpu_features.features[0x4].active[0x1]=0x0 x86.cpu_features.features[0x4].active[0x2]=0x0 x86.cpu_features.features[0x4].active[0x3]=0x0 x86.cpu_features.features[0x5].cpuid[0x0]=0x302e x86.cpu_features.features[0x5].cpuid[0x1]=0x100d000 x86.cpu_features.features[0x5].cpuid[0x2]=0x0 x86.cpu_features.features[0x5].cpuid[0x3]=0x0 x86.cpu_features.features[0x5].active[0x0]=0x0 x86.cpu_features.features[0x5].active[0x1]=0x0 x86.cpu_features.features[0x5].active[0x2]=0x0 x86.cpu_features.features[0x5].active[0x3]=0x0 x86.cpu_features.features[0x6].cpuid[0x0]=0x0 x86.cpu_features.features[0x6].cpuid[0x1]=0x0 x86.cpu_features.features[0x6].cpuid[0x2]=0x0 x86.cpu_features.features[0x6].cpuid[0x3]=0x0 x86.cpu_features.features[0x6].active[0x0]=0x0 x86.cpu_features.features[0x6].active[0x1]=0x0 x86.cpu_features.features[0x6].active[0x2]=0x0 x86.cpu_features.features[0x6].active[0x3]=0x0 x86.cpu_features.features[0x7].cpuid[0x0]=0x0 x86.cpu_features.features[0x7].cpuid[0x1]=0x0 x86.cpu_features.features[0x7].cpuid[0x2]=0x0 x86.cpu_features.features[0x7].cpuid[0x3]=0x0 x86.cpu_features.features[0x7].active[0x0]=0x0 x86.cpu_features.features[0x7].active[0x1]=0x0 x86.cpu_features.features[0x7].active[0x2]=0x0 
x86.cpu_features.features[0x7].active[0x3]=0x0 x86.cpu_features.features[0x8].cpuid[0x0]=0x0 x86.cpu_features.features[0x8].cpuid[0x1]=0x0 x86.cpu_features.features[0x8].cpuid[0x2]=0x0 x86.cpu_features.features[0x8].cpuid[0x3]=0x0 x86.cpu_features.features[0x8].active[0x0]=0x0 x86.cpu_features.features[0x8].active[0x1]=0x0 x86.cpu_features.features[0x8].active[0x2]=0x0 x86.cpu_features.features[0x8].active[0x3]=0x0 x86.cpu_features.preferred.Fast_Rep_String=0x1 x86.cpu_features.preferred.Fast_Copy_Backward=0x0 x86.cpu_features.preferred.Slow_BSF=0x0 x86.cpu_features.preferred.Fast_Unaligned_Load=0x1 x86.cpu_features.preferred.Prefer_PMINUB_for_stringop=0x1 x86.cpu_features.preferred.Fast_Unaligned_Copy=0x1 x86.cpu_features.preferred.I586=0x1 x86.cpu_features.preferred.I686=0x1 x86.cpu_features.preferred.Slow_SSE4_2=0x0 x86.cpu_features.preferred.AVX_Fast_Unaligned_Load=0x1 x86.cpu_features.preferred.Prefer_MAP_32BIT_EXEC=0x0 x86.cpu_features.preferred.Prefer_No_VZEROUPPER=0x1 x86.cpu_features.preferred.Prefer_ERMS=0x0 x86.cpu_features.preferred.Prefer_No_AVX512=0x1 x86.cpu_features.preferred.MathVec_Prefer_No_AVX512=0x0 x86.cpu_features.preferred.Prefer_FSRM=0x0 x86.cpu_features.preferred.Avoid_Short_Distance_REP_MOVSB=0x0 x86.cpu_features.isa_1=0xf x86.cpu_features.xsave_state_size=0x9c0 x86.cpu_features.xsave_state_full_size=0xb00 x86.cpu_features.data_cache_size=0x8000 x86.cpu_features.shared_cache_size=0x1000000 x86.cpu_features.non_temporal_threshold=0xc00000 x86.cpu_features.rep_movsb_threshold=0x2000 x86.cpu_features.rep_movsb_stop_threshold=0xc00000 x86.cpu_features.rep_stosb_threshold=0x800 x86.cpu_features.level1_icache_size=0x8000 x86.cpu_features.level1_icache_linesize=0x40 x86.cpu_features.level1_dcache_size=0x8000 x86.cpu_features.level1_dcache_assoc=0x8 x86.cpu_features.level1_dcache_linesize=0x40 x86.cpu_features.level2_cache_size=0x200000 x86.cpu_features.level2_cache_assoc=0x8 x86.cpu_features.level2_cache_linesize=0x40 
x86.cpu_features.level3_cache_size=0x1000000 x86.cpu_features.level3_cache_assoc=0x10 x86.cpu_features.level3_cache_linesize=0x40 x86.cpu_features.level4_cache_size=0x0 That's with glibc-2.34.9000-21.fc36.x86_64.
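The diagnostics do not name the selected string routines directly, but the feature words can be decoded by hand. The following is only a rough sketch: it assumes that x86.cpu_features.features[0x1].cpuid[0x1] holds CPUID leaf 7 EBX (my reading of the dump layout, not something stated in this bug), and cpuid_bit is a made-up helper name. The bit positions are the architectural ones for that register.

```shell
#!/bin/sh
# Hypothetical helper: print bit <bit-number> (0 or 1) of a hex register value.
# Usage: cpuid_bit <hex-value> <bit-number>
cpuid_bit() {
    echo $(( ($1 >> $2) & 1 ))
}

# Value copied from the dump above. Assumption: this word is CPUID leaf 7 EBX
# as seen inside the guest.
leaf7_ebx=0xd19f0fbb

echo "BMI2     (bit 8):  $(cpuid_bit "$leaf7_ebx" 8)"
echo "AVX512F  (bit 16): $(cpuid_bit "$leaf7_ebx" 16)"
echo "AVX512BW (bit 30): $(cpuid_bit "$leaf7_ebx" 30)"
echo "AVX512VL (bit 31): $(cpuid_bit "$leaf7_ebx" 31)"
```

The same helper can be pointed at the matching active[...] word to see what glibc considers usable, as opposed to what the CPU merely advertises.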
Since my info is from a custom setup where we "seem" to reproduce, here is the --list-diagnostics output from Daniel's rawhide build. Pasting here as the build log might disappear: + /lib64/ld-linux-x86-64.so.2 --list-diagnostics dl_discover_osversion=0x50f04 dl_dst_lib="lib64" dl_hwcap=0x6 dl_hwcap_important=0x6 dl_hwcap2=0x2 dl_hwcaps_subdirs="x86-64-v4:x86-64-v3:x86-64-v2" dl_hwcaps_subdirs_active=0x7 dl_osversion=0x0 dl_pagesize=0x1000 dl_platform="haswell" dl_profile_output="/var/tmp" dl_string_platform=0x32 dso.ld="ld-linux-x86-64.so.2" dso.libc="libc.so.6" env_filtered[0x0]="SHELL" env_filtered[0x1]="RPM_SOURCE_DIR" env_filtered[0x2]="HISTCONTROL" env_filtered[0x3]="PKG_CONFIG_PATH" env_filtered[0x4]="HOSTNAME" env_filtered[0x5]="HISTSIZE" env_filtered[0x6]="PWD" env_filtered[0x7]="SOURCE_DATE_EPOCH" env_filtered[0x8]="LOGNAME" env_filtered[0x9]="RPM_ARCH" env_filtered[0xa]="HOME" env[0xb]="LANG=C" env_filtered[0xc]="RPM_LD_FLAGS" env_filtered[0xd]="PROMPT_COMMAND" env_filtered[0xe]="RPM_PACKAGE_RELEASE" env_filtered[0xf]="RPM_OS" env_filtered[0x10]="TERM" env_filtered[0x11]="LESSOPEN" env_filtered[0x12]="USER" env_filtered[0x13]="SHLVL" env_filtered[0x14]="RPM_BUILD_DIR" env_filtered[0x15]="RPM_OPT_FLAGS" env_filtered[0x16]="RPM_DOC_DIR" env_filtered[0x17]="RPM_PACKAGE_VERSION" env_filtered[0x18]="DEBUGINFOD_URLS" env_filtered[0x19]="which_declare" env_filtered[0x1a]="CONFIG_SITE" env[0x1b]="PATH=/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/sbin" env_filtered[0x1c]="MAIL" env_filtered[0x1d]="RPM_BUILD_NCPUS" env_filtered[0x1e]="RPM_PACKAGE_NAME" env_filtered[0x1f]="RPM_BUILD_ROOT" env_filtered[0x20]="OLDPWD" env_filtered[0x21]="BASH_FUNC_which%%" env_filtered[0x22]="_" path.prefix="/usr" path.rtld="/lib64/ld-linux-x86-64.so.2" path.sysconfdir="/etc" path.system_dirs[0x0]="/lib64/" path.system_dirs[0x1]="/usr/lib64/" version.release="development" version.version="2.34.9000" auxv[0x0].a_type=0x21 auxv[0x0].a_val=0x7ffefe905000 auxv[0x1].a_type=0x33 
auxv[0x1].a_val=0xe30 auxv[0x2].a_type=0x10 auxv[0x2].a_val=0xf8bfbff auxv[0x3].a_type=0x6 auxv[0x3].a_val=0x1000 auxv[0x4].a_type=0x11 auxv[0x4].a_val=0x64 auxv[0x5].a_type=0x3 auxv[0x5].a_val=0x7f5fcc3f5040 auxv[0x6].a_type=0x4 auxv[0x6].a_val=0x38 auxv[0x7].a_type=0x5 auxv[0x7].a_val=0xb auxv[0x8].a_type=0x7 auxv[0x8].a_val=0x0 auxv[0x9].a_type=0x8 auxv[0x9].a_val=0x0 auxv[0xa].a_type=0x9 auxv[0xa].a_val=0x7f5fcc3f6090 auxv[0xb].a_type=0xb auxv[0xb].a_val=0x3e8 auxv[0xc].a_type=0xc auxv[0xc].a_val=0x3e8 auxv[0xd].a_type=0xd auxv[0xd].a_val=0x1a9 auxv[0xe].a_type=0xe auxv[0xe].a_val=0x1a9 auxv[0xf].a_type=0x17 auxv[0xf].a_val=0x0 auxv[0x10].a_type=0x19 auxv[0x10].a_val=0x7ffefe8b5309 auxv[0x11].a_type=0x1a auxv[0x11].a_val=0x2 auxv[0x12].a_type=0x1f auxv[0x12].a_val="/lib64/ld-linux-x86-64.so.2" auxv[0x13].a_type=0xf auxv[0x13].a_val="x86_64" uname.sysname="Linux" uname.nodename="buildvm-x86-31.iad2.fedoraproject.org" uname.release="5.15.4-101.fc34.x86_64" uname.version="#1 SMP Tue Nov 23 18:58:50 UTC 2021" uname.machine="x86_64" uname.domainname="fedoraproject.org" x86.cpu_features.basic.kind=0x1 x86.cpu_features.basic.max_cpuid=0xd x86.cpu_features.basic.family=0x6 x86.cpu_features.basic.model=0x55 x86.cpu_features.basic.stepping=0x6 x86.cpu_features.features[0x0].cpuid[0x0]=0x50656 x86.cpu_features.features[0x0].cpuid[0x1]=0x5000800 x86.cpu_features.features[0x0].cpuid[0x2]=0xfffa3223 x86.cpu_features.features[0x0].cpuid[0x3]=0xf8bfbff x86.cpu_features.features[0x0].active[0x0]=0x0 x86.cpu_features.features[0x0].active[0x1]=0x0 x86.cpu_features.features[0x0].active[0x2]=0x7ed83203 x86.cpu_features.features[0x0].active[0x3]=0x7888110 x86.cpu_features.features[0x1].cpuid[0x0]=0x0 x86.cpu_features.features[0x1].cpuid[0x1]=0xd19f07ab x86.cpu_features.features[0x1].cpuid[0x2]=0x81c x86.cpu_features.features[0x1].cpuid[0x3]=0xac000400 x86.cpu_features.features[0x1].active[0x0]=0x0 x86.cpu_features.features[0x1].active[0x1]=0xd18f0328 
x86.cpu_features.features[0x1].active[0x2]=0x818 x86.cpu_features.features[0x1].active[0x3]=0x0 x86.cpu_features.features[0x2].cpuid[0x0]=0x50656 x86.cpu_features.features[0x2].cpuid[0x1]=0x0 x86.cpu_features.features[0x2].cpuid[0x2]=0x121 x86.cpu_features.features[0x2].cpuid[0x3]=0x2c100800 x86.cpu_features.features[0x2].active[0x0]=0x0 x86.cpu_features.features[0x2].active[0x1]=0x0 x86.cpu_features.features[0x2].active[0x2]=0x121 x86.cpu_features.features[0x2].active[0x3]=0x8000000 x86.cpu_features.features[0x3].cpuid[0x0]=0xf x86.cpu_features.features[0x3].cpuid[0x1]=0x988 x86.cpu_features.features[0x3].cpuid[0x2]=0x0 x86.cpu_features.features[0x3].cpuid[0x3]=0x0 x86.cpu_features.features[0x3].active[0x0]=0x7 x86.cpu_features.features[0x3].active[0x1]=0x0 x86.cpu_features.features[0x3].active[0x2]=0x0 x86.cpu_features.features[0x3].active[0x3]=0x0 x86.cpu_features.features[0x4].cpuid[0x0]=0x0 x86.cpu_features.features[0x4].cpuid[0x1]=0x0 x86.cpu_features.features[0x4].cpuid[0x2]=0x0 x86.cpu_features.features[0x4].cpuid[0x3]=0x0 x86.cpu_features.features[0x4].active[0x0]=0x0 x86.cpu_features.features[0x4].active[0x1]=0x0 x86.cpu_features.features[0x4].active[0x2]=0x0 x86.cpu_features.features[0x4].active[0x3]=0x0 x86.cpu_features.features[0x5].cpuid[0x0]=0x302e x86.cpu_features.features[0x5].cpuid[0x1]=0x1009000 x86.cpu_features.features[0x5].cpuid[0x2]=0x0 x86.cpu_features.features[0x5].cpuid[0x3]=0x0 x86.cpu_features.features[0x5].active[0x0]=0x0 x86.cpu_features.features[0x5].active[0x1]=0x0 x86.cpu_features.features[0x5].active[0x2]=0x0 x86.cpu_features.features[0x5].active[0x3]=0x0 x86.cpu_features.features[0x6].cpuid[0x0]=0x0 x86.cpu_features.features[0x6].cpuid[0x1]=0x0 x86.cpu_features.features[0x6].cpuid[0x2]=0x0 x86.cpu_features.features[0x6].cpuid[0x3]=0x0 x86.cpu_features.features[0x6].active[0x0]=0x0 x86.cpu_features.features[0x6].active[0x1]=0x0 x86.cpu_features.features[0x6].active[0x2]=0x0 x86.cpu_features.features[0x6].active[0x3]=0x0 
x86.cpu_features.features[0x7].cpuid[0x0]=0x0 x86.cpu_features.features[0x7].cpuid[0x1]=0x0 x86.cpu_features.features[0x7].cpuid[0x2]=0x0 x86.cpu_features.features[0x7].cpuid[0x3]=0x0 x86.cpu_features.features[0x7].active[0x0]=0x0 x86.cpu_features.features[0x7].active[0x1]=0x0 x86.cpu_features.features[0x7].active[0x2]=0x0 x86.cpu_features.features[0x7].active[0x3]=0x0 x86.cpu_features.features[0x8].cpuid[0x0]=0x0 x86.cpu_features.features[0x8].cpuid[0x1]=0x0 x86.cpu_features.features[0x8].cpuid[0x2]=0x0 x86.cpu_features.features[0x8].cpuid[0x3]=0x0 x86.cpu_features.features[0x8].active[0x0]=0x0 x86.cpu_features.features[0x8].active[0x1]=0x0 x86.cpu_features.features[0x8].active[0x2]=0x0 x86.cpu_features.features[0x8].active[0x3]=0x0 x86.cpu_features.preferred.Fast_Rep_String=0x1 x86.cpu_features.preferred.Fast_Copy_Backward=0x0 x86.cpu_features.preferred.Slow_BSF=0x0 x86.cpu_features.preferred.Fast_Unaligned_Load=0x1 x86.cpu_features.preferred.Prefer_PMINUB_for_stringop=0x1 x86.cpu_features.preferred.Fast_Unaligned_Copy=0x1 x86.cpu_features.preferred.I586=0x1 x86.cpu_features.preferred.I686=0x1 x86.cpu_features.preferred.Slow_SSE4_2=0x0 x86.cpu_features.preferred.AVX_Fast_Unaligned_Load=0x1 x86.cpu_features.preferred.Prefer_MAP_32BIT_EXEC=0x0 x86.cpu_features.preferred.Prefer_No_VZEROUPPER=0x0 x86.cpu_features.preferred.Prefer_ERMS=0x0 x86.cpu_features.preferred.Prefer_No_AVX512=0x1 x86.cpu_features.preferred.MathVec_Prefer_No_AVX512=0x0 x86.cpu_features.preferred.Prefer_FSRM=0x0 x86.cpu_features.preferred.Avoid_Short_Distance_REP_MOVSB=0x0 x86.cpu_features.isa_1=0xf x86.cpu_features.xsave_state_size=0x9c0 x86.cpu_features.xsave_state_full_size=0xb00 x86.cpu_features.data_cache_size=0x8000 x86.cpu_features.shared_cache_size=0x1000000 x86.cpu_features.non_temporal_threshold=0xc00000 x86.cpu_features.rep_movsb_threshold=0x2000 x86.cpu_features.rep_movsb_stop_threshold=0xc00000 x86.cpu_features.rep_stosb_threshold=0x800 x86.cpu_features.level1_icache_size=0x8000 
x86.cpu_features.level1_icache_linesize=0x40 x86.cpu_features.level1_dcache_size=0x8000 x86.cpu_features.level1_dcache_assoc=0x8 x86.cpu_features.level1_dcache_linesize=0x40 x86.cpu_features.level2_cache_size=0x200000 x86.cpu_features.level2_cache_assoc=0x8 x86.cpu_features.level2_cache_linesize=0x40 x86.cpu_features.level3_cache_size=0x1000000 x86.cpu_features.level3_cache_assoc=0x10 x86.cpu_features.level3_cache_linesize=0x40 x86.cpu_features.level4_cache_size=0x0
In case it's useful, here is a diff of the '/lib64/ld-linux-x86-64.so.2 --list-diagnostics' output between the working glibc (2.34.9000-15.fc36) and a glibc where we see those issues (2.34.9000-21.fc36; release 16 and upwards, actually):

$ diff -u list_diags_glibc*
--- list_diags_glibc-2.34.9000-15.fc36.x86_64  2021-12-02 15:49:11.383652491 +0100
+++ list_diags_glibc_2.34.9000-21.fc36.x86_64.txt  2021-12-02 15:19:44.629731791 +0100
@@ -46,7 +46,7 @@
 version.release="development"
 version.version="2.34.9000"
 auxv[0x0].a_type=0x21
-auxv[0x0].a_val=0x7ffca09a6000
+auxv[0x0].a_val=0x7ffe9f38b000
 auxv[0x1].a_type=0x10
 auxv[0x1].a_val=0xf8bfbff
 auxv[0x2].a_type=0x6
@@ -54,7 +54,7 @@
 auxv[0x3].a_type=0x11
 auxv[0x3].a_val=0x64
 auxv[0x4].a_type=0x3
-auxv[0x4].a_val=0x7f0abe726040
+auxv[0x4].a_val=0x7fc98a7a6040
 auxv[0x5].a_type=0x4
 auxv[0x5].a_val=0x38
 auxv[0x6].a_type=0x5
@@ -64,7 +64,7 @@
 auxv[0x8].a_type=0x8
 auxv[0x8].a_val=0x0
 auxv[0x9].a_type=0x9
-auxv[0x9].a_val=0x7f0abe727090
+auxv[0x9].a_val=0x7fc98a7a7090
 auxv[0xa].a_type=0xb
 auxv[0xa].a_val=0x0
 auxv[0xb].a_type=0xc
@@ -76,7 +76,7 @@
 auxv[0xe].a_type=0x17
 auxv[0xe].a_val=0x0
 auxv[0xf].a_type=0x19
-auxv[0xf].a_val=0x7ffca0812059
+auxv[0xf].a_val=0x7ffe9f265c09
 auxv[0x10].a_type=0x1a
 auxv[0x10].a_val=0x2
 auxv[0x11].a_type=0x1f
@@ -84,7 +84,7 @@
 auxv[0x12].a_type=0xf
 auxv[0x12].a_val="x86_64"
 uname.sysname="Linux"
-uname.nodename="53f4eb59030a49818a91e2e2add27c93"
+uname.nodename="6fac65290c9e478583f88d92dee1be0c"
 uname.release="5.11.12-300.fc34.x86_64"
 uname.version="#1 SMP Wed Apr 7 16:31:13 UTC 2021"
 uname.machine="x86_64"
@@ -96,7 +96,7 @@
 x86.cpu_features.basic.stepping=0x4
 x86.cpu_features.features[0x0].cpuid[0x0]=0x50654
 x86.cpu_features.features[0x0].cpuid[0x1]=0x800
-x86.cpu_features.features[0x0].cpuid[0x2]=0xfffa3203
+x86.cpu_features.features[0x0].cpuid[0x2]=0xfffab223
 x86.cpu_features.features[0x0].cpuid[0x3]=0xf8bfbff
 x86.cpu_features.features[0x0].active[0x0]=0x0
 x86.cpu_features.features[0x0].active[0x1]=0x0
@@ -182,7 +182,6 @@
 x86.cpu_features.preferred.Prefer_No_AVX512=0x1
 x86.cpu_features.preferred.MathVec_Prefer_No_AVX512=0x0
 x86.cpu_features.preferred.Prefer_FSRM=0x0
-x86.cpu_features.preferred.Prefer_AVX2_STRCMP=0x1
 x86.cpu_features.preferred.Avoid_Short_Distance_REP_MOVSB=0x0
 x86.cpu_features.isa_1=0xf
 x86.cpu_features.xsave_state_size=0x9c0
@@ -190,7 +189,7 @@
 x86.cpu_features.data_cache_size=0x8000
 x86.cpu_features.shared_cache_size=0x1000000
 x86.cpu_features.non_temporal_threshold=0xc00000
-x86.cpu_features.rep_movsb_threshold=0x1000
+x86.cpu_features.rep_movsb_threshold=0x2000
 x86.cpu_features.rep_movsb_stop_threshold=0xc00000
 x86.cpu_features.rep_stosb_threshold=0x800
 x86.cpu_features.level1_icache_size=0x8000
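Most hunks in a diff like this are noise: the auxv values are addresses that change on every run, and the node name changes per chroot. A small filter, sketched here under the assumption that only those two kinds of lines are uninteresting, makes the real differences easier to spot. filter_diag and the file names in the comments are hypothetical. Note that it also drops constant auxv values such as the hwcap word.

```shell
#!/bin/sh
# Hypothetical helper: drop lines that differ between any two runs regardless
# of the glibc version (auxv values, which are mostly randomized addresses,
# and the node name). Reads stdin, writes stdout.
filter_diag() {
    grep -v -E '^(auxv\[0x[0-9a-f]+\]\.a_val=|uname\.nodename=)'
}

# Small demonstration on lines taken from the output in this bug.
printf '%s\n' \
    'auxv[0x0].a_type=0x21' \
    'auxv[0x0].a_val=0x7ffe9f38b000' \
    'uname.nodename="6fac65290c9e478583f88d92dee1be0c"' \
    'x86.cpu_features.rep_movsb_threshold=0x2000' | filter_diag

# Intended use (file names are placeholders):
#   /lib64/ld-linux-x86-64.so.2 --list-diagnostics | filter_diag > diag-new.txt
#   diff -u diag-old.txt diag-new.txt
```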
(In reply to Severin Gehwolf from comment #18)
> With these glibc packages in a custom mock java-1.8.0-openjdk builds fine on
> this virt-setup, while with 2.34.9000-16 and better it fails. Question is
> what changed in 2.34.9000-16 that might trigger this bug on cascadelake
> virtualized systems.

In testing EDK2 builds in koji against an F36 rawhide side-tag, I'm hitting a different failure point - 4 builds have now passed on 2.34.9000-16, but I got a failure first time out on 2.34.9000-17

Upstream commit: d3bf2f5927d51258a51ac7fde04f4805f8ee294a

- elf: Do not run DSO sorting if tunables is not enabled
- riscv: Build with -mno-relax if linker does not support R_RISCV_ALIGN
- x86-64: Replace movzx with movzbl
- regex: Unnest nested functions in regcomp.c
- Use Linux 5.15 in build-many-glibcs.py
- elf: Assume disjointed .rela.dyn and .rela.plt for loader
- i386: Explain why __HAVE_64B_ATOMICS has to be 0
- benchtests: Add hypotf
- benchtests: Make hypot input random
- arm: Use have-mtls-dialect-gnu2 to check for ARM TLS descriptors support
- arm: Use internal symbol for _dl_argv on _dl_start_user
- x86-64: Remove Prefer_AVX2_STRCMP
- x86-64: Improve EVEX strcmp with masked load

Based on the Koji VM hw_info.log, we have 'avx2' feature flag set, so the -16 build should have been using the AVX2 strcmp, while with -17 we should now be using the (improved) EVEX strcmp.
(In reply to Daniel Berrangé from comment #27)
> (In reply to Severin Gehwolf from comment #18)
> > With these glibc packages in a custom mock java-1.8.0-openjdk builds fine on
> > this virt-setup, while with 2.34.9000-16 and better it fails. Question is
> > what changed in 2.34.9000-16 that might trigger this bug on cascadelake
> > virtualized systems.
>
> In testing EDK2 builds in koji against an F36 rawhide side-tag, I'm hitting
> a different failure point - 4 builds have now passed on 2.34.9000-16, but I
> got a failure first time out on 2.34.9000-17
>
> Upstream commit: d3bf2f5927d51258a51ac7fde04f4805f8ee294a
>
> - elf: Do not run DSO sorting if tunables is not enabled
> - riscv: Build with -mno-relax if linker does not support R_RISCV_ALIGN
> - x86-64: Replace movzx with movzbl
> - regex: Unnest nested functions in regcomp.c
> - Use Linux 5.15 in build-many-glibcs.py
> - elf: Assume disjointed .rela.dyn and .rela.plt for loader
> - i386: Explain why __HAVE_64B_ATOMICS has to be 0
> - benchtests: Add hypotf
> - benchtests: Make hypot input random
> - arm: Use have-mtls-dialect-gnu2 to check for ARM TLS descriptors
> support
> - arm: Use internal symbol for _dl_argv on _dl_start_user
> - x86-64: Remove Prefer_AVX2_STRCMP
> - x86-64: Improve EVEX strcmp with masked load
>
> Based on the Koji VM hw_info.log, we have 'avx2' feature flag set, so the
> -16 build should have been using the AVX2 strcmp, while with -17 we should
> now be using the (improved) EVEX strcmp.

That's probably true for us as well. Looking back at Koschei [1] for java-1.8.0-openjdk, the first failure was actually seen with 2.34.9000-17.fc36 on November 6. So the upgrade of the build hosts on Nov 8/9 might actually be a red herring. We don't yet know if the issue is present with the RHEL 8.4 virt stack and a rawhide build with glibc 2.34.9000-17.fc36.

[1] https://koschei.fedoraproject.org/package/java-1.8.0-openjdk?last_seen_ts=1636467944&collection=f36
Does it fail only on RHEL 8.5 hosts? Will it ever fail on Fedora 35 hosts?
Personally I've only been able to reproduce in the Koji builders.
(In reply to H.J. Lu from comment #29) > Does it fail only on RHEL 8.5 hosts? Will it ever fail on Fedora 35 hosts? I don't know. What I do know is that it doesn't fail for me on F34 (host) and F34 (guest) with rawhide mock in the guest. But that's not a cascadelake cpu on the host. So it's an unknown.
(In reply to H.J. Lu from comment #29)
> Does it fail only on RHEL 8.5 hosts? Will it ever fail on Fedora 35 hosts?

We know this is the type of setup that reproduces the issue:

- Host hardware is Intel Xeon Gold CPU (Cascadelake)
- Host OS is RHEL 8.5 (we haven't checked other versions, yet)
- Guest VM is F34 (we haven't checked others yet; F35 would likely work too).
- Fedora rawhide mock chroot with glibc 2.34.9000-17.fc36 or newer

There might be others that reproduce too, but that's one we know for sure. It's also the setup Koji builders have.
I have Fedora 35 running on Cascadelake. I will try mock build for Fedora 35.
I've confirmed it will NOT reproduce on my Skylake-Server VMs.

Since the EVEX impls of strcmp get selected off

  if (CPU_FEATURE_USABLE_P (cpu_features, AVX512VL)
      && CPU_FEATURE_USABLE_P (cpu_features, AVX512BW)
      && CPU_FEATURE_USABLE_P (cpu_features, BMI2))
    return OPTIMIZE (evex);

I decided to try force-disabling them by hiding the AVX512VL flag:

  export GLIBC_TUNABLES=glibc.cpu.hwcaps=-AVX512VL

With this set, I can get successful builds of EDK2 in Koji for F36.

IOW, this points to a bug in a glibc codepath that triggers off AVX512VL, most likely strcmp related, given the changes in the 2.34.9000-17 build that caused the switch from the AVX2 to the EVEX strcmp impls.
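To tell quickly whether a given builder is likely to take the EVEX path, the same three-flag condition can be approximated from the kernel's view of the CPU. This is a sketch only: the flags in /proc/cpuinfo are not the same thing as glibc's "usable" check, and has_flag and evex_strcmp_likely are made-up helper names. The tunable setting printed at the end is the workaround given above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if <flag> appears as a whole word in <flags>.
# Usage: has_flag "<space-separated flags>" <flag>
has_flag() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: print "yes" when all three flags from the selector
# condition are present in the given flags string, otherwise "no".
evex_strcmp_likely() {
    if has_flag "$1" avx512vl && has_flag "$1" avx512bw && has_flag "$1" bmi2; then
        echo yes
    else
        echo no
    fi
}

# Flags of the first CPU; empty if /proc/cpuinfo is not available.
flags=$(sed -n 's/^flags[[:space:]]*:[[:space:]]*//p' /proc/cpuinfo 2>/dev/null | head -n 1)

if [ "$(evex_strcmp_likely "$flags")" = yes ]; then
    echo "AVX512VL, AVX512BW and BMI2 are all reported; workaround:"
    echo "  export GLIBC_TUNABLES=glibc.cpu.hwcaps=-AVX512VL"
else
    echo "At least one of AVX512VL, AVX512BW, BMI2 is not reported."
fi
```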
This is caused by commit c46e9afb2df5fc9e39ff4d13777e4b4c26e04e55 Author: H.J. Lu <hjl.tools> Date: Fri Oct 29 12:40:20 2021 -0700 x86-64: Improve EVEX strcmp with masked load
(In reply to H.J. Lu from comment #35) > This is caused by > > commit c46e9afb2df5fc9e39ff4d13777e4b4c26e04e55 > Author: H.J. Lu <hjl.tools> > Date: Fri Oct 29 12:40:20 2021 -0700 > > x86-64: Improve EVEX strcmp with masked load Thanks, H.J. Reassigning to Fedora glibc.
I'm working on an emergency fix that removes the EVEX routines as IFUNC candidates.
(In reply to Daniel Berrangé from comment #34) > I decided to try force disabling them by hidding the AVX512VL flag: > > export GLIBC_TUNABLES=glibc.cpu.hwcaps=-AVX512VL > > With this set, I can get successful builds of EDK2 in Koji for F36. I can confirm. This work-around makes our builds pass again too. Thanks for this Daniel!
I've submitted an update to Bodhi: https://bodhi.fedoraproject.org/updates/FEDORA-2021-92837f7e1f Not sure why it's not showing up here.
Building OpenJDK in the side tag with the new glibc works: https://koji.fedoraproject.org/koji/taskinfo?taskID=79564840 Can we get this into the f36 buildroot?
It has been fixed on glibc master branch. Please verify it.
I botched the Bodhi push for glibc-2.34.9000-24.fc36 because I got confused by the OpenJDK update in it. I'm trying another push for glibc-2.34.9000-25.fc36. The real fix with glibc-2.34.9000-26.fc36 is also on its way.
FEDORA-2021-92837f7e1f has been pushed to the Fedora 36 stable repository. If problem still persists, please make note of it in this bug report.
FEDORA-2021-b4832cd9cb has been pushed to the Fedora 36 stable repository. If problem still persists, please make note of it in this bug report.
Sorry for the confusion. I can confirm builds of OpenJDK 11 & 17 are now passing reliably with the glibc in f36/rawhide on the Copperlake VM.
Could somebody explain to me why we weren't able to reproduce this bug on physical hardware? Is that wrong? If so, what was the magic to be able to trigger it there?
(In reply to Severin Gehwolf from comment #46) > Could somebody explain to me why we weren't able to reproduce this bug on > physical hardware? Is that wrong? If so, what was the magic to be able to > trigger it there? I can reproduce it on a physical machine. I just got this when building java-17-openjdk (241e828cfe2ef51c61e5d0e544f613f3ee7bc960) against glibc-2.34.9000-22.fc36.x86_64: gmake[3]: *** No rule to make target '/builddir/build/BUILD/java-17-openjdk-17.0.1.0.12-9.rolling.fc36.x86_64/build/jdk17.build-slowdebug-main/hotspot/variant-server/libjvm/objs/classLoaderData.d'. Stop. This is a freshly-installed machine with a Xeon Gold 5118 (so not even Cascade Lake).
(In reply to Florian Weimer from comment #47) > (In reply to Severin Gehwolf from comment #46) > > Could somebody explain to me why we weren't able to reproduce this bug on > > physical hardware? Is that wrong? If so, what was the magic to be able to > > trigger it there? > > I can reproduce it on a physical machine. I just got this when building > java-17-openjdk (241e828cfe2ef51c61e5d0e544f613f3ee7bc960) against > glibc-2.34.9000-22.fc36.x86_64: > > gmake[3]: *** No rule to make target > '/builddir/build/BUILD/java-17-openjdk-17.0.1.0.12-9.rolling.fc36.x86_64/ > build/jdk17.build-slowdebug-main/hotspot/variant-server/libjvm/objs/ > classLoaderData.d'. Stop. > > This is a freshly-installed machine with a Xeon Gold 5118 (so not even > Cascade Lake). I reproduced this issue on a Tiger Lake laptop by running $ mock -r fedora-36-x86_64 /tmp/java-1.8.0-openjdk-1.8.0.312.b07-2.fc36.src.rpm and extracted a glibc testcase from it. The glibc testcase failed on all AVX512 machines.
(In reply to Florian Weimer from comment #47) > (In reply to Severin Gehwolf from comment #46) > > Could somebody explain to me why we weren't able to reproduce this bug on > > physical hardware? Is that wrong? If so, what was the magic to be able to > > trigger it there? > > I can reproduce it on a physical machine. I just got this when building > java-17-openjdk (241e828cfe2ef51c61e5d0e544f613f3ee7bc960) against > glibc-2.34.9000-22.fc36.x86_64: > > gmake[3]: *** No rule to make target > '/builddir/build/BUILD/java-17-openjdk-17.0.1.0.12-9.rolling.fc36.x86_64/ > build/jdk17.build-slowdebug-main/hotspot/variant-server/libjvm/objs/ > classLoaderData.d'. Stop. > > This is a freshly-installed machine with a Xeon Gold 5118 (so not even > Cascade Lake). Deterministically? On those rare occasions where we hit the physical machines in koji we got successful builds. Or does this physical model not reproduce? Model name: Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz Examples (of successful builds with affected glibc versions in BR): https://koji.fedoraproject.org/koji/taskinfo?taskID=79432315 https://koji.fedoraproject.org/koji/taskinfo?taskID=78753824 https://koji.fedoraproject.org/koji/taskinfo?taskID=79573328 https://koji.fedoraproject.org/koji/taskinfo?taskID=79566441 https://koji.fedoraproject.org/koji/taskinfo?taskID=79573327
(In reply to Severin Gehwolf from comment #51)
> (In reply to Florian Weimer from comment #47)
> > (In reply to Severin Gehwolf from comment #46)
> > This is a freshly-installed machine with a Xeon Gold 5118 (so not even
> > Cascade Lake).
>
> Deterministically? On those rare occasions where we hit the physical
> machines in koji we got successful builds.
>
> Or does this physical model not reproduce?
>
> Model name: Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz
>
> Examples (of successful builds with affected glibc versions in BR):
>
> https://koji.fedoraproject.org/koji/taskinfo?taskID=79432315
> https://koji.fedoraproject.org/koji/taskinfo?taskID=78753824
> https://koji.fedoraproject.org/koji/taskinfo?taskID=79573328
> https://koji.fedoraproject.org/koji/taskinfo?taskID=79566441
> https://koji.fedoraproject.org/koji/taskinfo?taskID=79573327

Look at the 'hw_info.log' for the x86_64 task, e.g. for that first build we have

https://kojipkgs.fedoraproject.org//work/tasks/2480/79432480/hw_info.log

which does NOT report avx512 CPU flags, so it isn't affected by the bug.
(In reply to Daniel Berrangé from comment #52) > (In reply to Severin Gehwolf from comment #51) > > (In reply to Florian Weimer from comment #47) > > > (In reply to Severin Gehwolf from comment #46) > > > This is a freshly-installed machine with a Xeon Gold 5118 (so not even > > > Cascade Lake). > > > > Deterministically? On those rare occasions where we hit the physical > > machines in koji we got successful builds. > > > > Or does this physical model not reproduce? > > > > Model name: Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz > > > > Examples (of successful builds with affected glibc versions in BR): > > > > https://koji.fedoraproject.org/koji/taskinfo?taskID=79432315 > > https://koji.fedoraproject.org/koji/taskinfo?taskID=78753824 > > https://koji.fedoraproject.org/koji/taskinfo?taskID=79573328 > > https://koji.fedoraproject.org/koji/taskinfo?taskID=79566441 > > https://koji.fedoraproject.org/koji/taskinfo?taskID=79573327 > > Look at the 'hw_info.log' for the x86_64 task, eg for that first build we > have > > https://kojipkgs.fedoraproject.org//work/tasks/2480/79432480/hw_info.log > > which does NOT report avx512 CPU flags, so it isn't affected by the bug. OK, got it. Thanks!