Bug 2026399 - glibc: Regression in AVX-512 strcmp
Summary: glibc: Regression in AVX-512 strcmp
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: glibc
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Florian Weimer
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 2026398
Depends On:
Blocks:
Reported: 2021-11-24 14:47 UTC by Severin Gehwolf
Modified: 2021-12-07 13:00 UTC

Fixed In Version: glibc-2.34.9000-26.fc36
Clone Of:
Environment:
Last Closed: 2021-12-05 14:33:28 UTC
Type: Bug
Embargoed:
pm-rhel: mirror+


Attachments
f34 guest xml (6.74 KB, text/plain)
2021-11-24 17:20 UTC, Severin Gehwolf
F34 guest qemu cmd log (3.36 KB, text/plain)
2021-11-24 17:21 UTC, Severin Gehwolf
libvirt debug log (3.22 MB, text/plain)
2021-11-24 18:21 UTC, Severin Gehwolf


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker FC-347 0 None None None 2021-12-03 17:38:19 UTC
Sourceware 28646 0 P2 NEW [2.35 Regression] mock -r fedora-36-x86_64 /tmp/java-1.8.0-openjdk-1.8.0.312.b07-2.fc36.src.rpm& fails to build 2021-12-03 17:15:39 UTC

Description Severin Gehwolf 2021-11-24 14:47:46 UTC
Description of problem:
OpenJDK 8, 11, 17 builds started to fail on rawhide x86_64 when they hit a virtualized builder (RHEL 8.5 with RHEL 8.5 qemu-kvm/libvirt on the host; F34 as the guest VM). See also this fedora-infra issue:
https://pagure.io/fedora-infrastructure/issue/10348

Version-Release number of selected component (if applicable):
# rpm -q libvirt
libvirt-6.0.0-37.module+el8.5.0+12162+40884dd2.x86_64

How reproducible:
100% when the build runs on a virtualized builder.

Steps to Reproduce:
1. Set up a RHEL 8.5 host on an Intel Xeon Gold (Cascadelake)
2. Install virt tools for hosting VMs (from RHEL 8.5)
3. Install F34 in a guest VM
4. Try to do a mock build of java-1.8.0-openjdk for rawhide on the F34 guest VM (fedpkg clone -a java-1.8.0-openjdk && cd java-1.8.0-openjdk && fedpkg mockbuild --no-cleanup-after)

Actual results:
Random, cryptic build failures. Some example(s):

ERROR: compileproperties: IO error writing to file /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.fc36.x86_64/bootbuild/jdk8.build-slowdebug/jdk/gensrc/sun/util/resources/cs/LocaleNames_cs.java
EXCEPTION: java.io.FileNotFoundException: /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.fc36.x86_64/bootbuild/jdk8.build-slowdebug/jdk/gensrc/sun/util/resources/cs/LocaleNames_cs.java (No such file or directory)
java.io.FileNotFoundException: /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.fc36.x86_64/bootbuild/jdk8.build-slowdebug/jdk/gensrc/sun/util/resources/cs/LocaleNames_cs.java (No such file or directory)
        at java.io.FileOutputStream.open0(Native Method)
        at java.io.FileOutputStream.open(FileOutputStream.java:270)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:101)
        at build.tools.compileproperties.CompileProperties.createFile(CompileProperties.java:269)
        at build.tools.compileproperties.CompileProperties.main(CompileProperties.java:195)

and/or:

/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.fc36.x86_64/openjdk/hotspot/make/linux/makefiles/rules.make:149: Building os_perf_linux.o  (from /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.fc36.x86_64/openjdk/hotspot/src/os/linux/vm/os_perf_linux.cpp) (/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.fc36.x86_64/openjdk/hotspot/src/os/linux/vm/os_perf_linux.cpp  newer)
gmake[6]: *** No rule to make target 's.os_posix.cpp', needed by 'os_posix.cpp'.  Stop.
gmake[6]: *** Waiting for unfinished jobs....


Expected results:
Successful build of rawhide in mock.

Additional info:
It doesn't reproduce with a Fedora host (F34) and a Fedora 34 guest, and it doesn't reproduce when the build happens on a physical (non-virtualized) machine. It doesn't seem to be related to the qemu version on the host: failures have been seen with qemu-kvm 6.0.0 and 4.2.0.

It might be xfs or libvirt storage related, as the failures vary and are about files not being present where they should be.

Comment 1 Severin Gehwolf 2021-11-24 14:49:26 UTC
*** Bug 2026398 has been marked as a duplicate of this bug. ***

Comment 3 Jaroslav Suchanek 2021-11-24 16:40:05 UTC
It's unlikely that this would be caused by libvirt. But would you please attach the guest XML, libvirt debug logs (https://libvirt.org/kbase/debuglogs.html), possibly also the domain log, usually located in /var/log/libvirt/qemu directory?

It would be good to narrow down the reproducer to some simple use case. For example you indicated that it might be related to xfs. Have you tried some xfstests tool? Or other fs stress tool?

Thanks.

Comment 4 Severin Gehwolf 2021-11-24 17:04:17 UTC
(In reply to Jaroslav Suchanek from comment #3)
> It's unlikely that this would be caused by libvirt.

We are only able to reproduce in a virtualized setup. The one described in comment 0. So if not reproducible on a physical machine, where should we be looking?

> But would you please
> attach the guest XML, libvirt debug logs
> (https://libvirt.org/kbase/debuglogs.html), possibly also the domain log,
> usually located in /var/log/libvirt/qemu directory?

OK, thanks. I'll gather this info.

> It would be good to narrow down the reproducer to some simple use case. For
> example you indicated that it might be related to xfs. Have you tried some
> xfstests tool? Or other fs stress tool?

What exactly do you have in mind? I know little about those.

Comment 5 Severin Gehwolf 2021-11-24 17:20:29 UTC
Created attachment 1843456 [details]
f34 guest xml

Comment 6 Severin Gehwolf 2021-11-24 17:21:45 UTC
Created attachment 1843457 [details]
F34 guest qemu cmd log

Comment 7 Severin Gehwolf 2021-11-24 18:21:46 UTC
Created attachment 1843461 [details]
libvirt debug log

Comment 8 Peter Krempa 2021-11-26 12:44:13 UTC
Note that libvirt doesn't do much besides instructing qemu to use the disk image that is configured in the XML in terms of storage handling.

Also, it's not really clear from this BZ, or from the linked pagure issue where multiple build failures are linked, what the underlying issue is. Namely, I wasn't able to find the Java error reported above (some of the logs are no longer present); I've seen only gmake errors or some internal test errors:


Examples:


  https://kojipkgs.fedoraproject.org//work/tasks/7748/78757748/build.log

  gmake[3]: *** No rule to make target '/builddir/build/BUILD/java-17-openjdk-17.0.1.0.12-8.rolling.fc36.x86_64/build/jdk17.build-slowdebug-main/hotspot/variant-server/gensrc/adfiles/ad_x86_gen.cpp', needed by '/builddir/build/BUILD/java-17-openjdk-17.0.1.0.12-8.rolling.fc36.x86_64/build/jdk17.build-slowdebug-main/hotspot/variant-server/libjvm/objs/_build-info.marker'.  Stop.

In the above example it's a make error. The missing file seems to be generated by an internal tool, unfortunately there's nothing which would hint why. On other platforms it seems to be generating other platform specific code and that works. Note that other platforms seem to be using virt as well for build. 

  https://kojipkgs.fedoraproject.org//work/tasks/7368/78877368/build.log

  gmake[3]: *** No rule to make target '/builddir/build/BUILD/java-17-openjdk-17.0.1.0.12-7.rolling.fc36.x86_64/build/jdk17.build-slowdebug-main/hotspot/variant-server/libjvm/objs/classLoaderDataGraph.d'.  Stop.

  https://kojipkgs.fedoraproject.org//work/tasks/6045/78836045/build.log

  gmake[3]: *** No rule to make target '/builddir/build/BUILD/java-17-openjdk-17.0.1.0.12-7.rolling.fc36.x86_64/build/jdk17.build-slowdebug-main/hotspot/variant-server/libjvm/objs/classLoaderDataGraph.d'.  Stop.

Both are the same; I wasn't able to find any mention of an error regarding that file.




 https://kojipkgs.fedoraproject.org//work/tasks/7925/78767925/build.log

/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/jdk8.build-slowdebug/jdk/classes/_the.BUILD_JDK_batch.tmp
/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/jdk8.build-slowdebug/jdk/gensrc/com/sun/swing/internal/plaf/metal/resources/metal_zh_HK.java:1: error: illegal character: '#'
# This properties file is used to create a PropertyResourceBundle
^
/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/jdk8.build-slowdebug/jdk/gensrc/com/sun/swing/internal/plaf/metal/resources/metal_zh_HK.java:1: error: class, interface, or enum expected
# This properties file is used to create a PropertyResourceBundle
       ^
/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/jdk8.build-slowdebug/jdk/gensrc/com/sun/swing/internal/plaf/metal/resources/metal_zh_HK.java:2: error: illegal character: '#'
# It contains Locale specific strings used be the Synth Look and Feel.
^
/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/jdk8.build-slowdebug/jdk/gensrc/com/sun/swing/internal/plaf/metal/resources/metal_zh_HK.java:3: error: illegal character: '#'
# Currently, the following components need this for support:
^
/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/jdk8.build-slowdebug/jdk/gensrc/com/sun/swing/internal/plaf/metal/resources/metal_zh_HK.java:4: error: illegal character: '#'
#
^
/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/jdk8.build-slowdebug/jdk/gensrc/com/sun/swing/internal/plaf/metal/resources/metal_zh_HK.java:5: error: illegal character: '#'
#    FileChooser
^
/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/jdk8.build-slowdebug/jdk/gensrc/com/sun/swing/internal/plaf/metal/resources/metal_zh_HK.java:6: error: illegal character: '#'
#
^
/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/jdk8.build-slowdebug/jdk/gensrc/com/sun/swing/internal/plaf/metal/resources/metal_zh_HK.java:7: error: illegal character: '#'
# When this file is read in, the strings are put into the
^
/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/jdk8.build-slowdebug/jdk/gensrc/com/sun/swing/internal/plaf/metal/resources/metal_zh_HK.java:8: error: illegal character: '#'
# defaults table.  This is an implementation detail of the current
^
/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/jdk8.build-slowdebug/jdk/gensrc/com/sun/swing/internal/plaf/metal/resources/metal_zh_HK.java:9: error: illegal character: '#'
# workings of Swing.  DO NOT DEPEND ON THIS.
^
/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/jdk8.build-slowdebug/jdk/gensrc/com/sun/swing/internal/plaf/metal/resources/metal_zh_HK.java:10: error: illegal character: '#'
# This may change in future versions of Swing as we improve localization
^
/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/jdk8.build-slowdebug/jdk/gensrc/com/sun/swing/internal/plaf/metal/resources/metal_zh_HK.java:11: error: illegal character: '#'
# support.
^
/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/jdk8.build-slowdebug/jdk/gensrc/com/sun/swing/internal/plaf/metal/resources/metal_zh_HK.java:12: error: illegal character: '#'
#

This one seems to be some bug in input files?

Either way, neither the linked cases nor the description in this bug contain anything that would even hint at a virtualization problem here. Even if there were something wrong in the storage layer in virt, it's at block level, so it wouldn't show up as individual files vanishing but rather as block errors, which are unlikely to manifest without filesystem corruption.

For now it IMO makes no sense to reassign this to any other component in the virt layer until it's clear what the root of the problem is, and this BZ is not establishing that sufficiently.

In case you'll encounter a filesystem or block error hint e.g. in the guest's kernel log or anything which wouldn't be also explainable by a build system failure, please attach it here.

Please also reassign this BZ to the JDK component for now, at least until the root cause is known.

Comment 9 Severin Gehwolf 2021-11-26 14:19:42 UTC
(In reply to Peter Krempa from comment #8)
> Note that libvirt doesn't do much besides instructing qemu to use the disk
> image that is configured in the XML in terms of storage handling.
> 
> Also it's not really clear from this BZ or the linked pagure issue where
> multiple build failures are linked what the underlying issue is.

Your guess is as good as mine as to what the underlying issue is. Either way, we - the JDK team - would need to figure it out. Very little to go on, though. On the JDK side, nothing changed. See also this for a bit of history:
https://koschei.fedoraproject.org/package/java-1.8.0-openjdk?collection=f36

Seems like since the update of builders (Nov 8) it fails on virtualized build VMs.

> Namely I
> wasn't able to find the JAVA error reported above (some of the logs are no
> longer present), and I've seen only gmake errors or some internal test
> errors:

Right, the thing is failures are fairly random. They don't necessarily look alike from one affected system to the next. The only consistency was build failure :-/ 

> 
> Examples:
> 
> 
>   https://kojipkgs.fedoraproject.org//work/tasks/7748/78757748/build.log
> 
>   gmake[3]: *** No rule to make target
> '/builddir/build/BUILD/java-17-openjdk-17.0.1.0.12-8.rolling.fc36.x86_64/
> build/jdk17.build-slowdebug-main/hotspot/variant-server/gensrc/adfiles/
> ad_x86_gen.cpp', needed by
> '/builddir/build/BUILD/java-17-openjdk-17.0.1.0.12-8.rolling.fc36.x86_64/
> build/jdk17.build-slowdebug-main/hotspot/variant-server/libjvm/objs/_build-
> info.marker'.  Stop.
> 
> In the above example it's a make error. The missing file seems to be
> generated by an internal tool, unfortunately there's nothing which would
> hint why. On other platforms it seems to be generating other platform
> specific code and that works. Note that other platforms seem to be using
> virt as well for build. 
> 
>   https://kojipkgs.fedoraproject.org//work/tasks/7368/78877368/build.log
> 
>   gmake[3]: *** No rule to make target
> '/builddir/build/BUILD/java-17-openjdk-17.0.1.0.12-7.rolling.fc36.x86_64/
> build/jdk17.build-slowdebug-main/hotspot/variant-server/libjvm/objs/
> classLoaderDataGraph.d'.  Stop.
> 
>   https://kojipkgs.fedoraproject.org//work/tasks/6045/78836045/build.log
> 
>   gmake[3]: *** No rule to make target
> '/builddir/build/BUILD/java-17-openjdk-17.0.1.0.12-7.rolling.fc36.x86_64/
> build/jdk17.build-slowdebug-main/hotspot/variant-server/libjvm/objs/
> classLoaderDataGraph.d'.  Stop.
> 
> Both are the same, I wasn't able to find any mention of error regarding to
> that file.
> 
> 
> 
> 
>  https://kojipkgs.fedoraproject.org//work/tasks/7925/78767925/build.log
> 
> /build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/jdk8.
> build-slowdebug/jdk/classes/_the.BUILD_JDK_batch.tmp
> /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/
> jdk8.build-slowdebug/jdk/gensrc/com/sun/swing/internal/plaf/metal/resources/
> metal_zh_HK.java:1: error: illegal character: '#'
> # This properties file is used to create a PropertyResourceBundle
> ^
> /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/
> jdk8.build-slowdebug/jdk/gensrc/com/sun/swing/internal/plaf/metal/resources/
> metal_zh_HK.java:1: error: class, interface, or enum expected
> # This properties file is used to create a PropertyResourceBundle
>        ^
> /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/
> jdk8.build-slowdebug/jdk/gensrc/com/sun/swing/internal/plaf/metal/resources/
> metal_zh_HK.java:2: error: illegal character: '#'
> # It contains Locale specific strings used be the Synth Look and Feel.
> ^
> /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/
> jdk8.build-slowdebug/jdk/gensrc/com/sun/swing/internal/plaf/metal/resources/
> metal_zh_HK.java:3: error: illegal character: '#'
> # Currently, the following components need this for support:
> ^
> /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/
> jdk8.build-slowdebug/jdk/gensrc/com/sun/swing/internal/plaf/metal/resources/
> metal_zh_HK.java:4: error: illegal character: '#'
> #
> ^
> /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/
> jdk8.build-slowdebug/jdk/gensrc/com/sun/swing/internal/plaf/metal/resources/
> metal_zh_HK.java:5: error: illegal character: '#'
> #    FileChooser
> ^
> /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/
> jdk8.build-slowdebug/jdk/gensrc/com/sun/swing/internal/plaf/metal/resources/
> metal_zh_HK.java:6: error: illegal character: '#'
> #
> ^
> /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/
> jdk8.build-slowdebug/jdk/gensrc/com/sun/swing/internal/plaf/metal/resources/
> metal_zh_HK.java:7: error: illegal character: '#'
> # When this file is read in, the strings are put into the
> ^
> /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/
> jdk8.build-slowdebug/jdk/gensrc/com/sun/swing/internal/plaf/metal/resources/
> metal_zh_HK.java:8: error: illegal character: '#'
> # defaults table.  This is an implementation detail of the current
> ^
> /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/
> jdk8.build-slowdebug/jdk/gensrc/com/sun/swing/internal/plaf/metal/resources/
> metal_zh_HK.java:9: error: illegal character: '#'
> # workings of Swing.  DO NOT DEPEND ON THIS.
> ^
> /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/
> jdk8.build-slowdebug/jdk/gensrc/com/sun/swing/internal/plaf/metal/resources/
> metal_zh_HK.java:10: error: illegal character: '#'
> # This may change in future versions of Swing as we improve localization
> ^
> /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/
> jdk8.build-slowdebug/jdk/gensrc/com/sun/swing/internal/plaf/metal/resources/
> metal_zh_HK.java:11: error: illegal character: '#'
> # support.
> ^
> /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.eln113.x86_64/build/
> jdk8.build-slowdebug/jdk/gensrc/com/sun/swing/internal/plaf/metal/resources/
> metal_zh_HK.java:12: error: illegal character: '#'
> #
> 
> This one seems to be some bug in input files?
> 
> Either way, neither the linked cases nor the description in this bug have
> anything which would even hint to a virtualization problem here. Even if
> there is something wrong in the storage layer in virt, it's at block level,
> so it wouldn't impact individual files vanishing but rather block errors
> which are unlikely to manifest without filesystem corruption.
> 
> For now it IMO makes no sense to reasign this to any other component in the
> virt layer until it's clear what the root of the problem is and this BZ is
> not doing that sufficiently.
> 
> In case you'll encounter a filesystem or block error hint e.g. in the
> guest's kernel log or anything which wouldn't be also explainable by a build
> system failure, please attach it here.
> 
> Please also reassign this BZ to the JDK component for now at least until the
> root cause is know.

We were unable to reproduce in the following environments:
- Physical host, perform build on x86_64 (via 'fedpkg mockbuild')
- Virtualized environment with: Host F34, guest F34, mockbuild in guest F34 VM.

We were able to reproduce in the following environment:
- Virtualized environment with: Host RHEL 8.5, guest F34, mockbuild in guest F34 VM.

https://pagure.io/fedora-infrastructure/issue/10348#comment-762507 mentions Koji builders got updated on November 8. After consultation with Fedora infra folks, I was told builders updated to RHEL 8.5. This is when the failures started happening for us.

Also, for java-1.8.0-openjdk we've observed a build fail (when we happened to get a virtualized builder in koji):
https://koji.fedoraproject.org/koji/taskinfo?taskID=78504433
A more recent one is:
https://koji.fedoraproject.org/koji/taskinfo?taskID=79226399

The actual error for this was:

+ sed 's/\(separated by \)[;:]/\1:/g' /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.fc36.x86_64/openjdk/hotspot/src/share/vm/Xusage.txt
Makefile:576: Building /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.fc36.x86_64/bootbuild/jdk8.build-slowdebug/hotspot/dist/jre/lib/amd64/server/Xusage.txt  (from /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.fc36.x86_64/openjdk/hotspot/src/share/vm/Xusage.txt) (/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.fc36.x86_64/openjdk/hotspot/src/share/vm/Xusage.txt  newer)
mv /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.fc36.x86_64/bootbuild/jdk8.build-slowdebug/hotspot/dist/jre/lib/amd64/server/Xusage.txt.temp /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.fc36.x86_64/bootbuild/jdk8.build-slowdebug/hotspot/dist/jre/lib/amd64/server/Xusage.txt
+ mv /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.fc36.x86_64/bootbuild/jdk8.build-slowdebug/hotspot/dist/jre/lib/amd64/server/Xusage.txt.temp /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.fc36.x86_64/bootbuild/jdk8.build-slowdebug/hotspot/dist/jre/lib/amd64/server/Xusage.txt
gmake[3]: *** No rule to make target '/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.fc36.x86_64/bootbuild/jdk8.build-slowdebug/hotspot/dist/jre/lib/amd64/server/libjvm.so', needed by 'generic_export'.  Stop.
gmake[3]: Leaving directory '/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.fc36.x86_64/openjdk/hotspot/make'
gmake[2]: *** [Makefile:300: export_debug] Error 2
gmake[2]: Leaving directory '/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.fc36.x86_64/openjdk/hotspot/make'
gmake[1]: *** [HotspotWrapper.gmk:45: /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.fc36.x86_64/bootbuild/jdk8.build-slowdebug/hotspot/_hotspot.timestamp] Error 2
gmake[1]: Leaving directory '/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.fc36.x86_64/openjdk/make'
make: *** [/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.312.b07-2.fc36.x86_64/openjdk//make/Main.gmk:110: hotspot-only] Error 2

The exact same NVR passed when built on a physical machine:
https://koji.fedoraproject.org/koji/buildinfo?buildID=1853322

The thing is, failures are absolutely random. They differ from JDK version to JDK version, and all of them are rather cryptic, with no clear reason why they're happening. They don't happen on physical machines. My hope was people from the virtualization team could help us narrow it down... I have *very* little experience debugging virtualization problems.

Comment 10 Kevin Fenzi 2021-12-01 18:12:00 UTC
So a few items to note: 

Fedora builders are all Fedora 34. Some of them are VMs, some of them are bare hardware nodes. All of them are F34.

The builder nodes that are VMs are mostly running with RHEL 8.5 as the L0 hypervisor (in the case of x86_64, ppc64le, aarch64 and armv7), with just s390x using a Fedora 35 L0.

On Nov 8th/9th we upgraded the RHEL hypervisors to 8.5 and applied other updates to the f34 builders. 

This sure sounds like a RHEL8.5 kernel or qemu bug. 

Currently we are using the normal rhel8.5 qemu... it was suggested we should use the virt specific one instead. I am hoping to work on that soon. I don't know if that will affect/fix this issue or not.

Also soon we are planning on moving builders to F35. I don't know if that will affect this issue or not. This will happen in the next week or so.

Happy to let you know when I make changes so you can try or gather more info from the L0 / L1 instances. 
But also see related bug that already has most of this info: https://bugzilla.redhat.com/show_bug.cgi?id=2022075

Comment 11 Severin Gehwolf 2021-12-01 19:01:33 UTC
(In reply to Kevin Fenzi from comment #10)
> Happy to let you know when I make changes so you can try or gather more info
> from the L0 / L1 instances. 

Yes, please!

Comment 12 Severin Gehwolf 2021-12-01 19:41:16 UTC
Whatever the bug is, it looks like it only triggers with glibc >= 2.34.9000-15.fc36 in the L1 VMs mock. I'll try to confirm this, but something in newer glibc seems to trigger it. So this gets even more weird. At least that seems to explain why it would not fail builds for F35 on same virtualized builders (RHEL 8.5 hosts with F34 VMs).

The question is how to trigger a build on koji.stg.fedoraproject.org, which, so I'm told, is still on the old setup. That would clarify whether this is a glibc issue or a virt issue. It appears to be an issue in the virtualization layer somehow, though, as newer glibc alone isn't enough to trigger the issue (builds pass on physical machines with newer glibc).

Comment 13 Kevin Fenzi 2021-12-01 19:54:43 UTC
I'm not sure what "old setup" you mean? Staging is set up just like prod... although we are now starting to work on deploying f35 builders there to test things out before rolling that to prod.

Comment 14 Kevin Fenzi 2021-12-02 00:18:55 UTC
ok. I have updated qemu on all the x86 builder hosts and rebooted them. All of buildvm-x86-* should now be running with the qemu from advanced virt. 

Did you see this issue only on x86_64? or was it also on other arches?

Comment 15 Andrew John Hughes 2021-12-02 02:24:26 UTC
(In reply to Kevin Fenzi from comment #14)
> ok. I have updated qemu on all the x86 builder hosts and rebooted them. All
> of buildvm-x86-* should now be running with the qemu from advanced virt. 
> 
> Did you see this issue only on x86_64? or was it also on other arches?

x86_64 & rawhide on the Cascadelake VM setup only.

This scratch just failed: https://koji.fedoraproject.org/koji/taskinfo?taskID=79491184

The exact same package succeeded on non-VM hardware on rawhide a while ago: https://koji.fedoraproject.org/koji/buildinfo?buildID=1853322

As Severin says, it seems to only happen when we have the perfect storm of this virtualised hardware and the newer rawhide buildroot. The same packages are building fine for F35 (where we've effectively had to move our work for now).

Comment 16 Severin Gehwolf 2021-12-02 09:32:10 UTC
(In reply to Kevin Fenzi from comment #13)
> I'm not sure what "old setup" you mean? staging is setup just like prod...
> although we are starting working on deploying f35 builders there to test
> things out before rolling that to prod now.

Mikolaj told me that koji.stg.fedoraproject.org is still on RHEL 8.4 (doesn't have the November 8/9 update yet). Trying to build on that setup as we speak...

Comment 17 Severin Gehwolf 2021-12-02 10:15:55 UTC
(In reply to Severin Gehwolf from comment #16)
> (In reply to Kevin Fenzi from comment #13)
> > I'm not sure what "old setup" you mean? staging is setup just like prod...
> > although we are starting working on deploying f35 builders there to test
> > things out before rolling that to prod now.
> 
> Mikolaj told me that koji.stg.fedoraproject.org is still on RHEL 8.4
> (doesn't have the November 8/9 update yet). Trying to build on that setup as
> we speak...

Unfortunately, it won't let me log in there for some reason (my Fedora user is 'jerboaa').

Comment 18 Severin Gehwolf 2021-12-02 10:58:07 UTC
<mock-chroot> sh-5.1# rpm -qa | grep glibc
glibc-common-2.34.9000-15.fc36.x86_64
glibc-gconv-extra-2.34.9000-15.fc36.x86_64
glibc-minimal-langpack-2.34.9000-15.fc36.x86_64
glibc-2.34.9000-15.fc36.x86_64
glibc-headers-x86-2.34.9000-15.fc36.noarch
glibc-devel-2.34.9000-15.fc36.x86_64

With these glibc packages in a custom mock, java-1.8.0-openjdk builds fine on this virt setup, while with 2.34.9000-16 and later it fails. The question is what changed in 2.34.9000-16 that might trigger this bug on virtualized Cascadelake systems.
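
A minimal sketch of how the version boundary described here could be checked from a script. The helper names version_ge and classify_glibc are hypothetical, and GNU sort -V is used only as an approximation of RPM's version comparison. The -16 boundary comes from this comment; the -26 boundary comes from this bug's "Fixed In Version" field.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any Fedora tooling.
# Assumption: GNU "sort -V" orders these version-release strings the
# same way rpm would; that holds for the simple cases used here.

FIRST_BAD="2.34.9000-16"   # first failing build according to this comment
FIXED_IN="2.34.9000-26"    # "Fixed In Version" field of this bug

# version_ge A B: succeed if A sorts at or after B under sort -V.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# classify_glibc NVR: print "ok", "suspect" or "fixed".
# Accepts e.g. "glibc-2.34.9000-15.fc36.x86_64" or "2.34.9000-15.fc36".
classify_glibc() {
    vr=${1#glibc-}      # drop the package name prefix, if present
    vr=${vr%%.fc*}      # drop the dist tag and architecture suffix
    if version_ge "$vr" "$FIXED_IN"; then
        echo fixed
    elif version_ge "$vr" "$FIRST_BAD"; then
        echo suspect
    else
        echo ok
    fi
}
```

Inside the mock chroot, something like classify_glibc "$(rpm -q glibc)" would then report which side of the boundary the installed glibc is on.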

Comment 19 Florian Weimer 2021-12-02 11:11:10 UTC
Upstream commits that went into glibc-2.34.9000-16:

- Auto-sync with upstream branch master,
  commit 79d0fc65395716c1d95931064c7bf37852203c66.
- benchtests: Add acosf function to bench-math
- benchtests: Improve bench-memcpy-random
- Disable -Waggressive-loop-optimizations warnings in tst-dynarray.c
- Fix compiler issue with mmap_internal
- Check if linker also support -mtls-dialect=gnu2
- Fix LIBC_PROG_BINUTILS for -fuse-ld=lld
- elf: Disable ifuncmain{1,5,5pic,5pie} when using LLD
- Handle NULL input to malloc_usable_size [BZ #28506]
- x86_64: Add memcmpeq.S to fix disable-multi-arch build
- login: Add back libutil as an empty library
- riscv: Fix incorrect jal with HIDDEN_JUMPTARGET
- x86_64: Add evex optimized __memcmpeq in memcmpeq-evex.S
- x86_64: Add avx2 optimized __memcmpeq in memcmpeq-avx2.S
- x86_64: Add sse2 optimized __memcmpeq in memcmp-sse2.S
- x86_64: Add support for __memcmpeq using sse2, avx2, and evex
- Benchtests: Add benchtests for __memcmpeq
- String: Add __memcmpeq as build target
- NEWS: Add item for __memcmpeq
- String: Add tests for __memcmpeq
- String: Add hidden defs for __memcmpeq() to enable internal usage
- String: Add support for __memcmpeq() ABI on all targets
- configure: Don't check LD -v --help for LIBC_LINKER_FEATURE
- elf: Make global.out depend on reldepmod4.so [BZ #28457]
- x86: Replace sse2 instructions with avx in memcmp-evex-movbe.S
- bench-math: Sort and put each bench per line
- x86_64: Add missing libmvec ABI tests
- elf: Fix e6fd79f379 build with --enable-tunables=no
- elf: Fix slow DSO sorting behavior in dynamic loader (BZ #17645)
- elf: Testing infrastructure for ld.so DSO sorting (BZ #17645)
- iconv: Use TIMEOUTFACTOR for iconv test timeout
- posix: Remove alloca usage for internal fnmatch implementation
- Add alloc_align attribute to memalign et al
- linux: Fix a possibly non-constant expression in _Static_assert
- x86-64: Add sysdeps/x86_64/fpu/Makeconfig

Since it's x86-64-specific, I'd focus there. __memcmpeq is a new API, so it's not going to affect existing packages. That leaves “x86: Replace sse2 instructions with avx in memcmp-evex-movbe.S” or perhaps a bug in the __memcmpeq integration (it's a new function, but maybe memcmp was changed inadvertently).
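
One way to probe this hypothesis, which is not something done in this thread, would be to rerun a failing command with selected CPU features masked through the glibc.cpu.hwcaps tunable, so that the dynamic loader should steer away from the EVEX variants. The wrapper name below is hypothetical, and which feature names actually need masking on the affected machines is an assumption.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command with some x86 CPU features hidden
# from glibc's function selection via GLIBC_TUNABLES.
# Assumption: the feature names passed in (e.g. AVX512F, AVX512VL) are
# the ones relevant to the memcmp variant under suspicion.

# run_without_cpu_features FEATURES COMMAND [ARGS...]
# FEATURES is a comma-separated list such as "AVX512F,AVX512VL".
run_without_cpu_features() {
    features=$1
    shift
    # Turn "A,B" into "-A,-B"; a leading "-" disables a feature.
    mask=$(printf '%s' "$features" | sed 's/^/-/; s/,/,-/g')
    GLIBC_TUNABLES="glibc.cpu.hwcaps=$mask" "$@"
}
```

If a build that fails normally passes when started through such a wrapper, that would point at one of the AVX-512 code paths rather than at the storage layer.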

Comment 20 Peter Krempa 2021-12-02 11:14:52 UTC
Based on the information above I'll reassign this to qemu for now as they will certainly have a better understanding what's going on.

Comment 21 Florian Weimer 2021-12-02 11:15:26 UTC
Can we get

/lib64/ld-linux-x86-64.so.2 --list-diagnostics

output from the affected machine? I want to double-check which memcmp implementation is selected. Thanks.

Comment 22 Florian Weimer 2021-12-02 11:16:05 UTC
(In reply to Florian Weimer from comment #21)
> Can we get
> 
> /lib64/ld-linux-x86-64.so.2 --list-diagnostics
> 
> output from the affected machine? I want to double-check which memcmp
> implementation is selected. Thanks.

Sorry, to clarify, that has to be done in the guest, but it can be in a chroot (rawhide chroot would actually be preferred).
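
The output of that command is a list of key=value lines, so a saved copy can be queried with a small filter. A sketch with the hypothetical helper name diag_value; it assumes one key=value pair per line and string values wrapped in double quotes.

```shell
#!/bin/sh
# Hypothetical helper for reading "ld.so --list-diagnostics" output.
# Assumption: each line has the form key=value, with string values
# enclosed in double quotes.

# diag_value KEY: read diagnostics on stdin, print the value of KEY
# with any surrounding double quotes removed.
diag_value() {
    awk -v key="$1" '
        { i = index($0, "=") }                 # position of the first "="
        i > 0 && substr($0, 1, i - 1) == key { # exact match on the key
            v = substr($0, i + 1)
            gsub(/^"|"$/, "", v)               # strip surrounding quotes
            print v
        }'
}
```

For example, piping the diagnostics into diag_value dl_platform prints just the platform string, which makes it easier to compare a failing guest with a working one.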

Comment 23 Daniel Berrangé 2021-12-02 14:22:48 UTC
We're hitting very similar sounding problems in F36 rawhide with EDK2 builds, where we get bizarre/inexplicable errors from make when trying to resolve targets. 

make: *** No rule to make target '/builddir/build/BUILD/edk2-e1999b264f1f/Build/Ovmf3264/DEBUG_GCC5/X64/CryptoPkg/Library/OpensslLib/OpensslLib/OUTPUT/openssl/crypto/evp/e_des.obj', needed by '/builddir/build/BUILD/edk2-e1999b264f1f/Build/Ovmf3264/DEBUG_GCC5/X64/CryptoPkg/Library/OpensslLib/OpensslLib/OUTPUT/OpensslLib.lib'.  Stop.

despite the rules for that target clearly existing when dumping the makefile contents and all pre-requisites existing too.

Only reproduces for me inside Koji, never any local machine or VM I have access to, and only in the 6 CPU koji VMs.

(In reply to Florian Weimer from comment #21)
> Can we get
> 
> /lib64/ld-linux-x86-64.so.2 --list-diagnostics
> 
> output from the affected machine? I want to double-check which memcmp
> implementation is selected. Thanks.

On the assumption that my EDK2 issue is likely the same bug as seen with openjdk, I added this command to the edk2.spec %build. You can see the output in this build log from a failing build:

https://kojipkgs.fedoraproject.org//work/tasks/2306/79512306/build.log

Comment 24 Severin Gehwolf 2021-12-02 14:29:30 UTC
(In reply to Florian Weimer from comment #21)
> Can we get
> 
> /lib64/ld-linux-x86-64.so.2 --list-diagnostics
> 
> output from the affected machine? I want to double-check which memcmp
> implementation is selected. Thanks.

Here you go (that's from a rawhide chroot on the guest F34 VM):

dl_discover_osversion=0x50b0c
dl_dst_lib="lib64"
dl_hwcap=0x6
dl_hwcap_important=0x6
dl_hwcap2=0x2
dl_hwcaps_subdirs="x86-64-v4:x86-64-v3:x86-64-v2"
dl_hwcaps_subdirs_active=0x7
dl_osversion=0x0
dl_pagesize=0x1000
dl_platform="haswell"
dl_profile_output="/var/tmp"
dl_string_platform=0x32
dso.ld="ld-linux-x86-64.so.2"
dso.libc="libc.so.6"
env_filtered[0x0]="SHELL"
env_filtered[0x1]="HISTCONTROL"
env_filtered[0x2]="HISTSIZE"
env_filtered[0x3]="HOSTNAME"
env_filtered[0x4]="container_host_version_id"
env_filtered[0x5]="PWD"
env_filtered[0x6]="LOGNAME"
env_filtered[0x7]="container"
env_filtered[0x8]="HOME"
env[0x9]="LANG=C.UTF-8"
env_filtered[0xa]="LS_COLORS"
env_filtered[0xb]="PROMPT_COMMAND"
env_filtered[0xc]="TERM"
env_filtered[0xd]="USER"
env_filtered[0xe]="NOTIFY_SOCKET"
env_filtered[0xf]="SHLVL"
env_filtered[0x10]="container_host_id"
env_filtered[0x11]="PS1"
env_filtered[0x12]="DEBUGINFOD_URLS"
env_filtered[0x13]="which_declare"
env_filtered[0x14]="container_host_variant_id"
env[0x15]="PATH=/usr/local/sbin:/usr/bin:/bin:/usr/sbin:/sbin"
env_filtered[0x16]="MAIL"
env_filtered[0x17]="container_uuid"
env_filtered[0x18]="BASH_FUNC_which%%"
env_filtered[0x19]="_"
path.prefix="/usr"
path.rtld="/lib64/ld-linux-x86-64.so.2"
path.sysconfdir="/etc"
path.system_dirs[0x0]="/lib64/"
path.system_dirs[0x1]="/usr/lib64/"
version.release="development"
version.version="2.34.9000"
auxv[0x0].a_type=0x21
auxv[0x0].a_val=0x7ffe9f38b000
auxv[0x1].a_type=0x10
auxv[0x1].a_val=0xf8bfbff
auxv[0x2].a_type=0x6
auxv[0x2].a_val=0x1000
auxv[0x3].a_type=0x11
auxv[0x3].a_val=0x64
auxv[0x4].a_type=0x3
auxv[0x4].a_val=0x7fc98a7a6040
auxv[0x5].a_type=0x4
auxv[0x5].a_val=0x38
auxv[0x6].a_type=0x5
auxv[0x6].a_val=0xb
auxv[0x7].a_type=0x7
auxv[0x7].a_val=0x0
auxv[0x8].a_type=0x8
auxv[0x8].a_val=0x0
auxv[0x9].a_type=0x9
auxv[0x9].a_val=0x7fc98a7a7090
auxv[0xa].a_type=0xb
auxv[0xa].a_val=0x0
auxv[0xb].a_type=0xc
auxv[0xb].a_val=0x0
auxv[0xc].a_type=0xd
auxv[0xc].a_val=0x0
auxv[0xd].a_type=0xe
auxv[0xd].a_val=0x0
auxv[0xe].a_type=0x17
auxv[0xe].a_val=0x0
auxv[0xf].a_type=0x19
auxv[0xf].a_val=0x7ffe9f265c09
auxv[0x10].a_type=0x1a
auxv[0x10].a_val=0x2
auxv[0x11].a_type=0x1f
auxv[0x11].a_val="/lib64/ld-linux-x86-64.so.2"
auxv[0x12].a_type=0xf
auxv[0x12].a_val="x86_64"
uname.sysname="Linux"
uname.nodename="6fac65290c9e478583f88d92dee1be0c"
uname.release="5.11.12-300.fc34.x86_64"
uname.version="#1 SMP Wed Apr 7 16:31:13 UTC 2021"
uname.machine="x86_64"
uname.domainname="(none)"
x86.cpu_features.basic.kind=0x1
x86.cpu_features.basic.max_cpuid=0xd
x86.cpu_features.basic.family=0x6
x86.cpu_features.basic.model=0x55
x86.cpu_features.basic.stepping=0x4
x86.cpu_features.features[0x0].cpuid[0x0]=0x50654
x86.cpu_features.features[0x0].cpuid[0x1]=0x800
x86.cpu_features.features[0x0].cpuid[0x2]=0xfffab223
x86.cpu_features.features[0x0].cpuid[0x3]=0xf8bfbff
x86.cpu_features.features[0x0].active[0x0]=0x0
x86.cpu_features.features[0x0].active[0x1]=0x0
x86.cpu_features.features[0x0].active[0x2]=0x7ed83203
x86.cpu_features.features[0x0].active[0x3]=0x7888110
x86.cpu_features.features[0x1].cpuid[0x0]=0x0
x86.cpu_features.features[0x1].cpuid[0x1]=0xd19f0fbb
x86.cpu_features.features[0x1].cpuid[0x2]=0x1c
x86.cpu_features.features[0x1].cpuid[0x3]=0xac000400
x86.cpu_features.features[0x1].active[0x0]=0x0
x86.cpu_features.features[0x1].active[0x1]=0xd18f0b38
x86.cpu_features.features[0x1].active[0x2]=0x18
x86.cpu_features.features[0x1].active[0x3]=0x0
x86.cpu_features.features[0x2].cpuid[0x0]=0x50654
x86.cpu_features.features[0x2].cpuid[0x1]=0x0
x86.cpu_features.features[0x2].cpuid[0x2]=0x121
x86.cpu_features.features[0x2].cpuid[0x3]=0x2c100800
x86.cpu_features.features[0x2].active[0x0]=0x0
x86.cpu_features.features[0x2].active[0x1]=0x0
x86.cpu_features.features[0x2].active[0x2]=0x121
x86.cpu_features.features[0x2].active[0x3]=0x8000000
x86.cpu_features.features[0x3].cpuid[0x0]=0xf
x86.cpu_features.features[0x3].cpuid[0x1]=0x988
x86.cpu_features.features[0x3].cpuid[0x2]=0x0
x86.cpu_features.features[0x3].cpuid[0x3]=0x0
x86.cpu_features.features[0x3].active[0x0]=0x7
x86.cpu_features.features[0x3].active[0x1]=0x0
x86.cpu_features.features[0x3].active[0x2]=0x0
x86.cpu_features.features[0x3].active[0x3]=0x0
x86.cpu_features.features[0x4].cpuid[0x0]=0x0
x86.cpu_features.features[0x4].cpuid[0x1]=0x0
x86.cpu_features.features[0x4].cpuid[0x2]=0x0
x86.cpu_features.features[0x4].cpuid[0x3]=0x0
x86.cpu_features.features[0x4].active[0x0]=0x0
x86.cpu_features.features[0x4].active[0x1]=0x0
x86.cpu_features.features[0x4].active[0x2]=0x0
x86.cpu_features.features[0x4].active[0x3]=0x0
x86.cpu_features.features[0x5].cpuid[0x0]=0x302e
x86.cpu_features.features[0x5].cpuid[0x1]=0x100d000
x86.cpu_features.features[0x5].cpuid[0x2]=0x0
x86.cpu_features.features[0x5].cpuid[0x3]=0x0
x86.cpu_features.features[0x5].active[0x0]=0x0
x86.cpu_features.features[0x5].active[0x1]=0x0
x86.cpu_features.features[0x5].active[0x2]=0x0
x86.cpu_features.features[0x5].active[0x3]=0x0
x86.cpu_features.features[0x6].cpuid[0x0]=0x0
x86.cpu_features.features[0x6].cpuid[0x1]=0x0
x86.cpu_features.features[0x6].cpuid[0x2]=0x0
x86.cpu_features.features[0x6].cpuid[0x3]=0x0
x86.cpu_features.features[0x6].active[0x0]=0x0
x86.cpu_features.features[0x6].active[0x1]=0x0
x86.cpu_features.features[0x6].active[0x2]=0x0
x86.cpu_features.features[0x6].active[0x3]=0x0
x86.cpu_features.features[0x7].cpuid[0x0]=0x0
x86.cpu_features.features[0x7].cpuid[0x1]=0x0
x86.cpu_features.features[0x7].cpuid[0x2]=0x0
x86.cpu_features.features[0x7].cpuid[0x3]=0x0
x86.cpu_features.features[0x7].active[0x0]=0x0
x86.cpu_features.features[0x7].active[0x1]=0x0
x86.cpu_features.features[0x7].active[0x2]=0x0
x86.cpu_features.features[0x7].active[0x3]=0x0
x86.cpu_features.features[0x8].cpuid[0x0]=0x0
x86.cpu_features.features[0x8].cpuid[0x1]=0x0
x86.cpu_features.features[0x8].cpuid[0x2]=0x0
x86.cpu_features.features[0x8].cpuid[0x3]=0x0
x86.cpu_features.features[0x8].active[0x0]=0x0
x86.cpu_features.features[0x8].active[0x1]=0x0
x86.cpu_features.features[0x8].active[0x2]=0x0
x86.cpu_features.features[0x8].active[0x3]=0x0
x86.cpu_features.preferred.Fast_Rep_String=0x1
x86.cpu_features.preferred.Fast_Copy_Backward=0x0
x86.cpu_features.preferred.Slow_BSF=0x0
x86.cpu_features.preferred.Fast_Unaligned_Load=0x1
x86.cpu_features.preferred.Prefer_PMINUB_for_stringop=0x1
x86.cpu_features.preferred.Fast_Unaligned_Copy=0x1
x86.cpu_features.preferred.I586=0x1
x86.cpu_features.preferred.I686=0x1
x86.cpu_features.preferred.Slow_SSE4_2=0x0
x86.cpu_features.preferred.AVX_Fast_Unaligned_Load=0x1
x86.cpu_features.preferred.Prefer_MAP_32BIT_EXEC=0x0
x86.cpu_features.preferred.Prefer_No_VZEROUPPER=0x1
x86.cpu_features.preferred.Prefer_ERMS=0x0
x86.cpu_features.preferred.Prefer_No_AVX512=0x1
x86.cpu_features.preferred.MathVec_Prefer_No_AVX512=0x0
x86.cpu_features.preferred.Prefer_FSRM=0x0
x86.cpu_features.preferred.Avoid_Short_Distance_REP_MOVSB=0x0
x86.cpu_features.isa_1=0xf
x86.cpu_features.xsave_state_size=0x9c0
x86.cpu_features.xsave_state_full_size=0xb00
x86.cpu_features.data_cache_size=0x8000
x86.cpu_features.shared_cache_size=0x1000000
x86.cpu_features.non_temporal_threshold=0xc00000
x86.cpu_features.rep_movsb_threshold=0x2000
x86.cpu_features.rep_movsb_stop_threshold=0xc00000
x86.cpu_features.rep_stosb_threshold=0x800
x86.cpu_features.level1_icache_size=0x8000
x86.cpu_features.level1_icache_linesize=0x40
x86.cpu_features.level1_dcache_size=0x8000
x86.cpu_features.level1_dcache_assoc=0x8
x86.cpu_features.level1_dcache_linesize=0x40
x86.cpu_features.level2_cache_size=0x200000
x86.cpu_features.level2_cache_assoc=0x8
x86.cpu_features.level2_cache_linesize=0x40
x86.cpu_features.level3_cache_size=0x1000000
x86.cpu_features.level3_cache_assoc=0x10
x86.cpu_features.level3_cache_linesize=0x40
x86.cpu_features.level4_cache_size=0x0

That's glibc-2.34.9000-21.fc36.x86_64
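
For anyone who wants to compare such dumps programmatically, here is a minimal sketch that parses the `key=value` lines of `--list-diagnostics` output into a dictionary. The function names `parse_diagnostics` and `preferred_flags` are made up for this illustration, and the sample text is a handful of lines copied from the output above, not the full dump.

```python
def parse_diagnostics(text):
    """Parse `key=value` lines into a dict.

    Quoted values become strings; everything else is tried as a hex number
    and kept as the raw string if that fails.
    """
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if "=" not in line:
            continue  # e.g. the shell trace line "+ /lib64/ld-linux-x86-64.so.2 ..."
        key, _, raw = line.partition("=")
        if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
            result[key] = raw[1:-1]
        else:
            try:
                result[key] = int(raw, 16)
            except ValueError:
                result[key] = raw
    return result


def preferred_flags(diag):
    """Return the x86.cpu_features.preferred.* entries as name -> bool."""
    prefix = "x86.cpu_features.preferred."
    return {k[len(prefix):]: bool(v) for k, v in diag.items() if k.startswith(prefix)}


# A few lines taken from the dump above, for illustration only.
sample = """\
dl_platform="haswell"
version.version="2.34.9000"
x86.cpu_features.basic.model=0x55
x86.cpu_features.preferred.Prefer_No_VZEROUPPER=0x1
x86.cpu_features.preferred.Prefer_No_AVX512=0x1
x86.cpu_features.preferred.Prefer_ERMS=0x0
"""

diag = parse_diagnostics(sample)
flags = preferred_flags(diag)
print(diag["dl_platform"], hex(diag["x86.cpu_features.basic.model"]))
print(sorted(name for name, on in flags.items() if on))
```

Feeding two complete dumps through `parse_diagnostics` and comparing the resulting dictionaries gives the same information as a textual diff, without the noise from line ordering.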

Comment 25 Severin Gehwolf 2021-12-02 14:34:23 UTC
Since my info is from a custom setup where we "seem" to reproduce, here is the --list-diagnostics output from Daniel's rawhide build. Pasting here as the build log might disappear:

+ /lib64/ld-linux-x86-64.so.2 --list-diagnostics
dl_discover_osversion=0x50f04
dl_dst_lib="lib64"
dl_hwcap=0x6
dl_hwcap_important=0x6
dl_hwcap2=0x2
dl_hwcaps_subdirs="x86-64-v4:x86-64-v3:x86-64-v2"
dl_hwcaps_subdirs_active=0x7
dl_osversion=0x0
dl_pagesize=0x1000
dl_platform="haswell"
dl_profile_output="/var/tmp"
dl_string_platform=0x32
dso.ld="ld-linux-x86-64.so.2"
dso.libc="libc.so.6"
env_filtered[0x0]="SHELL"
env_filtered[0x1]="RPM_SOURCE_DIR"
env_filtered[0x2]="HISTCONTROL"
env_filtered[0x3]="PKG_CONFIG_PATH"
env_filtered[0x4]="HOSTNAME"
env_filtered[0x5]="HISTSIZE"
env_filtered[0x6]="PWD"
env_filtered[0x7]="SOURCE_DATE_EPOCH"
env_filtered[0x8]="LOGNAME"
env_filtered[0x9]="RPM_ARCH"
env_filtered[0xa]="HOME"
env[0xb]="LANG=C"
env_filtered[0xc]="RPM_LD_FLAGS"
env_filtered[0xd]="PROMPT_COMMAND"
env_filtered[0xe]="RPM_PACKAGE_RELEASE"
env_filtered[0xf]="RPM_OS"
env_filtered[0x10]="TERM"
env_filtered[0x11]="LESSOPEN"
env_filtered[0x12]="USER"
env_filtered[0x13]="SHLVL"
env_filtered[0x14]="RPM_BUILD_DIR"
env_filtered[0x15]="RPM_OPT_FLAGS"
env_filtered[0x16]="RPM_DOC_DIR"
env_filtered[0x17]="RPM_PACKAGE_VERSION"
env_filtered[0x18]="DEBUGINFOD_URLS"
env_filtered[0x19]="which_declare"
env_filtered[0x1a]="CONFIG_SITE"
env[0x1b]="PATH=/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/sbin"
env_filtered[0x1c]="MAIL"
env_filtered[0x1d]="RPM_BUILD_NCPUS"
env_filtered[0x1e]="RPM_PACKAGE_NAME"
env_filtered[0x1f]="RPM_BUILD_ROOT"
env_filtered[0x20]="OLDPWD"
env_filtered[0x21]="BASH_FUNC_which%%"
env_filtered[0x22]="_"
path.prefix="/usr"
path.rtld="/lib64/ld-linux-x86-64.so.2"
path.sysconfdir="/etc"
path.system_dirs[0x0]="/lib64/"
path.system_dirs[0x1]="/usr/lib64/"
version.release="development"
version.version="2.34.9000"
auxv[0x0].a_type=0x21
auxv[0x0].a_val=0x7ffefe905000
auxv[0x1].a_type=0x33
auxv[0x1].a_val=0xe30
auxv[0x2].a_type=0x10
auxv[0x2].a_val=0xf8bfbff
auxv[0x3].a_type=0x6
auxv[0x3].a_val=0x1000
auxv[0x4].a_type=0x11
auxv[0x4].a_val=0x64
auxv[0x5].a_type=0x3
auxv[0x5].a_val=0x7f5fcc3f5040
auxv[0x6].a_type=0x4
auxv[0x6].a_val=0x38
auxv[0x7].a_type=0x5
auxv[0x7].a_val=0xb
auxv[0x8].a_type=0x7
auxv[0x8].a_val=0x0
auxv[0x9].a_type=0x8
auxv[0x9].a_val=0x0
auxv[0xa].a_type=0x9
auxv[0xa].a_val=0x7f5fcc3f6090
auxv[0xb].a_type=0xb
auxv[0xb].a_val=0x3e8
auxv[0xc].a_type=0xc
auxv[0xc].a_val=0x3e8
auxv[0xd].a_type=0xd
auxv[0xd].a_val=0x1a9
auxv[0xe].a_type=0xe
auxv[0xe].a_val=0x1a9
auxv[0xf].a_type=0x17
auxv[0xf].a_val=0x0
auxv[0x10].a_type=0x19
auxv[0x10].a_val=0x7ffefe8b5309
auxv[0x11].a_type=0x1a
auxv[0x11].a_val=0x2
auxv[0x12].a_type=0x1f
auxv[0x12].a_val="/lib64/ld-linux-x86-64.so.2"
auxv[0x13].a_type=0xf
auxv[0x13].a_val="x86_64"
uname.sysname="Linux"
uname.nodename="buildvm-x86-31.iad2.fedoraproject.org"
uname.release="5.15.4-101.fc34.x86_64"
uname.version="#1 SMP Tue Nov 23 18:58:50 UTC 2021"
uname.machine="x86_64"
uname.domainname="fedoraproject.org"
x86.cpu_features.basic.kind=0x1
x86.cpu_features.basic.max_cpuid=0xd
x86.cpu_features.basic.family=0x6
x86.cpu_features.basic.model=0x55
x86.cpu_features.basic.stepping=0x6
x86.cpu_features.features[0x0].cpuid[0x0]=0x50656
x86.cpu_features.features[0x0].cpuid[0x1]=0x5000800
x86.cpu_features.features[0x0].cpuid[0x2]=0xfffa3223
x86.cpu_features.features[0x0].cpuid[0x3]=0xf8bfbff
x86.cpu_features.features[0x0].active[0x0]=0x0
x86.cpu_features.features[0x0].active[0x1]=0x0
x86.cpu_features.features[0x0].active[0x2]=0x7ed83203
x86.cpu_features.features[0x0].active[0x3]=0x7888110
x86.cpu_features.features[0x1].cpuid[0x0]=0x0
x86.cpu_features.features[0x1].cpuid[0x1]=0xd19f07ab
x86.cpu_features.features[0x1].cpuid[0x2]=0x81c
x86.cpu_features.features[0x1].cpuid[0x3]=0xac000400
x86.cpu_features.features[0x1].active[0x0]=0x0
x86.cpu_features.features[0x1].active[0x1]=0xd18f0328
x86.cpu_features.features[0x1].active[0x2]=0x818
x86.cpu_features.features[0x1].active[0x3]=0x0
x86.cpu_features.features[0x2].cpuid[0x0]=0x50656
x86.cpu_features.features[0x2].cpuid[0x1]=0x0
x86.cpu_features.features[0x2].cpuid[0x2]=0x121
x86.cpu_features.features[0x2].cpuid[0x3]=0x2c100800
x86.cpu_features.features[0x2].active[0x0]=0x0
x86.cpu_features.features[0x2].active[0x1]=0x0
x86.cpu_features.features[0x2].active[0x2]=0x121
x86.cpu_features.features[0x2].active[0x3]=0x8000000
x86.cpu_features.features[0x3].cpuid[0x0]=0xf
x86.cpu_features.features[0x3].cpuid[0x1]=0x988
x86.cpu_features.features[0x3].cpuid[0x2]=0x0
x86.cpu_features.features[0x3].cpuid[0x3]=0x0
x86.cpu_features.features[0x3].active[0x0]=0x7
x86.cpu_features.features[0x3].active[0x1]=0x0
x86.cpu_features.features[0x3].active[0x2]=0x0
x86.cpu_features.features[0x3].active[0x3]=0x0
x86.cpu_features.features[0x4].cpuid[0x0]=0x0
x86.cpu_features.features[0x4].cpuid[0x1]=0x0
x86.cpu_features.features[0x4].cpuid[0x2]=0x0
x86.cpu_features.features[0x4].cpuid[0x3]=0x0
x86.cpu_features.features[0x4].active[0x0]=0x0
x86.cpu_features.features[0x4].active[0x1]=0x0
x86.cpu_features.features[0x4].active[0x2]=0x0
x86.cpu_features.features[0x4].active[0x3]=0x0
x86.cpu_features.features[0x5].cpuid[0x0]=0x302e
x86.cpu_features.features[0x5].cpuid[0x1]=0x1009000
x86.cpu_features.features[0x5].cpuid[0x2]=0x0
x86.cpu_features.features[0x5].cpuid[0x3]=0x0
x86.cpu_features.features[0x5].active[0x0]=0x0
x86.cpu_features.features[0x5].active[0x1]=0x0
x86.cpu_features.features[0x5].active[0x2]=0x0
x86.cpu_features.features[0x5].active[0x3]=0x0
x86.cpu_features.features[0x6].cpuid[0x0]=0x0
x86.cpu_features.features[0x6].cpuid[0x1]=0x0
x86.cpu_features.features[0x6].cpuid[0x2]=0x0
x86.cpu_features.features[0x6].cpuid[0x3]=0x0
x86.cpu_features.features[0x6].active[0x0]=0x0
x86.cpu_features.features[0x6].active[0x1]=0x0
x86.cpu_features.features[0x6].active[0x2]=0x0
x86.cpu_features.features[0x6].active[0x3]=0x0
x86.cpu_features.features[0x7].cpuid[0x0]=0x0
x86.cpu_features.features[0x7].cpuid[0x1]=0x0
x86.cpu_features.features[0x7].cpuid[0x2]=0x0
x86.cpu_features.features[0x7].cpuid[0x3]=0x0
x86.cpu_features.features[0x7].active[0x0]=0x0
x86.cpu_features.features[0x7].active[0x1]=0x0
x86.cpu_features.features[0x7].active[0x2]=0x0
x86.cpu_features.features[0x7].active[0x3]=0x0
x86.cpu_features.features[0x8].cpuid[0x0]=0x0
x86.cpu_features.features[0x8].cpuid[0x1]=0x0
x86.cpu_features.features[0x8].cpuid[0x2]=0x0
x86.cpu_features.features[0x8].cpuid[0x3]=0x0
x86.cpu_features.features[0x8].active[0x0]=0x0
x86.cpu_features.features[0x8].active[0x1]=0x0
x86.cpu_features.features[0x8].active[0x2]=0x0
x86.cpu_features.features[0x8].active[0x3]=0x0
x86.cpu_features.preferred.Fast_Rep_String=0x1
x86.cpu_features.preferred.Fast_Copy_Backward=0x0
x86.cpu_features.preferred.Slow_BSF=0x0
x86.cpu_features.preferred.Fast_Unaligned_Load=0x1
x86.cpu_features.preferred.Prefer_PMINUB_for_stringop=0x1
x86.cpu_features.preferred.Fast_Unaligned_Copy=0x1
x86.cpu_features.preferred.I586=0x1
x86.cpu_features.preferred.I686=0x1
x86.cpu_features.preferred.Slow_SSE4_2=0x0
x86.cpu_features.preferred.AVX_Fast_Unaligned_Load=0x1
x86.cpu_features.preferred.Prefer_MAP_32BIT_EXEC=0x0
x86.cpu_features.preferred.Prefer_No_VZEROUPPER=0x0
x86.cpu_features.preferred.Prefer_ERMS=0x0
x86.cpu_features.preferred.Prefer_No_AVX512=0x1
x86.cpu_features.preferred.MathVec_Prefer_No_AVX512=0x0
x86.cpu_features.preferred.Prefer_FSRM=0x0
x86.cpu_features.preferred.Avoid_Short_Distance_REP_MOVSB=0x0
x86.cpu_features.isa_1=0xf
x86.cpu_features.xsave_state_size=0x9c0
x86.cpu_features.xsave_state_full_size=0xb00
x86.cpu_features.data_cache_size=0x8000
x86.cpu_features.shared_cache_size=0x1000000
x86.cpu_features.non_temporal_threshold=0xc00000
x86.cpu_features.rep_movsb_threshold=0x2000
x86.cpu_features.rep_movsb_stop_threshold=0xc00000
x86.cpu_features.rep_stosb_threshold=0x800
x86.cpu_features.level1_icache_size=0x8000
x86.cpu_features.level1_icache_linesize=0x40
x86.cpu_features.level1_dcache_size=0x8000
x86.cpu_features.level1_dcache_assoc=0x8
x86.cpu_features.level1_dcache_linesize=0x40
x86.cpu_features.level2_cache_size=0x200000
x86.cpu_features.level2_cache_assoc=0x8
x86.cpu_features.level2_cache_linesize=0x40
x86.cpu_features.level3_cache_size=0x1000000
x86.cpu_features.level3_cache_assoc=0x10
x86.cpu_features.level3_cache_linesize=0x40
x86.cpu_features.level4_cache_size=0x0

Comment 26 Severin Gehwolf 2021-12-02 14:56:32 UTC
In case it's useful, here is a diff of the '/lib64/ld-linux-x86-64.so.2 --list-diagnostics' output between a working glibc (2.34.9000-15.fc36) and a glibc where we see those issues (2.34.9000-21.fc36; release 16 and upwards, actually):

$ diff -u list_diags_glibc*
--- list_diags_glibc-2.34.9000-15.fc36.x86_64	2021-12-02 15:49:11.383652491 +0100
+++ list_diags_glibc_2.34.9000-21.fc36.x86_64.txt	2021-12-02 15:19:44.629731791 +0100
@@ -46,7 +46,7 @@
 version.release="development"
 version.version="2.34.9000"
 auxv[0x0].a_type=0x21
-auxv[0x0].a_val=0x7ffca09a6000
+auxv[0x0].a_val=0x7ffe9f38b000
 auxv[0x1].a_type=0x10
 auxv[0x1].a_val=0xf8bfbff
 auxv[0x2].a_type=0x6
@@ -54,7 +54,7 @@
 auxv[0x3].a_type=0x11
 auxv[0x3].a_val=0x64
 auxv[0x4].a_type=0x3
-auxv[0x4].a_val=0x7f0abe726040
+auxv[0x4].a_val=0x7fc98a7a6040
 auxv[0x5].a_type=0x4
 auxv[0x5].a_val=0x38
 auxv[0x6].a_type=0x5
@@ -64,7 +64,7 @@
 auxv[0x8].a_type=0x8
 auxv[0x8].a_val=0x0
 auxv[0x9].a_type=0x9
-auxv[0x9].a_val=0x7f0abe727090
+auxv[0x9].a_val=0x7fc98a7a7090
 auxv[0xa].a_type=0xb
 auxv[0xa].a_val=0x0
 auxv[0xb].a_type=0xc
@@ -76,7 +76,7 @@
 auxv[0xe].a_type=0x17
 auxv[0xe].a_val=0x0
 auxv[0xf].a_type=0x19
-auxv[0xf].a_val=0x7ffca0812059
+auxv[0xf].a_val=0x7ffe9f265c09
 auxv[0x10].a_type=0x1a
 auxv[0x10].a_val=0x2
 auxv[0x11].a_type=0x1f
@@ -84,7 +84,7 @@
 auxv[0x12].a_type=0xf
 auxv[0x12].a_val="x86_64"
 uname.sysname="Linux"
-uname.nodename="53f4eb59030a49818a91e2e2add27c93"
+uname.nodename="6fac65290c9e478583f88d92dee1be0c"
 uname.release="5.11.12-300.fc34.x86_64"
 uname.version="#1 SMP Wed Apr 7 16:31:13 UTC 2021"
 uname.machine="x86_64"
@@ -96,7 +96,7 @@
 x86.cpu_features.basic.stepping=0x4
 x86.cpu_features.features[0x0].cpuid[0x0]=0x50654
 x86.cpu_features.features[0x0].cpuid[0x1]=0x800
-x86.cpu_features.features[0x0].cpuid[0x2]=0xfffa3203
+x86.cpu_features.features[0x0].cpuid[0x2]=0xfffab223
 x86.cpu_features.features[0x0].cpuid[0x3]=0xf8bfbff
 x86.cpu_features.features[0x0].active[0x0]=0x0
 x86.cpu_features.features[0x0].active[0x1]=0x0
@@ -182,7 +182,6 @@
 x86.cpu_features.preferred.Prefer_No_AVX512=0x1
 x86.cpu_features.preferred.MathVec_Prefer_No_AVX512=0x0
 x86.cpu_features.preferred.Prefer_FSRM=0x0
-x86.cpu_features.preferred.Prefer_AVX2_STRCMP=0x1
 x86.cpu_features.preferred.Avoid_Short_Distance_REP_MOVSB=0x0
 x86.cpu_features.isa_1=0xf
 x86.cpu_features.xsave_state_size=0x9c0
@@ -190,7 +189,7 @@
 x86.cpu_features.data_cache_size=0x8000
 x86.cpu_features.shared_cache_size=0x1000000
 x86.cpu_features.non_temporal_threshold=0xc00000
-x86.cpu_features.rep_movsb_threshold=0x1000
+x86.cpu_features.rep_movsb_threshold=0x2000
 x86.cpu_features.rep_movsb_stop_threshold=0xc00000
 x86.cpu_features.rep_stosb_threshold=0x800
 x86.cpu_features.level1_icache_size=0x8000
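
To make the `cpuid[0x2]` change in the diff above easier to read, here is a small sketch that lists which bit positions differ between the two values. The helper name `differing_bits` is made up for this illustration; the two constants are copied from the diff.

```python
def differing_bits(a, b):
    """Return the bit positions (0 = least significant) where a and b differ."""
    x = a ^ b
    return [i for i in range(32) if (x >> i) & 1]


old = 0xfffa3203  # features[0x0].cpuid[0x2] with glibc 2.34.9000-15
new = 0xfffab223  # features[0x0].cpuid[0x2] with glibc 2.34.9000-21

print(hex(old ^ new), differing_bits(old, new))  # prints: 0x8020 [5, 15]
```

Assuming this word holds ECX of CPUID leaf 1 (an assumption about how the dump is laid out, not something stated in this thread), bits 5 and 15 would be VMX and PDCM, neither of which is an AVX-512 feature. That would leave the removed Prefer_AVX2_STRCMP line as the more interesting difference.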

Comment 27 Daniel Berrangé 2021-12-02 19:09:27 UTC
(In reply to Severin Gehwolf from comment #18)
> With these glibc packages in a custom mock java-1.8.0-openjdk builds fine on
> this virt-setup, while with 2.34.9000-16 and better it fails. Question is
> what changed in 2.34.9000-16 that might trigger this bug on cascadelake
> virtualized  systems.

In testing EDK2 builds in koji against an F36 rawhide side-tag, I'm hitting a different failure point - 4 builds have now passed on 2.34.9000-16, but I got a failure first time out on 2.34.9000-17

    Upstream commit: d3bf2f5927d51258a51ac7fde04f4805f8ee294a
    
    - elf: Do not run DSO sorting if tunables is not enabled
    - riscv: Build with -mno-relax if linker does not support R_RISCV_ALIGN
    - x86-64: Replace movzx with movzbl
    - regex: Unnest nested functions in regcomp.c
    - Use Linux 5.15 in build-many-glibcs.py
    - elf: Assume disjointed .rela.dyn and .rela.plt for loader
    - i386: Explain why __HAVE_64B_ATOMICS has to be 0
    - benchtests: Add hypotf
    - benchtests: Make hypot input random
    - arm: Use have-mtls-dialect-gnu2 to check for ARM TLS descriptors support
    - arm: Use internal symbol for _dl_argv on _dl_start_user
    - x86-64: Remove Prefer_AVX2_STRCMP
    - x86-64: Improve EVEX strcmp with masked load

Based on the Koji VM hw_info.log, we have 'avx2' feature flag set, so the -16 build should have been using the AVX2 strcmp, while with -17 we should now be using the (improved) EVEX strcmp.

Comment 28 Severin Gehwolf 2021-12-02 19:27:53 UTC
(In reply to Daniel Berrangé from comment #27)
> (In reply to Severin Gehwolf from comment #18)
> > With these glibc packages in a custom mock java-1.8.0-openjdk builds fine on
> > this virt-setup, while with 2.34.9000-16 and better it fails. Question is
> > what changed in 2.34.9000-16 that might trigger this bug on cascadelake
> > virtualized  systems.
> 
> In testing EDK2 builds in koji against an F36 rawhide side-tag, I'm hitting
> a different failure point - 4 builds have now passed on 2.34.9000-16, but I
> got a failure first time out on 2.34.9000-17
> 
>     Upstream commit: d3bf2f5927d51258a51ac7fde04f4805f8ee294a
>     
>     - elf: Do not run DSO sorting if tunables is not enabled
>     - riscv: Build with -mno-relax if linker does not support R_RISCV_ALIGN
>     - x86-64: Replace movzx with movzbl
>     - regex: Unnest nested functions in regcomp.c
>     - Use Linux 5.15 in build-many-glibcs.py
>     - elf: Assume disjointed .rela.dyn and .rela.plt for loader
>     - i386: Explain why __HAVE_64B_ATOMICS has to be 0
>     - benchtests: Add hypotf
>     - benchtests: Make hypot input random
>     - arm: Use have-mtls-dialect-gnu2 to check for ARM TLS descriptors
> support
>     - arm: Use internal symbol for _dl_argv on _dl_start_user
>     - x86-64: Remove Prefer_AVX2_STRCMP
>     - x86-64: Improve EVEX strcmp with masked load
> 
> Based on the Koji VM hw_info.log, we have 'avx2' feature flag set, so the
> -16 build should have been using the AVX2 strcmp, while with -17 we should
> now be using the (improved) EVEX strcmp.

That's probably true for us as well. Looking back at Koschei[1] for java-1.8.0-openjdk,
the first failure was actually seen with 2.34.9000-17.fc36 on November 6. So the upgrade
of the build hosts on Nov 8/9 might actually be a red herring. We don't yet know whether the
issue is present with the RHEL 8.4 virt stack and a rawhide build with glibc 2.34.9000-17.fc36.

[1] https://koschei.fedoraproject.org/package/java-1.8.0-openjdk?last_seen_ts=1636467944&collection=f36

Comment 29 H.J. Lu 2021-12-02 23:51:18 UTC
Does it fail only on RHEL 8.5 hosts?  Will it ever fail on Fedora 35 hosts?

Comment 30 Daniel Berrangé 2021-12-03 09:06:12 UTC
Personally I've only been able to reproduce in the Koji builders.

Comment 31 Severin Gehwolf 2021-12-03 09:47:02 UTC
(In reply to H.J. Lu from comment #29)
> Does it fail only on RHEL 8.5 hosts?  Will it ever fail on Fedora 35 hosts?

I don't know. What I do know is that it doesn't fail for me on F34 (host) and F34 (guest) with rawhide mock in the guest. But that's not a cascadelake cpu on the host. So it's an unknown.

Comment 32 Severin Gehwolf 2021-12-03 11:10:20 UTC
(In reply to H.J. Lu from comment #29)
> Does it fail only on RHEL 8.5 hosts?  Will it ever fail on Fedora 35 hosts?

We know this is the type of setup that reproduces the issue:

- Host hardware is Intel Xeon Gold CPU (Cascadelake)
- Host OS is RHEL 8.5 (we haven't checked other versions, yet)
- Guest VM is F34 (we haven't checked others yet; F35 would likely work too).
- Fedora rawhide mock chroot with glibc 2.34.9000-17.fc36 or newer

There might be others that reproduce too, but that's one we know for sure. It's also the setup Koji builders have.

Comment 33 H.J. Lu 2021-12-03 13:50:43 UTC
I have Fedora 35 running on Cascadelake.  I will try mock build for Fedora 35.

Comment 34 Daniel Berrangé 2021-12-03 16:10:24 UTC
I've confirmed it will NOT reproduce on my Skylake-Server VMs.  Since the EVEX impls of strcmp are selected based on

    
    if (CPU_FEATURE_USABLE_P (cpu_features, AVX512VL)
        && CPU_FEATURE_USABLE_P (cpu_features, AVX512BW)
        && CPU_FEATURE_USABLE_P (cpu_features, BMI2))
      return OPTIMIZE (evex);

I decided to try force disabling them by hiding the AVX512VL flag:

   export GLIBC_TUNABLES=glibc.cpu.hwcaps=-AVX512VL

With this set, I can get successful builds of EDK2 in Koji for F36.

In other words, this points to a bug in a glibc code path that is selected based on AVX512VL, most likely strcmp-related, given the changes in the 2.34.9000-17 build that caused the switch from the AVX2 to the EVEX strcmp impls.
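
The effect of the tunable on the quoted selection condition can be modelled roughly as follows. This is only a sketch of the reasoning, not glibc code: `apply_hwcaps_tunable` and `select_strcmp` are made-up names, the tunable parsing handles only `-NAME` removals, and the "non-evex fallback" label stands in for whatever glibc picks when the EVEX condition fails.

```python
# Features that the quoted condition requires before returning OPTIMIZE (evex).
EVEX_REQUIRED = {"AVX512VL", "AVX512BW", "BMI2"}


def apply_hwcaps_tunable(features, tunables):
    """Return a copy of `features` without the `-NAME` entries of glibc.cpu.hwcaps.

    Simplified on purpose: other tunables and entries without a leading
    minus sign are ignored.
    """
    usable = set(features)
    for tunable in tunables.split(":"):
        name, _, value = tunable.partition("=")
        if name != "glibc.cpu.hwcaps":
            continue
        for entry in value.split(","):
            if entry.startswith("-"):
                usable.discard(entry[1:])
    return usable


def select_strcmp(usable):
    """Mirror the quoted condition; the fallback label is hypothetical."""
    if EVEX_REQUIRED <= usable:
        return "evex"
    return "non-evex fallback"


cpu = {"AVX2", "BMI2", "AVX512VL", "AVX512BW"}
masked = apply_hwcaps_tunable(cpu, "glibc.cpu.hwcaps=-AVX512VL")

print(select_strcmp(cpu))     # prints: evex
print(select_strcmp(masked))  # prints: non-evex fallback
```

Since all three features must be usable at once, hiding any one of them is enough to keep the EVEX variant from being selected in this model.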

Comment 35 H.J. Lu 2021-12-03 17:17:44 UTC
This is caused by

commit c46e9afb2df5fc9e39ff4d13777e4b4c26e04e55
Author: H.J. Lu <hjl.tools>
Date:   Fri Oct 29 12:40:20 2021 -0700

    x86-64: Improve EVEX strcmp with masked load

Comment 36 Florian Weimer 2021-12-03 17:36:54 UTC
(In reply to H.J. Lu from comment #35)
> This is caused by
> 
> commit c46e9afb2df5fc9e39ff4d13777e4b4c26e04e55
> Author: H.J. Lu <hjl.tools>
> Date:   Fri Oct 29 12:40:20 2021 -0700
> 
>     x86-64: Improve EVEX strcmp with masked load

Thanks, H.J. Reassigning to Fedora glibc.

Comment 37 Florian Weimer 2021-12-03 17:57:23 UTC
I'm working on an emergency fix that removes the EVEX routines as IFUNC candidates.

Comment 38 Severin Gehwolf 2021-12-03 18:00:19 UTC
(In reply to Daniel Berrangé from comment #34)
> I decided to try force disabling them by hiding the AVX512VL flag:
> 
>    export GLIBC_TUNABLES=glibc.cpu.hwcaps=-AVX512VL
> 
> With this set, I can get successful builds of EDK2 in Koji for F36.

I can confirm. This work-around makes our builds pass again too. Thanks for this Daniel!
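
If the work-around needs to be applied to a single command rather than exported for a whole shell, something along these lines should do. It is a minimal sketch: `run_without_avx512vl` is a made-up helper name, it overwrites any GLIBC_TUNABLES value that is already set, and the demo child process only echoes the variable back instead of running a real build.

```python
import os
import subprocess
import sys


def run_without_avx512vl(cmd):
    """Run `cmd` with GLIBC_TUNABLES set to the work-around from comment #34."""
    env = dict(os.environ)
    # Note: this replaces any GLIBC_TUNABLES value already in the environment.
    env["GLIBC_TUNABLES"] = "glibc.cpu.hwcaps=-AVX512VL"
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)


# Demo: the child just prints the variable it received.
result = run_without_avx512vl(
    [sys.executable, "-c", "import os; print(os.environ['GLIBC_TUNABLES'])"]
)
print(result.stdout.strip())
```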

Comment 39 Florian Weimer 2021-12-03 21:05:29 UTC
I've submitted an update to Bodhi: https://bodhi.fedoraproject.org/updates/FEDORA-2021-92837f7e1f

Not sure why it's not showing up here.

Comment 40 Andrew John Hughes 2021-12-04 03:24:19 UTC
Building OpenJDK in the side tag with the new glibc works: https://koji.fedoraproject.org/koji/taskinfo?taskID=79564840

Can we get this into the f36 buildroot?

Comment 41 H.J. Lu 2021-12-04 05:17:52 UTC
It has been fixed on glibc master branch.  Please verify it.

Comment 42 Florian Weimer 2021-12-04 09:28:03 UTC
I botched the Bodhi push for glibc-2.34.9000-24.fc36 because I got confused by the OpenJDK update in it. I'm trying another push for glibc-2.34.9000-25.fc36. The real fix with glibc-2.34.9000-26.fc36 is also on its way.

Comment 43 Fedora Update System 2021-12-04 09:29:36 UTC
FEDORA-2021-92837f7e1f has been pushed to the Fedora 36 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 44 Fedora Update System 2021-12-05 14:33:28 UTC
FEDORA-2021-b4832cd9cb has been pushed to the Fedora 36 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 45 Andrew John Hughes 2021-12-05 22:05:22 UTC
Sorry for the confusion. I can confirm builds of OpenJDK 11 & 17 are now passing reliably with the glibc in f36/rawhide on the Copperlake VM.

Comment 46 Severin Gehwolf 2021-12-07 09:37:31 UTC
Could somebody explain to me why we weren't able to reproduce this bug on physical hardware? Is that wrong? If so, what was the magic to be able to trigger it there?

Comment 47 Florian Weimer 2021-12-07 11:16:17 UTC
(In reply to Severin Gehwolf from comment #46)
> Could somebody explain to me why we weren't able to reproduce this bug on
> physical hardware? Is that wrong? If so, what was the magic to be able to
> trigger it there?

I can reproduce it on a physical machine. I just got this when building java-17-openjdk (241e828cfe2ef51c61e5d0e544f613f3ee7bc960) against glibc-2.34.9000-22.fc36.x86_64:

gmake[3]: *** No rule to make target '/builddir/build/BUILD/java-17-openjdk-17.0.1.0.12-9.rolling.fc36.x86_64/build/jdk17.build-slowdebug-main/hotspot/variant-server/libjvm/objs/classLoaderData.d'.  Stop.

This is a freshly-installed machine with a Xeon Gold 5118 (so not even Cascade Lake).

Comment 50 H.J. Lu 2021-12-07 12:39:55 UTC
(In reply to Florian Weimer from comment #47)
> (In reply to Severin Gehwolf from comment #46)
> > Could somebody explain to me why we weren't able to reproduce this bug on
> > physical hardware? Is that wrong? If so, what was the magic to be able to
> > trigger it there?
> 
> I can reproduce it on a physical machine. I just got this when building
> java-17-openjdk (241e828cfe2ef51c61e5d0e544f613f3ee7bc960) against
> glibc-2.34.9000-22.fc36.x86_64:
> 
> gmake[3]: *** No rule to make target
> '/builddir/build/BUILD/java-17-openjdk-17.0.1.0.12-9.rolling.fc36.x86_64/
> build/jdk17.build-slowdebug-main/hotspot/variant-server/libjvm/objs/
> classLoaderData.d'.  Stop.
> 
> This is a freshly-installed machine with a Xeon Gold 5118 (so not even
> Cascade Lake).

I reproduced this issue on a Tiger Lake laptop by running

$ mock -r fedora-36-x86_64 /tmp/java-1.8.0-openjdk-1.8.0.312.b07-2.fc36.src.rpm

and extracted a glibc testcase from it.  The glibc testcase failed on all
AVX512 machines.

Comment 51 Severin Gehwolf 2021-12-07 12:49:52 UTC
(In reply to Florian Weimer from comment #47)
> (In reply to Severin Gehwolf from comment #46)
> > Could somebody explain to me why we weren't able to reproduce this bug on
> > physical hardware? Is that wrong? If so, what was the magic to be able to
> > trigger it there?
> 
> I can reproduce it on a physical machine. I just got this when building
> java-17-openjdk (241e828cfe2ef51c61e5d0e544f613f3ee7bc960) against
> glibc-2.34.9000-22.fc36.x86_64:
> 
> gmake[3]: *** No rule to make target
> '/builddir/build/BUILD/java-17-openjdk-17.0.1.0.12-9.rolling.fc36.x86_64/
> build/jdk17.build-slowdebug-main/hotspot/variant-server/libjvm/objs/
> classLoaderData.d'.  Stop.
> 
> This is a freshly-installed machine with a Xeon Gold 5118 (so not even
> Cascade Lake).

Deterministically? On those rare occasions where we hit the physical machines in koji we got successful builds.

Or does this physical model not reproduce?

Model name:                      Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz

Examples (of successful builds with affected glibc versions in BR):

https://koji.fedoraproject.org/koji/taskinfo?taskID=79432315
https://koji.fedoraproject.org/koji/taskinfo?taskID=78753824
https://koji.fedoraproject.org/koji/taskinfo?taskID=79573328
https://koji.fedoraproject.org/koji/taskinfo?taskID=79566441
https://koji.fedoraproject.org/koji/taskinfo?taskID=79573327

Comment 52 Daniel Berrangé 2021-12-07 12:53:39 UTC
(In reply to Severin Gehwolf from comment #51)
> (In reply to Florian Weimer from comment #47)
> > (In reply to Severin Gehwolf from comment #46)
> > This is a freshly-installed machine with a Xeon Gold 5118 (so not even
> > Cascade Lake).
> 
> Deterministically? On those rare occasions where we hit the physical
> machines in koji we got successful builds.
> 
> Or does this physical model not reproduce?
> 
> Model name:                      Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz
> 
> Examples (of successful builds with affected glibc versions in BR):
> 
> https://koji.fedoraproject.org/koji/taskinfo?taskID=79432315
> https://koji.fedoraproject.org/koji/taskinfo?taskID=78753824
> https://koji.fedoraproject.org/koji/taskinfo?taskID=79573328
> https://koji.fedoraproject.org/koji/taskinfo?taskID=79566441
> https://koji.fedoraproject.org/koji/taskinfo?taskID=79573327

Look at the 'hw_info.log' for the x86_64 task, eg for that first build we have

https://kojipkgs.fedoraproject.org//work/tasks/2480/79432480/hw_info.log

which does NOT report avx512 CPU flags, so it isn't affected by the bug.
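
The same check can be scripted against the flags line of /proc/cpuinfo or lscpu-style output. The sketch below assumes the usual Linux flag spellings (avx512vl, avx512bw, bmi2); `cpu_flags` and `may_select_evex` are made-up names, and the two sample flag lines are shortened illustrations, not copied from the linked hw_info.log.

```python
# Linux flag names corresponding to the three features in the EVEX condition.
EVEX_FLAGS = {"avx512vl", "avx512bw", "bmi2"}


def cpu_flags(cpuinfo_text):
    """Return the flags of the first "flags:" / "Flags:" line found."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "flags":
            return set(value.split())
    return set()


def may_select_evex(cpuinfo_text):
    """True if all flags the EVEX strcmp condition depends on are reported."""
    return EVEX_FLAGS <= cpu_flags(cpuinfo_text)


# Shortened, illustrative flag lines.
without_avx512 = "flags\t\t: fpu sse2 avx avx2 bmi1 bmi2\n"
with_avx512 = "Flags: fpu sse2 avx avx2 bmi1 bmi2 avx512f avx512bw avx512vl\n"

print(may_select_evex(without_avx512), may_select_evex(with_avx512))  # prints: False True
```

This only looks at what the kernel reports. glibc makes its own decision about which features are usable, so a True result here means the machine may be affected, not that the EVEX variant is definitely selected.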

Comment 53 Severin Gehwolf 2021-12-07 13:00:08 UTC
(In reply to Daniel Berrangé from comment #52)
> (In reply to Severin Gehwolf from comment #51)
> > (In reply to Florian Weimer from comment #47)
> > > (In reply to Severin Gehwolf from comment #46)
> > > This is a freshly-installed machine with a Xeon Gold 5118 (so not even
> > > Cascade Lake).
> > 
> > Deterministically? On those rare occasions where we hit the physical
> > machines in koji we got successful builds.
> > 
> > Or does this physical model not reproduce?
> > 
> > Model name:                      Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz
> > 
> > Examples (of successful builds with affected glibc versions in BR):
> > 
> > https://koji.fedoraproject.org/koji/taskinfo?taskID=79432315
> > https://koji.fedoraproject.org/koji/taskinfo?taskID=78753824
> > https://koji.fedoraproject.org/koji/taskinfo?taskID=79573328
> > https://koji.fedoraproject.org/koji/taskinfo?taskID=79566441
> > https://koji.fedoraproject.org/koji/taskinfo?taskID=79573327
> 
> Look at the 'hw_info.log' for the x86_64 task, eg for that first build we
> have
> 
> https://kojipkgs.fedoraproject.org//work/tasks/2480/79432480/hw_info.log
> 
> which does NOT report avx512 CPU flags, so it isn't affected by the bug.

OK, got it. Thanks!

