Bug 2026744 - Graphics initialization often fails after reboot in x86_64 UEFI VMs (Fedora 35)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: edk2
Version: 35
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Paolo Bonzini
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-11-25 17:26 UTC by Adam Williamson
Modified: 2021-12-09 01:11 UTC (History)

Fixed In Version: edk2-20211126gitbb1bba3d7767-1.fc35
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-12-09 01:11:57 UTC
Type: Bug
Embargoed:


Description Adam Williamson 2021-11-25 17:26:23 UTC
I recently upgraded the openQA staging instance 'worker' machines - the ones that run the actual tests, in qemu VMs - to Fedora 35. Since then, I have noticed that x86_64 UEFI tests often fail because graphics initialization fails after a reboot.

For instance, if you look at the x86_64 results for the current Rawhide compose on staging:

https://openqa.stg.fedoraproject.org/tests/overview?distri=fedora&version=Rawhide&build=Fedora-Rawhide-20211125.n.0&groupid=1

all the failures in _console_wait_login or _graphical_wait_login for tests whose names end in "@uefi" are of this type. There are 15 of them. If you compare with the exact same set of tests on production openQA (where the workers are still on Fedora 34), there are no failures of this kind:

https://openqa.fedoraproject.org/tests/overview?distri=fedora&version=Rawhide&build=Fedora-Rawhide-20211125.n.0&groupid=1

There are also no failures of this kind on staging for aarch64:

https://openqa.stg.fedoraproject.org/tests/overview?distri=fedora&version=Rawhide&build=Fedora-Rawhide-20211125.n.0&groupid=6

which indicates this bug is limited to x86_64 for some reason. I have also not seen any case where graphics initialization fails on the *first* boot within any given test run; it always seems to fail after a reboot.

The staging x86_64 worker host is running:

qemu-6.1.0-10.fc35.x86_64
edk2-ovmf-20210527gite1999b264f1f-2.fc35.noarch

Note that the host is also running an old kernel, 5.11.21-300.fc34.x86_64, due to https://bugzilla.redhat.com/show_bug.cgi?id=2009585 , though I doubt this is relevant. The openQA package versions are currently the same on prod and stg, so openQA is not doing anything differently between the two cases.

On affected boots, we just see the qemu "Display output is not active." message. Sometimes we briefly see a Fedora bootsplash screen before this (Fedora logo at the bottom, spinner, TianoCore logo and message at the top), but I'm not actually sure if that's happening during the shutdown or the startup.

Here's a qemu command line from an affected boot:

/usr/bin/qemu-system-x86_64 -vga virtio -only-migratable -chardev ringbuf,id=serial0,logfile=serial0,logappend=on -serial chardev:serial0 -audiodev none,id=snd0 -device intel-hda -device hda-output,audiodev=snd0 -global isa-fdc.fdtypeA=none -m 2048 -cpu Nehalem -netdev user,id=qanet0,net=172.16.2.0/24 -device virtio-net,netdev=qanet0,mac=52:54:00:12:34:56 -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-pci,rng=rng0 -device usb-ehci -device usb-tablet -smp 2 -enable-kvm -no-shutdown -vnc :106,share=force-shared -device virtio-serial -chardev pipe,id=virtio_console,path=virtio_console,logfile=virtio_console.log,logappend=on -device virtconsole,chardev=virtio_console,name=org.openqa.console.virtio_console -chardev pipe,id=virtio_console1,path=virtio_console1,logfile=virtio_console1.log,logappend=on -device virtconsole,chardev=virtio_console1,name=org.openqa.console.virtio_console1 -chardev socket,path=qmp_socket,server,nowait,id=qmp_socket,logfile=qmp_socket.log,logappend=on -qmp chardev:qmp_socket -S -device virtio-scsi-pci,id=scsi0 -blockdev driver=file,node-name=hd0-file,filename=/var/lib/openqa/pool/16/raid/hd0,cache.no-flush=on -blockdev driver=qcow2,node-name=hd0,file=hd0-file,cache.no-flush=on -device virtio-blk,id=hd0-device,drive=hd0,serial=hd0 -blockdev driver=file,node-name=cd0-overlay0-file,filename=/var/lib/openqa/pool/16/raid/cd0-overlay0,cache.no-flush=on -blockdev driver=qcow2,node-name=cd0-overlay0,file=cd0-overlay0-file,cache.no-flush=on -device scsi-cd,id=cd0-device,drive=cd0-overlay0,serial=cd0 -drive id=pflash-code-overlay0,if=pflash,file=/var/lib/openqa/pool/16/raid/pflash-code-overlay0,unit=0,readonly=on -drive id=pflash-vars-overlay0,if=pflash,file=/var/lib/openqa/pool/16/raid/pflash-vars-overlay0,unit=1

Comment 1 Adam Williamson 2021-11-25 17:34:08 UTC
Oh, the pflash-code-overlay is backed by /usr/share/edk2/ovmf/OVMF_CODE.fd and the pflash-vars-overlay is backed by /usr/share/edk2/ovmf/OVMF_VARS.fd , i.e. we're just using the standard files for those.
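
For readers who want to check which firmware images a given openQA job used, the pflash drives can be pulled out of the qemu command line mechanically. The following is a minimal sketch, not part of openQA: the helper name `pflash_drives` is hypothetical, the command line is a shortened excerpt of the one in the report, and it assumes no option value contains an escaped comma.

```python
import shlex


def pflash_drives(cmdline: str) -> list[dict[str, str]]:
    """Return the key=value options of every '-drive' argument with if=pflash."""
    args = shlex.split(cmdline)
    drives = []
    for flag, value in zip(args, args[1:]):
        if flag != "-drive":
            continue
        # Options without '=' are ignored in this sketch.
        opts = dict(item.split("=", 1) for item in value.split(",") if "=" in item)
        if opts.get("if") == "pflash":
            drives.append(opts)
    return drives


# Shortened excerpt of the command line from the report (pflash arguments only).
CMDLINE = (
    "/usr/bin/qemu-system-x86_64 -vga virtio -m 2048 "
    "-drive id=pflash-code-overlay0,if=pflash,"
    "file=/var/lib/openqa/pool/16/raid/pflash-code-overlay0,unit=0,readonly=on "
    "-drive id=pflash-vars-overlay0,if=pflash,"
    "file=/var/lib/openqa/pool/16/raid/pflash-vars-overlay0,unit=1"
)

for drive in pflash_drives(CMDLINE):
    mode = "read-only" if drive.get("readonly") == "on" else "writable"
    print(f"unit {drive['unit']}: {drive['file']} ({mode})")
```

The backing file of each overlay could then be inspected with `qemu-img info` on the worker host.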

Comment 2 Adam Williamson 2021-11-25 20:57:26 UTC
It looks like edk2 is the culprit here. I downgraded edk2-ovmf to edk2-ovmf-20200801stable-4.fc34.noarch and re-ran the tests on the same compose:

https://openqa.stg.fedoraproject.org/tests/overview?build=Fedora-Rawhide-20211125.n.0&distri=fedora&version=Rawhide&groupid=1

as you can see, there are no longer any failures of the same type. All the UEFI install tests passed. So, the problem here was introduced between the 202008 and 202105 releases of edk2. I would try with 202108, but the package has a lot of patches and I'm not enough of an expert to re-diff them.

Comment 3 Gerd Hoffmann 2021-11-26 07:07:04 UTC
https://koji.fedoraproject.org/koji/taskinfo?taskID=78893852
202108 scratch build

I suspect this is a guest kernel issue though; what the firmware does or doesn't do shouldn't matter much once the kms driver has taken over.
Any chance this is virtio-vga with 5.16-rc1?

Comment 4 Adam Williamson 2021-11-26 16:52:16 UTC
I doubt that.

1) It doesn't happen if the host is running Fedora 34, running the exact same tests (i.e. the exact same guest software)
2) It doesn't happen with downgraded edk2 on the host (since I downgraded it, that host has run hundreds of tests, and not hit the bug once)
3) It happens on tests with lots of different guest contents - here it happened on a test of a Fedora 34 update: https://openqa.stg.fedoraproject.org/tests/1465144
4) As best as I can tell, the failure seems to be happening well before we reach the guest kernel

Thanks for the scratch build, I'll try that.

Comment 5 Adam Williamson 2021-11-26 19:00:46 UTC
Looks like edk2 202108 still has the bug:

https://openqa.stg.fedoraproject.org/tests/1469831#

I'll go back to 202008 for now.

Comment 6 Gerd Hoffmann 2021-11-29 10:34:37 UTC
(In reply to Adam Williamson from comment #4)
> I doubt that.
> 
> 4) As best as I can tell, the failure seems to be happening well before we
> reach the guest kernel

https://openqa.stg.fedoraproject.org/tests/1469831#step/_graphical_wait_login/3

That one is clearly in the guest kernel.  This is the plymouth boot screen.

The logs are not very helpful though.
serial0.txt has just the grub boot menu, no kernel log.
serial_terminal.txt has a login prompt, so the guest apparently
booted up just fine, only the graphical display is broken.

What is the exact guest configuration used by openqa?
Can you attach the libvirt domain xml?

https://openqa.stg.fedoraproject.org/tests/1469831#step/_graphical_wait_login/4

This is what the display looks like when the guest turns off the monitor.
You can typically see that when the guest has been sitting around idle
for a while and the screen saver kicks in.  Wiggle the mouse to wake up
the guest display, like on physical hardware.

It should certainly NOT happen in the middle of a boot though.
I suspect initial display on efifb works fine -> kms driver tries
to take over -> something goes wrong -> display is broken.

No clue how edk2 could trigger such a failure, although that apparently
is what happens.  A kernel log would be helpful to see whether the drm
driver throws any errors.

Comment 7 Gerd Hoffmann 2021-11-29 15:02:56 UTC
https://koji.fedoraproject.org/koji/taskinfo?taskID=79400390
2021-11 rebase

Comment 8 Adam Williamson 2021-11-29 16:21:19 UTC
"That one is clearly in the guest kernel.  This is the plymouth boot screen."

As I said in the initial report above: "Sometimes we briefly see a Fedora bootsplash screen before this (Fedora logo at the bottom, spinner, TianoCore logo and message at the top), but I'm not actually sure if that's happening during the shutdown or the startup." All we know about the screenshots you see there is that they happen between the test typing "reboot" and it failing. We can't tell exactly where in that sequence the reboot actually happened, unless we happen to capture pictures from elsewhere in the boot sequence, which we didn't.

"serial_terminal.txt has a login prompt, so the guest apparently booted up just fine, only the graphical display is broken."

Again, this could be from the initial boot that worked, not the reboot that failed. Contents from that boot won't be magically erased, or anything. I'd have to compare to a test that passed to see if there might be any observable difference there, though it seems kinda immaterial.

"What is the exact guest configuration used by openqa? Can you attach the libvirt domain xml?"

The exact guest configuration is the command line in the initial report. There is no libvirt domain xml. openQA does not use libvirt.

"It should certainly NOT happen in the middle of a boot though."

As I said, we do not know it's happening in the "middle" of a boot.

Comment 9 Adam Williamson 2021-11-29 16:25:28 UTC
Actually, if you step very carefully through the video at https://openqa.stg.fedoraproject.org/tests/1469831/video?filename=video.ogv , some shutdown-y messages show up after the bootsplash screen, then it immediately goes to the "Display not active" screen. So I think the bootsplash is indeed from shutdown, not startup.

Comment 10 Adam Williamson 2021-12-02 05:51:59 UTC
The 2021-11 rebase seems to do the trick. I ran a full set of tests with that installed on the worker and they didn't hit the bug.

Comment 11 Gerd Hoffmann 2021-12-02 11:22:47 UTC
new f35 scratch build (one more fix added)
https://koji.fedoraproject.org/koji/taskinfo?taskID=79476301

rawhide scratch build fails (same srpm), and I have no clue why:
https://koji.fedoraproject.org/koji/taskinfo?taskID=79504225

Note the rather strange lines in the build log ...

GNUmakefile:1402: warning: overriding recipe for target '/builddir/build/BUILD/edk2-bb1bba3d7767/Build/OvmfX64/DEBUG_GCC5/X64/CryptoPkg/Library/OpensslLib/OpensslLib/OUTPUT/openssl/crypto/rsa/rsa_x931.obj'
GNUmakefile:1102: warning: ignoring old recipe for target '/builddir/build/BUILD/edk2-bb1bba3d7767/Build/OvmfX64/DEBUG_GCC5/X64/CryptoPkg/Library/OpensslLib/OpensslLib/OUTPUT/openssl/crypto/rsa/rsa_x931.obj'
GNUmakefile:2222: warning: overriding recipe for target '/builddir/build/BUILD/edk2-bb1bba3d7767/Build/OvmfX64/DEBUG_GCC5/X64/CryptoPkg/Library/OpensslLib/OpensslLib/OUTPUT/openssl/crypto/ocsp/ocsp_err.obj'
GNUmakefile:1292: warning: ignoring old recipe for target '/builddir/build/BUILD/edk2-bb1bba3d7767/Build/OvmfX64/DEBUG_GCC5/X64/CryptoPkg/Library/OpensslLib/OpensslLib/OUTPUT/openssl/crypto/ocsp/ocsp_err.obj'
GNUmakefile:2592: warning: overriding recipe for target '/builddir/build/BUILD/edk2-bb1bba3d7767/Build/OvmfX64/DEBUG_GCC5/X64/CryptoPkg/Library/OpensslLib/OpensslLib/OUTPUT/openssl/crypto/rsa/rsa_x931.obj'
GNUmakefile:1402: warning: ignoring old recipe for target '/builddir/build/BUILD/edk2-bb1bba3d7767/Build/OvmfX64/DEBUG_GCC5/X64/CryptoPkg/Library/OpensslLib/OpensslLib/OUTPUT/openssl/crypto/rsa/rsa_x931.obj'
[ ... ]
make: *** No rule to make target '/builddir/build/BUILD/edk2-bb1bba3d7767/Build/OvmfX64/DEBUG_GCC5/X64/CryptoPkg/Library/OpensslLib/OpensslLib/OUTPUT/openssl/crypto/evp/m_md5_sha1.obj', needed by '/builddir/build/BUILD/edk2-bb1bba3d7767/Build/OvmfX64/DEBUG_GCC5/X64/CryptoPkg/Library/OpensslLib/OpensslLib/OUTPUT/OpensslLib.lib'.  Stop.

... which are not in the f35 logs.

Another build with a "cat -n GNUmakefile" hacked into the specfile doesn't give a clue either;
there are no oddities in the GNUmakefile:
https://koji.fedoraproject.org/koji/taskinfo?taskID=79475165

This blocks updating edk2; any hints are welcome.
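
When comparing a failing build log with a passing one, it can help to tally the "overriding recipe" warnings per target rather than reading them by eye. The following is a minimal sketch under stated assumptions: the function name `overridden_targets` is hypothetical, and the sample reuses the warnings quoted in this comment with the long build path replaced by an illustrative `OUTPUT/` prefix.

```python
import re
from collections import Counter

# Matches GNU make's "overriding recipe" warning and captures the target path.
OVERRIDE_RE = re.compile(
    r"^(?P<makefile>[^:\s]+):(?P<line>\d+): warning: "
    r"overriding recipe for target '(?P<target>[^']+)'"
)


def overridden_targets(log_text: str) -> Counter:
    """Count 'overriding recipe' warnings per target in a build log."""
    counts = Counter()
    for line in log_text.splitlines():
        match = OVERRIDE_RE.match(line.strip())
        if match:
            counts[match.group("target")] += 1
    return counts


# Same shape as the warnings quoted in this comment; paths are shortened
# with an illustrative OUTPUT/ prefix.
SAMPLE_LOG = """\
GNUmakefile:1402: warning: overriding recipe for target 'OUTPUT/openssl/crypto/rsa/rsa_x931.obj'
GNUmakefile:1102: warning: ignoring old recipe for target 'OUTPUT/openssl/crypto/rsa/rsa_x931.obj'
GNUmakefile:2222: warning: overriding recipe for target 'OUTPUT/openssl/crypto/ocsp/ocsp_err.obj'
GNUmakefile:1292: warning: ignoring old recipe for target 'OUTPUT/openssl/crypto/ocsp/ocsp_err.obj'
GNUmakefile:2592: warning: overriding recipe for target 'OUTPUT/openssl/crypto/rsa/rsa_x931.obj'
GNUmakefile:1402: warning: ignoring old recipe for target 'OUTPUT/openssl/crypto/rsa/rsa_x931.obj'
"""

for target, count in overridden_targets(SAMPLE_LOG).most_common():
    print(f"{count}x {target}")
```

A target that is reported as overridden more than once, as rsa_x931.obj is here, suggests make believes it has seen several recipes for the same file even though the generated GNUmakefile contains only one.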

Comment 12 Daniel Berrangé 2021-12-02 11:33:42 UTC
(In reply to Gerd Hoffmann from comment #11)
> new f35 scratch build (one more fix added)
> https://koji.fedoraproject.org/koji/taskinfo?taskID=79476301
> 
> rawhide scratch build fails (same srpm), and I have no clue why:
> https://koji.fedoraproject.org/koji/taskinfo?taskID=79504225
> 
> Note the rather strange lines in the build log ...
> 
> GNUmakefile:1402: warning: overriding recipe for target
> '/builddir/build/BUILD/edk2-bb1bba3d7767/Build/OvmfX64/DEBUG_GCC5/X64/
> CryptoPkg/Library/OpensslLib/OpensslLib/OUTPUT/openssl/crypto/rsa/rsa_x931.
> obj'
> GNUmakefile:1102: warning: ignoring old recipe for target
> '/builddir/build/BUILD/edk2-bb1bba3d7767/Build/OvmfX64/DEBUG_GCC5/X64/
> CryptoPkg/Library/OpensslLib/OpensslLib/OUTPUT/openssl/crypto/rsa/rsa_x931.
> obj'
> GNUmakefile:2222: warning: overriding recipe for target
> '/builddir/build/BUILD/edk2-bb1bba3d7767/Build/OvmfX64/DEBUG_GCC5/X64/
> CryptoPkg/Library/OpensslLib/OpensslLib/OUTPUT/openssl/crypto/ocsp/ocsp_err.
> obj'
> GNUmakefile:1292: warning: ignoring old recipe for target
> '/builddir/build/BUILD/edk2-bb1bba3d7767/Build/OvmfX64/DEBUG_GCC5/X64/
> CryptoPkg/Library/OpensslLib/OpensslLib/OUTPUT/openssl/crypto/ocsp/ocsp_err.
> obj'
> GNUmakefile:2592: warning: overriding recipe for target
> '/builddir/build/BUILD/edk2-bb1bba3d7767/Build/OvmfX64/DEBUG_GCC5/X64/
> CryptoPkg/Library/OpensslLib/OpensslLib/OUTPUT/openssl/crypto/rsa/rsa_x931.
> obj'
> GNUmakefile:1402: warning: ignoring old recipe for target
> '/builddir/build/BUILD/edk2-bb1bba3d7767/Build/OvmfX64/DEBUG_GCC5/X64/
> CryptoPkg/Library/OpensslLib/OpensslLib/OUTPUT/openssl/crypto/rsa/rsa_x931.
> obj'
> [ ... ]
> make: *** No rule to make target
> '/builddir/build/BUILD/edk2-bb1bba3d7767/Build/OvmfX64/DEBUG_GCC5/X64/
> CryptoPkg/Library/OpensslLib/OpensslLib/OUTPUT/openssl/crypto/evp/m_md5_sha1.
> obj', needed by
> '/builddir/build/BUILD/edk2-bb1bba3d7767/Build/OvmfX64/DEBUG_GCC5/X64/
> CryptoPkg/Library/OpensslLib/OpensslLib/OUTPUT/OpensslLib.lib'.  Stop.
> 
> ... which are not in the f35 logs.
> 
> Another build with a "cat -n GNUmakefile" hacked into the specfile doesn't
> give a clue either,
> there are no oddities in the GNUmakefile
> https://koji.fedoraproject.org/koji/taskinfo?taskID=79475165
> 
> this blocks updating edk2, any hints are welcome

I've been debugging this problem for the past 2 days. The generated makefiles all look sane. The makefile warnings are a little odd, suggesting some bad content is written into the makefile, but dumping it in the SaveFileOnChange method in the EDK build system shows no difference from the content written on F35.

The F36 build will occasionally work when Koji happens to put it on a VM with 48 vCPUs instead of the normal 6 vCPUs. So there is something racy going on here, likely related to either glibc or gcc changes in F36, since other package changes in the build root appear largely uninteresting. I'm currently trying to narrow down when/where it started breaking by trying older F36 build roots in koji.

Comment 13 Daniel Berrangé 2021-12-02 11:42:08 UTC
In the following build log I've captured generated makefiles and have make -d running for OpensslLib

https://koji.fedoraproject.org/koji/taskinfo?taskID=79502018
https://kojipkgs.fedoraproject.org//work/tasks/2018/79502018/build.log

For a successfully built object, we see make immediately look for the .c target file:

    Considering target file '/builddir/build/BUILD/edk2-e1999b264f1f/Build/OvmfX64/DEBUG_GCC5/X64/CryptoPkg/Library/OpensslLib/OpensslLib/OUTPUT/openssl/crypto/evp/bio_ok.obj'.
     File '/builddir/build/BUILD/edk2-e1999b264f1f/Build/OvmfX64/DEBUG_GCC5/X64/CryptoPkg/Library/OpensslLib/OpensslLib/OUTPUT/openssl/crypto/evp/bio_ok.obj' does not exist.
      Considering target file '/builddir/build/BUILD/edk2-e1999b264f1f/CryptoPkg/Library/OpensslLib/openssl/crypto/evp/bio_ok.c'.
       Looking for an implicit rule for '/builddir/build/BUILD/edk2-e1999b264f1f/CryptoPkg/Library/OpensslLib/openssl/crypto/evp/bio_ok.c'.

But for an unsuccessful object, make never even gets to looking for the .c target; it disappears off into the weeds trying implicit pattern rules:

    Considering target file '/builddir/build/BUILD/edk2-e1999b264f1f/Build/OvmfX64/DEBUG_GCC5/X64/CryptoPkg/Library/OpensslLib/OpensslLib/OUTPUT/openssl/crypto/evp/c_allc.obj'.
     File '/builddir/build/BUILD/edk2-e1999b264f1f/Build/OvmfX64/DEBUG_GCC5/X64/CryptoPkg/Library/OpensslLib/OpensslLib/OUTPUT/openssl/crypto/evp/c_allc.obj' does not exist.
     Looking for an implicit rule for '/builddir/build/BUILD/edk2-e1999b264f1f/Build/OvmfX64/DEBUG_GCC5/X64/CryptoPkg/Library/OpensslLib/OpensslLib/OUTPUT/openssl/crypto/evp/c_allc.obj'.
     Trying pattern rule with stem 'c_allc.obj'.
     Trying implicit prerequisite '/builddir/build/BUILD/edk2-e1999b264f1f/Build/OvmfX64/DEBUG_GCC5/X64/CryptoPkg/Library/OpensslLib/OpensslLib/OUTPUT/openssl/crypto/evp/c_allc.obj.o'.
     Trying pattern rule with stem 'c_allc.obj'.
     Trying implicit prerequisite '/builddir/build/BUILD/edk2-e1999b264f1f/Build/OvmfX64/DEBUG_GCC5/X64/CryptoPkg/Library/OpensslLib/OpensslLib/OUTPUT/openssl/crypto/evp/c_allc.obj.c'.
     Trying pattern rule with stem 'c_allc.obj'.
     Trying implicit prerequisite '/builddir/build/BUILD/edk2-e1999b264f1f/Build/OvmfX64/DEBUG_GCC5/X64/CryptoPkg/Library/OpensslLib/OpensslLib/OUTPUT/openssl/crypto/evp/c_allc.obj.cc'.
     Trying pattern rule with stem 'c_allc.obj'.
     Trying implicit prerequisite '/builddir/build/BUILD/edk2-e1999b264f1f/Build/OvmfX64/DEBUG_GCC5/X64/CryptoPkg/Library/OpensslLib/OpensslLib/OUTPUT/openssl/crypto/evp/c_allc.obj.C'.
     Trying pattern rule with stem 'c_allc.obj'.
     Trying implicit prerequisite '/builddir/build/BUILD/edk2-e1999b264f1f/Build/OvmfX64/DEBUG_GCC5/X64/CryptoPkg/Library/OpensslLib/OpensslLib/OUTPUT/openssl/crypto/evp/c_allc.obj.cpp'.
     Trying pattern rule with stem 'c_allc.obj'.


It appears to have lost the rule matching c_allc.obj with c_allc.c, but the GNUmakefile dump shows it exists:

$(OUTPUT_DIR)/openssl/crypto/evp/c_allc.obj : $(MAKE_FILE)
$(OUTPUT_DIR)/openssl/crypto/evp/c_allc.obj : $(DEBUG_DIR)/AutoGen.h
$(OUTPUT_DIR)/openssl/crypto/evp/c_allc.obj : $(WORKSPACE)/CryptoPkg/Library/OpensslLib/openssl/crypto/evp/c_allc.c
	"$(CC)" $(DEPS_FLAGS) $(CC_RESP) -c -o /builddir/build/BUILD/edk2-e1999b264f1f/Build/OvmfX64/DEBUG_GCC5/X64/CryptoPkg/Library/OpensslLib/OpensslLib/OUTPUT/openssl/crypto/evp/c_allc.obj  /builddir/build/BUILD/edk2-e1999b264f1f/CryptoPkg/Library/OpensslLib/openssl/crypto/evp/c_allc.c


which is identical to the earlier bio_ok.obj rules that just worked

$(OUTPUT_DIR)/openssl/crypto/evp/bio_ok.obj : $(MAKE_FILE)
$(OUTPUT_DIR)/openssl/crypto/evp/bio_ok.obj : $(DEBUG_DIR)/AutoGen.h
$(OUTPUT_DIR)/openssl/crypto/evp/bio_ok.obj : $(WORKSPACE)/CryptoPkg/Library/OpensslLib/openssl/crypto/evp/bio_ok.c
	"$(CC)" $(DEPS_FLAGS) $(CC_RESP) -c -o /builddir/build/BUILD/edk2-e1999b264f1f/Build/OvmfX64/DEBUG_GCC5/X64/CryptoPkg/Library/OpensslLib/OpensslLib/OUTPUT/openssl/crypto/evp/bio_ok.obj  /builddir/build/BUILD/edk2-e1999b264f1f/CryptoPkg/Library/OpensslLib/openssl/crypto/evp/bio_ok.c

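The difference between the two `make -d` traces in this comment can be searched for across a whole build log. The following is a rough sketch, assuming the three-line pattern shown above (target considered, file missing, then an implicit-rule search for the same path) is what marks an affected object; the function name `targets_without_explicit_source` is hypothetical and the sample paths are shortened with illustrative `OUTPUT/` and `SRC/` prefixes.

```python
import re

CONSIDER_RE = re.compile(r"^\s*Considering target file '(?P<path>[^']+)'\.$")
MISSING_RE = re.compile(r"^\s*File '(?P<path>[^']+)' does not exist\.$")
IMPLICIT_RE = re.compile(r"^\s*Looking for an implicit rule for '(?P<path>[^']+)'\.$")


def targets_without_explicit_source(debug_text: str) -> list[str]:
    """Return .obj targets for which make went straight to implicit-rule search."""
    lines = debug_text.splitlines()
    suspects = []
    for i in range(len(lines) - 2):
        considered = CONSIDER_RE.match(lines[i])
        missing = MISSING_RE.match(lines[i + 1])
        implicit = IMPLICIT_RE.match(lines[i + 2])
        if not (considered and missing and implicit):
            continue
        path = considered.group("path")
        # All three lines must refer to the same object file.
        if path.endswith(".obj") and missing.group("path") == path == implicit.group("path"):
            suspects.append(path)
    return suspects


# Shortened version of the two traces in this comment: bio_ok.obj is healthy
# (make moves on to the .c prerequisite), c_allc.obj is not.
SAMPLE = """\
Considering target file 'OUTPUT/openssl/crypto/evp/bio_ok.obj'.
 File 'OUTPUT/openssl/crypto/evp/bio_ok.obj' does not exist.
  Considering target file 'SRC/openssl/crypto/evp/bio_ok.c'.
   Looking for an implicit rule for 'SRC/openssl/crypto/evp/bio_ok.c'.
Considering target file 'OUTPUT/openssl/crypto/evp/c_allc.obj'.
 File 'OUTPUT/openssl/crypto/evp/c_allc.obj' does not exist.
 Looking for an implicit rule for 'OUTPUT/openssl/crypto/evp/c_allc.obj'.
 Trying pattern rule with stem 'c_allc.obj'.
"""

for path in targets_without_explicit_source(SAMPLE):
    print("no explicit prerequisite considered:", path)
```

Real `make -d` output may interleave other messages, so the strict three-consecutive-lines match is a simplification that would need loosening for a full log.
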
Comment 14 Laszlo Ersek 2021-12-02 13:47:24 UTC
I've found this, from a simple web search:

https://stackoverflow.com/questions/22127119/gnu-make-warning-ignoring-old-commands-for-target-xxx

it says that "The dependencies are distinct from the commands. The dependency on /a/ is not forgotten, but the commands are."

However, that seems irrelevant in this case. The makefile snippets quoted by Daniel in comment 13 do not contain multiple commands (multiple recipes). They contain multiple *dependencies*, yes, but only one command (= only one recipe) for making the target.

... Or, perhaps, is that precisely the regression? "Make" now forgets the dependency as well, when it forgets (for some reason) a recipe?

Comment 15 Daniel Berrangé 2021-12-02 14:28:16 UTC
(In reply to Daniel Berrangé from comment #12)
> (In reply to Gerd Hoffmann from comment #11)
> > new f35 scratch build (one more fix added)
> > https://koji.fedoraproject.org/koji/taskinfo?taskID=79476301
> > 
> > rawhide scratch build fails (same srpm), and I have no clue why:
> > https://koji.fedoraproject.org/koji/taskinfo?taskID=79504225
> > 
> > Note the rather strange lines in the build log ...

snip

> > Another build with a "cat -n GNUmakefile" hacked into the specfile doesn't
> > give a clue either,
> > there are no oddities in the GNUmakefile
> > https://koji.fedoraproject.org/koji/taskinfo?taskID=79475165
> > 
> > this blocks updating edk2, any hints are welcome
> 
> I've been debugging this problem for the past 2 days. The generated
> makefiles all look sane. The makefile warnings are a little odd suggesting
> some bad content is written into the makefile, but dumping it in the
> SaveFileOnChange method in EDK build system  shows no difference from
> content written on F35.

It looks like we have a glibc bug in rawhide:

https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/BVPXQXFC6DLFQUHXROS35Y4ITEJ5TEZQ/
https://bugzilla.redhat.com/show_bug.cgi?id=2026399


> 
> The F36 build will occasionally work when Koji happens to put it on a VM
> with 48 vCPUs, instead of the normal 6 vCPUs. So there is something racy
> going on here and likely related to either glibc or gcc changes in F36,
> since other package changes in the build root appear largely uninteresting.
> I'm currently trying to narrow down when/where it started breaking by trying
> older F36 build roots in koji

Comment 17 Daniel Berrangé 2021-12-03 16:13:01 UTC
(In reply to Laszlo Ersek from comment #16)
> If one of the glibc functions is too clever for its own good, can it be
> disabled via ifunc override? (LD_HWCAP_MASK)

Great tip, I've added

  export GLIBC_TUNABLES=glibc.cpu.hwcaps=-AVX512VL

to edk2.spec and the build now passes on F36 rawhide. We can leave that workaround in the spec until there is a resolution for bug 2026399 (whether it turns out to be glibc or qemu/kvm at fault), so we're unblocked on edk2 builds.
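
Outside a spec file, the same workaround can be applied when driving a build from a script. The following is a minimal sketch, assuming only that `GLIBC_TUNABLES` is a colon-separated list of `name=value` entries; the helper name `with_glibc_tunable` is hypothetical and is not part of edk2 or Koji.

```python
import os


def with_glibc_tunable(env: dict[str, str], tunable: str) -> dict[str, str]:
    """Return a copy of env with one more entry appended to GLIBC_TUNABLES."""
    new_env = dict(env)
    existing = new_env.get("GLIBC_TUNABLES", "")
    # glibc reads GLIBC_TUNABLES as a colon-separated list of name=value pairs.
    entries = [entry for entry in existing.split(":") if entry]
    if tunable not in entries:
        entries.append(tunable)
    new_env["GLIBC_TUNABLES"] = ":".join(entries)
    return new_env


# Same tunable as the spec-file workaround in this comment.
build_env = with_glibc_tunable(dict(os.environ), "glibc.cpu.hwcaps=-AVX512VL")
print(build_env["GLIBC_TUNABLES"])
```

The resulting dictionary could be passed as the `env` argument of `subprocess.run` when launching the build. Merging rather than overwriting keeps any tunables already set in the calling environment.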

Comment 18 Fedora Update System 2021-12-07 12:05:39 UTC
FEDORA-2021-887e8d3a64 has been submitted as an update to Fedora 35. https://bodhi.fedoraproject.org/updates/FEDORA-2021-887e8d3a64

Comment 19 Fedora Update System 2021-12-08 01:35:59 UTC
FEDORA-2021-887e8d3a64 has been pushed to the Fedora 35 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-887e8d3a64`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-887e8d3a64

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 20 Fedora Update System 2021-12-09 01:11:57 UTC
FEDORA-2021-887e8d3a64 has been pushed to the Fedora 35 stable repository.
If the problem still persists, please make a note of it in this bug report.

