Bug 1804953

Summary: UEFI installs from live images fail since around Fedora-Rawhide-20200214.n.1 (boot loader entry creation fails)
Product: [Fedora] Fedora
Reporter: Adam Williamson <awilliam>
Component: efivar
Assignee: Peter Jones <pjones>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 32
CC: airlied, browseria, bskeggs, bugzilla, fzatlouk, hdegoede, ichavero, itamar, jarodwilson, jeremy, jglisse, john.j5live, jonathan, josef, kernel-maint, linville, masami256, mchehab, mjg59, pbrobinson, pjones, robatino, steved
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard: openqa AcceptedBlocker
Fixed In Version: efivar-37-6.fc32
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-02-27 15:11:53 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1705303    
Attachments:
Description Flags
gdb stack trace efibootmgr none

Description Adam Williamson 2020-02-19 23:57:04 UTC
Somewhere between Fedora-Rawhide-20200204.n.0 and Fedora-Rawhide-20200214.n.1 - I can't be more specific because the tests failed earlier, due to other bugs, on the composes between those dates - UEFI live installs started failing. We see an error in the UI:

"Failed to set new efi boot target. This is most likely a kernel or firmware bug."

Looking at program.log, we see precisely what failed:

12:20:57,262 INF program: Running in chroot '/mnt/sysroot'... efibootmgr -c -w -L Fedora -d /dev/vda -p 1 -l \EFI\fedora\shimx64.efi
12:20:58,115 DBG program: Return code: -11
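
A note on reading that return code: assuming the logged value follows the Python subprocess convention (a negative return code -N means the child was terminated by signal N), -11 means efibootmgr was killed by signal 11, which is SIGSEGV on Linux. A minimal sketch of that decoding follows; the helper name describe_return_code is made up for illustration and is not part of anaconda.

```python
import signal


def describe_return_code(rc: int) -> str:
    """Turn a subprocess-style return code into a readable description.

    Assumption: negative values follow the Python subprocess convention,
    where -N means "terminated by signal N".
    """
    if rc < 0:
        try:
            name = signal.Signals(-rc).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {-rc}"
        return f"killed by {name}"
    return f"exited with status {rc}"


# The value from program.log above.
print(describe_return_code(-11))
```

So a crash of efibootmgr itself is the thing to chase here, rather than an error reported by the firmware.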

This is affecting both KDE and Workstation lives. Non-live UEFI installs are working OK.

I'm assigning the bug to kernel because neither efibootmgr itself nor edk2 seems to have changed in the affected time frame, but the kernel went from 5.5 to 5.6 during that period. Seems like a reasonable suspect.

This is affecting both Rawhide and F32. Nominating as an F32 Beta blocker, as a violation of "The installer must be able to complete an installation to a single disk using automatic partitioning" for x86_64 UEFI (*possibly* only in a UEFI VM - haven't checked if this affects a real UEFI machine yet).

Comment 1 Adam Williamson 2020-02-20 00:22:55 UTC
*** Bug 1804956 has been marked as a duplicate of this bug. ***

Comment 2 Adam Williamson 2020-02-20 00:24:04 UTC
Chris has more details about the crash in the dupe. Chris, could you generate the *full* traceback? That'd be useful, I think.

Comment 3 Chris Murphy 2020-02-20 01:04:25 UTC
F31 with kernel 5.6.0-0.rc2.git0.1.fc32.x86_64, the problem doesn't happen.
F32 with kernel 5.5.3, the problem does happen.

So I don't think it's the kernel. My suspect is glibc-2.31-1.fc32.x86_64 (2020-02-04).

Comment 4 Chris Murphy 2020-02-20 01:09:34 UTC
Created attachment 1664216 [details]
gdb stack trace efibootmgr

Comment 5 Chris Murphy 2020-02-20 01:13:24 UTC
I wonder if this is related to bug 1773175, which I'm still seeing on both F31 and F32.

Comment 6 Adam Williamson 2020-02-20 17:51:38 UTC
Confirmed this is affecting F32 as well as Rawhide (so blocker nomination stands).

Comment 7 Chris Murphy 2020-02-20 18:54:48 UTC
> Non-live UEFI installs are working OK.

Huh. So what's different between live and non-live? The non-live is actually still "LiveOS" based in terms of assembly, it just doesn't have a complex desktop environment. Right?

strlen.S is part of glibc; "parse_acpi_root" is found in efivar/linux-acpi-root.c.

I dunno, ping pjones?

Comment 8 Adam Williamson 2020-02-20 20:08:06 UTC
Well, there are various things... I'm not sure it's worth trying to pin down from that angle. It's probably best just to work the crash from the trace; once we figure out the cause, it'll probably become clear why it's not happening on the installer images.

Comment 9 browseria 2020-02-20 22:46:37 UTC
See also: bug 1804862

Comment 10 Chris Murphy 2020-02-22 07:53:28 UTC
strace efibootmgr in the two environments is not that interesting


netinstall:

openat(AT_FDCWD, "/dev/vda", O_RDONLY)  = 3
fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(0xfc, 0), ...}) = 0
readlink("/sys/dev/block/252:0", "../../devices/pci0000:00/0000:00"..., 4096) = 68
readlink("/sys/block/vda/device", "../../../virtio2", 4096) = 16
readlink("/sys/block/vda/device/driver", "../../../../../bus/virtio/driver"..., 4096) = 44
openat(AT_FDCWD, "/sys/devices/pci0000:00/firmware_node/path", O_RDONLY) = 4


live:

openat(AT_FDCWD, "/dev/vda", O_RDONLY)  = 3
fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(0xfc, 0), ...}) = 0
readlink("/sys/dev/block/252:0", "../../devices/pci0000:00/0000:00"..., 4096) = 68
readlink("/sys/block/vda/device", "../../../virtio2", 4096) = 16
readlink("/sys/block/vda/device/driver", "../../../../../bus/virtio/driver"..., 4096) = 44
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x564e2bc42d7d} ---
+++ killed by SIGSEGV (core dumped) +++
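
The two traces can also be compared mechanically. The sketch below (the helper name first_divergence is made up for illustration) walks both excerpts in parallel and reports the first position where they differ. The trace lines are copied from the excerpts above, including strace's own "..." truncation.

```python
from itertools import zip_longest


def first_divergence(good, bad):
    """Return (index, good_line, bad_line) for the first differing line,
    or None if both traces are identical."""
    for index, (g, b) in enumerate(zip_longest(good, bad)):
        if g != b:
            return index, g, b
    return None


# Lines shared by both excerpts, copied from the strace output above.
common = [
    'openat(AT_FDCWD, "/dev/vda", O_RDONLY)  = 3',
    'fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(0xfc, 0), ...}) = 0',
    'readlink("/sys/dev/block/252:0", "../../devices/pci0000:00/0000:00"..., 4096) = 68',
    'readlink("/sys/block/vda/device", "../../../virtio2", 4096) = 16',
    'readlink("/sys/block/vda/device/driver", "../../../../../bus/virtio/driver"..., 4096) = 44',
]
netinstall = common + [
    'openat(AT_FDCWD, "/sys/devices/pci0000:00/firmware_node/path", O_RDONLY) = 4',
]
live = common + [
    '--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x564e2bc42d7d} ---',
    '+++ killed by SIGSEGV (core dumped) +++',
]

index, good_line, bad_line = first_divergence(netinstall, live)
print(f"traces diverge at line {index}")
print(f"  netinstall: {good_line}")
print(f"  live:       {bad_line}")
```

On these excerpts the traces match for the first five calls. The live run then segfaults at the point where the netinstall run goes on to open firmware_node/path.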

Comment 11 Chris Murphy 2020-02-22 08:30:01 UTC
# coredumpctl
TIME                            PID   UID   GID SIG COREFILE  EXE
Sat 2020-02-22 03:05:36 EST    2390     0     0  11 present   /usr/sbin/efibootmgr
Sat 2020-02-22 03:18:59 EST   16906     0     0  11 present   /mnt/sysroot/usr/sbin/efibootmgr
# 

In between those crashes, I ran:
# rpm -Uvh --oldpackage https://kojipkgs.fedoraproject.org//packages/efivar/37/4.fc32/x86_64/efivar-libs-37-4.fc32.x86_64.rpm

And now the first command no longer crashes (using GNOME Terminal), but the subsequent installation attempt still fails. My guess is that the live-base image contains efivar-libs-37-5, so that's what gets installed, and that's what the installer runs. But I think downgrading the install media to efivar-libs-37-4 will fix this bug.

Comment 12 Adam Williamson 2020-02-22 23:28:45 UTC
Yes, 'downgrading' the package only affects the live overlay. It won't affect what the installer installs, and the command is run out of the installed system root, so it'll be -5.

The most obvious difference between -4 and -5 is that -5 will have been built with GCC 10, -4 with GCC 9. I checked the build logs but to a quick eyeball check they're identical, no juicy error or warning messages...

Comment 13 Adam Williamson 2020-02-22 23:37:54 UTC
Still, your theory is strange for one reason - efivar -5 was built on 20200128, but we had five composes (0131.n.0 through 0204.n.0) where install succeeded, *after* that date. The tests ultimately failed because of anaconda failing to reboot after install, but the bootloader install phase worked.

Comment 14 Chris Murphy 2020-02-23 08:02:16 UTC
Problem doesn't happen on bare metal (one attempt, one success).

Comment 15 František Zatloukal 2020-02-24 20:28:14 UTC
Discussed during the 2020-02-24 blocker review meeting: [1]

The decision to classify this bug as an AcceptedBlocker was made:

"The installer must be able to complete an installation to a single disk using automatic partitioning" for x86_64 UEFI

[1] https://meetbot-raw.fedoraproject.org/fedora-blocker-review/2020-02-24/f32-blocker-review.2020-02-24-17.00.log.txt

Comment 16 Fedora Update System 2020-02-24 20:31:55 UTC
FEDORA-2020-8ef75170b3 has been submitted as an update to Fedora 32. https://bodhi.fedoraproject.org/updates/FEDORA-2020-8ef75170b3

Comment 17 Fedora Update System 2020-02-24 20:31:56 UTC
FEDORA-2020-8ef75170b3 has been submitted as an update to Fedora 32. https://bodhi.fedoraproject.org/updates/FEDORA-2020-8ef75170b3

Comment 18 Adam Williamson 2020-02-26 23:16:43 UTC
openQA testing confirms the fix for this; tests run with the updated efivar are all passing, tests run without it are all failing.

Comment 19 Fedora Update System 2020-02-27 15:11:53 UTC
efivar-37-6.fc32 has been pushed to the Fedora 32 stable repository. If problems still persist, please make note of it in this bug report.