Bug 2165136

Summary: grub2-2.06-77.fc38 crashing in `grub2-mkconfig`
Product: [Fedora] Fedora
Reporter: Adam Williamson <awilliam>
Component: grub2
Assignee: Javier Martinez Canillas <fmartine>
Status: CLOSED RAWHIDE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Docs Contact:
Priority: unspecified
Version: rawhide
CC: fmartine, lkundrak, pgnet.dev, pjones, rharwood, robatino
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: All
Whiteboard: openqa
Fixed In Version: grub2-2.06-78.fc38
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-01-27 22:47:50 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 2083910
Attachments:
Description Flags
backtrace of crashed grub2-probe none

Description Adam Williamson 2023-01-27 20:07:38 UTC
grub2-2.06-77.fc38 seems to crash consistently in an openQA test which runs `grub2-mkconfig -o $(readlink -m /etc/grub2.cfg)`. The journal shows:

Jan 27 11:54:03 localhost.localdomain kernel: grub2-probe[1792]: segfault at 28 ip 0000559ac5b459e6 sp 00007ffceea96e40 error 4 in grub2-probe[559ac5a54000+102000] likely on CPU 1 (core 1, socket 0)
Jan 27 11:54:03 localhost.localdomain kernel: Code: c7 b8 00 00 00 00 e8 7c dd fe ff 48 8b 45 c8 48 89 c6 48 8d 05 6b b2 03 00 48 89 c7 b8 00 00 00 00 e8 8b b4 fe ff 48 8b 45 90 <48> 8b 40 28 48 85 c0 74 11 48 8b 45 90 48 8b 40 28 48 8b 40 10 48
Jan 27 11:54:03 localhost.localdomain audit[1792]: ANOM_ABEND auid=0 uid=0 gid=0 ses=3 subj=unconfined_u:unconfined_r:bootloader_t:s0-s0:c0.c1023 pid=1792 comm="grub2-probe" exe="/usr/sbin/grub2-probe" sig=11 res=1
Jan 27 11:54:03 localhost.localdomain systemd[1]: Created slice system-systemd\x2dcoredump.slice - Slice /system/systemd-coredump.
Jan 27 11:54:03 localhost.localdomain audit: BPF prog-id=138 op=LOAD
Jan 27 11:54:03 localhost.localdomain audit: BPF prog-id=139 op=LOAD
Jan 27 11:54:03 localhost.localdomain audit: BPF prog-id=140 op=LOAD
Jan 27 11:54:03 localhost.localdomain audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-coredump@0-1796-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jan 27 11:54:03 localhost.localdomain systemd[1]: Started systemd-coredump - Process Core Dump (PID 1796/UID 0).
Jan 27 11:54:03 localhost.localdomain systemd-coredump[1797]: Resource limits disable core dumping for process 1792 (grub2-probe).
Jan 27 11:54:03 localhost.localdomain systemd-coredump[1797]: Process 1792 (grub2-probe) of user 0 dumped core.
Jan 27 11:54:03 localhost.localdomain systemd[1]: systemd-coredump: Deactivated successfully.

I haven't backtraced the dump yet.
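
Even without a backtrace, the kernel line above narrows things down. The fault address is 0x28, error code 4 on x86 means a user-mode read of an unmapped page, and the marked instruction bytes `<48> 8b 40 28` decode to `mov 0x28(%rax),%rax`. That appears consistent with reading a struct member at offset 0x28 through a NULL pointer. The Python sketch below pulls those numbers out of the journal line. The helper names `parse_segfault` and `decode_error` are hypothetical, and the regex assumes the x86 `segfault at ... ip ... error N in obj[base+size]` message format shown in this log. The computed offset is relative to the start of the mapping the kernel printed, which is not necessarily the ELF load base, so treat it as a hint for symbol lookup rather than an exact file address.

```python
import re

# Sample line copied from the journal excerpt above.
LINE = (
    "Jan 27 11:54:03 localhost.localdomain kernel: grub2-probe[1792]: "
    "segfault at 28 ip 0000559ac5b459e6 sp 00007ffceea96e40 error 4 in "
    "grub2-probe[559ac5a54000+102000] likely on CPU 1 (core 1, socket 0)"
)

# Assumed message layout (x86 Linux); the "in obj[base+size]" part is optional.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
    r"(?: in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


def parse_segfault(line):
    """Return the fields of a kernel segfault line as a dict, or None if no match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    info = {
        "comm": m["comm"],
        "pid": int(m["pid"]),
        "fault_addr": int(m["addr"], 16),
        "ip": int(m["ip"], 16),
        "error": int(m["err"], 16),
    }
    if m["base"] is not None:
        info["object"] = m["obj"]
        # Offset of the faulting instruction from the start of the printed mapping.
        info["offset"] = info["ip"] - int(m["base"], 16)
    return info


def decode_error(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "page_present": bool(code & 1),  # 0: page not mapped, 1: protection fault
        "write": bool(code & 2),         # 0: read access, 1: write access
        "user_mode": bool(code & 4),     # 1: fault happened in user mode
    }


info = parse_segfault(LINE)
print(info["comm"], hex(info["fault_addr"]), hex(info["offset"]))
print(decode_error(info["error"]))
```

The resulting offset can be passed to a symbolizer together with matching debuginfo, although the actual backtrace from the core dump remains the more reliable source.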

Comment 1 Adam Williamson 2023-01-27 20:32:54 UTC
This breaks installation: https://openqa.fedoraproject.org/tests/1722170

I am therefore marking it as a Beta blocker, since it violates "The installer must be able to complete an installation to a single disk using automatic partitioning" (and any other "install must work" criterion).

Comment 2 Adam Williamson 2023-01-27 20:35:17 UTC
Created attachment 1940804
backtrace of crashed grub2-probe

Here's a backtrace of the crashed grub2-probe process. Note that, before the backtrace, gdb shows the message "142    ../grub-core/disk/diskfilter.c: Bad file descriptor."

Comment 3 Robbie Harwood 2023-01-27 22:47:50 UTC
With the new build, this no longer reproduces in local testing.