Bug 1323842
| Summary: | grubby-8.28-17.el7.x86_64 - --default-kernel cannot be retrieved even if correctly set on systems with different root uuids | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Douglas Schilling Landgraf <dougsland> |
| Component: | grubby | Assignee: | Peter Jones <pjones> |
| Status: | CLOSED ERRATA | QA Contact: | Release Test Team <release-test-team-automation> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 7.4 | CC: | cshao, dougsland, fmartine, huzhao, jstodola, pjones, rbarry, weiwang, ycui |
| Target Milestone: | rc | Keywords: | OtherQA, Reopened, ZStream |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | grubby-8.28-26.el7 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1690324 | Environment: | |
| Last Closed: | 2019-08-06 13:07:10 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1366785 | | |
| Bug Blocks: | 1366549, 1670127, 1690324 | | |
| Attachments: | | | |

Description
Douglas Schilling Landgraf
2016-04-04 22:33:25 UTC
Created attachment 1143466: output strace

From the RHEV side this is required to enable rollback.

I cannot reproduce this on Fedora 22:

    [root@tee ~]# ls -shal /boot/vmlinuz-4.3.4-200.fc22.x86_64
    5,8M -rwxr-xr-x. 1 root root 5,8M 25. Jan 14:47 /boot/vmlinuz-4.3.4-200.fc22.x86_64
    [root@tee ~]# grub2-editenv list
    saved_entry=Fedora (4.4.6-200.fc22.x86_64) 22 (Twenty Two)
    [root@tee ~]# grubby --set-default=/boot/vmlinuz-4.3.4-200.fc22.x86_64
    [root@tee ~]# grub2-editenv list
    saved_entry=Fedora (4.3.4-200.fc22.x86_64) 22 (Twenty Two)

Changed

    [root@tee ~]# grubby --set-default=/boot/vmlinuz-4.4.6-200.fc22.x86_64
    [root@tee ~]# grub2-editenv list
    saved_entry=Fedora (4.4.6-200.fc22.x86_64) 22 (Twenty Two)

Changed again

Douglas, can you try the same flow on CentOS?

On a pure CentOS 7 I cannot reproduce it. Here is more debugging.

Example output:

    # ls -sha1 /boot/ovirt-node-ng-4.0.0-0.20160413.0+1/vmlinuz-3.10.0-327.13.1.el7.x86_64
    5.0M /boot/ovirt-node-ng-4.0.0-0.20160413.0+1/vmlinuz-3.10.0-327.13.1.el7.x86_64
    # grubby --set-default /boot/ovirt-node-ng-4.0.0-0.20160413.0+1/vmlinuz-3.10.0-327.13.1.el7.x86_64
    # grub2-editenv list
    saved_entry=centos_installed/ovirt-node-ng-4.0.0-0.20160413.0+1 (oVirt Node 4.0.0_master)
    # grubby --default-kernel
    #

Using gdb:

    # gdb /usr/sbin/grubby --default-kernel
    gdb> b suitableImage
    2263 rootdev = findDiskForRoot();
    (gdb) p dev
    $23 = 0x611540 "/dev/centos_installed/ovirt-node-ng-4.0.0-0.20160413.0+1"
    2277 if (strcmp(getuuidbydev(rootdev), getuuidbydev(dev))) {
    (gdb) p rootdev
    $24 = 0x611590 "/dev/mapper/centos_installed-ovirt--node--ng--4.0.0--0+1"
    (gdb) p dev
    $25 = 0x611540 "/dev/centos_installed/ovirt-node-ng-4.0.0-0.20160413.0+1"
    (gdb) n
    2278 notSuitablePrintf(entry, 0, "uuid mismatch: rootdev %s, dev %s\n", getuuidbydev(rootdev), getuuidbydev(dev));
    (gdb) p getuuidbydev(rootdev)
    $1 = 0x6257f0 "3c1df351-9205-4743-82df-6c1954d6495a"
    (gdb) p getuuidbydev(dev)
    value has been optimized out

    # ls /dev/mapper/centos_installed-ovirt--node--ng--4.0.0--0+1 -la
    lrwxrwxrwx. 1 root root 7 Apr 13 05:22 /dev/mapper/centos_installed-ovirt--node--ng--4.0.0--0+1 -> ../dm-3
    [root@localhost grubby-8.28-1]# ls /dev/centos_installed/ovirt-node-ng-4.0.0-0.20160413.0+1 -la
    lrwxrwxrwx. 1 root root 7 Apr 13 05:22 /dev/centos_installed/ovirt-node-ng-4.0.0-0.20160413.0+1 -> ../dm-7

Here is the full code:

    if (strcmp(getuuidbydev(rootdev), getuuidbydev(dev))) {
        notSuitablePrintf(entry, 0, "uuid mismatch: rootdev %s, dev %s\n",
                          getuuidbydev(rootdev), getuuidbydev(dev));
        free(rootdev);
        return 0;
    }

Hm. I ran the same on my CentOS setup to try to reproduce this, but I can't. Having two images available, I can change the default from one to the other and vice versa using grubby.

This is still reproducible. grubby does successfully set the new kernel, but grubby itself can't find it:

    # grubby --default-kernel
    #

This works OK with --bad-image-okay, which is an acceptable workaround for now (we're using grubby to abstract bootloader operations, so as long as it *works* we can live with it), but it would be nice to have it fixed.

    # grubby --default-kernel --bad-image-okay
    /boot/ovirt-node-ng-4.0.0-0.20160630.0+1/vmlinuz-3.10.0-327.22.2.el7.x86_64

We are using thin LVM snapshots as the basis for RHV-H Next, so the UUIDs can be odd:

    # lsblk
    NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
    sr0 11:0 1 1024M 0 rom
    vda 252:0 0 30G 0 disk
    ├─vda1 252:1 0 1G 0 part /boot
    └─vda2 252:2 0 29G 0 part
      ├─onn-pool00_tmeta 253:0 0 24M 0 lvm
      │ └─onn-pool00-tpool 253:2 0 22.5G 0 lvm
      │   ├─onn-ovirt--node--ng--4.0.0--0.20160627.0+1 253:3 0 7.4G 0 lvm /
      │   ├─onn-pool00 253:5 0 22.5G 0 lvm
      │   ├─onn-var 253:6 0 15G 0 lvm /var
      │   ├─onn-root 253:7 0 7.4G 0 lvm
      │   ├─onn-ovirt--node--ng--4.0.0--0.20160630.0 253:8 0 4.4G 1 lvm
      │   └─onn-ovirt--node--ng--4.0.0--0.20160630.0+1 253:9 0 4.4G 0 lvm
      ├─onn-pool00_tdata 253:1 0 22.5G 0 lvm
      │ └─onn-pool00-tpool 253:2 0 22.5G 0 lvm
      │   ├─onn-ovirt--node--ng--4.0.0--0.20160627.0+1 253:3 0 7.4G 0 lvm /
      │   ├─onn-pool00 253:5 0 22.5G 0 lvm
      │   ├─onn-var 253:6 0 15G 0 lvm /var
      │   ├─onn-root 253:7 0 7.4G 0 lvm
      │   ├─onn-ovirt--node--ng--4.0.0--0.20160630.0 253:8 0 4.4G 1 lvm
      │   └─onn-ovirt--node--ng--4.0.0--0.20160630.0+1 253:9 0 4.4G 0 lvm
      └─onn-swap 253:4 0 2G 0 lvm [SWAP]

grubby complains about a uuid mismatch when it targets a different root= than the one it's running on. I imagine that this is intentional, but it is not ideal when using snapshots. When run from one snapshot with the kernel on another (for example):

    # mount
    /dev/mapper/onn-ovirt--node--ng--4.0.0--0.20160627.0+1 on / type xfs (rw,relatime,seclabel,attr2,inode64,logbsize=64k,sunit=128,swidth=128,noquota)
    # grubby --default-kernel --debug
    DBG: command line: --default-kernel --debug
    DBG: Image entry failed: uuid mismatch: rootdev 00e23a92-6296-45ef-bef2-459fb51af69a, dev af545169-1c49-4021-a72b-bb93aec51d56
    # ls -l /dev/disk/by-uuid/
    total 0
    lrwxrwxrwx. 1 root root 10 Jun 30 16:43 00e23a92-6296-45ef-bef2-459fb51af69a -> ../../dm-3
    lrwxrwxrwx. 1 root root 10 Jun 30 16:55 af545169-1c49-4021-a72b-bb93aec51d56 -> ../../dm-9
    # dmsetup info
    Name: onn-ovirt--node--ng--4.0.0--0.20160630.0+1
    State: ACTIVE
    Read Ahead: 8192
    Tables present: LIVE
    Open count: 0
    Event number: 0
    Major, minor: 253, 9
    Number of targets: 1
    UUID: LVM-lfTUhYUbzphTBkAwx46DfhDl39M0ydKRZlffMdOpgPDdOgKhqHJ9SmyaEXmLiZ3p

    Name: onn-ovirt--node--ng--4.0.0--0.20160627.0+1
    State: ACTIVE
    Read Ahead: 8192
    Tables present: LIVE
    Open count: 1
    Event number: 0
    Major, minor: 253, 3
    Number of targets: 1
    UUID: LVM-lfTUhYUbzphTBkAwx46DfhDl39M0ydKRWNcl56htotMteAdyJGJutffNFr9eRdBK

Hi Ryan, I am unable to reproduce this bug using basic methods such as cloning the whole disk or cloning a logical volume to obtain the same UUIDs. Will you please retest it once the fix is ready? I will send a notification.

Sure, I'm happy to test this when the fix is ready.

This really is an issue, and we need help to debug it. In our setup a few things with grubby do not work:

    [root@dhcp-8-127 ~]# grubby --default-kernel
    [root@dhcp-8-127 ~]# grubby --default-index
    1
    [root@dhcp-8-127 ~]# grubby --bad-image-okay --default-index
    1
    [root@dhcp-8-127 ~]# grubby --bad-image-okay --default-kernel
    /boot/rhvh-4.0-0.20160727.0+1/vmlinuz-3.10.0-327.22.2.el7.x86_64

But this info is wrong - the index=0 kernel got booted:

    [root@dhcp-8-127 ~]# imgbase w
    [INFO] You are on rhvh-4.0-0.20160803.0+1
    [root@dhcp-8-127 ~]# findmnt /
    TARGET SOURCE FSTYPE OPTIONS
    / /dev/mapper/r4b_dhcp--8--127-rhvh--4.0--0.20160803.0+1 xfs rw,relatime,seclabel,attr2,inode64,sunit=512,swidth=512,noquota

This root maps to index=0 in the grub cfg:

    [root@dhcp-8-127 ~]# grubby --info=ALL
    index=0
    kernel=/boot/rhvh-4.0-0.20160803.0+1/vmlinuz-3.10.0-327.28.2.el7.x86_64
    args="ro crashkernel=auto rd.lvm.lv=r4b_dhcp-8-127/rhvh-4.0-0.20160803.0+1 rd.lvm.lv=r4b_dhcp-8-127/swap rhgb quiet LANG=en_US.UTF-8 img.bootid=rhvh-4.0-0.20160803.0+1"
    root=/dev/r4b_dhcp-8-127/rhvh-4.0-0.20160803.0+1
    initrd=/boot/rhvh-4.0-0.20160803.0+1/initramfs-3.10.0-327.28.2.el7.x86_64.img
    title=rhvh-4.0-0.20160803.0
    index=1
    kernel=/boot/rhvh-4.0-0.20160727.0+1/vmlinuz-3.10.0-327.22.2.el7.x86_64
    args="ro crashkernel=auto rd.lvm.lv=r4b_dhcp-8-127/rhvh-4.0-0.20160727.0+1 rd.lvm.lv=r4b_dhcp-8-127/swap rhgb quiet LANG=en_US.UTF-8 img.bootid=rhvh-4.0-0.20160727.0+1"
    root=/dev/r4b_dhcp-8-127/rhvh-4.0-0.20160727.0+1
    initrd=/boot/rhvh-4.0-0.20160727.0+1/initramfs-3.10.0-327.22.2.el7.x86_64.img
    title=rhvh-4.0-0.20160727.0
    index=2
    kernel=/boot/vmlinuz-3.10.0-327.22.2.el7.x86_64
    args="ro crashkernel=auto rd.lvm.lv=r4b_dhcp-8-127/root rd.lvm.lv=r4b_dhcp-8-127/swap rhgb quiet LANG=en_US.UTF-8"
    root=/dev/mapper/r4b_dhcp--8--127-root
    initrd=/boot/initramfs-3.10.0-327.22.2.el7.x86_64.img
    title=Red Hat Enterprise Linux (3.10.0-327.22.2.el7.x86_64) 7.2
    index=3
    non linux entry
    index=4
    non linux entry
    index=5
    non linux entry
    index=6
    non linux entry

What is going on here?

> What is going on here?
I don't know - can you include your /boot/grub/grubenv from all relevant snapshots?
If I'm reading all of these comments right, it seems this is essentially an RFE for grubby to be able to identify that /boot/ is on an LVM snapshot, and consider the UUIDs of all relevant DM devices as valid candidates, work out which one we /actually/ have booted from, which might be different from what's mounted, and show output based on that.
Is that right?
If so, I think we need to add a feature to grub2 to add boot=$UUID or something along those lines on the kernel command line, because otherwise there's no way after booting to know where config files were actually loaded from if it's different than the currently configured system.
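
To make the proposal above concrete, here is a minimal Python sketch of how a tool could read such a parameter after boot. The `boot=` parameter does not exist today; it is the hypothetical feature suggested in the comment, and the placeholder UUID, the sample command line, and the helper names `parse_cmdline` and `config_source` are all assumptions made for illustration.

```python
import shlex
from typing import Dict, List, Optional


def parse_cmdline(cmdline: str) -> Dict[str, List[Optional[str]]]:
    """Map each kernel parameter to the list of values it was given.

    Bare flags such as "ro" or "quiet" are recorded with a value of None.
    Repeated parameters (for example rd.lvm.lv=) keep every value in order.
    """
    params: Dict[str, List[Optional[str]]] = {}
    for token in shlex.split(cmdline):
        key, sep, value = token.partition("=")
        params.setdefault(key, []).append(value if sep else None)
    return params


def config_source(params: Dict[str, List[Optional[str]]]) -> Optional[str]:
    """Return the last value of the HYPOTHETICAL boot= parameter, if present."""
    values = params.get("boot")
    return values[-1] if values else None


# Illustrative command line; boot= and its placeholder UUID are assumptions.
sample = (
    "ro crashkernel=auto rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap rhgb quiet "
    "root=/dev/mapper/rhel-root boot=UUID=00000000-0000-0000-0000-000000000000"
)
params = parse_cmdline(sample)
print("root:", params["root"][-1])
print("config loaded from:", config_source(params))
```

On a running system the input string would come from /proc/cmdline rather than from a literal. Without a parameter of this kind, `config_source` returns None, which is the situation the comment describes: nothing on the command line records where the configuration was loaded from.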
(In reply to Peter Jones from comment #13)
> > What is going on here?
>
> I don't know - can you include your /boot/grub/grubenv from all relevant
> snapshots?

Good that you mention this file - I think this could be related to or caused by bug 1366785: /boot/grub2/grubenv is a symlink to /boot/efi/EFI/$vendor/grubenv grub2 load_env fails

> If I'm reading all of these comments right, it seems this is essentially an
> RFE for grubby to be able to identify that /boot/ is on an LVM snapshot, and
> consider the UUIDs of all relevant DM devices as valid candidates, work out
> which one we /actually/ have booted from, which might be different from
> what's mounted, and show output based on that.
>
> Is that right?

No - /boot is not snapshotted in our design. But the grub entries point to different thin LV snapshots in a VG. And it looks like grub has issues changing the defaults.

>
> If so, I think we need to add a feature to grub2 to add boot=$UUID or
> something along those lines on the kernel command line, because otherwise
> there's no way after booting to know where config files were actually loaded
> from if it's different than the currently configured system.

As said before, it's not relevant for us, because we have a single /boot, but it might be useful in other designs.

Since the requested information has not been provided, we're closing this. If you can come up with the relevant config file examples and re-open, we'll reconsider this bug.

Peter, what configs do you need?

(In reply to Ryan Barry from comment #17)
> Peter, what configs do you need?

It looks like /boot/grub2/grubenv from all snapshots is needed, see comment 13.

Ryan, can you provide them? Thanks.

(In reply to Marek Hruscak from comment #18)
> (In reply to Ryan Barry from comment #17)
> > Peter, what configs do you need?
>
> It looks like /boot/grub2/grubenv from all snapshots is needed, see comment
> 13 .
>
> Ryan, can you provide them?
> Thanks.

/boot is not snapshotted; everything uses an identical grubenv. It's very simple:

    # GRUB Environment Block
    saved_entry=rhvh[...]

[...] Basically, create a new LVM volume with a different ID. rsync your filesystem into it. Boot back and forth to examine the differences in grubby. Alternatively, install an earlier version of RHVH and 'yum upgrade' to the newest one.

    # grubby --default-kernel --debug
    DBG: command line: --default-kernel --debug
    DBG: Image entry failed: uuid mismatch: rootdev bfe1a223-38a3-434e-b1a6-f4586f2be4d3, dev fd53ab53-ec12-44b8-8b82-f9d334b8110f
    ...
    # blkid /dev/mapper/rhvh-rhvh--4.*
    /dev/mapper/rhvh-rhvh--4.0--0.20170307.0+1: UUID="bfe1a223-38a3-434e-b1a6-f4586f2be4d3" TYPE="ext4"
    /dev/mapper/rhvh-rhvh--4.1--0.20170817.0+1: UUID="fd53ab53-ec12-44b8-8b82-f9d334b8110f" TYPE="ext4"

This missed 7.2.z and 7.3, was acked for 7.4 and missed it, and is missing 7.5 and 7.6. Can we have a target milestone for this? 7.6.z? 7.7?

If I understood correctly, there are 2 different issues mentioned in the comments of this bugzilla, which makes it confusing to follow. The two bugs seem to be:

1) grubby --default-kernel not printing the default kernel, even after it was set, if the root param in the cmdline is for a partition with a different UUID than the one currently mounted as root.

2) GRUB not getting a default kernel even after it was set correctly.

I think (2) is about the grubenv file getting borked for some reason (Comment 14 mentioned that it may be the same as bug 1366785). I'll just ignore it, since an incorrect grubenv file will leave GRUB unable to have a default entry, and that is independent of (1).

For (1), I found a much easier way to reproduce it. On a default RHEL7 install, just change the root param to a partition that's already present in the machine (for example I changed root=/dev/mapper/rhel-root to root=/dev/mapper/rhel-swap in one menu entry).

    # grubby --info=ALL
    index=0
    kernel=/boot/vmlinuz-3.10.0-957.el7.x86_64
    args="ro crashkernel=auto rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap rhgb quiet"
    root=/dev/mapper/rhel-root
    initrd=/boot/initramfs-3.10.0-957.el7.x86_64.img
    title=Red Hat Enterprise Linux Workstation (3.10.0-957.el7.x86_64) 7.6 (Maipo)
    index=1
    kernel=/boot/vmlinuz-3.10.0-932.el7.x86_64
    args="ro crashkernel=auto rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap rhgb quiet"
    root=/dev/mapper/rhel-swap
    initrd=/boot/initramfs-3.10.0-932.el7.x86_64.img
    title=Red Hat Enterprise Linux Workstation (3.10.0-932.el7.x86_64) 7.6 (Maipo)
    index=2
    kernel=/boot/vmlinuz-0-rescue-83fb8bc5f7924651aebd43f46ba4123f
    args="ro crashkernel=auto rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap rhgb quiet"
    root=/dev/mapper/rhel-root
    initrd=/boot/initramfs-0-rescue-83fb8bc5f7924651aebd43f46ba4123f.img
    title=Red Hat Enterprise Linux Workstation (0-rescue-83fb8bc5f7924651aebd43f46ba4123f) 7.6 (Maipo)
    index=3
    non linux entry

    # grubby --set-default /boot/vmlinuz-3.10.0-957.el7.x86_64
    # grub2-editenv list
    saved_entry=Red Hat Enterprise Linux Workstation (3.10.0-957.el7.x86_64) 7.6 (Maipo)
    # grubby --default-kernel
    /boot/vmlinuz-3.10.0-957.el7.x86_64

This works as expected, but when using the second entry (whose root=/dev/mapper/rhel-swap):

    # grubby --set-default /boot/vmlinuz-3.10.0-932.el7.x86_64 --debug
    # grub2-editenv list
    saved_entry=Red Hat Enterprise Linux Workstation (3.10.0-932.el7.x86_64) 7.6 (Maipo)
    # grubby --default-kernel
    #

Which I think is the issue reported in this bugzilla.

As mentioned in Comment 9, using the --bad-image-okay option makes grubby print the output, and --debug tells us what's wrong:

    # grubby --default-kernel --bad-image-okay
    /boot/vmlinuz-3.10.0-932.el7.x86_64
    # grubby --default-kernel --debug
    DBG: command line: --default-kernel --debug
    DBG: Image entry failed: uuid mismatch: rootdev b6e5711f-d2da-4ab8-bae2-d8b930ab0535, dev b0660e51-8210-4c29-8d93-56c741f3804a
    DBG: menuentry 'Red Hat Enterprise Linux Workstation (3.10.0-932.el7.x86_64) 7.6 (Maipo)' --class red --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-3.10.0-932.el7.x86_64-advanced-b6e5711f-d2da-4ab8-bae2-d8b930ab0535' {
    DBG: load_video
    DBG: set gfxpayload=keep
    DBG: insmod gzio
    DBG: insmod part_gpt
    DBG: insmod xfs
    DBG: set root='hd0,gpt2'
    DBG: if [ x$feature_platform_search_hint = xy ]; then
    DBG: search --no-floppy --fs-uuid --set=root --hint-bios=hd0,gpt2 --hint-efi=hd0,gpt2 --hint-baremetal=ahci0,gpt2 45d2bc24-891a-48bd-be95-4395bb5e1059
    DBG: else
    DBG: search --no-floppy --fs-uuid --set=root 45d2bc24-891a-48bd-be95-4395bb5e1059
    DBG: fi
    DBG: linuxefi /vmlinuz-3.10.0-932.el7.x86_64 root=/dev/mapper/rhel-swap ro crashkernel=auto rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap rhgb quiet
    DBG: initrdefi /initramfs-3.10.0-932.el7.x86_64.img
    DBG: }

Now this is indeed confusing and inconsistent, since grubby --set-default doesn't require --bad-image-okay to set the default, and its --debug output doesn't say anything about the UUID mismatch.

But it's not clear to me what the correct fix should be: either make --set-default strict, so that it also requires --bad-image-ok in this case (and prints the UUID mismatch with --debug), or make --default-kernel more lax, so that it does not require --bad-image-ok in this case.

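
The inconsistency described above can be modelled in a few lines. The following Python sketch is not grubby's code: it only mirrors the UUID comparison quoted earlier in this report and assumes, based on the behaviour observed in the comments, that --bad-image-okay bypasses that comparison. The names `Entry`, `is_suitable`, and `default_kernel` are invented for illustration; the UUIDs and kernel path are the ones from the debug output.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    kernel: str     # path of the kernel image for this menu entry
    root_uuid: str  # filesystem UUID of the device named by the entry's root=


def is_suitable(entry: Entry, running_root_uuid: str,
                bad_image_okay: bool = False) -> bool:
    """Model of the check: reject entries whose root UUID differs from ours.

    Assumption: bad_image_okay skips the comparison, as --bad-image-okay
    appears to do in the transcripts.
    """
    if bad_image_okay:
        return True
    return entry.root_uuid == running_root_uuid


def default_kernel(saved: Entry, running_root_uuid: str,
                   bad_image_okay: bool = False) -> Optional[str]:
    """Return the saved default's kernel path, or None if it is filtered out."""
    if is_suitable(saved, running_root_uuid, bad_image_okay):
        return saved.kernel
    return None


running = "b6e5711f-d2da-4ab8-bae2-d8b930ab0535"   # rootdev in the DBG line
saved = Entry("/boot/vmlinuz-3.10.0-932.el7.x86_64",
              "b0660e51-8210-4c29-8d93-56c741f3804a")  # dev in the DBG line

print(default_kernel(saved, running))                       # strict query
print(default_kernel(saved, running, bad_image_okay=True))  # workaround
```

In this model the strict query yields nothing while the workaround yields the kernel path, matching the empty output and the --bad-image-okay output shown in the transcripts. The two candidate fixes correspond to applying `is_suitable` when setting the default as well, or dropping the UUID comparison from the query path.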
(In reply to Javier Martinez Canillas from comment #21)

[snip]

>
> But it's not clear to me what should be the correct fix, if making
> --set-default to be strict and also require a --bad-image-ok in this case
> (and print the UUID mismatch with --debug) or to make --default-kernel more
> lax and not require the --bad-image-ok in this case.

After some thinking, I believe we want the latter: allow --default-kernel to print the default even if there's a UUID mismatch (but still print that info with --debug).

Reproduced on RHEL-7.6 using steps from comment 21:

    [root@localhost ~]# rpm -q grubby
    grubby-8.28-25.el7.x86_64
    [root@localhost ~]# grubby --info=ALL
    index=0
    kernel=/boot/vmlinuz-3.10.0-1058.el7.x86_64
    args="ro crashkernel=auto rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap rhgb quiet LANG=en_US.UTF-8"
    root=/dev/mapper/rhel-root
    initrd=/boot/initramfs-3.10.0-1058.el7.x86_64.img
    title=Red Hat Enterprise Linux Server (3.10.0-1058.el7.x86_64) 7.6 (Maipo)
    index=1
    kernel=/boot/vmlinuz-3.10.0-957.el7.x86_64
    args="ro crashkernel=auto rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap rhgb quiet LANG=en_US.UTF-8"
    root=/dev/mapper/rhel-swap
    initrd=/boot/initramfs-3.10.0-957.el7.x86_64.img
    title=Red Hat Enterprise Linux Server (3.10.0-957.el7.x86_64) 7.6 (Maipo)
    index=2
    kernel=/boot/vmlinuz-0-rescue-c57ad8dea5194fb5ad9044b37b88e0a8
    args="ro crashkernel=auto rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap rhgb quiet"
    root=/dev/mapper/rhel-root
    initrd=/boot/initramfs-0-rescue-c57ad8dea5194fb5ad9044b37b88e0a8.img
    title=Red Hat Enterprise Linux Server (0-rescue-c57ad8dea5194fb5ad9044b37b88e0a8) 7.6 (Maipo)
    index=3
    non linux entry
    [root@localhost ~]# grubby --set-default /boot/vmlinuz-3.10.0-957.el7.x86_64
    [root@localhost ~]# grub2-editenv list
    saved_entry=Red Hat Enterprise Linux Server (3.10.0-957.el7.x86_64) 7.6 (Maipo)
    [root@localhost ~]# grubby --default-kernel
    [root@localhost ~]#

Verification using grubby-8.28-26.el7 on the same system, the default kernel is printed:

    [root@localhost ~]# yum update grubby
    ...
    [root@localhost ~]# rpm -q grubby
    grubby-8.28-26.el7.x86_64
    [root@localhost ~]# grubby --default-kernel
    /boot/vmlinuz-3.10.0-957.el7.x86_64
    [root@localhost ~]#

Moving to VERIFIED.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2227

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days