Bug 1749871
| Summary: | root snapshot requires multiple selections in boot menu before snap is actually booted (when create with boom) | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Corey Marthaler <cmarthal> |
| Component: | boom-boot | Assignee: | Bryn M. Reeves <bmr> |
| Status: | CLOSED WORKSFORME | QA Contact: | cluster-qe <cluster-qe> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 8.1 | CC: | agk, bmr, jbrassow, mcsontos |
| Target Milestone: | rc | Flags: | pm-rhel: mirror+ |
| Target Release: | 8.0 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-01-29 12:32:40 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description
Corey Marthaler
2019-09-06 15:47:26 UTC
When you say you end up right back at the boot menu, what's actually happening? Is there a delay, or a failed boot attempt? Or does grub2 just take you immediately back to the menu with no delay or boot attempt at all?
It might help to see the content of the BLS entry for the snapshot:
# boom show --boot-id abaaa9c
Although I'm not certain, it appears to take enough time (~45 seconds) to attempt the snap boot, fail for whatever reason, and end up back at the boot menu to try the default boot root entry, which, if I don't select the snap again, will boot fine. If I do select the snap again, it may boot, or it may end up back at the boot menu again 45 seconds later for whatever reason. If I keep at it, the snap will eventually boot properly, which makes this seem like a timing issue.
[root@host-087 ~]# boom list
BootID Version Name RootDevice
4376d75 4.18.0-141.el8.x86_64 Red Hat Enterprise Linux /dev/rhel_host-087/root_snapshot_before_changes
[root@host-087 ~]# boom show --boot-id 4376d75
Boot Entry (boot_id=4376d75)
title Root LV snapshot before changes
machine-id 19c31f4c64714e29a8d74992ac5449d9
version 4.18.0-141.el8.x86_64
linux /vmlinuz-4.18.0-141.el8.x86_64
initrd /initramfs-4.18.0-141.el8.x86_64.img
optionn root=/dev/rhel_host-087/root_snapshot_before_changes ro rd.lvm.lv=rhel_host-087/root_snapshot_before_changes
kernel-4.18.0-141.el8 BUILT: Fri Aug 30 10:51:22 CDT 2019
lvm2-2.03.05-4.el8 BUILT: Sun Aug 18 11:44:11 CDT 2019
lvm2-libs-2.03.05-4.el8 BUILT: Sun Aug 18 11:44:11 CDT 2019
lvm2-dbusd-2.03.05-4.el8 BUILT: Sun Aug 18 11:46:32 CDT 2019
lvm2-lockd-2.03.05-4.el8 BUILT: Sun Aug 18 11:44:11 CDT 2019
boom-boot-1.0-0.2.20190610git246b116.el8 BUILT: Mon Jun 10 08:22:40 CDT 2019
Is that output pasted directly from the terminal? That last line should be *options*, not *optionn*:
# boom show --boot-id 327e24a
Boot Entry (boot_id=327e24a)
title Fedora 26 Snapshot 2017-10-21
machine-id 611f38fd887d41dea7eb3403b2730a76
version 4.13.5-200.fc26.x86_64
linux /vmlinuz-4.13.5-200.fc26.x86_64
initrd /initramfs-4.13.5-200.fc26.x86_64.img
options BOOT_IMAGE=%{linux} root=/dev/vg_hex/root ro rd.lvm.lv=vg_hex/root
^^^^^^^
I don't see how that would happen (the show command uses the built-in string formatter for a BootEntry: it's not just dumping the file content from /boot/loader/...) - I don't find that string anywhere in the sources.
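As a rough illustration of how a stray key such as *optionn* could be spotted in pasted entry text, here is a minimal Python sketch. It is not boom's code: the `KNOWN_KEYS` set is an assumption built only from the keys that appear in the entries quoted in this report, and `parse_entry` and `unknown_keys` are hypothetical helper names.

```python
# Keys that appear in the boot entries quoted in this report.
# This set is an assumption for illustration, not boom's own key list.
KNOWN_KEYS = {
    "title", "machine-id", "version", "linux", "initrd", "options",
    "grub_users", "grub_arg", "grub_class",
}


def parse_entry(text):
    """Split 'key value' lines of a boot entry into (key, value) pairs."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        value = parts[1] if len(parts) > 1 else ""
        pairs.append((parts[0], value))
    return pairs


def unknown_keys(text):
    """Return the keys in the entry text that are not in KNOWN_KEYS."""
    return [key for key, _ in parse_entry(text) if key not in KNOWN_KEYS]


# The entry as it was first pasted into this report.
pasted = """\
title Root LV snapshot before changes
machine-id 19c31f4c64714e29a8d74992ac5449d9
version 4.18.0-141.el8.x86_64
linux /vmlinuz-4.18.0-141.el8.x86_64
initrd /initramfs-4.18.0-141.el8.x86_64.img
optionn root=/dev/rhel_host-087/root_snapshot_before_changes ro rd.lvm.lv=rhel_host-087/root_snapshot_before_changes
"""

print(unknown_keys(pasted))  # prints ['optionn']
```

A check like this only looks at the text it is given, so it cannot tell a paste error apart from a genuinely malformed file under /boot/loader/entries.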
Other than that the boot entry looks correct - I think we'd need to get console logs from the presumed failed boots to be able to debug this further.
That must have been a cut buffer or paste issue, because I double-checked and it's correct:
[root@host-087 ~]# boom list
BootID Version Name RootDevice
4376d75 4.18.0-141.el8.x86_64 Red Hat Enterprise Linux /dev/rhel_host-087/root_snapshot_before_changes
[root@host-087 ~]# boom show --boot-id 4376d75
Boot Entry (boot_id=4376d75)
title Root LV snapshot before changes
machine-id 19c31f4c64714e29a8d74992ac5449d9
version 4.18.0-141.el8.x86_64
linux /vmlinuz-4.18.0-141.el8.x86_64
initrd /initramfs-4.18.0-141.el8.x86_64.img
options root=/dev/rhel_host-087/root_snapshot_before_changes ro rd.lvm.lv=rhel_host-087/root_snapshot_before_changes
Thanks for checking. I need to install a RHEL8.1 VM to test the dmstats bug this week so I'll try to reproduce this as well.
I can't reproduce this on a fresh install of 8.1 beta:
# boom profile create --from-host --uname-pattern el8
Created profile with os_id f44fb52:
OS ID: "f44fb528ff8360ad67e2fe0274750b838da0bd6a",
Name: "Red Hat Enterprise Linux", Short name: "rhel",
Version: "8.1 (Ootpa)", Version ID: "8.1",
UTS release pattern: "el8",
Kernel pattern: "/vmlinuz-%{version}", Initramfs pattern: "/initramfs-%{version}.img",
Root options (LVM2): "rd.lvm.lv=%{lvm_root_lv}",
Root options (BTRFS): "rootflags=%{btrfs_subvolume}",
Options: "root=%{root_device} ro %{root_opts}",
Title: "%{os_name} %{os_version_id} (%{version})"
# boom create --title "RHEL8 Snapshot" --rootlv rhel/root-snap
WARNING - Boom grub2 integration is disabled in '/boot/../etc/default/boom'
Created entry with boot_id df2def0:
title RHEL8 Rollback
machine-id a32a43c0cd3e443a99ef70edc4dd7284
version 4.18.0-107.el8.x86_64
linux /vmlinuz-4.18.0-107.el8.x86_64
initrd /initramfs-4.18.0-107.el8.x86_64.img
options root=/dev/rhel/root-snap ro rd.lvm.lv=rhel/root-snap
# cat /etc/redhat-release
Red Hat Enterprise Linux release 8.1 Beta (Ootpa)
boom-boot-1.0-0.2.20190610git246b116.el8.noarch
boom-boot-conf-1.0-0.2.20190610git246b116.el8.noarch
boom-boot-grub2-1.0-0.2.20190610git246b116.el8.noarch
device-mapper-1.02.163-1.el8.x86_64
device-mapper-event-1.02.163-1.el8.x86_64
device-mapper-event-libs-1.02.163-1.el8.x86_64
device-mapper-libs-1.02.163-1.el8.x86_64
device-mapper-multipath-0.8.0-5.el8.x86_64
device-mapper-multipath-libs-0.8.0-5.el8.x86_64
device-mapper-persistent-data-0.8.5-2.el8.x86_64
grub2-common-2.02-74.el8.noarch
grub2-pc-2.02-74.el8.x86_64
grub2-pc-modules-2.02-74.el8.noarch
grub2-tools-2.02-74.el8.x86_64
grub2-tools-extra-2.02-74.el8.x86_64
grub2-tools-minimal-2.02-74.el8.x86_64
lvm2-2.03.05-1.el8.x86_64
lvm2-libs-2.03.05-1.el8.x86_64
python3-boom-1.0-0.2.20190610git246b116.el8.noarch
# mount | grep ' \/ '
/dev/mapper/rhel-root--snap on / type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
# cat /proc/cmdline
BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.18.0-107.el8.x86_64 root=/dev/rhel/root-snap ro rd.lvm.lv=rhel/root-snap
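The check above (comparing `root=` with `rd.lvm.lv=` on the kernel command line) can be sketched in Python as follows. This is only an illustration under stated assumptions: `cmdline_args` and `root_matches_lv` are hypothetical helpers, and the comparison only handles the `/dev/VG/LV` form of `root=` used in this report, not `/dev/mapper/...` names or UUIDs.

```python
def cmdline_args(cmdline):
    """Map each kernel command line key to the list of values given for it."""
    args = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        # Bare flags such as "ro" have no "=", so store None for them.
        args.setdefault(key, []).append(value if sep else None)
    return args


def root_matches_lv(cmdline):
    """True if root=/dev/VG/LV names an LV also listed in rd.lvm.lv=VG/LV."""
    args = cmdline_args(cmdline)
    roots = args.get("root", [])
    if not roots or roots[-1] is None:
        return False
    root = roots[-1]  # assume the last root= on the line is the one in effect
    return any(lv and root == "/dev/" + lv for lv in args.get("rd.lvm.lv", []))


# Command line taken from the /proc/cmdline output quoted above.
cmdline = ("BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.18.0-107.el8.x86_64 "
           "root=/dev/rhel/root-snap ro rd.lvm.lv=rhel/root-snap")
print(root_matches_lv(cmdline))  # prints True
```

On a live system the string could be read from /proc/cmdline instead of being hard-coded.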
I think we'll need to see the boot logs from the hosts where you were seeing the failures.
Otherwise, an sosreport from an affected machine might show up a difference in configuration - the boot and storage profiles should capture everything needed:
# sosreport --batch --profile=boot,storage
Adding "boom" to the subject as it took me a while to search for this bug...
I reproduced this with the latest rhel7.9 rpms. I selected the snap 4 times before it eventually booted to the snap device whose boot entry was created w/ boom.
[root@host-094 ~]# lvcreate -k n -a y --yes -s /dev/rhel_host-094/root -n boom_snap
WARNING: Sum of all thin volume sizes (9.57 GiB) exceeds the size of thin pool rhel_host-094/pool00 and the size of whole volume group (<7.00 GiB).
WARNING: You have not turned on protection against thin pools running out of space.
WARNING: Set activation/thin_pool_autoextend_threshold below 100 to trigger automatic extension of thin pools before they get full.
Logical volume "boom_snap" created.
[root@host-094 ~]# lvs -a -o +devices
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert Devices
boom_snap rhel_host-094 Vwi-a-tz-- <4.79g pool00 root 61.86
[lvol0_pmspare] rhel_host-094 ewi------- 4.00m /dev/vda2(205)
pool00 rhel_host-094 twi-aotz-- <4.79g 61.86 48.05 pool00_tdata(0)
[pool00_tdata] rhel_host-094 Twi-ao---- <4.79g /dev/vda2(206)
[pool00_tmeta] rhel_host-094 ewi-ao---- 4.00m /dev/vda2(1431)
root rhel_host-094 Vwi-aotz-- <4.79g pool00 61.86
swap rhel_host-094 -wi-ao---- 820.00m /dev/vda2(0)
[root@host-094 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 7.3G 0 7.3G 0% /dev
tmpfs 7.3G 0 7.3G 0% /dev/shm
tmpfs 7.3G 8.7M 7.3G 1% /run
tmpfs 7.3G 0 7.3G 0% /sys/fs/cgroup
/dev/mapper/rhel_host--094-root 4.8G 2.9G 1.9G 61% /
/dev/vda1 1014M 156M 859M 16% /boot
tmpfs 1.5G 0 1.5G 0% /run/user/0
[root@host-094 ~]# boom create --title BOOM --rootlv /dev/rhel_host-094/boom_snap
Created entry with boot_id c9d13bf:
title BOOM
machine-id 6aa4719f9f24482b902819e419ce14a6
version 3.10.0-1149.el7.x86_64
linux /vmlinuz-3.10.0-1149.el7.x86_64
initrd /initramfs-3.10.0-1149.el7.x86_64.img
options root=/dev/rhel_host-094/boom_snap ro rd.lvm.lv=rhel_host-094/boom_snap
grub_users $grub_users
grub_arg --unrestricted
grub_class kernel
[root@host-094 ~]# ls /dev/rhel_host-094/boom_snap
/dev/rhel_host-094/boom_snap
[root@host-094 ~]# boom list
BootID Version Name RootDevice
c9d13bf 3.10.0-1149.el7.x86_64 Red Hat Enterprise Linux Server /dev/rhel_host-094/boom_snap
[root@host-094 ~]# grep boom /boot/grub2/grub.cfg
### BEGIN /etc/grub.d/42_boom ###
### END /etc/grub.d/42_boom ###
[root@host-094 ~]# grub2-mkconfig > /boot/grub2/grub.cfg
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-3.10.0-1149.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-1149.el7.x86_64.img
Found linux image: /boot/vmlinuz-0-rescue-6aa4719f9f24482b902819e419ce14a6
Found initrd image: /boot/initramfs-0-rescue-6aa4719f9f24482b902819e419ce14a6.img
done
[root@host-094 ~]# sync
[root@host-094 ~]# reboot
# Toggle down to BOOM
Red Hat Enterprise Linux Server (3.10.0-1149.el7.x86_64) 7.9 (Maipo)
Red Hat Enterprise Linux Server (0-rescue-6aa4719f9f24482b902819e419ce14>
Snapshots
BOOM
( I waited like 20-30 seconds, and then would see the above boot selection again, and I'd toggle down, etc, four times before it eventually booted the snap volume )
# Then, Quick verification that the snap is the running root vol now.
[root@host-094 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 7.3G 0 7.3G 0% /dev
tmpfs 7.3G 0 7.3G 0% /dev/shm
tmpfs 7.3G 8.6M 7.3G 1% /run
tmpfs 7.3G 0 7.3G 0% /sys/fs/cgroup
/dev/mapper/rhel_host--094-boom_snap 4.8G 2.9G 2.0G 60% /
/dev/vda1 1014M 156M 859M 16% /boot
tmpfs 1.5G 0 1.5G 0% /run/user/0
Created attachment 1696402
requested sos report of system seeing this issue
I've never been able to reproduce this, and even then it appears from the description that this is a problem in either the Grub2 boot loader or the kernel. Boom only provides configuration to the boot loader in the form of plain text configuration files. As long as those files are correct (and the examples given here are), boom has no further influence on the success of the actual boot process. If you're seeing this problem again, please re-open this bug or file a new report against the grub2 component.
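One part of "those files are correct" that can be checked mechanically is whether the `linux` and `initrd` paths in an entry exist under /boot. The following Python sketch does that. It is not part of boom: `missing_boot_files` is a hypothetical helper, and it assumes entry paths are relative to the boot filesystem, as in the entries quoted here.

```python
import os


def missing_boot_files(entry_text, boot_dir="/boot"):
    """Return the linux/initrd paths from an entry that are absent in boot_dir."""
    missing = []
    for line in entry_text.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2 or parts[0] not in ("linux", "initrd"):
            continue
        # An initrd line may list several images separated by whitespace.
        for rel in parts[1].split():
            path = os.path.join(boot_dir, rel.lstrip("/"))
            if not os.path.isfile(path):
                missing.append(path)
    return missing


# Entry text taken from the "boom create" output quoted in this report.
entry = """\
title BOOM
linux /vmlinuz-3.10.0-1149.el7.x86_64
initrd /initramfs-3.10.0-1149.el7.x86_64.img
options root=/dev/rhel_host-094/boom_snap ro rd.lvm.lv=rhel_host-094/boom_snap
"""

for path in missing_boot_files(entry):
    print("missing:", path)
```

An empty result only shows that the referenced files exist. It says nothing about why the boot loader or kernel might fail to boot them.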