
Bug 1749871

Summary: root snapshot requires multiple selections in boot menu before snap is actually booted (when created with boom)
Product: Red Hat Enterprise Linux 8
Reporter: Corey Marthaler <cmarthal>
Component: boom-boot
Assignee: Bryn M. Reeves <bmr>
Status: CLOSED WORKSFORME
QA Contact: cluster-qe <cluster-qe>
Severity: medium
Priority: medium
Version: 8.1
CC: agk, bmr, jbrassow, mcsontos
Target Milestone: rc
Flags: pm-rhel: mirror+
Target Release: 8.0
Hardware: x86_64
OS: Linux
Last Closed: 2021-01-29 12:32:40 UTC
Type: Bug
Attachments:
  requested sos report of system seeing this issue (flags: none)

Description Corey Marthaler 2019-09-06 15:47:26 UTC
Description of problem:
I was going through final sign-off of the boom create doc (https://projects.engineering.redhat.com/browse/RHELPLAN-2836) and realized that whenever I select a boom snapshot entry to boot, it never "just works". It often returns to the boot menu listed below, and I need to select the snap entry (Root LV snapshot before changes) anywhere from 3 to 5 separate times before it eventually works. This is likely not a boom bug but some kind of storage discovery timing issue; I wanted it documented at least for when users attempt this and it doesn't work right away.


# I select "Root LV snapshot before changes" and end up right back at the boot menu to select it over and over before it eventually works/boots.

      Root LV snapshot before changes         <-- SELECTED
      Red Hat Enterprise Linux (4.18.0-141.el8.x86_64) 8.1 (Ootpa)
      Red Hat Enterprise Linux (0-rescue-b10221518c134031a0e5d93929529e3a) 8.1>
                                                                               
                                                                                

      Use the ^ and v keys to change the selection.
      Press 'e' to edit the selected item, or 'c' for a command prompt.





# Finally booted the snap volume after selecting it four times in this scenario.
[root@host-073 ~]# uname -ar
Linux host-073.virt.lab.msp.redhat.com 4.18.0-141.el8.x86_64 #1 SMP Fri Aug 30 15:27:07 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

[root@host-073 ~]# lvs -a -o +devices
  WARNING: lvmlockd process is not running.
  Reading without shared global lock.
  LV                           VG            Attr       LSize   Pool   Origin Data%  Meta%   Devices
  [lvol0_pmspare]              rhel_host-073 ewi-------   4.00m                              /dev/vda2(0)
  pool00                       rhel_host-073 twi-aotz--  <4.79g               81.22  74.02   pool00_tdata(0)
  [pool00_tdata]               rhel_host-073 Twi-ao----  <4.79g                              /dev/vda2(1)
  [pool00_tmeta]               rhel_host-073 ewi-ao----   4.00m                              /dev/vda2(1226)
  root                         rhel_host-073 Vwi-a-tz--  <4.79g pool00        79.85
  root_snapshot_before_changes rhel_host-073 Vwi-aotz--  <4.79g pool00 root   80.03
  swap                         rhel_host-073 -wi-ao---- 820.00m                              /dev/vda2(1227)

[root@host-073 ~]# df -h
Filesystem                                               Size  Used Avail Use% Mounted on
devtmpfs                                                 7.3G     0  7.3G   0% /dev
tmpfs                                                    7.3G     0  7.3G   0% /dev/shm
tmpfs                                                    7.3G  8.6M  7.3G   1% /run
tmpfs                                                    7.3G     0  7.3G   0% /sys/fs/cgroup
/dev/mapper/rhel_host--073-root_snapshot_before_changes  4.8G  3.9G  985M  80% /
/dev/vda1                                               1014M  178M  837M  18% /boot
tmpfs                                                    1.5G     0  1.5G   0% /run/user/0

[root@host-073 ~]# boom list
BootID  Version                  Name                     RootDevice                                     
abaaa9c 4.18.0-141.el8.x86_64    Red Hat Enterprise Linux /dev/rhel_host-073/root_snapshot_before_changes



Comment 1 Bryn M. Reeves 2019-09-09 13:36:12 UTC
When you say you end up right back at the boot menu, what's actually happening? Is there a delay, or a failed boot attempt? Or does grub2 just take you immediately back to the menu with no delay or boot attempt at all?

It might help to see the content of the BLS entry for the snapshot:

  # boom show --boot-id abaaa9c

Comment 2 Corey Marthaler 2019-09-10 17:03:44 UTC
Although I'm not certain, it appears to take enough time (~45 seconds) to attempt the snap boot, fail for whatever reason, and end up back at the boot menu to try the default root boot entry, which will boot fine if I don't select the snap again. If I do select the snap again, it may boot, or it may end up back at the boot menu again 45 seconds later. If I keep at it, the snap will eventually boot properly, which makes this seem like a timing issue.

[root@host-087 ~]# boom list
BootID  Version                  Name                     RootDevice                                     
4376d75 4.18.0-141.el8.x86_64    Red Hat Enterprise Linux /dev/rhel_host-087/root_snapshot_before_changes

[root@host-087 ~]# boom show --boot-id 4376d75
Boot Entry (boot_id=4376d75)
  title Root LV snapshot before changes
  machine-id 19c31f4c64714e29a8d74992ac5449d9
  version 4.18.0-141.el8.x86_64
  linux /vmlinuz-4.18.0-141.el8.x86_64
  initrd /initramfs-4.18.0-141.el8.x86_64.img
  optionn root=/dev/rhel_host-087/root_snapshot_before_changes ro rd.lvm.lv=rhel_host-087/root_snapshot_before_changes
  


kernel-4.18.0-141.el8    BUILT: Fri Aug 30 10:51:22 CDT 2019
lvm2-2.03.05-4.el8    BUILT: Sun Aug 18 11:44:11 CDT 2019
lvm2-libs-2.03.05-4.el8    BUILT: Sun Aug 18 11:44:11 CDT 2019
lvm2-dbusd-2.03.05-4.el8    BUILT: Sun Aug 18 11:46:32 CDT 2019
lvm2-lockd-2.03.05-4.el8    BUILT: Sun Aug 18 11:44:11 CDT 2019
boom-boot-1.0-0.2.20190610git246b116.el8    BUILT: Mon Jun 10 08:22:40 CDT 2019

Comment 3 Bryn M. Reeves 2019-09-10 17:31:06 UTC
Is that output pasted directly from the terminal? That last line should be *options*, not *optionn*:

# boom show --boot-id 327e24a
Boot Entry (boot_id=327e24a)
  title Fedora 26 Snapshot 2017-10-21
  machine-id 611f38fd887d41dea7eb3403b2730a76
  version 4.13.5-200.fc26.x86_64
  linux /vmlinuz-4.13.5-200.fc26.x86_64
  initrd /initramfs-4.13.5-200.fc26.x86_64.img
  options BOOT_IMAGE=%{linux} root=/dev/vg_hex/root ro rd.lvm.lv=vg_hex/root
  ^^^^^^^

I don't see how that would happen (the show command uses the built-in string formatter for a BootEntry: it's not just dumping the file content from /boot/loader/...) - I don't find that string anywhere in the sources.
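
For anyone wanting to rule out a malformed key in an entry without relying on eyeballing pasted output, here is a minimal sketch that scans entry text for unrecognised keys. It is not part of boom: `KNOWN_KEYS`, `parse_entry`, and `unknown_keys` are hypothetical names, and the key set is an assumption limited to the keys that appear in this report.

```python
# Hypothetical helper, not part of boom: flags unrecognised keys in
# BLS-style entry text such as the output of "boom show".
# Assumption: only the keys seen in this bug report are treated as known.
KNOWN_KEYS = {
    "title", "machine-id", "version", "linux", "initrd", "options",
    "grub_users", "grub_arg", "grub_class",
}


def parse_entry(text):
    """Split 'key value' lines into (key, value) pairs.

    Blank lines, comment lines, and the 'Boot Entry (...)' header
    printed by 'boom show' are skipped.
    """
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or line.startswith("Boot Entry"):
            continue
        key, _, value = line.partition(" ")
        pairs.append((key, value.strip()))
    return pairs


def unknown_keys(text):
    """Return the keys in the entry text that are not in KNOWN_KEYS."""
    return [key for key, _ in parse_entry(text) if key not in KNOWN_KEYS]
```

Run against the output pasted in comment 2, this would report `optionn` as the only unknown key, which is consistent with a copy/paste slip rather than anything boom generated.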

Comment 4 Bryn M. Reeves 2019-09-10 17:32:11 UTC
Other than that the boot entry looks correct - I think we'd need to get console logs from the presumed failed boots to be able to debug this further.

Comment 5 Corey Marthaler 2019-09-10 17:59:42 UTC
That must have been a cut buffer or paste issue, because I double-checked and it's correct:

[root@host-087 ~]# boom list
BootID  Version                  Name                     RootDevice                                     
4376d75 4.18.0-141.el8.x86_64    Red Hat Enterprise Linux /dev/rhel_host-087/root_snapshot_before_changes
[root@host-087 ~]# boom show --boot-id 4376d75
Boot Entry (boot_id=4376d75)
  title Root LV snapshot before changes
  machine-id 19c31f4c64714e29a8d74992ac5449d9
  version 4.18.0-141.el8.x86_64
  linux /vmlinuz-4.18.0-141.el8.x86_64
  initrd /initramfs-4.18.0-141.el8.x86_64.img
  options root=/dev/rhel_host-087/root_snapshot_before_changes ro rd.lvm.lv=rhel_host-087/root_snapshot_before_changes

Comment 6 Bryn M. Reeves 2019-09-10 18:19:08 UTC
Thanks for checking. I need to install a RHEL8.1 VM to test the dmstats bug this week so I'll try to reproduce this as well.

Comment 7 Bryn M. Reeves 2019-09-13 11:58:12 UTC
I can't reproduce this on a fresh install of 8.1 beta:

# boom profile create --from-host --uname-pattern el8
Created profile with os_id f44fb52:
  OS ID: "f44fb528ff8360ad67e2fe0274750b838da0bd6a",
  Name: "Red Hat Enterprise Linux", Short name: "rhel",
  Version: "8.1 (Ootpa)", Version ID: "8.1",
  UTS release pattern: "el8",
  Kernel pattern: "/vmlinuz-%{version}", Initramfs pattern: "/initramfs-%{version}.img",
  Root options (LVM2): "rd.lvm.lv=%{lvm_root_lv}",
  Root options (BTRFS): "rootflags=%{btrfs_subvolume}",
  Options: "root=%{root_device} ro %{root_opts}",
  Title: "%{os_name} %{os_version_id} (%{version})"

# boom create --title "RHEL8 Snapshot" --rootlv rhel/root-snap
WARNING - Boom grub2 integration is disabled in '/boot/../etc/default/boom'
Created entry with boot_id df2def0:
  title RHEL8 Rollback
  machine-id a32a43c0cd3e443a99ef70edc4dd7284
  version 4.18.0-107.el8.x86_64
  linux /vmlinuz-4.18.0-107.el8.x86_64
  initrd /initramfs-4.18.0-107.el8.x86_64.img
  options root=/dev/rhel/root-snap ro rd.lvm.lv=rhel/root-snap

# cat /etc/redhat-release 
Red Hat Enterprise Linux release 8.1 Beta (Ootpa)

boom-boot-1.0-0.2.20190610git246b116.el8.noarch
boom-boot-conf-1.0-0.2.20190610git246b116.el8.noarch
boom-boot-grub2-1.0-0.2.20190610git246b116.el8.noarch
device-mapper-1.02.163-1.el8.x86_64
device-mapper-event-1.02.163-1.el8.x86_64
device-mapper-event-libs-1.02.163-1.el8.x86_64
device-mapper-libs-1.02.163-1.el8.x86_64
device-mapper-multipath-0.8.0-5.el8.x86_64
device-mapper-multipath-libs-0.8.0-5.el8.x86_64
device-mapper-persistent-data-0.8.5-2.el8.x86_64
grub2-common-2.02-74.el8.noarch
grub2-pc-2.02-74.el8.x86_64
grub2-pc-modules-2.02-74.el8.noarch
grub2-tools-2.02-74.el8.x86_64
grub2-tools-extra-2.02-74.el8.x86_64
grub2-tools-minimal-2.02-74.el8.x86_64
lvm2-2.03.05-1.el8.x86_64
lvm2-libs-2.03.05-1.el8.x86_64
python3-boom-1.0-0.2.20190610git246b116.el8.noarch

# mount | grep ' \/ ' 
/dev/mapper/rhel-root--snap on / type xfs (rw,relatime,seclabel,attr2,inode64,noquota)

# cat /proc/cmdline 
BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.18.0-107.el8.x86_64 root=/dev/rhel/root-snap ro rd.lvm.lv=rhel/root-snap

I think we'll need to see the boot logs from the hosts where you were seeing the failures.
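
To make the relationship between the profile templates and the resulting options line explicit, the following is a rough sketch of `%{name}` substitution using the values from this comment. It only illustrates the substitution idea and is not boom's actual implementation; the `expand` function is a hypothetical name.

```python
import re


def expand(template, values):
    """Replace each %{name} in template with values[name].

    Names missing from values are left untouched. Illustrative only;
    this is not how boom itself is implemented.
    """
    return re.sub(
        r"%\{(\w+)\}",
        lambda m: values.get(m.group(1), m.group(0)),
        template,
    )


root_lv = "rhel/root-snap"

# "Root options (LVM2)" template from the profile above.
root_opts = expand("rd.lvm.lv=%{lvm_root_lv}", {"lvm_root_lv": root_lv})

# "Options" template from the profile above.
options = expand(
    "root=%{root_device} ro %{root_opts}",
    {"root_device": "/dev/" + root_lv, "root_opts": root_opts},
)

print(options)  # root=/dev/rhel/root-snap ro rd.lvm.lv=rhel/root-snap
```

The expanded string matches the options line of the created entry and the tail of /proc/cmdline shown above, so the entry content itself looks as expected.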

Comment 8 Bryn M. Reeves 2019-09-13 14:50:24 UTC
Otherwise, an sosreport from an affected machine might show up a difference in configuration - the boot and storage profiles should capture everything needed:

  # sosreport --batch --profile=boot,storage

Comment 9 Corey Marthaler 2020-06-09 22:00:41 UTC
Adding "boom" to the subject as it took me a while to search for this bug...

I reproduced this with the latest rhel7.9 rpms. I selected the snap 4 times before it eventually booted to the snap device whose boot entry was created with boom.

[root@host-094 ~]# lvcreate -k n -a y --yes -s /dev/rhel_host-094/root -n boom_snap 
  WARNING: Sum of all thin volume sizes (9.57 GiB) exceeds the size of thin pool rhel_host-094/pool00 and the size of whole volume group (<7.00 GiB).
  WARNING: You have not turned on protection against thin pools running out of space.
  WARNING: Set activation/thin_pool_autoextend_threshold below 100 to trigger automatic extension of thin pools before they get full.
  Logical volume "boom_snap" created.
[root@host-094 ~]# lvs -a -o +devices
  LV              VG            Attr       LSize   Pool   Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices        
  boom_snap       rhel_host-094 Vwi-a-tz--  <4.79g pool00 root   61.86                                                  
  [lvol0_pmspare] rhel_host-094 ewi-------   4.00m                                                       /dev/vda2(205) 
  pool00          rhel_host-094 twi-aotz--  <4.79g               61.86  48.05                            pool00_tdata(0)
  [pool00_tdata]  rhel_host-094 Twi-ao----  <4.79g                                                       /dev/vda2(206) 
  [pool00_tmeta]  rhel_host-094 ewi-ao----   4.00m                                                       /dev/vda2(1431)
  root            rhel_host-094 Vwi-aotz--  <4.79g pool00        61.86                                                  
  swap            rhel_host-094 -wi-ao---- 820.00m                                                       /dev/vda2(0)   

[root@host-094 ~]# df -h
Filesystem                       Size  Used Avail Use% Mounted on
devtmpfs                         7.3G     0  7.3G   0% /dev
tmpfs                            7.3G     0  7.3G   0% /dev/shm
tmpfs                            7.3G  8.7M  7.3G   1% /run
tmpfs                            7.3G     0  7.3G   0% /sys/fs/cgroup
/dev/mapper/rhel_host--094-root  4.8G  2.9G  1.9G  61% /
/dev/vda1                       1014M  156M  859M  16% /boot
tmpfs                            1.5G     0  1.5G   0% /run/user/0


[root@host-094 ~]# boom create --title BOOM --rootlv /dev/rhel_host-094/boom_snap
Created entry with boot_id c9d13bf:
  title BOOM
  machine-id 6aa4719f9f24482b902819e419ce14a6
  version 3.10.0-1149.el7.x86_64
  linux /vmlinuz-3.10.0-1149.el7.x86_64
  initrd /initramfs-3.10.0-1149.el7.x86_64.img
  options root=/dev/rhel_host-094/boom_snap ro rd.lvm.lv=rhel_host-094/boom_snap
  grub_users $grub_users
  grub_arg --unrestricted
  grub_class kernel

[root@host-094 ~]# ls /dev/rhel_host-094/boom_snap
/dev/rhel_host-094/boom_snap
[root@host-094 ~]# boom list
BootID  Version                  Name                            RootDevice                  
c9d13bf 3.10.0-1149.el7.x86_64   Red Hat Enterprise Linux Server /dev/rhel_host-094/boom_snap
[root@host-094 ~]# grep boom /boot/grub2/grub.cfg
### BEGIN /etc/grub.d/42_boom ###
### END /etc/grub.d/42_boom ###
[root@host-094 ~]# grub2-mkconfig > /boot/grub2/grub.cfg
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-3.10.0-1149.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-1149.el7.x86_64.img
Found linux image: /boot/vmlinuz-0-rescue-6aa4719f9f24482b902819e419ce14a6
Found initrd image: /boot/initramfs-0-rescue-6aa4719f9f24482b902819e419ce14a6.img
done
[root@host-094 ~]# sync
[root@host-094 ~]# reboot 


# Toggle down to BOOM
  Red Hat Enterprise Linux Server (3.10.0-1149.el7.x86_64) 7.9 (Maipo)      
  Red Hat Enterprise Linux Server (0-rescue-6aa4719f9f24482b902819e419ce14>
  Snapshots                                                                
      BOOM                                                                      

(I waited about 20-30 seconds, then saw the above boot selection again and toggled down again; this happened four times before it eventually booted the snap volume.)


# Then, a quick verification that the snap is now the running root volume.
[root@host-094 ~]# df -h
Filesystem                            Size  Used Avail Use% Mounted on
devtmpfs                              7.3G     0  7.3G   0% /dev
tmpfs                                 7.3G     0  7.3G   0% /dev/shm
tmpfs                                 7.3G  8.6M  7.3G   1% /run
tmpfs                                 7.3G     0  7.3G   0% /sys/fs/cgroup
/dev/mapper/rhel_host--094-boom_snap  4.8G  2.9G  2.0G  60% /
/dev/vda1                            1014M  156M  859M  16% /boot
tmpfs                                 1.5G     0  1.5G   0% /run/user/0

Comment 10 Corey Marthaler 2020-06-09 22:03:53 UTC
Created attachment 1696402 [details]
requested sos report of system seeing this issue

Comment 12 Bryn M. Reeves 2021-01-29 12:32:40 UTC
I've never been able to reproduce this, and in any case it appears from the description that this is a problem in either the Grub2 boot loader or the kernel. Boom only provides configuration to the boot loader in the form of plain text configuration files. As long as those files are correct (and the examples given here are), boom has no further influence on the success of the actual boot process.

If you're seeing this problem again please re-open this bug or file a new report against the grub2 component.
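
As a way to check the point above on an affected host, here is a small sketch that compares an entry's options line with the kernel command line of the running system. The function names are hypothetical and this is not a boom feature; it assumes the options text is copied from `boom show` output and that /proc/cmdline is readable.

```python
def options_in_cmdline(entry_options, cmdline):
    """Return True if every option token from the entry appears in cmdline."""
    booted = set(cmdline.split())
    return all(token in booted for token in entry_options.split())


def booted_cmdline(path="/proc/cmdline"):
    """Read the kernel command line of the running system."""
    with open(path) as f:
        return f.read().strip()
```

If `options_in_cmdline(entry_options, booted_cmdline())` is true after a successful snapshot boot, the boot loader passed through exactly what boom wrote, which would point any intermittent failure at a later stage of the boot.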