Bug 1663860

Summary: created entry becomes the default, which is inappropriate for rollbacks
Product: Red Hat Enterprise Linux 8 Reporter: Martin Pitt <mpitt>
Component: boom-boot Assignee: Bryn M. Reeves <bmr>
Status: CLOSED DUPLICATE QA Contact: cluster-qe <cluster-qe>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 8.0 CC: agk, bmr, fmartine, jbrassow, mcsontos, mpitt
Target Milestone: rc   
Target Release: 8.0   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2019-09-20 14:39:21 UTC Type: Bug
Bug Depends On: 1671047    
Bug Blocks: 1660923    

Description Martin Pitt 2019-01-07 08:44:58 UTC
Description of problem: AFAIUI, boom's primary (or at least documented) use case is to provide rollbacks for upgrades. For that, it should not be the default boot entry, but at least in current RHEL 8.0 nightlies it is.


Version-Release number of selected component (if applicable):

boom-boot-0.9-6.el8.noarch
grub2-common-2.02-66.el8.noarch


How reproducible: Always

Steps to Reproduce:
1. Create a snapshot:
   lvcreate --snapshot -n rollback1 --size 1G rhel/root
2. Create a boom profile (work around https://bugzilla.redhat.com/show_bug.cgi?id=1649423):
   boom profile create --from-host --uname-pattern el8
3. Create a boom entry for the rollback:
   boom create --title "update rollback test" --rootlv rhel/rollback1

Actual results: After rebooting, system boots into the "rollback1" snapshot.

grub prompt (in a serial console) looks like this:

      update rollback test ##### ← SELECTED BY DEFAULT     
      Red Hat Enterprise Linux (4.18.0-57.el8.x86_64) 8.0 (Ootpa)              
      Red Hat Enterprise Linux (0-rescue-7b5128c16c4c4526ba5f8a834bd7f1cc) 8.0>
                                                                                

      Use the ^ and v keys to change the selection.                       
      Press 'e' to edit the selected item, or 'c' for a command prompt.   


Expected results: After rebooting, the former boot default stays the same.


Additional info:

Comment 1 Martin Pitt 2019-01-07 09:29:46 UTC
This does not affect RHEL 7.6 or Fedora 29. They seem to work differently: both require manually regenerating the grub config (grub2-mkconfig > /boot/grub2/grub.cfg), which is not necessary in RHEL 8. However, RHEL 7.6 and Fedora don't default to the snapshot.

Comment 2 Marian Csontos 2019-01-07 12:19:14 UTC
This might be a bug in the grub2/BLS implementation. Could you check what saved_entry is set to in the `grub2-editenv list` output?

Comment 3 Bryn M. Reeves 2019-01-07 14:17:25 UTC
This wasn't seen in any of the RHEL8 builds that we were testing in December (ISO installation and Leapp upgrades from the internal repos).

As Marian mentioned, this aspect of the boot process is under the control of Grub2's BLS module.

Could you post the output of "rpm -qa | grep grub", as well as the content of the files in /boot/loader/entries that correspond to the 1st two entries in the list?

Comment 4 Martin Pitt 2019-01-07 15:39:16 UTC
The "saved_entry" looks as expected: it defaults to the first (well, zero-th) menu entry. But this happens to be the new "rollback" entry, whereas the "Snapshots" menu on Fedora 29 and RHEL 7.6 comes last.

# grub2-editenv list
saved_entry=0
kernelopts=root=/dev/mapper/rhel-root ro rd_NO_PLYMOUTH crashkernel=auto resume=/dev/mapper/rhel-swap rd.lvm.lv=rhel/root  console=ttyS0,115200 net.ifnames=0 biosdevname=0 

# rpm -qa | grep grub
grub2-common-2.02-66.el8.noarch
grub2-pc-modules-2.02-66.el8.noarch
grub2-tools-minimal-2.02-66.el8.x86_64
grub2-tools-2.02-66.el8.x86_64
grub2-tools-extra-2.02-66.el8.x86_64
grub2-pc-2.02-66.el8.x86_64
grubby-8.40-34.el8.x86_64
boom-boot-grub2-0.9-6.el8.noarch

All files in /boot/loader/entries:

---- 7b5128c16c4c4526ba5f8a834bd7f1cc-0-rescue.conf -----------
title Red Hat Enterprise Linux (0-rescue-7b5128c16c4c4526ba5f8a834bd7f1cc) 8.0 (Ootpa)
version 0-rescue-7b5128c16c4c4526ba5f8a834bd7f1cc
linux /vmlinuz-0-rescue-7b5128c16c4c4526ba5f8a834bd7f1cc
initrd /initramfs-0-rescue-7b5128c16c4c4526ba5f8a834bd7f1cc.img
options $kernelopts
id rhel-20181218151501-0-rescue-7b5128c16c4c4526ba5f8a834bd7f1cc
grub_users $grub_users
grub_arg --unrestricted
grub_class kernel


---- 7b5128c16c4c4526ba5f8a834bd7f1cc-4.18.0-57.el8.x86_64.conf ---------
title Red Hat Enterprise Linux (4.18.0-57.el8.x86_64) 8.0 (Ootpa)
version 4.18.0-57.el8.x86_64
linux /vmlinuz-4.18.0-57.el8.x86_64
initrd /initramfs-4.18.0-57.el8.x86_64.img $tuned_initrd
options $kernelopts $tuned_params
id rhel-20181218151501-4.18.0-57.el8.x86_64
grub_users $grub_users
grub_arg --unrestricted
grub_class kernel


---- 7b5128c16c4c4526ba5f8a834bd7f1cc-5fef6eb-4.18.0-57.el8.x86_64.conf (that's the snapshot one) -------
title update rollback test
machine-id 7b5128c16c4c4526ba5f8a834bd7f1cc
version 4.18.0-57.el8.x86_64
linux /vmlinuz-4.18.0-57.el8.x86_64
initrd /initramfs-4.18.0-57.el8.x86_64.img
options root=/dev/rhel/rollback1 ro rd.lvm.lv=rhel/rollback1 console=ttyS0,115200
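
For readers comparing the entries above, here is a minimal sketch that reads two of them as plain "key value" lines and reports which keys only the system entry carries. The helper name parse_bls_entry is hypothetical; this is not how boom or grub2's blscfg parse these files, and it keeps only the last value if a key repeats.

```python
def parse_bls_entry(text):
    """Parse 'key value' lines of a BLS entry into a dict (hypothetical helper)."""
    entry = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition(" ")
        entry[key] = value.strip()
    return entry


# Entry text copied from the files listed above (system entry shortened to
# the keys that matter for the comparison).
SYSTEM_ENTRY = """\
title Red Hat Enterprise Linux (4.18.0-57.el8.x86_64) 8.0 (Ootpa)
version 4.18.0-57.el8.x86_64
linux /vmlinuz-4.18.0-57.el8.x86_64
initrd /initramfs-4.18.0-57.el8.x86_64.img $tuned_initrd
options $kernelopts $tuned_params
id rhel-20181218151501-4.18.0-57.el8.x86_64
"""

BOOM_ENTRY = """\
title update rollback test
machine-id 7b5128c16c4c4526ba5f8a834bd7f1cc
version 4.18.0-57.el8.x86_64
linux /vmlinuz-4.18.0-57.el8.x86_64
initrd /initramfs-4.18.0-57.el8.x86_64.img
options root=/dev/rhel/rollback1 ro rd.lvm.lv=rhel/rollback1 console=ttyS0,115200
"""

system = parse_bls_entry(SYSTEM_ENTRY)
boom = parse_bls_entry(BOOM_ENTRY)

# Keys present in the system entry but absent from the boom entry.
print(sorted(set(system) - set(boom)))
# Both entries carry the same 'version' value.
print(system["version"] == boom["version"])
```

The two entries share the same 'version' string, and the boom entry has no 'id' key; both points come up again later in this report.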

Comment 5 Marian Csontos 2019-01-08 14:07:01 UTC
It does not look right to use an index as a key, especially if we cannot affect the order of entries (can we?). IMHO the title is a much better identifier.

I wonder, how does bls organize all the entries?

Comment 6 Bryn M. Reeves 2019-01-08 14:31:15 UTC
> It does not look right to use an index as a key, especially if we cannot affect the order of entries (can we?). IMHO the title is a much better identifier.

Afaik, with our BLS by default, the ordering is based on a version sort of the 'id' field - so one possibility here is there's a difference in Grub2's blscfg here where previously an empty id sorted last, but it now sorts first.

> I wonder, how does bls organize all the entries?

BLS does not specify - it's down to the bootloader that implements it to choose this kind of presentation detail (at least with the current published standard).

Comment 7 Bryn M. Reeves 2019-01-08 14:32:17 UTC
One possible test of that would be to edit the boom-created entry, and add an 'id' key that specifies a value that will sort differently.

Comment 9 Bryn M. Reeves 2019-01-16 15:46:08 UTC
I've updated my test systems to the grub2 version mentioned in comment #1, and I still do not see this problem. Running grub2-install, then rebooting gives me:

  Red Hat Enterprise Linux (4.18.0-48.el8.x86_64) 8.0 (Ootpa)
  RHEL7 Rollback
  Red Hat Enterprise Linux (0-rescue-523a903d...)

Editing the title string for the rollback entry to match yours ("upgrade rollback test") still does not produce the order that you're seeing.

Could you confirm that you're seeing this behaviour still with the latest RHEL8 builds? I'm waiting for the current nightly to download and will install a new VM from that to test.

Comment 10 Bryn M. Reeves 2019-01-16 15:49:16 UTC
One other question: how was the system upgraded? Is it possible that the on-disk bootloader was never updated during the process? (I've had a report of that with Fedora, which broke BLS completely, although not this exact behaviour).

To test if that is the case run "grub2-install /dev/boot_device" (normally the whole disk node), and reboot.

Comment 12 Martin Pitt 2019-01-17 10:50:21 UTC
> Could you confirm that you're seeing this behaviour still with the latest RHEL8 builds?

I tried this again with a nightly from this Tuesday. Both the grub (2.02-66) and boom (0.9-6) versions are unchanged, just the kernel is newer.

The behaviour from the reproducer is still the same:


      update rollback test  ← still the selected default on top of the list                                                    
      Red Hat Enterprise Linux (4.18.0-60.el8.x86_64) 8.0 (Ootpa)              
      Red Hat Enterprise Linux (0-rescue-b5709d2c0b424dc7be725ccc10a8d3a1) 8.0>

To test for random sort order, I now used your title:

   boom create --title "RHEL7 Rollback" --rootlv rhel/rollback1

(This is not actually true, as it just rolls back to a slightly older version of RHEL 8, but *shrug*).

But same result, it comes out on top:

      RHEL7 Rollback                                                            
      Red Hat Enterprise Linux (4.18.0-60.el8.x86_64) 8.0 (Ootpa)              
      Red Hat Enterprise Linux (0-rescue-b5709d2c0b424dc7be725ccc10a8d3a1) 8.0>

> how was the system upgraded?

Not at all, it's a fresh installation from current RHEL 8 nightlies (using virt-install).

> To test if that is the case run "grub2-install /dev/boot_device" (normally the whole disk node), and reboot.

Just to cover all bases:

# grub2-install /dev/vda
Installing for i386-pc platform.

No change (as expected).

I already pasted everything above, but just in case:

# ls -l /boot/loader/entries/
total 12
-rw-r--r--. 1 root root 408 14. Jan 17:33 b5709d2c0b424dc7be725ccc10a8d3a1-0-rescue.conf
-rw-r--r--. 1 root root 331 14. Jan 17:33 b5709d2c0b424dc7be725ccc10a8d3a1-4.18.0-60.el8.x86_64.conf
-rw-r--r--. 1 root root 311 17. Jan 05:42 b5709d2c0b424dc7be725ccc10a8d3a1-cf2e286-4.18.0-60.el8.x86_64.conf

So here again with standard asciibetical ordering, the "rollback" entry is last. But supposedly it sorts by ID, not by file system name?

Could it be that something becomes confused due to these warnings? (bug 1652705); do you see them as well?

# boom list
WARNING - Could not load BootEntry '/boot/loader/entries/b5709d2c0b424dc7be725ccc10a8d3a1-4.18.0-60.el8.x86_64.conf': 'id'
WARNING - Could not load BootEntry '/boot/loader/entries/b5709d2c0b424dc7be725ccc10a8d3a1-0-rescue.conf': 'id'
BootID  Version                  Name                     RootDevice         
cf2e286 4.18.0-60.el8.x86_64     Red Hat Enterprise Linux /dev/rhel/rollback1

Comment 13 Bryn M. Reeves 2019-01-17 10:57:26 UTC
> Could it be that something becomes confused due to these warnings? (bug 1652705); do you see them as well?

No: those are purely cosmetic. It's just the boom library saying that it cannot read the system boot entries, because they use a key ('id') that is not yet part of the upstream BLS specification. We've made a change to reduce the log level of the message, since it's not really helpful and clutters the output, and we will add support for these additional keys in an update by 8.1.

All that boom does here is to write out text files in /boot/loader/entries - the interpretation of those, both for presentation in the Grub menu, and for booting, is down to the grub2 bootloader (specifically the 'blscfg' module that we provide in the RHEL and Fedora builds).

I have a more recent ISO now and will test today, but it's a bit perplexing as the parts that are responsible for this behaviour should be equal between our systems.

Comment 14 Martin Pitt 2019-01-17 11:05:46 UTC
> the interpretation of those, both for presentation in the Grub menu, and for booting, is down to the grub2 bootloader

At this point it's not clear, at least to me as an outsider, whether this is due to the BLS spec not defining (enough) the order of entries to be shown, or boom not writing the entries correctly, or grub2 not interpreting them correctly. In the latter case, please reassign this to grub.

If you are interested, I can walk you through the exact steps that I did on the exact RHEL 8.0 image (it's our Cockpit test VM, so it's available within Red Hat).

Comment 15 Bryn M. Reeves 2019-01-17 11:18:14 UTC
> whether this is due to the BLS spec not defining (enough) the order of entries to be shown

BLS does not specify this - it's a presentation matter that is left to bootloader implementations. In RHEL, we use Grub2 with a set of patches that were originally developed in Fedora to support the upstream standard. We have been testing this with boom for the last several years without problems.

Last summer the bootloader team made a decision to switch to using BLS by default for the system boot entries in RHEL8 (this still is not implemented in Fedora). This involved additional changes to the BLS patches, and the creation of new keys that do not exist in the published BLS specification (the cause of the warning messages you noticed). It is these late changes that appear to be the cause, but as yet we haven't been able to determine exactly where and why the problem is happening (and why it affects your systems but not mine).

> or boom not writing the entries correctly, or grub2 not interpreting them correctly.

Boom writes 100% BLS compliant boot entries, which have been tested with a number of different BLS compatible loaders. So far the only report of this problem is yours, so although we are confident the problem is not in the data we write out, we would like to understand what's happening in more detail.

> In the latter case, please reassign this to grub.

This is what I'm trying to do, as it seems likely that's where the problem is. But since I haven't yet been able to reproduce with the specific grub2 versions you have reported the problem on, it seems a bit unfair to just toss this over the fence without any further investigation.

> If you are interested, I can walk you through the exact steps that I did on the exact RHEL 8.0 image (it's our Cockpit test VM, so it's available within Red Hat).

If I'm not able to reproduce with a current snapshot I'll take you up on that, although I probably don't need all the steps: just access to the system.

Comment 16 Bryn M. Reeves 2019-01-17 11:52:26 UTC
Reproduced with RHEL-8.0-20190116.n.0.

Comment 18 Bryn M. Reeves 2019-01-17 12:10:56 UTC
There's no need to use sed on the profiles (although you can if you like). Just specify the options you want when you create it:

# boom profile create --from-host --uname-pattern el8 --os-options "root=%{root_device} ro %{root_opts} console=ttyS0,115200"
Created profile with os_id e6f881a:
  OS ID: "e6f881ae3f8a2e010375fb840bb4f386b330db6e",
  Name: "Red Hat Enterprise Linux", Short name: "rhel",
  Version: "8.0 (Ootpa)", Version ID: "8.0",
  UTS release pattern: "el8",
  Kernel pattern: "/vmlinuz-%{version}", Initramfs pattern: "/initramfs-%{version}.img",
  Root options (LVM2): "rd.lvm.lv=%{lvm_root_lv}",
  Root options (BTRFS): "rootflags=%{btrfs_subvolume}",
  Options: "root=%{root_device} ro %{root_opts} console=ttyS0,115200"

Anything you specify on the command line will override the defaults from the os-release data.
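
To illustrate how an options template like the one above could turn into the options line of the boom entry shown in comment 4, here is a rough sketch of %{name} substitution. The function expand_options and the way values are passed in are hypothetical; this is not boom's implementation, only a model of the placeholder expansion.

```python
import re


def expand_options(template, values):
    """Replace %{name} placeholders; unknown names are left untouched."""
    def substitute(match):
        name = match.group(1)
        return values.get(name, match.group(0))
    return re.sub(r"%\{(\w+)\}", substitute, template)


# Templates taken from the profile output above.
options_template = "root=%{root_device} ro %{root_opts} console=ttyS0,115200"
lvm_root_template = "rd.lvm.lv=%{lvm_root_lv}"

# Values corresponding to "--rootlv rhel/rollback1" from the reproducer.
root_opts = expand_options(lvm_root_template, {"lvm_root_lv": "rhel/rollback1"})
options = expand_options(
    options_template,
    {"root_device": "/dev/rhel/rollback1", "root_opts": root_opts},
)
print(options)
```

With those inputs the result is the same string as the "options" line of the snapshot entry listed in comment 4.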

Comment 19 Bryn M. Reeves 2019-01-17 12:11:42 UTC
The grub environment block is different between the working and non-working cases:

Working:

# grub2-editenv list
saved_entry=523a903dd71b4166b3ba5884464bfb1f-4.18.0-48.el8.x86_64
kernelopts=root=/dev/mapper/rhel-root ro crashkernel=auto rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap rhgb quiet 
boot_success=0
boot_indeterminate=0

Non-working:

# grub2-editenv list
saved_entry=0
kernelopts=root=/dev/mapper/rhel-root ro crashkernel=auto resume=/dev/mapper/rhel-swap rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap rhgb quiet 
boot_success=0

This matches comment #4. However, although manually setting the 'saved_entry' value to the correct RHEL8 boot entry _does_ select the correct default entry, the rollback entry is still sorted first in the list (functionally better, but still a cosmetic problem).
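
The difference between the two environment blocks can be modelled in a few lines. This is a simplified sketch, not grub2 code: it assumes a purely numeric saved_entry is a position in the displayed menu and that any other value is matched against the entry file name without ".conf" (as the working value above suggests). The function name resolve_default is hypothetical.

```python
def resolve_default(saved_entry, menu):
    """Return the title selected by saved_entry (simplified model).

    menu is a list of (entry_name, title) tuples in display order.
    """
    if saved_entry.isdigit():
        # Index-based: whatever happens to be at that position wins.
        return menu[int(saved_entry)][1]
    for entry_name, title in menu:
        if entry_name == saved_entry:
            return title
    return None  # no match; real bootloader behaviour is not modelled here


MACHINE = "7b5128c16c4c4526ba5f8a834bd7f1cc"
KERNEL = (MACHINE + "-4.18.0-57.el8.x86_64",
          "Red Hat Enterprise Linux (4.18.0-57.el8.x86_64) 8.0 (Ootpa)")
RESCUE = (MACHINE + "-0-rescue",
          "Red Hat Enterprise Linux (0-rescue-" + MACHINE + ") 8.0 (Ootpa)")
BOOM = (MACHINE + "-5fef6eb-4.18.0-57.el8.x86_64", "update rollback test")

menu_before = [KERNEL, RESCUE]        # before "boom create"
menu_after = [BOOM, KERNEL, RESCUE]   # boom entry shown on top

by_index_before = resolve_default("0", menu_before)
by_index_after = resolve_default("0", menu_after)
by_name_after = resolve_default(KERNEL[0], menu_after)

print(by_index_before)
print(by_index_after)
print(by_name_after)
```

In this model, saved_entry=0 silently switches to the rollback entry once it appears first, while a name-based saved_entry keeps pointing at the RHEL 8 kernel entry regardless of menu order.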

Comment 20 Bryn M. Reeves 2019-01-17 12:18:46 UTC
Adding an 'id' key to the Boom entry (which should sort either greater or lesser than either of the others) also does not affect the order of entries in the menu.

Comment 21 Bryn M. Reeves 2019-01-17 12:47:55 UTC
According to:

  https://fedoraproject.org/wiki/Changes/BootLoaderSpecByDefault

The sorting of the entries is ordered by "the BLS filename using the rpmvercmp() version comparison function." I can't find equivalent information for RHEL, but this description appears to be incorrect there: adding additional BLS entries whose file names have leading '0's, 'z's, or version strings that should sort later produces no change at all in the sorting of the menu entries. All boom entries come first, followed by the system defined entries.

Comment 22 Bryn M. Reeves 2019-01-17 12:53:56 UTC
Reading the grub2 BLS patches, it seems the sort is actually applied to the 'version' field of the entries, rather than the file name as documented. Since the versions are all the same here (just one kernel), it looks like the list starts off in reverse order of inode number - this is the reason that our "new" entries appear on top. Changing a version value to a lower number will cause that entry to drop further down the list.
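
A small model of that observation, as a sketch only: entries are sorted newest-version-first with a stable sort, so entries with identical 'version' values stay in the order in which they were read. The read order used here (newest file first) is an assumption based on the inode remark above, and version_key is a crude stand-in for rpmvercmp(), not the real comparison.

```python
import re


def version_key(version):
    """Crude stand-in for rpmvercmp(): numeric runs outrank alphabetic runs."""
    tokens = re.findall(r"\d+|[A-Za-z]+", version)
    return [(1, int(t)) if t.isdigit() else (0, t) for t in tokens]


def menu_order(entries):
    """Sort (title, version) pairs newest first; ties keep their read order."""
    return [title for title, _ in
            sorted(entries, key=lambda e: version_key(e[1]), reverse=True)]


# Assumed read order: the most recently created file comes first.
read_order = [
    ("update rollback test", "4.18.0-57.el8.x86_64"),
    ("Red Hat Enterprise Linux (4.18.0-57.el8.x86_64)", "4.18.0-57.el8.x86_64"),
    ("Red Hat Enterprise Linux (0-rescue)", "0-rescue-7b5128c16c4c4526ba5f8a834bd7f1cc"),
]

same_version = menu_order(read_order)

# Lowering the version of the boom entry pushes it down the list.
lowered = [("update rollback test", "4.18.0-56.el8.x86_64")] + read_order[1:]
lower_version = menu_order(lowered)

print(same_version)
print(lower_version)
```

With equal versions the rollback entry stays on top, matching the menu in the original report; with a lower version it drops below the RHEL 8 kernel entry.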

Comment 23 Javier Martinez Canillas 2019-02-01 19:02:17 UTC
(In reply to Bryn M. Reeves from comment #22)
> Reading the grub2 BLS patches, it seems the sort is actually applied to the
> 'version' field of the entries, rather than the file name as documented.
> Since the versions are all the same here (just one kernel), it looks like
> the list starts off in reverse order of inode number - this is the reason
> that our "new" entries appear on top. Changing a version value to a lower
> number will cause that entry to drop further down the list.

Yes. We made the version field take precedence over the filename to also support ostree, whose generated BLS snippet filenames were ostree-$ID-$VARIANT_ID-$index.conf but which were actually sorted by the version field (which was the inverse of the index).

I have since then changed this in ostree so now the BLS filenames are ostree-$version-$ID-$VARIANT_ID.conf:

https://github.com/ostreedev/ostree/commit/9f48e212a3b

So we now just use sorting by filename in Fedora, but couldn't push that change in RHEL8 anymore due to the devel freeze. That's why you see an inconsistency between what's mentioned in the Fedora Changes wiki page and the RHEL8 implementation.

I think that this issue will be solved (or at least worked around) by Anaconda correctly setting the BLS id as saved_entry instead of the index at installation (bug #1671047).

Comment 24 Martin Pitt 2019-02-05 07:50:02 UTC
> I think that this issue will be solved (or at least worked around) by Anaconda correctly setting the BLS id as saved_entry instead of the index at installation 

That won't help existing systems though (RHEL 7 upgrades or RHEL 8 beta installs)

Comment 25 Javier Martinez Canillas 2019-02-05 07:52:13 UTC
(In reply to Martin Pitt from comment #24)
> > I think that this issue will be solved (or at least worked around) by Anaconda correctly setting the BLS id as saved_entry instead of the index at installation 
> 
> That won't help existing systems though (RHEL 7 upgrades or RHEL 8 beta
> installs)

But in this case wouldn't the default be updated when a new kernel is updated?

Comment 26 Bryn M. Reeves 2019-02-05 10:19:05 UTC
> That won't help existing systems though (RHEL 7 upgrades or RHEL 8 beta installs)

Not sure I follow? Upgrades using Leapp (afaik our only supported in-place upgrade mechanism) use the RPM packages and don't meddle with the system - they have never shown this problem. It's only Anaconda installations that will experience this since Anaconda explicitly re-sets the saved_entry to '0'.

As far as I understand, a beta is not formally supported anyway - so the answer would be to either use a later snapshot if the user has access, or to wait for GA.

Comment 27 Bryn M. Reeves 2019-09-20 14:39:21 UTC

*** This bug has been marked as a duplicate of bug 1671047 ***