Bug 1850193 - [Azure]grub2-mkconfig -o /boot/grub2/grub.cfg renders the VM Unbootable on the Dual Booted Machine
Summary: [Azure]grub2-mkconfig -o /boot/grub2/grub.cfg renders the VM Unbootable on th...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: grub2
Version: 8.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: rc
Target Release: 8.0
Assignee: Javier Martinez Canillas
QA Contact: Release Test Team
URL:
Whiteboard:
Depends On:
Blocks: 1874468 1874469 1874470 1900100
Reported: 2020-06-23 17:20 UTC by Sriharsha-MSFT
Modified: 2021-08-27 22:28 UTC
CC List: 34 users

Fixed In Version: grub2-2.02-88.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1874468 1874469 1874470
Environment:
Last Closed: 2020-11-04 01:53:54 UTC
Type: Bug
Target Upstream Version:
Embargoed:
pjanda: needinfo-
pm-rhel: mirror+


Attachments
Kickstart used here (6.98 KB, text/plain), 2020-07-03 09:46 UTC, Sriharsha-MSFT
Making this Kickstart close to what Vitaly has provided in his latest Kickstart (6.21 KB, text/plain), 2020-07-03 15:37 UTC, Sriharsha-MSFT
rhel-8.3-kvm-x86_64.ks (6.80 KB, text/plain), 2020-07-17 19:43 UTC, Javier Martinez Canillas
rhel-8.3-kvm-x86_64.ks (6.84 KB, text/plain), 2020-07-29 09:11 UTC, Javier Martinez Canillas
Error Screenshot (15.09 KB, image/png), 2020-07-29 14:24 UTC, Sriharsha-MSFT
Kickstart modified as per azure requirement (3.23 KB, text/plain), 2020-07-29 14:26 UTC, Sriharsha-MSFT
rhel-8.3-kvm-x86_64.ks (6.75 KB, text/plain), 2020-08-17 09:59 UTC, Javier Martinez Canillas
Screen shot of grub menu on hyper-v (600.50 KB, image/bmp), 2020-08-20 13:57 UTC, Pankaj Basnal
Grub2 scratch repo sync error (108.80 KB, image/png), 2020-08-20 13:59 UTC, Pankaj Basnal
Image of boot screen of gen1 and gen2 vm from image built with the updated script (253.78 KB, image/png), 2020-08-28 06:55 UTC, Pankaj Basnal
KS file used to build the RHEL8.3 VHD (3.11 KB, text/plain), 2020-09-20 14:25 UTC, Pankaj Basnal
Packer file used to build the image (1.68 KB, text/plain), 2020-09-20 14:29 UTC, Pankaj Basnal
Iso file created from ks file used to build the RHEL8.3 VHD (352.00 KB, application/octet-stream), 2020-09-21 09:10 UTC, xuli
Simplified packer file used to build the image (1.34 KB, text/plain), 2020-09-21 09:12 UTC, xuli
KickstartFile for LVM partition (3.96 KB, text/plain), 2020-10-13 11:19 UTC, Pankaj Basnal
Result of booting the LVM image with grub fix (250.53 KB, image/png), 2020-10-15 08:31 UTC, Pankaj Basnal
Requested output of grub> set command (55.17 KB, image/png), 2020-10-19 06:30 UTC, Pankaj Basnal
rhel8-LVM-dual-test-v2.ks (3.96 KB, text/plain), 2020-10-19 08:52 UTC, Javier Martinez Canillas


Links
System: Red Hat Bugzilla
ID: 1570991
Private: 0
Priority: unspecified
Status: CLOSED
Summary: [RFE][RHEL 7.6]Azure: Support Gen1 and Gen2 Hyper-V VMs with single VHD image
Last Updated: 2022-01-14 19:53:34 UTC

Internal Links: 2082813

Description Sriharsha-MSFT 2020-06-23 17:20:22 UTC
Description of problem: After implementing feature request Bugzilla 1570991, running grub2-mkconfig -o /boot/grub2/grub.cfg renders the VM unable to boot.

This happens when the following kickstart is used: http://people.redhat.com/~vkuznets/RHEL8-Azure.ks

How Reproducible:

1. Create a Generation 2 VM on Hyper-V and install it using the Kickstart above.
2. Save this VHD or create a copy.
3. Boot the same VHD on a Generation 1 VM on Hyper-V. The VM still boots successfully.
4. Run grub2-mkconfig -o /boot/grub2/grub.cfg and reboot the Gen 1 VM. The VM hangs in GRUB and never boots.

Reasoning (That we strongly believe):
The problem originates with trying to make a single image work for both Generation 1 and Generation 2 VMs.  I see the problem with both RHEL 8.1+ and CentOS 8.1+ images when running in a Generation 1 VM.
 
As the images are shipped, this “code” is near the top of the /boot/grub2/grub.cfg file:
 
if [ -f (hd0,gpt15)/efi/redhat/grubenv ]; then
  load_env -f (hd0,gpt15)/efi/redhat/grubenv
elif [ -s $prefix/grubenv ]; then
  load_env
fi
 
But when grub2-mkconfig is run, the script in /etc/grub.d/00_header unconditionally produces the following for the newly generated grub.cfg:
 
if [ -f ${config_directory}/grubenv ]; then
  load_env -f ${config_directory}/grubenv
elif [ -s $prefix/grubenv ]; then
  load_env
fi
 
The newly generated version interacts badly with /boot/grub2/grubenv being a symlink to ../efi/EFI/redhat/grubenv.   The old version explicitly tests for the existence of the path /efi/redhat/grubenv on the EFI System Partition [which is (hd0,gpt15)], and reads the grubenv from there, which works.  But the new version tries to follow the symlink, which I think is not working and is what causes grub to hang.
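
To tell the two variants apart on a given image, one could look for the load_env line in grub.cfg. This is only a minimal sketch: grubenv_load_style is a hypothetical helper (not part of grub2), and the strings it searches for are the two snippets quoted above.

```shell
# Hypothetical helper: report which grubenv lookup a grub.cfg uses.
# It only greps for the two snippets quoted in this report.
grubenv_load_style() {
    cfg=$1
    if grep -qF 'load_env -f (hd0,gpt15)/efi/redhat/grubenv' "$cfg" 2>/dev/null; then
        echo "explicit-esp"        # as shipped: grubenv is read from the ESP
    elif grep -qF 'load_env -f ${config_directory}/grubenv' "$cfg" 2>/dev/null; then
        echo "config-directory"    # as regenerated by grub2-mkconfig
    else
        echo "unknown"
    fi
}
```

For example, grubenv_load_style /boot/grub2/grub.cfg would print "config-directory" on a VM whose config has already been regenerated.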
 
If I replace the /boot/grub2/grubenv symlink with an actual copy of the file, then the newly generated grub.cfg file works with no problems, and regenerating it multiple times using grub2-mkconfig also causes no problems.

So the config file generated by grub2-mkconfig, when booted with the grub2-pc bootloader, tries to find the grubenv file but finds a symlink instead, and this causes the boot failure.
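
A quick way to see which case a given VM is in is to check whether the grubenv path is a symlink or a regular file. The sketch below assumes a POSIX shell; grubenv_kind is a hypothetical helper name.

```shell
# Hypothetical helper: classify a grubenv path as symlink, regular file, or missing.
grubenv_kind() {
    path=$1
    if [ -L "$path" ]; then
        # Show where the symlink points, without resolving it further.
        echo "symlink -> $(readlink "$path")"
    elif [ -f "$path" ]; then
        echo "regular file"
    else
        echo "missing"
    fi
}
```

On the images described here, grubenv_kind /boot/grub2/grubenv would be expected to report a symlink to ../efi/EFI/redhat/grubenv.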


Our Questions:

1. Why is the symlink created when the grub2-pc bootloader is installed? Why is a regular grubenv file not created instead? We are mostly interested in the reasoning behind the symlink and in possible failures, especially with respect to this feature request.

2. If this is a bug on your end, how soon can we expect a fix to be released? Can it please be released ASAP?

3. If we replace the symlink with a regular file, where can we expect failures in the future? Does installing a newer version of the grub2-pc bootloader revert the change, or does yum upgrade break?

4. How does this interact with reverting to an older kernel?

Additional Requirements:
We do expect customers to be able to run grub2-mkconfig and have it work.  We set up the grub parameters for fast boot in a production environment, but in a dev/test environment, the customer may want a longer grub timeout or a different menu style.  We also want to allow customers to add kernel boot line parameters, again for various special configurations (setting maxcpus=<n>, for example).  And even if we didn't expect customers to run grub2-mkconfig, there's not much we could do to prevent it, short of removing the file.  So from our standpoint, it is important that grub2-mkconfig works, and it must leave the VM in the same state as before, modulo any changes the customer has made to /etc/default/grub.


Impact Scenarios:
1. We expect a large customer base for UEFI-bootable, GPT-partitioned virtual machines (especially in the SAP space).
2. Azure is also introducing advanced security features for Generation 2 virtual machines, so we may have a relatively large customer base in the future.

Comment 1 Sriharsha-MSFT 2020-06-23 17:55:37 UTC
Vitaly provided the following changes in the kickstart (Copied from the Bugzilla mentioned):

"
[I think it makes sense to open a new BZ against grub2 in RHEL8 as this one
is for RHEL7 and already closed.]

Starting from RHEL8.2 we've made RHEL-guest-image (QCOW2) hybrid booted (BIOS/UEFI).
The difference from what (AFAIR) was discussed for Azure is that we install the guest
in BIOS mode and add UEFI boot later (for Azure we were doing things the other way around).
The following changes were made to the kickstart:

diff --git a/rhel8/rhel-8.2-kvm-x86_64.ks b/rhel8/rhel-8.2-kvm-x86_64.ks
index f111525297d8..4ac11f551ebd 100644
--- a/rhel8/rhel-8.2-kvm-x86_64.ks
+++ b/rhel8/rhel-8.2-kvm-x86_64.ks
@@ -19,12 +19,15 @@ rootpw --iscrypted nope
 # This information is used by appliance-tools but
 # not by the livecd tools.
 #
-zerombr
-clearpart --all --initlabel
-# autopart --type=plain --nohome # --nohome doesn't work because of rhbz#1509350
-# autopart is problematic in that it creates /boot and swap partitions rhbz#1542510 rhbz#1673094
-reqpart
-part / --fstype="xfs" --ondisk=vda --size=8000
+%pre --erroronfail
+/usr/bin/dd bs=512 count=10 if=/dev/zero of=/dev/vda
+/usr/sbin/parted -s /dev/vda mklabel gpt
+/usr/sbin/parted -s /dev/vda print
+%end
+
+part biosboot  --size=1   --fstype=biosboot
+part /boot/efi --size=100 --fstype=efi
+part /         --size=7899 --fstype=xfs --label=root --grow
 reboot
 
 # Packages
@@ -35,6 +38,9 @@ kernel
 yum
 nfs-utils
 dnf-utils
+grub2-pc
+grub2-efi-x64
+shim
 
 # pull firmware packages out
 -aic94xx-firmware
@@ -143,6 +149,16 @@ insights-client
 passwd -d root
 passwd -l root
 
+# setup uefi boot
+/usr/sbin/grub2-mkconfig -o /etc/grub2-efi.cfg
+/usr/sbin/parted -s /dev/vda disk_set pmbr_boot off
+
+# setup bios boot
+cat <<'EOF' > /etc/grub2.cfg
+search --no-floppy --set efi --file /efi/redhat/grub.cfg
+configfile ($efi)/efi/redhat/grub.cfg
+EOF
+
 # setup systemd to boot to the right runlevel
 echo -n "Setting default runlevel to multiuser text mode"
 rm -f /etc/systemd/system/default.target

In this setup we're not hacking grub2 config so I *think* it should
survive re-generating '/etc/grub2-efi.cfg' (/etc/grub2.cfg is a simple
stub including grub2-efi.cfg).

Could you check if this would work for Azure images?
"

Thanks Vitaly for considering this hybrid boot requirement. I really appreciate that. However, I have a couple of questions regarding this:

1. Why are we not considering this for 8.0 images as well? Is there a specific reason? RHEL 8.0 is currently in production on Azure, and RHEL 8.1 has the SAP repos that the majority of our customers use; this also affects customers who do not use SAP.

2. Can you please let me know why we are creating this virtual device vda under /dev of size 512K with GPT partitioning? I'd like to understand the purpose so that we can keep it in mind during image building and check whether it fits our requirements.
+/usr/bin/dd bs=512 count=10 if=/dev/zero of=/dev/vda
+/usr/sbin/parted -s /dev/vda mklabel gpt
+/usr/sbin/parted -s /dev/vda print

3. The lines above use /dev/vda, whereas Hyper-V, which is a full-virtualization hypervisor, uses /dev/sda. I'm assuming the /dev/vda name implies a paravirtualized driver (most likely virtio; I'm not sure about the convention), or am I missing something here?
4. Regarding the following lines:
+# setup bios boot
+cat <<'EOF' > /etc/grub2.cfg
+search --no-floppy --set efi --file /efi/redhat/grub.cfg
+configfile ($efi)/efi/redhat/grub.cfg
+EOF

Can you please let us know the impact in the following scenarios:
a. We overwrite it with our grub2-mkconfig command.
b. If customers upgrade from RHEL 8.1 to 8.2, does this get added automatically? Probably the more appropriate question is what happens when a customer downgrades the OS from 8.2 to 8.1, or vice versa (upgrades).
c. What happens when a customer updates the grub bootloaders?

Comment 2 Vitaly Kuznetsov 2020-06-24 16:20:58 UTC
(In reply to Sriharsha-MSFT from comment #1)
> 
> Thanks Vitaly for considering this Hybrid Boot requirement. I really
> appreciate that. However, I have couple of questions regarding this:
> 
> 1. Why are we not considering from 8.0 images itself ? Do we have any
> specific reasons behind this ? The reason is that Currently RHEL 8.0 is out
> in Production on Azure and RHEL 8.1 have SAP repos which majority of our
> customers are using and also affects the customers who are also not using
> SAP too. 

Yes, we understand that in many cases there are valid reasons to build 
images instead of relying on rhel-guest-image which is being published.
You may want to have special package set (e.g. rhui client added), settings,
...

> 
> 2. can you please let me know as to why are we creating this virtual device
> vda under /dev of size 512K with GPT partitioning. I'm interested in knowing
> the purpose behind this so that we can keep in mind during image building
> and also check if that fits our requirement too. . 
> +/usr/bin/dd bs=512 count=10 if=/dev/zero of=/dev/vda
> +/usr/sbin/parted -s /dev/vda mklabel gpt
> +/usr/sbin/parted -s /dev/vda print

This basically just wipes the virtual disk and creates a GPT partition table
(empty). If you're creating RHEL8 image on RHEL8 you may use

clearpart --all --initlabel --drives=vda --disklabel=gpt

instead.
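
For reference, the amount of data that the dd line in the %pre section overwrites can be worked out directly from its arguments. A small sketch (the variable names are only for illustration):

```shell
# Bytes overwritten by: dd bs=512 count=10 if=/dev/zero of=/dev/vda
bs=512      # block size in bytes
count=10    # number of blocks written
total=$((bs * count))
echo "$total bytes"   # 5120 bytes, i.e. 5 KiB
```

So the command zeroes only the first ten 512-byte sectors (5 KiB) at the start of the disk, where an old partition table would begin; parted then writes the new, empty GPT label.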

> 
> 3. In the above lines, /dev/sda is used in Hyper-V which is full
> virtualization Hypervisor. I'm assuming /dev/vda naming means
> Paravirtualized driver (Most likely virtio. Not sure about the convention
> though) or am I missing something here ?

This depends which device is being used for the automatic install. As we're 
running on a virtio device (for rhel-guest-image) the name of the device
is 'vda' and not 'sda'. rhel-guest-image is created in brew, but to do
the same locally one can do something like

virt-install --virt-type kvm --os-variant rhel8.0 --arch x86_64 --name rhel8-kvm-test --memory 4096 --disk bus=virtio,size=8 --nographics --initrd-inject=/root/rhel-8.2-kvm.ks --extra-args "console=ttyS0 ks=file:/rhel-8.2-kvm.ks" --location /var/lib/libvirt/images/RHEL-8.1.0-20191015.0-x86_64-dvd1.iso --network bridge=br0


> 4. In the following lines mentioned below:
> +# setup bios boot
> +cat <<'EOF' > /etc/grub2.cfg
> +search --no-floppy --set efi --file /efi/redhat/grub.cfg
> +configfile ($efi)/efi/redhat/grub.cfg
> +EOF
> 
> Can you please let us know the impact it can cause for the following
> scenarios:
> a. Override with our grub2-mkconfig command. 

If a customer wants to regenerate the grub2 config, they'll need to run

grub2-mkconfig -o /etc/grub2-efi.cfg (or /efi/redhat/grub.cfg). 

The stub in /etc/grub2.cfg will have to stay intact.
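
One way to verify that the stub has not been overwritten is to check that the redirect line is still present. This is a minimal sketch: stub_is_intact is a hypothetical helper, and the string it looks for is taken from the kickstart snippet quoted above.

```shell
# Hypothetical helper: succeed only if the file still contains the
# configfile redirect written by the kickstart's "setup bios boot" step.
stub_is_intact() {
    grep -qF 'configfile ($efi)/efi/redhat/grub.cfg' "$1" 2>/dev/null
}

# Example use (path as in the kickstart above):
# stub_is_intact /etc/grub2.cfg && echo "stub OK" || echo "stub was overwritten"
```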

> b. Let us say if customers upgrade from RHEL 8.1 to 8.2, does this gets
> automatically added. Probably more appropriate question would be what
> happens when customer downgrade OS flavour from 8.2 to 8.1 or the viceversa
> (upgrade).

No, nothing gets added automatically. This is what's being done when we
build hybrid images. Old 8.1 images are not hybrid and as they require
different partition layout there's no way to 'upgrade' existing setups.
Also, we don't provide a 'downgrade' path for the whole RHEL. Some package
may be downgraded manually but this is unusual.

> c. What happens when customer updates grub bootloaders.

If the grub2 package gets upgraded the config files won't change, just like
with a regular setup. And I don't think any action from the customer's side will
be required.

Comment 3 Sriharsha-MSFT 2020-06-25 07:11:32 UTC
Thanks for the detailed explanation. There are a few more scenarios I wanted to clarify here:
We are installing using the ISO on our Hyper-V environment using a kickstart. Can we still go ahead and use this line:
clearpart --all --initlabel --drives=vda --disklabel=gpt

Before we incorporate this into our image building, I would really appreciate it if you could explain the significance of adding these lines:
+cat <<'EOF' > /etc/grub2.cfg
+search --no-floppy --set efi --file /efi/redhat/grub.cfg
+configfile ($efi)/efi/redhat/grub.cfg
+EOF

Also, we are more concerned about the existing customers who would already be running RHEL 8.x images.

Reiterating our findings and our questions: can you/Peter please tell us whether these steps are viable and what impact we can expect (in particular, bugs we might encounter in the future)?
If I replace the /boot/grub2/grubenv symlink with an actual copy of the file, then the newly generated grub.cfg file works with no problems, and regenerating it multiple times using grub2-mkconfig also causes no problems.
So the config file generated by grub2-mkconfig, when booted with the grub2-pc bootloader, tries to find the grubenv file but finds a symlink instead, and this causes the boot failure.

Our Questions:
1. Why is the symlink created when the grub2-pc bootloader is installed? Why is a regular grubenv file not created instead? We are mostly interested in the reasoning behind the symlink and in possible failures, especially with respect to this feature request.
2. If this is a bug on your end, how soon can we expect a fix to be released? Can it please be released ASAP?
3. If we replace the symlink with a regular file, where can we expect failures in the future? Does installing a newer version of the grub2-pc bootloader revert the change, or does yum upgrade break?
4. How does this interact with reverting to an older kernel?

Comment 4 Vitaly Kuznetsov 2020-06-25 10:12:26 UTC
(In reply to Sriharsha-MSFT from comment #3)
> Thanks for the detailed explanation. Further more scenarios I wanted to
> clarify here:
> We are installing using the ISO on our Hyper-V environment using a
> kickstart. Can we still go ahead and use this line:
> clearpart --all --initlabel --drives=vda --disklabel=gpt
> 
> Before we incorporate this into our image building, I would really
> appreciate if you could tell me the significance of addition of these lines
> +cat <<'EOF' > /etc/grub2.cfg
> +search --no-floppy --set efi --file /efi/redhat/grub.cfg
> +configfile ($efi)/efi/redhat/grub.cfg
> +EOF
> 

This basically 'includes' grub2-efi.cfg (/efi/redhat/grub.cfg) into grub2.cfg
(for grub2-pc).

> Also, we are more concerned about the existing customers who would already
> be running RHEL 8.x images.
> 
> Reiterating our findings and our questions, can you/Peter please suggest if
> these steps are viable and what impact can we expect (IMO, the bugs we might
> encounter in future)
> If I replace the /boot/grub2/grubenv symlink with an actual copy of the
> file, then the newly generated grub.cfg file works with no problems and
> regenerating multiple times using grub2-mkconfig also cause no problems.
> So the generated config file using grub2-mkconfig using grub2-PC bootloader
> tries to find the grubenv file but finds a symlink and this is causing boot
> failure. 
> 
> Our Questions:
> 1. Why is the symlink created when installed grub2-pc bootloader? Why is it
> not creating a grubenv file ? We are more interested in reasoning behind the
> symlink creation also possible failures especially concerning this feature
> request.
> 2. If this is a bug from your end, how soon can we expect a fix to be
> released ?. Can we please release this ASAP ?
> 3. Can we also understand if we replace this with a normal file, where can
> we expect failures in future ? Does installing newer version of grub2-pc
> bootloader reverses the changes or does yum upgrade break ?
> 4. How does this interact whenever we needed to revert the Kernel back ?

In the specfile we have

%ghost %config(noreplace) /boot/grub2/grubenv

this means that this won't be overwritten by grub2 update. Yum should 
work well in the situation you describe.

The main issue is that the grubenv file is edited by the 'grub2-set-default'/
'grub2-reboot'/... tools. If you have a symlink then it's the same file and the
tools work correctly. If you have two separate grubenv files, which one
is going to be edited? So I don't think this can be fixed in the grub2 package.

In the solution I proposed above we still have a single grubenv file which
is used by both EFI and BIOS boot. This not only makes all tools work
correctly (e.g. when upgrading a kernel) but also allows customers to
alternate between different boot schemes -- e.g. when a customer switches from
a Gen1 to a Gen2 instance and has some changes in grubenv (booting some other
kernel by default, non-standard kernel options,...).
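
Whether BIOS and EFI boot really share one grubenv can be checked by resolving both paths and comparing the results. A minimal sketch, assuming GNU readlink with -f is available; same_grubenv is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if both paths resolve to the same file
# (e.g. one is a symlink to the other); fail if they are separate copies.
same_grubenv() {
    [ "$(readlink -f "$1")" = "$(readlink -f "$2")" ]
}

# Example use:
# same_grubenv /boot/grub2/grubenv /boot/efi/EFI/redhat/grubenv \
#     && echo "single grubenv" || echo "two separate grubenv files"
```

With two separate copies, an edit made through one path is not visible through the other, which is the divergence described above.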

My suggestion to move forward is:
Try generating images with the suggested kickstart changes and make sure
everything works well:
- Both UEFI and BIOS boot work
- We have a single grubenv and 'grub2-set-default'/'grub2-reboot' tools work
as expected with both BIOS and UEFI booted guests.
- Re-generated '/etc/grub2-efi.cfg' config also boots and works for both BIOS and
UEFI.

For the existing customers we:
- Suggest them to NOT regenerate grub2 config. This is mostly not needed with
'grub2-set-default'/'grub2-reboot'/'grub2-editenv' tools, they need to use
them instead.
- If someone already re-generated with config and it doesn't work, suggest
him to manually edit it back. I understand this is not ideal but I'd rather
stay conservative and not do any changes to customers' boot configuration
automatically as we don't know if they've edited anything by hand. The risk
of breaking some setup is very high IMO.

Comment 5 Vitaly Kuznetsov 2020-06-25 12:37:02 UTC
Also, I forgot one important detail. RHEL8 has BLS enabled by default, this
means there are no kernel entries and kernel parameters in /etc/grub2-efi.cfg
(/etc/grub2.cfg for grub2-pc) so there is even less need to have it regenerated.

Comment 6 Jon Sturgis 2020-06-25 16:58:43 UTC
Hi All

I just got off a sync call with MSFT SAP team. 

The feedback was that grub is explicitly needed for SAP environments. 

This has halted the 8.1 certification for MSFT and impacting revenue and user experience.

Comment 7 Sriharsha-MSFT 2020-06-25 18:53:01 UTC
I think we still have unanswered questions here:

1. I'm not sure if this concern has already been addressed, or am I missing something here:
+cat <<'EOF' > /etc/grub2.cfg
+search --no-floppy --set efi --file /efi/redhat/grub.cfg
+configfile ($efi)/efi/redhat/grub.cfg
+EOF

The problem I have in mind is that whenever we run grub2-mkconfig -o /etc/grub.cfg these lines get replaced.
So the actual question is whether there is a possibility of boot failure, given that these lines will no longer be present once the file is regenerated by grub2-mkconfig.
We will let you know once we start testing the kickstart. Can you please provide the complete link to the kickstart?

2.
For the existing customers we:
- Suggest them to NOT regenerate grub2 config. This is mostly not needed with
'grub2-set-default'/'grub2-reboot'/'grub2-editenv' tools, they need to use
them instead.

Sriharsha : Well, if they want to change or add kernel parameters, we still need to regenerate the grub2 config.

- If someone already re-generated with config and it doesn't work, suggest
him to manually edit it back. I understand this is not ideal but I'd rather
stay conservative and not do any changes to customers' boot configuration
automatically as we don't know if they've edited anything by hand. The risk
of breaking some setup is very high IMO.

Sriharsha : we do expect customers to be able to run grub2-mkconfig and have it work.  We set up the grub parameters for fast boot in a production environment, but in a dev/test environment, the customer may want a longer grub timeout or choose a different menu style.  We also want to allow customers to add kernel boot line parameters, again for various special configurations (set maxcpus=<n>, for example).  And even if we didn't expect customers to run grub2-mkconfig, there's not much we could do to prevent it, short of removing the file.  So from our standpoint,  it is important that grub2-mkconfig should work, and it must leave the VM in the same state as before, modulo any changes the customer has made to /etc/default/grub.

How do you suggest we proceed under these conditions ?

Comment 8 Vitaly Kuznetsov 2020-06-26 14:10:39 UTC
(In reply to Sriharsha-MSFT from comment #7)
> I think we still have unanswered questions here:
> 
> 1. I'm not sure if this concern is already addressed of am I missing
> something here:
> +cat <<'EOF' > /etc/grub2.cfg
> +search --no-floppy --set efi --file /efi/redhat/grub.cfg
> +configfile ($efi)/efi/redhat/grub.cfg
> +EOF
> 
> The problem I might be thinking here is that whenever we run grub2-mkconfig
> -o /etc/grub.cfg these lines gets replaced. 
> So the actual question is if there is a possibility of boot failure
> concerning that these lines will no longer be present once regenerated in
> grub2-mkconfig ?

Of course but I don't see how this is different from doing 

 # echo 123 > /etc/grub.cfg

and making the system unbootable. I don't think we have any automated script
in RHEL which would attempt to run grub2-mkconfig -- or am I missing something?

> But we will let you know once we start testing the kickstart. Can you please
> provide the complete link to the kickstart ?

Please use http://people.redhat.com/~vkuznets/rhel-8.3-kvm-x86_64.ks

> 
> 2.
> For the existing customers we:
> - Suggest them to NOT regenerate grub2 config. This is mostly not needed with
> 'grub2-set-default'/'grub2-reboot'/'grub2-editenv' tools, they need to use
> them instead.
> 
> Sriharsha : Well, if they wanted to change the kernel parameters or addition
> of kernel parameters, we still need to perform regeneration of grub2 config.

No, they don't. Kernel parameters are stored in grubenv so customers are
advised to use grub2-editenv; there is no need to regenerate the grub2 config
just to change kernel parameters. The same goes for picking the default kernel
to boot.

> 
> - If someone already re-generated with config and it doesn't work, suggest
> him to manually edit it back. I understand this is not ideal but I'd rather
> stay conservative and not do any changes to customers' boot configuration
> automatically as we don't know if they've edited anything by hand. The risk
> of breaking some setup is very high IMO.
> 
> Sriharsha : we do expect customers to be able to run grub2-mkconfig and have
> it work.  We set up the grub parameters for fast boot in a production
> environment, but in a dev/test environment, the customer may want a longer
> grub timeout or choose a different menu style.  We also want to allow
> customers to add kernel boot line parameters, again for various special
> configurations (set maxcpus=<n>, for example).  And even if we didn't expect
> customers to run grub2-mkconfig, there's not much we could do to prevent it,
> short of removing the file.  So from our standpoint,  it is important that
> grub2-mkconfig should work, and it must leave the VM in the same state as
> before, modulo any changes the customer has made to /etc/default/grub.
> 
> How do you suggest we proceed under these conditions ?

Again, no need to regenerate grub2 config to change kernel parameters. The 
same effect can be achieved with grub2-editenv and this is also much safer.
/etc/default/grub contains default parameters when grubenv is missing and
this is not the case for the already provisioned instances.
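
To illustrate what "stored in grubenv" means: the file holds plain name=value lines, so a value such as kernelopts can be read back from it. The sketch below is read-only and for illustration; grubenv_get is a hypothetical helper, and on a real system the supported tool is grub2-editenv (for example its list subcommand).

```shell
# Hypothetical read-only helper: print the value of one variable from a
# grubenv-style file of name=value lines. $1 = variable name, $2 = file.
grubenv_get() {
    # Print only lines starting with "<name>=", with that prefix removed.
    sed -n "s/^$1=//p" "$2"
}

# Example use:
# grubenv_get kernelopts /boot/grub2/grubenv
```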

Also, I just checked and RHEL8 documentation suggests to use grubby for
these tasks:
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/managing_monitoring_and_updating_the_kernel/configuring-kernel-command-line-parameters_managing-monitoring-and-updating-the-kernel

I also believe that grubby won't be regenerating grub2 config, it won't
even touch it when BLS is in use (default).

I agree there may be very special cases when customer would want to use
grub2-mkconfig but these should be extremely rare, especially for cloud
usages (no other OSes to boot).

Comment 9 Michael Kelley 2020-06-26 16:15:07 UTC
This is good info Vitaly.  I see in the RHEL 7 documentation, grub2-mkconfig is still mentioned in several places. But the RHEL 8 documentation has gone entirely to grubby, with no mention of grub2-mkconfig.  On the flip side, there are a lot of public web pages by "experts" (not from Red Hat) that still refer to using grub2-mkconfig.  I personally picked up using grub2-mkconfig somewhere along the way, and have not shifted my thinking to using grubby.  And since grub2-mkconfig is still present in RHEL 8, it would be easy for someone to continue using the old way of doing things, either out of habit or based on instructions in some public web page.  Microsoft Azure documentation also has references to using grub2-mkconfig that we evidently should change.

Comment 10 Javier Martinez Canillas 2020-06-29 11:37:40 UTC
For me the crux of the matter is that the GRUB config was never prepared
for hybrid images. I think it is a reasonable request to support this kind
of setup and that we should improve the grub2 package so it can work out
of the box without the need to find workarounds, as is the case now.

Having said that, there are two issues mentioned in this bugzilla that
need to be solved with the current configuration layout:

a) /boot/grub2/grubenv being a symlink to ../efi/EFI/redhat/grubenv for
   EFI installs. In legacy BIOS installs this is just a normal file. But
   in the hybrid image it is a symlink, because the image is an EFI install
   that also contains a grub2-pc image in a biosboot partition.
   
   This has the problem that GRUB isn't able to follow that symlink, so
   when booting using legacy BIOS it will fail to read the grubenv file.

b) EFI and legacy BIOS having different locations for the GRUB config file.

   This has the problem that there are two different config files that are
   used. So if there's a need to modify something in them (e.g. increasing
   GRUB_TIMEOUT), users will have to regenerate both config files.

The solution for (a) proposed by Vitaly in [0] was to always load the grubenv
that is located in the ESP, even for legacy BIOS boot. That way GRUB wouldn't
try to read the symlink and fail.

But this is done by patching the /boot/grub2/grub.cfg generated by grub2-mkconfig,
so regenerating this file would break it, since GRUB will try to load the grubenv
from $prefix/grubenv again. And that's the path to the symlink in /boot/grub2.

It's not clear to me how (a) is solved in the second kickstart provided [1].
Since /boot/grub2/grub.cfg just redirects to /boot/efi/redhat/grub.cfg, GRUB
will try to load the grubenv from $prefix/grubenv, which as mentioned is the
symlink in /boot/grub2/grubenv in the case of legacy BIOS.

It does partially solve (b) though, since now users could just regenerate the
/boot/efi/EFI/redhat/grub.cfg for both EFI and legacy BIOS booting. But it has
the drawback that regenerating /boot/grub2/grub.cfg will break the GRUB config
redirection, so (b) will be introduced again as mentioned by Sriharsha.

I don't have the historical background to completely understand why the config
is done in the way it is. One reason may be robustness. By having the grub.cfg
in the ESP, GRUB should be able to run with a proper configuration even if the
partition containing /boot is not accessible.

This may be particularly important in the case of multiboot setups since the same
GRUB could be used to boot another OS. But it is maybe less relevant for RHEL so one
option could be to make GRUB just read the grubenv and grub.cfg from /boot/grub2,
since it needs access to that to read the kernel and initrd images anyways.

So for the long term I think we should explore that option. This will avoid both
having a grubenv symlink and two grub.cfg, which cause issues on hybrid images.

In the short term I think that Vitaly's solutions are the correct ones. But maybe
for grub.cfg we could do the opposite, and instead /boot/efi/EFI/redhat/grub.cfg
could redirect to /boot/grub2/grub.cfg.

I believe most of the non-RHEL documentation refers to /boot/grub/grub.cfg, since
that is the path used by other distributions as far as I know. By making the GRUB
config in the efidir of the ESP static, it is less likely that users will make the
mistake of overwriting it by reading information found outside the official docs.

Maybe we could make grub2-mkconfig detect that it is a hybrid installation and
avoid regenerating the /boot/efi/EFI/redhat/grub.cfg file, telling the user that
this will break their setup and that they just need to regenerate the file in
/boot/grub2/grub.cfg instead. This would solve (b) and make it more robust. But
it is too late to do this grub2-mkconfig change for 8.3, so it will have to be for
a future release.

For (a), what could be done is to make both legacy BIOS and EFI load the same
grubenv from /boot/grub2, and make this a regular file instead of a symlink.

This could be achieved by setting $prefix to (hd0,gpt2)/grub2 in the grub.cfg
that is located in /boot/efi/EFI/redhat and redirects to the one in /boot/grub2.

Since for EFI the fw_path variable is used to find the efidir and $prefix is just
a fallback, I don't think changing this would have any negative effect for GRUB.
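
A rough sketch of what such a stub might look like, written as a shell function
that prints the GRUB script. The function name is hypothetical, the device
(hd0,gpt2) is just the one mentioned above and will differ between images, and
this has not been tested on a real system.

```shell
#!/bin/sh
# Hypothetical helper: print a stub grub.cfg that points $prefix at the boot
# partition and then loads the main config from there.
# $1: GRUB device and directory holding grub.cfg and grubenv,
#     e.g. "(hd0,gpt2)/grub2"
print_prefix_stub() {
    printf 'set prefix=%s\n' "$1"
    # Single quotes keep $prefix literal so GRUB, not the shell, expands it.
    printf '%s\n' 'configfile $prefix/grub.cfg'
}

# Example:
# print_prefix_stub "(hd0,gpt2)/grub2"
```

A hard-coded device name is fragile if the disk layout changes, so a search by
filesystem UUID or label may be a safer way to find the boot partition.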

I also agree with Vitaly that changing the configuration of existing installs
automatically is too risky and prone to errors. So there should be documentation
for users to do this. There may even be a script provided to help with this, but
users should understand the risk and do the conversion explicitly.

[0]: http://people.redhat.com/~vkuznets/RHEL8-Azure.ks
[1]: http://people.redhat.com/~vkuznets/rhel-8.3-kvm-x86_64.ks

Comment 11 Sriharsha-MSFT 2020-07-02 17:48:43 UTC
I agree with Javier here. However, I have some questions. (I have not yet tested this solution but will provide detailed results in this bugzilla.)

1. I'm not sure how we can answer this question with the current Kickstart, even after your suggested change:
Whenever we regenerate this grub2.cfg, will we not override (hd0,gpt2)/grub2 with $prefix? How do we ensure that this (hd0,gpt2) is always persistent across grub2.cfg regeneration?

2. If this is a fix that you would be introducing in the future, when can we expect a fix to tackle this issue?
Please note that these images will be used in PROD by a variety of customers, such as SAP workloads, and it wouldn't be a fair experience if this workaround is not stable.

3. With the above workaround applied (assuming that it would suffice for all of our scenarios), how will this grub2 fix impact existing customers upon yum update or upgrade, since we are now working with the grubenv file and the updated grub package will no longer have this symlink?

Comment 12 Sriharsha-MSFT 2020-07-03 09:46:20 UTC
Created attachment 1699805 [details]
Kickstart used here

Comment 13 Sriharsha-MSFT 2020-07-03 09:54:24 UTC
Asking the needinfo again. Please check and let me know

Comment 14 Sriharsha-MSFT 2020-07-03 10:01:20 UTC
Tested using the kickstart present in the attachment above (I had to modify it to work according to Hyper-V requirements). Here are the results:
1. Generation 1 VM (biosboot VM) is not booting
2. Generation 2 VM (UEFI Booted VM) works fine.

Can you please look at this ASAP?

Comment 15 Sriharsha-MSFT 2020-07-03 15:37:48 UTC
Created attachment 1699866 [details]
Making this Kickstart close to what Vitaly has provided in his latest Kickstart

Generation 1 VM does not boot.
Please look into this ASAP.

Comment 16 Vitaly Kuznetsov 2020-07-07 14:11:15 UTC
(In reply to Sriharsha-MSFT from comment #15)
> Created attachment 1699866 [details]
> Making this Kickstart close to what Vitaly has provided in his latest
> Kickstart
> 
> Generation 1 VM does not boot.
> Please look into this ASAP.

I modified your kickstart by explicitly adding both grub2-efi/shim and grub2-pc:

@@ -141,6 +141,11 @@
 # Note: this came from Red Hat's guidance in their RHEL 8 certification feedback
 insights-client
 
+# BIOS/UEFI boot
+grub2-pc
+grub2-efi-x64
+shim
+
 %end

and I'm generating the image with

virt-install --virt-type kvm --os-variant rhel8.0 --arch x86_64 --name rhel8-ms-test --memory 4096 --disk bus=scsi,size=<MY_IMAGE_SIZE> --nographics --initrd-inject=/root/ms_new.ks --extra-args "console=ttyS0 ks=file:/ms_new.ks" --location /var/lib/libvirt/images/RHEL-XXXX-dvd1.iso --network bridge=MY_NET_BRIDGE

(NOTE: I'm installing in BIOS mode)

and the resulting image seems to boot fine with both BIOS and UEFI.

Comment 17 Vitaly Kuznetsov 2020-07-07 14:17:36 UTC
(In reply to Sriharsha-MSFT from comment #11)
> I agree to Javier here. However I have some questions. (I have not yet
> tested this solution but will provide a detailed results over this bugzilla)
> 

I'm not from bootloader team but let me provide my version of the answers.
I'd be more than happy if Javier/Peter would correct me.

> 1. I'm not sure how we are able to answer this question here with the
> current Kickstart even after your change suggested:
> Whenever we regenerate this grub2.cfg, will we not override (hd0,gpt2)/grub2
> with $prefix. How do we ensure that this (hd0,gpt2) is always persistent
> across grub2.cfg regeneration.

Users should not regenerate (or, delete) grub2.cfg on RHEL8: for all bootloader
configuration tasks (selecting default kernel, changing kernel boot params,...)
there are other ways to do that.

> 
> 2. If this is something a fix that you would be introducing in future, When
> can we expect a fix to tackle this issue. 
> Please note that these images will be used in PROD by variety of customers
> such as SAP workloads and it wouldn't be a fair experience if this
> workaround is not stable.

Future grub updates will not change the existing configuration, i.e. grub2.cfg
will not be regenerated. It would be very risky to do so.

> 
> 3. With the above workaround be applied (assuming that this would suffice
> all of our scenarios) how will this grub2 fix impact the existing customers
> upon Yum update or upgrade since now we are working with the grubenv file
> and the updated grub package will no longer be having this symlink.

This is for Javier/Peter to answer but I don't think there are current plans
to eliminate the symlink, at least in RHEL8 lifetime.

Comment 18 Alfred Sin 2020-07-08 02:16:40 UTC
Hey Vitaly - is there any reason you chose to install in BIOS mode first? I'm wondering if installing in UEFI mode first would also work to create a hybrid boot image. IIRC (and Sri can correct me if I'm wrong), our builds right now start off with a UEFI image and then add BIOS. We're worried that installing in BIOS mode and then adding UEFI would come with a slew of other changes that might be disruptive to our build pipeline. If UEFI is the long term goal for us, then building a UEFI image that can also boot from an MBR feels like the preferred way to go, as opposed to building a BIOS image that can also boot from a GPT.

Comment 19 Vitaly Kuznetsov 2020-07-08 12:21:39 UTC
Hi Alfred,

it should work both ways: installing in UEFI mode and adding BIOS boot and vice versa. For kvm-guest-image
which we publish with each minor RHEL release we decided to go with installing in legacy BIOS mode just
because these images were generated on RHEL7 hosts and UEFI is not a fully supported feature there. In
case your build host is RHEL8 it should all work.

Comment 20 Alfred Sin 2020-07-08 23:45:31 UTC
I see, that makes sense. We actually use Hyper-V to build our images to align with Azure's hypervisor so our build host is a Windows machine. Sri will try building an image as UEFI first and then adding BIOS, and will provide an update here as to what happens. We might need your help to create a kickstart that is Hyper-V compatible depending on how the build works out. If it comes to a point where we're blocked again, could we perhaps hop on a call to discuss/troubleshoot?

On another note, I do find it a bit peculiar that RHEL 8 has moved on from grub2-mkconfig, yet grub2-mkconfig is still able to be run without any warning provided to the user that the expectation is to use grubby going forward. Similar to what Michael mentioned in comment 9 above, I have also picked up grub2-mkconfig as the first command that comes to mind to regenerate grub. I have no problem with shifting my mindset to think about grubby as the first option - after all, technology changes - but I do wonder how many customers may not be aware of that shift.

Comment 21 Javier Martinez Canillas 2020-07-09 10:17:10 UTC
(In reply to Vitaly Kuznetsov from comment #17)
> (In reply to Sriharsha-MSFT from comment #11)
> > I agree to Javier here. However I have some questions. (I have not yet
> > tested this solution but will provide a detailed results over this bugzilla)
> > 
> 
> I'm not from bootloader team but let me provide my version of the answers.
> I'd be more than happy if Javier/Peter would correct me.
> 
> > 1. I'm not sure how we are able to answer this question here with the
> > current Kickstart even after your change suggested:
> > Whenever we regenerate this grub2.cfg, will we not override (hd0,gpt2)/grub2
> > with $prefix. How do we ensure that this (hd0,gpt2) is always persistent
> > across grub2.cfg regeneration.
> 
> Users should not regenerate (or, delete) grub2.cfg on RHEL8: for all
> bootloader
> configuration tasks (selecting default kernel, changing kernel boot
> params,...)
> there are other ways to do that.
>

That's correct. Although that's why I suggested that maybe the opposite could
be done, that is to make the GRUB config file in the ESP load the one that
is in the boot partition.

For example, something like the following (untested) change to the kickstart:

@@ -133,15 +133,18 @@
 #
 %post --erroronfail
 
-# setup uefi boot
-/usr/sbin/grub2-mkconfig -o /etc/grub2-efi.cfg
-/usr/sbin/parted -s /dev/sda disk_set pmbr_boot off
-
 # setup bios boot
-cat <<'EOF' > /etc/grub2.cfg
-search --no-floppy --set efi --file /efi/redhat/grub.cfg
-configfile ($efi)/efi/redhat/grub.cfg
+rm /boot/grub2/grubenv
+cp /boot/efi/EFI/redhat/grubenv /boot/grub2/
+/usr/sbin/grub2-mkconfig -o /etc/grub2.cfg
+
+# setup uefi boot
+fs_uuid="$(grub2-probe --target=fs_uuid /etc/grub.cfg)"
+cat << EOF > /etc/grub2-efi.cfg
+search --no-floppy --fs-uuid --set=boot $fs_uuid
+configfile (\$boot)/grub2/grub.cfg
 EOF
+/usr/sbin/parted -s /dev/sda disk_set pmbr_boot off
 
This changes a couple of things:

- Generates the real GRUB config in /etc/grub2.cfg that will be used
  by both EFI and legacy BIOS. So it's the opposite of what Vitaly did
  and I think that this has the advantage that it will allow users to
  run grub2-mkconfig -o /boot/grub2/grub.cfg without breaking anything.

- Uses the filesystem UUID to make sure that the GRUB config file is
  read from the correct partition.

- Gets rid of the grubenv symlink, and in both EFI and legacy BIOS the
  file in the boot partition will be used. Same as with the grub.cfg.

It does have the drawback that grub2-mkconfig -o /etc/grub2-efi.cfg or
grub2-mkconfig -o /boot/efi/EFI/redhat/grub.cfg could cause issues since
/boot/grub2/grubenv won't be a symlink to the grubenv file in the ESP
anymore. So the latter won't be updated by grub2-editenv and other tools.

But I still think that what most users will find by searching information
about GRUB online is that the GRUB config is in /boot/grub2/grub.cfg, so I
think it is better to focus on that case.

Also, I think this is the setup that we should use to support hybrid images
out of the box. So it would be better if these images are aligned with what
we might change in the future for new installs.
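
To make the quoting in the UEFI part of the diff easier to follow, here is the
same idea as a standalone shell function. The name write_efi_stub is hypothetical
and the UUID is passed in as an argument instead of being probed; on a real
system it would come from something like grub2-probe --target=fs_uuid run on a
path in the boot partition. Like the diff, this is an untested sketch.

```shell
#!/bin/sh
# Hypothetical helper: write a stub GRUB config that finds the boot partition
# by filesystem UUID and loads the main grub.cfg from it.
# $1: filesystem UUID of the partition mounted at /boot
# $2: output file for the stub
write_efi_stub() {
    fs_uuid="$1"
    out="$2"
    # Unquoted EOF: the shell expands $fs_uuid now, while \$boot stays a
    # literal "$boot" for GRUB to expand at boot time.
    cat > "$out" << EOF
search --no-floppy --fs-uuid --set=boot $fs_uuid
configfile (\$boot)/grub2/grub.cfg
EOF
}
```

With a quoted delimiter ('EOF') the shell would not expand $fs_uuid, which is
why the heredoc delimiter changes between the removed and added lines.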

> > 
> > 2. If this is something a fix that you would be introducing in future, When
> > can we expect a fix to tackle this issue. 
> > Please note that these images will be used in PROD by variety of customers
> > such as SAP workloads and it wouldn't be a fair experience if this
> > workaround is not stable.
> 
> Future grub updates will not change the existing configuration, i.e.
> grub2.cfg
> will not be regenerated. It would be very risky to do so.
>

Agreed. Anything that could be changed will be for future installs, since touching
existing configuration is very prone to errors.
 
> > 
> > 3. With the above workaround be applied (assuming that this would suffice
> > all of our scenarios) how will this grub2 fix impact the existing customers
> > upon Yum update or upgrade since now we are working with the grubenv file
> > and the updated grub package will no longer be having this symlink.
> 
> This is for Javier/Peter to answer but I don't think there are current plans
> to eliminate the symlink, at least in RHEL8 lifetime.

As mentioned, I believe that we should get rid of the symlink and just make the
main GRUB config the one that is in the boot partition and make that load
the grubenv that is there too. But yes, at least for the RHEL8 lifetime I don't
think that we should do these changes, and definitely not for existing configs.

Comment 22 Javier Martinez Canillas 2020-07-09 10:27:11 UTC
(In reply to Alfred Sin from comment #20)

[snip]

> 
> On another note, I do find it a bit peculiar that RHEL 8 has moved on from
> grub2-mkconfig, yet grub2-mkconfig is still able to be run without any
> warning provided to the user that the expectation is to use grubby going
> forward. Similar to what Michael mentioned in comment 9 above, I have also
> picked up grub2-mkconfig as the first command that comes to mind to
> regenerate grub. I have no problem with shifting my mindset to think about
> grubby as the first option - after all, technology changes - but I do wonder
> how many customers may not be aware of that shift.

Do you think it would help if we made the GRUB config in the ESP the
stub that loads the one in the boot partition, as I proposed in Comment 21, or
are both approaches equally prone to errors?

Comment 23 Vitaly Kuznetsov 2020-07-13 08:59:54 UTC
(In reply to Alfred Sin from comment #20)
> I see, that makes sense. We actually use Hyper-V to build our images to
> align with Azure's hypervisor so our build host is a Windows machine. Sri
> will try building an image as UEFI first and then adding BIOS, and will
> provide an update here as to what happens. We might need your help to create
> a kickstart that is Hyper-V compatible depending on how the build works out.
> If it comes to a point where we're blocked again, could we perhaps hop on a
> call to discuss/troubleshoot?
> 

Sure, let's sync by email and set up a meeting if needed.

> On another note, I do find it a bit peculiar that RHEL 8 has moved on from
> grub2-mkconfig, yet grub2-mkconfig is still able to be run without any
> warning provided to the user that the expectation is to use grubby going
> forward. Similar to what Michael mentioned in comment 9 above, I have also
> picked up grub2-mkconfig as the first command that comes to mind to
> regenerate grub. I have no problem with shifting my mindset to think about
> grubby as the first option - after all, technology changes - but I do wonder
> how many customers may not be aware of that shift.

Actually, with BLS enabled (default in RHEL8) re-generating the grub2 config may
not have the desired effect (e.g. if you want to have a new kernel entry added),
so those who change their configs manually should already be aware of this to a
certain extent. I fully agree with the need to better document such peculiarities.

Going forward, I also agree that the best solution would be to have a single
grub.cfg/grubenv/... which would work for all possible boot schemes but I'm
not really sure how much work this is :-(
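
For readers less familiar with BLS: the menu entries live as individual snippet
files rather than inside grub.cfg, which is why regenerating grub.cfg does not
add or remove kernel entries. A small sketch for listing them follows. The
function name is hypothetical, and the default directory /boot/loader/entries
is an assumption about a standard RHEL 8 layout.

```shell
#!/bin/sh
# Hypothetical helper: print the title of every BLS entry in a directory.
# $1: directory with BLS snippets (assumed default: /boot/loader/entries)
list_bls_titles() {
    dir="${1:-/boot/loader/entries}"
    for entry in "$dir"/*.conf; do
        [ -f "$entry" ] || continue   # the glob matched nothing
        # Print the text after the "title" key of each snippet.
        sed -n 's/^title[[:space:]][[:space:]]*//p' "$entry"
    done
}

# Example:
# list_bls_titles
```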

Comment 24 Sriharsha-MSFT 2020-07-14 05:41:39 UTC
(In reply to Javier Martinez Canillas from comment #21)
> (In reply to Vitaly Kuznetsov from comment #17)
> > (In reply to Sriharsha-MSFT from comment #11)
> > > I agree to Javier here. However I have some questions. (I have not yet
> > > tested this solution but will provide a detailed results over this bugzilla)
> > > 
> > 
> > I'm not from bootloader team but let me provide my version of the answers.
> > I'd be more than happy if Javier/Peter would correct me.
> > 
> > > 1. I'm not sure how we are able to answer this question here with the
> > > current Kickstart even after your change suggested:
> > > Whenever we regenerate this grub2.cfg, will we not override (hd0,gpt2)/grub2
> > > with $prefix. How do we ensure that this (hd0,gpt2) is always persistent
> > > across grub2.cfg regeneration.
> > 
> > Users should not regenerate (or, delete) grub2.cfg on RHEL8: for all
> > bootloader
> > configuration tasks (selecting default kernel, changing kernel boot
> > params,...)
> > there are other ways to do that.
> >
> 
> That's correct. Although that's why I suggested that maybe the opposite could
> be done, that is to make the GRUB config file in the ESP to load the one that
> is in the boot partition. 
> 
> For example, something like the following (untested) change to the kickstart:
> 
> @@ -133,15 +133,18 @@
>  #
>  %post --erroronfail
>  
> -# setup uefi boot
> -/usr/sbin/grub2-mkconfig -o /etc/grub2-efi.cfg
> -/usr/sbin/parted -s /dev/sda disk_set pmbr_boot off
> -
>  # setup bios boot
> -cat <<'EOF' > /etc/grub2.cfg
> -search --no-floppy --set efi --file /efi/redhat/grub.cfg
> -configfile ($efi)/efi/redhat/grub.cfg
> +rm /boot/grub2/grubenv
> +cp /boot/efi/EFI/redhat/grubenv /boot/grub2/
> +/usr/sbin/grub2-mkconfig -o /etc/grub2.cfg
> +
> +# setup uefi boot
> +fs_uuid="$(grub2-probe --target=fs_uuid /etc/grub.cfg)"
> +cat << EOF > /etc/grub2-efi.cfg
> +search --no-floppy --fs-uuid --set=boot $fs_uuid
> +configfile (\$boot)/grub2/grub.cfg
>  EOF
> +/usr/sbin/parted -s /dev/sda disk_set pmbr_boot off
>  
> This change a couple of things:
> 
> - Generates the real GRUB config in /etc/grub.cfg that will be used
>   by both EFI and legacy BIOS. So it's the opposite of what Vitaly did
>   and I think that this has the advantage that it will allows users to
>   run grub2-mkconfig -o /boot/grub2/grub.cfg without breaking anything.
> 
> - Uses the filesystem UUID to make sure that the GRUB config file is
>   read from the correct partition.
> 
> - Gets rid of the grubenv symlink, and in both EFI and legacy BIOS the
>   file in the boot partition will be used. Same than with the grub.cfg.
> 
> It does have the drawback that grub2-mkconfig -o /etc/grub2-efi.cfg or
> grub2-mkconfig -o /boot/efi/EFI/redhat/grub.cfg could cause issues since
> /boot/grub2/grubenv won't be a symlink to the grubenv file in the ESP
> anymore. So the latter won't be updated by grub2-editenv and other tools.
> 
> But I still I think that what most users will find by searching information
> about GRUB online is that the GRUB config is in /boot/grub2/grub.cfg, so I
> think is better to focus on that case.
> 
> Also, I think this is the setup that we should use to support hybrid images
> out of the box. So it would be better if these images are aligned with what
> we might change in the future for new installs.
> 
> > > 
> > > 2. If this is something a fix that you would be introducing in future, When
> > > can we expect a fix to tackle this issue. 
> > > Please note that these images will be used in PROD by variety of customers
> > > such as SAP workloads and it wouldn't be a fair experience if this
> > > workaround is not stable.
> > 
> > Future grub updates will not change the existing configuration, i.e.
> > grub2.cfg
> > will not be regenerated. It would be very risky to do so.
> >
> 
> Agreed. Anything that could be changed will be for future installs, touching
> existing configuration is very prone to errors.
>  
> > > 
> > > 3. With the above workaround be applied (assuming that this would suffice
> > > all of our scenarios) how will this grub2 fix impact the existing customers
> > > upon Yum update or upgrade since now we are working with the grubenv file
> > > and the updated grub package will no longer be having this symlink.
> > 
> > This is for Javier/Peter to answer but I don't think there are current plans
> > to eliminate the symlink, at least in RHEL8 lifetime.
> 
> As mentioned I believe that we should get rid of the symlink and just make
> the
> main GRUB config the one that is in the boot partition and make that to load
> the grubenv that is there too. But yes, at least for the RHEL8 lifetime I
> don't
> think that we should do these changes, and definitely no for existing
> configs.

Apologies for the delay in response. I finally got some bandwidth to test the changes you suggested.
The VM is stuck in GRUB rescue.

Meanwhile, I would really appreciate it if we could get a kickstart that is tested in a Hyper-V environment.

Comment 25 Javier Martinez Canillas 2020-07-17 19:43:35 UTC
Created attachment 1701582 [details]
rhel-8.3-kvm-x86_64.ks

Comment 26 Javier Martinez Canillas 2020-07-17 19:53:19 UTC
(In reply to Sriharsha-MSFT from comment #24)

[snip]

> > 
> > As mentioned I believe that we should get rid of the symlink and just make
> > the
> > main GRUB config the one that is in the boot partition and make that to load
> > the grubenv that is there too. But yes, at least for the RHEL8 lifetime I
> > don't
> > think that we should do these changes, and definitely no for existing
> > configs.
> 
> Apologies for the delay in response. Finally got some bandwidth testing the
> changes you suggested
> The VM is stuck in Grub Rescue.
>

Yes, sorry about that. As mentioned, the proposed changes were untested and only
meant to explain what I thought we could change to make it more robust. But I
noticed a few typos in the changes.

I've attached a rhel-8.3-kvm-x86_64.ks that contains Vitaly's latest kickstart
plus my suggested changes after the fixes.
 
> Meanwhile, I would really appreciate if we can get a kickstart which is
> tested as per Hyper-V environment.

I don't have access to a Hyper-V environment but I've tested now with both BIOS
and EFI. I used virt-install and qemu-system-x86_64 with SeaBIOS and edk2/ovmf.

To generate the hybrid image I used the following command shared by Vitaly:

virt-install --virt-type kvm --os-variant rhel8.0 --arch x86_64 --name rhel8-ms-test \
--memory 4096 --disk bus=virtio,size=12 --nographics --initrd-inject=rhel-8.3-kvm-x86_64.ks \
--extra-args "console=ttyS0 ks=file:/rhel-8.3-kvm-x86_64.ks" \
--location RHEL-8.3.0-20200702.n.0-x86_64-dvd1.iso --network bridge=virbr0

And then booted using the following commands for BIOS and EFI:

sudo qemu-system-x86_64 -m 4G -display none -serial mon:stdio -hda rhel8-ms-test.qcow2

sudo qemu-system-x86_64 -m 4G -display none -serial mon:stdio \
-drive file=/usr/share/OVMF/OVMF_CODE.fd,if=pflash,format=raw,unit=0,readonly=on -hda rhel8-ms-test.qcow2

Comment 27 Javier Martinez Canillas 2020-07-28 07:06:03 UTC
(In reply to Javier Martinez Canillas from comment #26)
> (In reply to Sriharsha-MSFT from comment #24)

[snip]

> 
> I don't have access to an Hyper-V environment but I've tested now with both
> BIOS
> and EFI. I used virt-install and qemu-system-x86_64 with SeaBIOS and
> edk2/ovmf.
> 

Please wait to test this since Vitaly found some issues when installing with
EFI (I only tested installing with legacy BIOS). I will take a look at those
and attach a new kickstart file once I've addressed them.

Comment 28 Javier Martinez Canillas 2020-07-29 09:11:49 UTC
Created attachment 1702773 [details]
rhel-8.3-kvm-x86_64.ks

Comment 29 Javier Martinez Canillas 2020-07-29 09:46:11 UTC
I've attached a new version of the kickstart file. This fixes an issue that
Vitaly found. That issue was that the grub2-install tool isn't executed when
installing in EFI mode.

I've also found a bug in the GRUB module that parses the BLS files. If the
GRUB config file is generated when booting with EFI, it doesn't work for BIOS.

But the GRUB config file generated by BIOS does work for both EFI and BIOS.
Since we were installing in BIOS mode and adding the EFI bits on top, we didn't
notice this issue before.

I have a fix for this and a repo with a test build here:

https://javierm.fedorapeople.org/rhbz1850193/grub2-2.02-85.el8-scratch.repo

You can also add the grub2 test repositories when generating a hybrid image using
the following command:

virt-install --boot uefi --virt-type kvm --os-variant rhel8.0 --arch x86_64 --name rhel8-ms-test \
--memory 4096 --disk bus=virtio,size=12 --nographics --initrd-inject=rhel-8.3-kvm-x86_64.ks \
--extra-args "console=ttyS0 ks=file:/rhel-8.3-kvm-x86_64.ks inst.addrepo=grub2-x86_64,https://javierm.fedorapeople.org/rhbz1850193/x86_64/ inst.addrepo=grub2-i686,https://javierm.fedorapeople.org/rhbz1850193/i686/" \
--location RHEL-8.3.0-20200702.n.0-x86_64-dvd1.iso --network bridge=virbr0

To test, the same qemu-system-x86_64 commands shared in Comment 26 can be used.

With these changes, the GRUB config file in /boot/grub2/grub.cfg will always be used,
so users could re-generate it to change any of the GRUB config options in the
/etc/default/grub file.

Re-generating the stub GRUB config file in /boot/efi/EFI/redhat/grub.cfg should not
render the machine unbootable but just lead to EFI and BIOS having different config
files and also the one for EFI not loading the grubenv in /boot/grub2/grubenv.

So the user should be able to recover from that situation and a script or tool could
be provided to re-generate the stub config file in /boot/efi/EFI/redhat/grub.cfg to
include again the one in /boot/grub2/grub.cfg, as it was in the original image.
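
Such a recovery script would first need to tell whether the ESP config is still
the stub. A minimal sketch of that check, assuming the stub contains a configfile
line ending in /grub2/grub.cfg like the one proposed in Comment 21 (the function
name is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given grub.cfg still looks like the stub
# that chain-loads the config in the boot partition.
# $1: path to the ESP config, e.g. /boot/efi/EFI/redhat/grub.cfg
is_stub_cfg() {
    grep -Eq '^[[:space:]]*configfile[[:space:]].*/grub2/grub\.cfg[[:space:]]*$' "$1"
}

# Example:
# is_stub_cfg /boot/efi/EFI/redhat/grub.cfg || echo "stub was overwritten"
```

A full config generated by grub2-mkconfig would not be expected to contain such
a line, so the check should fail after the stub has been overwritten.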

Comment 30 Javier Martinez Canillas 2020-07-29 10:21:55 UTC
Something I forgot to mention is that since this is a test build, the GRUB binary is
not signed and Secure Boot should be disabled to test.

Comment 31 Sriharsha-MSFT 2020-07-29 14:24:50 UTC
Created attachment 1702826 [details]
Error Screenshot

This is the error screenshot

Comment 32 Sriharsha-MSFT 2020-07-29 14:26:36 UTC
Created attachment 1702828 [details]
Kickstart modified as per azure requirement

Comment 33 Javier Martinez Canillas 2020-07-29 14:47:32 UTC
(In reply to Sriharsha-MSFT from comment #31)
> Created attachment 1702826 [details]
> Error Screenshot
> 
> This is the error screenshot

Did you get this error even when installing the test grub2 builds I shared (i.e. using the inst.addrepo options mentioned in Comment 29)?

Comment 34 Javier Martinez Canillas 2020-07-30 07:56:25 UTC
Sriharsha,

Are these fixes needed only for 8.3 or also for older RHEL 8 releases?

Comment 35 Sriharsha-MSFT 2020-07-30 08:03:07 UTC
Yes. In either case, I received the same error.
However, I might have to dig a little deeper.
We are using Packer here, with Hyper-V as our hypervisor.
Hence, we are using wget to download the repo file to /etc/yum.repos.d and then performing
"dnf -y install grub2-install" in our post-provisioning step.
Is this something that you would support?

Do note that we are not using kvm here at all since Azure's base Hypervisor is on Hyper-V

We need the fix for all the 8.x images since these errors are observed in 8.x images including 8.0 as well

Comment 36 Javier Martinez Canillas 2020-07-30 08:11:55 UTC
(In reply to Sriharsha-MSFT from comment #35)
> Yes. In Either of the cases, I received the same error.
> However, I might have to dig a little deep. 
> We are using packer here and Hyper-V as our Hypervisor
> Hence, we are using wget to download in the repo to /etc/yum.repos.d and
> then performing
> "dnf -y install grub2-install" in our post provisioning step.
> Is this something that you would support ?
>

I see. In that case I think you will also need to run:

grub2-install --target=i386-pc /dev/sda
 
> Do note that we are not using kvm here at all since Azure's base Hypervisor
> is on Hyper-V
>

Yes, I know. I'm testing with kvm because I don't have access to Hyper-V.
 
> We need the fix for all the 8.x images since these errors are observed in
> 8.x images including 8.0 as well

Thanks for the confirmation. Then we should ask for z-stream for this.

Comment 37 Marek Havrila 2020-07-30 08:46:38 UTC
I was able to reproduce this bug using commands from comment 26 and kickstart provided in comment 28. Adding qa_ack.

Comment 43 Mayank Thapliyal 2020-08-07 04:14:02 UTC
Hi, it has been 7 days since the last update where the issue was reproduced. Any update on the new kickstart to be provided to us to fix this?

Comment 46 Rick Barry 2020-08-07 15:47:53 UTC
(In reply to Mayank Thapliyal from comment #43)
> Hi, it has been 7 days since the last update where the issue was reproduced.
> Any update on the new kickstart to be provided to us to fix this?

Hi Mayank, I believe we were waiting for Sriharsha's retest, see Javier's comment 36.

Comment 47 Sriharsha-MSFT 2020-08-09 12:57:47 UTC
(In reply to Rick Barry from comment #46)
> (In reply to Mayank Thapliyal from comment #43)
> > Hi, it has been 7 days since the last update where the issue was reproduced.
> > Any update on the new kickstart to be provided to us to fix this?
> 
> Hi Mayank, I believe we were waiting for Sriharsha's retest, see Javier's
> comment 36.

Hi Rick, since Marek Havrila was already able to reproduce this issue (see comment 37), I did not dig deeper.
Sure, I can go ahead and provide my testing results.

Comment 48 Esteban Flores 2020-08-10 16:34:10 UTC
Quick note: throughout this conversation, grubby has been recommended as the tool to edit GRUB parameters. The documentation contradicts this recommendation in favor of grub2-editenv:
https://access.redhat.com/solutions/3710121
Going forward, we need a streamlined process for updating GRUB parameters (such as serial console settings). With RHEL8 grubby, grub2-mkconfig -o and others ignore parameters defined in /etc/default.

Comment 50 Javier Martinez Canillas 2020-08-11 12:43:33 UTC
(In reply to Esteban Flores from comment #48)
> Quick thing, through this conversation, grubby has been recommended as a
> tool to edit GRUB parameters. The documentation contradicts this
> recommendation in favor of grub2-editenv:
> https://access.redhat.com/solutions/3710121
> Going forward, we need a streamlined process for updating GRUB parameters
> (such as serial console settings). With RHEL8 grubby, grub2-mkconfig -o and
> others ignore parameters defined in /etc/default.

The grub2-mkconfig tool not honoring the GRUB_CMDLINE_LINUX in /etc/default/grub was a bug in RHEL 8.0 that was fixed (see Bug #1637875).

That's why the mentioned document also lists grub2-editenv, but as Vitaly said grubby is the preferred approach to modify the cmdline.
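
As an illustration of the grubby approach, here is a small wrapper sketch. The
function name and the GRUBBY override are hypothetical conveniences for doing a
dry run; the only grubby options used are --update-kernel=ALL and --args.

```shell
#!/bin/sh
# Hypothetical wrapper: append kernel arguments to every installed kernel entry.
# Set GRUBBY=echo to print the arguments instead of running grubby (dry run).
add_kernel_args() {
    "${GRUBBY:-grubby}" --update-kernel=ALL --args="$*"
}

# Example (serial console settings are only an illustration):
# GRUBBY=echo add_kernel_args console=ttyS0 earlyprintk=ttyS0
# add_kernel_args console=ttyS0 earlyprintk=ttyS0
```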

Comment 59 Javier Martinez Canillas 2020-08-17 09:59:24 UTC
Created attachment 1711579 [details]
rhel-8.3-kvm-x86_64.ks

Comment 62 Pankaj Basnal 2020-08-20 13:57:14 UTC
Created attachment 1712020 [details]
Screen shot of grub menu on hyper-v

We tried the provided ks file but it failed to boot on Hyper-V.

Comment 63 Pankaj Basnal 2020-08-20 13:59:17 UTC
Created attachment 1712021 [details]
Grub2 scratch repo sync error

The repo mentioned, "https://javierm.fedorapeople.org/rhbz1850193/grub2-2.02-85.el8-scratch.repo", gave a cache synchronization error.

Comment 69 HuijingHei 2020-08-25 04:29:12 UTC
Hi Mohammed, this is test result on Hyper-V:

1) Install gen2 vm with RHEL-8.3.0-20200811.0 compose and kickstart (replacing vda with sda) on Hyper-V; after the vm starts, upgrade grub2 to grub2-2.02-89.el8.noarch and exec '/usr/sbin/grub2-mkconfig -o /etc/grub2.cfg'. Then start the same disk with gen1: the vm failed to start (the result is the same as in comment #62).

2) Install gen2 vm with nightly RHEL-8.3.0-20200824.n.0 compose and kickstart, start the same disk with gen1 successfully

Comment 70 Yuxin Sun 2020-08-26 06:58:49 UTC
Hi Mohammed,

I built the image with KVM in UEFI mode, converted it to .vhd, and uploaded it to Azure. Both gen1 and gen2 VMs boot successfully.
Base compose: RHEL-8.3.0-20200811.0 (grub2 is 2.02-84.el8)
Kickstart file: Used the ks file in attachment 1711579 [details] and added the configuration needed for Azure.

Other test cases:
* Before the test, updated the grub2-related packages to version 2.02-88.el8.
In Gen1/Gen2 VM:
1) Installed a new kernel and rebooted; the VM switched to the new kernel.
2) Used grubby to switch back to the old kernel; the VM booted with the old kernel.
3) Set GRUB_DEFAULT=0 in /etc/default/grub and grub2-mkconfig -o /boot/grub2/grub.cfg(Gen1) or /boot/efi/EFI/redhat/grub.cfg(Gen2). Can boot up with the new kernel.

Thanks!

Comment 71 Javier Martinez Canillas 2020-08-26 07:08:14 UTC
(In reply to Yuxin Sun from comment #70)

[snip]

> 3) Set GRUB_DEFAULT=0 in /etc/default/grub and grub2-mkconfig -o
> /boot/grub2/grub.cfg(Gen1) or /boot/efi/EFI/redhat/grub.cfg(Gen2). Can boot
> up with the new kernel.
> 

Thanks a lot for testing. But even though re-generating the EFI config
in /boot/efi/EFI/redhat/grub.cfg would work, there shouldn't be a need
to do it since both Gen1 and Gen2 would use /boot/grub2/grub.cfg.

Re-generating /boot/efi/EFI/redhat/grub.cfg will overwrite the stub
config that just loads the one in /boot/grub2/grub.cfg.

Comment 72 Yuxin Sun 2020-08-26 08:18:08 UTC
(In reply to Javier Martinez Canillas from comment #71)
> (In reply to Yuxin Sun from comment #70)
> 
> [snip]
> 
> > 3) Set GRUB_DEFAULT=0 in /etc/default/grub and grub2-mkconfig -o
> > /boot/grub2/grub.cfg(Gen1) or /boot/efi/EFI/redhat/grub.cfg(Gen2). Can boot
> > up with the new kernel.
> > 
> 
> Thanks a lot for testing. But even though re-generating the EFI config
> in /boot/efi/EFI/redhat/grub.cfg would work, there shouldn't be a need
> to do it since both Gen1 and Gen2 would use /boot/grub2/grub.cfg.
> 
> Re-generating /boot/efi/EFI/redhat/grub.cfg will overwrite the stub
> config that just loads the one in /boot/grub2/grub.cfg.

Thanks Javier! So the correct way we use it in the future is to regenerate /boot/grub2/grub.cfg in both gen1 and gen2 VMs right?

Comment 73 Javier Martinez Canillas 2020-08-26 08:52:01 UTC
(In reply to Yuxin Sun from comment #72)

[snip]

> > 
> > Re-generating /boot/efi/EFI/redhat/grub.cfg will overwrite the stub
> > config that just loads the one in /boot/grub2/grub.cfg.
> 
> Thanks Javier! So the correct way we use it in the future is to regenerate
> /boot/grub2/grub.cfg in both gen1 and gen2 VMs right?

That is correct. This bugzilla was filed because re-generating /boot/grub2/grub.cfg
would render the BIOS machine unbootable (because in the previous kickstart the EFI
config file in /boot/efi/EFI/redhat/grub.cfg was used for both Gen1 and Gen2 VMs).

That was a combination of some bugs in GRUB (that were fixed in grub2-2.02-88.el8)
and the fact that /boot/grub2/grub.cfg was a stub config that loaded the one in
/boot/efi/EFI/redhat/grub.cfg.

The new kickstart was changed to instead use the /boot/grub2/grub.cfg config file
for both Gen1 and Gen2, and to make /boot/efi/EFI/redhat/grub.cfg the stub
config file that just loads the other.

This was done because Gen1 users are used to changing options in /etc/default/grub
and running grub2-mkconfig -o /boot/grub2/grub.cfg, so they expect the same to work
on Gen2 VMs.

Now re-generating the stub config in /boot/efi/EFI/redhat/grub.cfg would not render
the machine unbootable anymore, but it is still something that shouldn't be done because
then the two config files will get out of sync.
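
To illustrate the point about not regenerating the stub, here is a hedged sketch of a check that could be run before calling grub2-mkconfig. The function name is_stub_cfg and the heuristic are assumptions: a file is treated as a stub if it contains a configfile command and lacks the "### BEGIN /etc/grub.d/" section markers that grub2-mkconfig writes into a full config.

```shell
#!/bin/sh
# Hypothetical check: does this GRUB config look like a small stub that only
# chains to another config, rather than full grub2-mkconfig output?
# Heuristic (an assumption): stubs contain a "configfile" command and lack the
# "### BEGIN /etc/grub.d/" section markers that grub2-mkconfig emits.
is_stub_cfg() {
    cfg="$1"
    [ -r "$cfg" ] || return 2
    grep -q '^[[:space:]]*configfile[[:space:]]' "$cfg" || return 1
    if grep -q '^### BEGIN /etc/grub.d/' "$cfg"; then
        return 1
    fi
    return 0
}

# Usage: warn before regenerating a config file that is currently a stub.
target="${1:-/boot/efi/EFI/redhat/grub.cfg}"
if is_stub_cfg "$target"; then
    echo "$target is a stub; regenerate /boot/grub2/grub.cfg instead"
fi
```

The check is read-only, so it is safe to run on either a Gen1 or a Gen2 VM.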

Comment 74 Yuxin Sun 2020-08-26 10:29:12 UTC
Thanks Javier! I tested in Azure Gen2 VM to regenerate /boot/grub2/grub.cfg and it worked well.

Comment 77 Pankaj Basnal 2020-08-28 06:55:44 UTC
Created attachment 1712912 [details]
Image of boot screen of gen1 and gen2 vm from image built with the updated script

Setup that I used for building the image -
Windows Server 2019 Datacenter
Hyper-V Gen2 VM

ISO used - RHEL-8.3.0-20200609.1-x86_64-dvd1.iso
grub2    - 2.02-87.el8_2

I took this section from the post section of the kickstart file "rhel-8.3-kvm-x86_64.ks" provided by Javier and ran it as a post-build script.

# setup bios boot
grub2-install --target=i386-pc /dev/sda
rm -f /boot/grub2/grubenv
cp -pr /boot/efi/EFI/redhat/grubenv /boot/grub2/
rm -f /boot/efi/EFI/redhat/grubenv
/usr/sbin/grub2-mkconfig -o /etc/grub2.cfg

# setup uefi boot
fs_uuid="$(grub2-probe --target=fs_uuid /etc/grub2.cfg)"
cat << EOF > /etc/grub2-efi.cfg
search --no-floppy --fs-uuid --set=dev $fs_uuid
set prefix=(\$dev)/boot/grub2
export \$prefix
configfile \$prefix/grub2.cfg
EOF
/usr/sbin/parted -s /dev/sda disk_set pmbr_boot off

Once packer had created the VHDX, I attached that to a gen1 VM on the Windows Machine itself. The VM didn't boot.

Attaching the screenshots of Gen1 VM and Gen2 VM which were created from the VHDX.

Comment 78 xuli 2020-08-28 08:13:26 UTC
Hi Pankaj,

I think you may not be using the latest fixed grub2 version, grub2-2.02-88.el8.

Could you please try the RHEL 8.3 external snapshot 2 (RHEL-8.3.0-20200825.0), which includes grub2-2.02-88.el8? I just checked the FTP server shared with Microsoft, and the latest build there is RHEL-8.3.0-Snapshot-1.0, so you may need to wait a few more days to get the latest build.

Thank you so much.

Best Regards,
Xuemin

Comment 79 Pankaj Basnal 2020-08-28 10:03:57 UTC
(In reply to xuli from comment #78)
> Hi Pankaj,
> 
> I think you may not be using the latest fixed grub2 version,
> grub2-2.02-88.el8.
> 
> Could you please try the RHEL 8.3 external snapshot 2
> (RHEL-8.3.0-20200825.0), which includes grub2-2.02-88.el8? I just checked
> the FTP server shared with Microsoft, and the latest build there is
> RHEL-8.3.0-Snapshot-1.0, so you may need to wait a few more days to get
> the latest build.
> 
> Thank you so much.
> 
> Best Regards,
> Xuemin

Yes, the latest RHEL 8.3 external snapshot 2 isn't available on the MSFT FTP server. But we would like to have a solution for RHEL 8.0/8.1/8.2 as well because we have customers who are using those images. Wouldn't the same ks solution work for these rhel versions?

Comment 80 xuli 2020-08-31 11:37:28 UTC
Hi Pankaj,

Based on Javier's comment https://bugzilla.redhat.com/show_bug.cgi?id=1850193#c73 and your test result in https://bugzilla.redhat.com/show_bug.cgi?id=1850193#c77, it looks like the same ks solution is necessary, but a grub2 package change is also needed.

Comment #c73: "That was a combination of some bugs in GRUB (that were fixed in grub2-2.02-88.el8) [...]"

We also tested on Hyper-V with the RHEL 8.2 GA version. If we only update the kickstart file from https://bugzilla.redhat.com/attachment.cgi?id=1711579 and install a gen2 VM, a gen1 VM cannot boot directly from the same vhdx. If, besides updating the kickstart file, we also update the grub2-related packages to grub2-2.02-88 in the gen2 VM and then execute "/usr/sbin/grub2-mkconfig -o /etc/grub2.cfg ; grub2-install --target=i386-pc /dev/sda", a gen1 VM can boot RHEL 8.2 from the same vhdx.

So maybe still need grub2 z-stream update to fix this issue completely for RHEL 8.0, 8.1, 8.2 images? Could please Javier help to confirm?
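
Since the outcome depends on which grub2 build is installed, a small version check can help before testing. The sketch below is an assumption-laden illustration: the function name grub2_has_fix is hypothetical, and it only compares the leading integer of the release field against 88 for builds in the RHEL 8.3 (el8) stream. It deliberately refuses to judge z-stream builds (el8_1, el8_2), whose release numbering differs.

```shell
#!/bin/sh
# Hypothetical helper: report whether a grub2 version-release string from the
# RHEL 8.3 build stream is at least 2.02-88 (the build named in this bug).
# Return codes: 0 = has the fix, 1 = older build, 2 = cannot tell.
grub2_has_fix() {
    vr="$1"                    # e.g. "2.02-88.el8"
    version="${vr%%-*}"        # "2.02"
    release="${vr#*-}"         # "88.el8"
    rel_num="${release%%.*}"   # "88"
    # Assumption: z-stream builds use different numbering, so do not guess.
    case "$release" in
        *el8_*) return 2 ;;
    esac
    [ "$version" = "2.02" ] || return 2
    case "$rel_num" in
        ''|*[!0-9]*) return 2 ;;
    esac
    [ "$rel_num" -ge 88 ]
}

# Usage on a live system (requires rpm):
#   grub2_has_fix "$(rpm -q --qf '%{VERSION}-%{RELEASE}' grub2-common)"
for vr in 2.02-84.el8 2.02-88.el8 2.02-89.el8; do
    if grub2_has_fix "$vr"; then
        echo "$vr: at or above 2.02-88"
    else
        echo "$vr: below 2.02-88"
    fi
done
```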

Comment 81 Javier Martinez Canillas 2020-08-31 11:43:49 UTC
(In reply to xuli from comment #80)

[snip]

> So maybe still need grub2 z-stream update to fix this issue completely for
> RHEL 8.0, 8.1, 8.2 images? Could please Javier help to confirm?

Yes, that's correct. That's why I set the zstream? flag in Comment 36.

Comment 90 Rick Barry 2020-09-02 17:07:02 UTC
Pankaj, sorry we had networking troubles in today's call. Mayank will reschedule a follow-up,
but in the meantime we want to make sure you are testing with the correct combination of RHEL
and grub2. 

RHEL 8.3 external snapshot 3 should be available to Microsoft in the next few days. Snap 3
contains grub2-2.02-88.el8 which is what you will need.

Please let us know when you've had a chance to test with snap 3. 

Updated grub2 packages for RHEL 8.0, 8.1 and 8.2 are planned, but RHEL 8.3 snap 2 will be
available first.

Comment 91 xuli 2020-09-03 02:12:47 UTC
Hi Pankaj, Rick,

I just checked the MSFT FTP server: RHEL 8.3 snap 2 is available and contains the grub2-2.02-88.el8 package.

Thank you so much.

Best Regards,
Xuemin

Comment 94 Rick Barry 2020-09-03 14:48:13 UTC
Pankaj, please note a correction to comment 90:

> RHEL 8.3 external snapshot 3 should be available to Microsoft in the next
> few days. Snap 3
> contains grub2-2.02-88.el8 which is what you will need.
> 
> Please let us know when you've had a chance to test with snap 3. 

Replace all references to snap 3 above with "snap 2". So, please try testing with RHEL 8.3 snap 2. As Xuemin says in comment 91, snap 2 is available now.

Sorry for the confusion.

Comment 95 Jon Sturgis 2020-09-08 15:32:49 UTC
I have previously spoken/confirmed with MSFT and RHT teams. This will need to be backported to 8.1, 8.2 and so on. 

This does not need to be backported to 8.0.

Comment 96 Javier Martinez Canillas 2020-09-08 15:39:50 UTC
(In reply to Jon Sturgis from comment #95)
> I have previously spoken/confirmed with MSFT and RHT teams. This will need
> to be backported to 8.1, 8.2 and so on. 
> 
> This does not need to be backported to 8.0.

Thanks for the confirmation. I had misunderstood and thought it was also needed for 8.0.

Comment 97 HuijingHei 2020-09-17 09:48:42 UTC
Hi,

Tested with the RHEL-8.3.0-20200909.1-x86_64 compose and the kickstart (attachment 1711579 [details]) on Hyper-V. Here are the test results:

1) Both UEFI and BIOS boot work.
2) Switching to a different kernel using the 'grub2-set-default'/'grub2-reboot' tools works as expected with both BIOS- and UEFI-booted guests.
3) The re-generated '/etc/grub2.cfg' config boots and works for both BIOS and UEFI.
4) According to comment #73, on a UEFI VM, if '/etc/grub2-efi.cfg' is regenerated with 'grub2-mkconfig -o /etc/grub2-efi.cfg', the 'grub2-set-default'/'grub2-reboot' tools will not work.


For 4), as a workaround to switch to a different kernel, one can set 'GRUB_DEFAULT=$kernel_index' in /etc/default/grub and then run 'grub2-mkconfig -o /etc/grub2-efi.cfg'. But I am not sure where to get the kernel_index; it seems to differ from the index reported by 'grubby --info /boot/vmlinuz-xx'. Could you help to check?

For example, there are 2 kernel versions on the VM, and the index of 4.18.0-235.el8.x86_64 according to grubby is 0, but in /etc/default/grub it should be 'GRUB_DEFAULT=1'.

]# grubby --info /boot/vmlinuz-4.18.0-235.el8.x86_64  | grep index
index=0

]# cat /etc/default/grub | grep -i default
GRUB_DEFAULT=1

Another question: do we need to add documentation about 3) and 4)? Thanks!

BR
hhei
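
To make the index comparison above easier to repeat, the sketch below lists the index grubby reports for each boot entry. It does not explain or resolve the mismatch with GRUB_DEFAULT; it only condenses `grubby --info=ALL` output into "index kernel-path" pairs. The function name list_grubby_indexes is hypothetical, and the sample input is made up for illustration rather than captured from a real machine.

```shell
#!/bin/sh
# Hypothetical helper: read `grubby --info=ALL` style output on stdin and print
# "index kernel-path" pairs, one per boot entry.
list_grubby_indexes() {
    awk -F= '
        /^index=/  { idx = $2 }
        /^kernel=/ { path = $2; gsub(/"/, "", path); print idx, path }
    '
}

# On a live system: grubby --info=ALL | list_grubby_indexes
# The sample below is illustrative only; the entries and their ordering are
# not taken from a real machine.
list_grubby_indexes <<'EOF'
index=0
kernel="/boot/vmlinuz-4.18.0-235.el8.x86_64"
title="sample entry A"
index=1
kernel="/boot/vmlinuz-4.18.0-193.el8.x86_64"
title="sample entry B"
EOF
```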

Comment 98 Pankaj Basnal 2020-09-20 14:25:06 UTC
Created attachment 1715463 [details]
KS file used to build the RHEL8 .3 VHD

This is the ks file that I used to build the image. It contains only the basic installation configuration and the BIOS boot setup in the post section, so that we have a minimal ks with which to reproduce the issue. The image built using this ks is bootable on a gen1 VM but not on a gen2 VM.

Comment 99 Pankaj Basnal 2020-09-20 14:29:57 UTC
Created attachment 1715464 [details]
Packer file used to build the image

Attaching the packer file that we used to build the image. I used RHEL 8.3 Snapshot 2 as the base ISO. It uses a Gen2 VM on Hyper-V to build the image. Let me know if this helps to reproduce the issue. If not, we can have another call so that I can get a clearer understanding of the setup used to successfully build an image for both gen1 and gen2.

Comment 100 xuli 2020-09-21 09:10:54 UTC
Created attachment 1715500 [details]
Iso file created from ks file used to build the RHEL8 .3 VHD

Comment 101 xuli 2020-09-21 09:12:44 UTC
Created attachment 1715506 [details]
Simplified packer file used to build the image

Comment 102 xuli 2020-09-21 09:28:53 UTC
Hi Pankaj,
 
Both Huijing and I used your kickstart file in attachment 1715463 [details]. We first installed a Hyper-V gen2 VM, then powered it off and started a gen1 VM with the same vhdx. Both the gen2 and gen1 VMs boot successfully. To make the test result clearer, here are more detailed steps for installation and boot.
 
Could you please try again by creating a kickstart ISO file with the following steps and then using the two ISO files to install the gen2 VM? If that works, then there may be a script issue in packer.json or postinstall.sh, or some other config issue.

Details:
ISO file: RHEL 8.3 Snapshot 2 with kernel 4.18.0-235.el8.x86_64.
Kickstart file: attachment 1715463 [details]
Hyper-V Host: 2019

Steps:

1) Create a kickstart ISO file containing ks.cfg on a Linux VM (the ISO is also shared as attachment 1715500 [details]).

a. Rename rhel8-updated-RAW.ks to ks.cfg
b. Run mkisofs in the RHEL 8 VM:
#mkisofs -V "OEMDRV" -o "ks8.iso" "/root/ks.cfg"

2) Copy RHEL-8.3-20200825.iso and ks8.iso created to the Hyper-V 2019 host. Create a new gen2 vm, disable secure boot, and attach ks8.iso and RHEL-8.3-20200825.iso.

3) Start vm and the new gen2 vm can be installed successfully.

4) Copy the newly created vhdx image to a new file, e.g. vm-gen1.vhdx, and create a new gen1 VM based on vm-gen1.vhdx. Check that this gen1 VM boots normally.
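
Step 1 above can be wrapped in a small script. This is only a sketch: the function name make_ks_iso and the DRY_RUN switch are assumptions, while the mkisofs options are the ones used in this comment. The name check reflects that the installer looks for a file called ks.cfg on a volume labeled OEMDRV. By default the helper prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical wrapper around the mkisofs step above. The function name and
# the DRY_RUN switch are assumptions made for illustration.
make_ks_iso() {
    ks="$1"
    iso="$2"
    # The kickstart file on the OEMDRV volume is expected to be named ks.cfg.
    if [ "$(basename "$ks")" != "ks.cfg" ]; then
        echo "error: kickstart file must be named ks.cfg" >&2
        return 1
    fi
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Default: only show the command that would be run.
        printf 'mkisofs -V "OEMDRV" -o "%s" "%s"\n' "$iso" "$ks"
    else
        mkisofs -V "OEMDRV" -o "$iso" "$ks"
    fi
}

make_ks_iso /root/ks.cfg ks8.iso
```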

Meanwhile, I also used your "Packer file" as a template to create a gen2 VM based on the RHEL-8.3.0-20200909.1 ISO. It also works with the local Packer file, which is simplified from your attachment 1715464 [details]. Please refer to attachment 1715506 [details].

I made some changes:
1) Removed some lines, e.g. postinstall.sh in the provisioners part.
2) Removed iso_checksum_type (packer reports "Deprecated configuration key: 'iso_checksum_type'" when running .\packer.exe validate packer.json).
3) Set static values for iso_url, the checksum, and the ks file path.

PS > .\packer.exe build packer.json
hyperv-iso: output will be in this color.

==> hyperv-iso: Creating build directory...
==> hyperv-iso: Retrieving ISO
==> hyperv-iso: Trying C:\Users\xuli\RHEL-8.3.0-20200909.1-x86_64-dvd1.iso
==> hyperv-iso: Trying file://C:/Users/xuli/RHEL-8.3.0-20200909.1-x86_64-dvd1.iso?checksum=sha256%3A3d9d3e4c7fa863b5febc8b4a67cec538590ff5a91a18a1d0c0838de18591e543
==> hyperv-iso: file://C:/Users/xuli/RHEL-8.3.0-20200909.1-x86_64-dvd1.iso?checksum=sha256%3A3d9d3e4c7fa863b5febc8b4a67cec538590ff5a91a18a1d0c0838de18591e543 => C:/Users/xuli/RHEL-8.3.0-20200909.1-x86_64-dvd1.iso
==> hyperv-iso: Starting HTTP server on port 8582
==> hyperv-iso: Creating switch 'Intel-40G-test' if required...
==> hyperv-iso:     switch 'Intel-40G-test' already exists. Will not delete on cleanup...
==> hyperv-iso: Creating virtual machine...
==> hyperv-iso: Enabling Integration Service...
==> hyperv-iso: Setting boot drive to os dvd drive C:/Users/xuli/RHEL-8.3.0-20200909.1-x86_64-dvd1.iso ...
==> hyperv-iso: Mounting os dvd drive C:/Users/xuli/RHEL-8.3.0-20200909.1-x86_64-dvd1.iso ...
==> hyperv-iso: Skipping mounting Integration Services Setup Disk...
==> hyperv-iso: Mounting secondary DVD images...
==> hyperv-iso: Configuring vlan...
==> hyperv-iso: Determine Host IP for HyperV machine...
==> hyperv-iso: Host IP for the HyperV machine: 192.168.10.1
==> hyperv-iso: Attempting to connect with vmconnect...
==> hyperv-iso: Starting the virtual machine...
==> hyperv-iso: Waiting 10s for boot...
==> hyperv-iso: Typing the boot command...
==> hyperv-iso: Waiting for SSH to become available...
==> hyperv-iso: Connected to SSH!
==> hyperv-iso: Provisioning with shell script: C:\Users\xuli\AppData\Local\Temp\packer-shell667341827
==> hyperv-iso: Downloading /var/run/packages.lst => packages.lst
==> hyperv-iso: Gracefully halting virtual machine...
==> hyperv-iso: Waiting for vm to be powered down...
==> hyperv-iso: Unmount/delete secondary dvd drives...
==> hyperv-iso: Unmount/delete Integration Services dvd drive...
==> hyperv-iso: Unmount/delete os dvd drive...
==> hyperv-iso: Delete os dvd drives controller 0 location 1 ...
==> hyperv-iso: Compacting disks...
    hyperv-iso: Compacting disk: default2.vhdx
    hyperv-iso: Disk size is unchanged
==> hyperv-iso: Exporting virtual machine...
==> hyperv-iso: Collating build artifacts...
==> hyperv-iso: Disconnecting from vmconnect...
==> hyperv-iso: Unregistering and deleting virtual machine...
==> hyperv-iso: Deleting build directory...
Build 'hyperv-iso' finished after 10 minutes 53 seconds.

Thank you so much.
Best Regards,
Xuemin

Comment 103 Petr Janda 2020-09-24 22:06:02 UTC
According to my testing in KVM with RHEL-8.3.0-20200909.1 (snap 3) and comment 97, the bug in grub is fixed. I believe I can switch this to verified.

Comment 104 Pankaj Basnal 2020-09-27 17:57:26 UTC
So far, in our testing secure boot was enabled while creating the gen2 VM with the built vhdx. In my latest test with RHEL-8.3.0-20200909.1, after disabling the secure boot, it worked. Testing on Azure to see if it'll work on Azure or not.

Comment 106 Mayank Thapliyal 2020-10-06 07:00:46 UTC
Hi,
Are these fixes backported to work on 8.1 and 8.2 as well? Please provide an update on the same.

Comment 107 Petr Janda 2020-10-06 12:35:57 UTC
Hello, there are bug 1874469 for 8.1 and bug 1874470 for 8.2 to backport this fix, but we don't have builds yet.

Comment 108 Rick Barry 2020-10-06 18:12:48 UTC
(In reply to Pankaj Basnal from comment #104)
> So far, in our testing secure boot was enabled while creating the gen2 VM
> with the built vhdx. In my latest test with RHEL-8.3.0-20200909.1, after
> disabling the secure boot, it worked. Testing on Azure to see if it'll work
> on Azure or not.

Pankaj, do you have an update on how testing is going on Azure?

Comment 109 Javier Martinez Canillas 2020-10-07 08:58:29 UTC
(In reply to Petr Janda from comment #107)
> Hello, there are bug 1874469 for 8.1 and bug 1874470 for 8.2 to backport this
> fix, but we don't have builds yet.

We now have builds and errata for these two as well, so they should be released in
the next z-stream batches.

Comment 111 Pankaj Basnal 2020-10-13 11:19:03 UTC
Created attachment 1721171 [details]
KickstartFile for LVM partition

The solution worked for RAW partitioned image but it failed for the image with LVM partitions. I have attached the kickstart file used for building the image. 

Error after attaching the vhdx to a gen2 machine with secure boot disabled - System BootOrder not found. Initializing defaults.
Creating boot entry "Boot0002" with label "Red Hat Enterprise Linux" for file "\EFI\redhat\shimx64.efi".

Apologies for the delay. I got caught up in other priority work items.

Comment 112 Javier Martinez Canillas 2020-10-13 12:28:03 UTC
(In reply to Pankaj Basnal from comment #111)
> Created attachment 1721171 [details]
> KickstartFile for LVM partition
> 
> The solution worked for RAW partitioned image but it failed for the image
> with LVM partitions. I have attached the kickstart file used for building
> the image. 
>

Can you please share a serial console log or screenshot of all the output?
 
> Error after attaching the vhdx to a gen2 machine with secure boot disabled -
> System BootOrder not found. Initializing defaults.
> Creating boot entry "Boot0002" with label "Red Hat Enterprise Linux" for
> file "\EFI\redhat\shimx64.efi".
>

This is saying there wasn't an EFI variable with a boot entry for the ESP to
boot nor a BootOrder variable and shim will attempt to create boot entries
using the data contained in the CSV file at \EFI\redhat\BOOTX64.CSV.
 
I assume that message was only present the first time you booted the VM and
not in the following boots?

Comment 115 Pankaj Basnal 2020-10-15 08:31:11 UTC
Created attachment 1721777 [details]
Result of booting the LVM image with grub fix

I don't have the serial console log output; to my knowledge, Hyper-V doesn't pass it to packer during installation. We do get logs when the VM is running and packer runs scripts on it.
We can have a call to further debug the problem.

The attachment contains 2 images. The first is the result of booting the image for the first time. The second is the result of the second boot.

The same packer and hyper-v configuration was used to build the image.

Comment 116 Javier Martinez Canillas 2020-10-15 15:25:44 UTC
Can you please share a screenshot of the output when executing the following from the GRUB prompt:

grub> set

I'm particularly interested in the fw_path and prefix variables. You could also get
the output of these two with:

grub> echo $fw_path
grub> echo $prefix

Comment 117 Pankaj Basnal 2020-10-19 06:30:56 UTC
Created attachment 1722568 [details]
Requested output of grub> set command

Attached the output of grub> set command and also highlighted the specific values of interest

Comment 118 Javier Martinez Canillas 2020-10-19 07:19:08 UTC
The $prefix variable seems to be set correctly to (lvm/rootvg-rootlv)/boot/grub2

Does listing the files in the $prefix work? i.e:

grub> ls (lvm/rootvg-rootlv)/boot/grub2

I assume the grub.cfg config file is present in that directory.

Also, does loading the config file work?

grub> configfile $prefix/grub.cfg

If not, could you please do it after setting the debug to all, i.e:

grub> set debug=all

Comment 119 xuli 2020-10-19 07:40:03 UTC
Hi Javier,

We can reproduce the issue locally when using the LVM kickstart file https://bugzilla.redhat.com/attachment.cgi?id=1721171. I have sent you an email about how to access the Hyper-V VM.

I just tried the command
grub> ls (lvm/rootvg-rootlv)/boot/grub2

error: ../../grub-core/fs/fshelp.c:258: `boot/grub2' not found

Thank you so much.
Best Regards,
Xuemin

Comment 120 xuli 2020-10-19 07:45:41 UTC
typo update: error: ../../grub-core/fs/fshelp.c:258:file `/boot/grub2' not found.

Comment 121 Javier Martinez Canillas 2020-10-19 08:45:31 UTC
Thanks Xuemin for providing me access to a machine where I could reproduce this issue.

The problem is that the kickstart file rhel8-LVM-dual-test.ks uses a different partition
layout than the previous kickstarts. It creates a boot partition while the others didn't
have one and /boot was just a sub-directory in the root partition.

So (lvm/rootvg-rootlv)/boot/grub2 doesn't exist as pointed out by Xuemin. If you want to
use a boot partition, then the grub.cfg config file in the ESP needs to be modified, i.e:

# setup uefi boot
fs_uuid="$(grub2-probe --target=fs_uuid /boot/grub2)"
cat << EOF > /etc/grub2-efi.cfg
search --no-floppy --fs-uuid --set=dev $fs_uuid
set prefix=(\$dev)/grub2
export \$prefix
configfile \$prefix/grub.cfg
EOF
/usr/sbin/parted -s /dev/vda disk_set pmbr_boot off


I also noticed that the grub2-pc package is not included in the %packages section.
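
The difference between the two partition layouts comes down to the prefix written into the stub. The following sketch parametrizes the stub shown above; the function name write_efi_stub and its arguments are assumptions made for illustration, and the UUID in the example call is a placeholder.

```shell
#!/bin/sh
# Hypothetical generator for the ESP stub config. The function name and its
# arguments are assumptions made for illustration.
#   $1: UUID of the filesystem that holds the grub2 directory
#   $2: path of the grub2 directory relative to that filesystem:
#       /boot/grub2 when /boot is a sub-directory of the root filesystem,
#       /grub2      when /boot is a separate partition
write_efi_stub() {
    fs_uuid="$1"
    grub_dir="$2"
    # \$dev and \$prefix are escaped so that GRUB, not the shell, expands them.
    cat << EOF
search --no-floppy --fs-uuid --set=dev $fs_uuid
set prefix=(\$dev)$grub_dir
export \$prefix
configfile \$prefix/grub.cfg
EOF
}

# Placeholder UUID, for illustration only.
write_efi_stub "00000000-0000-0000-0000-000000000000" "/grub2"
```

In a kickstart post section, the output would be redirected to /etc/grub2-efi.cfg, with the UUID taken from grub2-probe --target=fs_uuid /boot/grub2 as in the snippet above.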

Comment 122 Javier Martinez Canillas 2020-10-19 08:52:56 UTC
Created attachment 1722591 [details]
rhel8-LVM-dual-test-v2.ks

Attached a v2 of the rhel8-LVM-dual-test.ks file. I haven't been able to test it yet,
but it contains the changes I think were missing to support an image that has
the different partition layout with a boot partition.

Comment 123 xuli 2020-10-19 09:35:24 UTC
Hi Javier,

Great! I just did a quick test of the new kickstart file rhel8-LVM-dual-test-v2.ks with the latest RHEL 8.3 compose on Hyper-V: the gen2 VM boots (with secure boot disabled), and the same image also boots as a gen1 VM. We need Pankaj's help for the final confirmation.

Thank you so much.
Best Regards,
Xuemin

Comment 124 Javier Martinez Canillas 2020-10-19 09:57:08 UTC
Thanks a lot for testing Xuemin.

Comment 125 Pankaj Basnal 2020-10-19 13:19:43 UTC
Thanks Xuemin and Javier for the quick fix and testing. 
I tested the provided KS file and it worked for both gen2 and gen1 VMs. I still have a few different configurations to test, and I also want to check whether having /boot as a subdirectory of root will cause any problem with the VM. I will update with my findings.

Comment 127 errata-xmlrpc 2020-11-04 01:53:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (grub2 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4513

