Bug 1570991

Summary: [RFE][RHEL 7.6] Azure: Support Gen1 and Gen2 Hyper-V VMs with single VHD image
Product: Red Hat Enterprise Linux 7
Reporter: Stephen A. Zarkos <stephen.zarkos>
Component: grub2
Assignee: Peter Jones <pjones>
Status: CLOSED NOTABUG
QA Contact: Release Test Team <release-test-team-automation>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 7.6
CC: abgopal, ailan, alsin, borisb, boyang, cavery, cchouhan, dgilbert, guybo, hhei, jenander, jjarvis, jsuchane, leiwang, lersek, mikelley, pjones, prarao, rharwood, ribarry, sribs, stephen.zarkos, vkuznets, wshi, xiaofwan, xuli, yacao, yujiang, yuxisun
Target Milestone: rc
Keywords: FutureFeature, TestOnly
Target Release: ---
Flags: alsin: needinfo-
       cchouhan: needinfo-
Hardware: x86_64
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 1714900 (view as bug list)
Environment:
Last Closed: 2019-12-16 19:25:36 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1714900

Description Stephen A. Zarkos 2018-04-23 22:36:00 UTC
Description of problem:

Feature request: Azure: Support Gen1 and Gen2 Hyper-V VMs with single VHD image

We will start to see the need for Gen2 VMs on Azure (currently only Gen1 is supported). The first requirement will be to support the Azure Confidential Computing team, which provides a service where customers can utilize Intel's SGX support on Azure VMs.

Using the Intel SGX extensions under Hyper-V requires a Gen2 VM. One key difference between Gen1 and Gen2 VMs is that Gen1 uses legacy BIOS boot, whereas Gen2 uses UEFI. From an Azure image perspective we would like to have a single VM image that can boot as either a Gen1 or Gen2 VM (BIOS or UEFI).

Test Case:

A BIOS+UEFI-enabled image can be tested on-prem with Hyper-V on Windows Server 2016 (or in Azure with nested virtualization).

 1) For Azure the image format should be VHD, not VHDX
 2) Create two VMs in Hyper-V, one as Gen1 and the other as Gen2, but don't attach any disks.
 3) Attach the VHD to the Gen1 VM (must be IDE disk to boot)
 4) Boot the Gen1 VM and ensure it boots fully to a login prompt
 5) Shut down the Gen 1 VM
 6) Repeat steps 3-5 for the Gen2 VM. It's OK to use the same VHD for both as long as only one VM is booted at a time.

Additional tests should be considered, such as upgrading the kernel and ensuring the VM still boots into the new kernel on both the Gen1 and Gen2 VMs. This verifies that the boot configuration is updated for both BIOS and UEFI.
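
One quick way to verify inside the guest which firmware actually booted (a minimal sketch; the /sys/firmware/efi test is the standard heuristic, nothing Hyper-V specific):

#!/bin/bash
# Report whether this boot used UEFI (Gen2) or legacy BIOS (Gen1).
if [ -d /sys/firmware/efi ]; then
    echo "Booted via UEFI (Gen2 VM)"
else
    echo "Booted via legacy BIOS (Gen1 VM)"
fi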

Comment 2 Vitaly Kuznetsov 2018-04-24 11:14:25 UTC
This is likely to become a TestOnly bug. At the meeting we had, Peter mentioned a grub2-efi + syslinux combination as a possible solution. I was going to try it myself, as well as grub2-efi + grub2-pc, and then we can decide which is better.

Comment 3 Vitaly Kuznetsov 2018-05-03 15:24:08 UTC
I was playing with a grub2-efi + syslinux setup and came up with the following instructions for making this work:

1) Install RHEL7 on a Hyper-V Gen2 VM, make /boot a separate partition, and use 'ext4' as its filesystem.

2) yum -y install syslinux-extlinux gdisk

3) extlinux -i /boot/extlinux/

4) mkdir /boot/syslinux
   rm -f /etc/extlinux.conf
   ln -s /boot/syslinux/extlinux.conf /etc/extlinux.conf

5) Create the default config. The format is important, as grubby needs to be able to parse it on kernel update. The following seems to work:

cat /etc/extlinux.conf:

timeout 5
default 'Red Hat Enterprise Linux Server (3.10.0-862.el7.x86_64) 7.5 (Maipo)'

label 'Red Hat Enterprise Linux Server (3.10.0-862.el7.x86_64) 7.5 (Maipo)'
        kernel /vmlinuz-3.10.0-862.el7.x86_64
        initrd /initramfs-3.10.0-862.el7.x86_64.img
        append root=UUID=5c992c43-85da-4ba3-8db3-f155f6dd8cd6 ro crashkernel=auto rhgb quiet LANG=en_US.UTF-8

I'm not exactly sure why step 4 is needed, but for some reason syslinux can't find the config in the default /boot/extlinux/extlinux.conf location.
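
For reference, steps 2-4 collapse into a short script (a sketch only; the config from step 5 still has to be written by hand at /boot/syslinux/extlinux.conf):

#!/bin/bash
# Steps 2-4 from above, consolidated; run as root.
yum -y install syslinux-extlinux gdisk

# Install extlinux into /boot/extlinux/.
extlinux -i /boot/extlinux/

# Work around syslinux not finding the config in the default location:
# keep the real config in /boot/syslinux/ and point /etc/extlinux.conf at it.
mkdir -p /boot/syslinux
rm -f /etc/extlinux.conf
ln -s /boot/syslinux/extlinux.conf /etc/extlinux.conf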

Peter, Laszlo (you're the boot experts :-) does this all make sense to you? Or do you see a better/easier way to achieve the same?

Comment 4 Laszlo Ersek 2018-05-03 17:00:14 UTC
Hi Vitaly,

first I had to read up on a bunch of references to even understand the question :) So, IIUC, we need
- either a bootloader (*one* boot loader) that is bootable in both BIOS and
  UEFI mode, whose config file can be upgraded by grubby at kernel updates,
- or a set of two boot loaders that can peacefully coexist, one bootable in
  BIOS mode, another bootable in UEFI mode, such that grubby can update
  the config files for both, in parallel, when the kernel is updated.

I would have thought that "grub2" is good for this (with the grub2-pc and grub2-efi-x64 packages, and their config files at /etc/grub2.cfg and /etc/grub2-efi.cfg, respectively).

Alternatively, it seems to me that (upstream) syslinux can be built for both BIOS and EFI (but I'm not sure if syslinux/EFI is available in RHEL-7 -- I think not?)

A mixture of two boot loaders (grub2-efi for EFI, syslinux for BIOS) doesn't seem very robust; for example, the grubby(8) manual calls the extlinux config file format "deprecated".

Anyway, I don't have any practical suggestion; just a vague theoretical one -- if you can get it to work with just grub2, and just one config file *format* (for both BIOS and UEFI), I'd recommend that. For example, I doubt that kernel command line options placed in "/etc/default/grub" will take effect for syslinux too. (I could be wrong, of course.) If only the grub2-efi + syslinux combo works, then sure, we don't have a choice.

Sorry, that's all I can say here. :)

Comment 5 Vitaly Kuznetsov 2018-05-04 09:32:00 UTC
(In reply to Laszlo Ersek from comment #4)
> Hi Vitaly,
> 
> first I had to read up on a bunch of references to even understand the
> question :) So, IIUC, we need
> - either a bootloader (*one* boot loader) that is bootable in both BIOS and
>   UEFI mode, whose config file can be upgraded by grubby at kernel updates,
> - or a set of two boot loaders that can peacefully coexist, one bootable in
>   BIOS mode, another bootable in UEFI mode, such that grubby can update
>   the config files for both, in parallel, when the kernel is updated.
> 
> I would have thought that "grub2" is good for this (with the grub2-pc and
> grub2-efi-x64 packages, and their config files at /etc/grub2.cfg and
> /etc/grub2-efi.cfg, respectively).
> 

Thank you Laszlo,

I was going to try grub2-pc + grub2-efi too. It seems we'll need a special partition for MBR+GPT to work (at least according to https://wiki.gentoo.org/wiki/GRUB2#BIOS_with_GPT); not sure it can be created from our installer, though.

Comment 6 Vitaly Kuznetsov 2018-05-11 08:25:50 UTC
So I tried grub2-pc + grub2-efi and it worked too.

The install sequence is:

1) Install RHEL7 on a Gen2 VM; make sure to leave some space on the hard drive. In my test I freed 4 MB and that was enough.

2) After setup finishes, install the 'gdisk' package.

3) With gdisk, create a new partition (4 MB) and change its type to 0xEF02 (BIOS boot partition).

4) Make sure you have the grub2-pc and grub2-pc-modules packages.

5) Run 'grub2-install -d /usr/lib/grub/i386-pc/ /dev/sda'

6) Run 'cat /etc/grub2-efi.cfg | sed -e s,linuxefi,linux, -e s,initrdefi,initrd, > /boot/grub2/grub.cfg'

I also tried a kernel update, and grubby seems to be able to update both configs simultaneously; no issues noticed. This may come in handy if someone decides to detach the hard drive and attach it to a VM using a different boot method (effectively changing the VM's generation).

In my opinion this is less hackish than grub2-efi + syslinux, so I'd recommend this sequence instead.
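
For convenience, steps 2)-6) as one script (a sketch; it assumes the disk is /dev/sda with ~4 MB left free, and that /etc/grub2-efi.cfg already exists from the Gen2 install):

#!/bin/bash
# Steps 2)-6) from above, consolidated; run as root.
yum -y install gdisk grub2-pc grub2-pc-modules

# 3) Create a 4 MB partition typed ef02 (BIOS boot partition);
#    "0" tells sgdisk to pick the first free partition slot.
sgdisk --new=0:0:+4M --typecode=0:ef02 /dev/sda
partprobe /dev/sda

# 5) Install the BIOS (i386-pc) grub2 image.
grub2-install -d /usr/lib/grub/i386-pc/ /dev/sda

# 6) Derive the BIOS config from the EFI one (linuxefi/initrdefi -> linux/initrd).
sed -e 's,linuxefi,linux,' -e 's,initrdefi,initrd,' /etc/grub2-efi.cfg > /boot/grub2/grub.cfg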

Comment 7 HuijingHei 2018-05-14 09:02:02 UTC
Tried according to comment #6 on WS2016; both Gen1 and Gen2 VMs can boot with the hard drive after a kernel update.

Comment 11 Rick Barry 2018-07-12 19:15:50 UTC
Note: Test-only with updated Azure image build from Microsoft.

Comment 19 Rick Barry 2018-11-27 14:09:18 UTC
Hi Stephen,

We'll need an MSFT-supplied RHEL 7.6 Azure image that has been built according to the instructions above (see comment 6) for our QE to complete testing.

Do you know when such an image will be available?

Comment 22 Yuxin Sun 2019-04-19 02:06:40 UTC
Hi Stephen,

Is there any update about such an on-demand image in Azure? Thanks!

Comment 23 Stephen A. Zarkos 2019-04-19 19:18:05 UTC
Hi,

I'll need to defer to either @Alfred or @Boris as they own the RHEL image builds.

Thanks!
Steve

Comment 24 Alfred Sin 2019-04-19 19:41:39 UTC
We are still working on these, and I'll provide updates once we're done.

Thanks,
Alfred

Comment 25 Yuxin Sun 2019-11-05 09:55:57 UTC
Reference BZ#1667028

Comment 26 Alfred Sin 2019-11-20 16:41:42 UTC
Hi all - we're having issues with creating a RHEL 8.x VHD that can boot as both Gen1 and Gen2. Is there any guidance you can provide?

Comment 29 Stephen A. Zarkos 2019-11-21 00:17:00 UTC
FWIW I made a kickstart for the CentOS folks here: https://github.com/CentOS/sig-cloud-instance-build/pull/165

The KS assumes you are installing on a UEFI-enabled system (using virt-install I just added "--boot uefi" as a parameter), and then BIOS boot is configured in the %post section. Any feedback from Red Hat on this approach would be appreciated.

Comment 30 xuli 2019-11-21 07:22:56 UTC
Hi Alfred,

For RHEL 8 support of Gen1 and Gen2 Hyper-V VMs with a single VHD image, refer to https://bugzilla.redhat.com/show_bug.cgi?id=1714900#c3

I copied the steps provided by Yuxin here for your reference.

1. Create 4 partitions
[root@localhost ~]# lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sr0     11:0    1 1024M  0 rom  
vda    253:0    0   10G  0 disk
├─vda1 253:1    0  476M  0 part /boot/efi
├─vda2 253:2    0  488M  0 part /boot
├─vda3 253:3    0   10M  0 part
└─vda4 253:4    0    9G  0 part /
2. Install the image through UEFI
3. sgdisk /dev/vda -t 3:EF02
4. grub2-install -d /usr/lib/grub/i386-pc/ /dev/vda
5. Keep the GRUB_ENABLE_BLSCFG=true in /etc/default/grub. Run grub2-mkconfig -o /boot/grub2/grub.cfg (or just cp /boot/efi/EFI/redhat/grub.cfg /boot/grub2/grub.cfg)
6. Modify /boot/grub2/grub.cfg.
   1). Find the "search --no-floppy --fs-uuid --set=boot" lines and replace the UUID with the /dev/vda2 UUID
   2). Remove the lines between
   ### BEGIN /etc/grub.d/30_uefi-firmware ###
... and ...
   ### END /etc/grub.d/30_uefi-firmware ###
7. cp /boot/efi/EFI/redhat/grubenv /boot/grub2/grubenv
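
Roughly scripted, steps 3-7 above look like this (a sketch; it uses the grub2-mkconfig variant of step 5 and assumes the lsblk layout shown, with /dev/vda1 as the ESP, /dev/vda2 as /boot and /dev/vda3 as the spare 10M partition):

#!/bin/bash
# Sketch of steps 3-7; run as root after the UEFI install.
sgdisk /dev/vda -t 3:EF02                          # 3. type vda3 as BIOS boot
grub2-install -d /usr/lib/grub/i386-pc/ /dev/vda   # 4. install BIOS grub2
grub2-mkconfig -o /boot/grub2/grub.cfg             # 5. generate the BIOS-side config

# 6.1) Point the "search --no-floppy --fs-uuid --set=boot" lines at /dev/vda2.
BOOT_ID=$(blkid --match-tag UUID --output value /dev/vda2)
sed -i "s/\(search --no-floppy --fs-uuid --set=boot\).*/\1 ${BOOT_ID}/" /boot/grub2/grub.cfg

# 6.2) Drop the EFI-only lines between the 30_uefi-firmware markers.
sed -i '/^### BEGIN \/etc\/grub.d\/30_uefi/,/^### END \/etc\/grub.d\/30_uefi/{/^### BEGIN/!{/^### END/!d}}' /boot/grub2/grub.cfg

# 7. Start with the two grubenv copies in sync.
cp /boot/efi/EFI/redhat/grubenv /boot/grub2/grubenv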

Comment 32 Vitaly Kuznetsov 2019-11-21 12:31:08 UTC
(In reply to Stephen A. Zarkos from comment #29)
> FWIW I made a kickstart for the CentOS folks here:
> https://github.com/CentOS/sig-cloud-instance-build/pull/165
> 
> The KS assumes you are installing on a UEFI enabled system (using
> virt-install I just added "--boot uefi" as a parameter), and then BIOS boot
> is configured in postinst. Any feedback from Red Hat on this approach would
> be appreciated.

Thanks a lot! I modified it just a tiny bit to make it work for RHEL8:
http://people.redhat.com/~vkuznets/RHEL8-Azure.ks

I checked that it creates a hybrid VM with the following virt-install command line:
# virt-install --virt-type kvm --os-variant rhel8.0 --arch x86_64 --boot uefi --name rhel8-azure-test --memory 4096 --disk bus=scsi,size=8 --nographics --initrd-inject=./RHEL8-Azure.ks --extra-args "console=ttyS0 ks=file:/RHEL8-Azure.ks" --location /var/lib/libvirt/images/RHEL-8.1.0-20191015.0-x86_64-dvd1.iso --network bridge=br0

The only missing piece for Alfred, I suppose, is injecting the RHUI client, but it should be
solvable the exact same way as for Gen1 images. I'm going to try the generated image
on Azure, and if it works I'll drop you an email so we can continue the discussion.

Thanks again!

Comment 33 Alfred Sin 2019-11-21 16:56:04 UTC
Thank you Vitaly! We'll give it a shot and I'll follow up if there are any questions.

Comment 34 HuijingHei 2019-11-22 08:59:34 UTC
(In reply to Vitaly Kuznetsov from comment #32)
> (In reply to Stephen A. Zarkos from comment #29)
> > FWIW I made a kickstart for the CentOS folks here:
> > https://github.com/CentOS/sig-cloud-instance-build/pull/165
> > 
> > The KS assumes you are installing on a UEFI enabled system (using
> > virt-install I just added "--boot uefi" as a parameter), and then BIOS boot
> > is configured in postinst. Any feedback from Red Hat on this approach would
> > be appreciated.
> 
> Thanks a lot! I modified it just a tiny bit to make it work for RHEL8:
> http://people.redhat.com/~vkuznets/RHEL8-Azure.ks
> 
> I checked that it creates a hybrid VM with the following virt-install
> command line:
> # virt-install --virt-type kvm --os-variant rhel8.0 --arch x86_64 --boot
> uefi --name rhel8-azure-test --memory 4096 --disk bus=scsi,size=8
> --nographics --initrd-inject=./RHEL8-Azure.ks --extra-args "console=ttyS0
> ks=file:/RHEL8-Azure.ks" --location
> /var/lib/libvirt/images/RHEL-8.1.0-20191015.0-x86_64-dvd1.iso --network
> bridge=br0


Hi Vitaly,

I created the qcow2 image and converted it to VHD, then tested on Azure; both Gen1 and Gen2 VMs can start successfully.
# qemu-img convert -f qcow2 -o subformat=fixed,force_size -O vpc rhel8-azure-test.qcow2 rhel8-azure-test.vhd
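
(Side note: Azure also expects the VHD's virtual size to be a whole number of megabytes; a hedged pre-step to guarantee that before running the conversion above:)

# Round the image up to the next 1 MB boundary (sketch; GNU userland assumed).
size=$(qemu-img info --output json rhel8-azure-test.qcow2 | grep '"virtual-size"' | tr -dc '0-9')
qemu-img resize rhel8-azure-test.qcow2 $(( (size + 1048575) / 1048576 * 1048576 ))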
 
After discussing with Yuxin, there are two small things to be concerned about:

1) If the VM is loaded with BIOS, there is a "System setup" entry in the grub menu, which should only appear on VMs with EFI firmware. The related info is in /boot/grub2/grub.cfg, so we can remove it there:
### BEGIN /etc/grub.d/30_uefi-firmware ###
menuentry 'System setup' $menuentry_id_option 'uefi-firmware' {
	fwsetup
}
### END /etc/grub.d/30_uefi-firmware ###


2) If the VM is loaded with EFI and the kernel is upgraded, the default boot kernel will not be updated, because after installing a new kernel the default boot kernel info is written to /boot/grub2/grubenv, not /boot/efi/EFI/redhat/grubenv. The workaround is to sync /boot/efi/EFI/redhat/grubenv with /boot/grub2/grubenv.
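
The sync in 2) is a one-liner (a sketch; the direction follows from grubby writing the update into /boot/grub2/grubenv):

# cp /boot/grub2/grubenv /boot/efi/EFI/redhat/grubenv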

Thanks!

Comment 35 Vitaly Kuznetsov 2019-11-22 09:42:33 UTC
(In reply to HuijingHei from comment #34)
> (In reply to Vitaly Kuznetsov from comment #32)
> > (In reply to Stephen A. Zarkos from comment #29)
> > > FWIW I made a kickstart for the CentOS folks here:
> > > https://github.com/CentOS/sig-cloud-instance-build/pull/165
> > > 
> > > The KS assumes you are installing on a UEFI enabled system (using
> > > virt-install I just added "--boot uefi" as a parameter), and then BIOS boot
> > > is configured in postinst. Any feedback from Red Hat on this approach would
> > > be appreciated.
> > 
> > Thanks a lot! I modified it just a tiny bit to make it work for RHEL8:
> > http://people.redhat.com/~vkuznets/RHEL8-Azure.ks
> > 
> > I checked that it creates a hybrid VM with the following virt-install
> > command line:
> > # virt-install --virt-type kvm --os-variant rhel8.0 --arch x86_64 --boot
> > uefi --name rhel8-azure-test --memory 4096 --disk bus=scsi,size=8
> > --nographics --initrd-inject=./RHEL8-Azure.ks --extra-args "console=ttyS0
> > ks=file:/RHEL8-Azure.ks" --location
> > /var/lib/libvirt/images/RHEL-8.1.0-20191015.0-x86_64-dvd1.iso --network
> > bridge=br0
> 
> 
> Hi Vitaly,
> 
> I created the qcow2 image and converted it to VHD, then tested on Azure;
> both Gen1 and Gen2 VMs can start successfully.
> # qemu-img convert -f qcow2 -o subformat=fixed,force_size -O vpc
> rhel8-azure-test.qcow2 rhel8-azure-test.vhd
>  
> After discussing with Yuxin, there are two small things to be concerned about:
> 
> 1) If the VM is loaded with BIOS, there is a "System setup" entry in the
> grub menu, which should only appear on VMs with EFI firmware. The related
> info is in /boot/grub2/grub.cfg, so we can remove it there:
> ### BEGIN /etc/grub.d/30_uefi-firmware ###
> menuentry 'System setup' $menuentry_id_option 'uefi-firmware' {
> 	fwsetup
> }
> ### END /etc/grub.d/30_uefi-firmware ###

Thank you for testing,

this is a minor issue indeed, as on BIOS-booted instances this entry just won't work (and
nobody is supposed to pick a boot entry manually anyway). I'll see if we can easily
remove it.

> 
> 
> 2) If the VM is loaded with EFI and the kernel is upgraded, the default boot
> kernel will not be updated, because after installing a new kernel the
> default boot kernel info is written to /boot/grub2/grubenv, not
> /boot/efi/EFI/redhat/grubenv. The workaround is to sync
> /boot/efi/EFI/redhat/grubenv with /boot/grub2/grubenv

This is somewhat important: a kernel upgrade should change the default. Let's see
what the right solution is here (and maybe find a temporary workaround).

Comment 36 HuijingHei 2019-11-22 10:02:56 UTC

(In reply to Vitaly Kuznetsov from comment #32)
> (In reply to Stephen A. Zarkos from comment #29)
> > FWIW I made a kickstart for the CentOS folks here:
> > https://github.com/CentOS/sig-cloud-instance-build/pull/165
> > 
> > The KS assumes you are installing on a UEFI enabled system (using
> > virt-install I just added "--boot uefi" as a parameter), and then BIOS boot
> > is configured in postinst. Any feedback from Red Hat on this approach would
> > be appreciated.
> 
> Thanks a lot! I modified it just a tiny bit to make it work for RHEL8:
> http://people.redhat.com/~vkuznets/RHEL8-Azure.ks

Hi Vitaly, I checked the ks file with Xuemin. After adding a parameter in /etc/default/grub, only /boot/grub2/grub.cfg is regenerated; /boot/efi/EFI/redhat/grub.cfg also needs to be regenerated, is this right? Thanks!

Comment 37 Vitaly Kuznetsov 2019-11-22 13:15:30 UTC
(In reply to HuijingHei from comment #36)
> 
> 
> Hi Vitaly, I checked the ks file with Xuemin. After adding a parameter in
> /etc/default/grub, only /boot/grub2/grub.cfg is regenerated;
> /boot/efi/EFI/redhat/grub.cfg also needs to be regenerated, is this right?

Yes. I hope I was able to fix all three issues you've mentioned; please check the updated
http://people.redhat.com/~vkuznets/RHEL8-Azure.ks

Diff:
--- /tmp/original-ks.cfg	2019-11-22 14:10:18.286221389 +0100
+++ /tmp/RHEL8-Azure.ks	2019-11-22 14:09:33.062880214 +0100
@@ -145,9 +145,9 @@
 echo 'GRUB_SERIAL_COMMAND="serial --speed=115200 --unit=0 --word=8 --parity=no --stop=1"' >> /etc/default/grub
 sed -i 's/^GRUB_TERMINAL_OUTPUT=".*"$/GRUB_TERMINAL="serial console"/g' /etc/default/grub
 
+grub2-mkconfig --output /etc/grub2-efi.cfg
+
 # Enable BIOS bootloader
-rm -f /boot/grub2/grubenv
-cp /boot/efi/EFI/redhat/grubenv /boot/grub2/grubenv
 grub2-install --target=i386-pc --directory=/usr/lib/grub/i386-pc/ /dev/sda
 grub2-mkconfig --output=/boot/grub2/grub.cfg
 
@@ -156,6 +156,8 @@
  BOOT_ID=`blkid --match-tag UUID --output value /dev/sda1`
  sed -i 's/gpt15/gpt1/' /boot/grub2/grub.cfg
  sed -i "s/${EFI_ID}/${BOOT_ID}/" /boot/grub2/grub.cfg
+ sed -i 's|${config_directory}/grubenv|(hd0,gpt15)/efi/redhat/grubenv|' /boot/grub2/grub.cfg
+ sed -i '/^### BEGIN \/etc\/grub.d\/30_uefi/,/^### END \/etc\/grub.d\/30_uefi/{/^### BEGIN \/etc\/grub.d\/30_uefi/!{/^### END \/etc\/grub.d\/30_uefi/!d}}' /boot/grub2/grub.cfg
 
 # Blacklist the nouveau driver
 cat << EOF > /etc/modprobe.d/blacklist-nouveau.conf

With these changes, kernel updates (and manual grub2-editenv usage) are supposed to work
flawlessly. I'm still testing this, but the results are promising.
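
("Manual grub2-editenv usage" meaning things like the following; a sketch, the entry value is a placeholder:)

# grub2-editenv - list                          # show saved_entry and friends
# grub2-editenv - set saved_entry=<menu entry>  # what grub2-set-default does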

Comment 38 Laszlo Ersek 2019-11-23 11:07:45 UTC
Hi All,

can someone please clarify the scope for me -- is my understanding correct that we expect the installed guest to successfully *alternate* between being booted under UEFI and being booted under traditional BIOS?

Because it's one thing to provide an initial image that can "latch" to either BIOS or UEFI operation, going forward, and it's another thing (it seems to me anyway) to expect the installed guest (in production) to flip back & forth between BIOS and UEFI, from boot to boot, and to keep upgrading the kernel / updating the grub config etc. under such circumstances.

Thanks,
Laszlo

Comment 39 HuijingHei 2019-11-25 07:45:58 UTC
(In reply to Vitaly Kuznetsov from comment #37)
> Yes. I hope I was able to fix all three issues you've mentioned, please
> check the updated
> http://people.redhat.com/~vkuznets/RHEL8-Azure.ks
> 
> Diff:
> --- /tmp/original-ks.cfg	2019-11-22 14:10:18.286221389 +0100
> +++ /tmp/RHEL8-Azure.ks	2019-11-22 14:09:33.062880214 +0100
> @@ -145,9 +145,9 @@
>  echo 'GRUB_SERIAL_COMMAND="serial --speed=115200 --unit=0 --word=8
> --parity=no --stop=1"' >> /etc/default/grub
>  sed -i 's/^GRUB_TERMINAL_OUTPUT=".*"$/GRUB_TERMINAL="serial console"/g'
> /etc/default/grub
>  
> +grub2-mkconfig --output /etc/grub2-efi.cfg
> +
>  # Enable BIOS bootloader
> -rm -f /boot/grub2/grubenv
> -cp /boot/efi/EFI/redhat/grubenv /boot/grub2/grubenv
>  grub2-install --target=i386-pc --directory=/usr/lib/grub/i386-pc/ /dev/sda
>  grub2-mkconfig --output=/boot/grub2/grub.cfg
>  
> @@ -156,6 +156,8 @@
>   BOOT_ID=`blkid --match-tag UUID --output value /dev/sda1`
>   sed -i 's/gpt15/gpt1/' /boot/grub2/grub.cfg
>   sed -i "s/${EFI_ID}/${BOOT_ID}/" /boot/grub2/grub.cfg
> + sed -i 's|${config_directory}/grubenv|(hd0,gpt15)/efi/redhat/grubenv|'
> /boot/grub2/grub.cfg
> + sed -i '/^### BEGIN \/etc\/grub.d\/30_uefi/,/^### END
> \/etc\/grub.d\/30_uefi/{/^### BEGIN \/etc\/grub.d\/30_uefi/!{/^### END
> \/etc\/grub.d\/30_uefi/!d}}' /boot/grub2/grub.cfg
>  
>  # Blacklist the nouveau driver
>  cat << EOF > /etc/modprobe.d/blacklist-nouveau.conf
> 
> With these changes, kernel updates (and manual grub2-editenv usage) are
> supposed to work flawlessly. I'm still testing this,
> but the results are promising.

I created the qcow2 image and converted it to VHD, then tested on Azure; both Gen1 and Gen2 VMs can start successfully.

All three issues are fixed. Thanks!
1) Loading the VM with BIOS, there is no "System setup" entry in the grub menu
2) Loading the VM with EFI and upgrading the kernel, the default boot kernel is updated
3) /boot/efi/EFI/redhat/grub.cfg is recreated after updating /etc/default/grub

Comment 40 Vitaly Kuznetsov 2019-11-25 09:39:26 UTC
(In reply to Laszlo Ersek from comment #38)
> Hi All,
> 
> can someone please clarify the scope for me -- is my understanding correct
> that we expect the installed guest to successfully *alternate* between being
> booted under UEFI and being booted under traditional BIOS?
> 
> Because it's one thing to provide an initial image that can "latch" to
> either BIOS or UEFI operation, going forward, and it's another thing (it
> seems to me anyway) to expect the installed guest (in production) to flip
> back & forth between BIOS and UEFI, from boot to boot, and to keep upgrading
> the kernel / updating the grub config etc. under such circumstances.
> 

I'm not sure Azure allows changing generation (UEFI/BIOS) by changing the instance
type directly; however, it is likely possible to detach the root volume from e.g. a Gen2
instance and try attaching it to a Gen1 instance.
Generally, we don't need to support this use case; the immediate need is to have
the same image usable to provision both Gen1 and Gen2 without any changes
to it. We can, in theory, get away with some first-boot configuration making the
provisioned instance Gen1-only or Gen2-only; however, going forward I'd aim at
making it possible to 'alternate'.

Comment 41 Laszlo Ersek 2019-11-25 12:58:35 UTC
Thanks, Vitaly!

Comment 42 Alfred Sin 2019-11-25 19:21:36 UTC
(In reply to Vitaly Kuznetsov from comment #40)
> (In reply to Laszlo Ersek from comment #38)
> > Hi All,
> > 
> > can someone please clarify the scope for me -- is my understanding correct
> > that we expect the installed guest to successfully *alternate* between being
> > booted under UEFI and being booted under traditional BIOS?
> > 
> > Because it's one thing to provide an initial image that can "latch" to
> > either BIOS or UEFI operation, going forward, and it's another thing (it
> > seems to me anyway) to expect the installed guest (in production) to flip
> > back & forth between BIOS and UEFI, from boot to boot, and to keep upgrading
> > the kernel / updating the grub config etc. under such circumstances.
> > 
> 
> I'm not sure Azure allows changing generation (UEFI/BIOS) by changing the
> instance type directly; however, it is likely possible to detach the root
> volume from e.g. a Gen2 instance and try attaching it to a Gen1 instance.
> Generally, we don't need to support this use case; the immediate need is to
> have the same image usable to provision both Gen1 and Gen2 without any
> changes to it. We can, in theory, get away with some first-boot
> configuration making the provisioned instance Gen1-only or Gen2-only;
> however, going forward I'd aim at making it possible to 'alternate'.

Yeah, Vitaly is right. The biggest value-add (IMHO) of the dual-boot image is to simplify our image building and publishing process, where we don't have to worry about the Hyper-V generation of the image we're building and publishing. I don't think Azure currently supports switching back and forth between generations anyway.

Comment 43 Stephen A. Zarkos 2019-11-25 21:04:50 UTC
Thank you all for the feedback and validation effort, awesome teamwork!

Comment 44 Alfred Sin 2019-12-04 19:23:35 UTC
Hi Vitaly,

We created a RHEL 8.1 dual-boot image but we were unable to boot a VM created from a Gen 1 image based off that VHD. The VM showed up in Azure but never provisioned. Sri on our team has more details so I'll tag him as needinfo. We used the kickstart that you provided, with a couple modifications:
    - we used LVM partitioning
    - we run the steps in the %post section in a separate part of our image build, so our ks ends at the end of the %packages section

Do you have any thoughts off the top of your head as to where we might have gone wrong? The image build itself reports as successful, so there's not much to go on. We only know something is wrong because a Gen 1 image will not succeed in provisioning a VM.

Thanks,
Alfred

Comment 45 Vitaly Kuznetsov 2019-12-05 11:13:40 UTC
(In reply to Alfred Sin from comment #44)
> Hi Vitaly,
> 
> We created a RHEL 8.1 dual-boot image but we were unable to boot a VM
> created from a Gen 1 image based off that VHD. The VM showed up in Azure but
> never provisioned. Sri on our team has more details so I'll tag him as
> needinfo. We used the kickstart that you provided, with a couple
> modifications:
>     - we used LVM partitioning
>     - we run the steps in the %post section in a separate part of our image
> build, so our ks ends at the end of the %packages section

Hi Alfred,

I can't think of what could be wrong in this separate part; please double-check that
the grub installation and grub BIOS config creation steps are performed.

I decided to check whether the kickstart works with LVM; the result can be downloaded from here:
http://people.redhat.com/~vkuznets/RHEL8-Azure-LVM.ks

The diff is simple:
# diff -u RHEL8-Azure.ks RHEL8-Azure-LVM.ks 
--- RHEL8-Azure.ks	2019-12-05 05:56:02.454662187 -0500
+++ RHEL8-Azure-LVM.ks	2019-12-05 05:56:33.742842874 -0500
@@ -51,7 +51,9 @@
 # part biosboot --onpart=sda14 --size=4
 part /boot/efi --onpart=sda15 --fstype=vfat
 part /boot --fstype="xfs" --size=500
-part / --fstype="xfs" --size=1 --grow --asprimary
+part pv.01 --fstype=lvmpv --size=1 --grow
+volgroup rootvg pv.01
+logvol / --vgname=rootvg --fstype=xfs --size=1 --grow --name=rootlv
 
 %pre --log=/var/log/anaconda/pre-install.log --erroronfail
 #!/bin/bash

The resulting image seems to boot well with both BIOS and UEFI.

Comment 46 Alfred Sin 2019-12-12 22:29:14 UTC
I don't think I ever got back to resolving this - we did manage to get a RHEL 8 image that boots as Gen 1 and Gen 2, based off the kickstart file that Vitaly and Steve provided. Thanks all for the help.

Comment 47 Rick Barry 2019-12-13 16:12:21 UTC
(In reply to Alfred Sin from comment #46)
> I don't think I ever got back to resolving this - we did manage to get a
> RHEL 8 image that boots as Gen 1 and Gen 2, based off the kickstart file
> that Vitaly and Steve provided. Thanks all for the help.

Thanks for the update, Alfred.

Are you OK with us closing this BZ?

Comment 48 Alfred Sin 2019-12-16 19:20:21 UTC
Yes, let's close it. Thanks for checking, Rick.

Comment 49 Rick Barry 2019-12-16 19:25:36 UTC
Thanks, Alfred. I'm setting this to CLOSED/NOTABUG since there were no
issues in the assigned component and no fixes needed on the RHEL side.

If you encounter any problems in the future, feel free to enter a new bug.

Comment 50 Sriharsha-MSFT 2020-06-20 05:15:26 UTC
Michael Kelley performed some experiments, and this is the issue in his own words:
"
I have a theory, and I’ve run the experiments to prove that it is right. 😊
 
The problem originates with trying to make a single image work for both Generation 1 and Generation 2 VMs.  I see the problem with both RHEL 8.1+ and CentOS 8.1+ images when running in a Generation 1 VM.
 
As the images are shipped, this “code” is near the top of the /boot/grub2/grub.cfg file:
 
if [ -f (hd0,gpt15)/efi/redhat/grubenv ]; then
  load_env -f (hd0,gpt15)/efi/redhat/grubenv
elif [ -s $prefix/grubenv ]; then
  load_env
fi
 
But when grub2-mkconfig is run, the script in /etc/grub.d/00_header unconditionally produces the following for the newly generated grub.cfg:
 
if [ -f ${config_directory}/grubenv ]; then
  load_env -f ${config_directory}/grubenv
elif [ -s $prefix/grubenv ]; then
  load_env
fi
 
The newly generated version interacts badly with /boot/grub2/grubenv being a symlink to ../efi/EFI/redhat/grubenv. The old version explicitly tests for the existence of the path /efi/redhat/grubenv on the EFI System Partition [which is (hd0,gpt15)], and reads the grubenv from there, which works. But the new version tries to follow the symlink, which I think is not working and is what causes grub to hang.
 
If I replace the /boot/grub2/grubenv symlink with an actual copy of the file, then the newly generated grub.cfg file works with no problems.
"

Comment 51 Vitaly Kuznetsov 2020-06-22 11:39:33 UTC
This looks correct; if the grub config is regenerated, the result may not be exactly the same. My understanding,
however, is that users are not supposed to re-generate the grub config with 'grub2-mkconfig' under normal
conditions. Any information on why this is necessary?

Comment 52 Sriharsha-MSFT 2020-06-22 11:48:34 UTC
This is the case wherein customers might need to go back to an older installed kernel. Also, when installing a kernel from the repository, I assume that grub2-mkconfig is run by default during kernel installation.
The root cause of this issue is that grubenv is present in /boot/efi/EFI/redhat/grubenv and there is a symlink in /boot/grub2/grubenv. I'm not sure if grub understands symlinks to another disk or partition.
The resolution of the bug is to create a copy rather than a symlink. Can this fix be applied in upcoming versions of grub2-pc, to create a copy rather than a symlink?

Comment 53 Vitaly Kuznetsov 2020-06-22 13:34:21 UTC
(In reply to Sriharsha-MSFT from comment #52)
> This is the case wherein customers might need to go to the older installed
> Kernel.

Hm, the usual way to handle this is "grub2-set-default"/"grub2-reboot"
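
e.g. (a sketch; "1" stands for whichever menu index is wanted):

# grub2-set-default 1    # make menu entry 1 the persistent default
# grub2-reboot 1         # boot entry 1 once, on the next reboot only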


> Also, when installing Kernel from the repository, I assume that
> during Kernel installation, grub2-mkconfig will be run by default. 

I'm not sure; I think RHEL uses grubby to modify the grub2 config, so it is not
regenerated.

> The root cause of this issue is that grubenv is present in
> /boot/efi/EFI/redhat/grubenv and there is a symlink in /boot/grub2/grubenv.
> I'm not sure if Grub understands symlinks to another disk or partition. 
> The resolution of the bug is to create a copy rather than a symlink. Can this
> fix be applied in upcoming versions of grub2-pc, to create a copy rather
> than a symlink?

I see, maybe Peter can answer.

Comment 54 Michael Kelley 2020-06-22 13:57:22 UTC
(In reply to Vitaly Kuznetsov from comment #51)
> This looks correct; if the grub config is regenerated, the result may not be
> exactly the same. My understanding,
> however, is that users are not supposed to re-generate the grub config with
> 'grub2-mkconfig' under normal
> conditions. Any information on why this is necessary?

FWIW, we do expect customers to be able to run grub2-mkconfig and have it work. We set up the grub parameters for fast boot in a production environment, but in a dev/test environment the customer may want a longer grub timeout or a different menu style. We also want to allow customers to add kernel boot line parameters, again for various special configurations (set maxcpus=<n>, for example). And even if we didn't expect customers to run grub2-mkconfig, there's not much we could do to prevent it, short of removing the file. So from our standpoint, it is important that grub2-mkconfig works, and it must leave the VM in the same state as before, modulo any changes the customer has made to /etc/default/grub.
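
(For example, the dev/test flow we want to keep working is roughly the following; a sketch, the values are placeholders:)

# Lengthen the menu timeout and add a kernel parameter, then regenerate.
sed -i 's/^GRUB_TIMEOUT=.*/GRUB_TIMEOUT=30/' /etc/default/grub
sed -i 's/^GRUB_CMDLINE_LINUX="\(.*\)"/GRUB_CMDLINE_LINUX="\1 maxcpus=4"/' /etc/default/grub
grub2-mkconfig -o /boot/grub2/grub.cfg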

Comment 55 Rick Barry 2020-06-22 16:19:02 UTC
Adding needinfo for Peter.

Peter, can you comment on the use case being discussed in comments 50 - 54?

Comment 56 Sriharsha-MSFT 2020-06-22 19:46:40 UTC
I meant to say there is a symlink at /boot/grub2/grubenv which points to /boot/efi/EFI/redhat/grubenv.

So the config file generated by grub2-mkconfig for the grub2-pc bootloader tries to find the grubenv file but finds a symlink, and this is causing the boot failure.

My theory is that $config_directory is affected by this symlink and is unable to read the grubenv.

I'm more interested to understand the reasoning behind this symlink creation.

If this is a bug, can we release a fix ASAP, since this is affecting all the RHEL 8.x images on Azure?

Comment 57 Vitaly Kuznetsov 2020-06-23 09:02:23 UTC
[I think it makes sense to open a new BZ against grub2 in RHEL8 as this one
is for RHEL7 and already closed.]

Starting from RHEL 8.2 we've made the RHEL-guest-image (QCOW2) hybrid-bootable (BIOS/UEFI).
The difference from what (AFAIR) was discussed for Azure is that we install the guest
in BIOS mode and add UEFI boot later (for Azure we were doing things the other way around).
The following changes were made to the kickstart:

diff --git a/rhel8/rhel-8.2-kvm-x86_64.ks b/rhel8/rhel-8.2-kvm-x86_64.ks
index f111525297d8..4ac11f551ebd 100644
--- a/rhel8/rhel-8.2-kvm-x86_64.ks
+++ b/rhel8/rhel-8.2-kvm-x86_64.ks
@@ -19,12 +19,15 @@ rootpw --iscrypted nope
 # This information is used by appliance-tools but
 # not by the livecd tools.
 #
-zerombr
-clearpart --all --initlabel
-# autopart --type=plain --nohome # --nohome doesn't work because of rhbz#1509350
-# autopart is problematic in that it creates /boot and swap partitions rhbz#1542510 rhbz#1673094
-reqpart
-part / --fstype="xfs" --ondisk=vda --size=8000
+%pre --erroronfail
+/usr/bin/dd bs=512 count=10 if=/dev/zero of=/dev/vda
+/usr/sbin/parted -s /dev/vda mklabel gpt
+/usr/sbin/parted -s /dev/vda print
+%end
+
+part biosboot  --size=1   --fstype=biosboot
+part /boot/efi --size=100 --fstype=efi
+part /         --size=7899 --fstype=xfs --label=root --grow
 reboot
 
 # Packages
@@ -35,6 +38,9 @@ kernel
 yum
 nfs-utils
 dnf-utils
+grub2-pc
+grub2-efi-x64
+shim
 
 # pull firmware packages out
 -aic94xx-firmware
@@ -143,6 +149,16 @@ insights-client
 passwd -d root
 passwd -l root
 
+# setup uefi boot
+/usr/sbin/grub2-mkconfig -o /etc/grub2-efi.cfg
+/usr/sbin/parted -s /dev/vda disk_set pmbr_boot off
+
+# setup bios boot
+cat <<'EOF' > /etc/grub2.cfg
+search --no-floppy --set efi --file /efi/redhat/grub.cfg
+configfile ($efi)/efi/redhat/grub.cfg
+EOF
+
 # setup systemd to boot to the right runlevel
 echo -n "Setting default runlevel to multiuser text mode"
 rm -f /etc/systemd/system/default.target

In this setup we're not hacking grub2 config so I *think* it should
survive re-generating '/etc/grub2-efi.cfg' (/etc/grub2.cfg is a simple
stub including grub2-efi.cfg).

Could you check if this would work for Azure images?

Comment 58 Sriharsha-MSFT 2020-06-23 17:34:24 UTC
Please look at the following Bugzilla, as it was created for RHEL 8.x (thanks, Vitaly, for pointing this out). I have posted further questions there based on the newer updates:
https://bugzilla.redhat.com/show_bug.cgi?id=1850193

Comment 59 Chandan Chouhan 2021-12-23 11:50:41 UTC
Hello Team, 

We have a case from the customer MS, where they reported that they are not able to make an older kernel the default after installing a new kernel.

The steps they followed are :

1. Updated the server; a new kernel was installed as part of the update.

2. As expected, the server came up with the new kernel after reboot.

3. Now the issue is that when they run the below command to make the older one the default:
# grub2-set-default 1

or

follow the other steps mentioned in the KCS below:
https://access.redhat.com/solutions/3089

4. They are unable to set the older one as the default. The server always comes up with the newer kernel after reboot.

It looks like the issue is because of the symlink below. If the symlink is removed, it works as expected:
/boot/grub2/grubenv -> ../efi/EFI/redhat/grubenv

It looks like the customer reported a similar issue in comments 50 and 52.

As per the customer, they are facing this issue in both RHEL 7 and RHEL 8.

Let us know if you need any reproducer steps/information from the customer.

Comment 60 Vitaly Kuznetsov 2021-12-23 13:25:08 UTC
(In reply to Chandan Chouhan from comment #59)
> Hello Team, 
> 
> We have a case from the customer MS, where they reported that they are not
> able to make an older kernel the default after installing a new kernel.
> 
> The steps they followed are :
> 
> 1. Updated the server; a new kernel was installed as part of the update.
> 
> 2. As expected, the server came up with the new kernel after reboot.
> 
> 3. Now the issue is that when they run the below command to make the older
> one the default:
> # grub2-set-default 1
> 
> or
> 
> follow the other steps mentioned in the KCS below:
> https://access.redhat.com/solutions/3089
> 
> 4. They are unable to set the older one as the default. The server always
> comes up with the newer kernel after reboot.
> 
> It looks like the issue is because of the symlink below. If the symlink is
> removed, it works as expected:
> /boot/grub2/grubenv -> ../efi/EFI/redhat/grubenv
> 
> It looks like the customer reported a similar issue in comments 50 and 52.
> 
> As per the customer, they are facing this issue in both RHEL 7 and RHEL 8.

Please open a new BZ against e.g. the RHEL8 'grub2' package, as this one was already used for
all sorts of things like kickstart files and is already CLOSED.

Comment 61 Chandan Chouhan 2021-12-28 08:59:50 UTC
Hello Vitaly,

Raised a new Bugzilla for RHEL 7:

Unable to make older kernel as default in rhel-7.x azure images.
https://bugzilla.redhat.com/show_bug.cgi?id=2035737

Comment 62 Yuxin Sun 2021-12-29 01:23:44 UTC
(In reply to Chandan Chouhan from comment #61)
> Hello Vitaly,
> 
> Raised a new Bugzilla for RHEL 7:
> 
> Unable to make older kernel as default in rhel-7.x azure images.
> https://bugzilla.redhat.com/show_bug.cgi?id=2035737

Hi Chandan,

I'm the QE for RHEL guests on Azure. Could you please grant me access to BZ#2035737? I cannot access this BZ now. Thanks!

Comment 63 Chandan Chouhan 2021-12-29 02:26:52 UTC
Hello Yuxin,

Kindly check now. I added you there.