Description of problem:

Hello all, our RHEV 4.0 infrastructure is composed of:
1x RHEV-M hosted engine VM
2x RHEV-H hypervisors

We are facing a GRUB issue. Consider two scenarios:

1) In the first scenario we just installed both hypervisors with default options. Running grub2-mkconfig -o /boot/grub2/grub.cfg correctly updates grub.cfg with the current kernel references. At this point we have the 3.10.0-327.28.2.el7.x86_64 kernel, and both the vmlinuz and initramfs files are present under /boot.

2) In the second scenario we updated both hypervisors and are now running 3.10.0-327.36.1.el7.x86_64. This time, running grub2-mkconfig -o /boot/grub2/grub.cfg overwrites grub.cfg and omits the new kernel references, because its files are now under the /boot/rhvh-4.0-0.20161012.0+1 directory:

# ls -l /boot
total 80344
-rw-r--r--. 1 root root   126431 Jun 27 20:52 config-3.10.0-327.28.2.el7.x86_64
drwxr-xr-x. 3 root root     4096 Aug 17 22:07 efi
-rw-r--r--. 1 root root   178176 Sep  5  2014 elf-memtest86+-4.20
drwxr-xr-x. 2 root root     4096 Aug 17 22:11 extlinux
drwx------. 6 root root     4096 Nov 11 13:26 grub2
-rw-r--r--. 1 root root 47977751 Sep 26 13:06 initramfs-3.10.0-327.28.2.el7.x86_64.img
-rw-r--r--. 1 root root 24429118 Nov 11 13:10 initramfs-3.10.0-327.36.1.el7.x86_64kdump.img
-rw-r--r--. 1 root root   603547 Aug 17 22:20 initrd-plymouth.img
drwx------. 2 root root    16384 Sep 26 13:02 lost+found
-rw-r--r--. 1 root root   176500 Sep  5  2014 memtest86+-4.20
drwxr-xr-x. 2 root root     4096 Sep 26 13:07 rhvh-4.0-0.20160817.0+1
drwxr-xr-x. 2 root root     4096 Oct 19 17:49 rhvh-4.0-0.20160919.0+1
drwxr-xr-x. 2 root root     4096 Nov  3 14:33 rhvh-4.0-0.20161012.0+1
-rw-r--r--. 1 root root   252632 Jun 27 20:54 symvers-3.10.0-327.28.2.el7.x86_64.gz
-rw-------. 1 root root  2964948 Jun 27 20:52 System.map-3.10.0-327.28.2.el7.x86_64
-rw-r--r--. 1 root root   326628 Nov 11  2014 tboot.gz
-rw-r--r--. 1 root root    12620 Nov 11  2014 tboot-syms
-rwxr-xr-x. 1 root root  5157728 Jun 27 20:52 vmlinuz-3.10.0-327.28.2.el7.x86_64

# ls -l /boot/rhvh-4.0-0.20161012.0+1
total 55044
-rw-r--r--. 1 root root   126431 Nov  3 14:32 config-3.10.0-327.36.1.el7.x86_64
-rw-r--r--. 1 root root 47860173 Nov  3 14:33 initramfs-3.10.0-327.36.1.el7.x86_64.img
-rw-r--r--. 1 root root   252739 Nov  3 14:32 symvers-3.10.0-327.36.1.el7.x86_64.gz
-rw-------. 1 root root  2965270 Nov  3 14:32 System.map-3.10.0-327.36.1.el7.x86_64
-rwxr-xr-x. 1 root root  5155840 Nov  3 14:32 vmlinuz-3.10.0-327.36.1.el7.x86_64

# grub2-mkconfig | grep 3.10.0
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-3.10.0-327.28.2.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-327.28.2.el7.x86_64.img
menuentry 'Red Hat Enterprise Linux (3.10.0-327.28.2.el7.x86_64) 7.2' --class red --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-3.10.0-327.28.2.el7.x86_64-advanced-/dev/mapper/rhvh-rhvh--4.0--0.20161012.0+1' {
	linux16 /vmlinuz-3.10.0-327.28.2.el7.x86_64 root=/dev/mapper/rhvh-rhvh--4.0--0.20161012.0+1 ro rd.lvm.lv=rhvh/rhvh-4.0-0.20161012.0+1 rd.lvm.lv=rhvh/swap rhgb hpsa.hpsa_allow_any=1 quiet
	initrd16 /initramfs-3.10.0-327.28.2.el7.x86_64.img
Found linux image: /boot/vmlinuz-3.10.0-327.28.2.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-327.28.2.el7.x86_64.img
menuentry 'Red Hat Enterprise Linux GNU/Linux, with tboot 1.8.1 and Linux 3.10.0-327.28.2.el7.x86_64' --class red --class gnu-linux --class gnu --class os --class tboot {
	echo 'Loading Linux 3.10.0-327.28.2.el7.x86_64 ...'
	module /vmlinuz-3.10.0-327.28.2.el7.x86_64 /vmlinuz-3.10.0-327.28.2.el7.x86_64 root=/dev/mapper/rhvh-rhvh--4.0--0.20161012.0+1 ro rd.lvm.lv=rhvh/rhvh-4.0-0.20161012.0+1 rd.lvm.lv=rhvh/swap rhgb hpsa.hpsa_allow_any=1 quiet intel_iommu=on
	module /initramfs-3.10.0-327.28.2.el7.x86_64.img /initramfs-3.10.0-327.28.2.el7.x86_64.img
done

I also noticed that the last update renamed the root logical volume:

# df -h /
Filesystem                                 Size  Used Avail Use% Mounted on
/dev/mapper/rhvh-rhvh--4.0--0.20161012.0+1 3.9G  1.8G  2.1G  47% /

# lvs | grep 2016
rhvh-4.0-0.20160817.0   rhvh Vwi---tz-k 14.78g pool00 root
rhvh-4.0-0.20160817.0+1 rhvh Vwi---tz-- 14.78g pool00 rhvh-4.0-0.20160817.0
rhvh-4.0-0.20160919.0   rhvh Vri---tz-k  3.81g pool00
rhvh-4.0-0.20160919.0+1 rhvh Vwi---tz--  3.81g pool00 rhvh-4.0-0.20160919.0
rhvh-4.0-0.20161012.0   rhvh Vri---tz-k  3.81g pool00
rhvh-4.0-0.20161012.0+1 rhvh Vwi-aotz--  3.81g pool00 rhvh-4.0-0.20161012.0 47.11

Version-Release number of selected component (if applicable):

How reproducible:
100%

Steps to Reproduce:
1. Install RHEV-H 4.0
2. Update the hypervisor
3. Run grub2-mkconfig

Actual results:
The last updated kernel is not present at boot time.

Expected results:
The last updated kernel is present at boot time.

Additional info:
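The failure mode can be sketched in a few lines of shell. This is a scratch-directory demo with empty stand-in files (not a real /boot): the stock 10_linux helper behind grub2-mkconfig globs /boot/vmlinuz-* non-recursively, so a kernel living in a versioned rhvh-* subdirectory is never found.

```shell
# Scratch-dir demo: paths mimic the layout shown above; files are empty stand-ins.
BOOT=./boot-demo                      # stand-in for /boot
mkdir -p "$BOOT/rhvh-4.0-0.20161012.0+1"
touch "$BOOT/vmlinuz-3.10.0-327.28.2.el7.x86_64"                          # old kernel, top level
touch "$BOOT/rhvh-4.0-0.20161012.0+1/vmlinuz-3.10.0-327.36.1.el7.x86_64"  # new kernel, one level down

# Emulate 10_linux's scan: a non-recursive glob over /boot only.
# Only the old 327.28.2 kernel is listed; the new one is invisible.
ls "$BOOT"/vmlinuz-*
```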
My first question is: what are you trying to achieve? It looks like you are trying to install a custom kernel or a kernel update, is that correct? Please note that RHVH is only intended to receive image updates, not individual package updates (like kernel packages).
I wanted to make the "hpsa.hpsa_allow_any=1" directive permanent instead of stopping the boot and editing the linux16 line during the GRUB phase. So I added that parameter to /etc/default/grub and ran grub2-mkconfig -o /boot/grub2/grub.cfg
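For reference, the usual way to persist a kernel parameter looks like the sketch below. It is demonstrated on a scratch copy of the file with illustrative contents; on a real host the file is /etc/default/grub and the final step is grub2-mkconfig -o /boot/grub2/grub.cfg, which is exactly the step that triggers this bug on RHVH.

```shell
# Scratch copy standing in for /etc/default/grub; contents are illustrative.
cfg=./grub.default
printf '%s\n' 'GRUB_TIMEOUT=5' 'GRUB_CMDLINE_LINUX="rd.lvm.lv=rhvh/swap rhgb quiet"' > "$cfg"

# Append the parameter to the default kernel command line.
sed -i 's/^\(GRUB_CMDLINE_LINUX=".*\)"$/\1 hpsa.hpsa_allow_any=1"/' "$cfg"

grep '^GRUB_CMDLINE_LINUX' "$cfg"
# GRUB_CMDLINE_LINUX="rd.lvm.lv=rhvh/swap rhgb quiet hpsa.hpsa_allow_any=1"
```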
Thanks - that's something we should fix. For now I'd recommend manually editing grub.cfg until we have fixed the issue.
I've just hit this exact same bug (Internal Revenue Service in Portugal) on an upgrade of RHVH initiated from the UI. This is rather critical, since any host upgraded via the UI exhibits this behaviour. As a workaround, we've moved the new vmlinuz image to /boot.
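The workaround described above can be sketched as follows. This is a scratch-directory demo with placeholder files; on a real host BOOT would be /boot, LAYER the current versioned rhvh-* directory, and you would re-run grub2-mkconfig -o /boot/grub2/grub.cfg afterwards.

```shell
# Scratch-dir stand-ins; directory and file names copied from the report above.
BOOT=./boot-wa
LAYER=rhvh-4.0-0.20161012.0+1
mkdir -p "$BOOT/$LAYER"
touch "$BOOT/$LAYER/vmlinuz-3.10.0-327.36.1.el7.x86_64" \
      "$BOOT/$LAYER/initramfs-3.10.0-327.36.1.el7.x86_64.img"

# The workaround: copy the new kernel and initramfs up one level so the
# stock 10_linux grub script can find them with its non-recursive glob.
cp -a "$BOOT/$LAYER"/vmlinuz-* "$BOOT/$LAYER"/initramfs-*.img "$BOOT"/
ls "$BOOT"
```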
Created attachment 1247661 [details] comment 9: grub files
Looks like this requires a follow-up patch. Nice that grub2-mkconfig works. Not nice that the title is wrong (though it appears that it'll boot into RHVH correctly). Can you please try rebooting after this (either twice, or once and then checking grub.cfg)? I'm a little worried that we'll remove all those boot entries and leave the system unbootable, in which case this will need a new build (either to revert, which will also break virt-v2v again, or with a path to /etc/grub2.conf.d/)
(In reply to Ryan Barry from comment #11)
> Can you please try rebooting after this (either twice, or once and checking
> grub.cfg)? I'm a little worried that we'll remove all those boot entries and
> leave the system unbootable, in which case this will need a new build
> (either to revert, which will also break virt-v2v again, or with path to
> /etc/grub2.conf.d/)

After rebooting several times (once, twice, or three times), the boot entries change again. The menu shows:
----------------------------
tboot 1.8.1
----------------------------
Entering it (tboot 1.8.1), the submenu of boot entries shows:
---------------------------
Red Hat Enterprise Linux GNU/Linux, with tboot 1.8.1 and Linux 3.10.0-514.6.1.el7.x86_64
Red Hat Enterprise Linux GNU/Linux, with tboot 1.8.1 and Linux 3.10.0-327.36.1.el7.x86_64
---------------------------
Selecting the first boot entry boots successfully into the new build (RHVH-4.1-20170202.0).
Selecting the second boot entry drops into the emergency mode of the new build (RHVH-4.1-20170202.0).
So the old build (RHVH-4.0-20160919.0) can no longer be booted after rebooting twice.
Please refer to the attachment for the detailed grub.cfg info.
Created attachment 1247931 [details] Comment 12: grub files after reboot several times
Sandro - The results of this were reported late, but we need to block/respin on this. Either revert the patch which caused this (which will also break virt-v2v for the beta) while a grub2 script is written for GA, or write/patch before beta. I'd guess the patch will be quick (~2 hours to write/test), but it's another thing to verify very late. Your call.
Let's fix this ASAP and go async if needed
(In reply to Ryan Barry from comment #11) > I'm a little worried that we'll remove all those boot entries and > leave the system unbootable, in which case this will need a new build > (either to revert, which will also break virt-v2v again, or with path to > /etc/grub2.conf.d/) Can you please detail how this affects virt-v2v?
See: https://bugzilla.redhat.com/show_bug.cgi?id=1392904

The fix for both bugs is to put the running kernel in /boot, which also causes the problem from comment 10 and comment 12. Reverting the patch (so the kernel and initrd are no longer put in /boot) will break v2v again, since /boot/kernel-... will no longer be correct.
Still encountering this issue as in comment 9 and comment 12 on redhat-virtualization-host-4.1-20170208.0.

From redhat-virtualization-host-4.0-20170201.0
To redhat-virtualization-host-4.1-20170208.0

Small difference:
1. After the first boot, the boot entries show:
---------------
Red Hat Enterprise Linux (3.10.0-514.6.1.el7.x86_64) 7.3
Red Hat Enterprise Linux (3.10.0-514.2.2.el7.x86_64) 7.3
tboot 1.9.4
---------------
2. After the second boot, the boot entries show:
---------------
tboot 1.9.4
---------------
Entering it, the sub boot entries show:
---------------
Red Hat Enterprise Linux GNU/Linux, with tboot 1.9.4 and Linux 3.10.0-514.6.1.el7.x86_64
Red Hat Enterprise Linux GNU/Linux, with tboot 1.9.4 and Linux 3.10.0-514.2.2.el7.x86_64
---------------
Test version:
From: redhat-virtualization-host-4.0-20160919.0
To: redhat-virtualization-host-4.1-20170222.0
imgbased-0.9.13-0.1.el7ev.noarch

Test steps:
Tested according to comment 9.

Test results:
1. The checks in step 3 are all correct.
2. But in step 4, after running "# grub2-mkconfig -o /boot/grub2/grub.cfg" on the host:
2.1 After the first boot, the boot entries show:
--------------------------------
rhvh-4.1-0.20170223.0+1
rhvh-4.0-0.20160919.0+1
Red Hat Enterprise Linux (3.10.0-514.6.1.el7.x86_64) 7.3
Red Hat Enterprise Linux (3.10.0-327.36.1.el7.x86_64) 7.3
tboot 1.9.4
Red Hat Enterprise Linux Release 7.2 (on /dev/mapper/rhvh_dhcp--10--16-rhvh--4.0--0.20160919.0+1)
Advanced options for Red Hat Enterprise Linux release 7.2 (on /dev/mapper/rhvh_dhcp--10--16-rhvh--4.0--0.20160919.0+1)
Red Hat Enterprise Linux Release 7.2 (on /dev/mapper/rhvh_dhcp--10--16-root)
Advanced options for Red Hat Enterprise Linux release 7.2 (on /dev/mapper/rhvh_dhcp--10--16-root)
--------------------------------
Is this abnormal?
2.2 After the second (or subsequent) boots, the boot entries show:
--------------------------------
rhvh-4.1-0.20170223.0+1
rhvh-4.0-0.20160919.0+1
tboot 1.9.4
--------------------------------
This is correct.

So the boot entries on the first boot are not the expected results, right? Can I verify this bug?
This is expected. We could disable 10_linux entirely, but that may have bad side effects when installing... As it is, the new script relies on imgbased-clean-grub to pick out the right changes as normal, to avoid the risk of leaving the system unbootable (for the same reason the RHEL boot entries are only removed after the first boot).
Thanks Ryan. According to comment 23 and comment 24, changing the status to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:1114