Bug 1805589

Summary: grub2-mkconfig produces incorrect config if host installed over iSCSI
Product: Red Hat Enterprise Virtualization Manager
Reporter: Germano Veit Michel <gveitmic>
Component: redhat-virtualization-host
Assignee: Nir Levy <nlevy>
Status: CLOSED ERRATA
QA Contact: cshao <cshao>
Severity: medium
Docs Contact:
Priority: high
Version: 4.3.8
CC: cshao, dfediuck, jeharris, lsvaty, mavital, mtessun, nlevy, peyu, qiyuan, sbonazzo, shlei, weiwang, yaniwang, yturgema
Target Milestone: ovirt-4.4.2
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-10-05 13:09:40 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Node
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
  grub.cfg at step 1 (flags: none)
  grub.cfg at step 3 (flags: none)

Description Germano Veit Michel 2020-02-21 05:04:49 UTC
Description of problem:

grub2-mkconfig sometimes generates an incorrect config on hosts installed over iSCSI (local disk for /boot only).
The output always contains a duplicate rd.lvm.lv=rhvh_rhvh/rhvh-x.y.z argument, and a seemingly random other LV is missing from the kernel command line.
I've reproduced it twice, with 'swap' and 'var_log_audit' missing; for the customer it was 'var'.
This can make the host fail to reboot.

1. Fresh install

# nodectl info
layers: 
  rhvh-4.3.8.1-0.20200126.0: 
    rhvh-4.3.8.1-0.20200126.0+1
bootloader: 
  default: rhvh-4.3.8.1-0.20200126.0 (3.10.0-1062.12.1.el7.x86_64)
  entries: 
    rhvh-4.3.8.1-0.20200126.0 (3.10.0-1062.12.1.el7.x86_64): 
      index: 0
      title: rhvh-4.3.8.1-0.20200126.0 (3.10.0-1062.12.1.el7.x86_64)
      kernel: /boot/rhvh-4.3.8.1-0.20200126.0+1/vmlinuz-3.10.0-1062.12.1.el7.x86_64
      args: "ro spectre_v2=retpoline rd.lvm.lv=rhvh_rhvh/home netroot=iscsi:@192.168.150.70::3260:iface0:eth0::iqn.2003-01.org.linux-iscsi.storage.x8664:sn.f39f33014c8b rd.iscsi.initiator=iqn.1994-05.com.redhat:248c71871e12 rd.lvm.lv=rhvh_rhvh/swap rd.lvm.lv=rhvh_rhvh/tmp rd.lvm.lv=rhvh_rhvh/rhvh-4.3.8.1-0.20200126.0+1 rd.lvm.lv=rhvh_rhvh/var rd.lvm.lv=rhvh_rhvh/var_log rd.lvm.lv=rhvh_rhvh/var_log_audit rhgb quiet ip=eth0:dhcp LANG=en_AU.UTF-8 img.bootid=rhvh-4.3.8.1-0.20200126.0+1"
      initrd: /boot/rhvh-4.3.8.1-0.20200126.0+1/initramfs-3.10.0-1062.12.1.el7.x86_64.img
      root: /dev/rhvh_rhvh/rhvh-4.3.8.1-0.20200126.0+1
current_layer: rhvh-4.3.8.1-0.20200126.0+1

2. Check the current kernel command line "rd.lvm.lv" arguments:

# egrep -o "rd.lvm.lv=[.0-9a-z_/+-]+" /boot/grub2/grub.cfg
rd.lvm.lv=rhvh_rhvh/home
rd.lvm.lv=rhvh_rhvh/swap
rd.lvm.lv=rhvh_rhvh/tmp
rd.lvm.lv=rhvh_rhvh/rhvh-4.3.8.1-0.20200126.0+1
rd.lvm.lv=rhvh_rhvh/var
rd.lvm.lv=rhvh_rhvh/var_log
rd.lvm.lv=rhvh_rhvh/var_log_audit

3. Generate a new config; note the duplicate, and swap is now missing:

# grub2-mkconfig | egrep -o "rd.lvm.lv=[.0-9a-z_/+-]+"
Generating grub configuration file ...
Found linux image: /boot/rhvh-4.3.8.1-0.20200126.0+1//vmlinuz-3.10.0-1062.12.1.el7.x86_64
Found initrd image: /boot/rhvh-4.3.8.1-0.20200126.0+1/initramfs-3.10.0-1062.12.1.el7.x86_64.img
rd.lvm.lv=rhvh_rhvh/home
rd.lvm.lv=rhvh_rhvh/rhvh-4.3.8.1-0.20200126.0+1
rd.lvm.lv=rhvh_rhvh/tmp
rd.lvm.lv=rhvh_rhvh/var_log
rd.lvm.lv=rhvh_rhvh/var_log_audit
rd.lvm.lv=rhvh_rhvh/var
rd.lvm.lv=rhvh_rhvh/rhvh-4.3.8.1-0.20200126.0+1    <--- duplicate. Where is swap?
done
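
The duplicate shown in step 3 can be detected mechanically. The following is a minimal sketch, not part of the original report: the function name dup_lvs is hypothetical, and the pattern is the one used in the egrep commands above. It checks each line separately, so identical arguments repeated across different menu entries are not flagged.

```shell
# Sketch only: dup_lvs is a hypothetical helper name.
# Reads grub config text on stdin and prints every rd.lvm.lv= argument
# that occurs more than once on the same line.
dup_lvs() {
    while IFS= read -r line || [ -n "$line" ]; do
        # Extract the LV arguments of this line, then keep only repeated ones.
        printf '%s\n' "$line" \
            | grep -Eo 'rd\.lvm\.lv=[.0-9a-z_/+-]+' \
            | sort | uniq -d
    done | sort -u
}
```

For example, `grub2-mkconfig 2>/dev/null | dup_lvs` prints nothing when no kernel line repeats an LV argument.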

Version-Release number of selected component (if applicable):
rhvh-4.3.8.1-0.20200126.0

How reproducible:
2 out of 3 times

Steps to Reproduce:
1. Install host using 2 disks:
   local disk -> /boot
   iscsi disk -> rhvh VG
2. Run grub2-mkconfig
3. The generated grub configuration is missing some LVs; the host fails to reboot
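
Before a regenerated config replaces /boot/grub2/grub.cfg, its rd.lvm.lv arguments can be compared with the current ones. The following is a minimal sketch under stated assumptions: the helper names lv_args and missing_lvs are hypothetical, and the check treats the existing grub.cfg as the known-good reference, which holds for the fresh install in this report but may not hold elsewhere.

```shell
# Sketch only: lv_args and missing_lvs are hypothetical helper names.

# Print the unique rd.lvm.lv= arguments found in the file given as $1.
lv_args() {
    grep -Eo 'rd\.lvm\.lv=[.0-9a-z_/+-]+' "$1" | sort -u
}

# Print every rd.lvm.lv= argument present in the current config ($1)
# but absent from the newly generated candidate ($2).
missing_lvs() {
    new_list=$(lv_args "$2")
    lv_args "$1" | while IFS= read -r arg; do
        # -F fixed string, -x whole line, -q quiet; -e guards odd leading characters.
        printf '%s\n' "$new_list" | grep -Fxq -e "$arg" || printf '%s\n' "$arg"
    done
}
```

A possible use, with /tmp/grub.cfg.new as an arbitrary scratch path: run `grub2-mkconfig -o /tmp/grub.cfg.new`, then `missing_lvs /boot/grub2/grub.cfg /tmp/grub.cfg.new`, and only install the new file if nothing is printed.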

Additional info:
# mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime,seclabel)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
devtmpfs on /dev type devtmpfs (rw,nosuid,seclabel,size=988184k,nr_inodes=247046,mode=755)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,seclabel)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,seclabel,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,seclabel,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,seclabel,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,cpuacct,cpu)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,net_prio,net_cls)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,freezer)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,cpuset)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,perf_event)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,memory)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,devices)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,pids)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,hugetlb)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,blkio)
configfs on /sys/kernel/config type configfs (rw,relatime)
/dev/mapper/rhvh_rhvh-rhvh--4.3.8.1--0.20200126.0+1 on / type ext4 (rw,relatime,seclabel,discard,stripe=1024,data=ordered)
/dev/mapper/rhvh_rhvh-var on /var type ext4 (rw,relatime,seclabel,discard,stripe=1024,data=ordered)
/dev/mapper/rhvh_rhvh-tmp on /tmp type ext4 (rw,relatime,seclabel,discard,stripe=1024,data=ordered)
/dev/mapper/rhvh_rhvh-home on /home type ext4 (rw,relatime,seclabel,discard,stripe=1024,data=ordered)
/dev/mapper/rhvh_rhvh-var_log on /var/log type ext4 (rw,relatime,seclabel,discard,stripe=1024,data=ordered)
/dev/mapper/rhvh_rhvh-var_log_audit on /var/log/audit type ext4 (rw,relatime,seclabel,discard,stripe=1024,data=ordered)
rpc_pipefs on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw,relatime)
selinuxfs on /sys/fs/selinux type selinuxfs (rw,relatime)
mqueue on /dev/mqueue type mqueue (rw,relatime,seclabel)
hugetlbfs on /dev/hugepages1G type hugetlbfs (rw,relatime,seclabel,pagesize=1G)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=33,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=16022)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,seclabel)
/dev/vda1 on /boot type ext4 (rw,relatime,seclabel,data=ordered)
/dev/mapper/rhvh_rhvh-var_crash on /var/crash type ext4 (rw,relatime,seclabel,discard,stripe=1024,data=ordered)
tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,seclabel,size=204696k,mode=700)

# vgs
  VG                                   #PV #LV #SN Attr   VSize  VFree  
  rhvh_rhvh                              1  11   0 wz--n- 79.99g  14.07g

# vgs
  VG                                   #PV #LV #SN Attr   VSize  VFree  
  2af97604-e629-4308-b655-3d5a88a64024   1  10   0 wz--n- 49.62g <43.88g
  rhvh_rhvh                              1  11   0 wz--n- 79.99g  14.07g

# lsblk
NAME                                                                                MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                                                                                   8:0    0   50G  0 disk 
├─2af97604--e629--4308--b655--3d5a88a64024-metadata                                 253:12   0  128M  0 lvm  
├─2af97604--e629--4308--b655--3d5a88a64024-outbox                                   253:13   0  128M  0 lvm  
├─2af97604--e629--4308--b655--3d5a88a64024-xleases                                  253:14   0    1G  0 lvm  
├─2af97604--e629--4308--b655--3d5a88a64024-leases                                   253:15   0    2G  0 lvm  
├─2af97604--e629--4308--b655--3d5a88a64024-ids                                      253:16   0  128M  0 lvm  
├─2af97604--e629--4308--b655--3d5a88a64024-inbox                                    253:17   0  128M  0 lvm  
├─2af97604--e629--4308--b655--3d5a88a64024-master                                   253:18   0    1G  0 lvm  
├─2af97604--e629--4308--b655--3d5a88a64024-bf6a2878--00db--4209--8d5d--18f41d4af4fa 253:19   0  128M  0 lvm  
├─2af97604--e629--4308--b655--3d5a88a64024-a3eec119--89e0--47c0--9fae--9b42ac9d8c40 253:20   0  128M  0 lvm  
└─2af97604--e629--4308--b655--3d5a88a64024-db40946e--6a4b--463b--b7c4--fc96629bf61c 253:21   0    1G  0 lvm  
sdb                                                                                   8:16   0   80G  0 disk 
└─sdb1                                                                                8:17   0   80G  0 part 
  ├─rhvh_rhvh-pool00_tmeta                                                          253:0    0    1G  0 lvm  
  │ └─rhvh_rhvh-pool00-tpool                                                        253:2    0 61.9G  0 lvm  
  │   ├─rhvh_rhvh-home                                                              253:3    0    1G  0 lvm  /home
  │   ├─rhvh_rhvh-tmp                                                               253:5    0    1G  0 lvm  /tmp
  │   ├─rhvh_rhvh-rhvh--4.3.8.1--0.20200126.0+1                                     253:6    0 34.9G  0 lvm  /
  │   ├─rhvh_rhvh-var                                                               253:7    0   15G  0 lvm  /var
  │   ├─rhvh_rhvh-var_log                                                           253:8    0    8G  0 lvm  /var/log
  │   ├─rhvh_rhvh-var_log_audit                                                     253:9    0    2G  0 lvm  /var/log/audit
  │   ├─rhvh_rhvh-pool00                                                            253:10   0 61.9G  0 lvm  
  │   └─rhvh_rhvh-var_crash                                                         253:11   0   10G  0 lvm  /var/crash
  ├─rhvh_rhvh-pool00_tdata                                                          253:1    0 61.9G  0 lvm  
  │ └─rhvh_rhvh-pool00-tpool                                                        253:2    0 61.9G  0 lvm  
  │   ├─rhvh_rhvh-home                                                              253:3    0    1G  0 lvm  /home
  │   ├─rhvh_rhvh-tmp                                                               253:5    0    1G  0 lvm  /tmp
  │   ├─rhvh_rhvh-rhvh--4.3.8.1--0.20200126.0+1                                     253:6    0 34.9G  0 lvm  /
  │   ├─rhvh_rhvh-var                                                               253:7    0   15G  0 lvm  /var
  │   ├─rhvh_rhvh-var_log                                                           253:8    0    8G  0 lvm  /var/log
  │   ├─rhvh_rhvh-var_log_audit                                                     253:9    0    2G  0 lvm  /var/log/audit
  │   ├─rhvh_rhvh-pool00                                                            253:10   0 61.9G  0 lvm  
  │   └─rhvh_rhvh-var_crash                                                         253:11   0   10G  0 lvm  /var/crash
  └─rhvh_rhvh-swap                                                                  253:4    0    2G  0 lvm  [SWAP]
sr0                                                                                  11:0    1 1024M  0 rom  
vda                                                                                 252:0    0   10G  0 disk 
└─vda1                                                                              252:1    0    1G  0 part /boot

Comment 1 Germano Veit Michel 2020-02-21 05:06:27 UTC
Created attachment 1664611 [details]
grub.cfg at step 1

Comment 2 Germano Veit Michel 2020-02-21 05:06:51 UTC
Created attachment 1664612 [details]
grub.cfg at step 3

Comment 3 Qin Yuan 2020-02-25 07:23:29 UTC
This issue can be reproduced by following the steps in #c0.

It can also be reproduced by adding an rd.lvm.lv argument to /etc/default/grub:

1. Install rhvh-4.3.8.1-0.20200126.0 with auto partitioning
2. Check lvs

[root@dell-per740-28 ~]# lvs
  LV                          VG   Attr       LSize    Pool   Origin                    Data%  Meta%  Move Log Cpy%Sync Convert
  home                        rhvh Vwi-aotz--    1.00g pool00                           4.79                                   
  pool00                      rhvh twi-aotz-- <787.96g                                  1.98   2.70                            
  rhvh-4.3.8.1-0.20200126.0   rhvh Vwi---tz-k <760.96g pool00 root                                                             
  rhvh-4.3.8.1-0.20200126.0+1 rhvh Vwi-aotz-- <760.96g pool00 rhvh-4.3.8.1-0.20200126.0 1.87                                   
  root                        rhvh Vri---tz-k <760.96g pool00                                                                  
  swap                        rhvh -wi-ao----    4.00g                                                                         
  tmp                         rhvh Vwi-aotz--    1.00g pool00                           4.84                                   
  var                         rhvh Vwi-aotz--   15.00g pool00                           3.42                                   
  var_crash                   rhvh Vwi-aotz--   10.00g pool00                           2.86                                   
  var_log                     rhvh Vwi-aotz--    8.00g pool00                           3.30                                   
  var_log_audit               rhvh Vwi-aotz--    2.00g pool00                           4.78 

3. check rd.lvm.lv in /boot/grub2/grub.cfg

[root@dell-per740-28 ~]# egrep -o "rd.lvm.lv=[.0-9a-z_/+-]+" /boot/grub2/grub.cfg
rd.lvm.lv=rhvh/swap
rd.lvm.lv=rhvh/rhvh-4.3.8.1-0.20200126.0+1

4. check /etc/default/grub

[root@dell-per740-28 ~]# cat /etc/default/grub 
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX='crashkernel=auto spectre_v2=retpoline rd.lvm.lv=rhvh/rhvh-4.3.8.1-0.20200126.0+1 rd.lvm.lv=rhvh/swap rhgb quiet'
GRUB_DISABLE_RECOVERY="true"

5. Add "rd.lvm.lv=rhvh/var" to GRUB_CMDLINE_LINUX in /etc/default/grub, making sure it is the first rd.lvm.lv argument in GRUB_CMDLINE_LINUX

GRUB_CMDLINE_LINUX='crashkernel=auto spectre_v2=retpoline rd.lvm.lv=rhvh/var rd.lvm.lv=rhvh/rhvh-4.3.8.1-0.20200126.0+1 rd.lvm.lv=rhvh/swap rhgb quiet'

6. run grub2-mkconfig

[root@dell-per740-28 ~]# grub2-mkconfig | egrep -o "rd.lvm.lv=[.0-9a-z_/+-]+"
Generating grub configuration file ...
Found linux image: /boot/rhvh-4.3.8.1-0.20200126.0+1//vmlinuz-3.10.0-1062.12.1.el7.x86_64
Found initrd image: /boot/rhvh-4.3.8.1-0.20200126.0+1/initramfs-3.10.0-1062.12.1.el7.x86_64.img
rd.lvm.lv=rhvh/rhvh-4.3.8.1-0.20200126.0+1
rd.lvm.lv=rhvh/swap
rd.lvm.lv=rhvh/rhvh-4.3.8.1-0.20200126.0+1
done

As you can see, rd.lvm.lv=rhvh/var is missing and rd.lvm.lv=rhvh/rhvh-4.3.8.1-0.20200126.0+1 appears twice.

In addition, my tests show that if rd.lvm.lv=rhvh/rhvh-4.3.8.1-0.20200126.0+1 is the first rd.lvm.lv argument in GRUB_CMDLINE_LINUX, grub2-mkconfig generates the correct result:

[root@dell-per740-28 ~]# cat /etc/default/grub
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX='crashkernel=auto spectre_v2=retpoline rd.lvm.lv=rhvh/rhvh-4.3.8.1-0.20200126.0+1 rd.lvm.lv=rhvh/var rd.lvm.lv=rhvh/swap rhgb quiet'
GRUB_DISABLE_RECOVERY="true"

[root@dell-per740-28 ~]# grub2-mkconfig | egrep -o "rd.lvm.lv=[.0-9a-z_/+-]+"
Generating grub configuration file ...
Found linux image: /boot/rhvh-4.3.8.1-0.20200126.0+1//vmlinuz-3.10.0-1062.12.1.el7.x86_64
Found initrd image: /boot/rhvh-4.3.8.1-0.20200126.0+1/initramfs-3.10.0-1062.12.1.el7.x86_64.img
rd.lvm.lv=rhvh/var
rd.lvm.lv=rhvh/swap
rd.lvm.lv=rhvh/rhvh-4.3.8.1-0.20200126.0+1
done
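
The ordering observation above suggests a quick check of /etc/default/grub. The following is a minimal sketch, not a confirmed fix: the helper names first_lv_arg and root_lv_is_first are hypothetical, and the ordering behaviour is only what was observed in these tests.

```shell
# Sketch only: first_lv_arg and root_lv_is_first are hypothetical helper names.

# Print the first rd.lvm.lv= argument on the GRUB_CMDLINE_LINUX line of file $1.
first_lv_arg() {
    grep -E '^GRUB_CMDLINE_LINUX=' "$1" \
        | grep -Eo 'rd\.lvm\.lv=[.0-9a-z_/+-]+' \
        | head -n 1
}

# Succeed (exit status 0) if the root LV given as $2 (VG/LV form) is the
# first rd.lvm.lv= argument in file $1.
root_lv_is_first() {
    [ "$(first_lv_arg "$1")" = "rd.lvm.lv=$2" ]
}
```

For example, `root_lv_is_first /etc/default/grub 'rhvh/rhvh-4.3.8.1-0.20200126.0+1'` fails for the file edited in step 5 and succeeds for the reordered file shown above.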

Comment 7 Qin Yuan 2020-02-26 08:25:30 UTC
I tested several rhvh versions with the same results, so this does not seem to be a regression.

Also, as mentioned in c#3, this bug is easy to reproduce by putting any rd.lvm.lv argument before rd.lvm.lv=rhvh/rhvh-4.3.8.1-0.20200126.0+1; it is not specific to iSCSI machines.

Comment 8 Sandro Bonazzola 2020-03-04 08:55:29 UTC
Moving to 4.4.2, as this is not a regression and not critical for 4.4 GA.
We need to discuss this with the grub package maintainer; it looks like a grub issue.

Comment 9 Yuval Turgeman 2020-04-06 07:34:35 UTC
Qin, does this reproduce on 4.4?

Comment 10 Qin Yuan 2020-04-07 06:18:04 UTC
This is not reproducible on the latest rhvh 4.4, rhvh-4.4.0.16-0.20200401.0.

Comment 11 Sandro Bonazzola 2020-08-11 07:44:04 UTC
Moving to QA according to comment #10

Comment 15 cshao 2020-08-30 10:44:48 UTC
Verified this bug according to #c10.

Comment 17 errata-xmlrpc 2020-10-05 13:09:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat Virtualization security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:4172