Description of problem:
[RHVH 4.4] When two huge page sizes are defined, the order of the kernel parameters is sometimes changed, and the huge pages end up occupying all memory.

Version-Release number of selected component (if applicable):
rhvh-4.4.0.18-0.20200417.0 (4.18.0-193.el8.x86_64)

How reproducible:
About 50%. The args are ordered randomly, so sometimes they happen to come out in the expected order :) See the example under "Additional info".

Steps to Reproduce:
1. Reprovision the host with rhvh-4.4.
2. In our case, the provisioning defined the following huge page params:
   default_hugepagesz=1G hugepagesz=1G hugepages=4 hugepagesz=2M hugepages=1024

Actual results:
In the grub file (/etc/default/grub) the params are in the right order:

    GRUB_CMDLINE_LINUX='nofb quiet intel_iommu=on default_hugepagesz=1G hugepagesz=1G hugepages=4 hugepagesz=2M hugepages=1024 console=tty0 crashkernel=auto resume=/dev/mapper/rhvh_lynx02-swap rd.lvm.lv=rhvh_lynx02/rhvh-4.4.0.18-0.20200417.0+1 rd.lvm.lv=rhvh_lynx02/swap console=ttyS1,115200'

But the rhvh args are in a different order, as shown by "nodectl info":

    bootloader:
      default: rhvh-4.4.0.18-0.20200417.0 (4.18.0-193.el8.x86_64)
      entries:
        rhvh-4.4.0.18-0.20200417.0 (4.18.0-193.el8.x86_64):
          index: 0
          kernel: /boot//rhvh-4.4.0.18-0.20200417.0+1/vmlinuz-4.18.0-193.el8.x86_64
          args: rd.lvm.lv=rhvh_lynx02/rhvh-4.4.0.18-0.20200417.0+1 nofb hugepagesz=1G intel_iommu=on hugepages=1024 boot=UUID=a8114db4-8745-4a68-8871-2299dd5eaba2 crashkernel=auto hugepages=4 console=tty0 default_hugepagesz=1G console=ttyS1,115200 rootflags=discard rd.lvm.lv=rhvh_lynx02/swap resume=/dev/mapper/rhvh_lynx02-swap hugepagesz=2M quiet img.bootid=rhvh-4.4.0.18-0.20200417.0+1
          root: /dev/rhvh_lynx02/rhvh-4.4.0.18-0.20200417.0+1
          initrd: /boot//rhvh-4.4.0.18-0.20200417.0+1/initramfs-4.18.0-193.el8.x86_64.img
          title: rhvh-4.4.0.18-0.20200417.0 (4.18.0-193.el8.x86_64)
          blsid: rhvh-4.4.0.18-0.20200417.0+1-4.18.0-193.el8.x86_64
    layers:
      rhvh-4.4.0.18-0.20200417.0:
        rhvh-4.4.0.18-0.20200417.0+1
    current_layer: rhvh-4.4.0.18-0.20200417.0+1

In args we have "...hugepagesz=1G ... hugepages=1024...", which leads to an attempt to allocate 1024 huge pages of size 1G. The following error appears in journalctl:

    Apr 22 19:04:25 lynx02.lab.eng.tlv2.redhat.com kernel: HugeTLB: allocating 1024 of page size 1.00 GiB failed. Only allocated 28 hugepages.

As a result, the huge pages occupy all memory:

    # free -h
                  total        used        free      shared  buff/cache   available
    Mem:           31Gi        29Gi       131Mi       6.0Mi       1.9Gi       1.8Gi
    Swap:          15Gi       4.0Mi        15Gi

Expected results:
The boot loader args keep the order given in /etc/default/grub, to prevent these mistakes.

Additional info:
It reproduces about 50% of the time because the args are ordered randomly, so sometimes they come out as expected :) For example, from another machine:

    args: hugepagesz=1G rootflags=discard console=tty0 resume=/dev/mapper/rhvh_lynx15-swap rd.lvm.lv=rhvh_lynx15/rhvh-4.4.0.18-0.20200417.0+1 rd.lvm.lv=rhvh_lynx15/swap default_hugepagesz=1G hugepages=4 nofb hugepages=1024 crashkernel=auto intel_iommu=on console=ttyS1,115200 boot=UUID=c8e6e403-b288-4c8c-b2f2-a5513fa8c561 quiet hugepagesz=2M img.bootid=rhvh-4.4.0.18-0.20200417.0+1
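To illustrate why the order matters, here is a rough Python sketch of how a hugepages= count gets paired with the preceding hugepagesz= size. It is a simplified model, not kernel code: the function name pair_hugepages is made up for illustration, and the rule that a repeated hugepages= without a new hugepagesz= in between is ignored is an assumption about the kernel's behaviour that happens to match the journal message above.

```python
def pair_hugepages(cmdline):
    """Simplified model: each hugepages=N applies to the most recent hugepagesz=SIZE."""
    requested = {}
    current_size = "default"  # used when hugepages= comes before any hugepagesz=
    count_seen = False        # True once a count was taken for current_size
    for token in cmdline.split():
        key, _, value = token.partition("=")
        if key == "hugepagesz":
            current_size = value
            count_seen = False
        elif key == "hugepages":
            if count_seen:
                # Assumption: a second count without a new size is ignored.
                continue
            requested[current_size] = int(value)
            count_seen = True
    return requested


# Order as written in /etc/default/grub
intended = "default_hugepagesz=1G hugepagesz=1G hugepages=4 hugepagesz=2M hugepages=1024"
# Relative order of the huge page options as reported by "nodectl info" above
shuffled = "hugepagesz=1G hugepages=1024 hugepages=4 default_hugepagesz=1G hugepagesz=2M"

print(pair_hugepages(intended))  # {'1G': 4, '2M': 1024}
print(pair_hugepages(shuffled))  # {'1G': 1024}
```

Under this model the shuffled order asks for 1024 pages of 1G, which is consistent with the "allocating 1024 of page size 1.00 GiB failed" message.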
Did some tests and investigation. The issue appears to be in rhvh imgbased:

1. The issue can't be reproduced on RHEL 8.2.
2. On RHVH 4.4, the kernel cmdline options are out of order because the options in /boot/loader/entries/rhvh-*.conf are out of order. This conf file is generated by imgbased using the grubby command:

    grubby --copy-default --add-kernel $linux --initrd $initrd --args $args --title $title --bls-directory $dir

$args is returned by the following function:

    def _get_cmdline(self):
        defgrub = utils.ShellVarFile("%s/etc/default/grub" % self._root)
        cmdline = defgrub.get("GRUB_CMDLINE_LINUX", "").strip('"').split()
        args = "rd.lvm.lv={0} root=/dev/{0}".format(self._lv.lvm_name).split()
        boot_uuid = utils.findmnt(["UUID"], path="/boot")
        if boot_uuid:
            args.append("boot=UUID={}".format(boot_uuid))
        args.append("rootflags=discard")
        return " ".join(list(set(cmdline).union(set(args))))

As you can see, cmdline is read from GRUB_CMDLINE_LINUX in /etc/default/grub, but the order of its elements is lost when it is converted to a set in the return statement.
One way to keep the order while removing duplicates is:

    cmdline.extend(args)
    return " ".join(sorted(list(set(cmdline)), key=cmdline.index))

Tested the above solution, and it worked:

    [root@ati-fc-01 ~]# cat /etc/default/grub
    GRUB_TIMEOUT=5
    GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
    GRUB_DEFAULT=saved
    GRUB_DISABLE_SUBMENU=true
    GRUB_TERMINAL_OUTPUT="console"
    GRUB_CMDLINE_LINUX='nofb quiet intel_iommu=on default_hugepagesz=1G hugepagesz=1G hugepages=4 hugepagesz=2M hugepages=1024 console=tty0 console=ttyS1,115200 crashkernel=200M resume=/dev/mapper/rhvh_ati--fc--01-swap rd.lvm.lv=rhvh_ati-fc-01/rhvh-4.4.0.18-0.20200417.0+1 rd.lvm.lv=rhvh_ati-fc-01/swap'
    GRUB_DISABLE_RECOVERY="true"
    GRUB_ENABLE_BLSCFG=true
    GRUB_DISABLE_OS_PROBER='true'

    [root@ati-fc-01 ~]# cat /boot/loader/entries/rhvh-4.4.0.18-0.20200417.0+1-4.18.0-193.el8.x86_64.conf
    title rhvh-4.4.0.18-0.20200417.0 (4.18.0-193.el8.x86_64)
    version 4.18.0-193.el8.x86_64
    linux //rhvh-4.4.0.18-0.20200417.0+1/vmlinuz-4.18.0-193.el8.x86_64
    initrd //rhvh-4.4.0.18-0.20200417.0+1/initramfs-4.18.0-193.el8.x86_64.img
    options nofb quiet intel_iommu=on default_hugepagesz=1G hugepagesz=1G hugepages=4 hugepagesz=2M hugepages=1024 console=tty0 console=ttyS1,115200 crashkernel=200M resume=/dev/mapper/rhvh_ati--fc--01-swap rd.lvm.lv=rhvh_ati-fc-01/rhvh-4.4.0.18-0.20200417.0+1 rd.lvm.lv=rhvh_ati-fc-01/swap root=/dev/rhvh_ati-fc-01/rhvh-4.4.0.18-0.20200417.0+1 boot=UUID=49b34a76-23dd-4e08-8847-51745e789de0 rootflags=discard img.bootid=rhvh-4.4.0.18-0.20200417.0+1
    id rhel-20200424101126-4.18.0-193.el8.x86_64
    grub_users $grub_users
    grub_arg --unrestricted
    grub_class kernel

    [root@ati-fc-01 ~]# cat /proc/cmdline
    BOOT_IMAGE=(hd0,msdos1)//rhvh-4.4.0.18-0.20200417.0+1/vmlinuz-4.18.0-193.el8.x86_64 nofb quiet intel_iommu=on default_hugepagesz=1G hugepagesz=1G hugepages=4 hugepagesz=2M hugepages=1024 console=tty0 console=ttyS1,115200 crashkernel=200M resume=/dev/mapper/rhvh_ati--fc--01-swap rd.lvm.lv=rhvh_ati-fc-01/rhvh-4.4.0.18-0.20200417.0+1 rd.lvm.lv=rhvh_ati-fc-01/swap root=/dev/rhvh_ati-fc-01/rhvh-4.4.0.18-0.20200417.0+1 boot=UUID=49b34a76-23dd-4e08-8847-51745e789de0 rootflags=discard img.bootid=rhvh-4.4.0.18-0.20200417.0+1

As you can see, the order of the elements on the options line in /boot/loader/entries/rhvh-4.4.0.18-0.20200417.0+1-4.18.0-193.el8.x86_64.conf is the same as in /etc/default/grub.

And huge pages worked as expected:

    [root@ati-fc-01 ~]# cat /sys/kernel/mm/hugepages/hugepages-{1048576kB,2048kB}/nr_hugepages
    4
    1024
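For reference, the order-preserving de-duplication above can also be sketched as a standalone function. This is only an illustration and not necessarily what was merged into imgbased: merge_cmdline is a hypothetical name, the volume names in the sample data are placeholders, and the dict.fromkeys variant is an alternative that relies on dicts keeping insertion order (guaranteed since Python 3.7).

```python
def merge_cmdline(cmdline, args):
    """Join cmdline and args, dropping duplicates but keeping first-seen order."""
    # dict.fromkeys keeps only the first occurrence of each token, in order.
    return " ".join(dict.fromkeys(cmdline + args))


def merge_cmdline_sorted(cmdline, args):
    """Same result using the sorted()/index approach proposed in this comment."""
    merged = cmdline + args
    return " ".join(sorted(set(merged), key=merged.index))


# Placeholder data: "vg/lv" stands in for the real volume group and LV names.
cmdline = ("nofb quiet default_hugepagesz=1G hugepagesz=1G hugepages=4 "
           "hugepagesz=2M hugepages=1024 rd.lvm.lv=vg/lv").split()
args = ["rd.lvm.lv=vg/lv", "root=/dev/vg/lv", "rootflags=discard"]

print(merge_cmdline(cmdline, args))
```

Both variants remove exact duplicate tokens, so a token that is repeated on purpose (for example the same hugepages=N count given for two different sizes) would also be collapsed to one occurrence.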
patch merged, is that all that is needed? if so please move to MODIFIED
(In reply to Michal Skrivanek from comment #12)
> patch merged, is that all that is needed? if so please move to MODIFIED

There was no imgbased build due to a CI issue; I am on that.
Verified with: RHVH-4.4-20200611.0-RHVH-x86_64-dvd1.iso

Steps:
1. Install RHVH with a ks file which defines:
   bootloader --location=mbr --append="default_hugepagesz=1G hugepagesz=1G hugepages=4 hugepagesz=2M hugepages=1024"
2. Check the order of the hugepage options in /proc/cmdline.

Results:
1. The order of the hugepage options in /proc/cmdline is the same as the order in the ks file:

    [root@ati-fc-01 ~]# cat /proc/cmdline
    BOOT_IMAGE=(hd0,msdos1)//rhvh-4.4.1.1-0.20200611.0+1/vmlinuz-4.18.0-193.8.1.el8_2.x86_64 default_hugepagesz=1G hugepagesz=1G hugepages=4 hugepagesz=2M hugepages=1024 crashkernel=200M resume=/dev/mapper/rhvh_ati--fc--01-swap rd.lvm.lv=rhvh_ati-fc-01/rhvh-4.4.1.1-0.20200611.0+1 rd.lvm.lv=rhvh_ati-fc-01/swap root=/dev/rhvh_ati-fc-01/rhvh-4.4.1.1-0.20200611.0+1 boot=UUID=1ca65438-eec1-497e-b8cc-a87bde0490e8 rootflags=discard img.bootid=rhvh-4.4.1.1-0.20200611.0+1
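The order check in step 2 could be automated with something like the sketch below. It is only an illustration of the check, not part of the actual verification: hugepage_options and HUGEPAGE_KEYS are hypothetical names, and the sample string is a shortened copy of the /proc/cmdline output above.

```python
HUGEPAGE_KEYS = ("default_hugepagesz", "hugepagesz", "hugepages")


def hugepage_options(cmdline):
    """Return the huge page related tokens of a kernel cmdline, in order."""
    return [tok for tok in cmdline.split()
            if tok.partition("=")[0] in HUGEPAGE_KEYS]


# Value passed to --append in the ks file
expected = "default_hugepagesz=1G hugepagesz=1G hugepages=4 hugepagesz=2M hugepages=1024".split()

# Shortened sample; on a real host this would be read from /proc/cmdline.
sample = ("BOOT_IMAGE=(hd0,msdos1)//rhvh-4.4.1.1-0.20200611.0+1/vmlinuz-4.18.0-193.8.1.el8_2.x86_64 "
          "default_hugepagesz=1G hugepagesz=1G hugepages=4 hugepagesz=2M hugepages=1024 "
          "crashkernel=200M rootflags=discard img.bootid=rhvh-4.4.1.1-0.20200611.0+1")

print(hugepage_options(sample) == expected)  # True
```

On a host, the sample string would be replaced with the contents of /proc/cmdline.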
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (RHV Host (redhat-virtualization-host) 4.4), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2020:3316