Bug 1827232 - [RHVH 4.4] sometimes when defining 2 sizes of huge pages the parameters order changed and all memory occupied by the huge pages.
Summary: [RHVH 4.4] sometimes when defining 2 sizes of huge pages the parameters order...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: imgbased
Version: 4.4.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ovirt-4.4.1
Target Release: 4.4.1
Assignee: Nir Levy
QA Contact: Qin Yuan
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-04-23 13:29 UTC by Kobi Hakimi
Modified: 2020-08-04 16:22 UTC
CC List: 13 users

Fixed In Version: imgbased-1.2.10
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-08-04 16:22:45 UTC
oVirt Team: Node
Target Upstream Version:
Embargoed:




Links
- Red Hat Product Errata RHEA-2020:3316 (last updated 2020-08-04 16:22:58 UTC)
- oVirt gerrit 108625, master, MERGED: "bootsetup: keep cmdline arguments order" (last updated 2020-08-04 02:56:31 UTC)

Description Kobi Hakimi 2020-04-23 13:29:58 UTC
Description of problem:
[RHVH 4.4] Sometimes, when two huge page sizes are defined, the order of the kernel parameters changes and all memory ends up occupied by huge pages.


Version-Release number of selected component (if applicable):
rhvh-4.4.0.18-0.20200417.0 (4.18.0-193.el8.x86_64)

How reproducible:
About 50%.
The args are ordered randomly, so sometimes the bug reproduces and sometimes the order comes out as expected :)
See the example below under "Additional info:".

Steps to Reproduce:
1. Reprovision the host with RHVH 4.4.
2. In the provisioning, define two huge page sizes. In our case we used these parameters:
default_hugepagesz=1G hugepagesz=1G hugepages=4 hugepagesz=2M hugepages=1024

Actual results:
In the grub file (/etc/default/grub), the parameters are in the right order:
GRUB_CMDLINE_LINUX='nofb quiet intel_iommu=on default_hugepagesz=1G hugepagesz=1G hugepages=4 hugepagesz=2M hugepages=1024 console=tty0 crashkernel=auto resume=/dev/mapper/rhvh_lynx02-swap rd.lvm.lv=rhvh_lynx02/rhvh-4.4.0.18-0.20200417.0+1 rd.lvm.lv=rhvh_lynx02/swap console=ttyS1,115200'

However, in the args of the RHVH boot entry, as shown by "nodectl info", the order is different:
bootloader:
  default: rhvh-4.4.0.18-0.20200417.0 (4.18.0-193.el8.x86_64)
  entries:
    rhvh-4.4.0.18-0.20200417.0 (4.18.0-193.el8.x86_64):
      index: 0
      kernel: /boot//rhvh-4.4.0.18-0.20200417.0+1/vmlinuz-4.18.0-193.el8.x86_64
      args: rd.lvm.lv=rhvh_lynx02/rhvh-4.4.0.18-0.20200417.0+1 nofb hugepagesz=1G intel_iommu=on hugepages=1024 boot=UUID=a8114db4-8745-4a68-8871-2299dd5eaba2 crashkernel=auto hugepages=4 console=tty0 default_hugepagesz=1G console=ttyS1,115200 rootflags=discard rd.lvm.lv=rhvh_lynx02/swap resume=/dev/mapper/rhvh_lynx02-swap hugepagesz=2M quiet img.bootid=rhvh-4.4.0.18-0.20200417.0+1
      root: /dev/rhvh_lynx02/rhvh-4.4.0.18-0.20200417.0+1
      initrd: /boot//rhvh-4.4.0.18-0.20200417.0+1/initramfs-4.18.0-193.el8.x86_64.img
      title: rhvh-4.4.0.18-0.20200417.0 (4.18.0-193.el8.x86_64)
      blsid: rhvh-4.4.0.18-0.20200417.0+1-4.18.0-193.el8.x86_64
layers:
  rhvh-4.4.0.18-0.20200417.0:
    rhvh-4.4.0.18-0.20200417.0+1
current_layer: rhvh-4.4.0.18-0.20200417.0+1

In args we now have "...hugepagesz=1G ... hugepages=1024...", so the count of 1024 is paired with the 1G page size.

>>> This makes the kernel try to allocate 1024 huge pages of size 1G, and journalctl shows the following error:
Apr 22 19:04:25 lynx02.lab.eng.tlv2.redhat.com kernel: HugeTLB: allocating 1024 of page size 1.00 GiB failed.  Only allocated 28 hugepages.

>>> As a result, almost all memory is occupied by huge pages:
free -h
              total        used        free      shared  buff/cache   available
Mem:           31Gi        29Gi       131Mi       6.0Mi       1.9Gi       1.8Gi
Swap:          15Gi       4.0Mi        15Gi
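To make the failure concrete, here is a simplified model of how the kernel pairs these parameters, as I understand it: each hugepages=N applies to the most recent hugepagesz= before it. It is a sketch, not kernel code. `parse_hugepages` and `requested_gib` are hypothetical helpers, only the 1G and 2M sizes are modelled, and it assumes a second hugepages= for the same size is ignored (as far as I know the kernel warns and ignores it).

```python
# Simplified model (not kernel code) of how hugepagesz=/hugepages= pair up.
SIZES_GIB = {"1G": 1.0, "2M": 2 / 1024}  # only these two sizes are modelled


def parse_hugepages(args):
    """Return {page_size: count} for an ordered list of cmdline tokens."""
    counts = {}
    current = None
    for token in args:
        key, _, value = token.partition("=")
        if key == "hugepagesz":
            current = value
        elif key == "hugepages" and current is not None:
            # Assumption: a repeated count for the same size is ignored.
            # (A count before any hugepagesz= goes to the default size in
            # the kernel; that case is not modelled here.)
            counts.setdefault(current, int(value))
    return counts


def requested_gib(counts):
    """Memory the counts ask for, in GiB."""
    return sum(SIZES_GIB[size] * n for size, n in counts.items())


intended = "default_hugepagesz=1G hugepagesz=1G hugepages=4 hugepagesz=2M hugepages=1024"
# Huge page order from the nodectl output above, other tokens trimmed.
reordered = ("nofb hugepagesz=1G intel_iommu=on hugepages=1024 crashkernel=auto "
             "hugepages=4 default_hugepagesz=1G hugepagesz=2M quiet")

print(parse_hugepages(intended.split()))                   # {'1G': 4, '2M': 1024}
print(requested_gib(parse_hugepages(intended.split())))    # 6.0
print(parse_hugepages(reordered.split()))                  # {'1G': 1024}
print(requested_gib(parse_hugepages(reordered.split())))   # 1024.0
```

With the intended order the host reserves 6 GiB (4 × 1 GiB + 1024 × 2 MiB). With the order from nodectl it asks for 1024 GiB on a 31 GiB host, so the kernel takes what it can (28 pages according to the journal) and almost nothing is left free.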

Expected results:
The boot loader entry should keep the argument order from /etc/default/grub, to prevent these mistakes.

Additional info:
It reproduces about 50% of the time because the args are ordered randomly, so sometimes the order happens to work out :)
For example, from another machine:
      args: hugepagesz=1G rootflags=discard console=tty0 resume=/dev/mapper/rhvh_lynx15-swap rd.lvm.lv=rhvh_lynx15/rhvh-4.4.0.18-0.20200417.0+1 rd.lvm.lv=rhvh_lynx15/swap default_hugepagesz=1G hugepages=4 nofb hugepages=1024 crashkernel=auto intel_iommu=on console=ttyS1,115200 boot=UUID=c8e6e403-b288-4c8c-b2f2-a5513fa8c561 quiet hugepagesz=2M img.bootid=rhvh-4.4.0.18-0.20200417.0+1 null

Comment 1 Qin Yuan 2020-04-24 10:37:44 UTC
I did some testing and investigation; the issue appears to be in RHVH imgbased:

1. This issue can't be reproduced on RHEL 8.2.

2. On RHVH 4.4, the kernel cmdline options are out of order because the options in /boot/loader/entries/rhvh-*.conf are out of order. This conf file is generated by imgbased with the grubby command:
    grubby --copy-default --add-kernel $linux --initrd $initrd --args $args --title $title --bls-directory $dir
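When the command is issued from Python, building it as an argv list rather than a shell string keeps $args as a single argument, so grubby receives the string exactly as _get_cmdline returned it. A minimal sketch under that assumption: `grubby_argv` is a hypothetical helper, the flags are the ones in the command above, and the example paths and title are placeholders.

```python
def grubby_argv(linux, initrd, args, title, bls_dir):
    """Hypothetical helper: the grubby call above as an argv list."""
    return [
        "grubby", "--copy-default",
        "--add-kernel", linux,
        "--initrd", initrd,
        # One element for the whole cmdline, so no shell word-splitting
        # can touch it on the way to grubby.
        "--args", args,
        "--title", title,
        "--bls-directory", bls_dir,
    ]


argv = grubby_argv(
    "/boot/example/vmlinuz",          # placeholder paths and title
    "/boot/example/initramfs.img",
    "nofb quiet hugepagesz=1G hugepages=4 hugepagesz=2M hugepages=1024",
    "example-title",
    "/boot/loader/entries",
)
print(argv)
```

Passing the list to subprocess.run(argv, check=True) would then execute it (this needs root and grubby installed). Note that this only protects the string in transit; its order is decided earlier, in _get_cmdline.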

$args is returned by the following function:
    def _get_cmdline(self):
        defgrub = utils.ShellVarFile("%s/etc/default/grub" % self._root)
        cmdline = defgrub.get("GRUB_CMDLINE_LINUX", "").strip('"').split()
        args = "rd.lvm.lv={0} root=/dev/{0}".format(self._lv.lvm_name).split()
        boot_uuid = utils.findmnt(["UUID"], path="/boot")
        if boot_uuid:
            args.append("boot=UUID={}".format(boot_uuid))
        args.append("rootflags=discard")
        return " ".join(list(set(cmdline).union(set(args))))

As you can see, the function reads cmdline from GRUB_CMDLINE_LINUX in /etc/default/grub, but the order of its elements is lost when it is converted to a set in the return statement.
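This likely also explains why the bug shows up only about half the time: Python 3 randomizes str hashes per interpreter process by default, and the iteration order of a set of strings follows those hashes, so each run can produce a different order. The sketch below runs the same set-based merge, as a standalone snippet rather than imgbased itself, in fresh interpreters with different PYTHONHASHSEED values. `SNIPPET` and `merged_order` are hypothetical names.

```python
import os
import subprocess
import sys

# Standalone copy of the set-based merge from _get_cmdline (not imgbased code).
SNIPPET = (
    "cmdline = 'default_hugepagesz=1G hugepagesz=1G hugepages=4 "
    "hugepagesz=2M hugepages=1024'.split()\n"
    "args = ['rootflags=discard']\n"
    "print(' '.join(list(set(cmdline).union(set(args)))))\n"
)


def merged_order(seed):
    """Run SNIPPET in a fresh interpreter with PYTHONHASHSEED=seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    orders = {merged_order(seed) for seed in range(10)}
    # Same tokens every time, but usually several different orders.
    print(len(orders), "distinct orders from 10 hash seeds")
    for order in sorted(orders):
        print(order)
```

Every run yields the same six tokens; only their order changes, which is exactly what separates a working hugepagesz/hugepages pairing from a broken one.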

One way to keep the order while removing duplicates is:
        cmdline.extend(args)
        return " ".join(sorted(list(set(cmdline)), key=cmdline.index))
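A self-contained version of that idea, next to a linear-time equivalent, for comparison. This is a sketch, not the merged patch; `merge_sorted_index` and `merge_fromkeys` are hypothetical names, and the extra arguments in the example (including the root= value) are made up.

```python
def merge_sorted_index(cmdline, extra):
    """Comment 1's idea: unique tokens, sorted by where they first appear."""
    merged = list(cmdline) + list(extra)
    # list.index is O(n) per call, so this is O(n^2); fine for a cmdline.
    return " ".join(sorted(set(merged), key=merged.index))


def merge_fromkeys(cmdline, extra):
    """Linear-time variant: dict keys keep first-insertion order."""
    # Guaranteed from Python 3.7; CPython 3.6 already behaves this way.
    return " ".join(dict.fromkeys(list(cmdline) + list(extra)))


grub = ("nofb quiet default_hugepagesz=1G hugepagesz=1G hugepages=4 "
        "hugepagesz=2M hugepages=1024").split()
extra = ["root=/dev/example/lv", "rootflags=discard", "quiet"]  # made-up values

print(merge_fromkeys(grub, extra))
# nofb quiet default_hugepagesz=1G hugepagesz=1G hugepages=4 hugepagesz=2M hugepages=1024 root=/dev/example/lv rootflags=discard
print(merge_sorted_index(grub, extra) == merge_fromkeys(grub, extra))  # True

# Caveat of any token-level de-duplication: equal counts collapse.
print(merge_fromkeys("hugepagesz=1G hugepages=4 hugepagesz=2M hugepages=4".split(), []))
# hugepagesz=1G hugepages=4 hugepagesz=2M
```

Both functions give the same result. The last example shows a caveat that applies to any token-level de-duplication, this one included: a token that is legitimately repeated is collapsed too, so equal counts for two page sizes lose the second hugepages=4 and leave 2M without a count.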

I tested the above solution, and it works:

[root@ati-fc-01 ~]# cat /etc/default/grub 
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX='nofb quiet intel_iommu=on default_hugepagesz=1G hugepagesz=1G hugepages=4 hugepagesz=2M hugepages=1024 console=tty0 console=ttyS1,115200 crashkernel=200M resume=/dev/mapper/rhvh_ati--fc--01-swap rd.lvm.lv=rhvh_ati-fc-01/rhvh-4.4.0.18-0.20200417.0+1 rd.lvm.lv=rhvh_ati-fc-01/swap'
GRUB_DISABLE_RECOVERY="true"
GRUB_ENABLE_BLSCFG=true
GRUB_DISABLE_OS_PROBER='true'

[root@ati-fc-01 ~]# cat /boot/loader/entries/rhvh-4.4.0.18-0.20200417.0+1-4.18.0-193.el8.x86_64.conf 
title rhvh-4.4.0.18-0.20200417.0 (4.18.0-193.el8.x86_64)
version 4.18.0-193.el8.x86_64
linux //rhvh-4.4.0.18-0.20200417.0+1/vmlinuz-4.18.0-193.el8.x86_64
initrd //rhvh-4.4.0.18-0.20200417.0+1/initramfs-4.18.0-193.el8.x86_64.img
options nofb quiet intel_iommu=on default_hugepagesz=1G hugepagesz=1G hugepages=4 hugepagesz=2M hugepages=1024 console=tty0 console=ttyS1,115200 crashkernel=200M resume=/dev/mapper/rhvh_ati--fc--01-swap rd.lvm.lv=rhvh_ati-fc-01/rhvh-4.4.0.18-0.20200417.0+1 rd.lvm.lv=rhvh_ati-fc-01/swap root=/dev/rhvh_ati-fc-01/rhvh-4.4.0.18-0.20200417.0+1 boot=UUID=49b34a76-23dd-4e08-8847-51745e789de0 rootflags=discard img.bootid=rhvh-4.4.0.18-0.20200417.0+1
id rhel-20200424101126-4.18.0-193.el8.x86_64
grub_users $grub_users
grub_arg --unrestricted
grub_class kernel

[root@ati-fc-01 ~]# cat /proc/cmdline 
BOOT_IMAGE=(hd0,msdos1)//rhvh-4.4.0.18-0.20200417.0+1/vmlinuz-4.18.0-193.el8.x86_64 nofb quiet intel_iommu=on default_hugepagesz=1G hugepagesz=1G hugepages=4 hugepagesz=2M hugepages=1024 console=tty0 console=ttyS1,115200 crashkernel=200M resume=/dev/mapper/rhvh_ati--fc--01-swap rd.lvm.lv=rhvh_ati-fc-01/rhvh-4.4.0.18-0.20200417.0+1 rd.lvm.lv=rhvh_ati-fc-01/swap root=/dev/rhvh_ati-fc-01/rhvh-4.4.0.18-0.20200417.0+1 boot=UUID=49b34a76-23dd-4e08-8847-51745e789de0 rootflags=discard img.bootid=rhvh-4.4.0.18-0.20200417.0+1

As you can see, the order of the elements on the options line in /boot/loader/entries/rhvh-4.4.0.18-0.20200417.0+1-4.18.0-193.el8.x86_64.conf is the same as in /etc/default/grub.
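The same comparison can be scripted: read GRUB_CMDLINE_LINUX, read the options line of the BLS entry, and check that the first appears inside the second in the same relative order. Plain equality would be too strict, because extra arguments such as root=, boot=, rootflags= and img.bootid= are added to the entry. A sketch with hypothetical helper names; it uses shlex only to strip the shell quoting and ignores anything else the grub file might do, such as variable expansion.

```python
import shlex


def grub_cmdline_tokens(grub_text):
    """Tokens of GRUB_CMDLINE_LINUX from /etc/default/grub text."""
    for line in grub_text.splitlines():
        # The "=" keeps GRUB_CMDLINE_LINUX_DEFAULT from matching.
        if line.startswith("GRUB_CMDLINE_LINUX="):
            value = line.split("=", 1)[1]
            # shlex removes the ' or " quoting; split() then breaks the
            # quoted value into individual kernel arguments.
            return [t for word in shlex.split(value) for t in word.split()]
    return []


def bls_options_tokens(entry_text):
    """Tokens of the 'options' line of a BLS entry file."""
    for line in entry_text.splitlines():
        if line.startswith("options "):
            return line.split()[1:]
    return []


def keeps_order(expected, actual):
    """True if expected tokens appear in actual in the same relative order."""
    remaining = iter(actual)
    return all(token in remaining for token in expected)


# Trimmed from the files shown above.
grub_text = ("GRUB_TIMEOUT=5\n"
             "GRUB_CMDLINE_LINUX='nofb quiet intel_iommu=on default_hugepagesz=1G "
             "hugepagesz=1G hugepages=4 hugepagesz=2M hugepages=1024 console=tty0'\n")
entry_text = ("title rhvh-4.4.0.18-0.20200417.0\n"
              "options nofb quiet intel_iommu=on default_hugepagesz=1G hugepagesz=1G "
              "hugepages=4 hugepagesz=2M hugepages=1024 console=tty0 rootflags=discard\n")

print(keeps_order(grub_cmdline_tokens(grub_text), bls_options_tokens(entry_text)))  # True
```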

And huge pages work as expected:
[root@ati-fc-01 ~]# cat  /sys/kernel/mm/hugepages/hugepages-{1048576kB,2048kB}/nr_hugepages
4
1024
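The same check from Python, as a sketch: each directory under /sys/kernel/mm/hugepages is named after its page size in kB and holds an nr_hugepages file, as in the command above. `hugepage_counts` is a hypothetical helper, and its root parameter exists only so it can be pointed at a copy of the tree.

```python
from pathlib import Path


def hugepage_counts(root="/sys/kernel/mm/hugepages"):
    """Map page size in kB to nr_hugepages, read from the sysfs tree."""
    counts = {}
    for entry in sorted(Path(root).glob("hugepages-*kB")):
        # "hugepages-1048576kB" -> 1048576
        size_kb = int(entry.name[len("hugepages-"):-len("kB")])
        # int() tolerates the trailing newline in the sysfs file.
        counts[size_kb] = int((entry / "nr_hugepages").read_text())
    return counts


if __name__ == "__main__":
    # Expected on the tested host: 4 pages of 1G and 1024 pages of 2M.
    expected = {1048576: 4, 2048: 1024}
    print(hugepage_counts() == expected)
```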

Comment 12 Michal Skrivanek 2020-06-05 15:28:32 UTC
Patch merged; is that all that is needed? If so, please move to MODIFIED.

Comment 13 Nir Levy 2020-06-05 15:46:57 UTC
(In reply to Michal Skrivanek from comment #12)
> Patch merged; is that all that is needed? If so, please move to MODIFIED.

There has been no imgbased build because of a CI issue; I am working on it.

Comment 18 Qin Yuan 2020-06-17 13:36:42 UTC
Verified with:
RHVH-4.4-20200611.0-RHVH-x86_64-dvd1.iso

Steps:
1. Install RHVH with a ks file that defines: bootloader --location=mbr --append="default_hugepagesz=1G hugepagesz=1G hugepages=4 hugepagesz=2M hugepages=1024"
2. Check hugepage options order in /proc/cmdline

Results:
1. The order of the hugepage options in /proc/cmdline is the same as in the ks file:

[root@ati-fc-01 ~]# cat /proc/cmdline 
BOOT_IMAGE=(hd0,msdos1)//rhvh-4.4.1.1-0.20200611.0+1/vmlinuz-4.18.0-193.8.1.el8_2.x86_64 default_hugepagesz=1G hugepagesz=1G hugepages=4 hugepagesz=2M hugepages=1024 crashkernel=200M resume=/dev/mapper/rhvh_ati--fc--01-swap rd.lvm.lv=rhvh_ati-fc-01/rhvh-4.4.1.1-0.20200611.0+1 rd.lvm.lv=rhvh_ati-fc-01/swap root=/dev/rhvh_ati-fc-01/rhvh-4.4.1.1-0.20200611.0+1 boot=UUID=1ca65438-eec1-497e-b8cc-a87bde0490e8 rootflags=discard img.bootid=rhvh-4.4.1.1-0.20200611.0+1
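Step 2 of the verification can also be scripted by comparing only the huge page arguments of the two command lines. A sketch; `hugepage_tokens` and `hugepage_order_matches` are hypothetical names, and on a real host the second string would be read from /proc/cmdline.

```python
HUGEPAGE_PREFIXES = ("default_hugepagesz=", "hugepagesz=", "hugepages=")


def hugepage_tokens(cmdline):
    """Huge page arguments of a command line, in their original order."""
    return [t for t in cmdline.split() if t.startswith(HUGEPAGE_PREFIXES)]


def hugepage_order_matches(ks_append, proc_cmdline):
    """True if both carry the same huge page arguments in the same order."""
    return hugepage_tokens(ks_append) == hugepage_tokens(proc_cmdline)


ks_append = "default_hugepagesz=1G hugepagesz=1G hugepages=4 hugepagesz=2M hugepages=1024"
# Trimmed from the verified /proc/cmdline above.
fixed = ("BOOT_IMAGE=(hd0,msdos1)//rhvh-4.4.1.1-0.20200611.0+1/vmlinuz-4.18.0-193.8.1.el8_2.x86_64 "
         "default_hugepagesz=1G hugepagesz=1G hugepages=4 hugepagesz=2M hugepages=1024 "
         "crashkernel=200M rootflags=discard")
# Huge page order from the nodectl output in the description, trimmed.
broken = ("nofb hugepagesz=1G intel_iommu=on hugepages=1024 hugepages=4 "
          "default_hugepagesz=1G hugepagesz=2M quiet")

print(hugepage_order_matches(ks_append, fixed))   # True
print(hugepage_order_matches(ks_append, broken))  # False
```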

Comment 20 errata-xmlrpc 2020-08-04 16:22:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (RHV Host (redhat-virtualization-host) 4.4), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:3316

