Description of problem:
As the subject says: if we don't set nodeset in memoryBacking, no error is reported; with nodeset set, the guest fails to start.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Allocate 1G hugepages for NUMA nodes 0 and 1:
echo 10 > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
echo 10 > /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages
2. Install a RHEL 7.6 guest with virt-install. No matter which nodeset we set, '0' or '1', installation fails:
virt-install --os-variant=rhel7 --name=rhel7.6_nonrt \
  --memory=8192,hugepages=yes \
  --memorybacking hugepages=yes,size=2,unit=M,nodeset=0,locked=yes \
  --numatune=0 \
  --vcpus=6,cpuset=30,31,29,27,25,23 \
  --disk path=/home/images_nfv-virt-rt-kvm/rhel7.6_nonrt.qcow2,bus=virtio,cache=none,format=qcow2,io=threads,size=20 \
  -l http://download.eng.pek2.redhat.com/pub/rhel/nightly/RHEL-7.6-20180612.n.0/compose/Server/x86_64/os/ \
  -x ks=http://10.66.9.128/kickstart-rhel7.cfg \
  --network bridge=switch,model=virtio,mac=28:66:da:5f:dd:31
3. Install a RHEL 7.6 guest with virt-install without setting nodeset ('--memorybacking hugepages=yes,size=2,unit=M,locked=yes'); installation succeeds.
Result after step 2:
Retrieving file vmlinuz... | 6.2 MB 00:00:00
Retrieving file initrd.img... | 52 MB 00:00:00
ERROR hugepages: node 0 not found
Domain installation does not appear to have been successful.
If it was, you can restart your domain by running:
virsh --connect qemu:///system start rhel7.6_nonrt
otherwise, please restart your installation.
Result after step 3: no error reported.
Some settings on our host:
(1)# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.10.0-903.el7.x86_64 root=/dev/mapper/rhel_dell--per730--27-root ro crashkernel=auto rd.lvm.lv=rhel_dell-per730-27/root rd.lvm.lv=rhel_dell-per730-27/swap console=ttyS0,115200n81 default_hugepagesz=1G iommu=pt intel_iommu=on skew_tick=1 nohz=on nohz_full=1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,30,28,26,24,22,20,18,16 rcu_nocbs=1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,30,28,26,24,22,20,18,16 tuned.non_isolcpus=00005555 intel_pstate=disable nosoftlockup
(2)# numactl --show
preferred node: current
physcpubind: 0 2 4 6 8 10 12 14
membind: 0 1
(3)# cat /proc/meminfo |grep Huge
AnonHugePages: 4096 kB
Hugepagesize: 1048576 kB
(4) The 'virsh define' command works well when nodeset is set.
Created attachment 1452313 [details]
KVM-RT testing also hit this problem:
# virsh start rhel7.6_rt_8vcpu
error: Failed to start domain rhel7.6_rt_8vcpu
error: hugepages: node 0 not found
(In reply to Pei Zhang from comment #2)
> Created attachment 1452313 [details]
> VM XML
> KVM-RT testing also hit this problem:
> # virsh start rhel7.6_rt_8vcpu
> error: Failed to start domain rhel7.6_rt_8vcpu
> error: hugepages: node 0 not found
This should be a regression, since libvirt-3.9.0-14.el7.x86_64 works well.
Updating the title accordingly.
The issue doesn't happen in libvirt-3.9.0-14.el7_5.6.x86_64.
And in libvirt-4.4.0-2.el7.x86_64, if we add XML like the line below inside
<cpu mode='host-passthrough' check='none'>...</cpu>,
the domain can be started without error.
<cell id='0' cpus='1' memory='976563' unit='KiB' memAccess='shared'/>
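For context, libvirt expects cell elements inside a <numa> block within the cpu element; a minimal sketch of the workaround XML, assuming that wrapper (the cell values are taken from the comment above, the rest of the topology is elided):

```xml
<cpu mode='host-passthrough' check='none'>
  <numa>
    <!-- defining one guest NUMA cell makes nodeset=0 in memoryBacking resolvable -->
    <cell id='0' cpus='1' memory='976563' unit='KiB' memAccess='shared'/>
  </numa>
</cpu>
```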
Do you have an update on this issue? This issue seems very impactful for VMs using hugepages.
Have you guys tried dropping nodeset= from the 'page' configuration? For example:
<page size='1048576' unit='KiB'/>
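For completeness, the corresponding memoryBacking section without nodeset would look roughly like this (a sketch based on the reproducer's 1G pages and locked memory, not taken verbatim from the attached XML):

```xml
<memoryBacking>
  <hugepages>
    <!-- no nodeset= attribute: pages apply to all guest memory -->
    <page size='1048576' unit='KiB'/>
  </hugepages>
  <locked/>
</memoryBacking>
```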
I'm under the impression that nodeset in 'page' specifies the NUMA node in the *guest*, not in the host ('numatune' specifies the NUMA node in the host). And I'd guess that the guests that are no longer starting are not NUMA guests.
So, libvirt-4.4.0 now fails to start a guest if nodeset is set in 'page' and the guest is not configured as a NUMA guest. Previous versions worked just fine; they were probably ignoring nodeset=.
If this is correct, then this looks like a serious regression, since it will break working guests (note: I've seen OpenStack setting nodeset= for hugepages too, so I can foresee a huge breakage in 7.6).
Recommendation: revert this change and go back ignoring nodeset for non-NUMA guests.
We have tried dropping nodeset= from the 'page' configuration.
Result: No error reported and the guest can be successfully installed.
So this was introduced as a fix for a different bug, BZ 1534418. I need to investigate the libvirt code to see whether it can be fixed another way instead of reverting the change.
Your impression is correct, it specifies the NUMA node inside the guest. One possible fix could be to always consider the guest as having one NUMA node when NUMA is not specified, which would make nodeset=0 work.
(In reply to Pavel Hrdina from comment #10)
> Your impression is correct, it specifies the numa node inside the guest.
> One possible fix could be to consider having always one numa node if numa is
> not specified which would make nodeset=0 working.
Unfortunately this is not a solution. This issue is an XML regression: having nodeset= has always worked, and OSP is using it. Libvirt has to skip checking nodeset= when the guest is UMA (not NUMA).
(In reply to Luiz Capitulino from comment #11)
> (In reply to Pavel Hrdina from comment #10)
> > Your impression is correct, it specifies the numa node inside the guest.
> > One possible fix could be to consider having always one numa node if numa is
> > not specified which would make nodeset=0 working.
> Unfortunately this is not a solution. This issue is a XML regression, having
> nodeset= has always worked and OSP is using it. Libvirt has to skip checking
> nodeset= when the guest is UMA (not NUMA).
We shouldn't skip checking it - this is important validation to catch application configuration mistakes. I think we could make an exception here if the guest does *not* have NUMA specified and nodeset=XXX attribute has the value 0, then we can trivially allow it. If nodeset has any non-zero value, however, this is a clear application mistake - likely a sign that they've confused host & guest nodes, so definitely should raise an error.
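As a rough illustration of the exception proposed above (a sketch only, with a hypothetical helper name; this is not libvirt's actual validation code):

```c
#include <stdbool.h>

/* Hypothetical helper sketching the proposed rule: if the guest defines
 * NUMA cells, nodeset is validated against those cells elsewhere; for a
 * UMA guest, only an unset nodeset (-1 here) or nodeset=0 is tolerated,
 * and any non-zero value is rejected as a likely host/guest node
 * mix-up. */
static bool
hugepage_nodeset_acceptable(bool guest_has_numa_cells, int nodeset)
{
    if (guest_has_numa_cells)
        return true;      /* checked against the defined <numa> cells elsewhere */
    return nodeset <= 0;  /* UMA guest: allow unset or node 0 only */
}
```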
So, let's separate what the regression is and what we might want to actually fix.
Since any nodeset= value used to work, the regression includes any nodeset= value. I, for one, have XMLs with nodeset=1, and the bug certainly reproduces for me.
Now, if we can guarantee that OSP has nodeset=0 hardcoded somehow, and if libvirt only cares about OSP, then I'd find fixing only the nodeset=0 case acceptable (although I'd vote for entirely fixing the regression, since nodeset= is irrelevant if the guest is UMA).
Finally, I think this is all 50% my fault. I confused host and guest nodes for this setting and only learned of my mistake when this BZ was filed. However, the damage was already done, since this is documented in the KVM-RT docs. The other 50% is libvirt not enforcing this before, and the XML documentation, which is certainly not clear enough (and should be fixed too).
Upstream patches posted:
Thanks a lot Pavel, it's awesome to see this moving forward!
Can you just confirm that it's the nodeset=0 case that we're fixing?
Author: Pavel Hrdina <firstname.lastname@example.org>
Date: Wed Aug 8 17:03:40 2018 +0200
conf: Introduce virDomainDefPostParseMemtune
Hi Luiz, yes, it's the 'nodeset=0' case that we are fixing.
Verified with libvirt-4.5.0-7.virtcov.el7.x86_64 & qemu-kvm-rhev-2.12.0-7.el7.x86_64. The domain can be created successfully.
virt-install --os-variant=rhel7 --name=rhel7.6_nonrt1 \
  --memory=1536,hugepages=yes \
  --memorybacking hugepages=yes,size=2,unit=M,nodeset=0,locked=yes \
  --numatune=0 \
  --vcpus=6,cpuset=0,1,2,3,4,5,6 \
  --disk path=/var/lib/libvirt/images/test1.qcow2,bus=virtio,cache=none,format=qcow2,io=threads,size=20 \
  -l http://download.eng.pek2.redhat.com/pub/rhel/nightly/RHEL-7.6-20180911.n.1/compose/Server/x86_64/os/ \
  -x ks=http://10.66.9.128/kickstart-rhel7.cfg
Retrieving file vmlinuz... | 6.3 MB 00:00:00
Retrieving file initrd.img... | 52 MB 00:00:00
Allocating 'test1.qcow2' | 20 GB 00:00:04
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.