Bug 1591235
Summary: virt-install/virsh reports 'node 0/1 not found' error when specify nodeset in memorybacking
Product: Red Hat Enterprise Linux 7
Component: libvirt
Version: 7.6
Status: CLOSED ERRATA
Severity: urgent
Priority: unspecified
Reporter: Sitong Liu <siliu>
Assignee: Pavel Hrdina <phrdina>
QA Contact: Jing Qi <jinqi>
CC: berrange, chayang, juzhang, lcapitulino, lhuang, lmen, pezhang, phrdina, siliu, xuzhang, yalzhang
Keywords: Regression
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Fixed In Version: libvirt-4.5.0-7.el7
Last Closed: 2018-10-30 09:56:58 UTC
Type: Bug
Bug Blocks: 1615461

Description
Sitong Liu
2018-06-14 11:02:41 UTC
Created attachment 1452313 [details]
VM XML
KVM-RT testing also hit this problem:
# virsh start rhel7.6_rt_8vcpu
error: Failed to start domain rhel7.6_rt_8vcpu
error: hugepages: node 0 not found

(In reply to Pei Zhang from comment #2)
> Created attachment 1452313 [details]
> VM XML
>
> KVM-RT testing also hit this problem:
>
> # virsh start rhel7.6_rt_8vcpu
> error: Failed to start domain rhel7.6_rt_8vcpu
> error: hugepages: node 0 not found

Versions:
3.10.0-904.rt56.850.el7.x86_64
libvirt-4.4.0-2.el7.x86_64
tuned-2.9.0-1.el7.noarch
qemu-kvm-rhev-2.12.0-3.el7.x86_64

This should be a regression bug, since libvirt-3.9.0-14.el7.x86_64 works well. Update the title accordingly.

The issue doesn't happen in libvirt-3.9.0-14.el7_5.6.x86_64. In libvirt-4.4.0-2.el7.x86_64, if we add XML like the following inside <cpu mode='host-passthrough' check='none'>...</cpu>, the domain can be started without error:

<numa>
  <cell id='0' cpus='1' memory='976563' unit='KiB' memAccess='shared'/>
  ...
</numa>

Pavel, do you have an update on this issue? It seems very impactful for VMs using hugepages.

Have you guys tried dropping nodeset= from the 'page' configuration? For example:

<memoryBacking>
  <hugepages>
    <page size='1048576' unit='KiB'/>
  </hugepages>
  <locked/>
</memoryBacking>

I'm under the impression that nodeset in 'page' specifies the NUMA node in the *guest*, not in the host ('memtune' specifies the NUMA node in the host). And I'd guess that the guests that are not starting anymore are not NUMA guests.

So, libvirt-4.4.0 now fails to start a guest if nodeset is set in 'page' and the guest is not configured to be a NUMA guest. Previous versions worked just fine, probably because they ignored nodeset=.

If this is correct, then this looks like a serious regression, since it will break working guests (note: I've seen OpenStack setting nodeset= for hugepages too, so I can foresee a huge breakage in 7.6).

Recommendation: revert this change and go back to ignoring nodeset for non-NUMA guests.
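
For anyone who needs to apply the "drop nodeset=" workaround to existing domain XML before a fixed libvirt is available, here is a minimal sketch using only the Python standard library. The function name drop_hugepage_nodeset and the sample XML (including its nodeset='0' value) are assumptions made for illustration; this is not libvirt code and has not been tested against the attached VM XML.

```python
import xml.etree.ElementTree as ET


def drop_hugepage_nodeset(domain_xml: str) -> str:
    """Remove nodeset= from hugepage <page> elements of a non-NUMA guest.

    Hypothetical helper for illustration only; it is not part of libvirt.
    """
    root = ET.fromstring(domain_xml)
    # The guest is NUMA only if <cpu><numa> defines at least one <cell>.
    if root.findall("./cpu/numa/cell"):
        return domain_xml  # guest nodes exist, so nodeset= is meaningful
    for page in root.findall("./memoryBacking/hugepages/page"):
        page.attrib.pop("nodeset", None)
    return ET.tostring(root, encoding="unicode")


# Illustrative input: a non-NUMA guest whose <page> carries nodeset='0'.
example = """<domain type='kvm'>
  <memoryBacking>
    <hugepages>
      <page size='1048576' unit='KiB' nodeset='0'/>
    </hugepages>
    <locked/>
  </memoryBacking>
</domain>"""

cleaned = drop_hugepage_nodeset(example)
print(cleaned)
```

A guest that does define <numa> cells is returned unchanged, since nodeset= then refers to real guest nodes.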

Hi Luiz,

We have tried dropping nodeset= from the 'page' configuration. The command is:

--memorybacking hugepages=yes,size=2,unit=M,locked=yes

Result: no error is reported and the guest installs successfully.

Best regards,
Sitong

So this was introduced as a fix for a different bug, BZ 1534418. I need to investigate the libvirt code to see whether there is another way to fix it instead of reverting the change.

Your impression is correct, it specifies the NUMA node inside the guest. One possible fix could be to always assume a single NUMA node if NUMA is not specified, which would make nodeset=0 work.

(In reply to Pavel Hrdina from comment #10)
> Your impression is correct, it specifies the numa node inside the guest.
> One possible fix could be to consider having always one numa node if numa is
> not specified which would make nodeset=0 working.

Unfortunately this is not a solution. This issue is an XML regression: having nodeset= has always worked and OSP is using it. Libvirt has to skip checking nodeset= when the guest is UMA (not NUMA).

(In reply to Luiz Capitulino from comment #11)
> (In reply to Pavel Hrdina from comment #10)
>
> > Your impression is correct, it specifies the numa node inside the guest.
> > One possible fix could be to consider having always one numa node if numa is
> > not specified which would make nodeset=0 working.
>
> Unfortunately this is not a solution. This issue is a XML regression, having
> nodeset= has always worked and OSP is using it. Libvirt has to skip checking
> nodeset= when the guest is UMA (not NUMA).

We shouldn't skip checking it - this is important validation to catch application configuration mistakes. I think we could make an exception here: if the guest does *not* have NUMA specified and the nodeset=XXX attribute has the value 0, then we can trivially allow it. If nodeset has any non-zero value, however, this is a clear application mistake, likely a sign that they've confused host & guest nodes, so it should definitely raise an error.
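
The exception proposed above can be sketched as a small validation routine. This is only an illustration of the rule as described in this thread, written in Python rather than libvirt's C; the function name validate_hugepage_nodeset and its parameters are hypothetical, and the actual upstream patch may be structured differently.

```python
from typing import Iterable


def validate_hugepage_nodeset(guest_numa_cells: int, nodeset: Iterable[int]) -> None:
    """Raise ValueError if a hugepage nodeset names a missing guest node.

    guest_numa_cells is the number of <cell> elements under <cpu><numa>
    (0 for a non-NUMA guest). Hypothetical helper, not libvirt code.
    """
    # Proposed exception: a guest without NUMA topology is treated as having
    # one implicit node 0, so nodeset=0 is accepted for it.
    valid_nodes = range(max(guest_numa_cells, 1))
    for node in sorted(set(nodeset)):
        if node not in valid_nodes:
            raise ValueError(f"hugepages: node {node} not found")


validate_hugepage_nodeset(0, [0])     # non-NUMA guest, nodeset=0: allowed
validate_hugepage_nodeset(2, [0, 1])  # NUMA guest with two cells: allowed
try:
    validate_hugepage_nodeset(0, [1])  # non-NUMA guest, nodeset=1: rejected
except ValueError as err:
    print(err)
```

Under this rule a non-zero nodeset on a non-NUMA guest still fails, which keeps the check useful for catching host/guest node confusion.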

So, let's separate what the regression is from what we might want to actually fix.

Since any nodeset= value used to work, the regression includes any nodeset= value. I, for one, do have XMLs with nodeset=1 and the bug certainly reproduces for me.

Now, if we can guarantee that OSP has nodeset=0 hardcoded somehow and if libvirt only cares about OSP, then I'd find fixing only the nodeset=0 case acceptable (although I'd vote for fixing the regression entirely, since nodeset= is irrelevant if the guest is UMA).

Finally, I think this is all 50% my fault. I did confuse host and guest nodes for this setting and only learned of my mistake when this BZ was filed. However, the damage was already done since this is documented in the KVM-RT docs. The other 50% is libvirt not enforcing this before, plus the XML documentation, which is certainly not clear enough (and should be fixed too).

Upstream patches posted:

https://www.redhat.com/archives/libvir-list/2018-July/msg00667.html

Thanks a lot Pavel, it's awesome to see this moving forward! Can you just confirm that it's the nodeset=0 case that we're fixing?

Upstream commit:

commit 0a476f152150f62306f9f8d124aa44e4adb9158c
Author: Pavel Hrdina <phrdina>
Date:   Wed Aug 8 17:03:40 2018 +0200

    conf: Introduce virDomainDefPostParseMemtune

Hi Luiz, yes, it's the 'nodeset=0' case that we are fixing.

Verified with libvirt-4.5.0-7.virtcov.el7.x86_64 & qemu-kvm-rhev-2.12.0-7.el7.x86_64. The domain can be created successfully.

virt-install --os-variant=rhel7 --name=rhel7.6_nonrt1 --memory=1536,hugepages=yes --memorybacking hugepages=yes,size=2,unit=M,nodeset=0,locked=yes --numatune=0 --cpus=6,cpuset=0,1,2,3,4,5,6 --disk path=/var/lib/libvirt/images/test1.qcow2,bus=virtio,cache=none,format=qcow2,io=threads,size=20 -l http://download.eng.pek2.redhat.com/pub/rhel/nightly/RHEL-7.6-20180911.n.1/compose/Server/x86_64/os/ -x ks=http://10.66.9.128/kickstart-rhel7.cfg

Starting install...
Retrieving file vmlinuz...      | 6.3 MB 00:00:00
Retrieving file initrd.img...   |  52 MB 00:00:00
Allocating 'test1.qcow2'        |  20 GB 00:00:04
......

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:3113