Bug 2156289
Summary: | Guest doesn't fail to start directly due to an unavailable configuration
---|---
Product: | Red Hat Enterprise Linux 9
Reporter: | Hu Shuai (Fujitsu) <hshuai>
Component: | libvirt
Assignee: | Andrea Bolognani <abologna>
libvirt sub component: | General
QA Contact: | liang cong <lcong>
Status: | CLOSED ERRATA
Severity: | low
Priority: | medium
CC: | abologna, alexander.lougovski, eric.auger, jdenemar, jsuchane, jtomko, lcong, lijin, lmen, mprivozn, virt-maint, yalzhang
Version: | 9.1
Keywords: | AutomationTriaged, Reopened, Triaged
Target Milestone: | rc
Hardware: | All
OS: | Unspecified
Fixed In Version: | libvirt-9.0.0-1.el9
Last Closed: | 2023-05-09 07:27:43 UTC
Type: | Bug
Target Upstream Version: | 9.0.0
Created attachment 1934490 [details]
libvirtd log
(In reply to Hu Shuai (Fujitsu) from comment #0)
> Created attachment 1934489 [details]
> guest xml
>
> Description of problem:
> Guest doesn't fail to start during the configuration check when the
> memory_mode is "strict" and the memory_nodeset for numatune is unavailable.
>
> Version-Release number of selected component (if applicable):
> libvirt-8.0.0-12.module+el8.8.0+17545+95582d4e.aarch64
>
> How reproducible:
> 100%
>
> Steps to Reproduce:
> 1. prepare a guest xml like the attachment (host just has 2 numa nodes)
> ```
> <numatune><memory mode="strict" nodeset="200-300" placement="static" /></numatune>
> ```
> 2. virsh define avocado-vt-vm1.xml && virsh start avocado-vt-vm1
>
> Actual results:
> ```
> # virsh start avocado-vt-vm1
> error: Failed to start domain 'avocado-vt-vm1'
> error: Unable to write to
> '/sys/fs/cgroup/cpuset/machine.slice/machine-qemu\x2d1\x2davocado\x2dvt\x2dvm1.scope/libvirt/emulator/cpuset.mems':
> Numerical result out of range
> ```
>
> Expected results:
> ```
> # virsh start avocado-vt-vm1
> error: Failed to start domain 'avocado-vt-vm1'
> error: internal error: Process exited prior to exec: libvirt: error :
> unsupported configuration: NUMA node 200 is unavailable
> ```
>
> Additional info:
> If the memory_mode is 'interleave', 'preferred', or 'restrictive', it fails
> with the expected result.

I have verified this behavior with upstream libvirt 8.0.0. It's not limited to aarch64: it can be reproduced on x86_64 as well.

Can you please confirm whether you think this is a regression compared to a previous version of RHEL?

From your report, and also based on my own observations, it looks like, regardless of the exact failure, the VM will not be able to start when configured to pin memory to non-existent NUMA nodes. So in that sense this is more of a cosmetic issue than a functional one. Does that sound like a fair assessment?

Anyway, I'm digging and looking for a solution.
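For context, the expected config-time rejection boils down to expanding the `nodeset` range and checking each requested node against the host's NUMA topology before QEMU is launched. A rough Python sketch of that kind of check (the helper names are hypothetical, not libvirt's actual API):

```python
def parse_nodeset(nodeset):
    """Expand a libvirt-style nodeset string such as "0,2,200-300"
    into a sorted list of node IDs."""
    nodes = set()
    for part in nodeset.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            nodes.update(range(int(lo), int(hi) + 1))
        else:
            nodes.add(int(part))
    return sorted(nodes)

def check_nodeset_available(nodeset, host_nodes):
    """Mimic the config-time check: fail fast if any requested
    node is not present on the host."""
    for node in parse_nodeset(nodeset):
        if node not in host_nodes:
            raise ValueError(
                "unsupported configuration: NUMA node %d is unavailable" % node)

# A host with two NUMA nodes rejects nodeset="200-300" up front:
try:
    check_nodeset_available("200-300", host_nodes={0, 1})
except ValueError as e:
    print(e)  # unsupported configuration: NUMA node 200 is unavailable
```

Failing here, before the cgroup write ever happens, is what produces the clear "NUMA node 200 is unavailable" message instead of the opaque "Numerical result out of range" from cpuset.mems.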
Upstream libvirt already behaves properly in the mode=strict case thanks to

```
commit a6929d62cf5ca6bef076876f3354375f3a719df0
Author: Michal Prívozník <mprivozn>
Date:   Tue Feb 22 09:02:17 2022 +0100

    qemu: Don't ignore failure when building default memory backend

    When building the default memory backend (which has id='pc.ram') and no
    guest NUMA is configured then
    qemuBuildMemCommandLineMemoryDefaultBackend() is called. However, its
    return value is ignored which means that on invalid configuration (e.g.
    when non-existent hugepage size was requested) an error is reported into
    the logs but QEMU is started anyway. And while QEMU does error out its
    error message doesn't give much clue what's going on:

      qemu-system-x86_64: Memory backend 'pc.ram' not found

    While at it, introduce a test case. While I could chose a nice looking
    value (e.g. 4MiB) that's exactly what I wanted to avoid, because while
    such value might not be possible on x84_64 it may be possible on other
    arches (e.g. ppc is notoriously known for supporting wide range of HP
    sizes). Let's stick with obviously wrong value of 5MiB.

    Reported-by: Charles Polisher <chas>
    Signed-off-by: Michal Privoznik <mprivozn>
    Reviewed-by: Ján Tomko <jtomko>
```

https://gitlab.com/libvirt/libvirt/-/commit/a6929d62cf5ca6bef076876f3354375f3a719df0

so we'd need to backport that commit.

However, I have noticed that the behavior for mode=restrictive has regressed upstream, and it now presents the same issue reported here. So I'm digging further, to ensure that the behavior is consistent across the board.
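The bug pattern that commit fixes is generic: a builder function reports an error, its return value is ignored, and startup proceeds until it fails later with a far less helpful message. A hypothetical before/after sketch of the control flow (all names invented for illustration; this is not libvirt code):

```python
def build_default_memory_backend(config):
    """Stand-in for the backend builder: reports an error and returns -1
    on invalid configuration instead of raising."""
    if config.get("hugepage_size_mib") == 5:  # an obviously invalid size
        print("error: hugepage size 5 MiB not supported")
        return -1
    return 0

def start_vm_buggy(config):
    # Return value ignored: the error is logged but startup continues,
    # and the user only sees QEMU's obscure follow-on failure.
    build_default_memory_backend(config)
    return "qemu-system-x86_64: Memory backend 'pc.ram' not found"

def start_vm_fixed(config):
    # Propagate the failure so the original, specific error surfaces.
    if build_default_memory_backend(config) < 0:
        raise RuntimeError("invalid memory backend configuration")
    return "started"
```

Before the fix, the invalid-hugepage error only appears in the logs while the user gets the unrelated "Memory backend 'pc.ram' not found"; after it, startup aborts immediately with the real cause.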
>
> From your report, and also based on my own observations, it looks
> like, regardless of the exact failure, the VM will not be able to
> start when configured to pin memory to non-existent NUMA nodes. So in
> that sense this is more of a cosmetic issue than a functional one.
> Does that sound like a fair assessment?
Indeed, from the main description it looks like the error message has changed, but in both cases it seems we get

error: Failed to start domain 'avocado-vt-vm1'

and the guest fails to start either way.
Patches posted upstream.

https://listman.redhat.com/archives/libvir-list/2023-January/236581.html

(In reply to Andrea Bolognani from comment #2)
> Can you please confirm whether you think this is a regression
> compared to a previous version of RHEL?

I tested this on RHEL 8.6 and got the same result. It seems that it's a latent issue.

Env:
DISTRO: RHEL-8.6.0-20220420.3
kernel-4.18.0-372.9.1.el8.aarch64
libvirt-8.0.0-5.module+el8.6.0+14480+c0a3aa0f.aarch64
qemu-kvm-6.2.0-11.module+el8.6.0+14707+5aa4b42d.aarch64

Hi,

please can you confirm that the guest is not started, as suggested by the log:

error: Failed to start domain 'avocado-vt-vm1'

and your concern rather is the confusing error message.

Thanks

Eric

(In reply to Eric Auger from comment #9)
> Hi,
>
> please can you confirm that the guest is not started as suggested by the log:
>
> error: Failed to start domain 'avocado-vt-vm1'

Yes, the guest did not start.

> and your concern rather is the confusing error message.

Sorry for my inaccurate description; the guest does not start successfully. This is a negative test: I gave memory_nodeset an unavailable value, so I expect it to fail to start during the configuration check due to this unavailable value.

(In reply to Hu Shuai (Fujitsu) from comment #10)
> (In reply to Eric Auger from comment #9)
> > please can you confirm that the guest is not started as suggested by the log:
> >
> > error: Failed to start domain 'avocado-vt-vm1'
>
> Yes, the guest did not start.
>
> > and your concern rather is the confusing error message.
>
> Sorry for my inaccurate description, the guest does not start successfully.
> This is a negative test. I gave the memory_nodeset an unavailable value, so
> I wish it failed to start during the configuration check due to this
> unavailable value.

Based on what you just confirmed, that this is a long-standing issue with no functional impact, I don't think a backport is warranted.
The remaining part of the issue will be addressed upstream and will naturally make its way to RHEL 9 through a rebase, but as far as RHEL 8 is concerned I'm inclined to consider it WONTFIX and move on.

Does this sound reasonable?

(In reply to Andrea Bolognani from comment #11)
> Based on what you just confirmed, that this is a long-standing issue
> with no functional impact, I don't think a backport is warranted.
>
> The remaining part of the issue will be addressed upstream and will
> naturally make its way to RHEL 9 through a rebase, but as far as RHEL
> 8 is concerned I'm inclined to consider it WONTFIX and move on.
>
> Does this sound reasonable?

Yes, it's reasonable.

Fix merged upstream.

```
commit e152f0718f70be62fc8773ffeadde29456218680
Author: Andrea Bolognani <abologna>
Date:   Tue Jan 3 18:46:05 2023 +0100

    qemu: Always check nodeset provided to numatune

    Up until commit 629282d88454, using mode=restrictive caused
    virNumaSetupMemoryPolicy() to be called from qemuProcessHook(), and
    that in turn resulted in virNumaNodesetIsAvailable() being called and
    the nodeset being validated.

    After that change, the only validation for the nodeset is the one
    happening in qemuBuildMemoryBackendProps(), which is skipped when
    using mode=restrictive.

    Make sure virNumaNodesetIsAvailable() is called whenever a nodeset
    has been provided by the user, regardless of the mode.

    https://bugzilla.redhat.com/show_bug.cgi?id=2156289

    Signed-off-by: Andrea Bolognani <abologna>
    Reviewed-by: Michal Privoznik <mprivozn>
```

v8.10.0-215-ge152f0718f

As agreed (Comment #11, Comment #12) the fix will NOT be backported to RHEL 8. Accordingly, closing as NEXTRELEASE.

(In reply to Andrea Bolognani from comment #13)
> As agreed (Comment #11, Comment #12) the fix will NOT be backported
> to RHEL 8. Accordingly, closing as NEXTRELEASE.

I hadn't noticed that the bug had been moved to RHEL 9. Since it IS going to be fixed there with the next rebase, reopening and moving to POST.
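The gist of the fix described in the commit message is an invariant: the availability check must run whenever a nodeset has been provided, not only on the code paths taken by some modes. A minimal sketch of that invariant (hypothetical names, not libvirt's internals):

```python
MODES = ("strict", "interleave", "preferred", "restrictive")

def validate_numatune(mode, nodeset, host_nodes):
    """Check the requested nodeset up front for every mode, mirroring
    the behavior restored by commit e152f0718f."""
    if mode not in MODES:
        raise ValueError("unknown numatune mode: %s" % mode)
    if nodeset is None:  # no explicit nodeset: nothing to check
        return
    for node in nodeset:
        if node not in host_nodes:
            raise ValueError("NUMA node %d is unavailable" % node)

# All four modes now reject an out-of-range nodeset identically,
# instead of only some of them catching it at config time:
for mode in MODES:
    try:
        validate_numatune(mode, {200}, host_nodes={0, 1})
    except ValueError as e:
        print(mode, "->", e)
```

The key design point is that validation is hoisted out of the per-mode branches, so a regression in one mode's code path can no longer silently skip the check.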
Can the qa_ack+ be reset on this so we can get release+? It wasn't clear to me whether hshuai can set qa_ack+... Thanks!

1. On rhel8.8 x86_64 build libvirt-8.0.0-13.module+el8.8.0+17719+f18c2d1b.x86_64 this issue is reproducible, but only for "strict" mode.
2. On rhel9.2 x86_64 build libvirt-8.10.0-2.el9.x86_64 this issue is not reproducible, but "restrictive" mode has a similar issue as bug#2137804.

(In reply to liang cong from comment #17)
> 1. on rhel8.8 x86_64 build
> libvirt-8.0.0-13.module+el8.8.0+17719+f18c2d1b.x86_64
> this issue is reproducible, but only for "strict" mode.
> 2. on rhel9.2 x86_64 build libvirt-8.10.0-2.el9.x86_64
> this issue is not reproducible but for "restrictive" mode has similar issue
> as bug#2137804

The fix for mode=restrictive is in libvirt 9.0.0, which should land in RHEL 9 shortly.

Bug 2137804 indeed seems to be about the behavior that I just fixed with the commit mentioned in Comment 13, so I think we should close that bug as a duplicate of this one. Michal, do you agree?

(In reply to Andrea Bolognani from comment #18)
> Michal, do you agree?

I do.

*** Bug 2137804 has been marked as a duplicate of this bug. ***

Verified on rhel9.2 aarch64 with libvirt-9.0.0-1.el9.aarch64.

```
# virsh start avocado-vt-vm1
error: Failed to start domain 'avocado-vt-vm1'
error: unsupported configuration: NUMA node 200 is unavailable
```

Verified on x86_64 build:

```
# rpm -q libvirt qemu-kvm
libvirt-9.0.0-2.el9.x86_64
qemu-kvm-7.2.0-5.el9.x86_64
```

Test steps:
1. Host has 2 numa nodes only.
2. Prepare a guest xml with numatune config like below:
```
...
<numatune>
  <memory mode="strict" nodeset="0-5" />
</numatune>
...
```
3.
Start the guest:

```
# virsh define vm1.xml && virsh start vm1
Domain 'vm1' defined from vm1.xml

error: Failed to start domain 'vm1'
error: unsupported configuration: NUMA node 2 is unavailable
```

Marking it verified according to comment 22 and comment 21.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (libvirt bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2171
Created attachment 1934489 [details]
guest xml

Description of problem:
Guest doesn't fail to start during the configuration check when the memory_mode is "strict" and the memory_nodeset for numatune is unavailable.

Version-Release number of selected component (if applicable):
libvirt-8.0.0-12.module+el8.8.0+17545+95582d4e.aarch64

How reproducible:
100%

Steps to Reproduce:
1. Prepare a guest xml like the attachment (the host has just 2 NUMA nodes):
```
<numatune><memory mode="strict" nodeset="200-300" placement="static" /></numatune>
```
2. virsh define avocado-vt-vm1.xml && virsh start avocado-vt-vm1

Actual results:
```
# virsh start avocado-vt-vm1
error: Failed to start domain 'avocado-vt-vm1'
error: Unable to write to
'/sys/fs/cgroup/cpuset/machine.slice/machine-qemu\x2d1\x2davocado\x2dvt\x2dvm1.scope/libvirt/emulator/cpuset.mems':
Numerical result out of range
```

Expected results:
```
# virsh start avocado-vt-vm1
error: Failed to start domain 'avocado-vt-vm1'
error: internal error: Process exited prior to exec: libvirt: error :
unsupported configuration: NUMA node 200 is unavailable
```

Additional info:
If the memory_mode is 'interleave', 'preferred', or 'restrictive', it fails with the expected result.