Bug 2156289 - Guest doesn't fail to start directly due to an unavailable configuration
Summary: Guest doesn't fail to start directly due to an unavailable configuration
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: libvirt
Version: 9.1
Hardware: All
OS: Unspecified
Priority: medium
Severity: low
Target Milestone: rc
Assignee: Andrea Bolognani
QA Contact: liang cong
URL:
Whiteboard:
Duplicates: 2137804
Depends On:
Blocks:
 
Reported: 2022-12-26 08:47 UTC by Hu Shuai (Fujitsu)
Modified: 2023-05-09 08:13 UTC
CC List: 12 users

Fixed In Version: libvirt-9.0.0-1.el9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-05-09 07:27:43 UTC
Type: Bug
Target Upstream Version: 9.0.0
Embargoed:


Attachments (Terms of Use)
guest xml (4.34 KB, application/xml), 2022-12-26 08:47 UTC, Hu Shuai (Fujitsu)
libvirtd log (1.60 MB, text/plain), 2022-12-26 08:48 UTC, Hu Shuai (Fujitsu)


Links
Red Hat Issue Tracker RHELPLAN-143183, last updated 2022-12-26 08:49:46 UTC
Red Hat Product Errata RHBA-2023:2171, last updated 2023-05-09 07:29:09 UTC

Description Hu Shuai (Fujitsu) 2022-12-26 08:47:27 UTC
Created attachment 1934489 [details]
guest xml

Description of problem:
Guest doesn't fail to start during the configuration check when the memory_mode is "strict" and the memory_nodeset for numatune is unavailable.

Version-Release number of selected component (if applicable):
libvirt-8.0.0-12.module+el8.8.0+17545+95582d4e.aarch64

How reproducible:
100%

Steps to Reproduce:
1. Prepare a guest XML like the attachment (the host has only 2 NUMA nodes), with a numatune element such as:
```
<numatune><memory mode="strict" nodeset="200-300" placement="static" /></numatune>
```
2. virsh define avocado-vt-vm1.xml && virsh start avocado-vt-vm1

Actual results:
```
# virsh start avocado-vt-vm1
error: Failed to start domain 'avocado-vt-vm1'
error: Unable to write to '/sys/fs/cgroup/cpuset/machine.slice/machine-qemu\x2d1\x2davocado\x2dvt\x2dvm1.scope/libvirt/emulator/cpuset.mems': Numerical result out of range
```

Expected results:
```
# virsh start avocado-vt-vm1
error: Failed to start domain 'avocado-vt-vm1'
error: internal error: Process exited prior to exec: libvirt:  error : unsupported configuration: NUMA node 200 is unavailable
```

Additional info:
If the memory_mode is 'interleave', 'preferred', or 'restrictive', it fails with the expected result.
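
For reference, the nodeset given in <numatune> can only refer to NUMA nodes that the host actually exposes, which is why "200-300" is unavailable on a two-node host. A quick way to check the available nodes (a sketch; the commands are standard tools, but the output lines shown are illustrative for a two-node host):
```
# NUMA nodes exposed by the kernel (the output line below is illustrative)
numactl --hardware | grep available
#   available: 2 nodes (0-1)

# libvirt's view of the host topology
virsh capabilities | grep "<cells num"
#   <cells num='2'>
```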

Comment 1 Hu Shuai (Fujitsu) 2022-12-26 08:48:53 UTC
Created attachment 1934490 [details]
libvirtd log

Comment 2 Andrea Bolognani 2023-01-03 13:32:43 UTC
(In reply to Hu Shuai (Fujitsu) from comment #0)
> Created attachment 1934489 [details]
> guest xml
> 
> Description of problem:
> Guest doesn't fail to start during the configuration check when the
> memory_mode is "strict" and the memory_nodeset for numatune is unavailable.
> 
> Version-Release number of selected component (if applicable):
> libvirt-8.0.0-12.module+el8.8.0+17545+95582d4e.aarch64
> 
> How reproducible:
> 100%
> 
> Steps to Reproduce:
> 1. prepare a guest xml like the attachment(host just have 2 numa nodes)
> ```
> <numatune><memory mode="strict" nodeset="200-300" placement="static"
> /></numatune>
> ```
> 2. virsh define avocado-vt-vm1.xml && virsh start avocado-vt-vm1
> 
> Actual results:
> ```
> # virsh start avocado-vt-vm1
> error: Failed to start domain 'avocado-vt-vm1'
> error: Unable to write to
> '/sys/fs/cgroup/cpuset/machine.slice/machine-
> qemu\x2d1\x2davocado\x2dvt\x2dvm1.scope/libvirt/emulator/cpuset.mems':
> Numerical result out of range
> ```
> 
> Expected results:
> ```
> # virsh start avocado-vt-vm1
> error: Failed to start domain 'avocado-vt-vm1'
> error: internal error: Process exited prior to exec: libvirt:  error :
> unsupported configuration: NUMA node 200 is unavailable
> ```
> 
> Additional info:
> If the memory_mode is 'interleave', 'preferred', or 'restrictive', it fails
> with the expected result.

I have verified this behavior with upstream libvirt 8.0.0. It's not
limited to aarch64: it can be reproduced on x86_64 as well.

Can you please confirm whether you think this is a regression
compared to a previous version of RHEL?

From your report, and also based on my own observations, it looks
like, regardless of the exact failure, the VM will not be able to
start when configured to pin memory to non-existent NUMA nodes. So in
that sense this is more of a cosmetic issue than a functional one.
Does that sound like a fair assessment?

Anyway, I'm digging and looking for a solution.

Comment 3 Andrea Bolognani 2023-01-03 17:13:26 UTC
Upstream libvirt already behaves properly in the mode=strict case
thanks to

  commit a6929d62cf5ca6bef076876f3354375f3a719df0
  Author: Michal Prívozník <mprivozn>
  Date:   Tue Feb 22 09:02:17 2022 +0100

    qemu: Don't ignore failure when building default memory backend
    
    When building the default memory backend (which has id='pc.ram')
    and no guest NUMA is configured then
    qemuBuildMemCommandLineMemoryDefaultBackend() is called. However,
    its return value is ignored which means that on invalid
    configuration (e.g. when non-existent hugepage size was
    requested) an error is reported into the logs but QEMU is started
    anyway. And while QEMU does error out its error message doesn't
    give much clue what's going on:
    
      qemu-system-x86_64: Memory backend 'pc.ram' not found
    
    While at it, introduce a test case. While I could chose a nice
    looking value (e.g. 4MiB) that's exactly what I wanted to avoid,
    because while such value might not be possible on x84_64 it may
    be possible on other arches (e.g. ppc is notoriously known for
    supporting wide range of HP sizes). Let's stick with obviously
    wrong value of 5MiB.
    
    Reported-by: Charles Polisher <chas>
    Signed-off-by: Michal Privoznik <mprivozn>
    Reviewed-by: Ján Tomko <jtomko>
 
  https://gitlab.com/libvirt/libvirt/-/commit/a6929d62cf5ca6bef076876f3354375f3a719df0

so we'd need to backport that commit.

However, I have noticed that the behavior for mode=restrictive has
regressed upstream, and it now presents the same issue reported here.
So I'm digging further, to ensure that the behavior is consistent
across the board.
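
To observe the mode=restrictive variant mentioned above, one possible check is to flip the mode in the reproducer XML and retry. This is only a sketch: the file and domain names are taken from the reproducer in this bug, and the failure mode is described per the comments here rather than re-verified:
```
# Hypothetical restrictive-mode reproducer (names/paths assumed from this bug)
sed 's/mode="strict"/mode="restrictive"/' avocado-vt-vm1.xml > restrictive.xml
virsh define restrictive.xml    # redefines the same domain; the name inside the XML is unchanged
virsh start avocado-vt-vm1      # on affected builds this is expected to fail late with a
                                # confusing error instead of the early nodeset validation error
```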

Comment 4 Eric Auger 2023-01-03 17:25:29 UTC
> 
> From your report, and also based on my own observations, it looks
> like, regardless of the exact failure, the VM will not be able to
> start when configured to pin memory to non-existent NUMA nodes. So in
> that sense this is more of a cosmetic issue than a functional one.
> Does that sound like a fair assessment?

Indeed, from the main description it looks like the error message has changed, but in both cases we seem to get

error: Failed to start domain 'avocado-vt-vm1'

and it sounds like the guest fails to start.

Comment 7 Andrea Bolognani 2023-01-03 18:40:07 UTC
Patches posted upstream.

  https://listman.redhat.com/archives/libvir-list/2023-January/236581.html

Comment 8 Hu Shuai (Fujitsu) 2023-01-04 09:41:39 UTC
(In reply to Andrea Bolognani from comment #2)

> Can you please confirm whether you think this is a regression
> compared to a previous version of RHEL?

I tested this on RHEL 8.6 and got the same result. It seems to be a latent issue.
Env:
  DISTRO: RHEL-8.6.0-20220420.3
  kernel-4.18.0-372.9.1.el8.aarch64
  libvirt-8.0.0-5.module+el8.6.0+14480+c0a3aa0f.aarch64
  qemu-kvm-6.2.0-11.module+el8.6.0+14707+5aa4b42d.aarch64

Comment 9 Eric Auger 2023-01-04 09:54:55 UTC
Hi,

please can you confirm that the guest is not started as suggested by the log:

error: Failed to start domain 'avocado-vt-vm1'

and your concern rather is the confusing error message.

Thanks

Eric

Comment 10 Hu Shuai (Fujitsu) 2023-01-04 10:16:40 UTC
(In reply to Eric Auger from comment #9)
> Hi,
> 
> please can you confirm that the guest is not started as suggested by the log:
> 
> error: Failed to start domain 'avocado-vt-vm1'

Yes, the guest did not start.

> and your concern rather is the confusing error message.

Sorry for my inaccurate description: the guest does not start successfully.
This is a negative test. I gave the memory_nodeset an unavailable value, so I expected it
to fail to start during the configuration check because of that unavailable value.

Comment 11 Andrea Bolognani 2023-01-04 12:43:32 UTC
(In reply to Hu Shuai (Fujitsu) from comment #10)
> (In reply to Eric Auger from comment #9)
> > please can you confirm that the guest is not started as suggested by the log:
> > 
> > error: Failed to start domain 'avocado-vt-vm1'
> 
> Yes, the guest did not start.
> 
> > and your concern rather is the confusing error message.
> 
> Sorry for my inaccurate description: the guest does not start successfully.
> This is a negative test. I gave the memory_nodeset an unavailable value, so
> I expected it to fail to start during the configuration check because of
> that unavailable value.

Based on what you just confirmed, that this is a long-standing issue
with no functional impact, I don't think a backport is warranted.

The remaining part of the issue will be addressed upstream and will
naturally make its way to RHEL 9 through a rebase, but as far as RHEL
8 is concerned I'm inclined to consider it WONTFIX and move on.

Does this sound reasonable?

Comment 12 Hu Shuai (Fujitsu) 2023-01-05 02:17:33 UTC
(In reply to Andrea Bolognani from comment #11)

> Based on what you just confirmed, that this is a long-standing issue
> with no functional impact, I don't think a backport is warranted.
> 
> The remaining part of the issue will be addressed upstream and will
> naturally make its way to RHEL 9 through a rebase, but as far as RHEL
> 8 is concerned I'm inclined to consider it WONTFIX and move on.
> 
> Does this sound reasonable?

Yes, it's reasonable.

Comment 13 Andrea Bolognani 2023-01-09 10:15:30 UTC
Fix merged upstream.

  commit e152f0718f70be62fc8773ffeadde29456218680
  Author: Andrea Bolognani <abologna>
  Date:   Tue Jan 3 18:46:05 2023 +0100

    qemu: Always check nodeset provided to numatune
    
    Up until commit 629282d88454, using mode=restrictive caused
    virNumaSetupMemoryPolicy() to be called from qemuProcessHook(),
    and that in turn resulted in virNumaNodesetIsAvailable() being
    called and the nodeset being validated.
    
    After that change, the only validation for the nodeset is the one
    happening in qemuBuildMemoryBackendProps(), which is skipped when
    using mode=restrictive.
    
    Make sure virNumaNodesetIsAvailable() is called whenever a
    nodeset has been provided by the user, regardless of the mode.
    
    https://bugzilla.redhat.com/show_bug.cgi?id=2156289
    
    Signed-off-by: Andrea Bolognani <abologna>
    Reviewed-by: Michal Privoznik <mprivozn>

  v8.10.0-215-ge152f0718f

As agreed (Comment #11, Comment #12) the fix will NOT be backported
to RHEL 8. Accordingly, closing as NEXTRELEASE.
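
For RHEL 9 installations, a quick way to tell whether a given build already contains this fix is to compare the installed package against the "Fixed In Version" field above (a sketch; the output line is illustrative):
```
# The fix ships in libvirt-9.0.0-1.el9 and later (upstream libvirt 9.0.0)
rpm -q libvirt
#   libvirt-9.0.0-1.el9.x86_64
```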

Comment 14 Andrea Bolognani 2023-01-09 15:24:51 UTC
(In reply to Andrea Bolognani from comment #13)
> As agreed (Comment #11, Comment #12) the fix will NOT be backported
> to RHEL 8. Accordingly, closing as NEXTRELEASE.

I hadn't noticed that the bug had been moved to RHEL 9. Since it IS
going to be fixed there with the next rebase, reopening and moving to
POST.

Comment 16 John Ferlan 2023-01-10 17:15:57 UTC
Can the qa_ack+ be reset on this so we can get release+? It wasn't clear to me whether hshuai can set qa_ack+... Thanks!

Comment 17 liang cong 2023-01-11 11:35:54 UTC
1. On the RHEL 8.8 x86_64 build libvirt-8.0.0-13.module+el8.8.0+17719+f18c2d1b.x86_64,
this issue is reproducible, but only for "strict" mode.
2. On the RHEL 9.2 x86_64 build libvirt-8.10.0-2.el9.x86_64,
this issue is not reproducible, but "restrictive" mode has a similar issue; see bug 2137804.

Comment 18 Andrea Bolognani 2023-01-11 18:26:44 UTC
(In reply to liang cong from comment #17)
> 1. On the RHEL 8.8 x86_64 build
> libvirt-8.0.0-13.module+el8.8.0+17719+f18c2d1b.x86_64,
> this issue is reproducible, but only for "strict" mode.
> 2. On the RHEL 9.2 x86_64 build libvirt-8.10.0-2.el9.x86_64,
> this issue is not reproducible, but "restrictive" mode has a similar issue;
> see bug 2137804.

The fix for mode=restrictive is in libvirt 9.0.0, which should land
in RHEL 9 shortly.

Bug 2137804 indeed seems to be about the behavior that I just fixed
with the commit mentioned in Comment 13, so I think we should close
that bug as a duplicate of this one. Michal, do you agree?

Comment 19 Michal Privoznik 2023-01-16 09:51:09 UTC
(In reply to Andrea Bolognani from comment #18)
> Michal, do you agree?

I do.

Comment 20 Andrea Bolognani 2023-01-16 10:50:47 UTC
*** Bug 2137804 has been marked as a duplicate of this bug. ***

Comment 21 Hu Shuai (Fujitsu) 2023-01-17 05:31:08 UTC
Verified on rhel9.2 aarch64 with libvirt-9.0.0-1.el9.aarch64.
```
# virsh start avocado-vt-vm1
error: Failed to start domain 'avocado-vt-vm1'
error: unsupported configuration: NUMA node 200 is unavailable
```

Comment 22 liang cong 2023-01-31 01:58:33 UTC
Verified on x86_64 build:
# rpm -q libvirt qemu-kvm
libvirt-9.0.0-2.el9.x86_64
qemu-kvm-7.2.0-5.el9.x86_64

Test steps:
1. The host has only 2 NUMA nodes.
2. Prepare a guest xml with numatune config like below:
...
  <numatune>
      <memory mode="strict" nodeset="0-5" />
  </numatune>
...
3. Start the guest:
# virsh define vm1.xml && virsh start vm1
Domain 'vm1' defined from vm1.xml

error: Failed to start domain 'vm1'
error: unsupported configuration: NUMA node 2 is unavailable
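
A companion check for the mode=restrictive case discussed in comments 17 and 18 would follow the same pattern. This is only a sketch, not part of the recorded verification, and the expected error is inferred from the fix in comment 13, which validates the nodeset regardless of mode:
```
# Hypothetical restrictive-mode variant of the same negative test:
# change mode="strict" to mode="restrictive" in vm1.xml, then
virsh define vm1.xml && virsh start vm1
# On a fixed build (libvirt >= 9.0.0) the start is expected to fail early with
# the same "unsupported configuration: NUMA node 2 is unavailable" error.
```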

Comment 25 liang cong 2023-02-02 01:31:29 UTC
Marking it verified according to comment 22 and comment 21.

Comment 27 errata-xmlrpc 2023-05-09 07:27:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (libvirt bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2171

