Bug 1890665 - Update numa node value is not applied after the VM restart
Summary: Update numa node value is not applied after the VM restart
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Virt
Version: 4.4.3.7
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ovirt-4.4.5
Target Release: ---
Assignee: Liran Rotenberg
QA Contact: Polina
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-10-22 16:52 UTC by Polina
Modified: 2021-03-18 15:15 UTC
CC: 4 users

Fixed In Version: ovirt-engine-4.4.5.3
Doc Type: Bug Fix
Doc Text:
Cause: Missing values caused failures when editing a running Virtual Machine. Consequence: Saving the updated values failed. Fix: The values are now validated before they are used. Result: CPU and NUMA node updates complete successfully while the Virtual Machine is running.
Clone Of:
Environment:
Last Closed: 2021-03-18 15:15:34 UTC
oVirt Team: Virt
Embargoed:
pm-rhel: ovirt-4.4+
pm-rhel: planning_ack+
ahadas: devel_ack+
mavital: testing_ack+


Attachments
dumpxml before and after restart (40.00 KB, application/x-tar)
2020-10-22 16:52 UTC, Polina
no flags Details
engine.log (693.10 KB, application/gzip)
2021-01-21 10:11 UTC, Polina
no flags Details


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 112747 0 master MERGED core: Fixed issues with Numa Node updating 2021-02-16 17:17:28 UTC
oVirt gerrit 112957 0 master MERGED core+api: fix numa next run update 2021-02-16 17:17:28 UTC

Description Polina 2020-10-22 16:52:13 UTC
Created attachment 1723586 [details]
dumpxml before and after restart

Description of problem: Edit a running VM that is configured with 1 CPU and one pinned NUMA node, and update it to CPU=2, NUMA node count=2. A message appears saying that a restart is needed. After restarting the VM, the CPU is set to 2, but the NUMA node count is set to 0 and no NUMA node is actually pinned.


Version-Release number of selected component (if applicable): ovirt-engine-4.4.3.8-0.1.el8ev.noarch

How reproducible: always

Steps to Reproduce:
1. Run a VM built from latest-rhel-guest-image-8.2-infra, configured with CPU=1, pinned to a host, NUMA Node Count=1 (virsh dump attached).

2. Update the running VM to CPU=2, NUMA Node Count=1. A message appears saying that the NUMA node update requires a VM restart.

3. Restart the VM (shut down, then start).

Actual results: The CPU is set to 2, the NUMA node count is set to 0, and no NUMA node is actually pinned (XML dump attached). This can be seen after shutdown.

Expected results: NUMA Node Count=2


Additional info:
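For reference, the same edit can also be driven through the REST API instead of the UI. The following is only a sketch with a placeholder VM id ({vm_id}) and illustrative values: the vCPU count update goes to the VM resource, while an additional vNUMA node is added through the numanodes sub-collection.

PUT /ovirt-engine/api/vms/{vm_id}
<vm>
  <cpu>
    <topology>
      <sockets>2</sockets>
      <cores>1</cores>
      <threads>1</threads>
    </topology>
  </cpu>
</vm>

POST /ovirt-engine/api/vms/{vm_id}/numanodes
<vm_numa_node>
  <index>1</index>
  <memory>512</memory>
  <cpu>
    <cores>
      <core>
        <index>1</index>
      </core>
    </cores>
  </cpu>
</vm_numa_node>

As with the UI flow, the engine is expected to report that the NUMA change requires a restart, and the new values should take effect on the next VM start.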

Comment 1 Arik 2020-10-26 12:26:59 UTC
The problem is that the NUMA pinning is cleared completely.
The expected result: keep the original CPU pinned (the second CPU should not be pinned unless explicitly requested).

Comment 2 Polina 2020-10-26 12:37:40 UTC
There is a mistake in Step 2 of the description (https://bugzilla.redhat.com/show_bug.cgi?id=1890665#c0). The update is for both the CPU and the NUMA node count; as a result, the NUMA pinning is cleared.

2. Update the running VM to CPU=2, NUMA Node Count=2.

Comment 3 Polina 2020-10-26 13:43:14 UTC
An additional buggy situation in this area. I think it is related to the same code, which is why I am not filing a separate BZ:

Step 1. Send the following request, targeting a host with two NUMA nodes:

POST /ovirt-engine/api/vms?auto_pinning_policy=existing
<vm>
  <name>auto_cpu_vm3</name>
  <template>
    <name>infra-template</name>
  </template>
  <cluster>
    <name>golden_env_mixed_1</name>
  </cluster>
  <cpu>
    <topology>
      <sockets>2</sockets>
      <threads>2</threads>
      <cores>6</cores>
    </topology>
  </cpu>
  <placement_policy>
    <hosts>
      <host>
        <name>host_mixed_1</name>
      </host>
    </hosts>
  </placement_policy>
</vm>

Step 2. Run the VM and run virsh -r dumpxml <vm_id>. The relevant part of the output:

  <numatune>
    <memnode cellid='0' mode='strict' nodeset='1'/>
    <memnode cellid='1' mode='strict' nodeset='0'/>
  </numatune>

Step 3. Update the VM to have 2 CPUs and CPU pinning topology 0#0_1#1. Restart the VM.

Result:
The NUMA tune mode has been changed to 'Interleave' instead of 'Strict', although I did not touch it and expected it to remain 'Strict'.
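For reference, a sketch of the Step 3 update done through the REST API rather than the UI; the VM id is a placeholder and the element layout follows the oVirt cpu_tune/vcpu_pins model. The pinning string 0#0_1#1 corresponds to vCPU 0 on pCPU 0 and vCPU 1 on pCPU 1.

PUT /ovirt-engine/api/vms/{vm_id}
<vm>
  <cpu>
    <topology>
      <sockets>2</sockets>
      <cores>1</cores>
      <threads>1</threads>
    </topology>
    <cpu_tune>
      <vcpu_pins>
        <vcpu_pin>
          <vcpu>0</vcpu>
          <cpu_set>0</cpu_set>
        </vcpu_pin>
        <vcpu_pin>
          <vcpu>1</vcpu>
          <cpu_set>1</cpu_set>
        </vcpu_pin>
      </vcpu_pins>
    </cpu_tune>
  </cpu>
</vm>

The expectation is that such an update leaves the mode='strict' memnode entries in the generated numatune section untouched.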

Comment 4 Polina 2020-12-30 15:03:30 UTC
Note for QE:
During verification the following scenario must be tested. It is currently buggy.
1. Let's say we have a VM with two vNUMA nodes; the tune_mode values do not matter.
GET https://{{host}}/ovirt-engine/api/vms/3379ac72-3c8c-4e20-8872-db600d27736b/numanodes 
shows both nodes OK. 

2. Send update request for the VM:
put https://{{host}}/ovirt-engine/api/vms/3379ac72-3c8c-4e20-8872-db600d27736b
<vm>
    <numa_tune_mode>strict</numa_tune_mode>
</vm>
The value does not matter. The VM is shown in the UI as having pending changes, which is correct.

3. But the same GET request at this step returns an empty response.

Expected: the GET request at this step should return the previous response (the one from before the update, since the VM has not been restarted yet), not an empty one.

Comment 5 Polina 2021-01-21 10:07:57 UTC
verification on ovirt-engine-4.4.4.7-0.1.el8ev.noarch

1. The scenario described in https://bugzilla.redhat.com/show_bug.cgi?id=1890665#c0 (with the correction from comment 2) fails with an engine internal error (log attached):

   2021-01-21 11:56:11,862+02 ERROR [org.ovirt.engine.core.bll.UpdateVmCommand] (default task-11) [084e5aba-82fa-4394-9d31-44657c6ed503] Command 'org.ovirt.engine.core.bll.UpdateVmCommand' failed: null
   2021-01-21 11:56:11,863+02 ERROR [org.ovirt.engine.core.bll.UpdateVmCommand] (default task-11) [084e5aba-82fa-4394-9d31-44657c6ed503] Exception: java.lang.NullPointerException


2. The scenario described in https://bugzilla.redhat.com/show_bug.cgi?id=1890665#c3 works

3. The scenario described in https://bugzilla.redhat.com/show_bug.cgi?id=1890665#c4 doesn't work.
     - Start a VM with two vNUMA nodes; the tune_mode values do not matter.
     GET https://{{host}}/ovirt-engine/api/vms/3379ac72-3c8c-4e20-8872-db600d27736b/numanodes 
     shows both nodes OK. 
     - Send update request for the VM:
     put https://{{host}}/ovirt-engine/api/vms/3379ac72-3c8c-4e20-8872-db600d27736b
     <vm>
         <numa_tune_mode>strict</numa_tune_mode>
     </vm>
     The value does not matter. The VM is shown in the UI as having pending changes, which is correct.
     - But the same GET request at this step returns an empty response

Please let me know whether I should report a new bug for the issue described in item 1, or re-assign this one.
Also, is the issue described in item 3 within the scope of this bug?

Comment 6 Polina 2021-01-21 10:11:37 UTC
Created attachment 1749335 [details]
engine.log

Comment 7 Liran Rotenberg 2021-01-21 18:13:22 UTC
Switching to POST. 
With the new patch, the first scenario in comment #0 and the scenario in comment #4 work for me.

Regarding your note in comment #4:
This requires more effort. Once we create a next_run configuration, we clear the VmNumaNodes of that VM at the DB level. On shutdown, the VM is updated to the next_run configuration.
What I discovered is that with multiple next_run updates, the configuration eventually (on the second update) becomes empty, and on shutdown the VM is set to that empty configuration! The VM NUMA nodes did not persist across multiple updates.

Regarding seeing the value: the current configuration should appear unchanged (since this is a next_run update). To see the next_run value on the VM, you need to query it:
GET https://{{host}}/ovirt-engine/api/vms/3379ac72-3c8c-4e20-8872-db600d27736b?next_run=true

In GET https://{{host}}/ovirt-engine/api/vms/3379ac72-3c8c-4e20-8872-db600d27736b/numanodes you will see the nodes as if they did not change. But once the VM shuts down, they will change to the next_run value you set.
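In other words, a verification sequence along these lines (VM id shown as a placeholder) should distinguish the current configuration from the pending one:

GET /ovirt-engine/api/vms/{vm_id}/numanodes
  -> returns the currently applied vNUMA nodes (unchanged until the VM is restarted)

GET /ovirt-engine/api/vms/{vm_id}?next_run=true
  -> returns the VM definition that will take effect after shutdown, including the updated NUMA tune mode

After shutting the VM down and starting it again, the first GET should reflect the values that were previously visible only in the next_run view.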

Comment 8 RHEL Program Management 2021-01-21 18:13:30 UTC
Target release should be placed once a package build is known to fix an issue. Since this bug is not in MODIFIED status, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.

Comment 9 Polina 2021-02-10 10:21:11 UTC
verified on ovirt-engine-4.4.5.2-0.1.el8ev.noarch

Comment 10 Sandro Bonazzola 2021-03-18 15:15:34 UTC
This bug is included in the oVirt 4.4.5 release, published on March 18th 2021.

Since the problem described in this bug report should be resolved in oVirt 4.4.5 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

