Bug 1462676

Summary: CPU hotplug configuration in RHV-M sometimes does not work when it should
Product: [oVirt] ovirt-engine
Reporter: jiyan <jiyan>
Component: BLL.Virt
Assignee: Michal Skrivanek <michal.skrivanek>
Status: CLOSED NOTABUG
QA Contact: meital avital <mavital>
Severity: medium
Docs Contact:
Priority: medium
Version: 4.1.3.2
CC: bugs, dyuan, jiyan, lmen, tjelinek, xuzhang
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-06-29 08:13:42 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Virt
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Attachments: Logs for Step3 (flags: none)

Description jiyan 2017-06-19 09:08:04 UTC
Created attachment 1289052 [details]
Logs for Step3

Description of problem:
CPU hotplug configuration in RHV-M sometimes does not work when it should.

Version-Release number of selected component (if applicable):
RHV-M server:
rhevm-4.1.3.2-0.1.el7.noarch
ovirt-engine-setup-plugin-ovirt-engine-4.1.3.2-0.1.el7.noarch

RHV-M registered host:
qemu-kvm-rhev-2.9.0-9.el7.x86_64
libvirt-3.2.0-9.el7.x86_64
kernel-3.10.0-679.el7.x86_64
vdsm-4.19.18-1.el7ev.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Configure 'Virtual Sockets' as '1' and check the environment as follows:
1.1> In the RHV-M GUI, remove the 'CPU' filter from the 'none' scheduling policy, and set the cluster to use the 'none' scheduling policy.

1.2> In the RHV-M GUI, configure the data center with hosts and storage, then create a new VM called vm1 and confirm that it starts successfully.

1.3> Configure the VM's 'System' settings as follows, and check that the VM starts normally:
  Total Virtual CPUs: 4
  Virtual Sockets: 1
  Cores per Virtual Socket: 1
  Threads per Core: 4

1.4> Check the libvirt dumpxml output on the registered host and run 'lscpu' in the guest:

On the host, check the libvirt dumpxml:
#virsh dumpxml vm1
  <vcpu placement='static' current='4'>64</vcpu>
  <cpu mode='custom' match='exact' check='full'>
    <topology sockets='16' cores='1' threads='4'/>
  </cpu>

In vm/Guest:
#lscpu
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    1
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
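
For reference, the 64 in <vcpu placement='static' current='4'>64</vcpu> above is the domain's maximum vCPU count, i.e. the ceiling for later hotplug, and it is exactly the product of the <topology> values (arithmetic added for clarity; not captured in the original report):

# echo $((16 * 1 * 4))   # sockets x cores x threads
64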


2. CPU-hotplug 'Virtual Sockets' to 2 -- succeeds
2.1> After Step 1, with the VM still running, change the VM's 'System' settings as follows:
  Total Virtual CPUs: 8
  Virtual Sockets: 2
  Cores per Virtual Socket: 1
  Threads per Core: 4

2.2> Check the libvirt dumpxml output on the registered host and run 'lscpu' in the guest:

On the host, check the libvirt dumpxml:
#virsh dumpxml vm1
    <vcpu placement='static' current='8'>64</vcpu>
  <vcpus>
    <vcpu id='0' enabled='yes' hotpluggable='no' order='1'/>
    <vcpu id='1' enabled='yes' hotpluggable='no' order='2'/>
    <vcpu id='2' enabled='yes' hotpluggable='no' order='3'/>
    <vcpu id='3' enabled='yes' hotpluggable='no' order='4'/>
    <vcpu id='4' enabled='yes' hotpluggable='yes' order='5'/>
    <vcpu id='5' enabled='yes' hotpluggable='yes' order='6'/>
    <vcpu id='6' enabled='yes' hotpluggable='yes' order='7'/>
    <vcpu id='7' enabled='yes' hotpluggable='yes' order='8'/>
    <vcpu id='8' enabled='no' hotpluggable='yes'/>
    ...
    <vcpu id='63' enabled='no' hotpluggable='yes'/>
  </vcpus>
  <cpu mode='custom' match='exact' check='full'>
    <topology sockets='16' cores='1' threads='4'/>
  </cpu>

In vm/Guest:
#lscpu
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    1
Core(s) per socket:    4
Socket(s):             2
NUMA node(s):          1
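
As an optional cross-check on the host, 'virsh vcpucount' summarizes maximum vs. currently plugged vCPUs; the values below are inferred from the XML above rather than captured in this report:

# virsh vcpucount vm1
maximum      config        64
maximum      live          64
current      config         8
current      live           8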


3. CPU-hotplug 'Virtual Sockets' to 4 -- fails
3.1> After Step 1, with the VM still running, change the VM's 'System' settings as follows:
  Total Virtual CPUs: 16
  Virtual Sockets: 4
  Cores per Virtual Socket: 1
  Threads per Core: 4

3.2> The following error is raised:
Error while executing action: 
vm1:
CPU_HOTPLUG_TOPOLOGY_INVALID


Actual results:
The hotplug fails with CPU_HOTPLUG_TOPOLOGY_INVALID, as Step 3.2 shows.

Expected results:
The hotplug should succeed, as it does in Step 2.

Additional info:
The attachment includes the following logs:
log1/RHV-server-engine.log
log1/RHV-host-libvirtd.log
log1/RHV-host-qemu-vm1.log
log1/RHV-host-vdsm.log

Comment 1 Tomas Jelinek 2017-06-21 10:37:01 UTC
(In reply to jiyan from comment #0)
> Created attachment 1289052 [details]
> Logs for Step3
> 
> Description of problem:
> CPU hotplug configuration in RHV-M sometimes does not work when it should.
> 
> Version-Release number of selected component (if applicable):
> RHV-M server:
> rhevm-4.1.3.2-0.1.el7.noarch
> ovirt-engine-setup-plugin-ovirt-engine-4.1.3.2-0.1.el7.noarch
> 
> RHV-M registered host:
> qemu-kvm-rhev-2.9.0-9.el7.x86_64
> libvirt-3.2.0-9.el7.x86_64
> kernel-3.10.0-679.el7.x86_64
> vdsm-4.19.18-1.el7ev.x86_64
> 
> How reproducible:
> 100%
> 
> Steps to Reproduce:
> 1. Configure 'Virtual Sockets' as '1' and check the environment as follows:
> 1.1> In the RHV-M GUI, remove the 'CPU' filter from the 'none' scheduling
> policy, and set the cluster to use the 'none' scheduling policy.
> 
> 1.2> In the RHV-M GUI, configure the data center with hosts and storage,
> then create a new VM called vm1 and confirm that it starts successfully.
> 
> 1.3> Configure the VM's 'System' settings as follows, and check that the
> VM starts normally:
>   Total Virtual CPUs: 4
>   Virtual Sockets: 1
>   Cores per Virtual Socket: 1
>   Threads per Core: 4
> 
> 1.4> Check the libvirt dumpxml output on the registered host and run
> 'lscpu' in the guest:
> 
> On the host, check the libvirt dumpxml:
> #virsh dumpxml vm1
>   <vcpu placement='static' current='4'>64</vcpu>
>   <cpu mode='custom' match='exact' check='full'>
>     <topology sockets='16' cores='1' threads='4'/>
>   </cpu>
> 
> In vm/Guest:
> #lscpu
> CPU(s):                4
> On-line CPU(s) list:   0-3
> Thread(s) per core:    1

The reason for this is that the host is an AMD machine (see bug 1462183).

> Core(s) per socket:    4
> Socket(s):             1
> NUMA node(s):          1
> 
> 
> 2. CPU-hotplug 'Virtual Sockets' to 2 -- succeeds
> 2.1> After Step 1, with the VM still running, change the VM's 'System'
> settings as follows:
>   Total Virtual CPUs: 8
>   Virtual Sockets: 2
>   Cores per Virtual Socket: 1
>   Threads per Core: 4
> 
> 2.2> Check the libvirt dumpxml output on the registered host and run
> 'lscpu' in the guest:
> 
> On the host, check the libvirt dumpxml:
> #virsh dumpxml vm1
>     <vcpu placement='static' current='8'>64</vcpu>
>   <vcpus>
>     <vcpu id='0' enabled='yes' hotpluggable='no' order='1'/>
>     <vcpu id='1' enabled='yes' hotpluggable='no' order='2'/>
>     <vcpu id='2' enabled='yes' hotpluggable='no' order='3'/>
>     <vcpu id='3' enabled='yes' hotpluggable='no' order='4'/>
>     <vcpu id='4' enabled='yes' hotpluggable='yes' order='5'/>
>     <vcpu id='5' enabled='yes' hotpluggable='yes' order='6'/>
>     <vcpu id='6' enabled='yes' hotpluggable='yes' order='7'/>
>     <vcpu id='7' enabled='yes' hotpluggable='yes' order='8'/>
>     <vcpu id='8' enabled='no' hotpluggable='yes'/>
>     ...
>     <vcpu id='63' enabled='no' hotpluggable='yes'/>
>   </vcpus>
>   <cpu mode='custom' match='exact' check='full'>
>     <topology sockets='16' cores='1' threads='4'/>
>   </cpu>
> 
> In vm/Guest:
> #lscpu
> CPU(s):                8
> On-line CPU(s) list:   0-7
> Thread(s) per core:    1
> Core(s) per socket:    4
> Socket(s):             2
> NUMA node(s):          1
> 
> 
> 3. CPU-hotplug 'Virtual Sockets' to 4 -- fails
> 3.1> After Step 1, with the VM still running, change the VM's 'System'
> settings as follows:
>   Total Virtual CPUs: 16
>   Virtual Sockets: 4
>   Cores per Virtual Socket: 1
>   Threads per Core: 4
> 
> 3.2> The following error is raised:
> Error while executing action: 
> vm1:
> CPU_HOTPLUG_TOPOLOGY_INVALID

This happens when the host has fewer CPUs than you try to hotplug.
E.g., does your host have at least 16 CPUs?
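
(Either of the following, run on the host, answers this; the commands are suggestions, not taken from this bug:)

# lscpu | grep '^CPU(s):'
# virsh nodeinfo | grep 'CPU(s)'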

> 
> 
> Actual results:
> The hotplug fails with CPU_HOTPLUG_TOPOLOGY_INVALID, as Step 3.2 shows.
> 
> Expected results:
> The hotplug should succeed, as it does in Step 2.
> 
> Additional info:
> The attachment includes the following logs:
> log1/RHV-server-engine.log
> log1/RHV-host-libvirtd.log
> log1/RHV-host-qemu-vm1.log
> log1/RHV-host-vdsm.log

Comment 2 jiyan 2017-06-27 11:32:42 UTC
Hi, Tomas.

I tested the same scenario in a different environment.


The physical host CPU info is as follows:
# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                24
On-line CPU(s) list:   0-23
Thread(s) per core:    2
Core(s) per socket:    6
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
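
The host's 24 logical CPUs follow directly from this topology; that count is what the hotplug attempts below are compared against (arithmetic added for clarity):

# echo $((2 * 6 * 2))   # sockets x cores/socket x threads/core
24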

The test steps are as follows:
Step 1:
1.1> Configure the VM's 'System' settings as follows, and check that the VM starts normally:
Total Virtual CPUs: 12
Virtual Sockets: 1
Cores per Virtual Socket: 6
Threads per Core: 2

1.2> Check the libvirt dumpxml output on the registered host and run 'lscpu' in the guest:

On the host, check the libvirt dumpxml:
#virsh dumpxml vm1
<vcpu placement='static' current='12'>192</vcpu>
  <cpu mode='custom' match='exact' check='full'>
    <topology sockets='16' cores='6' threads='2'/>
    <numa>
      <cell id='0' cpus='0-11' memory='1048576' unit='KiB'/>
    </numa>
  </cpu>
 
In vm/Guest:
# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                12
On-line CPU(s) list:   0-11
Thread(s) per core:    2
Core(s) per socket:    6
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel

Step 2:
2.1> Configure the VM's 'System' settings as follows, and check that the VM starts normally:
Total Virtual CPUs: 48
Virtual Sockets: 4
Cores per Virtual Socket: 6
Threads per Core: 2

2.2> Check the libvirt dumpxml output on the registered host and run 'lscpu' in the guest:
On the host, check the libvirt dumpxml:
#virsh dumpxml vm1
 <vcpu placement='static' current='48'>192</vcpu>
  <vcpus>
    <vcpu id='0' enabled='yes' hotpluggable='no' order='1'/>
    <vcpu id='1' enabled='yes' hotpluggable='no' order='2'/>
    <vcpu id='2' enabled='yes' hotpluggable='no' order='3'/>
...
    <vcpu id='11' enabled='yes' hotpluggable='no' order='12'/>
    <vcpu id='12' enabled='yes' hotpluggable='yes' order='13'/>
    <vcpu id='13' enabled='yes' hotpluggable='yes' order='14'/>
...
    <vcpu id='45' enabled='yes' hotpluggable='yes' order='46'/>
    <vcpu id='46' enabled='yes' hotpluggable='yes' order='47'/>
    <vcpu id='47' enabled='yes' hotpluggable='yes' order='48'/>
    <vcpu id='48' enabled='no' hotpluggable='yes'/>
    <vcpu id='49' enabled='no' hotpluggable='yes'/>
...
    <vcpu id='189' enabled='no' hotpluggable='yes'/>
    <vcpu id='190' enabled='no' hotpluggable='yes'/>
    <vcpu id='191' enabled='no' hotpluggable='yes'/>
  </vcpus>

  <cpu mode='custom' match='exact' check='full'>
    <topology sockets='16' cores='6' threads='2'/>
    <numa>
      <cell id='0' cpus='0-11' memory='1048576' unit='KiB'/>
    </numa>
  </cpu>

In vm/Guest:
# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                48
On-line CPU(s) list:   0-47
Thread(s) per core:    2
Core(s) per socket:    6
Socket(s):             4
NUMA node(s):          1
Vendor ID:             GenuineIntel



>This happens when the host has fewer CPUs than you try to hotplug.
>The 'cpu' filter is not related to CPU hotplug; hotplug enforces this limit
>and leaves the overcommit considerations to the scheduler.



In the scenario above, the host also has fewer CPUs than the number being hotplugged, yet that seems to work. But in the following scenario, Step 3, it failed. In both cases the host has fewer CPUs than the number being hotplugged, yet one succeeds while the other fails.


When I try the following configuration, the error is raised:
Step 3:
3.1> With the VM still running, change the VM's 'System' settings as follows:
Total Virtual CPUs: 60
Virtual Sockets: 5
Cores per Virtual Socket: 6
Threads per Core: 2

The error info:
Error while executing action: 
vm1:
CPU_HOTPLUG_TOPOLOGY_INVALID
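
For reference (applying the interpretation from comments 1 and 3): this request works out to 60 vCPUs, more than the host's 24 online CPUs, and it is applied as a hotplug; Step 2's 48 vCPUs also exceed 24, but they were set at VM start, where overcommit is allowed.

# echo $((5 * 6 * 2))   # requested vCPUs at hotplug
60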

Comment 3 Tomas Jelinek 2017-06-29 08:13:42 UTC
(In reply to jiyan from comment #2)
> Hi, Tomas.
> 
> I tested the same scenario in a different environment.
> 
> 
> The physical host CPU info is as follows:
> # lscpu
> Architecture:          x86_64
> CPU op-mode(s):        32-bit, 64-bit
> Byte Order:            Little Endian
> CPU(s):                24
> On-line CPU(s) list:   0-23
> Thread(s) per core:    2
> Core(s) per socket:    6
> Socket(s):             2
> NUMA node(s):          2
> Vendor ID:             GenuineIntel
> 
> The test steps are as follows:
> Step 1:
> 1.1> Configure the VM's 'System' settings as follows, and check that the
> VM starts normally:
> Total Virtual CPUs: 12
> Virtual Sockets: 1
> Cores per Virtual Socket: 6
> Threads per Core: 2
> 
> 1.2> Check the libvirt dumpxml output on the registered host and run
> 'lscpu' in the guest:
> 
> On the host, check the libvirt dumpxml:
> #virsh dumpxml vm1
> <vcpu placement='static' current='12'>192</vcpu>
>   <cpu mode='custom' match='exact' check='full'>
>     <topology sockets='16' cores='6' threads='2'/>
>     <numa>
>       <cell id='0' cpus='0-11' memory='1048576' unit='KiB'/>
>     </numa>
>   </cpu>
>  
> In vm/Guest:
> # lscpu
> Architecture:          x86_64
> CPU op-mode(s):        32-bit, 64-bit
> Byte Order:            Little Endian
> CPU(s):                12
> On-line CPU(s) list:   0-11
> Thread(s) per core:    2
> Core(s) per socket:    6
> Socket(s):             1
> NUMA node(s):          1
> Vendor ID:             GenuineIntel
> 
> Step 2:
> 2.1> Configure the VM's 'System' settings as follows, and check that the
> VM starts normally:
> Total Virtual CPUs: 48
> Virtual Sockets: 4
> Cores per Virtual Socket: 6
> Threads per Core: 2
> 
> 2.2> Check the libvirt dumpxml output on the registered host and run
> 'lscpu' in the guest:
> On the host, check the libvirt dumpxml:
> #virsh dumpxml vm1
>  <vcpu placement='static' current='48'>192</vcpu>
>   <vcpus>
>     <vcpu id='0' enabled='yes' hotpluggable='no' order='1'/>
>     <vcpu id='1' enabled='yes' hotpluggable='no' order='2'/>
>     <vcpu id='2' enabled='yes' hotpluggable='no' order='3'/>
> ...
>     <vcpu id='11' enabled='yes' hotpluggable='no' order='12'/>
>     <vcpu id='12' enabled='yes' hotpluggable='yes' order='13'/>
>     <vcpu id='13' enabled='yes' hotpluggable='yes' order='14'/>
> ...
>     <vcpu id='45' enabled='yes' hotpluggable='yes' order='46'/>
>     <vcpu id='46' enabled='yes' hotpluggable='yes' order='47'/>
>     <vcpu id='47' enabled='yes' hotpluggable='yes' order='48'/>
>     <vcpu id='48' enabled='no' hotpluggable='yes'/>
>     <vcpu id='49' enabled='no' hotpluggable='yes'/>
> ...
>     <vcpu id='189' enabled='no' hotpluggable='yes'/>
>     <vcpu id='190' enabled='no' hotpluggable='yes'/>
>     <vcpu id='191' enabled='no' hotpluggable='yes'/>
>   </vcpus>
> 
>   <cpu mode='custom' match='exact' check='full'>
>     <topology sockets='16' cores='6' threads='2'/>
>     <numa>
>       <cell id='0' cpus='0-11' memory='1048576' unit='KiB'/>
>     </numa>
>   </cpu>
> 
> In vm/Guest:
> # lscpu
> Architecture:          x86_64
> CPU op-mode(s):        32-bit, 64-bit
> Byte Order:            Little Endian
> CPU(s):                48
> On-line CPU(s) list:   0-47
> Thread(s) per core:    2
> Core(s) per socket:    6
> Socket(s):             4
> NUMA node(s):          1
> Vendor ID:             GenuineIntel
> 
> 

This is all correct.

> 
> >This happens when the host has fewer CPUs than you try to hotplug.
> >The 'cpu' filter is not related to CPU hotplug; hotplug enforces this
> >limit and leaves the overcommit considerations to the scheduler.
> 
> 
> 
> In the scenario above, the host also has fewer CPUs than the number being
> hotplugged, yet that seems to work. But in the following scenario, Step 3,
> it failed. In both cases the host has fewer CPUs than the number being
> hotplugged, yet one succeeds while the other fails.
> 
> 
> When I try the following configuration, the error is raised:
> Step 3:
> 3.1> With the VM still running, change the VM's 'System' settings as
> follows:
> Total Virtual CPUs: 60
> Virtual Sockets: 5
> Cores per Virtual Socket: 6
> Threads per Core: 2
> 
> The error info:
> Error while executing action: 
> vm1:
> CPU_HOTPLUG_TOPOLOGY_INVALID

Yes, because you can do overcommit only on start, not on hotplug. So, as far as I can see, everything is working as intended; closing.
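
A minimal sketch of the rule as stated above (illustration only; the engine's actual validation code is not shown in this bug):

#!/bin/sh
# Overcommit beyond the host's CPU count is allowed at VM start, but a
# CPU hotplug request is capped at the host's online CPU count.
host_cpus=24          # host in comment 2: 2 sockets x 6 cores x 2 threads
check() {             # $1 = requested vCPUs, $2 = start|hotplug
    if [ "$2" = "hotplug" ] && [ "$1" -gt "$host_cpus" ]; then
        echo "request=$1 ($2): CPU_HOTPLUG_TOPOLOGY_INVALID"
    else
        echo "request=$1 ($2): OK"
    fi
}
check 48 start        # OK: overcommit permitted at start
check 60 hotplug      # rejected: 60 > 24 online host CPUs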