Bug 1304387 - HE VM hot plug is not working, it's missing the real max number of cpu definition
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.HostedEngine
Version: 3.6.2.6
Hardware: x86_64
OS: Linux
high
medium vote
Target Milestone: ovirt-4.0.4
: 4.0.4
Assignee: Martin Sivák
QA Contact: Artyom
URL:
Whiteboard:
Depends On: 1326817
Blocks:
Reported: 2016-02-03 13:21 UTC by Artyom
Modified: 2017-05-11 09:27 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: We did not set the maximum allowed number of CPUs for the hosted engine VM.
Consequence: Hosted engine CPU hotplug was not working.
Fix: The maximum number of CPUs is properly computed and passed to the hosted engine configuration.
Result: Hosted engine CPU hotplug should work now.
Clone Of:
Environment:
Last Closed: 2016-09-26 12:32:34 UTC
oVirt Team: SLA
ykaul: ovirt-4.0.z+
mgoldboi: planning_ack+
msivak: devel_ack+
mavital: testing_ack+


Attachments
engine and vdsm logs (1.57 MB, application/zip)
2016-02-03 13:21 UTC, Artyom
vdsm and engine logs (2.51 MB, application/zip)
2016-09-04 13:48 UTC, Artyom


Links
System | ID | Priority | Status | Summary | Last Updated
oVirt gerrit | 55541 | master | MERGED | HE: Store maximum number of vcpus to the OVF file | 2016-07-13 13:23:39 UTC
oVirt gerrit | 55818 | master | MERGED | Ignore memory and CPU hotplug errors for hosted engine VM | 2016-04-11 07:18:51 UTC
oVirt gerrit | 55932 | ovirt-engine-3.6 | MERGED | Ignore memory and CPU hotplug errors for hosted engine VM | 2016-04-11 11:00:43 UTC
oVirt gerrit | 55938 | ovirt-engine-3.6.5 | MERGED | Ignore memory and CPU hotplug errors for hosted engine VM | 2016-04-11 12:05:33 UTC
oVirt gerrit | 61307 | ovirt-engine-4.0 | MERGED | HE: Store maximum number of vcpus to the OVF file | 2016-07-31 13:12:14 UTC

Description Artyom 2016-02-03 13:21:26 UTC
Created attachment 1120783 [details]
engine and vdsm logs

Description of problem:
CPU hotplug does not work for the HE VM; it fails with the error message:
Message: Failed to hot set number of CPUS to VM HostedEngine. Underlying error message: invalid argument: requested vcpus is greater than max allowable vcpus for the domain: 4 > 2

Version-Release number of selected component (if applicable):
rhevm-3.6.2.6-0.1.el6.noarch

How reproducible:
Always

Steps to Reproduce:
1. Deploy hosted-engine
2. Log in to the engine and attach a storage domain (to start the auto-import process)
3. Update the HE VM sockets in hotplug mode

Actual results:
CPU hotplug fails on the HE VM with the error above

Expected results:
CPU hotplug works for the HE VM


Additional info:
dumpxml shows <vcpu placement='static'>2</vcpu>

Comment 1 Red Hat Bugzilla Rules Engine 2016-02-17 11:40:22 UTC
Bug tickets must have version flags set prior to targeting them to a release. Please ask maintainer to set the correct version flags and only then set the target milestone.

Comment 2 Artyom 2016-04-13 16:10:00 UTC
Checked on rhevm-3.6.5.3-0.1.el6.noarch
CPU hotplug still does not work for HE VM, with the same error.
After the HE VM is restarted I can see the new CPU values, but this bug is about CPU hotplug, so I am moving it back to ASSIGNED.

Comment 3 Red Hat Bugzilla Rules Engine 2016-04-13 16:10:05 UTC
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.

Comment 4 Yaniv Kaul 2016-04-14 07:40:38 UTC
Moving to 3.6.6, this is certainly not a blocker.

Comment 5 Red Hat Bugzilla Rules Engine 2016-04-17 07:59:14 UTC
Bug tickets must have version flags set prior to targeting them to a release. Please ask maintainer to set the correct version flags and only then set the target milestone.

Comment 6 Sandro Bonazzola 2016-05-02 10:00:00 UTC
Moving from 4.0 alpha to 4.0 beta since 4.0 alpha has been already released and bug is not ON_QA.

Comment 7 Yaniv Lavi 2016-05-23 13:16:22 UTC
oVirt 4.0 beta has been released, moving to RC milestone.

Comment 8 Yaniv Lavi 2016-05-23 13:20:12 UTC
oVirt 4.0 beta has been released, moving to RC milestone.

Comment 10 Red Hat Bugzilla Rules Engine 2016-07-25 10:33:55 UTC
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.

Comment 11 Artyom 2016-09-04 13:48:04 UTC
Created attachment 1197639 [details]
vdsm and engine logs

Checked on rhevm-4.0.4-0.1.el7ev.noarch

1) Deploy HE
2) Add storage domain to the engine
3) Change the number of HE VM CPUs from 2 to 4 - I still receive the error in the engine and the HE VM still has only two CPUs

I can see a number of problems in the HE VM that the agent creates:
1) HE VM: <vcpu placement='static'>2</vcpu>
   Regular VM: <vcpu placement='static' current='1'>16</vcpu>
2) HE VM: <type arch='x86_64' machine='rhel6.5.0'>hvm</type>
   Regular VM: <type arch='x86_64' machine='pc-i440fx-rhel7.2.0'>hvm</type>
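
For illustration, a minimal Python sketch of why the two <vcpu> definitions above behave differently on hotplug. In libvirt domain XML the element text is the maximum number of vCPUs and the optional current attribute is the number active at start; the helper names vcpu_limits and can_hotplug are hypothetical and not part of vdsm or the agent.

```python
import xml.etree.ElementTree as ET


def vcpu_limits(vcpu_xml):
    """Return (current, maximum) vCPU counts from a libvirt <vcpu> element."""
    elem = ET.fromstring(vcpu_xml)
    maximum = int(elem.text)
    # Without a 'current' attribute, every vCPU up to the maximum is active.
    current = int(elem.get("current", maximum))
    return current, maximum


def can_hotplug(vcpu_xml, requested):
    """True if 'requested' vCPUs do not exceed the domain maximum."""
    _, maximum = vcpu_limits(vcpu_xml)
    return requested <= maximum


# The two definitions quoted in this comment.
he_vm = "<vcpu placement='static'>2</vcpu>"
regular_vm = "<vcpu placement='static' current='1'>16</vcpu>"

print(vcpu_limits(he_vm), can_hotplug(he_vm, 4))            # (2, 2) False
print(vcpu_limits(regular_vm), can_hotplug(regular_vm, 4))  # (1, 16) True
```

With the HE VM definition the maximum equals the current count, so a request for 4 vCPUs is rejected with "4 > 2"; the regular VM leaves headroom up to 16.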

Comment 12 Red Hat Bugzilla Rules Engine 2016-09-04 13:48:29 UTC
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.

Comment 13 Martin Sivák 2016-09-05 13:39:21 UTC
(In reply to Artyom from comment #11)
> Created attachment 1197639 [details]
> vdsm and engine logs
> 
> Checked on rhevm-4.0.4-0.1.el7ev.noarch
> 
> 1) Deploy HE
> 2) Add storage domain to the engine
> 3) Edit number of the HE VM CPU's from 2 to 4 - still receive error in the
> engine and HE VM still has only two CPU's


Artyom, how long did you wait for the CPU setting to propagate? It might take up to an hour unless you modify the OVF sync interval in engine-config.

Comment 14 Artyom 2016-09-05 14:44:27 UTC
I waited 5 minutes, but I thought we already had the fix that updates the OVF immediately, without the need to wait for the OVF update interval.

Comment 15 Martin Sivák 2016-09-05 14:49:59 UTC
(In reply to Artyom from comment #14)
> I waited 5 minutes, but I though we already have the fix that updates OVF
> straight forward without need to wait for OVF interval

We do, but it does not work properly :/

Can you please retest this with the OvfUpdateInterval set to one minute?

Comment 16 Artyom 2016-09-05 15:26:18 UTC
I have an HE VM with 2 CPUs.

1) Put the HE environment into global maintenance
2) Set the OVF update interval to one minute - # engine-config -s OvfUpdateIntervalInMinutes=1
3) Restart the engine - # systemctl restart ovirt-engine
4) Disable global maintenance
5) Update the HE VM number of CPUs to 4 - I can still see the error message from libvirt:
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 2559, in setNumberOfCpus
    libvirt.VIR_DOMAIN_AFFECT_CURRENT)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 69, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 123, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 916, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 2433, in setVcpusFlags
    if ret == -1: raise libvirtError ('virDomainSetVcpusFlags() failed', dom=self)
libvirtError: invalid argument: requested vcpus is greater than max allowable vcpus for the domain: 4 > 2
6) Wait 5 minutes
7) Check the HE VM number of CPUs
# lscpu 
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                2
...


Some additional info: I can see that the OVF was updated correctly:
umber of virtual CPU</rasd:Description><rasd:InstanceId>1</rasd:InstanceId><rasd:ResourceType>3</rasd:ResourceType><rasd:num_of_sockets>4</rasd:num_of_sockets><rasd:cpu_per_socket>1</rasd:cpu_per_socket><rasd:threads_per_cpu>1</rasd:threads_per_cpu><rasd:max_num_of_vcpus>16</rasd:max_num_of_vcpus></Item><Item><rasd:Caption>4096 MB of memory</rasd:Caption><rasd:Description>Memory Size</rasd:Description><rasd:InstanceId>2</rasd:InstanceId><rasd:ResourceType>4</rasd:ResourceType><rasd:AllocationUnits>MegaBytes</rasd:AllocationUnits><rasd:VirtualQuantity>4096</rasd:VirtualQuantity></Item><Item><rasd:Caption>virtio-disk0</rasd:Caption><rasd:InstanceId>af6e9e52-3f0f-443d-83f5-465539c7506d</rasd:InstanceId><ras
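
As a rough sketch of how the CPU fields in the OVF fragment above could map to a libvirt <vcpu> element: the Item below is a hand-written, trimmed copy of the CPU fields, and the rasd namespace URI, the helper names, and the rule "current = sockets * cores * threads" are assumptions for illustration, not the actual agent code.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for the 'rasd' prefix; the fragment above does not show it.
RASD_NS = ("http://schemas.dmtf.org/wbem/wscim/1/cim-schema/2/"
           "CIM_ResourceAllocationSettingData")

# Trimmed, hand-written copy of the CPU item from the OVF fragment.
ITEM = """<Item xmlns:rasd="{ns}">
<rasd:ResourceType>3</rasd:ResourceType>
<rasd:num_of_sockets>4</rasd:num_of_sockets>
<rasd:cpu_per_socket>1</rasd:cpu_per_socket>
<rasd:threads_per_cpu>1</rasd:threads_per_cpu>
<rasd:max_num_of_vcpus>16</rasd:max_num_of_vcpus>
</Item>""".format(ns=RASD_NS)


def cpu_settings(item_xml):
    """Return (current, maximum) vCPU counts from an OVF CPU item."""
    item = ET.fromstring(item_xml)

    def field(name):
        return int(item.find("{%s}%s" % (RASD_NS, name)).text)

    # Assumption: active vCPUs = sockets * cores per socket * threads per core.
    current = (field("num_of_sockets") * field("cpu_per_socket")
               * field("threads_per_cpu"))
    return current, field("max_num_of_vcpus")


def vcpu_element(current, maximum):
    """Build a libvirt-style <vcpu> element that leaves room for hotplug."""
    elem = ET.Element("vcpu", {"placement": "static", "current": str(current)})
    elem.text = str(maximum)
    return ET.tostring(elem, encoding="unicode")


current, maximum = cpu_settings(ITEM)
print(current, maximum)  # 4 16
print(vcpu_element(current, maximum))
```

A domain started from these values would have 4 active vCPUs and a maximum of 16, instead of the maximum of 2 seen in the failing HE VM.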

OK, it looks like I found the problem:
Right after deployment finishes, the HE VM has an incorrect value, <vcpu placement='static'>2</vcpu> (I believe this persists until you restart the HE VM with the updated OVF).

Comment 17 Artyom 2016-09-05 15:31:21 UTC
So I will verify this bug and open another one for the HE VM being started without the OVF (just after HE deployment).

Verified on rhevm-4.0.4-0.1.el7ev.noarch

1) Deploy HE
2) Wait until the engine generates the HE OVF
3) Enable global maintenance
4) Restart the HE VM
5) Update the HE VM number of CPUs to 4
6) Check that the HE VM has the correct number of CPUs

