Bug 1179592
| Summary: | CPU QoS limitation requires libvirt 1.1.3 in el7 | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Artyom <alukiano> |
| Component: | mom | Assignee: | Martin Sivák <msivak> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Artyom <alukiano> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.5.0 | CC: | danken, dfediuck, ecohen, eedri, gklein, iheim, jdenemar, lsurette, mavital, michal.skrivanek, msivak, nsednev, sherold, s.kieske, yeylon, ylavi |
| Target Milestone: | --- | Keywords: | Triaged |
| Target Release: | 3.5.0 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | sla | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1181157 (view as bug list) | Environment: | |
| Last Closed: | 2015-02-17 17:10:32 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | SLA | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1181157, 1184929 | | |
| Bug Blocks: | 1164311 | | |
RHEL 7 uses libvirt-1.1.1, and the metadata XML feature needed for this to work seems to be missing from that version. I was told it was originally included in libvirt-1.1.3, which did not make it into RHEL 7. So on RHEL 7 the quota is currently always treated as 100%, and the computed numbers (quota 25000, period 12500) cause no CPU usage throttling at all, since the quota exceeds the period. RHEL 6.6 should have the necessary libvirt feature backported and should therefore work properly.

Artyom: can you please retest with RHEL 6.6 hosts?

I should add that I saw the proper values bubble through the VDSM APIs, so it is really only an issue with the following libvirt call:
Thread-4352::DEBUG::2015-01-07 16:47:01,450::__init__::469::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'VM.updateVmPolicy' in bridge with {u'params': {u'vmId': u'4d7aa507-1b32-4618-a5a2-884500dbbbc1', u'vcpuLimit': u'2'}, u'vmID': u'4d7aa507-1b32-4618-a5a2-884500dbbbc1'}
Thread-4352::DEBUG::2015-01-07 16:47:01,454::libvirtconnection::143::root::(wrapper) Unknown libvirterror: ecode: 74 edom: 10 level: 2 message: argument unsupported: QEMU driver does not support <metadata> element
Thread-4352::ERROR::2015-01-07 16:47:01,454::vm::3821::vm.Vm::(_getVmPolicy) vmId=`4d7aa507-1b32-4618-a5a2-884500dbbbc1`::getVmPolicy failed
Traceback (most recent call last):
File "/usr/share/vdsm/virt/vm.py", line 3818, in _getVmPolicy
METADATA_VM_TUNE_URI, 0)
File "/usr/share/vdsm/virt/vm.py", line 689, in f
ret = attr(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 111, in wrapper
ret = f(*args, **kwargs)
File "/usr/lib64/python2.7/site-packages/libvirt.py", line 942, in metadata
if ret is None: raise libvirtError ('virDomainGetMetadata() failed', dom=self)
libvirtError: argument unsupported: QEMU driver does not support <metadata> element
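For reference, the failing call can be reproduced directly from the libvirt Python bindings; a minimal probe sketch follows (the connection URI and domain name are illustrative assumptions, not taken from this bug's environment):

```python
# Minimal sketch: probe whether the running libvirt/QEMU supports the
# <metadata> element used for the oVirt QoS policy. Connection URI and
# domain name below are illustrative assumptions.
import libvirt

METADATA_VM_TUNE_URI = 'http://ovirt.org/vm/tune/1.0'

conn = libvirt.openReadOnly('qemu:///system')
dom = conn.lookupByName('example-vm')          # any running domain
try:
    xml = dom.metadata(libvirt.VIR_DOMAIN_METADATA_ELEMENT,
                       METADATA_VM_TUNE_URI, 0)
    print('QoS metadata: %s' % xml)
except libvirt.libvirtError as e:
    if e.get_error_code() == libvirt.VIR_ERR_ARGUMENT_UNSUPPORTED:
        # ecode 74 in the log above: libvirt too old for <metadata>
        print('QEMU driver does not support <metadata> (this bug)')
    elif e.get_error_code() == libvirt.VIR_ERR_NO_DOMAIN_METADATA:
        # ecode 80: <metadata> works, but no QoS element was set yet
        print('metadata supported, but no QoS element present')
    else:
        raise
```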
No reason to block the RC on a wrong libvirt version.

What's the libvirt dependency? Do we expect a 7.0.z update? If so, when?

On RHEL 6.6 it also does not work:
Thread-131338::DEBUG::2015-01-11 12:35:18,673::libvirtconnection::143::root::(wrapper) Unknown libvirterror: ecode: 80 edom: 20 level: 2 message: metadata not found: Requested metadata element is not present
Thread-131338::ERROR::2015-01-11 12:35:18,675::__init__::493::jsonrpc.JsonRpcServer::(_serveRequest) Internal server error
Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/yajsonrpc/__init__.py", line 488, in _serveRequest
res = method(**params)
File "/usr/share/vdsm/rpc/Bridge.py", line 284, in _dynamicMethod
return self._fixupRet(className, methodName, ret)
File "/usr/share/vdsm/rpc/Bridge.py", line 234, in _fixupRet
self._typeFixup('return', retType, result)
File "/usr/share/vdsm/rpc/Bridge.py", line 214, in _typeFixup
if k in item:
TypeError: argument of type 'NoneType' is not iterable
So it also does not receive the limit from the engine.
vdsm-4.16.8.1-5.el6ev.x86_64
libvirt-0.10.2-46.el6_6.2.x86_64
Relevant only for RHEL 6.6.
After one minute I see that the parameter is updated to the correct value, so the error above is not related to QoS.
I also see that the metadata is passed correctly:
<metadata>
<ovirt:qos xmlns:ovirt="http://ovirt.org/vm/tune/1.0">
<ovirt:vcpuLimit>10</ovirt:vcpuLimit>
</ovirt:qos>
And period and quota have the correct values:
<period>12500</period>
<quota>2500</quota>
Tested with limits of 5, 10, 25 and 50 percent.
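For reference, these values are consistent with the formula quoted in the description at the bottom of this bug; a back-of-the-envelope sketch (the anchor of 100000 is an assumption implied by period = 12500, and the 8 pCPU / 4 vCPU setup is taken from the reporter's environment described below):

```python
# Sketch checking the verified values against the quoted formula.
# ANCHOR = 100000 and the 8 pCPU / 4 vCPU setup are assumptions.
ANCHOR = 100000
HOST_CPUS, VM_VCPUS = 8, 4

for limit in (5, 10, 25, 50):                   # percent limits tested above
    period = ANCHOR // HOST_CPUS                # 12500 for every limit
    quota = ANCHOR * limit // 100 // VM_VCPUS   # 2500 at the 10% limit
    print('limit=%d%% -> period=%d quota=%d' % (limit, period, quota))
```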
I see that for the error above we already have a bug: https://bugzilla.redhat.com/show_bug.cgi?id=1142851

*** Bug 1179591 has been marked as a duplicate of this bug. ***

Moving to MODIFIED to wait for a relevant RHEL version with a new libvirt version.

Moving to POST at eedri's request. It should be moved back to MODIFIED once the libvirt version is available.

Works for me on these components:
mom-0.4.1-4.el7ev.noarch
libvirt-client-1.1.1-29.el7_0.7.x86_64
sanlock-3.1.0-2.el7.x86_64
qemu-kvm-rhev-1.5.3-60.el7_0.11.x86_64
vdsm-4.16.8.1-6.el7ev.x86_64
rhevm-3.5.0-0.31.el6ev.noarch

RHEV-H 7.0 with these components is not working:
qemu-kvm-rhev-1.5.3-60.el7_0.11.x86_64
sanlock-3.1.0-2.el7.x86_64
mom-0.4.1-4.el7ev.noarch
vdsm-4.16.8.1-6.el7ev.x86_64
libvirt-client-1.1.1-29.el7_0.4.x86_64

Please align RHEV-H to libvirt-client-1.1.1-29.el7_0.7.x86_64 or above.

doron, the libvirt errata has shipped live; can this bug move to ON_QA?

On RHEL 7.1, CPU SLA QoS is not working:
vdsClient -s 0 list table
virsh -r dumpxml StressVM1_CPU_RHEL7_1
<domain type='kvm' id='6'>
<name>StressVM1_CPU_RHEL7_1</name>
<uuid>12b8466c-491b-49ff-a063-fe40a180ff4a</uuid>
<metadata>
<ovirt:qos xmlns:ovirt="http://ovirt.org/vm/tune/1.0">
<ovirt:vcpuLimit>2</ovirt:vcpuLimit>
</ovirt:qos>
</metadata>
<memory unit='KiB'>1048576</memory>
<currentMemory unit='KiB'>1048576</currentMemory>
<vcpu placement='static' current='4'>16</vcpu>
<cputune>
<shares>1020</shares>
<period>25000</period>
<quota>25000</quota>
</cputune>
<resource>
Components used:
sanlock-3.2.2-2.el7.x86_64
qemu-kvm-rhev-2.1.2-23.el7.x86_64
libvirt-client-1.2.8-16.el7.x86_64
mom-0.4.1-4.el7ev.noarch
vdsm-4.16.8.1-6.el7ev.x86_64
Linux version 3.10.0-227.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.3 20140911 (Red Hat 4.8.3-7) (GCC) ) #1 SMP Tue Jan 27 11:55:32 EST 2015
Linux alma03.qa.lab.tlv.redhat.com 3.10.0-227.el7.x86_64 #1 SMP Tue Jan 27 11:55:32 EST 2015 x86_64 x86_64 x86_64 GNU/Linux
rhevm-guest-agent-common-1.0.10-2.el6ev.noarch
rhevm-3.5.0-0.31.el6ev.noarch
Previously I ran the guest VM with 4 virtual CPUs and the feature did not limit the CPU usage to 2% although the policy was set. I retested the same guest VM with 1 virtual CPU and the feature works fine:
<domain type='kvm' id='7'>
<name>StressVM1_CPU_RHEL7_1</name>
<uuid>12b8466c-491b-49ff-a063-fe40a180ff4a</uuid>
<metadata>
<ovirt:qos xmlns:ovirt="http://ovirt.org/vm/tune/1.0">
<ovirt:vcpuLimit>2</ovirt:vcpuLimit>
</ovirt:qos>
</metadata>
<memory unit='KiB'>1048576</memory>
<currentMemory unit='KiB'>1048576</currentMemory>
<vcpu placement='static' current='1'>16</vcpu>
<cputune>
<shares>1020</shares>
<period>25000</period>
<quota>2000</quota>
</cputune>
<resource>
<partition>/machine</partition>
</resource>
<sysinfo type='smbios'>
<system>
<entry name='manufacturer'>Red Hat</entry>
<entry name='product'>RHEV Hypervisor</entry>
<entry name='version'>7.1-0.3.el7</entry>
<entry name='serial'>4C4C4544-0059-4410-8053-B7C04F573032</entry>
<entry name='uuid'>12b8466c-491b-49ff-a063-fe40a180ff4a</entry>
</system>
</sysinfo>
Note the different quota values between the two scenarios: with 4 virtual CPUs a quota of 25000 is received, whereas with 1 virtual CPU it is 2000.
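A quick check of those two values (a sketch: the anchor of 100000 follows the formula quoted in this bug's description below, and the 4-pCPU host count is inferred from period = 25000 = anchor / pCPUs; both are assumptions):

```python
# Sketch interpreting the two dumps above; ANCHOR and HOST_CPUS are
# assumptions as described in the lead-in. vcpuLimit=2 is from the XML.
ANCHOR, HOST_CPUS, LIMIT_PCT = 100000, 4, 2

period = ANCHOR // HOST_CPUS                         # 25000, as dumped
for vcpus in (1, 4):
    expected_quota = ANCHOR * LIMIT_PCT // 100 // vcpus
    print('%d vCPUs -> expected quota %d' % (vcpus, expected_quota))
# 1 vCPU -> 2000, matching the second dump (2000/25000 = 8% of one pCPU,
# i.e. 2% of a 4-pCPU host). 4 vCPUs should give 500, but the first dump
# showed quota == period (25000), i.e. no throttling was applied.
```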
Verified on libvirt-client-1.1.1-29.el7_0.7.x86_64 with two limits (25 and 50) and with different numbers of CPUs.

RHEV 3.5.0 was released. Closing.
Description of problem:
I created two CPU profiles, one with a QoS limitation value of 50 and a second with a limitation value of 25, and attached these profiles to a VM one by one, but the values for quota and period under dumpxml stay the same. I filed this bug under mom because I know we use the MOM policy to apply quota and period to the VM.

Version-Release number of selected component (if applicable):
rhevm-3.5.0-0.27.el6ev.noarch
vdsm-4.16.8.1-4.el7ev.x86_64
mom-0.4.1-4.el7ev.noarch

How reproducible:
Always

Steps to Reproduce:
1. Create two CPU QoS entries under the same datacenter, one with a limitation value of 25 and a second with 50:
<qoss>
  <qos type="cpu" href="/ovirt-engine/api/datacenters/f8f5eaee-8fd0-4b45-87db-62d61b03a916/qoss/2571eeea-25a7-4b09-9c37-d82591733f26" id="2571eeea-25a7-4b09-9c37-d82591733f26">
    <name>test_1</name>
    <data_center href="/ovirt-engine/api/datacenters/f8f5eaee-8fd0-4b45-87db-62d61b03a916" id="f8f5eaee-8fd0-4b45-87db-62d61b03a916"/>
    <cpu_limit>50</cpu_limit>
  </qos>
  <qos type="cpu" href="/ovirt-engine/api/datacenters/f8f5eaee-8fd0-4b45-87db-62d61b03a916/qoss/883d876e-2038-4cd4-8c35-e9b52f2f4380" id="883d876e-2038-4cd4-8c35-e9b52f2f4380">
    <name>test_2</name>
    <data_center href="/ovirt-engine/api/datacenters/f8f5eaee-8fd0-4b45-87db-62d61b03a916" id="f8f5eaee-8fd0-4b45-87db-62d61b03a916"/>
    <cpu_limit>25</cpu_limit>
  </qos>
</qoss>
2. Create two CPU profiles with the different QoS entries in the same cluster:
<cpu_profiles>
  <cpu_profile href="/ovirt-engine/api/cpuprofiles/5be5c0b7-5b91-4ac4-9d53-ef6f987bff05" id="5be5c0b7-5b91-4ac4-9d53-ef6f987bff05">
    <name>test_1</name>
    <qos href="/ovirt-engine/api/datacenters/f8f5eaee-8fd0-4b45-87db-62d61b03a916/qoss/2571eeea-25a7-4b09-9c37-d82591733f26" id="2571eeea-25a7-4b09-9c37-d82591733f26"/>
    <cluster href="/ovirt-engine/api/clusters/67866b36-fd68-4106-8758-34cf31b0c3d4" id="67866b36-fd68-4106-8758-34cf31b0c3d4"/>
  </cpu_profile>
  <cpu_profile href="/ovirt-engine/api/cpuprofiles/b015da68-b7a5-4a4b-8389-5cbc8ce58f73" id="b015da68-b7a5-4a4b-8389-5cbc8ce58f73">
    <name>test_2</name>
    <qos href="/ovirt-engine/api/datacenters/f8f5eaee-8fd0-4b45-87db-62d61b03a916/qoss/883d876e-2038-4cd4-8c35-e9b52f2f4380" id="883d876e-2038-4cd4-8c35-e9b52f2f4380"/>
    <cluster href="/ovirt-engine/api/clusters/67866b36-fd68-4106-8758-34cf31b0c3d4" id="67866b36-fd68-4106-8758-34cf31b0c3d4"/>
  </cpu_profile>
</cpu_profiles>
3. Create a VM and run it, first with the first QoS and then with the second.
First run:
<cpu_profile href="/ovirt-engine/api/cpuprofiles/5be5c0b7-5b91-4ac4-9d53-ef6f987bff05" id="5be5c0b7-5b91-4ac4-9d53-ef6f987bff05"/>
dumpxml:
<vcpu placement='static' current='4'>32</vcpu>
<cputune>
  <shares>1020</shares>
  <period>12500</period>
  <quota>25000</quota>
</cputune>

Second run:
<cpu_profile href="/ovirt-engine/api/cpuprofiles/b015da68-b7a5-4a4b-8389-5cbc8ce58f73" id="b015da68-b7a5-4a4b-8389-5cbc8ce58f73"/>
dumpxml:
<vcpu placement='static' current='4'>32</vcpu>
<cputune>
  <shares>1020</shares>
  <period>12500</period>
  <quota>25000</quota>
</cputune>

Actual results:
<quota> and <period> under <cputune> have the same values in dumpxml for different limitation values.

Expected results:
<quota> and <period> have different values for different limitations.

Additional info:
OK, I will start with this: I do not really understand why we use this kind of formula:
period = anchor / #NumOfCpuInHost
quota = (anchor * (#userSelection / 100)) / #numOfVcpusInVm
Why do we need this anchor, and why do we change the period time (default 1000000)?

I played a little with the values of period and quota via virsh create, and the limitation works pretty well with the formula:
period = default_value
quota = period * (pcpu / vcpu) * (limitation / 100)
I checked it on a VM with 4 CPUs on a host with 8 CPUs, with limitations of 10, 25 and 50. For some reason the same proportion with a small period works imprecisely or not at all (seems to me like a bug in cgroups).
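To make the comparison concrete, here is a side-by-side sketch of the two formulas discussed above (the anchor value and the CPU counts are illustrative assumptions; the "default" period is the value the reporter quotes):

```python
# Side-by-side sketch of the two formulas. ANCHOR = 100000 and the CPU
# counts are illustrative assumptions; DEFAULT_PERIOD is the default
# period value quoted by the reporter.
ANCHOR = 100000
DEFAULT_PERIOD = 1000000
host_cpus, vm_vcpus, limit = 8, 4, 25

# Engine/MOM formula, as quoted in the description:
period_a = ANCHOR // host_cpus                                  # 12500
quota_a = ANCHOR * limit // 100 // vm_vcpus                     # 6250

# Reporter's proposed formula (period kept at the default):
period_b = DEFAULT_PERIOD                                       # 1000000
quota_b = DEFAULT_PERIOD * host_cpus * limit // (vm_vcpus * 100)  # 500000

# Both yield the same per-vCPU share (quota/period = 50% of a pCPU here,
# i.e. 25% of the 8-pCPU host), but the second keeps the period large,
# which the reporter found to behave more precisely under cgroups.
print(period_a, quota_a)
print(period_b, quota_b)
```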