Bug 1179592 - CPU QoS limitation requires libvirt 1.1.3 in el7
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: mom
Version: 3.5.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 3.5.0
Assignee: Martin Sivák
QA Contact: Artyom
URL:
Whiteboard: sla
Duplicates: 1179591
Depends On: 1181157 1184929
Blocks: rhev35gablocker
Reported: 2015-01-07 07:58 UTC by Artyom
Modified: 2016-02-10 20:15 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1181157
Environment:
Last Closed: 2015-02-17 17:10:32 UTC
oVirt Team: SLA



Description Artyom 2015-01-07 07:58:16 UTC
Description of problem:
I created two CPU profiles, one with a QoS limit of 50 and the other with a QoS limit of 25, and attached them to a VM one at a time, but the quota and period values in dumpxml stay the same.
I filed this bug under mom because a mom policy is what applies the quota and period to the VM.

Version-Release number of selected component (if applicable):
rhevm-3.5.0-0.27.el6ev.noarch
vdsm-4.16.8.1-4.el7ev.x86_64
mom-0.4.1-4.el7ev.noarch

How reproducible:
Always

Steps to Reproduce:
1. Create two CPU QoS entries in the same data center, one with a limit of 25 and the other with a limit of 50
<qoss>
<qos type="cpu" href= "/ovirt-engine/api/datacenters/f8f5eaee-8fd0-4b45-87db-62d61b03a916/qoss/2571eeea-25a7-4b09-9c37-d82591733f26" id="2571eeea-25a7-4b09-9c37-d82591733f26">
<name>test_1</name>
<data_center href= "/ovirt-engine/api/datacenters/f8f5eaee-8fd0-4b45-87db-62d61b03a916" id="f8f5eaee-8fd0-4b45-87db-62d61b03a916"/>
<cpu_limit>50</cpu_limit>
</qos>
<qos type="cpu" href= "/ovirt-engine/api/datacenters/f8f5eaee-8fd0-4b45-87db-62d61b03a916/qoss/883d876e-2038-4cd4-8c35-e9b52f2f4380" id="883d876e-2038-4cd4-8c35-e9b52f2f4380">
<name>test_2</name>
<data_center href= "/ovirt-engine/api/datacenters/f8f5eaee-8fd0-4b45-87db-62d61b03a916" id="f8f5eaee-8fd0-4b45-87db-62d61b03a916"/>
<cpu_limit>25</cpu_limit>
</qos>
</qoss>
2. Create two CPU profiles with different QoS in the same cluster
<cpu_profiles>
<cpu_profile href= "/ovirt-engine/api/cpuprofiles/5be5c0b7-5b91-4ac4-9d53-ef6f987bff05" id="5be5c0b7-5b91-4ac4-9d53-ef6f987bff05">
<name>test_1</name>
<qos href= "/ovirt-engine/api/datacenters/f8f5eaee-8fd0-4b45-87db-62d61b03a916/qoss/2571eeea-25a7-4b09-9c37-d82591733f26" id="2571eeea-25a7-4b09-9c37-d82591733f26"/>
<cluster href= "/ovirt-engine/api/clusters/67866b36-fd68-4106-8758-34cf31b0c3d4" id="67866b36-fd68-4106-8758-34cf31b0c3d4"/>
</cpu_profile>
<cpu_profile href= "/ovirt-engine/api/cpuprofiles/b015da68-b7a5-4a4b-8389-5cbc8ce58f73" id="b015da68-b7a5-4a4b-8389-5cbc8ce58f73">
<name>test_2</name>
<qos href= "/ovirt-engine/api/datacenters/f8f5eaee-8fd0-4b45-87db-62d61b03a916/qoss/883d876e-2038-4cd4-8c35-e9b52f2f4380" id="883d876e-2038-4cd4-8c35-e9b52f2f4380"/>
<cluster href= "/ovirt-engine/api/clusters/67866b36-fd68-4106-8758-34cf31b0c3d4" id="67866b36-fd68-4106-8758-34cf31b0c3d4"/>
</cpu_profile>
</cpu_profiles>
3. Create a VM and run it first with the first QoS and then with the second.

First run:

<cpu_profile href= "/ovirt-engine/api/cpuprofiles/5be5c0b7-5b91-4ac4-9d53-ef6f987bff05" id="5be5c0b7-5b91-4ac4-9d53-ef6f987bff05"/>

dumpxml
<vcpu placement='static' current='4'>32</vcpu>
  <cputune>
    <shares>1020</shares>
    <period>12500</period>
    <quota>25000</quota>
  </cputune>

Second run:
<cpu_profile href= "/ovirt-engine/api/cpuprofiles/b015da68-b7a5-4a4b-8389-5cbc8ce58f73" id="b015da68-b7a5-4a4b-8389-5cbc8ce58f73"/>

dumpxml
<vcpu placement='static' current='4'>32</vcpu>
  <cputune>
    <shares>1020</shares>
    <period>12500</period>
    <quota>25000</quota>
  </cputune>

Actual results:
<quota> and <period> under <cputune> have the same values in dumpxml for different limitation values

Expected results:
<quota> and <period> have different values under different limitations

Additional info:
To start with, I do not really understand why we use this formula:
period = anchor / #NumOfCpuInHost
quota = (anchor*(#userSelection/100)) / #numOfVcpusInVm
Why do we need this anchor, and why do we change the period (default 1000000)?
I experimented with period and quota values using virsh create, and the limitation works quite well with this formula:
period = default_value
quota = period * (pcpu/vcpu) * (limitation/100)
I checked it on a VM with 4 CPUs on a host with 8 CPUs,
with limitations of 10, 25 and 50.
For some reason the same proportion with a small period works imprecisely or does not work at all (this looks to me like a bug in cgroups).
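
The two formulas quoted in this description can be compared with a minimal Python sketch. The function names are hypothetical, and the anchor value of 100000 is an assumption: it is not stated anywhere in this report and is chosen only because it reproduces the period and quota pairs shown in the dumpxml output. This is not the actual mom policy code.

```python
# Assumption: ANCHOR is a hypothetical constant. 100000 is used only because
# it reproduces the <period>/<quota> values reported in this bug.
ANCHOR = 100000


def cpu_tune(limit_percent, host_cpus, vm_vcpus, anchor=ANCHOR):
    """Return (period, quota) using the anchor-based formulas from the description."""
    period = anchor // host_cpus
    quota = (anchor * limit_percent // 100) // vm_vcpus
    return period, quota


def quota_fixed_period(limit_percent, host_cpus, vm_vcpus, period):
    """Reporter's alternative: keep the period fixed and scale only the quota."""
    return period * host_cpus * limit_percent // (vm_vcpus * 100)


if __name__ == "__main__":
    # Host with 8 CPUs, VM with 4 vCPUs, as in the reporter's test setup.
    print(cpu_tune(100, 8, 4))                   # (12500, 25000)
    print(cpu_tune(10, 8, 4))                    # (12500, 2500)
    print(quota_fixed_period(10, 8, 4, 12500))   # 2500
```

Under these assumptions a 100% limit gives (12500, 25000), the pair seen in both dumpxml outputs above. Both formulas produce the same quota-to-period ratio, limitation/100 * pcpu/vcpu, so they differ only in the length of the period.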

Comment 1 Martin Sivák 2015-01-07 15:46:06 UTC
RHEL 7 uses libvirt-1.1.1, and the metadata XML feature needed for this to work seems to be missing from that version. I was told that it was first included in libvirt-1.1.3, which did not make it into RHEL 7.

So currently the quota is always treated as 100% on RHEL 7 and the computed numbers (quota 25000, period 12500) cause no cpu usage throttling at all.

RHEL 6.6 should have the necessary libvirt feature backported and should therefore work properly.

Artyom: can you please retest with RHEL 6.6 hosts?

Comment 2 Martin Sivák 2015-01-07 15:48:45 UTC
I should add that I saw the proper values bubble through the VDSM APIs, so it is really only an issue with the following:

Thread-4352::DEBUG::2015-01-07 16:47:01,450::__init__::469::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'VM.updateVmPolicy' in bridge with {u'params': {u'vmId': u'4d7aa507-1b32-4618-a5a2-884500dbbbc1', u'vcpuLimit': u'2'}, u'vmID': u'4d7aa507-1b32-4618-a5a2-884500dbbbc1'}

Thread-4352::DEBUG::2015-01-07 16:47:01,454::libvirtconnection::143::root::(wrapper) Unknown libvirterror: ecode: 74 edom: 10 level: 2 message: argument unsupported: QEMU driver does not support <metadata> element

Thread-4352::ERROR::2015-01-07 16:47:01,454::vm::3821::vm.Vm::(_getVmPolicy) vmId=`4d7aa507-1b32-4618-a5a2-884500dbbbc1`::getVmPolicy failed
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 3818, in _getVmPolicy
    METADATA_VM_TUNE_URI, 0)
  File "/usr/share/vdsm/virt/vm.py", line 689, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 111, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 942, in metadata
    if ret is None: raise libvirtError ('virDomainGetMetadata() failed', dom=self)
libvirtError: argument unsupported: QEMU driver does not support <metadata> element
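
When triaging hosts, the two libvirt failures that appear in this bug can be told apart by the `ecode` field in the VDSM debug line. The sketch below is a hypothetical helper, not part of VDSM; it assumes the log line format shown above, and the two code descriptions are taken only from the messages quoted in this report.

```python
import re

# Matches the "Unknown libvirterror" debug line as it appears in the VDSM log.
LIBVIRT_ERR_RE = re.compile(
    r"Unknown libvirterror: ecode: (?P<ecode>\d+) edom: (?P<edom>\d+) "
    r"level: (?P<level>\d+) message: (?P<message>.*)"
)

# Descriptions based only on the messages quoted in this bug report.
KNOWN_CODES = {
    74: "libvirt does not support the <metadata> element",
    80: "requested metadata element is not present",
}


def classify(line):
    """Return (ecode, description, message), or None if the line does not match."""
    match = LIBVIRT_ERR_RE.search(line)
    if match is None:
        return None
    ecode = int(match.group("ecode"))
    return ecode, KNOWN_CODES.get(ecode, "unclassified"), match.group("message")


if __name__ == "__main__":
    sample = (
        "Thread-4352::DEBUG::2015-01-07 16:47:01,454::libvirtconnection::143::root::"
        "(wrapper) Unknown libvirterror: ecode: 74 edom: 10 level: 2 message: "
        "argument unsupported: QEMU driver does not support <metadata> element"
    )
    print(classify(sample))
```

A result with code 74 points at the libvirt build on the host, while code 80 only says that no metadata element was found for the domain at that moment.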

Comment 3 Doron Fediuck 2015-01-08 13:39:42 UTC
No reason to block RC on a wrong libvirt version.

Comment 4 Michal Skrivanek 2015-01-08 15:06:45 UTC
what's the libvirt dependency?
do we expect 7.0.z update? if so, when?

Comment 5 Artyom 2015-01-11 10:38:23 UTC
On RHEL 6.6 it also does not work:
Thread-131338::DEBUG::2015-01-11 12:35:18,673::libvirtconnection::143::root::(wrapper) Unknown libvirterror: ecode: 80 edom: 20 level: 2 message: metadata not found: Requested metadata element is not present
Thread-131338::ERROR::2015-01-11 12:35:18,675::__init__::493::jsonrpc.JsonRpcServer::(_serveRequest) Internal server error
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/yajsonrpc/__init__.py", line 488, in _serveRequest
    res = method(**params)
  File "/usr/share/vdsm/rpc/Bridge.py", line 284, in _dynamicMethod
    return self._fixupRet(className, methodName, ret)
  File "/usr/share/vdsm/rpc/Bridge.py", line 234, in _fixupRet
    self._typeFixup('return', retType, result)
  File "/usr/share/vdsm/rpc/Bridge.py", line 214, in _typeFixup
    if k in item:
TypeError: argument of type 'NoneType' is not iterable

So it also does not receive the limit from the engine.

vdsm-4.16.8.1-5.el6ev.x86_64
libvirt-0.10.2-46.el6_6.2.x86_64

Comment 6 Artyom 2015-01-11 11:03:05 UTC
This applies only to RHEL 6.6.
After one minute I see that the parameter is updated to the correct value, so the error above is not related to QoS.
I also see that the metadata is passed correctly:
<metadata>
    <ovirt:qos xmlns:ovirt="http://ovirt.org/vm/tune/1.0">
        <ovirt:vcpuLimit>10</ovirt:vcpuLimit>
</ovirt:qos>
And period and quota have the correct values:

<period>12500</period>
<quota>2500</quota>

Tested for 5, 10, 25 and 50 percent.
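
The check described in this comment, confirming that the limit reached the domain metadata, can be sketched with the Python standard library. The function name and the sample document are hypothetical; the sample is a trimmed, well-formed version of the dumpxml fragments quoted in this bug, and only the namespace URI and element names come from the report.

```python
import xml.etree.ElementTree as ET

# Namespace URI as it appears in the <ovirt:qos> element quoted in this bug.
OVIRT_TUNE_NS = "http://ovirt.org/vm/tune/1.0"


def vcpu_limit_from_domain_xml(domain_xml):
    """Return the vcpuLimit stored in the domain metadata, or None if absent."""
    root = ET.fromstring(domain_xml)
    path = "./metadata/{%s}qos/{%s}vcpuLimit" % (OVIRT_TUNE_NS, OVIRT_TUNE_NS)
    node = root.find(path)
    if node is None or node.text is None:
        return None
    return int(node.text.strip())


# Hypothetical sample: a trimmed, well-formed version of the fragments above.
SAMPLE = """<domain type='kvm'>
  <metadata>
    <ovirt:qos xmlns:ovirt="http://ovirt.org/vm/tune/1.0">
      <ovirt:vcpuLimit>10</ovirt:vcpuLimit>
    </ovirt:qos>
  </metadata>
  <cputune>
    <shares>1020</shares>
    <period>12500</period>
    <quota>2500</quota>
  </cputune>
</domain>"""

if __name__ == "__main__":
    print(vcpu_limit_from_domain_xml(SAMPLE))  # 10
```

Feeding it the output of `virsh -r dumpxml` for a running VM should return the limit set in the CPU profile, or None when the metadata element is missing.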

Comment 7 Artyom 2015-01-11 11:19:45 UTC
I see that we already have a bug for the error above:
https://bugzilla.redhat.com/show_bug.cgi?id=1142851

Comment 8 Doron Fediuck 2015-01-12 08:29:43 UTC
*** Bug 1179591 has been marked as a duplicate of this bug. ***

Comment 9 Doron Fediuck 2015-01-15 13:44:15 UTC
Moving to MODIFIED to wait for a relevant RHEL version with a new libvirt version.

Comment 10 Martin Sivák 2015-01-19 11:31:55 UTC
Moving to POST on eedri's request. It should be moved to MODIFIED once the libvirt version is available.

Comment 13 Nikolai Sednev 2015-02-01 16:17:04 UTC
Works for me on these components:
mom-0.4.1-4.el7ev.noarch
libvirt-client-1.1.1-29.el7_0.7.x86_64
sanlock-3.1.0-2.el7.x86_64
qemu-kvm-rhev-1.5.3-60.el7_0.11.x86_64
vdsm-4.16.8.1-6.el7ev.x86_64


rhevm-3.5.0-0.31.el6ev.noarch

RHEVH 7.0 with these components does not work:
qemu-kvm-rhev-1.5.3-60.el7_0.11.x86_64
sanlock-3.1.0-2.el7.x86_64
mom-0.4.1-4.el7ev.noarch
vdsm-4.16.8.1-6.el7ev.x86_64
libvirt-client-1.1.1-29.el7_0.4.x86_64

Please align RHEVHs to libvirt-client-1.1.1-29.el7_0.7.x86_64 or above.

Comment 14 Eyal Edri 2015-02-05 13:06:11 UTC
doron, libvirt errata is shipped live, can this bug move to ON_QA?

Comment 16 Nikolai Sednev 2015-02-08 14:59:37 UTC
On RHEL7.1 CPU SLA QOS is not working:
vdsClient -s 0 list table
virsh -r dumpxml StressVM1_CPU_RHEL7_1

<domain type='kvm' id='6'>                                                                                  
  <name>StressVM1_CPU_RHEL7_1</name>                                                                        
  <uuid>12b8466c-491b-49ff-a063-fe40a180ff4a</uuid>                                                         
  <metadata>                                                                                                
    <ovirt:qos xmlns:ovirt="http://ovirt.org/vm/tune/1.0">                                                  
        <ovirt:vcpuLimit>2</ovirt:vcpuLimit>                                                                
</ovirt:qos>                                                                                                
  </metadata>                                                                                               
  <memory unit='KiB'>1048576</memory>                                                                       
  <currentMemory unit='KiB'>1048576</currentMemory>                                                         
  <vcpu placement='static' current='4'>16</vcpu>                                                            
  <cputune>                                                                                                 
    <shares>1020</shares>                                                                                   
    <period>25000</period>                                                                                  
    <quota>25000</quota>                                                                                    
  </cputune>                                                                                                
  <resource>    

Components used:
sanlock-3.2.2-2.el7.x86_64
qemu-kvm-rhev-2.1.2-23.el7.x86_64
libvirt-client-1.2.8-16.el7.x86_64
mom-0.4.1-4.el7ev.noarch
vdsm-4.16.8.1-6.el7ev.x86_64
Linux version 3.10.0-227.el7.x86_64 (mockbuild@x86-035.build.eng.bos.redhat.com) (gcc version 4.8.3 20140911 (Red Hat 4.8.3-7) (GCC) ) #1 SMP Tue Jan 27 11:55:32 EST 2015
Linux alma03.qa.lab.tlv.redhat.com 3.10.0-227.el7.x86_64 #1 SMP Tue Jan 27 11:55:32 EST 2015 x86_64 x86_64 x86_64 GNU/Linux


rhevm-guest-agent-common-1.0.10-2.el6ev.noarch
rhevm-3.5.0-0.31.el6ev.noarch

Comment 18 Nikolai Sednev 2015-02-08 15:11:48 UTC
Previously I ran the guest VM with 4 virtual CPUs and the feature did not limit CPU usage to 2%, although the policy was set. I retested the same guest VM with 1 virtual CPU and the feature works fine:
<domain type='kvm' id='7'>                             
  <name>StressVM1_CPU_RHEL7_1</name>                   
  <uuid>12b8466c-491b-49ff-a063-fe40a180ff4a</uuid>    
  <metadata>                                           
    <ovirt:qos xmlns:ovirt="http://ovirt.org/vm/tune/1.0">
        <ovirt:vcpuLimit>2</ovirt:vcpuLimit>              
</ovirt:qos>                                              
  </metadata>                                             
  <memory unit='KiB'>1048576</memory>                     
  <currentMemory unit='KiB'>1048576</currentMemory>       
  <vcpu placement='static' current='1'>16</vcpu>          
  <cputune>                                               
    <shares>1020</shares>                                 
    <period>25000</period>                                
    <quota>2000</quota>                                   
  </cputune>                                              
  <resource>                                              
    <partition>/machine</partition>                       
  </resource>                                             
    <sysinfo type='smbios'>                               
      <system>                                            
        <entry name='manufacturer'>Red Hat</entry>        
        <entry name='product'>RHEV Hypervisor</entry>     
        <entry name='version'>7.1-0.3.el7</entry>         
        <entry name='serial'>4C4C4544-0059-4410-8053-B7C04F573032</entry>
        <entry name='uuid'>12b8466c-491b-49ff-a063-fe40a180ff4a</entry>  
      </system>                                                          
    </sysinfo>                             

Please note the different quota values between the two scenarios: with 4 virtual CPUs the quota is 25000, whereas with 1 virtual CPU it is 2000.
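
The difference between the two scenarios can be made explicit by converting each `<cputune>` block back into an implied percentage of the host. The sketch below is hypothetical. It assumes the limit is expressed as a share of all host CPUs, and that the host has 4 CPUs; neither is stated in this comment, and the CPU count is only inferred from the period of 25000.

```python
import xml.etree.ElementTree as ET


def implied_limit_percent(domain_xml, host_cpus):
    """Convert <cputune> period/quota into an implied share of the whole host."""
    root = ET.fromstring(domain_xml)
    period = int(root.findtext("./cputune/period"))
    quota = int(root.findtext("./cputune/quota"))
    # The "current" attribute holds the number of vCPUs actually in use.
    vcpus = int(root.find("./vcpu").get("current"))
    return 100.0 * quota * vcpus / (period * host_cpus)


# Trimmed versions of the two dumps in Comments 16 and 18.
FOUR_VCPUS = """<domain type='kvm'>
  <vcpu placement='static' current='4'>16</vcpu>
  <cputune><shares>1020</shares><period>25000</period><quota>25000</quota></cputune>
</domain>"""

ONE_VCPU = """<domain type='kvm'>
  <vcpu placement='static' current='1'>16</vcpu>
  <cputune><shares>1020</shares><period>25000</period><quota>2000</quota></cputune>
</domain>"""

if __name__ == "__main__":
    # Assumption: 4 host CPUs, inferred from the period, not stated in the report.
    print(implied_limit_percent(FOUR_VCPUS, host_cpus=4))  # 100.0
    print(implied_limit_percent(ONE_VCPU, host_cpus=4))    # 2.0
```

Under these assumptions the 4-vCPU dump corresponds to no throttling at all (100%), while the 1-vCPU dump matches the configured vcpuLimit of 2.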

Comment 22 Artyom 2015-02-09 14:48:31 UTC
Verified on libvirt-client-1.1.1-29.el7_0.7.x86_64
with two limits, 25 and 50, and with different numbers of CPUs.

Comment 23 Eyal Edri 2015-02-17 17:10:32 UTC
rhev 3.5.0 was released. closing.

