Bug 1435218 - [scale] - getAllVmIoTunePolicies hit the performance
Summary: [scale] - getAllVmIoTunePolicies hit the performance
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: Core
Version: ---
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ovirt-4.1.3
Target Release: 4.19.16
Assignee: Andrej Krejcir
QA Contact: Ilanit Stein
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-03-23 12:12 UTC by Eldad Marciano
Modified: 2017-07-14 03:46 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-07-06 13:31:28 UTC
oVirt Team: SLA
Embargoed:
rule-engine: ovirt-4.1+



Links
System       | ID    | Private | Priority  | Status | Summary                                                  | Last Updated
oVirt gerrit | 74875 | 0       | master    | MERGED | io-tune: Cache io-tune policy in vdsm for faster reading | 2017-04-20 12:53:35 UTC
oVirt gerrit | 75706 | 0       | ovirt-4.1 | MERGED | io-tune: Cache io-tune policy in vdsm for faster reading | 2017-05-23 08:34:05 UTC

Description Eldad Marciano 2017-03-23 12:12:40 UTC
Description of problem:

On a loaded host running ~500 VMs, getAllVmIoTunePolicies degrades performance and delays other vdsm tasks.

This scenario led to a TooManyTasks exception in vdsm:
ERROR (JsonRpcServer) [jsonrpc.JsonRpcServer] could not allocate request thread (__init__:626)
  File "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 361, in put
    raise TooManyTasks()
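
To illustrate why slow calls can surface as TooManyTasks, here is a minimal sketch of a worker pool with a bounded backlog. It is not the code in vdsm's executor.py; the class BoundedExecutor, its max_tasks parameter, and the run_one helper are assumptions made for illustration. Only the exception name and the put method come from the traceback above.

```python
import queue


class TooManyTasks(Exception):
    """Raised when the task backlog is already full."""


class BoundedExecutor:
    """Hypothetical stand-in for a worker pool with a fixed-size backlog."""

    def __init__(self, max_tasks):
        self._tasks = queue.Queue(maxsize=max_tasks)

    def put(self, task):
        try:
            # Never block the caller: a full backlog is reported immediately.
            self._tasks.put_nowait(task)
        except queue.Full:
            raise TooManyTasks()

    def run_one(self):
        # A worker thread would call this in a loop; here it runs one task.
        task = self._tasks.get_nowait()
        return task()
```

When tasks take a long time to finish, workers drain the backlog more slowly than requests arrive, so put starts failing even though each individual request is valid.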


To better understand what is going on, we need to profile MOM and provide further information.

Most of the workload comes from MOM, and while other vdsm tasks are delayed, the host sometimes becomes non-responsive in the engine.

Workaround:
turn off MOM (momd)

Version-Release number of selected component (if applicable):
4.1.1
vdsm-4.19.9-1.el7ev.x86_64
libvirt-client-2.0.0-10.el7_3.4.x86_64
qemu-kvm-rhev-2.6.0-28.el7_3.6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Run ~500 VMs on a single host.


Actual results:
Most of the workload comes from MOM, and while other vdsm tasks are delayed, the host sometimes becomes non-responsive in the engine.


Expected results:
normal response time for getAllVmIoTunePolicies

Additional info:

Comment 2 Martin Sivák 2017-03-23 14:30:42 UTC
MOM executes this method once every 15 seconds, which is hardly unreasonable. MOM can't be optimized further: it needs the data, and it needs it reasonably often.

This can probably be improved on the vdsm/libvirt side so that the method finishes faster.

Comment 3 Michal Skrivanek 2017-03-24 09:54:45 UTC
The feature design is problematic.
VDSM has all the configuration data; it doesn't need to go to libvirt for it (VDSM is the only entity that sets it). Since MOM is already using a bulk call for VM stats, a trigger can be added so that iotune policies are requested only when they actually change.
I suggest taking Francesco's collectd monitoring work into consideration as well.

In the short term, you can increase the 15s interval.
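
The "request only on change" idea from this comment could look roughly like the sketch below. It assumes the bulk VM stats carry a per-VM change marker; the field name ioTuneHash, the PolicyWatcher class, and the fetch_policy callback are hypothetical and are not part of the vdsm or MOM APIs.

```python
import hashlib
import json


def policy_fingerprint(policy):
    """Stable digest of a policy dict, usable as a cheap change marker."""
    payload = json.dumps(policy, sort_keys=True).encode("utf-8")
    return hashlib.sha1(payload).hexdigest()


class PolicyWatcher:
    """Refetches a VM's policy only when its marker in the stats changes."""

    def __init__(self, fetch_policy):
        self._fetch_policy = fetch_policy  # hypothetical expensive call
        self._seen = {}      # vm_id -> last marker seen
        self._policies = {}  # vm_id -> last fetched policy

    def update(self, bulk_stats):
        """bulk_stats maps vm_id -> {"ioTuneHash": marker, ...} (assumed)."""
        for vm_id, stats in bulk_stats.items():
            marker = stats["ioTuneHash"]
            if self._seen.get(vm_id) != marker:
                self._policies[vm_id] = self._fetch_policy(vm_id)
                self._seen[vm_id] = marker
        return self._policies
```

With this shape, the periodic poll costs one comparison per VM, and the expensive per-VM fetch happens only after a policy is actually modified.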

Comment 4 Martin Sivák 2017-03-29 10:43:25 UTC
I think a cache layer should indeed be in vdsm.
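
A cache layer of this kind might be sketched as follows. This is not the merged gerrit change; the class IoTunePolicyCache and the apply_to_hypervisor callback are assumptions for illustration. It relies on the point made in comment 3: the component that sets the policies can remember them instead of reading them back.

```python
import copy
import threading


class IoTunePolicyCache:
    """Write-through cache: every policy set is remembered in memory."""

    def __init__(self, apply_to_hypervisor):
        self._apply = apply_to_hypervisor  # hypothetical slow setter
        self._lock = threading.Lock()
        self._policies = {}  # vm_id -> policy

    def set_policy(self, vm_id, policy):
        # Push the change to the hypervisor first, then record it, so the
        # cache never holds a policy that failed to apply.
        self._apply(vm_id, policy)
        with self._lock:
            self._policies[vm_id] = copy.deepcopy(policy)

    def remove_vm(self, vm_id):
        with self._lock:
            self._policies.pop(vm_id, None)

    def get_all(self):
        # Served from memory; no hypervisor round trip per VM.
        with self._lock:
            return copy.deepcopy(self._policies)
```

Reads return a deep copy so that callers cannot mutate the cached state by accident.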

Comment 7 Eldad Marciano 2017-06-04 22:34:56 UTC
Verified on top of:
vdsm-4.19.17-1.el7ev.x86_64
mom-0.5.9-1.el7ev.noarch

with 500 VMs.

Overall CPU utilization on the host is very stable.
Thanks to the cache fix, the response time of 'vdsm-client Host getAllVmIoTunePolicies' is now in the millisecond range (the vdsm logs show the same response time), which makes the CPU load smoother.

No 'TooManyTasks' exceptions like the one described in https://bugzilla.redhat.com/show_bug.cgi?id=1435218#c0 were found.

Moving to verified.
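
A response-time check like the one in this comment can be repeated with a small timing helper. The sketch below is an assumption about how one might measure it, not the procedure used for verification; the function name time_command and the runs parameter are hypothetical. The vdsm-client invocation is taken from the comment.

```python
import subprocess
import sys
import time


def time_command(argv, runs=3):
    """Run argv several times; return wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.monotonic()
        subprocess.run(argv, stdout=subprocess.DEVNULL, check=True)
        durations.append(time.monotonic() - start)
    return durations


if __name__ == "__main__":
    # On a vdsm host one would pass the command from the comment:
    #   ["vdsm-client", "Host", "getAllVmIoTunePolicies"]
    # The Python interpreter is used here so the sketch runs anywhere.
    for seconds in time_command([sys.executable, "-c", "pass"]):
        print("%.1f ms" % (seconds * 1000))
```

The measured time includes client start-up, so the vdsm logs remain the better source for the server-side duration of the call.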

