Bug 1265205 - [scale] high vdsm threads overhead
Status: CLOSED ERRATA
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.5.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ovirt-3.5.6
Target Release: 3.5.6
Assigned To: Francesco Romani
QA Contact: mlehrer
Whiteboard: virt
Keywords: ZStream
Depends On: 1247075
Blocks:
Reported: 2015-09-22 07:34 EDT by rhev-integ
Modified: 2016-02-10 14:24 EST
CC List: 20 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, VDSM used a high number of system threads, which caused high resource usage and made the system slow. With this update, a new option, cpu_affinity, has been added to vdsm.conf so that users can tune VDSM's CPU affinity. The option is disabled by default. To enable it, edit the vdsm.conf file and set the 'cpu_affinity' option in the [vars] section. The option accepts a comma-separated whitelist of CPU cores on which VDSM is allowed to run. The default is "", meaning the operating system can schedule VDSM on any core. Valid examples include "1", "0,1", and "0,2,3". In scale testing, setting cpu_affinity to a single core reduced VDSM's CPU utilization and improved its transactional throughput and HTTP response time. It is recommended to enable the cpu_affinity option if VDSM uses too much CPU. It is also recommended to set the affinity to a single CPU and to avoid CPU 0, because other system tasks may default to that CPU.
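The whitelist semantics above can be illustrated with a minimal Python sketch. The names `parse_cpu_affinity` and `apply_cpu_affinity` are hypothetical and this is not VDSM's actual code: the tracker entries suggest VDSM applies the setting through taskset, whereas this sketch uses the standard library's `os.sched_setaffinity`, which is Linux-only.

```python
import os
from typing import Optional, Set


def parse_cpu_affinity(value: str) -> Optional[Set[int]]:
    """Parse a cpu_affinity whitelist such as "1", "0,1" or "0,2,3".

    Returns None for the empty string, meaning "no restriction": the
    operating system may schedule the process on any core.
    """
    value = value.strip()
    if not value:
        return None
    cpus = set()
    for item in value.split(","):
        item = item.strip()
        if not item.isdigit():
            raise ValueError("invalid CPU index: %r" % item)
        cpus.add(int(item))
    return cpus


def apply_cpu_affinity(value: str, pid: int = 0) -> None:
    """Pin `pid` (0 = the calling process) to the whitelisted cores.

    Hypothetical helper; os.sched_setaffinity exists only on Linux.
    Threads created after this call inherit the affinity.
    """
    cpus = parse_cpu_affinity(value)
    if cpus is None:
        return  # empty whitelist: leave scheduling to the OS
    os.sched_setaffinity(pid, cpus)
```

For example, `parse_cpu_affinity("0,2,3")` yields `{0, 2, 3}`, while the default `""` yields `None` and leaves scheduling to the operating system. Applying the mask early, before worker threads are spawned, lets every thread inherit it.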
Story Points: ---
Clone Of: 1247075
Environment:
Last Closed: 2015-12-01 15:40:23 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Virt
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 45282 None None None Never
oVirt gerrit 45738 master MERGED scale: limit cpu usage using cpu-affinity Never
oVirt gerrit 46502 ovirt-3.6 MERGED scale: limit cpu usage using cpu-affinity Never
oVirt gerrit 46522 ovirt-3.5 MERGED scale: limit cpu usage using cpu-affinity Never
oVirt gerrit 47013 None MERGED taskset: fix taskset and tests on python 2.6 Never

Comment 1 Francesco Romani 2015-09-30 08:28:22 EDT
patch merged upstream -> MODIFIED
Comment 2 Francesco Romani 2015-10-06 07:43:40 EDT
moving back to MODIFIED. Patch 47013 fixes tests only, production code works OK.
Comment 3 Michal Skrivanek 2015-10-21 07:55:20 EDT
I want MODIFIED!
Bot, please leave me alone...:-)
Comment 5 Gil Klein 2015-11-10 11:38:38 EST
Looks like the code fix made it in, to vdsm-4.16.29-1.el7ev.x86_64, 
but the new vdsm configuration is not part of the default settings of vdsm.conf

To enable this feature we had to manually set "cpu_affinity = 1".

The default settings on a new system are:

# rpm -qa|grep vdsm
vdsm-xmlrpc-4.16.29-1.el7ev.noarch
vdsm-yajsonrpc-4.16.29-1.el7ev.noarch
vdsm-cli-4.16.29-1.el7ev.noarch
vdsm-4.16.29-1.el7ev.x86_64
vdsm-python-zombiereaper-4.16.29-1.el7ev.noarch
vdsm-python-4.16.29-1.el7ev.noarch
vdsm-hook-ethtool-options-4.16.29-1.el7ev.noarch
vdsm-jsonrpc-4.16.29-1.el7ev.noarch

# cat vdsm.conf
[vars]
ssl = true

[addresses]
management_port = 54321

Francesco/Michal, Should we respin to enable it by default?
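The behavior described in this comment can be made concrete with a minimal Python sketch, assuming the option is read with the standard `configparser` module (VDSM's own configuration loader is not shown). The default vdsm.conf has no `cpu_affinity` key, so the lookup falls back to `""` (no restriction) until `cpu_affinity = 1` is added under `[vars]`. The helper `get_cpu_affinity` is hypothetical.

```python
import configparser

# The default vdsm.conf contents shown above.
DEFAULT_VDSM_CONF = """\
[vars]
ssl = true

[addresses]
management_port = 54321
"""

# The same file after manually enabling the option.
ENABLED_VDSM_CONF = """\
[vars]
ssl = true
cpu_affinity = 1

[addresses]
management_port = 54321
"""


def get_cpu_affinity(conf_text: str) -> str:
    """Return the cpu_affinity whitelist from vdsm.conf-style text.

    Hypothetical helper: falls back to "" (no affinity restriction)
    when the option or the [vars] section is absent, matching the
    documented default.
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    return parser.get("vars", "cpu_affinity", fallback="")
```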
Comment 6 Francesco Romani 2015-11-10 12:22:37 EST
(In reply to Gil Klein from comment #5)
> Looks like the code fix made it in, to vdsm-4.16.29-1.el7ev.x86_64, 
> but the new vdsm configuration is not part of the default settings of
> vdsm.conf
> 
> To enable this feature we had to manually set "cpu_affinity = 1"
[...] 

> # cat vdsm.conf
> [vars]
> ssl = true
> 
> [addresses]
> management_port = 54321
> 
> Francesco/Michal, Should we respin to enable it by default?

Please note that for 3.5, having it disabled by default was intentional (at least in my intentions :) ) because we are pretty deep in maintenance mode for the 3.5.x series, so, despite the benefits we observed, this seemed too invasive a change.

Furthermore, it is very easy to enable.

For 3.6.x and later, we are discussing whether to enable it by default in
https://bugzilla.redhat.com/show_bug.cgi?id=1279431
Comment 7 Francesco Romani 2015-11-10 12:23:19 EST
restoring NEEDINFO on Michal as per comment 5.
Comment 8 Michal Skrivanek 2015-11-10 13:00:32 EST
(In reply to Gil Klein from comment #5)
> Looks like the code fix made it in, to vdsm-4.16.29-1.el7ev.x86_64, 
> but the new vdsm configuration is not part of the default settings

Intentionally

> Francesco/Michal, Should we respin to enable it by default?

We'd love to get more feedback before that. It is tracked in bug 1279431
Comment 9 Gil Klein 2015-11-10 13:20:38 EST
(In reply to Michal Skrivanek from comment #8)
> (In reply to Gil Klein from comment #5)
> > Looks like the code fix made it in, to vdsm-4.16.29-1.el7ev.x86_64, 
> > but the new vdsm configuration is not part of the default settings
> 
> Intentionally
> 
>  Francesco/Michal, Should we respin to enable it by default?
> 
> We'd love to get more feedback before that. It is tracked in bug 1279431
Makes sense. How do we make sure it is documented?
Comment 10 Julie 2015-11-10 19:38:43 EST
If this bug requires doc text for errata release, please provide draft text in the doc text field in the following format:

Cause:
Consequence:
Fix:
Result:

The documentation team will review, edit, and approve the text.

If this bug does not require doc text, please set the 'requires_doc_text' flag to -.
Comment 11 mlehrer 2015-11-11 08:27:58 EST
Update from scale:

Initial checks show cpu_affinity correctly limits vdsm utilization to 1 core.
Preparing VDSM regression workloads with and without cpu_affinity to verify VDSM behavior; will update relevant stakeholders afterward.
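One way to confirm that the setting took effect is to read the kernel's view of a process's allowed CPUs. A minimal Python sketch, assuming Linux and assuming you already know VDSM's PID (finding it is not shown): it parses the `Cpus_allowed_list` field of `/proc/<pid>/status`. The helper names are hypothetical, and the field reflects the main thread only, so individual threads could in principle differ.

```python
import os
from typing import Set


def parse_cpu_list(text: str) -> Set[int]:
    """Parse a kernel CPU list such as "1" or "0-2,5" into a set of ints."""
    cpus: Set[int] = set()
    for part in text.strip().split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-")
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return cpus


def cpus_allowed(pid: int) -> Set[int]:
    """Read Cpus_allowed_list from /proc/<pid>/status (Linux, main thread)."""
    path = os.path.join("/proc", str(pid), "status")
    with open(path) as status:
        for line in status:
            if line.startswith("Cpus_allowed_list:"):
                return parse_cpu_list(line.split(":", 1)[1])
    raise RuntimeError("Cpus_allowed_list not found for pid %d" % pid)


def is_pinned_to(pid: int, expected: Set[int]) -> bool:
    """True if the process may run only on exactly the `expected` cores."""
    return cpus_allowed(pid) == expected
```

With `cpu_affinity = 1`, `is_pinned_to(vdsm_pid, {1})` would be expected to return True, where `vdsm_pid` is a placeholder for the actual PID.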
Comment 12 Julie 2015-11-15 18:43:58 EST
hi Francesco,
     I have updated the doc text. Please let me know if you have any feedback. 
Also, I don't think we tell users to edit the vdsm.conf file for other options so I'm a bit confused if this feature should be added to the main docs suite. Including Andrew and Yaniv. Do we need a docs bug for this?

Cheers,
Julie
Comment 13 Francesco Romani 2015-11-16 08:57:22 EST
(In reply to Julie from comment #12)
> hi Francesco,
>      I have updated the doc text. Please let me know if you have any
> feedback. 
> Also, I don't think we tell users to edit the vdsm.conf file for other
> options so I'm a bit confused if this feature should be added to the main
> docs suite. Including Andrew and Yaniv. Do we need a docs bug for this?
> 
> Cheers,
> Julie


Hi Julie. The doc text seems fine.
I think the user should be aware of this option, which may have a rather big impact on the node running VDSM.

However, I don't know the best way to convey this information, so I can't really help here.
Comment 15 Julie 2015-11-16 18:22:04 EST
Hi Yaniv, 
   Please see #comment12.

Cheers,
Julie
Comment 16 Yaniv Lavi 2015-11-17 04:57:14 EST
(In reply to Julie from comment #15)
> Hi Yaniv, 
>    Please see #comment12.
> 
> Cheers,
> Julie

I think a KBase article on tuning this on scale machines is the best way to document it, since we do not have a tuning guide.
Comment 17 mlehrer 2015-11-17 05:15:15 EST
Scale re-run complete; updated info at docs/DOC-1055603.

Showed:
Reduced VDSM process CPU utilization when cpu_affinity is set to 1.
Improved transactional throughput and HTTP response time for the VDSM workload.

QEMU-KVM utilization is unrelated; it is caused by the rate of VM startup.
Moving to verified.
Comment 19 errata-xmlrpc 2015-12-01 15:40:23 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2530.html
