Bug 1265205

Summary: [scale] high vdsm threads overhead
Product: Red Hat Enterprise Virtualization Manager Reporter: rhev-integ
Component: vdsm Assignee: Francesco Romani <fromani>
Status: CLOSED ERRATA QA Contact: mlehrer
Severity: high Docs Contact:
Priority: high    
Version: 3.5.1 CC: adahms, ahoness, bazulay, danken, ecohen, fromani, gklein, istein, juwu, lpeer, lsurette, mgoldboi, michal.skrivanek, mtessun, pstehlik, pzhukov, ybronhei, ycui, yeylon, ylavi
Target Milestone: ovirt-3.5.6 Keywords: ZStream
Target Release: 3.5.6   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: virt
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Previously, VDSM used a high number of system threads, which caused high resource usage and made the system slow. With this update, a new option, cpu_affinity, has been added to vdsm.conf so that users can tune the CPU affinity of VDSM. The option is disabled by default. To enable it, edit the vdsm.conf file and set the 'cpu_affinity' option under the [vars] section. The option accepts a comma-separated whitelist of CPU cores on which VDSM is allowed to run. The default is "", meaning the operating system can schedule VDSM on any core. Valid examples include "1", "0,1", and "0,2,3". VDSM resource usage improves dramatically when cpu_affinity is enabled, so enabling it is recommended if VDSM uses too much CPU. It is also recommended to pin VDSM to one CPU only and to avoid CPU #0, because other system tasks may default to that CPU.
Story Points: ---
Clone Of: 1247075 Environment:
Last Closed: 2015-12-01 20:40:23 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Virt RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1247075    
Bug Blocks:    
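The cpu_affinity format described in the Doc Text can be illustrated with a short Python sketch. This is not VDSM's actual code: the helper names `parse_cpu_affinity`, `affinity_warnings` and `apply_cpu_affinity` are hypothetical, the empty default is modeled as "no restriction" (None), and applying the mask assumes Linux, where `os.sched_setaffinity` is available.

```python
import os
from typing import List, Optional, Set


def parse_cpu_affinity(value: str) -> Optional[Set[int]]:
    """Parse a cpu_affinity whitelist such as "1", "0,1" or "0,2,3".

    Returns None for the default "" (no restriction: the OS may schedule
    VDSM on any core), otherwise the set of allowed CPU indices.
    """
    value = value.strip()
    if not value:
        return None
    cpus = set()
    for item in value.split(","):
        item = item.strip()
        # Reject empty entries ("1,,2"), negatives and non-numeric text.
        if not item.isdigit():
            raise ValueError("invalid CPU index in cpu_affinity: %r" % item)
        cpus.add(int(item))
    return cpus


def affinity_warnings(cpus: Set[int]) -> List[str]:
    """Flag settings that go against the Doc Text recommendations."""
    warnings = []
    if len(cpus) != 1:
        warnings.append("pin VDSM to one CPU only")
    if 0 in cpus:
        warnings.append("avoid CPU #0; other system tasks may default to it")
    return warnings


def apply_cpu_affinity(value: str) -> None:
    """Pin the calling thread to the whitelisted CPUs (Linux only).

    Threads started afterwards inherit the mask, so a call like this would
    need to happen early, before worker threads are created.
    """
    cpus = parse_cpu_affinity(value)
    if cpus is not None:
        os.sched_setaffinity(0, cpus)
```

For example, `parse_cpu_affinity("0,2,3")` gives `{0, 2, 3}`, and `affinity_warnings({0, 2, 3})` reports both recommendations as violated, while `{1}` produces no warnings.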

Comment 1 Francesco Romani 2015-09-30 12:28:22 UTC
patch merged upstream -> MODIFIED

Comment 2 Francesco Romani 2015-10-06 11:43:40 UTC
Moving back to MODIFIED. Patch 47013 fixes tests only; the production code works OK.

Comment 3 Michal Skrivanek 2015-10-21 11:55:20 UTC
I want MODIFIED!
Bot, please leave me alone...:-)

Comment 5 Gil Klein 2015-11-10 16:38:38 UTC
Looks like the code fix made it into vdsm-4.16.29-1.el7ev.x86_64,
but the new vdsm configuration is not part of the default settings of vdsm.conf.

To enable this feature, we had to manually set "cpu_affinity = 1".

The default settings on a new system are:

# rpm -qa|grep vdsm
vdsm-xmlrpc-4.16.29-1.el7ev.noarch
vdsm-yajsonrpc-4.16.29-1.el7ev.noarch
vdsm-cli-4.16.29-1.el7ev.noarch
vdsm-4.16.29-1.el7ev.x86_64
vdsm-python-zombiereaper-4.16.29-1.el7ev.noarch
vdsm-python-4.16.29-1.el7ev.noarch
vdsm-hook-ethtool-options-4.16.29-1.el7ev.noarch
vdsm-jsonrpc-4.16.29-1.el7ev.noarch

# cat vdsm.conf
[vars]
ssl = true

[addresses]
management_port = 54321

Francesco/Michal, should we respin to enable it by default?
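Since the option is absent from the default file, enabling it on 3.5.x means adding cpu_affinity under [vars] by hand. The effect can be sketched with Python's standard `configparser` module, using the default contents listed in this comment; the helper names are hypothetical, and this is not how VDSM itself reads its configuration.

```python
import configparser
import io

# Default vdsm.conf contents as listed in comment 5 (no cpu_affinity entry).
DEFAULT_VDSM_CONF = """\
[vars]
ssl = true

[addresses]
management_port = 54321
"""


def read_cpu_affinity(conf_text: str) -> str:
    """Return [vars] cpu_affinity, falling back to "" (disabled) if absent."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    return parser.get("vars", "cpu_affinity", fallback="")


def enable_cpu_affinity(conf_text: str, cpus: str) -> str:
    """Return new config text with cpu_affinity set under [vars].

    ConfigParser.write() drops comments and reformats the file, so editing
    the real vdsm.conf by hand is probably preferable on a production host.
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    if not parser.has_section("vars"):
        parser.add_section("vars")
    parser.set("vars", "cpu_affinity", cpus)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()
```

With the default file, `read_cpu_affinity` returns "" (the feature stays disabled); after `enable_cpu_affinity(DEFAULT_VDSM_CONF, "1")` the [vars] section contains `cpu_affinity = 1` alongside `ssl = true`.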

Comment 6 Francesco Romani 2015-11-10 17:22:37 UTC
(In reply to Gil Klein from comment #5)
> Looks like the code fix made it in, to vdsm-4.16.29-1.el7ev.x86_64, 
> but the new vdsm configuration is not part of the default settings of
> vdsm.conf
> 
> To enable this feature we had to manually set ""cpu_affinity = 1"
[...] 

> # cat vdsm.conf
> [vars]
> ssl = true
> 
> [addresses]
> management_port = 54321
> 
> Francesco/Michal, Should we respin to enable it by default?

Please note that for 3.5, having it disabled by default was intentional (at least in my intentions :) ), because we are pretty deep in maintenance mode for the 3.5.x series, so, despite the benefits we observed, this seemed too invasive a change.

Furthermore, enabling it is very easy.

For 3.6.x and later, we are discussing whether to enable it by default in
https://bugzilla.redhat.com/show_bug.cgi?id=1279431

Comment 7 Francesco Romani 2015-11-10 17:23:19 UTC
restoring NEEDINFO on Michal as per comment 5.

Comment 8 Michal Skrivanek 2015-11-10 18:00:32 UTC
(In reply to Gil Klein from comment #5)
> Looks like the code fix made it in, to vdsm-4.16.29-1.el7ev.x86_64, 
> but the new vdsm configuration is not part of the default settings

Intentionally

> Francesco/Michal, Should we respin to enable it by default?

We'd love to get more feedback before that. It is tracked in bug 1279431

Comment 9 Gil Klein 2015-11-10 18:20:38 UTC
(In reply to Michal Skrivanek from comment #8)
> (In reply to Gil Klein from comment #5)
> > Looks like the code fix made it in, to vdsm-4.16.29-1.el7ev.x86_64, 
> > but the new vdsm configuration is not part of the default settings
> 
> Intentionally
> 
>  Francesco/Michal, Should we respin to enable it by default?
> 
> We'd love to get more feedback before that. It is tracked in bug 1279431

Makes sense. How do we make sure it is documented?

Comment 10 Julie 2015-11-11 00:38:43 UTC
If this bug requires doc text for errata release, please provide draft text in the doc text field in the following format:

Cause:
Consequence:
Fix:
Result:

The documentation team will review, edit, and approve the text.

If this bug does not require doc text, please set the 'requires_doc_text' flag to -.

Comment 11 mlehrer 2015-11-11 13:27:58 UTC
Update from scale:

Initial checks show cpu_affinity correctly limits vdsm utilization to 1 core.
Preparing VDSM regression workloads with and without cpu_affinity to verify VDSM behavior; will update the relevant stakeholders afterward.
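One way to check a claim like "cpu_affinity limits VDSM to one core" is to read the `Cpus_allowed_list` field that Linux exposes for every thread in /proc/<pid>/task/<tid>/status. The Python sketch below is hypothetical and is not the scale team's tooling; the VDSM PID must be supplied by the caller.

```python
import os
from typing import Dict, Set


def parse_cpus_allowed_list(status_text: str) -> Set[int]:
    """Return the CPU set from the Cpus_allowed_list line of a status file.

    The kernel prints a list of indices and ranges, e.g. "1" or "0-3,8".
    """
    for line in status_text.splitlines():
        if line.startswith("Cpus_allowed_list:"):
            cpus = set()
            for part in line.split(":", 1)[1].strip().split(","):
                if "-" in part:
                    first, last = part.split("-")
                    cpus.update(range(int(first), int(last) + 1))
                elif part:
                    cpus.add(int(part))
            return cpus
    raise ValueError("no Cpus_allowed_list line found")


def thread_affinities(pid: int) -> Dict[int, Set[int]]:
    """Map every thread id of `pid` to the CPUs it is allowed to run on."""
    task_dir = "/proc/%d/task" % pid
    result = {}
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "status")) as f:
                result[int(tid)] = parse_cpus_allowed_list(f.read())
        except FileNotFoundError:
            # The thread exited between listdir() and open(); skip it.
            continue
    return result
```

With cpu_affinity = 1, every value returned by `thread_affinities` for the VDSM PID would be expected to equal `{1}`; checking per thread matters here because the overhead in this bug comes from VDSM's many threads.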

Comment 12 Julie 2015-11-15 23:43:58 UTC
hi Francesco,
     I have updated the doc text. Please let me know if you have any feedback. 
Also, I don't think we tell users to edit the vdsm.conf file for other options, so I'm unsure whether this feature should be added to the main docs suite. Including Andrew and Yaniv. Do we need a docs bug for this?

Cheers,
Julie

Comment 13 Francesco Romani 2015-11-16 13:57:22 UTC
(In reply to Julie from comment #12)
> hi Francesco,
>      I have updated the doc text. Please let me know if you have any
> feedback. 
> Also, I don't think we tell users to edit the vdsm.conf file for other
> options so I'm a bit confused if this feature should be added to the main
> docs suite. Including Andrew and Yaniv. Do we need a docs bug for this?
> 
> Cheers,
> Julie


Hi Julie. The doc text seems fine.
I think the user should be aware of this option, which may have a significant impact on the node running VDSM.

However, I don't know the best way to convey this information, so I can't really help here.

Comment 15 Julie 2015-11-16 23:22:04 UTC
Hi Yaniv, 
   Please see comment #12.

Cheers,
Julie

Comment 16 Yaniv Lavi 2015-11-17 09:57:14 UTC
(In reply to Julie from comment #15)
> Hi Yaniv, 
>    Please see #comment12.
> 
> Cheers,
> Julie

I think a KBase article on tuning this on scale machines is the best way to document this, since we do not have a tuning guide.

Comment 17 mlehrer 2015-11-17 10:15:15 UTC
Scale re-run complete; updated info at docs/DOC-1055603

Showed:
Reduced VDSM process CPU utilization when cpu_affinity is set to 1.
Improved transactional throughput and HTTP response time for the VDSM workload.

QEMU-KVM utilization is unrelated; it is caused by the rate of VM startup.
Moving to verified.
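The measurement details behind these results are in docs/DOC-1055603 and are not reproduced here. As a hypothetical way to spot-check VDSM process CPU utilization on a host, the Python sketch below samples the cumulative user and system time from /proc/<pid>/stat (fields 14 and 15, in clock ticks, summed over all threads of the process) and converts the difference into a percentage of one core.

```python
import os
import time


def cpu_ticks_from_stat(stat_text: str) -> int:
    """Return utime + stime (in clock ticks) from a /proc/<pid>/stat line."""
    # Field 2 (the command name) is in parentheses and may contain spaces,
    # so split only what follows the last ')'. rest[0] is field 3 (state).
    rest = stat_text[stat_text.rindex(")") + 2:].split()
    utime, stime = int(rest[11]), int(rest[12])  # fields 14 and 15
    return utime + stime


def cpu_percent(pid: int, interval: float = 5.0) -> float:
    """Average CPU use of `pid` over `interval` seconds, as % of one core."""
    clk_tck = os.sysconf("SC_CLK_TCK")

    def ticks() -> int:
        with open("/proc/%d/stat" % pid) as f:
            return cpu_ticks_from_stat(f.read())

    start = ticks()
    time.sleep(interval)
    return (ticks() - start) / clk_tck / interval * 100.0
```

Comparing `cpu_percent(vdsm_pid)` under the same workload with and without cpu_affinity would give a rough before/after figure; a value near 100 would mean VDSM is saturating a single core.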

Comment 19 errata-xmlrpc 2015-12-01 20:40:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2530.html