Bug 1564834 - [RFE] create a tuned profile for RHV
Summary: [RFE] create a tuned profile for RHV
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 4.1.10
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Milan Zamazal
QA Contact: Lukas Svaty
URL:
Whiteboard:
Duplicates: 1565934
Depends On: 1569375
Blocks:
Reported: 2018-04-08 06:03 UTC by Yuval Turgeman
Modified: 2022-06-20 13:52 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-04-08 16:50:08 UTC
oVirt Team: Virt
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Knowledge Base (Solution) | 3399201 | 0 | None | None | None | 2018-04-11 11:59:28 UTC

Description Yuval Turgeman 2018-04-08 06:03:17 UTC
Description of problem:
/etc/sysctl.d/vdsm.conf sets values that conflict with the virtual-host profile that ships in tuned:


[root@node-32465 ~]# grep dirty /etc/sysctl.d/vdsm.conf
# Set dirty page parameters
vm.dirty_ratio = 5
vm.dirty_background_ratio = 2

[root@node-32465 ~]# grep dirty /usr/lib/tuned/throughput-performance/tuned.conf 
# The generator of dirty data starts writeback at this percentage (system default
vm.dirty_ratio = 40
vm.dirty_background_ratio = 10
[root@node-32465 ~]# grep dirty /usr/lib/tuned/virtual-host/tuned.conf 
vm.dirty_background_ratio = 5
[root@node-32465 ~]# 


This causes `tuned-adm verify` to fail (unless reapply_sysctl=0 is set in /etc/tuned/tuned-main.conf) with the following errors:

2018-04-08 06:01:31,946 ERROR    tuned.plugins.base: verify: failed: 'vm.dirty_ratio' = '5', expected '40'
2018-04-08 06:01:31,975 ERROR    tuned.plugins.base: verify: failed: 'vm.dirty_background_ratio' = '2', expected '5'
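
The mismatch can be reproduced offline by comparing the two sets of values. The following is a minimal Python sketch, assuming the relevant settings are pasted in as strings. `parse_sysctl` and `find_conflicts` are hypothetical helper names, not part of vdsm or tuned, and the "expected" values are the ones tuned reports in the log above:

```python
def parse_sysctl(text):
    """Parse 'key = value' lines, skipping blank lines and comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def find_conflicts(actual, expected):
    """Return {key: (actual, expected)} for keys that both sides set differently."""
    return {
        key: (actual[key], expected[key])
        for key in sorted(actual.keys() & expected.keys())
        if actual[key] != expected[key]
    }


# Values copied from the grep output of /etc/sysctl.d/vdsm.conf in this report.
VDSM_CONF = """
# Set dirty page parameters
vm.dirty_ratio = 5
vm.dirty_background_ratio = 2
"""

# Values the active tuned profile expects, taken from the verify errors.
TUNED_EXPECTED = """
vm.dirty_ratio = 40
vm.dirty_background_ratio = 5
"""

conflicts = find_conflicts(parse_sysctl(VDSM_CONF), parse_sysctl(TUNED_EXPECTED))
for key, (got, want) in conflicts.items():
    print(f"{key}: sysctl.d sets {got}, tuned expects {want}")
```

Both keys are reported as conflicting, which matches the two errors logged by `tuned-adm verify`.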

Comment 1 Tal Nisan 2018-04-09 09:23:06 UTC
I'm not sure whether it's Virt or SLA; please change if I got it wrong.

Comment 2 Yaniv Kaul 2018-04-09 09:23:57 UTC
(In reply to Yuval Turgeman from comment #0)
> Description of problem:
> /etc/sysctl.d/vdsm.conf sets conflicting values from the virtual-host
> profile that ships in tuned:
> 
> 
> [root@node-32465 ~]# grep dirty /etc/sysctl.d/vdsm.conf
> # Set dirty page parameters
> vm.dirty_ratio = 5
> vm.dirty_background_ratio = 2

It's time we remove these. They were relevant, perhaps, in RHEL 6 times.
Nir - thoughts?

Comment 3 Yaniv Kaul 2018-04-09 09:25:05 UTC
They were added here:
commit f4c534d78d7b88f9f4e31cecdc13d15d392421ac
Author: Federico Simoncelli <fsimonce>
Date:   Mon Sep 26 15:28:36 2011 +0000

    BZ#740887 Tune cache dirty ratio
    
    Tuning the dirty_ratio and dirty_background_ratio kernel parameters
    increases I/O throughput from the guests, improves fairness between
    the guests and reduces the ability of a buffered writer to starve
    guests.
    
    Change-Id: Ibf5c8e4c0637c60092b89fba103b96b37bdafaa0
    Reviewed-on: http://gerrit.usersys.redhat.com/970
    Reviewed-by: Dan Kenigsberg <danken>
    Tested-by: Dan Kenigsberg <danken>

Comment 4 Nir Soffer 2018-04-10 13:48:36 UTC
According to https://bugzilla.redhat.com/show_bug.cgi?id=740887#c6 this was 
suggested by the performance team. It is possible that this is not needed in 
RHEL 7 but we don't have any evidence for that.

If we want to remove this we need ack from the performance team.

Comment 5 Yaniv Kaul 2018-04-10 13:51:57 UTC
(In reply to Nir Soffer from comment #4)
> According to https://bugzilla.redhat.com/show_bug.cgi?id=740887#c6 this was 
> suggested by the performance team. It is possible that this is not needed in 
> RHEL 7 but we don't have any evidence for that.
> 
> If we want to remove this we need ack from the performance team.

Do we have something newer for RHEL 7?

Comment 6 Sanjay Rao 2018-04-10 14:10:26 UTC
I am ok with the settings that are part of Virtual Host profile. I was not aware that vdsmd was attempting to change them. 

Lowering the dirty_background_ratio to 2 is not necessarily a bad thing. It will start flushing dirty blocks in the background sooner.

But lowering dirty_ratio to 5 is very aggressive, and it can cause hosts to freeze every time the limit is hit while the data is flushed. That's why we recommended 10 for dirty_ratio.

Was there a reason why vdsmd was attempting to tweak these values?

Comment 7 Nir Soffer 2018-04-10 14:18:17 UTC
Sanjay, see https://bugzilla.redhat.com/show_bug.cgi?id=740887 for the reasons for
these settings.

Comment 8 Sanjay Rao 2018-04-10 14:37:45 UTC
I think the reasoning makes sense because we want more free allocatable memory for VMs. So lowering the dirty_background_ratio makes sense. In fact, on large-memory systems we even advise customers to use dirty_background_bytes.

For large memory systems, dirty_ratio=5 might also make sense but doing it across all implementations might cause the issue I mentioned in my comment above. We can consider changing dirty_background_ratio to 2 in virtual host profile and leave dirty_ratio at 10.

I also see that swappiness is set to 10 in the Virtual Host profile. That's also an important setting for reclaiming memory and putting it on the freelist so that new VMs can have memory when they are started. That value can be lowered to 5. That increases CPU consumption, but it prevents swapping on busy systems, which is very expensive, so it is worth it.
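
To put rough numbers on the ratios discussed above, here is a small Python sketch that converts the percentages into byte thresholds. The host sizes are illustrative assumptions, and `dirty_threshold_bytes` is a hypothetical helper. The kernel applies these percentages to available memory (free plus reclaimable pages) rather than to total RAM, so using total RAM here gives an upper bound only:

```python
GIB = 1024 ** 3


def dirty_threshold_bytes(available_bytes, ratio_percent):
    """Approximate byte threshold for a vm.dirty_*ratio percentage."""
    return available_bytes * ratio_percent // 100


# Illustrative host sizes (assumptions, not taken from this report).
for ram_gib in (16, 256, 1024):
    available = ram_gib * GIB  # upper bound: treats all RAM as available
    background = dirty_threshold_bytes(available, 2)  # vm.dirty_background_ratio = 2
    hard = dirty_threshold_bytes(available, 5)        # vm.dirty_ratio = 5
    print(
        f"{ram_gib:>5} GiB host: background writeback from ~{background / GIB:.2f} GiB, "
        f"writers throttled at ~{hard / GIB:.2f} GiB of dirty data"
    )
```

Because a fixed percentage grows with host memory, an absolute limit such as vm.dirty_background_bytes can be easier to reason about on large-memory hosts, which is the case mentioned above.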

Comment 9 Ryan Barry 2018-04-11 11:59:29 UTC
*** Bug 1565934 has been marked as a duplicate of this bug. ***

Comment 10 Michal Skrivanek 2018-04-19 07:39:40 UTC
(In reply to Sanjay Rao from comment #8)
> I think the reasoning makes sense because we want more free allocatable
> memory for VMs. So lowering the dirty_background_ratio makes sense. In fact
> on large memory system, we advise customers to even use
> dirty_background_bytes. 
> 
> For large memory systems, dirty_ratio=5 might also make sense but doing it
> across all implementations might cause the issue I mentioned in my comment
> above. We can consider changing dirty_background_ratio to 2 in virtual host
> profile and leave dirty_ratio at 10.
> 
> I also see that swappiness is set to 10 in Virtual Host profile. That's also
> an important setting in reclaiming memory and putting it on freelist so that
> new VMs can have memory when they are started. That value can be lowered to
> 5. That increases CPU consumption but it prevents swapping on busy systems
> which is very expensive so it is worth it.

I opened bug 1569375 on tuned to update the profile. Once that is in, we can remove the modifications in vdsm.

Comment 11 Martin Tessun 2018-06-13 13:01:27 UTC
As said in comment #10, we should have the correct tuned profile for hosts.
Performance QE is currently working on a replacement for, or update of, the virtual-host profile.

Comment 13 Ryan Barry 2019-01-21 14:54:01 UTC
Re-targeting to 4.3.1 since it is missing a patch, an acked blocker flag, or both

Comment 15 Daniel Gur 2019-08-28 13:13:56 UTC
sync2jira

Comment 16 Daniel Gur 2019-08-28 13:18:11 UTC
sync2jira

Comment 17 Ryan Barry 2019-08-30 12:45:04 UTC
Since the platform changes merged but still don't align, Martin suggested creating our own profile and shipping it.

Comment 18 Milan Zamazal 2021-08-31 16:29:43 UTC
Looking at the values currently reported by `tuned-adm verify`, we are now in the same place as at the beginning. But as I interpret https://bugzilla.redhat.com/show_bug.cgi?id=1569375#c5, the platform believes the virtual-host profile settings are right.

Insisting on our own values based on 10-year-old measurements performed on NFS in RHEL 6, and making our own profile to resolve the discrepancy without further evidence, doesn't sound very wise. That doesn't mean our settings aren't still helpful in typical RHV use cases. I just can't see convincing facts anywhere that would tell us whether to follow Comment 10 or Comment 17. Claims like "the settings are fine" or "we don't align so let's create our own profile" don't seem to provide enough ground for an informed decision. Is there any background information that would help clarify what the right thing to do is, and *why*?

Comment 19 Nir Soffer 2021-09-01 09:19:11 UTC
I think we should use the existing virtual-host profile and if it is not
good enough the tuned folks should improve it. This will benefit the entire
platform instead of only RHV.

If we have performance results with RHEL 8.4+ showing that current virtual-host
profile is worse, and there is no way to improve it, it does make sense to
create our own profile instead of modifying /etc/sysctl.d/ directly, which
breaks tuned-adm.

The profile is basically:

    $ cat /etc/tuned/ovirt-host/tuned.conf
    [main]
    summary=Optimize for running KVM guests on oVirt host
    include=virtual-host

    [sysctl]
    # Providing balance of low response time and high throughput when using
    # NFS storage.
    # See https://bugzilla.redhat.com/740887#c6
    vm.dirty_ratio = 5
    vm.dirty_background_ratio = 2
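
How such a profile would resolve can be sketched in Python. This is a simplified model, not tuned's actual implementation: it assumes that a profile's own `[sysctl]` entries override those of the profile it includes, and that virtual-host includes throughput-performance (consistent with the `expected '40'` in the verify output earlier in this bug). Only the dirty-page settings quoted in this bug are modelled, and `effective_sysctl` is a hypothetical helper:

```python
# Simplified model of the profiles; real profiles contain many more settings.
PROFILES = {
    "throughput-performance": {
        "include": None,
        "sysctl": {"vm.dirty_ratio": "40", "vm.dirty_background_ratio": "10"},
    },
    "virtual-host": {
        "include": "throughput-performance",  # assumption, see lead-in
        "sysctl": {"vm.dirty_background_ratio": "5"},
    },
    "ovirt-host": {  # the proposed profile from this comment
        "include": "virtual-host",
        "sysctl": {"vm.dirty_ratio": "5", "vm.dirty_background_ratio": "2"},
    },
}


def effective_sysctl(name, profiles):
    """Merge sysctl settings along the include chain; child values win."""
    profile = profiles[name]
    merged = {}
    parent = profile["include"]
    if parent is not None:
        merged.update(effective_sysctl(parent, profiles))
    merged.update(profile["sysctl"])
    return merged


for name in ("virtual-host", "ovirt-host"):
    print(name, effective_sysctl(name, PROFILES))
```

Under these assumptions the proposed profile yields the same values vdsm sets today, but through tuned, so tuned's expectations and the applied values would agree.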

Comment 20 Milan Zamazal 2021-09-01 10:17:45 UTC
(In reply to Nir Soffer from comment #19)
> I think we should use the existing virtual-host profile and if it is not
> good enough the tuned folks should improve it. This will benefit the entire
> platform instead of only RHV.
> 
> If we have performance results with RHEL 8.4+ showing that current
> virtual-host
> profile is worse, and there is no way to improve it, it does make sense to
> create our own profile instead of modifying /etc/sysctl.d/ directly, which
> breaks tuned-adm.

This sounds like the best way to handle it. I just wonder about Comment 17.

Martin, do you remember why we still wanted to push our own profile, as mentioned in Comment 17?

Comment 21 Sandro Bonazzola 2022-03-29 16:16:40 UTC
We are past 4.5.0 feature freeze, please re-target.

Comment 22 Michal Skrivanek 2022-04-08 16:50:08 UTC
no updates for a long time, missed 4.5 GA, closing

