Bug 1516123 - tuned-adm timeout while adding the host in manager and the deployment will fail/take time to complete
Summary: tuned-adm timeout while adding the host in manager and the deployment will fa...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: redhat-release-rhev-hypervisor
Version: 4.1.7
Hardware: Unspecified
OS: Linux
high
high
Target Milestone: ovirt-4.2.1
: ---
Assignee: Yuval Turgeman
QA Contact: Petr Matyáš
URL:
Whiteboard:
Depends On: 1443142 1523194
Blocks: CEECIR_RHV43_proposed 1553258
TreeView+ depends on / blocked
 
Reported: 2017-11-22 06:04 UTC by nijin ashok
Modified: 2019-05-16 13:09 UTC (History)
23 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
This update ensures that tuned.service is enabled by default to enable tuned-adm to set the active profile.
Clone Of:
: 1553258 (view as bug list)
Environment:
Last Closed: 2018-05-15 17:57:44 UTC
oVirt Team: Node
Target Upstream Version:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 3295251 0 None None None 2017-12-20 10:05:25 UTC
Red Hat Product Errata RHSA-2018:1524 0 None None None 2018-05-15 17:58:49 UTC
oVirt gerrit 86356 0 master MERGED Enable tuned.service by default 2020-12-17 01:45:33 UTC
oVirt gerrit 86537 0 ovirt-4.2 MERGED Enable tuned.service by default 2020-12-17 01:45:31 UTC
oVirt gerrit 88606 0 ovirt-4.1 MERGED Enable tuned.service by default 2020-12-17 01:45:33 UTC

Description nijin ashok 2017-11-22 06:04:56 UTC
Description of problem:

Adding a newly installed RHV-H host (rhvh-4.1-0.20171101) in a manager is failing/taking time to complete since "tuned-adm profile virtual-host" command timeout after 10 minutes. 

In 4.1.1, the manager will close the connection before the timeout causing the whole installation to break. However, if you try the installation again, it will work fine. Or if you start the tuned service manually before adding the host to manager, the installation work fine.

Below are the relevant logs in 4.1.1 manager.

===
Deploy log

2017-11-22 10:40:27 DEBUG otopi.plugins.ovirt_host_deploy.tune.tuned plugin.executeRaw:813 execute: ('/sbin/tuned-adm', 'profile', 'virtual-host'), executable='None', cwd='None', env=None
2017-11-22 10:50:28 DEBUG otopi.plugins.ovirt_host_deploy.tune.tuned plugin.executeRaw:863 execute-result: ('/sbin/tuned-adm', 'profile', 'virtual-host'), rc=1
2017-11-22 10:50:28 DEBUG otopi.plugins.ovirt_host_deploy.tune.tuned plugin.execute:921 execute-output: ('/sbin/tuned-adm', 'profile', 'virtual-host') stdout:
Operation timed out after waiting 600 seconds(s), you may try to increase timeout by using --timeout command line option or using --async.
2017-11-22 10:50:28 WARNING otopi.plugins.ovirt_host_deploy.tune.tuned tuned._misc:108 Cannot set tuned profile

< --- Since manager closed the connection because of tuned issue, the signing of certifiacte failed --->

2017-11-22 10:50:28 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND       ***Q:MULTI-STRING VDSM_CERTIFICATE_CHAIN --=451b80dc-996f-432e-9e4f-2b29ef6d1141=-- --=451b80dc-996f-ABORT-9e4f-2b29ef6d1141=--
2017-11-22 10:50:28 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND       **%QEnd: VDSM_CERTIFICATE_CHAIN
2017-11-22 10:50:28 DEBUG otopi.context context._executeMethod:142 method exception
Traceback (most recent call last):
  File "/tmp/ovirt-nXRZZmXmQ8/pythonlib/otopi/context.py", line 132, in _executeMethod
    method['method']()
  File "/tmp/ovirt-nXRZZmXmQ8/otopi-plugins/ovirt-host-common/vdsm/pki.py", line 319, in _misc
    '\n\nPlease input VDSM certificate chain that '
  File "/tmp/ovirt-nXRZZmXmQ8/otopi-plugins/otopi/dialog/machine.py", line 245, in queryMultiString
    v = self._readline()
  File "/tmp/ovirt-nXRZZmXmQ8/pythonlib/otopi/dialog.py", line 248, in _readline
    raise IOError(_('End of file'))
IOError: End of file

Engine log

2017-11-22 00:10:25,575-05 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [355b20ec] EVENT_ID: VDS_INSTALL_IN_PROGRESS(509), Correlation ID: 355b20ec, Call Stack: null, Custom Event ID: -1, Message: Installing Host 10.74.130.130. Setting time.
2017-11-22 00:14:57,803-05 INFO  [org.ovirt.engine.core.sso.servlets.OAuthRevokeServlet] (default task-101) [] User admin@internal successfully logged out
2017-11-22 00:14:57,821-05 INFO  [org.ovirt.engine.core.bll.aaa.TerminateSessionsForTokenCommand] (default task-102) [37a1cc49] Running command: TerminateSessionsForTokenCommand internal: true.
2017-11-22 00:20:06,835-05 ERROR [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (VdsDeploy) [355b20ec] Error during deploy dialog: java.io.IOException: Unexpected connection termination
===

However, in 4.1.7, the RHV-M is atleast waiting for the tuned to complete and hence deployment is successful however it's taking extra 10 minutes.

===
Deploy log

2017-11-22 09:51:26 DEBUG otopi.plugins.ovirt_host_deploy.tune.tuned plugin.executeRaw:813 execute: ('/sbin/tuned-adm', 'profile', 'virtual-host'), executable='None', cwd='None', env=None

<---- Took 10 minutes --->

2017-11-22 10:01:27 DEBUG otopi.plugins.ovirt_host_deploy.tune.tuned plugin.executeRaw:863 execute-result: ('/sbin/tuned-adm', 'profile', 'virtual-host'), rc=1
2017-11-22 10:01:27 DEBUG otopi.plugins.ovirt_host_deploy.tune.tuned plugin.execute:921 execute-output: ('/sbin/tuned-adm', 'profile', 'virtual-host') stdout:
Operation timed out after waiting 600 seconds(s), you may try to increase timeout by using --timeout command line option or using --async.

Engine log

2017-11-22 09:51:24,832+05 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [44385486] EVENT_ID: VDS_INSTALL_IN_PROGRESS(509), Correlation ID: 44385486, Call Stack: null, Custom ID: null, Custom Event ID: -1, Message: Installing Host 10.74.130.130. Setting time.

<---- Took 10 minutes --->

2017-11-22 10:01:28,263+05 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [44385486] EVENT_ID: VDS_INSTALL_IN_PROGRESS_WARNING(510), Correlation ID: 44385486, Call Stack: null, Custom ID: null, Custom Event ID: -1, Message: Host 10.74.130.130 installation in progress . Cannot set tuned profile.
===

So this happens only when you execute tuned-adm immediately after starting the tuned service.

We can replicate the same scenario by running the below command which will hang for 10 minutes.

# systemctl start tuned;tuned-adm profile virtual-host
Operation timed out after waiting 600 seconds(s), you may try to increase timeout by using --timeout command line option or using --async.

However, if you just add an additional one second delay between the commands, it will work. Below command is working fine.

systemctl start tuned;sleep 1;tuned-adm profile virtual-host

As per the logs of tuned-adm, it's setting the profile correctly but it's not getting out of the shell. I can see below logs in the tuneadm logs and it looks like profile is applied correctly.

==
2017-11-22 11:09:40,509 INFO     tuned.plugins.plugin_sysctl: reapplying system sysctl
2017-11-22 11:09:40,510 DEBUG    tuned.utils.commands: Executing ['sysctl', '--system'].
2017-11-22 11:09:40,540 INFO     tuned.daemon.daemon: static tuning from profile 'virtual-host' applied

==

The strange thing is it's not reproducible on an RHEL host which is having the exact set of packages. I have the below set of packages in both RHEL and RHV-H.

===
rpm -qa|egrep -i "dbus|tuned"
abrt-dbus-2.1.11-48.el7.x86_64
dbus-python-1.1.1-9.el7.x86_64
tuned-2.8.0-5.el7.noarch
dbus-libs-1.6.12-17.el7.x86_64
dbus-glib-0.100-7.el7.x86_64
python-slip-dbus-0.4.0-2.el7.noarch
dbus-1.6.12-17.el7.x86_64
===

I was not able to understand why it's only happening with RHV-H.


Version-Release number of selected component (if applicable):

ovirt-host-deploy-1.6.7-1.el7ev.noarch
tuned-2.8.0-5.el7.noarch

How reproducible:

100%

Steps to Reproduce:

1. Add a fresh, newly installed RHV-H host in a manager. The installation will take extra 10 minutes in tuned-adm command or it will fail.

To get the tuned timeout error on the command line on an already existing host, you have to follow the below steps.

tuned-adm off
systemctl stop tuned
systemctl start tuned;tuned-adm profile virtual-host


Actual results:

Installation is failing/taking time to complete.


Expected results:

Installation should work.

Additional info:

Comment 2 Yaniv Kaul 2017-11-22 06:57:12 UTC
Why isn't this a tuned bug?
Specifically, I think it may have to do with https://bugzilla.redhat.com/show_bug.cgi?id=1258868

Comment 3 nijin ashok 2017-11-22 07:08:00 UTC
(In reply to Yaniv Kaul from comment #2)
> Why isn't this a tuned bug?
> Specifically, I think it may have to do with
> https://bugzilla.redhat.com/show_bug.cgi?id=1258868

Somehow, I am unable to reproduce this on an RHEL server with the same set of packages and it only happens with RHV-H when I tried. That's the reason I opened it for RHV.

I can surely open for tuned if that's needed.

Comment 5 Sandro Bonazzola 2017-11-22 07:52:50 UTC
Moving to node team since this seems to affect RHV-H only

Comment 6 Yuval Turgeman 2017-11-22 12:25:13 UTC
Reproduced this on a Centos7.4 system well

Comment 7 Yuval Turgeman 2017-11-22 12:25:44 UTC
as well*

Comment 8 Yedidyah Bar David 2017-11-22 13:10:19 UTC
Managed to reproduce this once, but not more than that.

I had a CentOS 7.4 VM installed on Oct 25, rebooted a few times, didn't do much otherwise on it. When I tried it was up for something like 9 days.

When I started, tuned was up, and had (the default?) profile virtual-guest.

Did this:

service tuned stop
service tuned start
tuned-adm profile virtual-host

It seemed stuck, but when I checked tuned.log it showed that it did accept the command and handled it, just like in the attached logs.

Tried several times to reproduce, with and without strace on the tuned-adm command, and it was always quick, no delay. Tried after reboot, same. Tried to reboot the VM from a snapshot I took shortly after installing it, same. Perhaps it's a timing issue or something like that.

Anyway, it was CentOS, not node/RHVH.

Comment 9 Yedidyah Bar David 2017-11-22 13:23:56 UTC
If we have a clear and 100% reliable reproducer, we might be able to come up with some workaround - but there isn't a bug in host-deploy. All it does is:

        self.services.state('tuned', True)
        rc, stdout, stderr = self.execute(
            (   
                self.command.get('tuned-adm'),
                'profile',
                self._profile,
            ),
            raiseOnError=False,
        )
        if rc != 0:
            self.logger.warning(_('Cannot set tuned profile'))
        else:
            self.services.startup('tuned', True)

Last relevant change was to not fail if tuned-adm fails, ~ 5 years ago:

https://gerrit.ovirt.org/10444

So at some point someone decided it's not critical.

A possible workaround is to call, instead of self.execute, self.executePipeRaw, which has a parameter 'timeout', and pass there some value, say 30 seconds.

Comment 10 Yaniv Kaul 2017-11-22 13:26:35 UTC
(In reply to Yedidyah Bar David from comment #9)
> If we have a clear and 100% reliable reproducer, we might be able to come up
> with some workaround - but there isn't a bug in host-deploy. All it does is:

Restarting dbus (if it's indeed the same issue as I mentioned in comment 2 ) helps.

> 
>         self.services.state('tuned', True)
>         rc, stdout, stderr = self.execute(
>             (   
>                 self.command.get('tuned-adm'),
>                 'profile',
>                 self._profile,
>             ),
>             raiseOnError=False,
>         )
>         if rc != 0:
>             self.logger.warning(_('Cannot set tuned profile'))
>         else:
>             self.services.startup('tuned', True)
> 
> Last relevant change was to not fail if tuned-adm fails, ~ 5 years ago:
> 
> https://gerrit.ovirt.org/10444
> 
> So at some point someone decided it's not critical.
> 
> A possible workaround is to call, instead of self.execute,
> self.executePipeRaw, which has a parameter 'timeout', and pass there some
> value, say 30 seconds.

Comment 11 Yuval Turgeman 2017-11-22 13:52:37 UTC
I could only reproduce this with `tuned-adm off` before stopping tuned.
IIUC tuned-adm sends a message to dbus and waits for a "profile changed" response that never happens I'm guessing because the daemon is firing up so dbus can't find it.  I tried with --async and it looks like the profile is set correctly (tuned-adm verify is ok).

Comment 12 Yaniv Kaul 2017-12-05 18:51:55 UTC
(In reply to Yuval Turgeman from comment #11)
> I could only reproduce this with `tuned-adm off` before stopping tuned.
> IIUC tuned-adm sends a message to dbus and waits for a "profile changed"
> response that never happens I'm guessing because the daemon is firing up so
> dbus can't find it.  I tried with --async and it looks like the profile is
> set correctly (tuned-adm verify is ok).

This is a tuned-adm bug - can we move this to tuned?

Comment 13 Yuval Turgeman 2017-12-06 13:17:57 UTC
It's probably tuned or dbus, but we should definitely move this.  The question is, why do we use tuned for virtual-host profile if vdsm manages its own kernel params with /etc/sysctl.d/vdsm.conf ?  I mean, if we run `tuned-adm verify` after setting the virtual-host profile, it would fail, because vdsm.conf overrides some parameters.  Not that I'm against tuned, but I think it's better to have one place (either tuned or sysctl.d) that sets those params.

Comment 14 Yaniv Kaul 2018-01-03 07:54:29 UTC
(In reply to Yuval Turgeman from comment #13)
> It's probably tuned or dbus, but we should definitely move this.  The
> question is, why do we use tuned for virtual-host profile if vdsm manages
> its own kernel params with /etc/sysctl.d/vdsm.conf ?  I mean, if we run
> `tuned-adm verify` after setting the virtual-host profile, it would fail,
> because vdsm.conf overrides some parameters.  Not that I'm against tuned,
> but I think it's better to have one place (either tuned or sysctl.d) that
> sets those params.

I agree, and I think it should be tuned. Can you specify the different setting we have? I assume we have a good reason which may not be applicable to others (OpenStack) from diverging from virtual-host profile.

In any case, can you move the bug to tuned?

Comment 15 Yuval Turgeman 2018-01-03 12:07:15 UTC
Sure, only one difference:

static/etc/sysctl.d/vdsm.conf:vm.dirty_background_ratio = 2

usr/lib/tuned/virtual-host/tuned.conf:vm.dirty_background_ratio = 5

I opened bug 1523194 for tuned, and added it here as "depends on", do you want to move this bug to tuned as well ?

Comment 16 Yaniv Kaul 2018-01-03 12:09:51 UTC
(In reply to Yuval Turgeman from comment #15)
> Sure, only one difference:
> 
> static/etc/sysctl.d/vdsm.conf:vm.dirty_background_ratio = 2
> 
> usr/lib/tuned/virtual-host/tuned.conf:vm.dirty_background_ratio = 5

Nir, any idea why we have a different value in VDSM than tuned for this parameter?

> 
> I opened bug 1523194 for tuned, and added it here as "depends on", do you
> want to move this bug to tuned as well ?

We can probably close this bug, or use it as 'create a dependency on tuned version XYZ' kind of bug.

Comment 17 Nir Soffer 2018-01-21 18:18:36 UTC
(In reply to Yaniv Kaul from comment #16)
> (In reply to Yuval Turgeman from comment #15)
> Nir, any idea why we have a different value in VDSM than tuned for this
> parameter?

These settings were added for bug 740887, suggested by the performance team
for rhel 6.x.

I don't know if these are needed for rhel 7 and can be replaced by dynamic setting
by tuned.

Dan, what do you think?

Comment 18 Petr Matyáš 2018-01-24 14:09:18 UTC
Verified on redhat-release-virtualization-host-4.2-0.6.el7.x86_64

Comment 23 errata-xmlrpc 2018-05-15 17:57:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1524

Comment 24 Franta Kust 2019-05-16 13:09:33 UTC
BZ<2>Jira Resync


Note You need to log in before you can comment on or make changes to this bug.