Bug 1553258
Summary: | [downstream clone - 4.1.10] tuned-adm timeout while adding the host in manager and the deployment will fail/take time to complete | ||
---|---|---|---|
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | RHV bug bot <rhv-bugzilla-bot> |
Component: | redhat-release-rhev-hypervisor | Assignee: | Yuval Turgeman <yturgema> |
Status: | CLOSED ERRATA | QA Contact: | Pavol Brilla <pbrilla> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 4.1.7 | CC: | cshao, danken, dfediuck, didi, dougsland, eheftman, fgarciad, lsurette, lveyde, mgoldboi, mkalinin, nashok, nsoffer, pstehlik, rbalakri, rbarry, Rhev-m-bugs, srevivo, ycui, ykaul, yturgema |
Target Milestone: | ovirt-4.1.10 | Keywords: | ZStream |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | Previously, when adding a new host, ovirt-host-deploy failed because the active profile was not set by tuned.service. In this release, tuned.service is enabled by default, so ovirt-host-deploy completes successfully when adding a new host. | ||
Story Points: | --- |
Clone Of: | 1516123 | Environment: | |
Last Closed: | 2018-03-20 16:41:23 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | Node | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1516123 | ||
Bug Blocks: |
Description
RHV bug bot
2018-03-08 15:16:08 UTC
Why isn't this a tuned bug?
Specifically, I think it may have to do with https://bugzilla.redhat.com/show_bug.cgi?id=1258868
(Originally by Yaniv Kaul)

(In reply to Yaniv Kaul from comment #2)
> Why isn't this a tuned bug?
> Specifically, I think it may have to do with
> https://bugzilla.redhat.com/show_bug.cgi?id=1258868

Somehow, I am unable to reproduce this on an RHEL server with the same set of packages and it only happens with RHV-H when I tried. That's the reason I opened it for RHV. I can surely open for tuned if that's needed.
(Originally by Nijin Ashok)

Moving to node team since this seems to affect RHV-H only
(Originally by Sandro Bonazzola)

Reproduced this on a CentOS 7.4 system as well
(Originally by Yuval Turgeman)

Managed to reproduce this once, but not more than that.

I had a CentOS 7.4 VM installed on Oct 25, rebooted a few times, didn't do much otherwise on it. When I tried it was up for something like 9 days. When I started, tuned was up, and had (the default?) profile virtual-guest. Did this:

    service tuned stop
    service tuned start
    tuned-adm profile virtual-host

It seemed stuck, but when I checked tuned.log it showed that it did accept the command and handled it, just like in the attached logs.

Tried several times to reproduce, with and without strace on the tuned-adm command, and it was always quick, no delay. Tried after reboot, same. Tried to reboot the VM from a snapshot I took shortly after installing it, same.

Perhaps it's a timing issue or something like that. Anyway, it was CentOS, not node/RHVH.
(Originally by didi)

If we have a clear and 100% reliable reproducer, we might be able to come up with some workaround - but there isn't a bug in host-deploy.
All it does is:

    self.services.state('tuned', True)
    rc, stdout, stderr = self.execute(
        (
            self.command.get('tuned-adm'),
            'profile',
            self._profile,
        ),
        raiseOnError=False,
    )
    if rc != 0:
        self.logger.warning(_('Cannot set tuned profile'))
    else:
        self.services.startup('tuned', True)

Last relevant change was to not fail if tuned-adm fails, ~ 5 years ago:

https://gerrit.ovirt.org/10444

So at some point someone decided it's not critical.

A possible workaround is to call, instead of self.execute, self.executePipeRaw, which has a parameter 'timeout', and pass there some value, say 30 seconds.
(Originally by didi)

(In reply to Yedidyah Bar David from comment #9)
> If we have a clear and 100% reliable reproducer, we might be able to come up
> with some workaround - but there isn't a bug in host-deploy. All it does is:

Restarting dbus (if it's indeed the same issue as I mentioned in comment 2) helps.

>
> self.services.state('tuned', True)
> rc, stdout, stderr = self.execute(
>     (
>         self.command.get('tuned-adm'),
>         'profile',
>         self._profile,
>     ),
>     raiseOnError=False,
> )
> if rc != 0:
>     self.logger.warning(_('Cannot set tuned profile'))
> else:
>     self.services.startup('tuned', True)
>
> Last relevant change was to not fail if tuned-adm fails, ~ 5 years ago:
>
> https://gerrit.ovirt.org/10444
>
> So at some point someone decided it's not critical.
>
> A possible workaround is to call, instead of self.execute,
> self.executePipeRaw, which has a parameter 'timeout', and pass there some
> value, say 30 seconds.
(Originally by Yaniv Kaul)

I could only reproduce this with `tuned-adm off` before stopping tuned. IIUC tuned-adm sends a message to dbus and waits for a "profile changed" response that never happens I'm guessing because the daemon is firing up so dbus can't find it. I tried with --async and it looks like the profile is set correctly (tuned-adm verify is ok).
(Originally by Yuval Turgeman)

(In reply to Yuval Turgeman from comment #11)
> I could only reproduce this with `tuned-adm off` before stopping tuned.
> IIUC tuned-adm sends a message to dbus and waits for a "profile changed"
> response that never happens I'm guessing because the daemon is firing up so
> dbus can't find it. I tried with --async and it looks like the profile is
> set correctly (tuned-adm verify is ok).

This is a tuned-adm bug - can we move this to tuned?
(Originally by Yaniv Kaul)

It's probably tuned or dbus, but we should definitely move this. The question is, why do we use tuned for virtual-host profile if vdsm manages its own kernel params with /etc/sysctl.d/vdsm.conf ? I mean, if we run `tuned-adm verify` after setting the virtual-host profile, it would fail, because vdsm.conf overrides some parameters. Not that I'm against tuned, but I think it's better to have one place (either tuned or sysctl.d) that sets those params.
(Originally by Yuval Turgeman)

(In reply to Yuval Turgeman from comment #13)
> It's probably tuned or dbus, but we should definitely move this. The
> question is, why do we use tuned for virtual-host profile if vdsm manages
> its own kernel params with /etc/sysctl.d/vdsm.conf ? I mean, if we run
> `tuned-adm verify` after setting the virtual-host profile, it would fail,
> because vdsm.conf overrides some parameters. Not that I'm against tuned,
> but I think it's better to have one place (either tuned or sysctl.d) that
> sets those params.

I agree, and I think it should be tuned. Can you specify the different setting we have? I assume we have a good reason which may not be applicable to others (OpenStack) from diverging from virtual-host profile.

In any case, can you move the bug to tuned?
(Originally by Yaniv Kaul)

Sure, only one difference:

    static/etc/sysctl.d/vdsm.conf:vm.dirty_background_ratio = 2
    usr/lib/tuned/virtual-host/tuned.conf:vm.dirty_background_ratio = 5

I opened bug 1523194 for tuned, and added it here as "depends on", do you want to move this bug to tuned as well ?
(Originally by Yuval Turgeman)

(In reply to Yuval Turgeman from comment #15)
> Sure, only one difference:
>
> static/etc/sysctl.d/vdsm.conf:vm.dirty_background_ratio = 2
>
> usr/lib/tuned/virtual-host/tuned.conf:vm.dirty_background_ratio = 5

Nir, any idea why we have a different value in VDSM than tuned for this parameter?

>
> I opened bug 1523194 for tuned, and added it here as "depends on", do you
> want to move this bug to tuned as well ?

We can probably close this bug, or use it as 'create a dependency on tuned version XYZ' kind of bug.
(Originally by Yaniv Kaul)

(In reply to Yaniv Kaul from comment #16)
> (In reply to Yuval Turgeman from comment #15)
> Nir, any idea why we have a different value in VDSM than tuned for this
> parameter?

These settings were added for bug 740887, suggested by the performance team for rhel 6.x. I don't know if these are needed for rhel 7 and can be replaced by dynamic setting by tuned.

Dan, what do you think?
(Originally by Nir Soffer)

Verified on redhat-release-virtualization-host-4.2-0.6.el7.x86_64
(Originally by Petr Matyas)

Verified on

    # imgbase w
    You are on rhvh-4.1-0.20180314.0+1

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0570

BZ<2>Jira Resync
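
For illustration, the timeout workaround proposed earlier in this thread (bounding the tuned-adm call so that a D-Bus reply that never arrives cannot stall host deployment) could look roughly like the sketch below. It uses only the Python standard library instead of otopi's self.executePipeRaw, whose exact signature is not shown in this report. The helper names run_with_timeout and set_tuned_profile and the 30-second default are assumptions made for this example, not code from ovirt-host-deploy.

```python
import subprocess


def run_with_timeout(argv, timeout=30):
    """Run argv and return its exit code.

    Returns None if the command does not finish within `timeout` seconds
    or cannot be started at all. Hypothetical helper for illustration.
    """
    try:
        completed = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout,  # subprocess.run kills the child on expiry
        )
    except subprocess.TimeoutExpired:
        return None
    except OSError:
        # For example, the executable is not installed on this host.
        return None
    return completed.returncode


def set_tuned_profile(profile, timeout=30):
    """Try to set a tuned profile; report success without raising.

    Mirrors the thread's point that a tuned-adm failure is not treated
    as critical: the caller can log a warning and continue.
    """
    rc = run_with_timeout(["tuned-adm", "profile", profile], timeout=timeout)
    return rc == 0
```

A caller would treat a False result from set_tuned_profile("virtual-host") the same way the quoted snippet treats rc != 0: log a warning and carry on with the deployment.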