Bug 1078981

Summary: tuned service gets stuck during host deploy
Product: [oVirt] ovirt-host-deploy Reporter: Chris Pelland <cpelland>
Component: Plugins.tune Assignee: Alon Bar-Lev <alonbl>
Status: CLOSED UPSTREAM QA Contact: Jiri Belka <jbelka>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 1.1.0 CC: aberezin, acathrow, alonbl, bazulay, cpelland, dougsland, eedri, emesika, gklein, iheim, iliam, jskarvad, michal.skrivanek, nlevinki, Rhev-m-bugs, sbonazzo, sherold, tdosek, tpoitras, yeylon, zdover
Target Milestone: --- Keywords: ReleaseNotes, ZStream
Target Release: --- Flags: alonbl: devel_ack+
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: infra
Fixed In Version: Doc Type: Release Note
Doc Text:
When performing host-deploy, the tuned daemon occasionally does not respond to restart. To avoid this potential issue, simply stop the tuned daemon on the host before performing host-deploy.
Story Points: ---
Clone Of: 1069119 Environment:
Last Closed: 2014-07-29 14:07:48 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Infra RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1069119, 1069245, 1071453    
Bug Blocks:    

Description Chris Pelland 2014-03-20 16:49:58 UTC
+++ This bug was initially created as a clone of Bug #1069119 +++

Description of problem:
The tuned service gets stuck during host deploy.

Version-Release number of selected component (if applicable):
ovirt-beta3

How reproducible:
80%

Steps to Reproduce:
Install rhevm and add a host.

Additional info (thanks to lbendar):

There is an error about "Existing lock /var/run/yum.pid".

You can track down the process that holds the lock using the PID stored in that file. Follow the instructions below:

{{{
ls -la /proc/$(cat /var/run/yum.pid)/fd | grep 'log$'
l-wx------. 1 root root 64 Feb 24 09:42 1 -> /tmp/ovirt-host-deploy-20140220152021.log
}}}
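
The same lookup can be sketched in Python. This is only an illustration of the shell one-liner above, not part of ovirt-host-deploy: the function name and its parameters are made up for this example, and `proc_root` is a parameter only so the sketch can be tried against a fake directory tree.

```python
import os


def files_held_by_lock_owner(pid_file="/var/run/yum.pid",
                             proc_root="/proc",
                             suffix="log"):
    """List files ending in `suffix` that the lock owner has open.

    Hypothetical helper: mirrors
    ls -la /proc/$(cat /var/run/yum.pid)/fd | grep 'log$'
    """
    # The lock file contains the PID of the process holding the lock.
    with open(pid_file) as handle:
        pid = handle.read().strip()

    fd_dir = os.path.join(proc_root, pid, "fd")
    targets = []
    for name in sorted(os.listdir(fd_dir)):
        try:
            # Each entry under fd/ is a symlink to the open file.
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor may have been closed in the meantime.
            continue
        if target.endswith(suffix):
            targets.append(target)
    return targets
```

Run as root on the affected host, `files_held_by_lock_owner()` should return the path of the host-deploy log if the bootstrap process is the one holding the yum lock.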

This shows that the lock is held by the rhevm bootstrap process, so we can look at its log to see what happened:

{{{
[root@puma29 ~]# tail /tmp/ovirt-host-deploy-20140220152021.log

2014-02-20 15:21:14 DEBUG otopi.plugins.otopi.services.rhel plugin.executeRaw:366 execute: ('/sbin/service', 'tuned', 'start'), executable='None', cwd='None', env=None
2014-02-20 15:21:15 DEBUG otopi.plugins.otopi.services.rhel plugin.executeRaw:383 execute-result: ('/sbin/service', 'tuned', 'start'), rc=0
2014-02-20 15:21:15 DEBUG otopi.plugins.otopi.services.rhel plugin.execute:441 execute-output: ('/sbin/service', 'tuned', 'start') stdout:


2014-02-20 15:21:15 DEBUG otopi.plugins.otopi.services.rhel plugin.execute:446 execute-output: ('/sbin/service', 'tuned', 'start') stderr:


2014-02-20 15:21:15 DEBUG otopi.plugins.ovirt_host_deploy.tune.tuned plugin.executeRaw:366 execute: ('/usr/bin/tuned-adm', 'profile', 'virtual-host'), executable='None', cwd='None', env=None
}}}
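
In the log tail above, every `execute:` entry except the last one has a matching `execute-result:` entry. A minimal Python sketch for spotting such unfinished commands is shown below. It assumes only the two line shapes visible in this excerpt; the function name and the regular expressions are written for this illustration and are not part of otopi.

```python
import re

# Patterns derived from the log excerpt in this report (an assumption:
# other otopi versions may log these entries differently).
EXECUTE_RE = re.compile(r"execute: (\(.*?\)), executable=")
RESULT_RE = re.compile(r"execute-result: (\(.*?\)), rc=")


def unfinished_commands(log_text):
    """Return commands that were started but have no logged result."""
    # Collapse whitespace so entries wrapped over several lines still match.
    normalized = " ".join(log_text.split())
    pending = EXECUTE_RE.findall(normalized)
    for finished in RESULT_RE.findall(normalized):
        if finished in pending:
            # Cancel one started command for each logged result.
            pending.remove(finished)
    return pending
```

Applied to the excerpt above, the only command left pending is the `tuned-adm profile virtual-host` call.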

And here we are again: it leads back to the tuned service.
Let's take a look at /var/log/tuned/tuned.log.

I cannot find anything interesting there:
{{{
....
2014-02-20 15:10:35,110 INFO     tuned: performing ktune conditional restart
....
}}}

--- Additional comment from Meital Bourvine on 2014-02-24 04:34:39 EST ---



--- Additional comment from Sandro Bonazzola on 2014-02-24 04:41:20 EST ---

Jaroslav, can you take a look? Maybe it's a tuned bug.

--- Additional comment from Jaroslav Škarvada on 2014-02-24 05:37:54 EST ---

(In reply to Sandro Bonazzola from comment #2)
> Jaroslav, can you take a look? Maybe it's a tuned bug.

I think this may be a race. RHEL-6 tuned is not race-free, and this cannot be fixed correctly without a redesign (which happened in RHEL-7).

Could you try adding a delay of, e.g., 5 seconds between '/sbin/service tuned start' and '/usr/bin/tuned-adm profile virtual-host'? If it helps, I can try to work around this specific problem.

--- Additional comment from Alon Bar-Lev on 2014-02-24 05:55:33 EST ---

(In reply to Jaroslav Škarvada from comment #3)
> (In reply to Sandro Bonazzola from comment #2)
> > Jaroslav, can you take a look? Maybe it's a tuned bug.
> 
> I think this may be race. RHEL-6 tuned is not race free and it's something
> that cannot be correctly fixed without re-design (which happened in RHEL-7).
> 
> Could you try to add e.g. 5 seconds delay between '/sbin/service tuned
> start' and '/usr/bin/tuned-adm profile virtual-host'? In case it helps, I
> can try to workaround this specific problem.

Whoever can reproduce it, please modify /usr/share/ovirt-host-deploy/plugins/ovirt-host-deploy/tune/tuned.py as follows:

    def _misc(self):
        # tuned-adm does not work if daemon is down!
        self.services.state('tuned', True)
+       import time
+       time.sleep(5)
        rc, stdout, stderr = self.execute(
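
For anyone who wants to try the same sequence outside of host-deploy, here is a rough standalone sketch: start the service, wait, then apply the profile. The two commands and the 5 second delay come from this report. The function name, the injectable `run` and `sleep` parameters, and the `timeout` (so that a hung tuned-adm raises an error instead of blocking forever) are assumptions added for the illustration and are not part of the proposed patch.

```python
import subprocess
import time


def start_then_set_profile(profile="virtual-host", delay=5.0, timeout=60.0,
                           run=subprocess.run, sleep=time.sleep):
    """Start tuned, wait `delay` seconds, then apply the profile.

    Hypothetical helper; `timeout` is an addition for this sketch.
    """
    # tuned-adm does not work if the daemon is down, so start it first.
    run(["/sbin/service", "tuned", "start"], check=True, timeout=timeout)

    # Give the daemon time to settle before talking to it (the suspected race).
    sleep(delay)

    # subprocess.run raises TimeoutExpired if tuned-adm does not return in time.
    return run(["/usr/bin/tuned-adm", "profile", profile],
               check=True, timeout=timeout)
```

A fixed delay only makes the race less likely; it does not remove it.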

--- Additional comment from Ilia Meerovich on 2014-02-24 08:27:36 EST ---

This bug blocks testing of ovirt 3.4

--- Additional comment from Alon Bar-Lev on 2014-02-24 08:31:12 EST ---

(In reply to Ilia Meerovich from comment #5)
> This bug blocks testing of ovirt 3.4

You are missing an important fact: this tuned version, which was probably distributed in rhel-6.5, will affect production versions since 3.2.

Blocking ovirt-3.4 tests is the least of our worries.

--- Additional comment from Gil Klein on 2014-02-24 08:50:40 EST ---

Meital, please try to reproduce with the workaround suggested in comment #4, and update the BZ on whether it solves the problem.

--- Additional comment from Meital Bourvine on 2014-02-24 09:22:42 EST ---

This workaround seems to work.
I ran it 3 times.

--- Additional comment from Sandro Bonazzola on 2014-02-24 09:51:18 EST ---

Removing from blockers since it's a tuned regression.
Meital, please open a BZ on tuned component.
While tuned waits to be fixed, please downgrade to the previous version.
Alon, it's up to you whether to close this as notabug or wontfix, or to add a conflict in the spec file on this specific tuned version, forcing a downgrade or an upgrade so that a working tuned is installed.
Just let me know if you rebuild host-deploy before 09:00 UTC tomorrow, Feb 25th 2014.

--- Additional comment from Meital Bourvine on 2014-02-24 09:57:47 EST ---

https://bugzilla.redhat.com/show_bug.cgi?id=1069245

--- Additional comment from Alon Bar-Lev on 2014-02-24 10:06:57 EST ---

(In reply to Meital Bourvine from comment #10)
> https://bugzilla.redhat.com/show_bug.cgi?id=1069245

Per what Sandro wrote, it does not block ovirt-engine-3.4 from being released, as this problem is specific to RHEL and also affects previous releases that are already out.

Action items for downstream are different.

Please open a bug against RHEL tuned to track this issue; that bug should block this one.

--- Additional comment from Sandro Bonazzola on 2014-02-25 01:53:43 EST ---

Removing from blockers again as per comment #9.
Added tuned bug #1069245 to this bug dependencies.

--- Additional comment from Sandro Bonazzola on 2014-02-27 04:42:38 EST ---

Removing AutomationBlocker, TestBlocker since downgrading tuned allows tests to be performed.

--- Additional comment from Sandro Bonazzola on 2014-03-04 04:29:56 EST ---

This is an automated message.
Re-targeting all non-blocker bugs still open on 3.4.0 to 3.4.1.

Comment 5 Alon Bar-Lev 2014-07-01 14:48:37 UTC
devel-ack it as doc bug

Comment 7 Jiri Belka 2014-07-09 10:41:06 UTC
is this just a doc bug?

Comment 8 Alon Bar-Lev 2014-07-09 12:40:44 UTC
(In reply to Jiri Belka from comment #7)
> is this just a doc bug?

indeed.

Comment 9 Jiri Belka 2014-07-21 13:12:18 UTC
ok, KB 1136203 exists.

Comment 11 errata-xmlrpc 2014-07-29 14:07:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0962.html