Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 806213

Summary:	Sync the start of multihost tasks to avoid termination by watchdog
Product:	[Retired] Beaker	Reporter:	Karel Srot <ksrot>
Component:	scheduler	Assignee:	beaker-dev-list
Status:	CLOSED WONTFIX	QA Contact:
Severity:	medium	Docs Contact:
Priority:	high
Version:	0.5	CC:	azelinka, bpeck, ltoscano, mcsontos, ohudlick, stl, tools-bugs
Target Milestone:	---
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:	MultiHost
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2020-10-21 14:16:09 UTC	Type:	---

Description Karel Srot 2012-03-23 08:30:43 UTC

Description of problem:

When a multihost test is scheduled, one machine is usually provisioned much earlier than the other. As a result, the test starts early on the first system, blocks at the first rhts-sync-block waiting for the second server, and is usually killed by the watchdog.


From my point of view, there are the following solutions:

1. Extend the test time in the Makefile to hours. This is not good for scheduling several multihost tasks because once a test gets stuck, it takes hours until it is (correctly) killed by the watchdog.

2. Create one special task that ensures the sync after the installation and use this task in every multihost job (could be done by workflow).

3. Something similar to 2, done automatically by Beaker.


Version-Release number of selected component (if applicable):
Version - 0.8.1  ???

Comment 1 Petr Šplíchal 2012-03-26 10:52:19 UTC

The second option is quite straightforward (and I'd be willing to
add the support to workflow-tcms). However, resolving this in
Beaker would probably be a somewhat cleaner solution. Bill, what do
you think? Would it be better (and easy) to fix this in Beaker, or
rather to leave it to the workflow together with helper tasks?

Comment 2 Bill Peck 2012-03-26 14:47:48 UTC

I could update the /distribution/install task to do the following:

rhts-sync-set -s READY
rhts-sync-block -s READY $STANDALONE

This will work fine for both multi-host and single-host jobs.
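
The two commands proposed in this comment could be wrapped in a small function, sketched below under a few assumptions. rhts-sync-set and rhts-sync-block are the harness commands named above; SYNC_SET, SYNC_BLOCK and the function name sync_after_install are hypothetical and exist only so the logic can be dry-run on a machine without the harness. Skipping the barrier when STANDALONE is empty is also an assumption, not part of the proposal.

```shell
# Hypothetical override hooks: point these at a stub to dry-run the logic.
SYNC_SET=${SYNC_SET:-rhts-sync-set}
SYNC_BLOCK=${SYNC_BLOCK:-rhts-sync-block}

# Announce that this host finished installing, then wait until every host
# listed in $STANDALONE has announced the same state.
sync_after_install() {
    hosts=${STANDALONE:-}
    if [ -z "$hosts" ]; then
        # Assumption: with no hosts listed there is nothing to wait for.
        echo "STANDALONE is empty, skipping sync"
        return 0
    fi
    "$SYNC_SET" -s READY || return 1
    # $hosts is left unquoted on purpose: it is a whitespace-separated list.
    "$SYNC_BLOCK" -s READY $hosts
}
```

In a single-host job the list would presumably contain only the local hostname, so the block would return as soon as the local READY state is set.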

Comment 3 Karel Srot 2012-03-26 15:07:47 UTC

Nice. Would it be necessary to extend the test time of /distribution/install? I am not sure about the current maximum duration, but extending it to hours might cause unnecessary delays before a broken install is terminated by the watchdog.

Comment 4 Bill Peck 2012-03-26 15:19:11 UTC

I don't think we need to extend the watchdog of install; it already has a long timeout for the install to finish. And remember that the watchdog won't kill the recipeSet until *all* the recipes have expired.
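
The rule stated here (a recipeSet is killed only once all of its recipes have expired) can be illustrated with a toy model. This is not Beaker code: the function name recipeset_expired and the "seconds remaining" arguments are hypothetical and only mirror the condition as described in this comment.

```shell
# recipeset_expired REMAINING...
# Each argument is a hypothetical "seconds left on the watchdog" value for
# one recipe in the set. Returns 0 (true) only when every recipe has expired.
recipeset_expired() {
    for remaining in "$@"; do
        if [ "$remaining" -gt 0 ]; then
            # At least one recipe still has time left, so the set survives.
            return 1
        fi
    done
    return 0
}
```

Under this model, a host that finished installing early and is waiting in the barrier is not killed while the slower host's install watchdog is still running.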

Comment 5 Karel Srot 2012-03-26 15:39:13 UTC

OK, then it seems like the best solution. Thank you.

Comment 6 Petr Šplíchal 2012-03-27 10:03:35 UTC

(In reply to comment #2)
> I could update the /distribution/install task to do the following:
> 
> rhts-sync-set -s READY
> rhts-sync-block -s READY $STANDALONE

What does the variable STANDALONE hold? Hostnames of other
recipes? I don't see any description in the Deployment Guide.

Comment 7 Marian Csontos 2012-03-27 12:28:50 UTC

...which will break as soon as anyone gets the idea to use the task with a different role.

RECIPE_MEMBERS variable would be more appropriate here.

Comment 8 Marian Csontos 2012-03-27 12:31:52 UTC

(In reply to comment #6)
> (In reply to comment #2)
> > I could update the /distribution/install task to do the following:
> > 
> > rhts-sync-set -s READY
> > rhts-sync-block -s READY $STANDALONE
> 
> What does the variable STANDALONE hold? Hostnames of other
> recipes? I don't see any description in the Deployment Guide.

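- an environment variable named after each role found in the corresponding tasks (the Nth task of every recipe) is filled in with the hostnames running that role, so STANDALONE lists the hosts whose task has the STANDALONE role
- further variables are filled in for the roles set on recipes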

That's the rhts way.

Comment 9 Bill Peck 2012-03-27 12:49:21 UTC

(In reply to comment #7)
> ...which will break as soon as anyone gets the idea to use the task with
> a different role.
> 
> RECIPE_MEMBERS variable would be more appropriate here.

The recipe role can be changed as well. In fact, I think this would be a worse option, since it's more likely to be changed than the role for task "/distribution/install". If you change the role for that particular task, I think we can assume you know what you're doing.

Comment 10 Nick Coghlan 2012-10-17 04:40:22 UTC

Bulk reassignment of issues as Bill has moved to another team.

Comment 11 Karel Srot 2013-01-07 07:48:38 UTC

I would like to update the requirement from #c0. After getting more experience with the execution of multihost tasks in Beaker, I believe that the harness should sync the start of EVERY multihost task (not just once after system provisioning).

The reason is that people use various "singlehost" tasks within the job, e.g. for errata package updates or /distribution/reservesys. Every "singlehost" task in a recipe brings back the problem of unsynced starts of multihost tasks. Since we don't want to inflate the test time of multihost jobs just to avoid timeouts, I believe that the best solution is to sync the start of every multihost task.

Also bumping the priority to get some attention.
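
A per-task barrier of the kind requested in this comment might look roughly like the sketch below. It rests on several assumptions: that $TEST holds the task name in the task's environment, that RECIPE_MEMBERS lists all hosts in the recipe set (as suggested in comment 7), and that a distinct state name per task is enough to keep barriers from different tasks apart. SYNC_SET, SYNC_BLOCK, task_state_name and sync_task_start are hypothetical names used for illustration only.

```shell
# Hypothetical override hooks: point these at a stub to dry-run the logic.
SYNC_SET=${SYNC_SET:-rhts-sync-set}
SYNC_BLOCK=${SYNC_BLOCK:-rhts-sync-block}

# Turn a task name such as /distribution/install into a state name by
# replacing every character outside A-Z, a-z, 0-9 with an underscore.
task_state_name() {
    printf 'START%s' "$(printf '%s' "$1" | tr -c 'A-Za-z0-9' '_')"
}

# Set a task-specific state, then wait for all recipe set members to set it.
sync_task_start() {
    # Assumption: $TEST carries the name of the task being started.
    state=$(task_state_name "${TEST:-/unknown}")
    members=${RECIPE_MEMBERS:-}
    # Nothing to wait for when no members are listed.
    [ -n "$members" ] || return 0
    "$SYNC_SET" -s "$state" || return 1
    # $members is left unquoted on purpose: whitespace-separated host list.
    "$SYNC_BLOCK" -s "$state" $members
}
```

A state name derived only from the task name would collide if the same task appears twice in one recipe, and the barrier only makes sense when every recipe in the set runs the task at the same position, so a real implementation in the harness would need a more robust key.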