Bug 806213 - Sync the start of multihost tasks to avoid termination by watchdog
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Beaker
Classification: Retired
Component: scheduler
Version: 0.5
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Assignee: beaker-dev-list
QA Contact:
URL:
Whiteboard: MultiHost
Depends On:
Blocks:
Reported: 2012-03-23 08:30 UTC by Karel Srot
Modified: 2020-10-21 14:20 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-21 14:16:09 UTC



Description Karel Srot 2012-03-23 08:30:43 UTC
Description of problem:

When a multihost test is scheduled, one machine is usually provisioned much earlier than the other. As a result, the test starts early on the first system, stops at the first rhts-sync-block to wait for the second server, and is usually killed by the watchdog.
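
The timing problem can be illustrated with a small sketch. The function name `killed_by_watchdog` and all numbers are hypothetical and not taken from Beaker; the model only assumes that the watchdog fires once the task's test time has elapsed while the task is still blocked.

```shell
# Hypothetical model of the early host's situation.
# Returns 0 (true) if the early host would be killed before its peer
# reaches the sync point, 1 otherwise.
# $1: minute at which the first host starts the multihost task
# $2: minute at which the second host starts it
# $3: test time (watchdog limit) of the task, in minutes
killed_by_watchdog() {
    first_start=$1
    second_start=$2
    test_time=$3
    # How long the first host has to sit in rhts-sync-block.
    wait_needed=$((second_start - first_start))
    [ "$wait_needed" -gt "$test_time" ]
}

# Made-up numbers: the second host arrives 45 minutes late,
# but the task allows only 10 minutes.
if killed_by_watchdog 0 45 10; then
    echo "first host is killed while waiting in rhts-sync-block"
else
    echo "both hosts meet at the sync point in time"
fi
```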


From my point of view, there are the following solutions:

1. Extend the test time in the Makefile to hours. This is not good when scheduling several multihost tasks, because once a test gets stuck, it takes hours until it is (correctly) killed by the watchdog.

2. Create one special task that ensures the sync after the installation and use this task in every multihost job (could be done by the workflow).

3. Something similar to 2, done automatically by Beaker.


Version-Release number of selected component (if applicable):
Version - 0.8.1  ???

Comment 1 Petr Šplíchal 2012-03-26 10:52:19 UTC
The second option is quite straightforward (and I'd be willing to
add the support to the workflow-tcms). However, resolving this in
Beaker would probably be a somewhat cleaner solution. Bill, what do
you think? Would it be better (and easy) to fix this in Beaker, or
rather leave it to the workflow together with helper tasks?

Comment 2 Bill Peck 2012-03-26 14:47:48 UTC
I could update the /distribution/install task to do the following:

rhts-sync-set READY
rhts-sync-block -s READY $STANDALONE

This will work fine for both multi-host and single-host jobs.
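
A minimal sketch of how this barrier could be wrapped at the end of the install task. The function name `install_barrier` and the overridable `SYNC_SET` / `SYNC_BLOCK` variables are hypothetical and exist only so the sketch can be dry-run without a lab. The `-s` flag on `rhts-sync-set` is an assumption about the command's usage (the snippet above omits it) and should be checked against the installed version.

```shell
# Hypothetical wrapper around the two commands proposed above.
# SYNC_SET / SYNC_BLOCK can be pointed at stubs for a dry run.
SYNC_SET=${SYNC_SET:-rhts-sync-set}
SYNC_BLOCK=${SYNC_BLOCK:-rhts-sync-block}

install_barrier() {
    # $@: hostnames to wait for (the proposal above passes $STANDALONE)
    # Announce that this host has finished installing.
    "$SYNC_SET" -s READY || return 1
    # Block until every listed host has announced READY as well.
    "$SYNC_BLOCK" -s READY "$@"
}

# Intended use at the end of the install task. Leaving $STANDALONE
# unquoted is deliberate so it splits into separate hostnames:
#   install_barrier $STANDALONE
```

In a single-host job the list presumably contains only the local host, which has already set READY, so the block would return immediately.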

Comment 3 Karel Srot 2012-03-26 15:07:47 UTC
Nice. Would it be necessary to extend the test time of /distribution/install? I am not sure about the current maximum duration, but extending it to hours might cause unnecessary delays before a broken install is terminated by the watchdog.

Comment 4 Bill Peck 2012-03-26 15:19:11 UTC
I don't think we need to extend the watchdog of install; it already has a long timeout for the install to finish. And remember that the watchdog won't kill the recipeSet until *all* the recipes have expired.
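
Read literally, this means the recipe set survives as long as at least one of its recipes still has watchdog time left. A toy model of that rule, with a hypothetical function name and made-up numbers (this is not Beaker's scheduler code):

```shell
# Hypothetical model: a recipe set is only aborted once every recipe's
# remaining watchdog time has dropped to zero or below.
# $@: remaining watchdog seconds for each recipe in the set
recipeset_expired() {
    for remaining in "$@"; do
        if [ "$remaining" -gt 0 ]; then
            return 1   # at least one recipe is still within its limit
        fi
    done
    return 0           # all recipes have expired
}

# Made-up example: the first recipe has overrun by 5 minutes, the second
# still has an hour left, so the set as a whole is not killed yet.
if recipeset_expired -300 3600; then
    echo "recipe set aborted"
else
    echo "recipe set keeps running"
fi
```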

Comment 5 Karel Srot 2012-03-26 15:39:13 UTC
OK, then it seems like the best solution. Thank you.

Comment 6 Petr Šplíchal 2012-03-27 10:03:35 UTC
(In reply to comment #2)
> I could update the /distribution/install task to do the following:
> 
> rhts-sync-set READY
> rhts-sync-block -s READY $STANDALONE

What does the variable STANDALONE hold? Hostnames of other
recipes? I don't see any description in the Deployment Guide.

Comment 7 Marian Csontos 2012-03-27 12:28:50 UTC
...which will break as soon as anyone gets the idea to use the task with a different role.

RECIPE_MEMBERS variable would be more appropriate here.

Comment 8 Marian Csontos 2012-03-27 12:31:52 UTC
(In reply to comment #6)
> (In reply to comment #2)
> > I could update the /distribution/install task to do the following:
> > 
> > rhts-sync-set READY
> > rhts-sync-block -s READY $STANDALONE
> 
> What does the variable STANDALONE hold? Hostnames of other
> recipes? I don't see any description in the Deployment Guide.

- an environment variable is filled in for every role found in the corresponding tasks (every Nth task in all recipes)
- and additional ones for the roles of the recipes

That's the rhts way.
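
As a rough illustration of the first point, the sketch below groups hostnames by the role their corresponding task carries, which is one way to read how a variable such as STANDALONE ends up holding a host list. The function name `roles_to_env`, the input format, and the hostnames are all hypothetical; this is a model of the description above, not the harness implementation.

```shell
# Hypothetical model: group hostnames by task role.
# Input on stdin: one "ROLE HOSTNAME" pair per line.
# Output: one "ROLE=host1 host2 ..." line per role, in order of first
# appearance.
roles_to_env() {
    awk '
        !($1 in hosts) { order[++n] = $1; hosts[$1] = $2; next }
        { hosts[$1] = hosts[$1] " " $2 }
        END { for (i = 1; i <= n; i++) print order[i] "=" hosts[order[i]] }
    '
}

# Made-up example with hypothetical hostnames:
printf '%s\n' 'SERVERS host1.example.com' 'CLIENTS host2.example.com' \
              'CLIENTS host3.example.com' | roles_to_env
```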

Comment 9 Bill Peck 2012-03-27 12:49:21 UTC
(In reply to comment #7)
> ...which will break as soon as anyone gets the idea to use the task with
> a different role.
> 
> RECIPE_MEMBERS variable would be more appropriate here.

The recipe role can be changed as well. In fact, I think this would be a worse option, since it's more likely to be changed than someone changing the role for task "/distribution/install". If you change the role for that particular task, I think we can assume you know what you're doing.

Comment 10 Nick Coghlan 2012-10-17 04:40:22 UTC
Bulk reassignment of issues as Bill has moved to another team.

Comment 11 Karel Srot 2013-01-07 07:48:38 UTC
I would update the requirement from #c0. After getting more experience with the execution of multihost tasks in Beaker, I believe that the harness should sync the start on EVERY multihost job (not just once after the system provisioning).

The reason for this is that people are using various "singlehost" tasks within the job, e.g. for errata package update or /distribution/reservesys. Every "singlehost" task in a recipe brings back the problem of unsynced starts of multihost tasks. Since we don't want to inflate the test time of multihost jobs just to avoid timeouts, I believe that the best solution is to sync multihost task execution for every multihost task.

Also bumping the priority to get some attention.
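
A minimal sketch of what such a per-task barrier could look like. The function name `task_start_barrier`, the `START_` state prefix, the task identifier argument, and the overridable `SYNC_SET` / `SYNC_BLOCK` variables are hypothetical; the `-s` flag on `rhts-sync-set` is an assumption to verify against the installed version.

```shell
# Hypothetical per-task barrier. A distinct state name per task keeps a
# state set for an earlier task from satisfying a later barrier.
SYNC_SET=${SYNC_SET:-rhts-sync-set}
SYNC_BLOCK=${SYNC_BLOCK:-rhts-sync-block}

task_start_barrier() {
    # $1: identifier shared by the corresponding task on every host
    # remaining arguments: hostnames of all hosts taking part in the task
    task_id=$1
    shift
    state="START_${task_id}"
    # Announce that this host is about to start the task.
    "$SYNC_SET" -s "$state" || return 1
    # Wait until all participating hosts have announced the same state.
    "$SYNC_BLOCK" -s "$state" "$@"
}
```

For the barrier to help, the wait would have to happen before the task's own watchdog clock starts, which is why this would need to live in the harness rather than in each test.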

