Bug 743819 - Jobs killed by External Watchdog after prolonged server outage
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Beaker
Classification: Retired
Component: scheduler
Version: 0.7
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Bill Peck
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-10-06 08:00 UTC by Marian Csontos
Modified: 2019-05-22 13:41 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-04-26 07:16:59 UTC



Description Marian Csontos 2011-10-06 08:00:35 UTC
Description of problem:
We had some jobs killed by the external watchdog during the last outage - see Bug 637186#c6 and the following comment(s).

We need to make sure that, after an outage lasting longer than 30 minutes, all the watchdogs are updated to give machines time to sync up with the host.

Version-Release number of selected component (if applicable):
0.7.2

How reproducible:
Not easy to reproduce.

Steps to Reproduce:
1. Schedule a task taking X minutes.
2. Perform an outage longer than X + 1 hour.
  
Actual results:
Job is killed by the external watchdog.

Expected results:
Job resumes after the delay, submits all results and continues execution.
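
A minimal sketch of why the job gets killed, written in Python. It assumes a watchdog is simply a kill time that is compared against the current time; the function name, the one-hour slack and the example timestamps are hypothetical and are not taken from Beaker's code.

```python
from datetime import datetime, timedelta


def watchdog_expired(kill_time, now):
    """Hypothetical check: the watchdog fires once 'now' reaches the kill time."""
    return now >= kill_time


# Assumed scenario: a 20 minute task with one hour of slack on its watchdog.
start = datetime(2011, 10, 6, 8, 0)
task_length = timedelta(minutes=20)
kill_time = start + task_length + timedelta(hours=1)

# The outage lasts longer than X + 1 hour, as in the steps to reproduce.
outage = task_length + timedelta(hours=1, minutes=5)
after_outage = start + outage

# Without any adjustment the kill time has already passed when service returns.
killed_without_fix = watchdog_expired(kill_time, after_outage)

# Shifting the kill time by the outage length keeps the job alive.
killed_with_fix = watchdog_expired(kill_time + outage, after_outage)
```

Under these assumptions the unadjusted watchdog has expired by the time the server is back, while the shifted one has not.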

Additional info:

Comment 1 Bill Peck 2011-10-06 12:45:17 UTC
How about a command line which would allow the admins to extend current watchdogs by the length of the outage?

bkr watchdogs --add 30m

Comment 2 Bill Peck 2011-10-06 12:45:43 UTC
Of course, they would need to run that before starting the beaker-watchdog services.
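
A rough sketch of what such a command might do internally, in Python. This only illustrates the idea from Comment 1; the helper names, the accepted duration suffixes and the in-memory dictionary of kill times are assumptions and do not describe the actual bkr implementation.

```python
import re
from datetime import datetime, timedelta

# Assumed suffixes for the --add argument, e.g. "30m" or "2h".
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_duration(text):
    """Turn a string such as '30m' into a timedelta (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)([smhd])", text.strip())
    if match is None:
        raise ValueError("unrecognised duration: %r" % text)
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def extend_watchdogs(kill_times, extension):
    """Return a new mapping with every kill time pushed back by 'extension'.

    'kill_times' maps a recipe id to its kill time; this dictionary stands in
    for whatever storage the scheduler really uses.
    """
    return {recipe_id: kill_time + extension
            for recipe_id, kill_time in kill_times.items()}


# Example: two active watchdogs, extended by a 30 minute outage.
active = {
    101: datetime(2011, 10, 6, 9, 0),
    102: datetime(2011, 10, 6, 9, 45),
}
extended = extend_watchdogs(active, parse_duration("30m"))
```

Running the extension before the watchdog services start matters because a watchdog whose kill time has already passed would otherwise be acted on as soon as the service comes back.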

Comment 3 Marian Csontos 2011-10-06 14:06:21 UTC
Yes please. I think that would do.

Comment 4 Bill Peck 2012-01-11 16:10:24 UTC
Moving to 0.8.2.

Comment 5 Bill Peck 2012-03-27 14:44:52 UTC
Pushed to Gerrit for review.

