Bug 744752 - watchdog not killing job with multiple running tasks
Summary: watchdog not killing job with multiple running tasks
Status: CLOSED NOTABUG
Alias: None
Product: Beaker
Classification: Retired
Component: scheduler
Version: 0.7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: ---
Assignee: Bill Peck
Reported: 2011-10-10 11:18 UTC by Marian Csontos
Modified: 2011-10-10 12:06 UTC

Doc Type: Bug Fix
Last Closed: 2011-10-10 12:06:26 UTC


Description Marian Csontos 2011-10-10 11:18:34 UTC
Description of problem:
An error in a test harness build resulted in multiple tasks in the Running state in a single recipe:

  https://beaker-stage.app.eng.bos.redhat.com/jobs/3863

The job has now been overdue for 3 days and was not killed by the External Watchdog (EWD). I have seen this before and have verified that EWD works fine otherwise.

Although this currently requires a buggy beah package, it will become an issue once we allow running multiple tasks in parallel.

I thought this might be the result of multiple watchdogs being active for a single recipe, but I am no longer sure, because even completed tasks have an "active" watchdog entry (I am not sure whether it is actually active).


Version-Release number of selected component (if applicable):
0.7.3


How reproducible:
100% (2/2)


Steps to Reproduce:
No reproducer yet; the simplest would be a Selenium test along these lines:
1. start task
2. call watchdog_extend
3. repeat steps 1 and 2 several times
4. wait


Actual results:
The task is long overdue even though the only displayed watchdog has expired.


Expected results:
The task is killed by the External Watchdog, or all active watchdogs are displayed.


Additional info:

Active watchdogs:

> bkr-stage watchdog-show 63163
63163: -245301
> bkr-stage watchdog-show 63164
63164: -245310
> bkr-stage watchdog-show 63165
63165: -245317
> bkr-stage watchdog-show 63166
63166: -245328
> bkr-stage watchdog-show 63167
63167: -245334
> bkr-stage watchdog-show 63168
63168: -245339
> bkr-stage watchdog-show 63169
63169: -245344

But I noticed that even completed tasks have a watchdog set:

> bkr-stage watchdog-show 63162
63162: -245425
> bkr-stage watchdog-show 63161
63161: -245430
> bkr-stage watchdog-show 63160
63160: -245435
> bkr-stage watchdog-show 63159
63159: -245440

But not so for completed jobs:

> bkr-stage watchdog-show 63158
63158: False
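
For readability, the values above can be converted into durations. This is only a minimal sketch: it assumes each `watchdog-show` line has the form `<task_id>: <seconds>`, that a negative number means the watchdog expired that many seconds ago, and that `False` means no watchdog is set. The helper names `parse_watchdog_line` and `describe` are hypothetical and not part of bkr.

```python
import re
from datetime import timedelta

# Assumption: each line looks like "<task_id>: <seconds>", where a negative
# value means the watchdog expired that many seconds ago, and "False" means
# no watchdog is set for the task.
LINE_RE = re.compile(r"^(\d+):\s*(-?\d+|False)$")


def parse_watchdog_line(line):
    """Return (task_id, seconds_remaining or None) for one output line."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized watchdog-show line: {line!r}")
    task_id, value = match.groups()
    return int(task_id), (None if value == "False" else int(value))


def describe(seconds):
    """Render a watchdog value as a human-readable status string."""
    if seconds is None:
        return "no watchdog"
    if seconds < 0:
        return f"overdue by {timedelta(seconds=-seconds)}"
    return f"{timedelta(seconds=seconds)} remaining"


# Three of the lines quoted in this report.
sample = """63163: -245301
63162: -245425
63158: False"""

for line in sample.splitlines():
    task_id, seconds = parse_watchdog_line(line)
    print(task_id, describe(seconds))
```

Under these assumptions, task 63163 comes out as overdue by roughly 2 days and 20 hours, which matches the "overdue for 3 days" observation in the description.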

Comment 1 Raymond Mancy 2011-10-10 11:58:57 UTC
The watchdog on lab-devel was not running.
This seems to be my fault. We had a minor issue in the last deployment and I was playing with lab-devel to replicate it. It seems I didn't put it back...
Apologies.

