Bug 744752

Summary: watchdog not killing job with multiple running tasks
Product: [Retired] Beaker
Reporter: Marian Csontos <mcsontos>
Component: scheduler
Assignee: Bill Peck <bpeck>
Status: CLOSED NOTABUG
QA Contact:
Severity: low
Docs Contact:
Priority: unspecified
Version: 0.7
CC: bpeck, dcallagh, mcsontos, rmancy, stl
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-10-10 12:06:26 UTC
Type: ---

Description Marian Csontos 2011-10-10 11:18:34 UTC
Description of problem:
An error in a test harness build resulted in multiple tasks being in the Running state in a single recipe:

  https://beaker-stage.app.eng.bos.redhat.com/jobs/3863

The job has now been overdue for 3 days and was not killed by the External Watchdog (EWD). I have seen this before and have verified that the EWD works fine otherwise.

Although this currently requires a buggy beah package, it will become an issue once we allow running multiple tasks in parallel.

I thought this might be the result of multiple watchdogs being active for a single recipe, but I am no longer sure, because even completed tasks have an "active" watchdog entry (I am not sure whether it is actually active).


Version-Release number of selected component (if applicable):
0.7.3


How reproducible:
100% (2/2)


Steps to Reproduce:
No reproducer yet; the simplest would be a Selenium test along these lines:
1. start task
2. call watchdog_extend
3. repeat steps 1 and 2 several times
4. wait


Actual results:
The task is long overdue, although the only displayed watchdog has already expired.


Expected results:
The task is killed by the External Watchdog, or all active watchdogs are displayed.


Additional info:

Active watchdogs:

> bkr-stage watchdog-show 63163
63163: -245301
> bkr-stage watchdog-show 63164
63164: -245310
> bkr-stage watchdog-show 63165
63165: -245317
> bkr-stage watchdog-show 63166
63166: -245328
> bkr-stage watchdog-show 63167
63167: -245334
> bkr-stage watchdog-show 63168
63168: -245339
> bkr-stage watchdog-show 63169
63169: -245344

But I noticed that even completed tasks have a watchdog set:

> bkr-stage watchdog-show 63162
63162: -245425
> bkr-stage watchdog-show 63161
63161: -245430
> bkr-stage watchdog-show 63160
63160: -245435
> bkr-stage watchdog-show 63159
63159: -245440

But not so for completed jobs:

> bkr-stage watchdog-show 63158
63158: False
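
For reading the values above, here is a minimal Python sketch that turns watchdog-show output lines into a human-readable duration. It assumes the number after the task ID is the watchdog time remaining in seconds, so a negative value means the watchdog expired that long ago; that interpretation is not stated in this report. The helper names parse_watchdog_line and describe are hypothetical and not part of bkr.

```python
from datetime import timedelta


def parse_watchdog_line(line):
    """Parse one '<task_id>: <value>' line into (task_id, seconds or None)."""
    task_id, _, value = line.partition(":")
    value = value.strip()
    # 'False' is what the output above shows when no watchdog is set.
    seconds = None if value == "False" else int(value)
    return int(task_id), seconds


def describe(seconds):
    """Render the assumed seconds-remaining value as text."""
    if seconds is None:
        return "no watchdog"
    if seconds < 0:
        return "overdue by " + str(timedelta(seconds=-seconds))
    return "expires in " + str(timedelta(seconds=seconds))


# Sample lines copied from the output in this report.
sample = """63163: -245301
63162: -245425
63158: False"""

report = {}
for line in sample.splitlines():
    task_id, seconds = parse_watchdog_line(line)
    report[task_id] = describe(seconds)
    print(task_id, report[task_id])
```

Under that assumption, -245301 corresponds to roughly 2 days and 20 hours, which is consistent with the job being about 3 days overdue.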

Comment 1 Raymond Mancy 2011-10-10 11:58:57 UTC
The watchdog on lab-devel was not running.
This seems to be my fault. We had a minor issue in the last deployment and I was playing with lab-devel to replicate it. It seems I didn't put it back...
Apologies.