Bug 1129020 - Document the WATCHDOG_SCRIPT feature
Summary: Document the WATCHDOG_SCRIPT feature
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Beaker
Classification: Community
Component: Doc
Version: 0.17
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: 0.18.1
Assignee: Dan Callaghan
QA Contact: tools-bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-08-12 06:22 UTC by Nick Coghlan
Modified: 2018-02-06 00:41 UTC

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-09-12 07:36:22 UTC



Description Nick Coghlan 2014-08-12 06:22:05 UTC
On the lab controller, a WATCHDOG_SCRIPT can be configured to run whenever an external watchdog timer fires.

It receives the system FQDN (unconfirmed), the recipe ID, and the ID of the currently running task as arguments, and should print as its sole output the number of seconds by which to extend the watchdog.

When it reports successful completion, the external watchdog script becomes responsible for stopping the recipe (the watchdog daemon will log an error and stop the recipe if the script returns a non-zero exit code)
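A minimal sketch of what such a script might look like, written in Python. The argument order (FQDN, recipe ID, task ID) is an assumption taken from the description above and is not confirmed; EXTENSION_SECONDS, parse_args and main are hypothetical names chosen for illustration only.

```python
import sys

# Hypothetical policy value: seconds to request on every invocation.
EXTENSION_SECONDS = 600


def parse_args(argv):
    """Split the argument list into (fqdn, recipe_id, task_id).

    The order is an assumption based on the bug description.
    """
    if len(argv) != 3:
        raise ValueError("expected 3 arguments, got %d" % len(argv))
    fqdn, recipe_id, task_id = argv
    return fqdn, recipe_id, task_id


def main(argv):
    """Return the exit status the script should finish with."""
    try:
        parse_args(argv)
    except ValueError as exc:
        # Diagnostics go to stderr so stdout carries only the number.
        sys.stderr.write("%s\n" % exc)
        return 1
    # The sole output on stdout: seconds to extend the watchdog by.
    print(EXTENSION_SECONDS)
    return 0
```

A real script would finish with sys.exit(main(sys.argv[1:])) so that the return value becomes the process exit status.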

Comment 1 Jun'ichi NOMURA 2014-08-18 05:24:31 UTC
(In reply to Nick Coghlan from comment #0)
> When it reports successful completion, the external watchdog script becomes
> responsible for stopping the recipe (the watchdog daemon will log an error
> and stop the recipe if the script returns a non-zero exit code)

Expected behavior of this feature is:
  - non-zero exit code means either WATCHDOG_SCRIPT has failed or no extension was requested.
    So stop the recipe.
  - zero exit code means WATCHDOG_SCRIPT has requested the timer extension.
    So extend the timeout, where the timeout value is read from the script stdout

The current code seems to assume that 'self.extend_watchdog()' never fails without raising an exception, which could be an oversight.
If the function can return a failure, the code should be fixed to stop the recipe in that case.

Comment 2 Dan Callaghan 2014-09-02 05:45:38 UTC
(In reply to Nick Coghlan from comment #0)
> When it reports successful completion, the external watchdog script becomes
> responsible for stopping the recipe 

Well, it has to either abort the recipe or else just handle being invoked on the same recipe again, to avoid infinite loops. Assuming WATCHDOG_SCRIPT exits normally, the watchdog is extended (meaning it goes back to active) and it should go through the same expiry process when it expires again.
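One way a script could handle being invoked repeatedly on the same recipe is to cap the number of extensions it grants. The following is only a sketch under assumptions: the JSON state file, MAX_EXTENSIONS, EXTENSION_SECONDS, bump_count and run are all hypothetical and are not part of Beaker.

```python
import json
import os

MAX_EXTENSIONS = 3       # hypothetical cap per recipe
EXTENSION_SECONDS = 600  # hypothetical extension length


def bump_count(state_path, recipe_id):
    """Increment and return how often this recipe has been seen."""
    counts = {}
    if os.path.exists(state_path):
        with open(state_path) as f:
            counts = json.load(f)
    counts[recipe_id] = counts.get(recipe_id, 0) + 1
    with open(state_path, "w") as f:
        json.dump(counts, f)
    return counts[recipe_id]


def run(state_path, recipe_id):
    """Return the exit status; print the extension only when extending."""
    if bump_count(state_path, recipe_id) > MAX_EXTENSIONS:
        # Non-zero status and no output: the recipe gets stopped
        # instead of being extended again.
        return 1
    print(EXTENSION_SECONDS)
    return 0
```

With these values the first three expiries of a recipe are extended and the fourth returns a non-zero status, which breaks the loop. The recipe ID is handled as a string, since that is how it arrives on the command line.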

(In reply to Jun'ichi NOMURA from comment #1)
> That current code assumes 'self.extend_watchdog()' never fails without
> exception could be an oversight.
> If the function may return failure, the code should be fixed to stop the
> recipe in that case.

I think the code currently handles failures in WATCHDOG_SCRIPT correctly. The check_output() function is where the external script is actually executed, and that function raises an exception if the exit status is non-zero. There is an except: block which will catch that exception (as well as any other exceptions, like a failure to coerce the script output to int, or a failure to extend the watchdog) and fall through to aborting the recipe as normal. So I think the code matches the expected behaviour you are describing.

Of course, having said all that, we aren't testing WATCHDOG_SCRIPT anywhere currently so I can't prove it...
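For illustration, the control flow described in this comment can be sketched as below. This is not the actual Beaker code: handle_expiry and the extend/abort callbacks are hypothetical stand-ins, and only subprocess.check_output() and the broad except are taken from the description.

```python
import subprocess


def handle_expiry(command, fqdn, recipe_id, task_id, extend, abort):
    """Run the external script; extend on success, abort on any failure.

    command is the script invocation as a list, e.g. ["/path/to/script"].
    Returns True if the watchdog was extended, False if aborted.
    """
    try:
        # check_output() raises CalledProcessError on non-zero exit.
        output = subprocess.check_output(
            list(command) + [fqdn, str(recipe_id), str(task_id)])
        # int() raises ValueError if stdout is not a plain number.
        seconds = int(output.decode().strip())
        # Any exception raised while extending is caught below too.
        extend(seconds)
    except Exception:
        # Every failure falls through to aborting the recipe.
        abort()
        return False
    return True
```

Note that a failure which is only signalled by a return value from the extend callback, rather than by an exception, would not be caught here. That is the case raised in comment 1.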

Comment 3 Dan Callaghan 2014-09-02 06:49:26 UTC
On Gerrit: http://gerrit.beaker-project.org/3305

Comment 4 Dan Callaghan 2014-09-09 01:03:17 UTC
This bug fix is available on the Beaker web site:

https://beaker-project.org/docs-release-0.18/admin-guide/watchdog-script.html

Comment 5 Dan Callaghan 2014-09-12 07:36:22 UTC
Beaker 0.18.1 has been released.

