Red Hat Bugzilla – Bug 904178
Change of behavior for code escalation/custom signal for jobs
Last modified: 2013-04-15 20:48:31 EDT
Description of problem:
HTCondor 7.8 changed the way a job can specify how to receive a custom signal and the related timeout.
Before: the job had to specify the custom signal as (Remove_)kill_sig. Upon condor_rm, the signal was sent, followed by a kill after KILLING_TIMEOUT-1 (or kill_sig_timeout, if specified).
The same timeout was used for condor_vacate_job if the job did not react to the signal; TERM was used unless kill_sig was specified.
Now: the job has to specify the custom signal as (Remove_)kill_sig. Upon condor_rm, the escalation takes place only if the job also defines want_graceful_removal=true; in that case the signal is sent and the escalation takes place after MACHINEMAXVACATETIME (or JobMaxVacateTime, if specified by the job; kill_sig_timeout is deprecated).
The same timeout is used for condor_vacate_job if the job does not react to the signal (but note that the custom signal, if specified, is always used, even when want_graceful_removal is not specified).
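The difference between the two behaviors on remove can be sketched as a small Python model. This is only an illustration of the description above, not HTCondor code: the function and parameter names are hypothetical, the timeouts are treated as plain integers in seconds, and any additional capping or policy that HTCondor may apply is ignored.

```python
from typing import Optional


def old_remove_delay(killing_timeout: int,
                     kill_sig_timeout: Optional[int] = None) -> int:
    """Before 7.8: seconds between the custom signal and the kill.

    The job's kill_sig_timeout wins if it is set; otherwise the
    delay is KILLING_TIMEOUT - 1.
    """
    if kill_sig_timeout is not None:
        return kill_sig_timeout
    return killing_timeout - 1


def new_remove_delay(want_graceful: bool,
                     machine_max_vacate_time: int,
                     job_max_vacate_time: Optional[int] = None) -> Optional[int]:
    """7.8 as reported: seconds between the custom signal and escalation.

    Returns None when the job did not ask for a graceful removal,
    i.e. when no signal escalation takes place on remove.
    """
    if not want_graceful:
        return None
    if job_max_vacate_time is not None:
        return job_max_vacate_time
    return machine_max_vacate_time
```

With illustrative values, old_remove_delay(30) gives 29, while new_remove_delay(False, 600) gives None: a job that only sets the custom signal no longer gets the escalation on remove.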
Either change the default behaviour (at least for remove: drop the need for want_graceful_removal and restore the previous timeout) or properly document and explain the change. The names of the new variables (MACHINEMAXVACATETIME/JobMaxVacateTime instead of KILLING_TIMEOUT/kill_sig_timeout) should probably be documented anyway, unless the old ones are kept as aliases.
The paragraph "The new submit command want_graceful_removal ..."
A new parameter named GRACEFULLY_REMOVE_JOBS, which defaults to true, determines whether jobs will be removed gracefully by default and use any custom signals defined. A job can override this setting by specifying want_graceful_removal in the job ad.
Also, the default config has MACHINEMAXVACATETIME set to KILLING_TIMEOUT-1.
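A minimal Python sketch of how these defaults combine, under the same caveats: the function names are hypothetical, None stands for "not set in the job ad", and the model covers only what is stated in this comment.

```python
from typing import Optional


def effective_graceful_removal(job_want_graceful_removal: Optional[bool] = None,
                               gracefully_remove_jobs: bool = True) -> bool:
    """Decide whether a job is removed gracefully.

    The job's want_graceful_removal overrides the
    GRACEFULLY_REMOVE_JOBS setting, which defaults to true.
    """
    if job_want_graceful_removal is not None:
        return job_want_graceful_removal
    return gracefully_remove_jobs


def default_machine_max_vacate_time(killing_timeout: int) -> int:
    """Default config: MACHINEMAXVACATETIME is KILLING_TIMEOUT - 1."""
    return killing_timeout - 1
```

A job that sets nothing is therefore removed gracefully, and with the default config the escalation delay equals the old KILLING_TIMEOUT-1 value.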
Fixed on branch:
The feature now works according to the behavior described in #3 (which, in the default configuration, is _almost_ like the old default behavior, except that the signal escalation is now always enabled).
Verified on RHEL5.9/6.4beta, i386/x86_64.