Bug 615378 - Ch 19 FAQ
Summary: Ch 19 FAQ
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: Grid_User_Guide
Version: Development
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: 1.3
Assignee: Lana Brindley
QA Contact: Lubos Trilety
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2010-07-16 16:16 UTC by Robert Rati
Modified: 2013-10-23 23:17 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-10-14 20:12:37 UTC


Description Robert Rati 2010-07-16 16:16:00 UTC
Description of problem:
Change all HOSTALLOW_ parameters to ALLOW_ and all HOSTDENY_ parameters to DENY_.

"Set USE_PROCD = FALSE in the startd configuration" => "Set STARTD.USE_PROCD = FALSE and STARTER.USE_PROCD = FALSE in the startd configuration."

"The startd will always wait the value specified in the killing_timeout parameter before hard-killing the job" => "The startd will always wait for the value specified in the killing_timeout parameter before hard-killing the starter."

"However, the starter will always wait for the value specified in the killing_timeout-1 configuration variable before attempting to hard-kill the job" => "However, by default the starter will wait killing_timeout-1 before attempting to hard-kill the job."

Comment 1 Lana Brindley 2010-07-19 03:33:38 UTC
(In reply to comment #0)
> Description of problem:
> Change all HOSTALLOW_ -> ALLOW_ && HOSTDENY_ -> DENY_

Done.
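
For illustration, the rename amounts to the following kind of change in a configuration file. This is only a sketch: the WRITE suffix and the host patterns are assumptions chosen as examples, not text taken from the guide.

```
# Before (old parameter names, shown commented out)
# HOSTALLOW_WRITE = *.example.com
# HOSTDENY_WRITE  = untrusted.example.com

# After (new parameter names, same values)
ALLOW_WRITE = *.example.com
DENY_WRITE  = untrusted.example.com
```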

> 
> Set USE_PROCD = FALSE in the startd configuration => STARTD.USE_PROCD = FALSE &
> STARTER.USE_PROCD = FALSE in the startd configuration.

<listitem>
	<para>
		Set <command>STARTD.USE_PROCD = FALSE</command> and <command>STARTER.USE_PROCD = FALSE</command> in the startd configuration. This is the most reliable way to handle the situation.
	</para>
</listitem>
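
As a configuration fragment, the two settings named in the list item above would look roughly like this. Where the lines go (for example, a local configuration file on the execute node) is an assumption and is not specified in this bug.

```
# Disable procd for the startd and for the starters it spawns.
# The SUBSYSTEM.PARAMETER prefix limits each setting to that daemon.
STARTD.USE_PROCD = FALSE
STARTER.USE_PROCD = FALSE
```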

> 
> "The startd will always wait the value specified in the killing_timeout
> parameter before hard-killing the job" => The startd will always wait the value
> specified in the killing_timeout parameter before hard-killing the starter
> 
> "However, the starter will always wait for the value specified in the
> killing_timeout-1 configuration variable before attempting to hard-kill the
> job" => However, by default the starter will wait killing_timeout-1 before
> attempting to hard-kill the job.

<para>
	Killing a job with a custom signal can sometimes cause a race condition between the starter and the startd. This happens when the startd communicates with the starter using <command>procd</command>. The startd always waits for the time specified in the <parameter>killing_timeout</parameter> parameter before hard-killing the starter. However, by default the starter waits one second less than that (<parameter>killing_timeout</parameter>-1) before attempting to hard-kill the job. This means that the startd can sometimes attempt to hard-kill the starter while the starter is still cleaning up and exiting. When that happens, the starter stops communicating with the <command>procd</command>, and the startd suffers a communication failure and then crashes.
</para>
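
A worked example may make the timing in the paragraph above easier to follow. The value 30 is purely illustrative, and KILLING_TIMEOUT is assumed to be the same parameter as killing_timeout written in upper case.

```
# Illustrative value only, in seconds.
KILLING_TIMEOUT = 30

# Resulting timeline after the custom kill signal is sent:
#   t = 29s  starter (by default, killing_timeout-1) hard-kills the job,
#            then begins cleaning up and exiting
#   t = 30s  startd (killing_timeout) hard-kills the starter
# The one-second gap is the window in which the race can occur.
```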


LKB

Comment 2 Lubos Trilety 2010-09-21 08:29:29 UTC
No HOSTALLOW/HOSTDENY in grid user guide.
Chapter was correctly changed.

>>> VERIFIED

