Bug 615378

Summary: Ch 19 FAQ
Product: Red Hat Enterprise MRG
Reporter: Robert Rati <rrati>
Component: Grid_User_Guide
Assignee: Lana Brindley <lbrindle>
Status: CLOSED CURRENTRELEASE
QA Contact: Lubos Trilety <ltrilety>
Severity: medium
Priority: medium
Version: Development
CC: ltrilety, mhideo
Target Milestone: 1.3   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2010-10-14 20:12:37 UTC

Description Robert Rati 2010-07-16 16:16:00 UTC
Description of problem:
Change all HOSTALLOW_ -> ALLOW_ && HOSTDENY_ -> DENY_

Set USE_PROCD = FALSE in the startd configuration => STARTD.USE_PROCD = FALSE & STARTER.USE_PROCD = FALSE in the startd configuration.

"The startd will always wait the value specified in the killing_timeout parameter before hard-killing the job" => The startd will always wait the value specified in the killing_timeout parameter before hard-killing the starter

"However, the starter will always wait for the value specified in the killing_timeout-1 configuration variable before attempting to hard-kill the job" => However, by default the starter will wait killing_timeout-1 before attempting to hard-kill the job.


Comment 1 Lana Brindley 2010-07-19 03:33:38 UTC
(In reply to comment #0)
> Description of problem:
> Change all HOSTALLOW_ -> ALLOW_ && HOSTDENY_ -> DENY_

Done.

> 
> Set USE_PROCD = FALSE in the startd configuration => STARTD.USE_PROCD = FALSE &
> STARTER.USE_PROCD = FALSE in the startd configuration.

<listitem>
	<para>
		Set <command>STARTD.USE_PROCD = FALSE</command> and <command>STARTER.USE_PROCD = FALSE</command> in the startd configuration. This is the most reliable way to handle the situation.
	</para>
</listitem>

> 
> "The startd will always wait the value specified in the killing_timeout
> parameter before hard-killing the job" => The startd will always wait the value
> specified in the killing_timeout parameter before hard-killing the starter
> 
> "However, the starter will always wait for the value specified in the
> killing_timeout-1 configuration variable before attempting to hard-kill the
> job" => However, by default the starter will wait killing_timeout-1 before
> attempting to hard-kill the job.

<para>
	Trying to kill a job with a custom signal can sometimes cause a race condition between the starter and the startd. This happens when the startd communicates with the starter using <command>procd</command>. The startd always waits for the time specified in the <parameter>killing_timeout</parameter> parameter before hard-killing the starter. However, by default the starter waits for <parameter>killing_timeout</parameter> minus 1 before attempting to hard-kill the job. This means the startd can sometimes attempt to hard-kill the starter while the starter is still cleaning up and exiting. This causes the starter to stop communicating with the <command>procd</command>, which makes the startd suffer a communication failure and then crash.
</para>
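
For reviewers, the timing in the paragraph above can be sketched in a few lines of Python. This is only an illustration of the two timers, not Condor code: the function names are hypothetical, and treating the values as seconds is an assumption.

```python
def kill_deadlines(killing_timeout: int) -> dict:
    """Return when each daemon escalates to a hard kill.

    Values are measured from the moment the soft kill is sent.
    Assumption: killing_timeout is expressed in seconds.
    """
    return {
        # By default the starter hard-kills the job at killing_timeout - 1.
        "starter_hard_kills_job": killing_timeout - 1,
        # The startd hard-kills the starter at killing_timeout.
        "startd_hard_kills_starter": killing_timeout,
    }


def cleanup_window(killing_timeout: int) -> int:
    """Time the starter has to clean up and exit before the startd acts."""
    deadlines = kill_deadlines(killing_timeout)
    return (deadlines["startd_hard_kills_starter"]
            - deadlines["starter_hard_kills_job"])


if __name__ == "__main__":
    # 30 is an arbitrary example value, not a documented default.
    print(kill_deadlines(30))
    print(cleanup_window(30))
```

The window is always a single unit regardless of the configured value, which is why the starter can still be exiting when the startd tries to hard-kill it.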


LKB

Comment 2 Lubos Trilety 2010-09-21 08:29:29 UTC
No HOSTALLOW/HOSTDENY entries remain in the Grid User Guide.
The chapter was changed correctly.

>>> VERIFIED