Bug 803359 - [RFE]change UNHIBERNATE default value to not wake up all the machines
Summary: [RFE]change UNHIBERNATE default value to not wake up all the machines
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor-wallaby-base-db
Version: 2.1
Hardware: All
OS: All
Priority: low
Severity: low
Target Milestone: 2.3
Target Release: ---
Assignee: Robert Rati
QA Contact: Lubos Trilety
URL:
Whiteboard:
Depends On: 756096
Blocks:
Reported: 2012-03-14 14:29 UTC by Timothy St. Clair
Modified: 2013-03-06 18:42 UTC (History)

Fixed In Version: condor-wallaby-base-db-1.24-1
Doc Type: Enhancement
Doc Text:
C: The UNHIBERNATE parameter in the remote configuration base database was too loosely defined.
C: Hibernated machines would be woken up unnecessarily.
C: Tightened the definition of UNHIBERNATE.
R: Not all hibernated machines will be woken up.
Clone Of: 756096
Environment:
Last Closed: 2013-03-06 18:42:56 UTC
Target Upstream Version:
Embargoed:


Links
System: Red Hat Product Errata
ID: RHSA-2013:0564
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Low: Red Hat Enterprise MRG Grid 2.3 security update
Last Updated: 2013-03-06 23:37:09 UTC

Description Timothy St. Clair 2012-03-14 14:29:39 UTC
+++ This bug was initially created as a clone of Bug #756096 +++

Description of problem:
The out-of-the-box definition of UNHIBERNATE is:
UNHIBERNATE = MachineLastMatchTime =!= UNDEFINED
As a result, all sleeping machines whose MachineLastMatchTime is defined are woken up after each ROOSTER_INTERVAL (because ROOSTER_UNHIBERNATE = Offline && Unhibernate).

I understand that this value would probably need tuning in every cluster, but I think a more reasonable default would be something like:

Unhibernate = CurrentTime - MachineLastMatchTime < 1200

as described here: 
https://lists.cs.wisc.edu/archive/condor-users/2010-April/msg00063.shtml

This could perhaps be changed upstream.
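
To illustrate the difference between the two definitions, here is a minimal Python sketch (not Condor code). It models machine ads as plain dicts and evaluates both UNHIBERNATE expressions together with ROOSTER_UNHIBERNATE = Offline && Unhibernate. The node names, the timestamps and the helper functions are hypothetical, and treating an undefined MachineLastMatchTime as "do not wake" under the new expression is an assumption of the sketch.

```python
from typing import Optional

UNDEFINED = None  # stand-in for the ClassAd UNDEFINED value (assumption of this sketch)


def unhibernate_old(last_match: Optional[int]) -> bool:
    """MachineLastMatchTime =!= UNDEFINED: true for any defined timestamp."""
    return last_match is not UNDEFINED


def unhibernate_new(now: int, last_match: Optional[int], window: int = 1200) -> bool:
    """CurrentTime - MachineLastMatchTime < 1200; UNDEFINED is treated as 'do not wake'."""
    if last_match is UNDEFINED:
        return False
    return now - last_match < window


def machines_to_wake(ads, now, use_new):
    """Apply ROOSTER_UNHIBERNATE = Offline && Unhibernate to every ad."""
    woken = []
    for ad in ads:
        last_match = ad.get("MachineLastMatchTime", UNDEFINED)
        wanted = unhibernate_new(now, last_match) if use_new else unhibernate_old(last_match)
        if ad.get("Offline", False) and wanted:
            woken.append(ad["Name"])
    return woken


# Hypothetical ads: times are seconds since the epoch.
NOW = 1_000_000
ADS = [
    {"Name": "node-a", "Offline": True, "MachineLastMatchTime": NOW - 500},
    {"Name": "node-b", "Offline": True, "MachineLastMatchTime": NOW - 10_000},
    {"Name": "node-c", "Offline": True},  # never matched
    {"Name": "node-d", "Offline": False, "MachineLastMatchTime": NOW - 100},
]

print("old default wakes:", machines_to_wake(ADS, NOW, use_new=False))
print("new default wakes:", machines_to_wake(ADS, NOW, use_new=True))
```

With the old definition every offline machine that has a match time is selected, however old the match is; with the proposed one only machines matched within the last 1200 seconds are.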

Version-Release number of selected component:
condor-7.6.5-0.7

--- Additional comment from tstclair on 2012-03-14 10:14:56 EDT ---

Added config default as mentioned.  Tracking in V7_6-build-branch

Comment 2 Robert Rati 2012-03-28 19:30:17 UTC
Changed UNHIBERNATE on PowerManagementSubnetManager to the above value.

Updated on master.

Comment 3 Robert Rati 2012-04-02 15:02:39 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
C: The UNHIBERNATE parameter in the remote configuration base database was too loosely defined
C: Hibernated machines would be woken up unnecessarily
C: Tightened the definition of UNHIBERNATE
R: Not all hibernated machines will be woken up

Comment 6 Robert Rati 2012-10-03 20:06:03 UTC
Updated UNHIBERNATE to the above value in both the feature and the param.

PowerManagementSubnetManager
  name:  "PowerManagementSubnetManager"
  params:  {"ROOSTER"=>"$(LIBEXEC)/condor_rooster", "ROOSTER_UNHIBERNATE_RANK"=>"Mips*Cpus", "ROOSTER_MAX_UNHIBERNATE"=>"0", "ROOSTER_INTERVAL"=>"300", "ROOSTER_WAKEUP_CMD"=>"\"$(BIN)/condor_power -d -i -s $(ROOSTER_SUBNET_MASK)\"", "ROOSTER_UNHIBERNATE"=>"Offline && Unhibernate", "DAEMON_LIST"=>">= ROOSTER", "UNHIBERNATE"=>"CurrentTime - MachineLastMatchTime < 1200", "ROOSTER_SUBNET_MASK"=>""}
  depends:  []
  conflicts:  ["PowerManagementNode"]
  included_features:  []

UNHIBERNATE
  kind:  "String"
  default:  "CurrentTime - MachineLastMatchTime < 1200"
  description:  "A boolean expression that specifies when an offline machine should be woken up"
  must_change:  false
  requires_restart:  false
  visibility_level:  0
  depends:  []
  conflicts:  []

Fixed upstream on branch:
BZ803359-UNHIBERNATE-value
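
For reference, the way the feature's parameters could combine in one wake-up cycle can be sketched in Python. This is only an illustration, not the condor_rooster implementation: it assumes that ROOSTER_UNHIBERNATE filters the offline ads, that ROOSTER_UNHIBERNATE_RANK (Mips*Cpus) orders them with higher values first, and that ROOSTER_MAX_UNHIBERNATE = 0 means "no limit". The function name and the sample ads are hypothetical.

```python
def select_for_wakeup(ads, now, max_unhibernate=0, window=1200):
    """Return the names of machines to wake in one cycle, best ranked first."""

    def unhibernate(ad):
        # UNHIBERNATE = CurrentTime - MachineLastMatchTime < 1200
        last_match = ad.get("MachineLastMatchTime")
        return last_match is not None and now - last_match < window

    # ROOSTER_UNHIBERNATE = Offline && Unhibernate
    candidates = [ad for ad in ads if ad.get("Offline", False) and unhibernate(ad)]

    # ROOSTER_UNHIBERNATE_RANK = Mips*Cpus (assumed: higher rank is woken first)
    candidates.sort(key=lambda ad: ad["Mips"] * ad["Cpus"], reverse=True)

    # ROOSTER_MAX_UNHIBERNATE = 0 is assumed to mean "no limit"
    if max_unhibernate > 0:
        candidates = candidates[:max_unhibernate]
    return [ad["Name"] for ad in candidates]


# Hypothetical offline ads.
NOW = 1_000_000
SAMPLE_ADS = [
    {"Name": "small", "Offline": True, "Mips": 3000, "Cpus": 2, "MachineLastMatchTime": NOW - 600},
    {"Name": "stale", "Offline": True, "Mips": 5000, "Cpus": 16, "MachineLastMatchTime": NOW - 5000},
    {"Name": "big", "Offline": True, "Mips": 4000, "Cpus": 8, "MachineLastMatchTime": NOW - 100},
]

print(select_for_wakeup(SAMPLE_ADS, NOW))
print(select_for_wakeup(SAMPLE_ADS, NOW, max_unhibernate=1))
```

The "stale" machine has the highest rank but is never selected, because its last match is older than the 1200 second window.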

Comment 7 Lubos Trilety 2012-12-18 10:15:23 UTC
tested with:
condor-wallaby-base-db-1.25-1

tested on:
RHEL6 i386,x86_64
RHEL5 i386,x86_64

- !ruby/object:Mrg::Grid::SerializedConfigs::Feature
  annotation: Enables power management wake up for a subnet
  conflicts:
  - PowerManagementNode
  depends: []

  included: []

  name: PowerManagementSubnetManager
  params:
    ROOSTER_WAKEUP_CMD: "\"$(BIN)/condor_power -d -i -s $(ROOSTER_SUBNET_MASK)\""
    UNHIBERNATE: CurrentTime - MachineLastMatchTime < 1200
    ROOSTER_UNHIBERNATE_RANK: Mips*Cpus
    ROOSTER_MAX_UNHIBERNATE: "0"
    ROOSTER_SUBNET_MASK: 0
    ROOSTER: $(LIBEXEC)/condor_rooster
    ROOSTER_INTERVAL: "300"
    ROOSTER_UNHIBERNATE: Offline && Unhibernate
    DAEMON_LIST: ">= ROOSTER"

- !ruby/object:Mrg::Grid::SerializedConfigs::Parameter
  annotation: ""
  conflicts: []

  default_val: CurrentTime - MachineLastMatchTime < 1200
  depends: []

  description: A boolean expression that specifies when an offline machine should be woken up
  kind: String
  level: 0
  must_change: false
  name: UNHIBERNATE
  needs_restart: false


# wallaby show-feature PowerManagementSubnetManager
Console Connection Established...
PowerManagementSubnetManager
  name:  "PowerManagementSubnetManager"
  params:  {"UNHIBERNATE"=>"CurrentTime - MachineLastMatchTime < 1200", "ROOSTER_INTERVAL"=>"300", "DAEMON_LIST"=>">= ROOSTER", "ROOSTER_MAX_UNHIBERNATE"=>"0", "ROOSTER"=>"$(LIBEXEC)/condor_rooster", "ROOSTER_WAKEUP_CMD"=>"\"$(BIN)/condor_power -d -i -s $(ROOSTER_SUBNET_MASK)\"", "ROOSTER_SUBNET_MASK"=>"", "ROOSTER_UNHIBERNATE_RANK"=>"Mips*Cpus", "ROOSTER_UNHIBERNATE"=>"Offline && Unhibernate"}
  depends:  []
  conflicts:  ["PowerManagementNode"]
  included_features:  []
  annotation:  "Enables power management wake up for a subnet"

# wallaby show-param UNHIBERNATE
Console Connection Established...
UNHIBERNATE
  kind:  "String"
  default:  "CurrentTime - MachineLastMatchTime < 1200"
  description:  "A boolean expression that specifies when an offline machine should be woken up"
  must_change:  false
  requires_restart:  false
  visibility_level:  0
  depends:  []
  conflicts:  []
  annotation:  ""
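
A check like the one above could also be scripted. The following is a rough Python sketch that parses text in the shape of the `wallaby show-param` output shown here and compares the default with the expected expression. The parser is a hypothetical helper written for this output format only; it does not call wallaby itself, and the sample text is copied from the output above.

```python
EXPECTED_DEFAULT = "CurrentTime - MachineLastMatchTime < 1200"

SAMPLE_OUTPUT = '''Console Connection Established...
UNHIBERNATE
  kind:  "String"
  default:  "CurrentTime - MachineLastMatchTime < 1200"
  description:  "A boolean expression that specifies when an offline machine should be woken up"
  must_change:  false
  requires_restart:  false
  visibility_level:  0
  depends:  []
  conflicts:  []
  annotation:  ""
'''


def parse_show_output(text):
    """Collect the indented 'key:  value' lines into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        # Only the indented lines carry fields; skip the banner and the name line.
        if not line.startswith("  ") or ":" not in line:
            continue
        key, _, raw = line.strip().partition(":")
        value = raw.strip()
        # Drop the surrounding double quotes that the output puts around strings.
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]
        fields[key] = value
    return fields


fields = parse_show_output(SAMPLE_OUTPUT)
if fields.get("default") == EXPECTED_DEFAULT:
    print("UNHIBERNATE default matches the expected expression")
else:
    print("unexpected default:", fields.get("default"))
```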


>>> verified

Comment 9 errata-xmlrpc 2013-03-06 18:42:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0564.html

