+++ This bug was initially created as a clone of Bug #756096 +++

Description of problem:
The out-of-the-box definition of UNHIBERNATE is:

UNHIBERNATE = MachineLastMatchTime =!= UNDEFINED

Thus all the sleeping machines are woken up after each ROOSTER_INTERVAL (because ROOSTER_UNHIBERNATE=Offline && Unhibernate).

I understand that this value would probably need tuning in every cluster, but I think a more reasonable default would be something like:

Unhibernate = CurrentTime - MachineLastMatchTime < 1200

as described here:
https://lists.cs.wisc.edu/archive/condor-users/2010-April/msg00063.shtml

Maybe this could be changed upstream.

Version-Release number of selected component:
condor-7.6.5-0.7

--- Additional comment from tstclair on 2012-03-14 10:14:56 EDT ---

Added config default as mentioned. Tracking in V7_6-build-branch
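To illustrate the difference between the two UNHIBERNATE definitions discussed above, here is a rough Python sketch. It is not Condor code and does not evaluate real ClassAds; it only mimics the two expressions under the assumptions that an UNDEFINED MachineLastMatchTime makes the new expression UNDEFINED, and that rooster acts only when Offline && Unhibernate is definitely true. The function names and the sample machines are hypothetical.

```python
from typing import Optional

UNHIBERNATE_WINDOW = 1200  # seconds, taken from the proposed default


def old_unhibernate(last_match: Optional[int]) -> bool:
    """Old default: MachineLastMatchTime =!= UNDEFINED."""
    return last_match is not None


def new_unhibernate(now: int, last_match: Optional[int]) -> Optional[bool]:
    """New default: CurrentTime - MachineLastMatchTime < 1200.

    None stands in for UNDEFINED when the attribute is missing (assumption).
    """
    if last_match is None:
        return None
    return now - last_match < UNHIBERNATE_WINDOW


def rooster_wakes(offline: bool, unhibernate: Optional[bool]) -> bool:
    """ROOSTER_UNHIBERNATE = Offline && Unhibernate.

    Only a definite True counts as a wake-up (assumption).
    """
    return offline and unhibernate is True


now = 1_000_000  # arbitrary "CurrentTime" for the example
machines = {
    "never_matched": None,
    "matched_10_min_ago": now - 600,
    "matched_2_days_ago": now - 172_800,
}

for name, last_match in machines.items():
    old = rooster_wakes(True, old_unhibernate(last_match))
    new = rooster_wakes(True, new_unhibernate(now, last_match))
    print(f"{name}: old default wakes={old}, new default wakes={new}")
```

Under these assumptions, the old default wakes every offline machine that has ever been matched, however long ago, while the new default only wakes machines matched within the last 1200 seconds.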
Changed UNHIBERNATE on PowerManagementSubnetManager to the above value. Updated on master.
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
C: The UNHIBERNATE parameter in the remote configuration base database was too loosely defined
C: Hibernated machines would be woken up unnecessarily
C: Tightened the definition of UNHIBERNATE
R: Not all hibernated machines will be woken up
Updated UNHIBERNATE to the above value for both the feature and the param.

PowerManagementSubnetManager
  name: "PowerManagementSubnetManager"
  params: {"ROOSTER"=>"$(LIBEXEC)/condor_rooster", "ROOSTER_UNHIBERNATE_RANK"=>"Mips*Cpus", "ROOSTER_MAX_UNHIBERNATE"=>"0", "ROOSTER_INTERVAL"=>"300", "ROOSTER_WAKEUP_CMD"=>"\"$(BIN)/condor_power -d -i -s $(ROOSTER_SUBNET_MASK)\"", "ROOSTER_UNHIBERNATE"=>"Offline && Unhibernate", "DAEMON_LIST"=>">= ROOSTER", "UNHIBERNATE"=>"CurrentTime - MachineLastMatchTime < 1200", "ROOSTER_SUBNET_MASK"=>""}
  depends: []
  conflicts: ["PowerManagementNode"]
  included_features: []

UNHIBERNATE
  kind: "String"
  default: "CurrentTime - MachineLastMatchTime < 1200"
  description: "A boolean expression that specifies when an offline machine should be woken up"
  must_change: false
  requires_restart: false
  visibility_level: 0
  depends: []
  conflicts: []

Fixed upstream on branch: BZ803359-UNHIBERNATE-value
tested with:
condor-wallaby-base-db-1.25-1

tested on:
RHEL6 i386,x86_64
RHEL5 i386,x86_64

- !ruby/object:Mrg::Grid::SerializedConfigs::Feature
  annotation: Enables power management wake up for a subnet
  conflicts:
  - PowerManagementNode
  depends: []
  included: []
  name: PowerManagementSubnetManager
  params:
    ROOSTER_WAKEUP_CMD: "\"$(BIN)/condor_power -d -i -s $(ROOSTER_SUBNET_MASK)\""
    UNHIBERNATE: CurrentTime - MachineLastMatchTime < 1200
    ROOSTER_UNHIBERNATE_RANK: Mips*Cpus
    ROOSTER_MAX_UNHIBERNATE: "0"
    ROOSTER_SUBNET_MASK: 0
    ROOSTER: $(LIBEXEC)/condor_rooster
    ROOSTER_INTERVAL: "300"
    ROOSTER_UNHIBERNATE: Offline && Unhibernate
    DAEMON_LIST: ">= ROOSTER"

- !ruby/object:Mrg::Grid::SerializedConfigs::Parameter
  annotation: ""
  conflicts: []
  default_val: CurrentTime - MachineLastMatchTime < 1200
  depends: []
  description: A boolean expression that specifies when an offline machine should be woken up
  kind: String
  level: 0
  must_change: false
  name: UNHIBERNATE
  needs_restart: false

# wallaby show-feature PowerManagementSubnetManager
Console Connection Established...
PowerManagementSubnetManager
  name: "PowerManagementSubnetManager"
  params: {"UNHIBERNATE"=>"CurrentTime - MachineLastMatchTime < 1200", "ROOSTER_INTERVAL"=>"300", "DAEMON_LIST"=>">= ROOSTER", "ROOSTER_MAX_UNHIBERNATE"=>"0", "ROOSTER"=>"$(LIBEXEC)/condor_rooster", "ROOSTER_WAKEUP_CMD"=>"\"$(BIN)/condor_power -d -i -s $(ROOSTER_SUBNET_MASK)\"", "ROOSTER_SUBNET_MASK"=>"", "ROOSTER_UNHIBERNATE_RANK"=>"Mips*Cpus", "ROOSTER_UNHIBERNATE"=>"Offline && Unhibernate"}
  depends: []
  conflicts: ["PowerManagementNode"]
  included_features: []
  annotation: "Enables power management wake up for a subnet"

# wallaby show-param UNHIBERNATE
Console Connection Established...
UNHIBERNATE
  kind: "String"
  default: "CurrentTime - MachineLastMatchTime < 1200"
  description: "A boolean expression that specifies when an offline machine should be woken up"
  must_change: false
  requires_restart: false
  visibility_level: 0
  depends: []
  conflicts: []
  annotation: ""

>>> verified
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2013-0564.html