Description of problem:
The negotiator only matches one job to a slot per cycle, even if that slot is partitionable and could service multiple jobs. This slows down the loading of pools that use p-slots, since filling them requires multiple negotiation cycles.

Expected results:
Moving the "multi-claim" logic of https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2790 from the schedd to the negotiator would allow this functionality to work properly with resource limits such as group quotas and concurrency limits:
https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2826
https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2818

It would also set the stage for supporting policies for both "maximum-spread" and "minimum-spread" of jobs across machines (currently only maximum-spread is possible):
https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2681
Another quirk of the negotiator's treatment of p-slots is that a submitter is charged the entire weight of a p-slot, even though the matched job will most likely use only a fraction of the resources on that slot. I think it would be a good idea to add some return information to the claim protocol: the startd can include its "charge" for the claim against a p-slot, representing its assessment of what fraction of the resources was used. In the case of a multi-resource p-slot, I propose that the charge be the maximum of d-slot-resource(X) / p-slot-total-resource(X) over all resources X listed in the p-slot definition. For example, if a job requests 1/8 of the cpus but 1/2 of the memory, the match will cost 1/2 of the p-slot's weight, since that is the larger fraction.
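A minimal sketch of the proposed charge calculation (Python, for illustration only; the resource names and quantities below are hypothetical and not part of any existing claim protocol):
--------------------------------------------------------------
# Illustrative only: the proposed charge for a match against a p-slot is the
# largest consumed fraction over all resources listed on the p-slot.
def pslot_charge(pslot_totals, dslot_resources):
    # pslot_totals:    totals advertised by the p-slot, e.g. {"Cpus": 8, "Memory": 1024}
    # dslot_resources: what was carved off into the d-slot for this match
    return max(dslot_resources[r] / pslot_totals[r] for r in pslot_totals)

# The example from above: 1/8 of the cpus but 1/2 of the memory -> charge 0.5
print(pslot_charge({"Cpus": 8, "Memory": 1024}, {"Cpus": 1, "Memory": 512}))
--------------------------------------------------------------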
Which of the changes mentioned in comment #1 and comment #2 are implemented for this BZ? Could you please provide an example of how to test
https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2790 and
https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2826 ?

Is it possible to test https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2681 in the vanilla universe? If yes, could you please provide an example of that test?
A "basic" test of consumption policies configuration: -------------------------------------------------------------- # spoof some cores NUM_CPUS = 20 # declare an extensible resource for a claim-based consumption policy MACHINE_RESOURCE_tokens = 3 # startd-wide consumption policy config # defaults for cpus/memory/disk consumption CONSUMPTION_POLICY = True # startd defaults, can be overridden on a per-slot-type basis CONSUMPTION_CPUS = ifthenelse(target.Cpus isnt undefined, quantize(target.Cpus, {1}), 1) CONSUMPTION_MEMORY = ifthenelse(target.Memory isnt undefined, quantize(target.Memory, {1}), 1) CONSUMPTION_DISK = ifthenelse(target.Disk isnt undefined, quantize(target.Disk, {100}), 100) CONSUMPTION_TOKENS = ifthenelse(target.Tokens isnt undefined, target.Tokens, 0) # defaults, can be overridden on a per-slot-type basis SLOT_WEIGHT = Cpus NUM_CLAIMS = 5 # slot type 1: a traditional cpu-centric policy SLOT_TYPE_1 = cpus=5,memory=100,disk=25%,tokens=0 SLOT_TYPE_1_PARTITIONABLE = True SLOT_TYPE_1_NUM_CLAIMS = 10 NUM_SLOTS_TYPE_1 = 1 # slot type 2: will demo/test a memory-centric policy SLOT_TYPE_2 = cpus=5,memory=100,disk=25%,tokens=0 SLOT_TYPE_2_PARTITIONABLE = True NUM_SLOTS_TYPE_2 = 1 SLOT_TYPE_2_CONSUMPTION_MEMORY = quantize(target.RequestMemory, {25}) SLOT_TYPE_2_SLOT_WEIGHT = floor(Memory / 25) # slot type 3: a claim-based policy # (not tied to resource such as cpu, mem, etc) SLOT_TYPE_3 = cpus=5,memory=100,disk=25%,tokens=3 SLOT_TYPE_3_PARTITIONABLE = True NUM_SLOTS_TYPE_3 = 1 # always consume 1 token, and none of anything else SLOT_TYPE_3_CONSUMPTION_TOKENS = 1 SLOT_TYPE_3_CONSUMPTION_CPUS = 0 SLOT_TYPE_3_CONSUMPTION_MEMORY = 0 SLOT_TYPE_3_CONSUMPTION_DISK = 0 # define cost in terms of available tokens for serving jobs SLOT_TYPE_3_SLOT_WEIGHT = Tokens # slot type 4: a static-slot policy # (always consume all resources) SLOT_TYPE_4 = cpus=5,memory=100,disk=25%,tokens=0 SLOT_TYPE_4_PARTITIONABLE = True NUM_SLOTS_TYPE_4 = 1 # consume all resources - emulate static slot SLOT_TYPE_4_CONSUMPTION_CPUS = Cpus SLOT_TYPE_4_CONSUMPTION_MEMORY = Memory SLOT_TYPE_4_CONSUMPTION_DISK = Disk SLOT_TYPE_4_CONSUMPTION_TOKENS = Tokens # turn this off to demonstrate that consumption policy will handle this kind of logic MUST_MODIFY_REQUEST_EXPRS = False # turn off schedd-side resource splitting since we are demonstrating neg-side alternative CLAIM_PARTITIONABLE_LEFTOVERS = False # keep slot weights enabled for match costing NEGOTIATOR_USE_SLOT_WEIGHTS = True # for simplicity, turn off preemption, caching, worklife CLAIM_WORKLIFE=0 MAXJOBRETIREMENTTIME = 3600 PREEMPT = False RANK = 0 PREEMPTION_REQUIREMENTS = False NEGOTIATOR_CONSIDER_PREEMPTION = False NEGOTIATOR_MATCHLIST_CACHING = False # verbose logging ALL_DEBUG = D_FULLDEBUG | D_MACHINE # reduce daemon update latencies NEGOTIATOR_INTERVAL = 30 SCHEDD_INTERVAL = 15 # This should induce SLOT_TYPE_1 and SLOT_TYPE_4 to go into owner state when # their cpu assets are exhausted, which tests claim logic fix from #3792 START = (Cpus > 0) || (SlotType is "Dynamic") -------------------------------------------------------------- After the four p-slots spin up you should see this: -------------------------------------------------------------- $ condor_status -format "%d" SlotTypeID -format " %d" SlotWeight -format " %d" Cpus -format " %d" Memory -format " %d\n" Tokens 1 5 5 100 0 2 4 5 100 0 3 3 5 100 3 4 5 5 100 0 -------------------------------------------------------------- Submit some jobs, but BEFORE you submit, set up a watch on negotiation: $ tail -f 
NegotiatorLog | grep -e 'Started Negotiation' -e 'Finished Negotiation' -e 'Successfully matched with' The submit file for the 13 jobs looks like this: -------------------------------------------------------------- universe = vanilla executable = /bin/sleep arguments = 300 request_cpus = 1 request_memory = 1 request_disk = 1 notification = never queue 13 -------------------------------------------------------------- Firstly, you should see that the matchmaking matches the slots like so. Each slot (slot1, slot2, ....) should negotiate *consecutively*. That is, slot1 negotiates until it is full, then slot2, etc: --------------------------------------------------------------------------- $ tail -f NegotiatorLog | grep -e 'Started Negotiation' -e 'Finished Negotiation' -e 'Successfully matched with' 08/13/13 15:30:17 ---------- Finished Negotiation Cycle ---------- 08/13/13 15:30:37 ---------- Started Negotiation Cycle ---------- 08/13/13 15:30:37 Successfully matched with slot1@localhost 08/13/13 15:30:38 Successfully matched with slot1@localhost 08/13/13 15:30:38 Successfully matched with slot1@localhost 08/13/13 15:30:38 Successfully matched with slot1@localhost 08/13/13 15:30:38 Successfully matched with slot1@localhost 08/13/13 15:30:38 Successfully matched with slot2@localhost 08/13/13 15:30:38 Successfully matched with slot2@localhost 08/13/13 15:30:38 Successfully matched with slot2@localhost 08/13/13 15:30:38 Successfully matched with slot2@localhost 08/13/13 15:30:38 Successfully matched with slot3@localhost 08/13/13 15:30:39 Successfully matched with slot3@localhost 08/13/13 15:30:39 Successfully matched with slot3@localhost 08/13/13 15:30:39 Successfully matched with slot4@localhost 08/13/13 15:30:39 ---------- Finished Negotiation Cycle ---------- --------------------------------------------------------------------------- Now re-examine the space of slots. First, the p-slots should look like as follows. Note the remaining SlotWeight values are all zero. Slot Type 1 has no Cpus left. Slot Type 2 has no Memory left. Slot Type 3 has no Tokens left. Slot Type 4 has nothing at all left of anything: --------------------------------------------------------------- $ condor_status -constraint "SlotTypeID > 0" -format "%d" SlotTypeID -format " %d" SlotWeight -format " %d" Cpus -format " %d" Memory -format " %d\n" Tokens 1 0 0 95 0 2 0 1 0 0 3 0 5 100 0 4 0 0 0 0 --------------------------------------------------------------- Now examine the corresponding d-slots. There should be five d-slots of type (-1), each with one Cpu. 4 d-slots of type (-2), each with Memory=25. 3 d-slots type (-3), each with Tokens=1. Finally, 1 d-slot type (-4), with 5 Cpus, 100 Memory, and zero Tokens. Note for type (-4), the weight is 5: --------------------------------------------------------------- $ condor_status -constraint "SlotTypeID < 0" -format "%d" SlotTypeID -format " %d" SlotWeight -format " %d" Cpus -format " %d" Memory -format " %d\n" Tokens | sort | uniq -c 5 -1 1 1 1 0 4 -2 1 1 25 0 3 -3 1 0 0 1 1 -4 5 5 100 0 ---------------------------------------------------------------
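As a cross-check of the match counts above, here is a small Python sketch (illustrative only, not HTCondor code) of the per-slot-type accounting for these 1-cpu/1-MB jobs: each match deducts the configured consumption from the governing asset, and matching stops once that asset is exhausted:
--------------------------------------------------------------
# Illustrative accounting for the four slot types above.
def matches_until_exhausted(total, per_match):
    # number of matches before the governing asset runs out
    return total // per_match

print(matches_until_exhausted(5, 1))     # type 1: 5 cpus, 1 cpu/match      -> 5 matches
print(matches_until_exhausted(100, 25))  # type 2: 100 MB, 25 MB/match      -> 4 matches
print(matches_until_exhausted(3, 1))     # type 3: 3 tokens, 1 token/match  -> 3 matches
print(matches_until_exhausted(5, 5))     # type 4: consumes the whole slot  -> 1 match
# 5 + 4 + 3 + 1 = 13 matches, one per queued job.
--------------------------------------------------------------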
Regarding https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2790

We regard consumption policies as an alternative to CLAIM_PARTITIONABLE_LEFTOVERS (#2790); however, consumption policies and #2790 are expected to be compatible in a mixed-pool environment. That is, you should be able to mix startds with consumption policies and startds without them, and also set CLAIM_PARTITIONABLE_LEFTOVERS=True. The following test scenario should hold:
-----------------------------------------------------------------------------
# spoof some cores
NUM_CPUS = 5

STARTD.ST1.STARTD_LOG = $(LOG)/Startd_1_Log
STARTD.ST1.STARTD_NAME = st1
STARTD.ST1.ADDRESS_FILE = $(LOG)/.startd_1_address
STARTD_ST1_ARGS = -f -local-name ST1
STARTD_ST1 = $(STARTD)

STARTD.ST2.STARTD_LOG = $(LOG)/Startd_2_Log
STARTD.ST2.STARTD_NAME = st2
STARTD.ST2.ADDRESS_FILE = $(LOG)/.startd_2_address
STARTD_ST2_ARGS = -f -local-name ST2
STARTD_ST2 = $(STARTD)

# master-only procd should work
USE_PROCD = FALSE

DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, SCHEDD, STARTD_ST1, STARTD_ST2

# configure an aggregate resource (p-slot) to consume
SLOT_TYPE_1 = 100%
SLOT_TYPE_1_PARTITIONABLE = True
# declare multiple claims for negotiator to use
# may also use global: NUM_CLAIMS
SLOT_TYPE_1_NUM_CLAIMS = 20
NUM_SLOTS_TYPE_1 = 1

# turn on schedd-side claim splitting to test with a consumption policy
CLAIM_PARTITIONABLE_LEFTOVERS = True

# turn this off to demonstrate that consumption policy will handle this kind of logic
MUST_MODIFY_REQUEST_EXPRS = False

# configure a consumption policy. This policy is modeled on
# current 'modify-request-exprs' defaults:
# "my" is resource ad, "target" is job ad
# startd-wide consumption policy config
# defaults for cpus/memory/disk consumption
STARTD.ST2.CONSUMPTION_POLICY = True
# a consumption policy with a fixed per-match consumption
STARTD.ST2.CONSUMPTION_CPUS = 2
STARTD.ST2.CONSUMPTION_MEMORY = 32
STARTD.ST2.CONSUMPTION_DISK = 128

# keep slot weights enabled for match costing
NEGOTIATOR_USE_SLOT_WEIGHTS = True
# weight used to derive match cost: W(before-consumption) - W(after-consumption)
SlotWeight = Cpus

# for simplicity, turn off preemption, caching, worklife
CLAIM_WORKLIFE = 0
MAXJOBRETIREMENTTIME = 3600
PREEMPT = False
RANK = 0
PREEMPTION_REQUIREMENTS = False
NEGOTIATOR_CONSIDER_PREEMPTION = False
NEGOTIATOR_MATCHLIST_CACHING = False

# verbose logging
ALL_DEBUG = D_FULLDEBUG

NEGOTIATOR_INTERVAL = 300
SCHEDD_INTERVAL = 15
-----------------------------------------------------------------------------

Spin up the pool and verify that there are two p-slots:
-----------------------------------------------------------------------------
$ condor_status
Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

slot1@st1@localhos LINUX      X86_64 Unclaimed Idle     0.500  1897  0+00:00:04
slot1@st2@localhos LINUX      X86_64 Unclaimed Idle     0.460  1897  0+00:00:04
                     Total Owner Claimed Unclaimed Matched Preempting Backfill

        X86_64/LINUX     2     0       0         2       0          0        0
               Total     2     0       0         2       0          0        0
-----------------------------------------------------------------------------

Now submit 10 jobs:
-----------------------------------------------------------------------------
universe = vanilla
cmd = /bin/sleep
args = 300
should_transfer_files = if_needed
when_to_transfer_output = on_exit
queue 10
-----------------------------------------------------------------------------

You should see three match events in the negotiator:
-----------------------------------------------------------------------------
$ grep Matched NegotiatorLog
04/11/13 16:57:36 Matched 1.0 none.user0000@localdomain <10.0.1.3:52463> preempting none <10.0.1.3:50396> slot1@st1@localhost
04/11/13 16:57:36 Matched 1.1 none.user0000@localdomain <10.0.1.3:52463> preempting none <10.0.1.3:54095> slot1@st2@localhost
04/11/13 16:57:36 Matched 1.2 none.user0000@localdomain <10.0.1.3:52463> preempting none <10.0.1.3:54095> slot1@st2@localhost
-----------------------------------------------------------------------------

Notice that only one match can occur against the "traditional" p-slot slot1@st1@localhost, but two matches are allowed against slot1@st2@localhost, since its consumption policy consumes 2 cpus per match and it has 5 cpus. However, the schedd can use leftover-splitting on the traditional p-slot slot1@st1@localhost, so the end result is 7 total jobs running: five on slot1@st1 and two on slot1@st2:
-----------------------------------------------------------------------------
$ cchist condor_q RemoteHost
      5 slot1@st1@localhost
      2 slot1@st2@localhost
      3 undefined
     10 total
-----------------------------------------------------------------------------
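To make the two-match limit on slot1@st2 concrete, here is a small Python sketch (illustrative only) of the match costing noted in the config comment above, W(before-consumption) - W(after-consumption), assuming a match is only handed out while the remaining cpus cover the per-match consumption:
--------------------------------------------------------------
# Illustrative only: slot1@st2 advertises 5 cpus, SlotWeight = Cpus, and its
# consumption policy takes 2 cpus per match.
remaining, per_match = 5, 2
matches = 0
while remaining >= per_match:                    # assumed admission condition
    cost = remaining - (remaining - per_match)   # W(before) - W(after) == 2
    remaining -= per_match
    matches += 1
    print(f"match {matches}: cost {cost}, {remaining} cpus left")
# Two matches fit; the leftover 1 cpu cannot cover a third, which is why the
# negotiator hands out only two matches against slot1@st2 in the cycle.
--------------------------------------------------------------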
Regarding https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2826

Consumption policies (and negotiator multi-matching) are designed to properly respect both concurrency limits and accounting group quotas. In general, p-slots should work *better* with accounting groups, because each match is charged only for what it uses at the time of the match. So, for example, this bug is also fixed: https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=3013
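A quick illustration of that charging difference (Python, hypothetical numbers): previously a match against a p-slot charged the submitter the slot's whole weight, whereas with per-match costing the charge reflects only what the match consumed:
--------------------------------------------------------------
# Hypothetical: a 16-cpu p-slot with SLOT_WEIGHT = Cpus, and a match whose
# consumption works out to 1 cpu.
weight_before = 16
weight_after = weight_before - 1

old_charge = weight_before                 # whole p-slot weight charged against the group quota
new_charge = weight_before - weight_after  # W(before) - W(after) = 1

print(old_charge, new_charge)  # 16 vs 1
--------------------------------------------------------------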
Regarding https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2681

This bug is particular to the dedicated scheduler, so the vanilla universe isn't applicable here. It is related to the known issues with CLAIM_PARTITIONABLE_LEFTOVERS (#2790) and concurrency limits: allowing multiple matches in the scheduler causes various accounting difficulties, because you have to make that accounting visible to the scheduler when before it did not have to be aware of it. Consumption policies avoid these issues because they leave the resource accounting logic where it already resided.
UPSTREAM-8.1.2-BZ794818-consumption-policies
(In reply to Erik Erlandson from comment #5)
> # startd defaults, can be overridden on a per-slot-type basis
> CONSUMPTION_CPUS = ifthenelse(target.Cpus isnt undefined,
> quantize(target.Cpus, {1}), 1)
> CONSUMPTION_MEMORY = ifthenelse(target.Memory isnt undefined,
> quantize(target.Memory, {1}), 1)
> CONSUMPTION_DISK = ifthenelse(target.Disk isnt undefined,
> quantize(target.Disk, {100}), 100)
> CONSUMPTION_TOKENS = ifthenelse(target.Tokens isnt undefined, target.Tokens,
> 0)

The above aren't very well written; they may work, but for the "wrong" reason. Proper definitions are:

CONSUMPTION_CPUS = quantize(target.RequestCpus, {1})
CONSUMPTION_MEMORY = quantize(target.RequestMemory, {1})
CONSUMPTION_DISK = quantize(target.RequestDisk, {100})
CONSUMPTION_TOKENS = ifthenelse(target.RequestTokens isnt undefined, target.RequestTokens, 0)
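For reference, a rough Python model of the quantize() rounding those corrected definitions rely on (my reading of the single-quantum case; see the ClassAd documentation for the full list-argument semantics): the requested amount is rounded up to the next multiple of the quantum:
--------------------------------------------------------------
import math

# Rough model of ClassAd quantize(value, {q}) for a single quantum q
# (assumption: round the value up to the next multiple of q).
def quantize(value, q):
    return q * math.ceil(value / q)

print(quantize(3, 1))     # RequestCpus = 3    -> consume 3 cpus
print(quantize(1, 25))    # RequestMemory = 1  -> consume 25 (the SLOT_TYPE_2 policy in comment #5)
print(quantize(130, 100)) # RequestDisk = 130  -> consume 200
--------------------------------------------------------------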
The negotiator doesn't print any error even when the consumption policy for some resource is set to a negative number. This leads to very strange behaviour.

For example:

SLOT_TYPE_2 = tokens=0,cpus=5,memory=100
SLOT_TYPE_2_PARTITIONABLE = True
SLOT_TYPE_2_CONSUMPTION_memory = -47
NUM_SLOTS_TYPE_2 = 1

Submit some jobs, then see condor_status:

# condor_status -format "%d" SlotTypeID -format " %d" SlotWeight -format " %d" Cpus -format " %d" Memory -format " %d\n" Tokens
...
2 -487 4 -22930 0
-2 490 1 23030 0
...

The memory numbers are clearly wrong.
The negotiator doesn't match all possible dynamic slots if the job's request is larger than the consumption policy value.

For example:

CONSUMPTION_cpus=1
CONSUMPTION_memory=1
NUM_SLOTS_TYPE_1=1
SLOT_TYPE_1_SLOT_WEIGHT=cpus
SLOT_TYPE_1_PARTITIONABLE=True
SLOT_TYPE_1=tokens=0,cpus=10,memory=100

Submit the following jobs:

should_transfer_files=IF_NEEDED
executable=/bin/sleep
requirements=(FileSystemDomain =!= UNDEFINED && Arch =!= UNDEFINED)
transfer_executable=False
universe=vanilla
request_cpus=3
arguments=6000
when_to_transfer_output=ON_EXIT
notification=never
queue 10

$ condor_status -format "%d" SlotTypeID -format " %d" SlotWeight -format " %d" Cpus -format " %d\n" Memory | sort | uniq -c
      8 -1 1 1 1
      1 1 2 2 92

There are still 2 cpus available and the consumption policy is set to 1, so it should match; but it doesn't match, because the job requires 3 cpus.
(In reply to Lubos Trilety from comment #13)
> Negotiator doesn't match all possible dynamic slots, if the requirements are
> bigger than consumption policy.
>
> There are still 2 cpus available consumption policy is set to 1, so it
> should match, but it doesn't match because job requires 3 cpus.

This is a bug. It's happening because the job ads are being optimized once, on read-in, prior to the actual matchmaking logic. RequestCpus is being replaced with the constant '3' inside the job.Requirements expression, and so the call to cp_override_requested() has no effect on the job side.

It takes a RequestXXX set to some constant value > the consumption value to repro.

The fix is to disable the job.Requirements optimization in an environment where consumption policies are in effect.
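A minimal Python sketch of the failure mode (illustrative only; the cpu clause below is a hypothetical stand-in for the relevant term of a real job Requirements expression): once RequestCpus has been folded in as the constant 3, overriding the requested value with the consumption value no longer changes the outcome of the match test:
--------------------------------------------------------------
# Hypothetical stand-in for the cpu term of a job Requirements expression.
def cpu_requirement(slot_cpus, request_cpus):
    return slot_cpus >= request_cpus

slot_remaining_cpus = 2   # what is left of the p-slot in comment #13
consumption_cpus = 1      # what the consumption policy would actually deduct

# After the read-in optimization RequestCpus is baked in as the constant 3,
# so a later cp_override_requested()-style substitution has no effect:
print(cpu_requirement(slot_remaining_cpus, 3))                 # False -> no match (the bug)

# Without the constant folding, the overridden (consumption) value is used:
print(cpu_requirement(slot_remaining_cpus, consumption_cpus))  # True -> match
--------------------------------------------------------------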
Corrected in condor-7.8.9-0.5

# cat NegotiatorLog
...
09/20/13 11:59:36 WARNING: consumption policy for Consumptionmemory on resource slot2.lab.eng.brq.redhat.com failed to evaluate to a non-negative numeric value
09/20/13 11:59:36 WARNING: Consumption for asset memory on resource slot2.lab.eng.brq.redhat.com was negative: -47
...

(In reply to Lubos Trilety from comment #12)
> Negotiator doesn't print any error even when the consumption policy for some
> resource is set to minus number. It leads to very strange behaviour.
>
> e.g.
> SLOT_TYPE_2 = tokens=0,cpus=5,memory=100
> SLOT_TYPE_2_PARTITIONABLE = True
> SLOT_TYPE_2_CONSUMPTION_memory = -47
> NUM_SLOTS_TYPE_2 = 1
>
> submit some jobs
>
> see condor_status
>
> # condor_status -format "%d" SlotTypeID -format " %d" SlotWeight -format " %d" Cpus -format " %d" Memory -format " %d\n" Tokens
> ...
> 2 -487 4 -22930 0
> -2 490 1 23030 0
> ...
>
> For sure memory numbers are bad one.
It seems it doesn't work properly with CLAIM_PARTITIONABLE_LEFTOVERS = True.

For example, configure two startds:

STARTD.ST1.STARTD_LOG = $(LOG)/Startd_1_Log
STARTD.ST1.STARTD_NAME = st1
STARTD.ST1.ADDRESS_FILE = $(LOG)/.startd_1_address
STARTD_ST1_ARGS = -f -local-name ST1
STARTD_ST1 = $(STARTD)

STARTD.ST2.STARTD_LOG = $(LOG)/Startd_2_Log
STARTD.ST2.STARTD_NAME = st2
STARTD.ST2.ADDRESS_FILE = $(LOG)/.startd_2_address
STARTD_ST2_ARGS = -f -local-name ST2
STARTD_ST2 = $(STARTD)

# master-only procd should work
USE_PROCD = FALSE

DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, SCHEDD, STARTD_ST1, STARTD_ST2

# set number of cpus
NUM_CPUS=40

# Configure policy
STARTD.ST1.CONSUMPTION_POLICY=True
CLAIM_PARTITIONABLE_LEFTOVERS=True

# configure slots
SLOT_TYPE_4_SLOT_WEIGHT=floor(cpus/10)
SLOT_TYPE_4_CONSUMPTION_cpus=TotalSlotcpus
SLOT_TYPE_4_CONSUMPTION_tokens=0
NUM_SLOTS_TYPE_4=1
SLOT_TYPE_4_CONSUMPTION_memory=TotalSlotmemory
SLOT_TYPE_4=tokens=0,cpus=10,memory=100
SLOT_TYPE_4_PARTITIONABLE=True
NUM_SLOTS_TYPE_3=1
SLOT_TYPE_3_CONSUMPTION_tokens=ifthenelse(target.Requesttokens isnt undefined, quantize(target.Requesttokens, {1}), 1)
SLOT_TYPE_3_CONSUMPTION_memory=0
SLOT_TYPE_3_SLOT_WEIGHT=tokens
SLOT_TYPE_3_PARTITIONABLE=True
SLOT_TYPE_3=tokens=3,cpus=10,memory=100
SLOT_TYPE_3_CONSUMPTION_cpus=0
NUM_SLOTS_TYPE_2=1
SLOT_TYPE_2=tokens=0,cpus=10,memory=100
SLOT_TYPE_2_PARTITIONABLE=True
SLOT_TYPE_2_SLOT_WEIGHT=floor(memory/33)
SLOT_TYPE_2_CONSUMPTION_memory=quantize(target.Requestmemory, {33})
NUM_SLOTS_TYPE_1=1
SLOT_TYPE_1_SLOT_WEIGHT=floor(cpus/2)
SLOT_TYPE_1_PARTITIONABLE=True
SLOT_TYPE_1=tokens=0,cpus=10,memory=100

# Default policy
CONSUMPTION_tokens=ifthenelse(target.Requesttokens isnt undefined, target.Requesttokens, 0)
CONSUMPTION_memory=quantize(target.Requestmemory, {15})
CONSUMPTION_cpus=2

# set resources
MACHINE_RESOURCE_tokens=6

# other
SCHEDD_INTERVAL=15
NEGOTIATOR_CONSIDER_PREEMPTION=False
NEGOTIATOR_USE_SLOT_WEIGHTS=True
SLOT_WEIGHT=Cpus
NEGOTIATOR_INTERVAL=30
NEGOTIATOR_MATCHLIST_CACHING=False
MUST_MODIFY_REQUEST_EXPRS=False
PREEMPT=False
RANK=0
MAXJOBRETIREMENTTIME=3600
CLAIM_WORKLIFE=0
PREEMPTION_REQUIREMENTS=False

Submit the following jobs:

universe = vanilla
cmd = /bin/sleep
args = 3000
should_transfer_files = if_needed
when_to_transfer_output = on_exit
request_cpus = 3
request_memory = 12
queue 100

Run condor_status:

# condor_status -format "%d" SlotTypeID -format " %d" Cpus -format " %d" Memory -format " %d\n" Tokens | sort | uniq -c
      1 1 1 64 0
      4 -1 2 15 0
      1 1 2 40 0
      3 -1 3 12 0
      1 2 1 64 0
      3 -2 2 33 0
      3 -2 3 12 0
      1 2 4 1 0
      3 -3 0 0 1
      1 3 10 100 0
      1 3 1 64 3
      3 -3 3 12 0
      1 4 0 0 0
      1 -4 10 100 0
      1 4 1 64 0
      3 -4 3 12 0

There is one -1 dynamic slot missing; there should be:

      5 -1 2 15 0
      1 1 0 25 0

instead of:

      4 -1 2 15 0
      1 1 2 40 0

See the logs:

# cat NegotiatorLog
...
09/20/13 12:14:25 Request 00001.00012:
09/20/13 12:14:25 Matched 1.12 test.lab.eng.brq.redhat.com <10.34.33.139:59430> preempting none <10.34.33.139:45008> slot1@st1.lab.eng.brq.redhat.com
09/20/13 12:14:25 Successfully matched with slot1@st1.lab.eng.brq.redhat.com
...

# cat SchedLog
...
09/20/13 12:14:26 (pid:13469) Starting add_shadow_birthdate(1.12)
09/20/13 12:14:26 (pid:13469) Started shadow for job 1.12 on slot1@st2.lab.eng.brq.redhat.com <10.34.33.139:56178> for test, (shadow pid = 13585)
...

It seems the scheduler takes that job and runs it on st2 instead of st1. This doesn't always happen, but it does in most cases. Sometimes the issue is on a slot other than slot1.
Fixed on condor-7.8.9-0.5

(In reply to Erik Erlandson from comment #14)
> (In reply to Lubos Trilety from comment #13)
> > Negotiator doesn't match all possible dynamic slots, if the requirements are
> > bigger than consumption policy.
> >
> > There are still 2 cpus available consumption policy is set to 1, so it
> > should match, but it doesn't match because job requires 3 cpus.
>
> This is a bug. It's happening because the job ads are being optimized
> once, on read-in, prior to the actual matchmaking logic. RequestCpus is
> being replaced with the constant '3' inside the job.Requirements expression,
> and so the call to cp_override_requested() has no effect on the job side.
>
> It takes a RequestXXX set to some constant value > the consumption value to
> repro.
>
> Fix is to disable the job.Requirements optimization in an environment where
> consumption policies are in effect.
With a consumption policy active, the p-slot state changed to Matched, and it remains Matched even after all jobs were removed.

Settings:

# set number of cpus
NUM_CPUS=10

# Configure policy
#CONSUMPTION_POLICY=True

# configure slots
NUM_SLOTS_TYPE_1=1
SLOT_TYPE_1_SLOT_WEIGHT=floor(cpus/2)
SLOT_TYPE_1_PARTITIONABLE=True
SLOT_TYPE_1=cpus=10,memory=100

# Default policy
CONSUMPTION_memory=quantize(target.Requestmemory, {15})
CONSUMPTION_cpus=2

# other
SCHEDD_INTERVAL=15
NEGOTIATOR_CONSIDER_PREEMPTION=False
NEGOTIATOR_USE_SLOT_WEIGHTS=True
SLOT_WEIGHT=Cpus
NEGOTIATOR_INTERVAL=30
NEGOTIATOR_MATCHLIST_CACHING=False
MUST_MODIFY_REQUEST_EXPRS=False
PREEMPT=False
RANK=0
MAXJOBRETIREMENTTIME=3600
CLAIM_WORKLIFE=0
PREEMPTION_REQUIREMENTS=False

Submit the following job:

universe = vanilla
cmd = /bin/sleep
args = 3000
requirements=(FileSystemDomain =!= UNDEFINED && Arch =!= UNDEFINED)
transfer_executable=False
should_transfer_files = if_needed
when_to_transfer_output = on_exit
queue 10

See condor_status:

# condor_status
Name          OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

slot1@host    LINUX      X86_64 Matched   Idle     0.240   25  0+00:00:04
slot1_1@host  LINUX      X86_64 Claimed   Busy     0.000   15  0+00:00:04
slot1_2@host  LINUX      X86_64 Claimed   Busy     0.000   15  0+00:00:04
slot1_3@host  LINUX      X86_64 Claimed   Busy     0.000   15  0+00:00:04
slot1_4@host  LINUX      X86_64 Claimed   Busy     0.000   15  0+00:00:04
slot1_5@host  LINUX      X86_64 Claimed   Busy     0.000   15  0+00:00:04

              Machines Owner Claimed Unclaimed Matched Preempting

 X86_64/LINUX        6     0       5         0       1          0
        Total        6     0       5         0       1          0

Remove all jobs:

# condor_rm -all
All jobs marked for removal.

See condor_status:

# condor_status
Name          OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

slot1@host    LINUX      X86_64 Matched   Idle     0.050  100  0+00:01:34

              Machines Owner Claimed Unclaimed Matched Preempting

 X86_64/LINUX        1     0       0         0       1          0
        Total        1     0       0         0       1          0

No other jobs can be run. Submit the previous job again and see the results:

# condor_q
-- Submitter: host : <IP:48016> : host
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
   2.0   test            9/24 13:05   0+00:00:00 I  0   0.0  sleep 3000
...
10 jobs; 0 completed, 0 removed, 10 idle, 0 running, 0 held, 0 suspended

# cat MatchLog
...
09/24/13 13:05:24 ---------- Started Negotiation Cycle ----------
09/24/13 13:05:24 Phase 1:  Obtaining ads from collector ...
09/24/13 13:05:24   Getting Scheduler, Submitter and Machine ads ...
09/24/13 13:05:24   Sorting 3 ads ...
09/24/13 13:05:24   Getting startd private ads ...
09/24/13 13:05:24 Got ads: 3 public and 1 private
09/24/13 13:05:24 Public ads include 1 submitter, 1 startd
09/24/13 13:05:24 Phase 2:  Performing accounting ...
09/24/13 13:05:24 Phase 3:  Sorting submitter ads by priority ...
09/24/13 13:05:24 Phase 4.1:  Negotiating with schedds ...
09/24/13 13:05:24   Negotiating with test@host at <IP:48016>
09/24/13 13:05:24 0 seconds so far
09/24/13 13:05:24     Request 00002.00000:
09/24/13 13:05:24     Rejected 2.0 test@host <IP:48016>: no match found
09/24/13 13:05:24     Got NO_MORE_JOBS; done negotiating
09/24/13 13:05:24  negotiateWithGroup resources used scheddAds length 0
09/24/13 13:05:24 ---------- Finished Negotiation Cycle ----------
...

Without the consumption policy the state is Unclaimed rather than Matched, and after removing the jobs and submitting new ones, they run correctly.
Tested with: condor-7.8.9-0.5

Tested on:
RHEL5 i386, x86_64
RHEL6 i386, x86_64
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2013-1294.html