| Summary: | Clarify Negotiator statistics cycle period definition | ||
|---|---|---|---|
| Product: | Red Hat Enterprise MRG | Reporter: | Lubos Trilety <ltrilety> |
| Component: | condor | Assignee: | Erik Erlandson <eerlands> |
| Status: | CLOSED DUPLICATE | QA Contact: | MRG Quality Engineering <mrgqe-bugs> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | Development | CC: | iboverma, matt, mkudlej |
| Target Milestone: | 2.1 | Keywords: | Reopened |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2011-09-20 16:23:44 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
|
Description
Lubos Trilety
2011-04-27 15:11:17 UTC
(In reply to comment #0)
> 1) LastNegotiationCyclePeriod2 is always equal to 0
> # condor_status -subsystem negotiator -l | grep -i NegotiationCyclePeriod
> LastNegotiationCyclePeriod0 = 22
> LastNegotiationCyclePeriod1 = 23
> LastNegotiationCyclePeriod2 = 0
This looks like an artifact of NEGOTIATOR_UPDATE_INTERVAL: https://bugzilla.redhat.com/show_bug.cgi?id=673538
> 2) LastNegotiationCycleNumIdleJobs is not working correctly.
>
> Additional info:
> The LastNegotiationCycleNumIdleJobs was changed after some time, but not every
> negotiator cycle.
This is almost certainly due to latency between schedd and propagation to submitter ads on collector, but I will take a look at it.

Looking at the negotiator log, it seems that both (1) and (2) are happening because jobs are being scheduled without the negotiator loop. Some kind of autoclustering/claim re-use trickery?

There are two things that could cause the described behavior. The first is that NEGOTIATOR_UPDATE_INTERVAL could be too high (Bug 673538). This can be alleviated by setting it to a shorter interval.
The second is that the schedd is reusing claims, and the negotiator loop is being by-passed for these jobs, after the first initial scheduling cycle (which creates the claims). Those statistics are intended to measure activity inside the negotiation loop, so in this case their being zero is acceptable. This behavior can be turned off by setting CLAIM_WORKLIFE=0

(In reply to comment #4)
> There are two things that could cause the described behavior. The first is
> that NEGOTIATOR_UPDATE_INTERVAL could be too high (Bug 673538). This can be
> alleviated by setting it to a shorter interval.
>
It was found with NEGOTIATOR_UPDATE_AFTER_CYCLE set to True, so it should not be a problem. Anyway retested again with NEGOTIATOR_UPDATE_INTERVAL = 5. The results are still the same.
# condor_status -subsystem negotiator -l | grep -i NegotiationCyclePeriod
LastNegotiationCyclePeriod0 = 64
LastNegotiationCyclePeriod1 = 25
LastNegotiationCyclePeriod2 = 0
change after one minute:
# condor_status -subsystem negotiator -l | grep -i NegotiationCyclePeriod
LastNegotiationCyclePeriod0 = 63
LastNegotiationCyclePeriod1 = 64
LastNegotiationCyclePeriod2 = 0
> The second is that the schedd is reusing claims, and the negotiator loop is
> being by-passed for these jobs, after the first initial scheduling cycle (which
> creates the claims). Those statistics are intended to measure activity inside
> the negotiation loop, so in this case their being zero is acceptable. This
> behavior can be turned off by setting CLAIM_WORKLIFE=0
If you look at description of this bug, there is already set CLAIM_WORKLIFE to zero in example.
>>> ASSIGNED

(In reply to comment #6)
> It was found with NEGOTIATOR_UPDATE_AFTER_CYCLE set to True, so it should not
> be a problem. Anyway retested again with NEGOTIATOR_UPDATE_INTERVAL = 5. The
> results are still the same.
>
> # condor_status -subsystem negotiator -l | grep -i NegotiationCyclePeriod
> LastNegotiationCyclePeriod0 = 64
> LastNegotiationCyclePeriod1 = 25
> LastNegotiationCyclePeriod2 = 0
The above behavior from LastNegotiationCyclePeriod2 is because "period" is calculated using difference between end of current cycle and the previous cycle. For the last cycle-stat, there is no previous cycle, so it is defaulted to zero. For example, if you set "NEGOTIATION_CYCLE_STATS_LENGTH = 10" then you would see "LastNegotiationCyclePeriod9 = 0"
relevant code is this:
    int period = 0;
    if (((1+i) < num_negotiation_cycle_stats) && (negotiation_cycle_stats[1+i] != NULL))
        period = s->end_time - negotiation_cycle_stats[1+i]->end_time;
Regarding LastNegotiationCycleNumIdleJobs<N>, I'm seeing correct behavior when I attempt the repro. The negotiator's measurement of idle jobs lags the schedd:
$ condor_q | tail -1; condor_status -l -neg | grep LastNegotiationCycleNumIdleJobs
406 jobs; 406 idle, 0 running, 0 held
LastNegotiationCycleNumIdleJobs0 = 408
LastNegotiationCycleNumIdleJobs1 = 410
LastNegotiationCycleNumIdleJobs2 = 412
$ condor_q | tail -1; condor_status -l -neg | grep LastNegotiationCycleNumIdleJobs
404 jobs; 404 idle, 0 running, 0 held
LastNegotiationCycleNumIdleJobs0 = 406
LastNegotiationCycleNumIdleJobs1 = 408
LastNegotiationCycleNumIdleJobs2 = 410

(In reply to comment #7)
> The above behavior from LastNegotiationCyclePeriod2 is because "period" is
> calculated using difference between end of current cycle and the previous
> cycle. For the last cycle-stat, there is no previous cycle, so it is defaulted
> to zero. For example, if you set "NEGOTIATION_CYCLE_STATS_LENGTH = 10" then
> you would see "LastNegotiationCyclePeriod9 = 0"
>
> relevant code is this:
> int period = 0;
> if (((1+i) < num_negotiation_cycle_stats) && (negotiation_cycle_stats[1+i] !=
> NULL))
> period = s->end_time - negotiation_cycle_stats[1+i]->end_time;
I don't think it's correct behaviour, I believe that from customer point of view all of these values should indicate time between negotiator cycles. For example if customer choose "NEGOTIATION_CYCLE_STATS_LENGTH = 1", he does not expect that "LastNegotiationCyclePeriod0 = 0".
>
> Regarding LastNegotiationCycleNumIdleJobs<N>, I'm seeing correct behavior when
> I attempt the repro. The negotiator's measurement of idle jobs lags the
> schedd:
>
> $ condor_q | tail -1; condor_status -l -neg | grep
> LastNegotiationCycleNumIdleJobs
> 406 jobs; 406 idle, 0 running, 0 held
> LastNegotiationCycleNumIdleJobs0 = 408
> LastNegotiationCycleNumIdleJobs1 = 410
> LastNegotiationCycleNumIdleJobs2 = 412
>
> $ condor_q | tail -1; condor_status -l -neg | grep
> LastNegotiationCycleNumIdleJobs
> 404 jobs; 404 idle, 0 running, 0 held
> LastNegotiationCycleNumIdleJobs0 = 406
> LastNegotiationCycleNumIdleJobs1 = 408
> LastNegotiationCycleNumIdleJobs2 = 410
My reproduction:
# condor_q | tail -1; condor_status -l -neg | grep LastNegotiationCycleNumIdleJobs
941 jobs; 940 idle, 1 running, 0 held
LastNegotiationCycleNumIdleJobs0 = 1000
LastNegotiationCycleNumIdleJobs1 = 1000
LastNegotiationCycleNumIdleJobs2 = 1000

The slower update of LastNegotiationCycleNumIdleJobs was due to SCHEDD_INTERVAL taking its default value of 5 minutes. Setting it to something smaller, such as 15 sec, keeps it in sync. Generally, SCHEDD_INTERVAL should be set to same value of NEGOTIATOR_INTERVAL for optimal behavior.

QE recommends moving the remaining "cycle period" fix to 2.1, and I agree.

Closing this and moving to doc ticket https://bugzilla.redhat.com/show_bug.cgi?id=739999
*** This bug has been marked as a duplicate of bug 739999 *** |
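
For readers following the "cycle period" discussion, here is a minimal standalone sketch of the calculation described in the thread: each slot's period is the difference between its end time and the end time of the next-older slot, and the oldest slot defaults to zero. This is an illustration only, not the negotiator source. `CycleStat` and `cycle_periods` are hypothetical names, and only the `end_time` field and the bounds/NULL check come from the snippet quoted above.

```cpp
#include <cstddef>
#include <ctime>
#include <vector>

// Hypothetical stand-in for one cycle-statistics record.
// Only end_time appears in the snippet quoted in this bug.
struct CycleStat {
    std::time_t end_time;
};

// Returns one period per slot, where slot 0 is the most recent cycle.
// A null slot, or a slot with no older neighbour, keeps the default of 0.
std::vector<int> cycle_periods(const std::vector<const CycleStat*>& stats) {
    std::vector<int> periods(stats.size(), 0);
    for (std::size_t i = 0; i < stats.size(); ++i) {
        const CycleStat* s = stats[i];
        if (s == nullptr) {
            continue;  // nothing recorded in this slot
        }
        // Same guard as the quoted code: an older slot must exist and be set.
        if (i + 1 < stats.size() && stats[i + 1] != nullptr) {
            periods[i] = static_cast<int>(s->end_time - stats[i + 1]->end_time);
        }
    }
    return periods;
}
```

With three slots whose made-up end times are 1089, 1025 and 1000, this yields periods 64, 25 and 0, the same shape as the output reported above. With a single slot the only period is 0, which is the NEGOTIATION_CYCLE_STATS_LENGTH = 1 case raised in the reply.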