| Summary: | RFE: Make Schedd send updates on job remove | ||
|---|---|---|---|
| Product: | Red Hat Enterprise MRG | Reporter: | Jan Sarenik <jsarenik> |
| Component: | condor | Assignee: | Matthew Farrellee <matt> |
| Status: | CLOSED NOTABUG | QA Contact: | MRG Quality Engineering <mrgqe-bugs> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 1.3 | CC: | eerlands, iboverma, ltoscano, matt, tmckay |
| Target Milestone: | 2.0 | Keywords: | FutureFeature |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Enhancement | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2011-02-24 12:44:13 UTC | Type: | --- |
| Bug Depends On: | 634302 | ||
| Bug Blocks: | |||
Description
Jan Sarenik
2011-01-27 16:44:14 UTC
I would expect condor_status -submitter/-schedd to exhibit the same behavior.

This is indeed visible from condor_status -schedd/-submitter as well. The Schedd publishes every SCHEDD_INTERVAL seconds, at the end of a negotiation cycle, on a reconfig, or on a reschedule request. Until the next publish, the information in the Collector may be stale, as may the information in the QMF object space.

It is probably OK to tickle the Schedd to publish an update on remove, but doing so may have scale implications, because the publishing is done as part of a scan of the entire queue. However, the timeout() code has some protections to prevent processing the queue too frequently.

Let's turn this into an RFE for tickling the collector update.

The Schedd also does not send an update when a job completes, so the number of running jobs may be stale after a job exits. Nor does it send an update when a job starts running or when a job is held.

There are many paths to a job changing state that do not result in an update to the Collector. Another one not listed above is periodic expression evaluation.

Even though timeout() protects itself from rapid repeated calls, on an active Schedd the calls will effectively make SCHEDD_INTERVAL = SCHEDD_MIN_INTERVAL.

Instead of tickling timeout() for each such transition, I suggest setting SCHEDD_INTERVAL to a lower value, one that provides an acceptable lag for a deployment.

Wild speculation: SCHEDD_INTERVAL for small or medium-sized deployments could easily be set to 30 (down from 300). For large deployments, a shorter publish interval may impact Schedd throughput.
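
To illustrate why tickling a publish on every job transition collapses the effective interval to the minimum one, here is a rough Python simulation. It is not Schedd code: `publish_times` is a hypothetical helper, the value 5 for the minimum spacing is only an illustrative stand-in for SCHEDD_MIN_INTERVAL, and it assumes a tickle that arrives too early is deferred rather than dropped.

```python
def publish_times(events, interval, min_interval, horizon):
    """Return the times (seconds) at which a simulated Schedd would publish.

    events       -- times of job state transitions that tickle a publish
    interval     -- periodic publish interval (stand-in for SCHEDD_INTERVAL)
    min_interval -- minimum spacing between publishes
                    (stand-in for SCHEDD_MIN_INTERVAL; assumed behavior)
    horizon      -- stop simulating after this time
    """
    pending = sorted(events)
    published = []
    last = 0  # treat t=0 as the most recent publish
    i = 0
    while True:
        # Transitions at or before the last publish were already reported.
        while i < len(pending) and pending[i] <= last:
            i += 1
        periodic_due = last + interval
        if i < len(pending):
            # Assumption: an early tickle is deferred until min_interval passes.
            tickle_due = max(pending[i], last + min_interval)
        else:
            tickle_due = float("inf")
        nxt = min(periodic_due, tickle_due)
        if nxt > horizon:
            break
        published.append(nxt)
        last = nxt
    return published


# Idle Schedd: only the periodic publish fires.
idle = publish_times([], interval=300, min_interval=5, horizon=600)
print(idle)  # [300, 600]

# Active Schedd: one transition per second for ten minutes.
busy = publish_times(range(1, 601), interval=300, min_interval=5, horizon=600)
print({b - a for a, b in zip(busy, busy[1:])})  # {5}
```

With a steady stream of transitions, every gap between publishes equals the minimum spacing, which is the behavior the comment above warns about. Lowering the periodic interval instead gives a predictable publish rate that does not depend on how busy the queue is.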