| Summary: | MonitorSelfAge always 0 on Linux: Up-time value for schedulers on Overview->Performance is 0 | ||
|---|---|---|---|
| Product: | Red Hat Enterprise MRG | Reporter: | Trevor McKay <tmckay> |
| Component: | condor | Assignee: | Matthew Farrellee <matt> |
| Status: | CLOSED ERRATA | QA Contact: | Lubos Trilety <ltrilety> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 1.0 | CC: | croberts, eallen, iboverma, ltoscano, ltrilety, matt |
| Target Milestone: | 2.0 | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | condor-7.6.1-0.4 | Doc Type: | Bug Fix |
| Doc Text: | Cause: Internal uptime stat reset to 0 before being reported. Consequence: MonitorSelfAge is always 0. Fix: Internal uptime stat no longer reset to 0. Result: MonitorSelfAge now reflects uptime. | Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2011-06-23 15:39:32 UTC | Type: | --- |
| Bug Blocks: | 693778 | ||
|
Description
Trevor McKay
2011-04-26 02:41:04 UTC
Since its introduction on 22 Dec 2005, MonitorSelfAge on Linux has been 0.
ProcAPI::getProcInfoRaw in src/condor_procapi/procapi.cpp reads /proc/<pid>/stat in a retry loop on Linux, to absorb read errors. Unfortunately, sample_time (used to calculate age) is set once before the loop and then reset to 0 by initProcInfoRaw() on every pass through the loop, including the first, and it is never set again.
The fix is to move the assignment of sample_time into the loop, after procRaw is initialized (zeroed).
diff --git a/src/condor_procapi/procapi.cpp b/src/condor_procapi/procapi.cpp
index fcb28fa..bee8e36 100644
--- a/src/condor_procapi/procapi.cpp
+++ b/src/condor_procapi/procapi.cpp
@@ -492,12 +492,6 @@ ProcAPI::getProcInfoRaw( pid_t pid, procInfoRaw& procRaw, int &status )
// assume success
status = PROCAPI_OK;
- // clear the memory of procRaw
- initProcInfoRaw(procRaw);
-
- // set the sample time
- procRaw.sample_time = secsSinceEpoch();
-
// read the entry a certain number of times since it appears that linux
// often simply does something stupid while reading.
sprintf( path, "/proc/%d/stat", pid );
@@ -508,7 +502,10 @@ ProcAPI::getProcInfoRaw( pid_t pid, procInfoRaw& procRaw, int &status )
// in case I must restart, assume that everything is ok again...
status = PROCAPI_OK;
+ // clear the memory of procRaw
initProcInfoRaw(procRaw);
+ // set the sample time
+ procRaw.sample_time = secsSinceEpoch();
if( (fp = safe_fopen_wrapper(path, "r")) == NULL ) {
if( errno == ENOENT ) {
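The ordering problem in the patch above can be reproduced in isolation. The following is only a minimal sketch, not Condor code: RawSample, initRawSample, sampleTimeBeforeFix, sampleTimeAfterFix, and the failedReads parameter are hypothetical stand-ins for procInfoRaw, initProcInfoRaw, and the retry loop, and the /proc read is simulated.

```cpp
// Hypothetical stand-in for Condor's procInfoRaw; only the relevant field is kept.
struct RawSample {
    long sample_time;  // seconds since the epoch at which the sample was taken
};

// Stand-in for initProcInfoRaw(): zeroes every field of the struct.
void initRawSample(RawSample& raw) {
    raw = RawSample{};
}

// Pre-fix ordering: the timestamp is set before the retry loop,
// then wiped by the initialization inside the loop.
long sampleTimeBeforeFix(long now, int failedReads) {
    RawSample raw;
    initRawSample(raw);
    raw.sample_time = now;
    for (int attempt = 0; ; ++attempt) {
        initRawSample(raw);                 // zeroes sample_time, even on the first pass
        if (attempt >= failedReads) break;  // simulated successful read
    }
    return raw.sample_time;
}

// Post-fix ordering: the timestamp is set inside the loop,
// immediately after the struct is zeroed.
long sampleTimeAfterFix(long now, int failedReads) {
    RawSample raw;
    for (int attempt = 0; ; ++attempt) {
        initRawSample(raw);
        raw.sample_time = now;              // survives until the function returns
        if (attempt >= failedReads) break;  // simulated successful read
    }
    return raw.sample_time;
}
```

With these stand-ins, sampleTimeBeforeFix(1000, 0) returns 0, while sampleTimeAfterFix(1000, 0) returns 1000, regardless of how many simulated retries occur.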
Resolved upstream on V7_6-branch, 520a7f9c
Note: Detect programmatically with condor_status -long -master/-schedd/-any
Tested on:
$CondorVersion: 7.6.1 Apr 27 2011 BuildID: RH-7.6.1-0.4.el5 $
$CondorPlatform: I686-RedHat_5.6 $
$CondorVersion: 7.6.1 Apr 27 2011 BuildID: RH-7.6.1-0.4.el5 $
$CondorPlatform: X86_64-RedHat_5.6 $
$CondorVersion: 7.6.1 Apr 27 2011 BuildID: RH-7.6.1-0.4.el6 $
$CondorPlatform: I686-RedHat_6.0 $
$CondorVersion: 7.6.1 Apr 27 2011 BuildID: RH-7.6.1-0.4.el6 $
$CondorPlatform: X86_64-RedHat_6.0 $
# condor_status -long -master/-schedd/-any | grep MonitorSelfAge
MonitorSelfAge = 1684
available also using qpid-tool and cumin under slot(s)
>>> VERIFIED
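The grep check above could also be done in code. This is only a sketch under stated assumptions: findMonitorSelfAge is a hypothetical helper, and it assumes the text it receives is the captured output of condor_status -long, with one "Attribute = value" pair per line as in the sample output shown.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: scans "Attribute = value" lines for MonitorSelfAge.
// Returns true and stores the value in `age` if an integer value is found.
bool findMonitorSelfAge(const std::string& classAdText, long& age) {
    const std::string key = "MonitorSelfAge";
    std::istringstream in(classAdText);
    std::string line;
    while (std::getline(in, line)) {
        // The line must start with the attribute name.
        if (line.compare(0, key.size(), key) != 0) continue;
        const std::size_t eq = line.find('=', key.size());
        if (eq == std::string::npos) continue;
        // Only whitespace may sit between the name and '=',
        // so longer attribute names with the same prefix are skipped.
        if (line.find_first_not_of(" \t", key.size()) != eq) continue;
        try {
            age = std::stol(line.substr(eq + 1));
            return true;
        } catch (const std::exception&) {
            continue;  // value was not an integer; keep scanning
        }
    }
    return false;
}
```

A caller would treat a missing attribute or a value of 0 on a long-running daemon as a sign that the fix is not in place.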
(In reply to comment #7)
> available also using qpid-tool and cumin under slot(s)
under Overview->Performance
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHEA-2011-0889.html