Description of problem: Why does this value seem to always be 0? Also 0 under the Schedulers tab. Is this coming from the schema, or from cumin?
Since its introduction on 22 Dec 2005, MonitorSelfAge on Linux has been 0.

On Linux, ProcAPI::getProcInfoRaw in src/condor_procapi/procapi.cpp reads /proc in a retry loop to absorb transient errors. Unfortunately, sample_time (used to calculate age) is set once before the loop and then zeroed by initProcInfoRaw on every pass through the loop, including the first, and it is never set again.

The fix is to move the assignment of sample_time into the loop, after procRaw is initialized (zeroed).

diff --git a/src/condor_procapi/procapi.cpp b/src/condor_procapi/procapi.cpp
index fcb28fa..bee8e36 100644
--- a/src/condor_procapi/procapi.cpp
+++ b/src/condor_procapi/procapi.cpp
@@ -492,12 +492,6 @@ ProcAPI::getProcInfoRaw( pid_t pid, procInfoRaw& procRaw, int &status )
 	// assume success
 	status = PROCAPI_OK;
 
-	// clear the memory of procRaw
-	initProcInfoRaw(procRaw);
-
-	// set the sample time
-	procRaw.sample_time = secsSinceEpoch();
-
 	// read the entry a certain number of times since it appears that linux
 	// often simply does something stupid while reading.
 	sprintf( path, "/proc/%d/stat", pid );
@@ -508,7 +502,10 @@ ProcAPI::getProcInfoRaw( pid_t pid, procInfoRaw& procRaw, int &status )
 		// in case I must restart, assume that everything is ok again...
 		status = PROCAPI_OK;
 
+		// clear the memory of procRaw
 		initProcInfoRaw(procRaw);
+		// set the sample time
+		procRaw.sample_time = secsSinceEpoch();
 
 		if( (fp = safe_fopen_wrapper(path, "r")) == NULL ) {
 			if( errno == ENOENT ) {
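The ordering problem can be shown in isolation. This is a minimal sketch, not the Condor source: Sample, resetSample, readBuggy and readFixed are hypothetical stand-ins for procInfoRaw, initProcInfoRaw and the retry loop before and after the patch, and the /proc read is replaced by a stub that always succeeds.

```cpp
#include <cassert>
#include <ctime>

// Hypothetical stand-in for procInfoRaw: only the fields needed here.
struct Sample {
    long sample_time;
    long data;
};

// Stand-in for initProcInfoRaw(): zeroes every field, including sample_time.
void resetSample(Sample& s) {
    s.sample_time = 0;
    s.data = 0;
}

// Pre-patch ordering: the timestamp is set once before the retry loop,
// then wiped by the reset at the top of the first iteration.
Sample readBuggy(int maxAttempts) {
    Sample s;
    resetSample(s);
    s.sample_time = static_cast<long>(std::time(nullptr));
    for (int attempt = 0; attempt < maxAttempts; ++attempt) {
        resetSample(s);  // clobbers sample_time
        s.data = 42;     // stub: pretend the read succeeded
        break;
    }
    return s;
}

// Post-patch ordering: the timestamp is set inside the loop, after the reset.
Sample readFixed(int maxAttempts) {
    Sample s;
    resetSample(s);
    for (int attempt = 0; attempt < maxAttempts; ++attempt) {
        resetSample(s);
        s.sample_time = static_cast<long>(std::time(nullptr));
        s.data = 42;     // stub: pretend the read succeeded
        break;
    }
    return s;
}
```

With at least one attempt, readBuggy returns a sample_time of 0 even though the read succeeded, while readFixed returns the current time.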
Resolved upstream on V7_6-branch, 520a7f9c
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
C: Internal uptime stat reset to 0 before being reported
C: MonitorSelfAge is always 0
F: Internal uptime stat no longer reset to 0
R: MonitorSelfAge now reflects uptime
Note: Detect programmatically with condor_status -long -master/-schedd/-any (pick one of the three daemon options).
Tested on:
$CondorVersion: 7.6.1 Apr 27 2011 BuildID: RH-7.6.1-0.4.el5 $
$CondorPlatform: I686-RedHat_5.6 $

$CondorVersion: 7.6.1 Apr 27 2011 BuildID: RH-7.6.1-0.4.el5 $
$CondorPlatform: X86_64-RedHat_5.6 $

$CondorVersion: 7.6.1 Apr 27 2011 BuildID: RH-7.6.1-0.4.el6 $
$CondorPlatform: I686-RedHat_6.0 $

$CondorVersion: 7.6.1 Apr 27 2011 BuildID: RH-7.6.1-0.4.el6 $
$CondorPlatform: X86_64-RedHat_6.0 $

# condor_status -long -master/-schedd/-any | grep MonitorSelfAge
MonitorSelfAge = 1684

available also using qpid-tool and cumin under slot(s)

>>> VERIFIED
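For an automated check, the same output can be scanned for zero values. This is only a sketch under assumptions: it expects one "Attribute = value" pair per line, as in the MonitorSelfAge line above, and the helper names monitorSelfAges and agesLookHealthy are hypothetical. A daemon that has only just started may legitimately report a very small age, so a zero is a hint rather than proof of the bug.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Collects every MonitorSelfAge value found in the input text,
// assuming one "Attribute = value" pair per line.
std::vector<long> monitorSelfAges(std::istream& in) {
    const std::string key = "MonitorSelfAge";
    std::vector<long> ages;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string name, equals;
        long value = 0;
        // Accept only lines shaped like: MonitorSelfAge = <integer>
        if (fields >> name >> equals >> value && name == key && equals == "=") {
            ages.push_back(value);
        }
    }
    return ages;
}

// True when at least one value was found and none of them is zero.
bool agesLookHealthy(const std::vector<long>& ages) {
    if (ages.empty()) return false;
    for (long age : ages) {
        if (age == 0) return false;
    }
    return true;
}
```

A small driver could pass std::cin to monitorSelfAges and receive the piped output of condor_status -long with one of the daemon options.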
(In reply to comment #7)
> available also using qpid-tool and cumin under slot(s)

under Overview->Performance
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2011-0889.html