699571 – MonitorSelfAge always 0 on Linux: Up-time value for schedulers on Overview->Performance is 0

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise MRG
Classification:	Red Hat
Component:	condor
Sub Component:
Version:	1.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	2.0
Target Release:	---
Assignee:	Matthew Farrellee
QA Contact:	Lubos Trilety
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	693778

Reported:	2011-04-26 02:41 UTC by Trevor McKay
Modified:	2011-06-23 15:39 UTC
CC List:	6 users
Fixed In Version:	condor-7.6.1-0.4
Doc Type:	Bug Fix
Doc Text:	C: Internal uptime stat reset to 0 before being reported C: MonitorSelfAge is always 0 F: Internal uptime stat no longer reset to 0 R: MonitorSelfAge now reflects uptime
Clone Of:
Environment:
Last Closed:	2011-06-23 15:39:32 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHEA-2011:0889	0	normal	SHIPPED_LIVE	Red Hat Enterprise MRG Grid 2.0 Release	2011-06-23 15:35:53 UTC

Description Trevor McKay 2011-04-26 02:41:04 UTC

Description of problem:

Why does this value seem to always be 0?  Also 0 under the Schedulers tab.

Is this coming from the schema, or from cumin?

Comment 2 Matthew Farrellee 2011-04-26 13:02:40 UTC

Since its introduction on 22 Dec 2005, MonitorSelfAge on Linux has been 0.

src/condor_procapi/procapi.cpp's ProcAPI::getProcInfoRaw on Linux reads /proc/<pid>/stat in a retry loop, to absorb read errors. Unfortunately, sample_time (used to calculate age) is reset to 0 each time around the loop, including the first, and is never set again afterwards.

The fix is to move the setting of sample_time into the loop, after procRaw is initialized (zeroed).

diff --git a/src/condor_procapi/procapi.cpp b/src/condor_procapi/procapi.cpp
index fcb28fa..bee8e36 100644
--- a/src/condor_procapi/procapi.cpp
+++ b/src/condor_procapi/procapi.cpp
@@ -492,12 +492,6 @@ ProcAPI::getProcInfoRaw( pid_t pid, procInfoRaw& procRaw, int &status )
                // assume success
        status = PROCAPI_OK;
 
-               // clear the memory of procRaw
-       initProcInfoRaw(procRaw);
-
-               // set the sample time
-       procRaw.sample_time = secsSinceEpoch();
-
        // read the entry a certain number of times since it appears that linux
        // often simply does something stupid while reading.
        sprintf( path, "/proc/%d/stat", pid );
@@ -508,7 +502,10 @@ ProcAPI::getProcInfoRaw( pid_t pid, procInfoRaw& procRaw, int &status )
 
                // in case I must restart, assume that everything is ok again...
                status = PROCAPI_OK;
+               // clear the memory of procRaw
                initProcInfoRaw(procRaw);
+               // set the sample time
+               procRaw.sample_time = secsSinceEpoch();
 
                if( (fp = safe_fopen_wrapper(path, "r")) == NULL ) {
                        if( errno == ENOENT ) {
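
For readers who want the ordering problem in isolation, here is a minimal sketch of the pattern, not the Condor code itself. RawSample, initRawSample, sampleBuggy, sampleFixed and the failedReads parameter are hypothetical stand-ins introduced for illustration; the real function reads /proc/<pid>/stat where the sketch only simulates failed reads.

```cpp
#include <cassert>

// Hypothetical stand-in for procInfoRaw; only the field that matters here.
struct RawSample {
    long sample_time;
};

// Plays the role of initProcInfoRaw: zero every field.
void initRawSample(RawSample& raw) {
    raw.sample_time = 0;
}

// Pre-fix ordering: the timestamp is taken before the retry loop, so the
// re-initialization at the top of the loop wipes it on the very first pass.
void sampleBuggy(RawSample& raw, long now, int failedReads) {
    initRawSample(raw);
    raw.sample_time = now;
    for (int attempt = 0; attempt <= failedReads; ++attempt) {
        initRawSample(raw);  // sample_time is back to 0 and stays there
        // (the real code would read and parse the stat file here)
    }
}

// Post-fix ordering: the timestamp is set after each re-initialization, so
// it survives no matter how many times the read is retried.
void sampleFixed(RawSample& raw, long now, int failedReads) {
    for (int attempt = 0; attempt <= failedReads; ++attempt) {
        initRawSample(raw);
        raw.sample_time = now;
        // (the real code would read and parse the stat file here)
    }
}
```

With the buggy ordering sample_time ends up 0 even when the first read succeeds (failedReads of 0); with the fixed ordering it holds the value passed as now.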

Comment 3 Matthew Farrellee 2011-04-26 13:09:09 UTC

Resolved upstream on V7_6-branch, 520a7f9c

Comment 4 Matthew Farrellee 2011-04-27 20:24:09 UTC

    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
C: Internal uptime stat reset to 0 before being reported
C: MonitorSelfAge is always 0
F: Internal uptime stat no longer reset to 0
R: MonitorSelfAge now reflects uptime

Comment 5 Matthew Farrellee 2011-04-28 11:34:15 UTC

Note: Detect programmatically with condor_status -long -master/-schedd/-any
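
One possible way to automate that check, as a sketch only: pull the value out of a line of `condor_status -long` output and flag a 0. It assumes the attribute appears on its own line as "MonitorSelfAge = <integer>", as in the verification output in this report. parseMonitorSelfAge is a hypothetical helper, not part of Condor.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: if `line` has the form "MonitorSelfAge = <integer>",
// store the integer in `age` and return true; otherwise return false and
// leave `age` untouched.
bool parseMonitorSelfAge(const std::string& line, long& age) {
    const std::string key = "MonitorSelfAge";
    if (line.compare(0, key.size(), key) != 0) {
        return false;  // line does not start with the attribute name
    }
    const std::size_t eq = line.find('=', key.size());
    if (eq == std::string::npos) {
        return false;  // no assignment on this line
    }
    // Only whitespace may sit between the name and '=', which rejects
    // longer attribute names that merely share the prefix.
    for (std::size_t i = key.size(); i < eq; ++i) {
        if (line[i] != ' ' && line[i] != '\t') {
            return false;
        }
    }
    try {
        age = std::stol(line.substr(eq + 1));  // stol skips leading blanks
    } catch (const std::exception&) {
        return false;  // value missing or not an integer
    }
    return true;
}
```

A caller would feed each output line through the helper and treat an age of 0 on a daemon that has been running for a while as the symptom described in this bug.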

Comment 7 Lubos Trilety 2011-05-09 14:29:12 UTC

Tested on:
$CondorVersion: 7.6.1 Apr 27 2011 BuildID: RH-7.6.1-0.4.el5 $
$CondorPlatform: I686-RedHat_5.6 $

$CondorVersion: 7.6.1 Apr 27 2011 BuildID: RH-7.6.1-0.4.el5 $
$CondorPlatform: X86_64-RedHat_5.6 $

$CondorVersion: 7.6.1 Apr 27 2011 BuildID: RH-7.6.1-0.4.el6 $
$CondorPlatform: I686-RedHat_6.0 $

$CondorVersion: 7.6.1 Apr 27 2011 BuildID: RH-7.6.1-0.4.el6 $
$CondorPlatform: X86_64-RedHat_6.0 $

# condor_status -long -master/-schedd/-any | grep MonitorSelfAge
MonitorSelfAge = 1684

available also using qpid-tool and cumin under slot(s)

>>> VERIFIED

Comment 8 Lubos Trilety 2011-05-09 14:30:45 UTC

(In reply to comment #7)
> available also using qpid-tool and cumin under slot(s)

under Overview->Performance

Comment 9 errata-xmlrpc 2011-06-23 15:39:32 UTC

An advisory has been issued which should resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2011-0889.html
