Bug 433266 - negative ETIME field in ps
negative ETIME field in ps
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: procps (Show other bugs)
4.8
All Linux
medium Severity medium
: rc
: ---
Assigned To: Daniel Novotny
Brian Brock
:
Depends On:
Blocks: 391511 456484
  Show dependency treegraph
 
Reported: 2008-02-18 03:44 EST by Jose Plans
Modified: 2012-12-03 07:31 EST (History)
5 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 556762 580877 871819 (view as bug list)
Environment:
Last Closed: 2009-05-18 16:04:13 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:


Attachments (Terms of Use)
procps-etime.patch (373 bytes, patch)
2008-02-18 03:48 EST, Jose Plans
no flags Details | Diff
a new patch (441 bytes, patch)
2009-03-20 08:13 EDT, Daniel Novotny
no flags Details | Diff

  None (edit)
Description Jose Plans 2008-02-18 03:44:37 EST
Description of problem:

This is the description of the problem from a customer, after verifying it I
couldn't find anything wrong.
--
ETIME field is not obtained from procps but rather computed as a difference
between current time and process start time.

/* elapsed wall clock time, [[dd-]hh:]mm:ss format (not same as "time") */
static int pr_etime(char *restrict const outbuf, const proc_t *restrict const
pp){
 unsigned long t;
 unsigned dd,hh,mm,ss;
 char *cp = outbuf;
 t = seconds_since_boot - (unsigned long)(pp->start_time / Hertz);
 ss = t%60;
 t /= 60;
 mm = t%60;
 t /= 60;
 hh = t%24;
 t /= 24;
 dd = t;
 cp +=(     dd      ?  snprintf(cp, COLWID, "%u-", dd)           :  0 );
 cp +=( (dd || hh)  ?  snprintf(cp, COLWID, "%02u:", hh)         :  0 );
 cp +=                 snprintf(cp, COLWID, "%02u:%02u", mm, ss)       ;
 return (int)(cp-outbuf);
}


If t = -1, it will print exactly the value customer sees. I copied the sources
above to a standalone C-program, set t=-1 and the output is

{alexs 10:17:55} etime
49710-06:28:15

How can we obtain t=-1? The value 'seconds_since_boot' is global and obtained
after parsing the arguments:

ps/global.c:  seconds_since_boot    = uptime(0,0);

After that, 'ps' starts to read from /proc directory the processes info and
parse/format/print it. I can see a possibility of the following race:

(a) ps starts running and sets seconds_since_boot
(b) just after that a new process is created. As a result, it will have
  start_time > seconds_since_boot

(c) when we eventually obtain and print info for that process,

seconds_since_boot - (unsigned long)(pp->start_time / Hertz)

is negative. Obviously, for this to occur we need a heavily loaded system (so
that there are many processes - this will make 'ps' run for a longer time) and
many new processes per second.

An obvious fix would be to add an explicit test to pr_etime() so that if we
see a negative value, we'll replace it by 0. E.g.:

 t = seconds_since_boot - (unsigned long)(pp->start_time / Hertz);
 if (t < 0) t = 0;

The problem is not RH-specific and is even workarounded in some tools, e.g.
http://ar
chive.groundworkopensource.com/groundwork-opensource/tags/GROUNDWORK_MONITO
R_5_1_0/monitor-core/nagios/plugins/check_procs.c
       if (hyphcnt > 0) {
               sscanf(etime, "%d-%d:%d:%d",
                               &days, &hours, &minutes, &seconds);
               /* linux 2.6.5/2.6.6 reporting some processes with infinite
                * elapsed times for some reason */
               if (days == 49710) {
                       return 0;
               }
--

Version-Release number of selected component (if applicable):
procps-3.2.3-8.6

How reproducible:
Not easily, not often. Please read below a comment explaining the way it's
suspected to be reproduced:

"Obviously, for this to occur we need a heavily loaded system (so
that there are many processes - this will make 'ps' run for a longer time) and
many new processes per second."

Steps to Reproduce:
1. Unknown
2. ps -eo uname,pid,ppid,stime,etime,cmd
  
Actual results:

% ps -eo uname,pid,ppid,stime,etime,cmd
ETIME shows: 49710-06:28:15

Expected results:
Correct ETIME.

Additional info:
Proposed patch attached.
Comment 1 Jose Plans 2008-02-18 03:48:14 EST
Created attachment 295142 [details]
procps-etime.patch
Comment 2 RHEL Product and Program Management 2008-09-05 13:22:46 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 8 Daniel Novotny 2009-03-20 08:13:39 EDT
Created attachment 336033 [details]
a new patch
Comment 12 errata-xmlrpc 2009-05-18 16:04:13 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0950.html

Note You need to log in before you can comment on or make changes to this bug.