Bug 871819 - negative ETIME field in ps
negative ETIME field in ps
Status: CLOSED CURRENTRELEASE
Product: Fedora
Classification: Fedora
Component: procps-ng (Show other bugs)
19
All Linux
medium Severity medium
: ---
: ---
Assigned To: Jaromír Cápík
Fedora Extras Quality Assurance
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2012-10-31 09:20 EDT by Branislav Náter
Modified: 2016-01-31 20:57 EST (History)
8 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 433266
Environment:
Last Closed: 2014-01-31 13:04:45 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description Branislav Náter 2012-10-31 09:20:36 EDT
Appears again in procps-ng

+++ This bug was initially created as a clone of Bug #433266 +++

Description of problem:

This is the description of the problem from a customer, after verifying it I
couldn't find anything wrong.
--
ETIME field is not obtained from procps but rather computed as a difference
between current time and process start time.

/* elapsed wall clock time, [[dd-]hh:]mm:ss format (not same as "time") */
static int pr_etime(char *restrict const outbuf, const proc_t *restrict const
pp){
 unsigned long t;
 unsigned dd,hh,mm,ss;
 char *cp = outbuf;
 t = seconds_since_boot - (unsigned long)(pp->start_time / Hertz);
 ss = t%60;
 t /= 60;
 mm = t%60;
 t /= 60;
 hh = t%24;
 t /= 24;
 dd = t;
 cp +=(     dd      ?  snprintf(cp, COLWID, "%u-", dd)           :  0 );
 cp +=( (dd || hh)  ?  snprintf(cp, COLWID, "%02u:", hh)         :  0 );
 cp +=                 snprintf(cp, COLWID, "%02u:%02u", mm, ss)       ;
 return (int)(cp-outbuf);
}


If t = -1, it will print exactly the value customer sees. I copied the sources
above to a standalone C-program, set t=-1 and the output is

{alexs 10:17:55} etime
49710-06:28:15

How can we obtain t=-1? The value 'seconds_since_boot' is global and obtained
after parsing the arguments:

ps/global.c:  seconds_since_boot    = uptime(0,0);

After that, 'ps' starts to read from /proc directory the processes info and
parse/format/print it. I can see a possibility of the following race:

(a) ps starts running and sets seconds_since_boot
(b) just after that a new process is created. As a result, it will have
  start_time > seconds_since_boot

(c) when we eventually obtain and print info for that process,

seconds_since_boot - (unsigned long)(pp->start_time / Hertz)

is negative. Obviously, for this to occur we need a heavily loaded system (so
that there are many processes - this will make 'ps' run for a longer time) and
many new processes per second.

An obvious fix would be to add an explicit test to pr_etime() so that if we
see a negative value, we'll replace it by 0. E.g.:

 t = seconds_since_boot - (unsigned long)(pp->start_time / Hertz);
 if (t < 0) t = 0;

The problem is not RH-specific and is even workarounded in some tools, e.g.
http://ar
chive.groundworkopensource.com/groundwork-opensource/tags/GROUNDWORK_MONITO
R_5_1_0/monitor-core/nagios/plugins/check_procs.c
       if (hyphcnt > 0) {
               sscanf(etime, "%d-%d:%d:%d",
                               &days, &hours, &minutes, &seconds);
               /* linux 2.6.5/2.6.6 reporting some processes with infinite
                * elapsed times for some reason */
               if (days == 49710) {
                       return 0;
               }
--

Version-Release number of selected component (if applicable):
procps-3.2.3-8.6

How reproducible:
Not easily, not often. Please read below a comment explaining the way it's
suspected to be reproduced:

"Obviously, for this to occur we need a heavily loaded system (so
that there are many processes - this will make 'ps' run for a longer time) and
many new processes per second."

Steps to Reproduce:
1. Unknown
2. ps -eo uname,pid,ppid,stime,etime,cmd
  
Actual results:

% ps -eo uname,pid,ppid,stime,etime,cmd
ETIME shows: 49710-06:28:15

Expected results:
Correct ETIME.

Additional info:
Proposed patch attached.
Comment 2 Fedora End Of Life 2013-04-03 15:52:57 EDT
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle.
Changing version to '19'.

(As we did not run this process for some time, it could affect also pre-Fedora 19 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19
Comment 3 Jaromír Cápík 2013-04-11 09:08:05 EDT
Please, retest with 3.3.7.
Comment 4 Jaromír Cápík 2014-01-31 13:04:45 EST
The fix should be available in all currently supported Fedora releases. Closing.

Note You need to log in before you can comment on or make changes to this bug.