Description of problem:

This is the description of the problem from a customer; after verifying it I couldn't find anything wrong.

--

The ETIME field is not read from /proc but rather computed as the difference between the current time and the process start time:

    /* elapsed wall clock time, [[dd-]hh:]mm:ss format (not same as "time") */
    static int pr_etime(char *restrict const outbuf, const proc_t *restrict const pp){
        unsigned long t;
        unsigned dd,hh,mm,ss;
        char *cp = outbuf;
        t = seconds_since_boot - (unsigned long)(pp->start_time / Hertz);
        ss = t%60;  t /= 60;
        mm = t%60;  t /= 60;
        hh = t%24;  t /= 24;
        dd = t;
        cp += ( dd         ? snprintf(cp, COLWID, "%u-", dd)   : 0 );
        cp += ( (dd || hh) ? snprintf(cp, COLWID, "%02u:", hh) : 0 );
        cp += snprintf(cp, COLWID, "%02u:%02u", mm, ss);
        return (int)(cp-outbuf);
    }

If t = -1 (i.e. the subtraction wraps around), it prints exactly the value the customer sees. I copied the sources above into a standalone C program, set t = -1, and the output is:

    {alexs 10:17:55} etime
    49710-06:28:15

How can we obtain t = -1? The value 'seconds_since_boot' is global and obtained once, after parsing the arguments (ps/global.c):

    seconds_since_boot = uptime(0,0);

After that, 'ps' starts reading the process info from the /proc directory and parsing/formatting/printing it. I can see a possibility of the following race:

(a) ps starts running and sets seconds_since_boot;
(b) just after that, a new process is created. As a result, it has start_time > seconds_since_boot;
(c) when we eventually obtain and print the info for that process, seconds_since_boot - (unsigned long)(pp->start_time / Hertz) is "negative" and underflows.

Obviously, for this to occur we need a heavily loaded system (so that there are many processes, which makes 'ps' run for a longer time) and many new processes created per second.

An obvious fix would be to add an explicit test to pr_etime() so that if we see a "negative" value, we replace it by 0.
E.g.:

    t = seconds_since_boot - (unsigned long)(pp->start_time / Hertz);
    if (t < 0)
        t = 0;

(Note that since t is unsigned long, this literal test can never be true; the real check has to compare seconds_since_boot against the start time before subtracting, or use a signed value.)

The problem is not RH-specific and is even worked around in some tools, e.g.
http://archive.groundworkopensource.com/groundwork-opensource/tags/GROUNDWORK_MONITOR_5_1_0/monitor-core/nagios/plugins/check_procs.c

    if (hyphcnt > 0) {
        sscanf(etime, "%d-%d:%d:%d", &days, &hours, &minutes, &seconds);
        /* linux 2.6.5/2.6.6 reporting some processes with infinite
         * elapsed times for some reason */
        if (days == 49710) {
            return 0;
        }
    }

--

Version-Release number of selected component (if applicable):
procps-3.2.3-8.6

How reproducible:
Not easily, not often. Please see the comment above explaining how it is suspected to be reproduced: "Obviously, for this to occur we need a heavily loaded system (so that there are many processes - this will make 'ps' run for a longer time) and many new processes per second."

Steps to Reproduce:
1. Unknown
2. ps -eo uname,pid,ppid,stime,etime,cmd

Actual results:
% ps -eo uname,pid,ppid,stime,etime,cmd
ETIME shows: 49710-06:28:15

Expected results:
Correct ETIME.

Additional info:
Proposed patch attached.
Created attachment 295142 [details] procps-etime.patch
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Created attachment 336033 [details] a new patch
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2009-0950.html