Bug 89959 - "sa" shows wrong cpu time for killed processes
Summary: "sa" shows wrong cpu time for killed processes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: psacct
Version: 3.0
Hardware: i686
OS: Linux
medium
medium
Target Milestone: ---
Assignee: Charlie Bennett
QA Contact:
URL:
Whiteboard:
: 114617 (view as bug list)
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2003-04-30 11:33 UTC by Need Real Name
Modified: 2007-11-30 22:06 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2004-12-21 03:50:31 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2004:497 0 normal SHIPPED_LIVE Updated psacct package 2004-12-20 05:00:00 UTC

Description Need Real Name 2003-04-30 11:33:36 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)

Description of problem:
For processes beeing killed (e.g. CTRL-C, KILL, TERM) "sa" utility shows 
a "cpu" value about 5 times larger then the actual consumed cpu user and 
system time, using an SMP kernel: 2.4.18-27.7.x.cernsmp (modified at CERN: 
http://www.cern.ch).

Version-Release number of selected component (if applicable):
psacct-6.3.2-19

How reproducible:
Always

Steps to Reproduce:
1.Start a cycling process consuming ~100% of one CPU
2.Run it for 10 minutes
3.Kill it with TERM, KILL or CTRL-C
    

Actual Results:  "sa -u" shows 3067.60 cpu for the process

Expected Results:  "sa -u" should show about 600.00 cpu

Additional info:

On our interactive nodes we use a daemon process "beniced" to kill user 
processes consuming more then 30 minutes CPU time and now these processes are 
accounted with about 5 times more CPU time, then actually consumed.

Comment 1 Dave Botsch 2003-05-04 23:55:01 UTC
We are having the same problem, but it is worse than simply killed processes.

Kernel 2.4.18-10 is fine. Upgrading to 2.4.18-27.7.x has broken processing
accounting.

Here is an example of it being broken:

> time openssl speed
(SNIP)

314.796u 0.056s 5:21.43 97.9%   0+0k 0+0io 232pf+0w

> lastcomm |grep openssl
openssl                 dwb7     stdin    1611.81 secs Sun May  4 19:47

as you can see, openssl speed took about 5 minutes to run. But lastcom shows the
time being about 5 times that number. Reproducible 100% of the time on ANY
computer running 2.4.18-27.

This is a HUGE problem as it results in users being overcharged for cpu time.

Comment 2 Dave Botsch 2003-05-06 17:17:51 UTC
We believe we've found the problem.

In 2.4.18-27, the HZ value is set to 512 instead of 100 for i686 and athlon (but
for some odd reason, still set to 100 for i386 and i586). You can confirm this
by grepping for "_HZ" in the configs for kernel source:

> grep "_HZ" *
kernel-2.4.18-athlon.config:CONFIG_HZ=512
kernel-2.4.18-athlon.config:CONFIG_HZ=512
kernel-2.4.18-athlon-smp.config:CONFIG_HZ=512
kernel-2.4.18-athlon-smp.config:CONFIG_HZ=512
kernel-2.4.18-i386-BOOT.config:CONFIG_HZ=100
kernel-2.4.18-i386.config:CONFIG_HZ=100
kernel-2.4.18-i586.config:CONFIG_HZ=100
kernel-2.4.18-i586-smp.config:CONFIG_HZ=100
kernel-2.4.18-i686-bigmem.config:CONFIG_HZ=512
kernel-2.4.18-i686-bigmem.config:CONFIG_HZ=512
kernel-2.4.18-i686.config:CONFIG_HZ=512
kernel-2.4.18-i686.config:CONFIG_HZ=512
kernel-2.4.18-i686-debug.config:CONFIG_HZ=512
kernel-2.4.18-i686-debug.config:CONFIG_HZ=512
kernel-2.4.18-i686-smp.config:CONFIG_HZ=512
kernel-2.4.18-i686-smp.config:CONFIG_HZ=512


For things like proc file output, a conversion is done so that you see the right
values in realtime:

from linux-2.4.18-variablehz.patch:

+               hz_to_std(task->times.tms_utime),
+               hz_to_std(task->times.tms_stime),
+               hz_to_std(task->times.tms_cutime),
+               hz_to_std(task->times.tms_cstime),


but, if you look at acct.c, which records the accounting files, this conversion
is not done. 

So, either acct.c needs to be patched to do the conversion, or you have to go
download later source for gnuacct which supports the --ahz option allowing you
to specify the hz value of 512 to get correct output. Or, you have to edit the
configs to define CONFIG_HZ as 100 for the appropriate archs instead of as 512.

Comment 3 Charlie Bennett 2004-04-07 17:50:51 UTC
*** Bug 114617 has been marked as a duplicate of this bug. ***

Comment 4 Charlie Bennett 2004-04-07 17:54:29 UTC
*** Bug 117817 has been marked as a duplicate of this bug. ***

Comment 5 Charlie Bennett 2004-09-23 18:49:04 UTC
This is fixed in FC3t3 and AS2.1.  Moving this bug to RHEL3 so that it
gets fixed there, too.

Comment 6 Charlie Bennett 2004-09-27 16:43:14 UTC
Erratum RHBA-2004:497-05

Comment 7 John Flanagan 2004-12-21 03:50:31 UTC
An advisory has been issued which should help the problem 
described in this bug report. This report is therefore being 
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, 
please follow the link below. You may reopen this bug report 
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2004-497.html



Note You need to log in before you can comment on or make changes to this bug.