Bug 89959

Summary: "sa" shows wrong cpu time for killed processes
Product: Red Hat Enterprise Linux 3 Reporter: Need Real Name <vpose>
Component: psacctAssignee: Charlie Bennett <ccb>
Status: CLOSED ERRATA QA Contact:
Severity: medium Docs Contact:
Priority: medium    
Version: 3.0CC: bernds, botsch, gasi, jan.iven
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2004-12-21 03:50:31 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Need Real Name 2003-04-30 11:33:36 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)

Description of problem:
For processes beeing killed (e.g. CTRL-C, KILL, TERM) "sa" utility shows 
a "cpu" value about 5 times larger then the actual consumed cpu user and 
system time, using an SMP kernel: 2.4.18-27.7.x.cernsmp (modified at CERN: 
http://www.cern.ch).

Version-Release number of selected component (if applicable):
psacct-6.3.2-19

How reproducible:
Always

Steps to Reproduce:
1.Start a cycling process consuming ~100% of one CPU
2.Run it for 10 minutes
3.Kill it with TERM, KILL or CTRL-C
    

Actual Results:  "sa -u" shows 3067.60 cpu for the process

Expected Results:  "sa -u" should show about 600.00 cpu

Additional info:

On our interactive nodes we use a daemon process "beniced" to kill user 
processes consuming more then 30 minutes CPU time and now these processes are 
accounted with about 5 times more CPU time, then actually consumed.

Comment 1 Dave Botsch 2003-05-04 23:55:01 UTC
We are having the same problem, but it is worse than simply killed processes.

Kernel 2.4.18-10 is fine. Upgrading to 2.4.18-27.7.x has broken processing
accounting.

Here is an example of it being broken:

> time openssl speed
(SNIP)

314.796u 0.056s 5:21.43 97.9%   0+0k 0+0io 232pf+0w

> lastcomm |grep openssl
openssl                 dwb7     stdin    1611.81 secs Sun May  4 19:47

as you can see, openssl speed took about 5 minutes to run. But lastcom shows the
time being about 5 times that number. Reproducible 100% of the time on ANY
computer running 2.4.18-27.

This is a HUGE problem as it results in users being overcharged for cpu time.

Comment 2 Dave Botsch 2003-05-06 17:17:51 UTC
We believe we've found the problem.

In 2.4.18-27, the HZ value is set to 512 instead of 100 for i686 and athlon (but
for some odd reason, still set to 100 for i386 and i586). You can confirm this
by grepping for "_HZ" in the configs for kernel source:

> grep "_HZ" *
kernel-2.4.18-athlon.config:CONFIG_HZ=512
kernel-2.4.18-athlon.config:CONFIG_HZ=512
kernel-2.4.18-athlon-smp.config:CONFIG_HZ=512
kernel-2.4.18-athlon-smp.config:CONFIG_HZ=512
kernel-2.4.18-i386-BOOT.config:CONFIG_HZ=100
kernel-2.4.18-i386.config:CONFIG_HZ=100
kernel-2.4.18-i586.config:CONFIG_HZ=100
kernel-2.4.18-i586-smp.config:CONFIG_HZ=100
kernel-2.4.18-i686-bigmem.config:CONFIG_HZ=512
kernel-2.4.18-i686-bigmem.config:CONFIG_HZ=512
kernel-2.4.18-i686.config:CONFIG_HZ=512
kernel-2.4.18-i686.config:CONFIG_HZ=512
kernel-2.4.18-i686-debug.config:CONFIG_HZ=512
kernel-2.4.18-i686-debug.config:CONFIG_HZ=512
kernel-2.4.18-i686-smp.config:CONFIG_HZ=512
kernel-2.4.18-i686-smp.config:CONFIG_HZ=512


For things like proc file output, a conversion is done so that you see the right
values in realtime:

from linux-2.4.18-variablehz.patch:

+               hz_to_std(task->times.tms_utime),
+               hz_to_std(task->times.tms_stime),
+               hz_to_std(task->times.tms_cutime),
+               hz_to_std(task->times.tms_cstime),


but, if you look at acct.c, which records the accounting files, this conversion
is not done. 

So, either acct.c needs to be patched to do the conversion, or you have to go
download later source for gnuacct which supports the --ahz option allowing you
to specify the hz value of 512 to get correct output. Or, you have to edit the
configs to define CONFIG_HZ as 100 for the appropriate archs instead of as 512.

Comment 3 Charlie Bennett 2004-04-07 17:50:51 UTC
*** Bug 114617 has been marked as a duplicate of this bug. ***

Comment 4 Charlie Bennett 2004-04-07 17:54:29 UTC
*** Bug 117817 has been marked as a duplicate of this bug. ***

Comment 5 Charlie Bennett 2004-09-23 18:49:04 UTC
This is fixed in FC3t3 and AS2.1.  Moving this bug to RHEL3 so that it
gets fixed there, too.

Comment 6 Charlie Bennett 2004-09-27 16:43:14 UTC
Erratum RHBA-2004:497-05

Comment 7 John Flanagan 2004-12-21 03:50:31 UTC
An advisory has been issued which should help the problem 
described in this bug report. This report is therefore being 
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, 
please follow the link below. You may reopen this bug report 
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2004-497.html