Bug 125371

Summary: ps uS not reporting time used by processes correctly
Product: [Fedora] Fedora
Reporter: Peter Hunter <peter.hunter>
Component: kernel
Assignee: Dave Jones <davej>
Status: CLOSED NEXTRELEASE
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: 2
CC: pfrields
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2005-04-16 04:45:19 UTC

Description Peter Hunter 2004-06-05 10:41:03 UTC
Description of problem:
A process with a child which eats a lot of CPU time and then dies is being reported as 
having used very little CPU. Under previous versions, the S option to ps gave summary 
information, including adding CPU time used by children to the parent info. This was very 
useful, and no longer seems to be working. (The specific process I am thinking of is the 
Folding@Home client.)

Version-Release number of selected component (if applicable):
3.2.0-1.1

How reproducible:
Always

Steps to Reproduce:
1. Run a long-running process that starts children that eat CPU and then die.
2. Run ps uS to see how much CPU has been used by the process and its children.

Actual Results:  Very low CPU usage reported

Expected Results:  Very high CPU usage reported
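
The steps above can be approximated with a small test program. This is only a sketch of a stand-in workload, not the Folding@Home client: the helper names (burn_cpu, children_cpu_seconds, reaped_child_cpu_seconds) are made up for illustration. It forks a child that burns CPU and exits, then asks getrusage(RUSAGE_CHILDREN) how much time the parent was credited with, which is roughly the amount ps uS would be expected to fold into the parent. Note that it uses a plain fork; if the trouble is specific to threads, this case may well be reported correctly.

```c
#include <assert.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Spin until this process has consumed roughly `seconds` of CPU time. */
static void burn_cpu(double seconds)
{
    clock_t start = clock();
    while ((double)(clock() - start) / CLOCKS_PER_SEC < seconds)
        ;  /* busy-wait on purpose */
}

/* CPU time (user + system) charged so far to children that were waited for.
 * Returns -1.0 if getrusage fails. */
static double children_cpu_seconds(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_CHILDREN, &ru) != 0)
        return -1.0;
    return (double)ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1e6
         + (double)ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1e6;
}

/* Fork a child that burns `seconds` of CPU and exits. Return how much CPU
 * time the parent was credited with once the child has been reaped. */
double reaped_child_cpu_seconds(double seconds)
{
    assert(seconds >= 0.0);
    double before = children_cpu_seconds();
    pid_t pid = fork();
    if (pid < 0)
        return -1.0;
    if (pid == 0) {
        burn_cpu(seconds);
        _exit(0);
    }
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1.0;
    return children_cpu_seconds() - before;
}
```

To compare against ps, call reaped_child_cpu_seconds() with a large value from a main() that then sleeps for a while, and run ps uS against the parent's PID while it is still alive.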

Comment 1 Peter Hunter 2004-06-07 08:06:37 UTC
Sorry to reply to my own report, but I noticed that most processes with children are 
being reported correctly, so I wonder if the trouble is threading. If so, this might be 
related to bug number 120460. Just a thought.

Comment 2 Albert Cahalan 2004-07-20 18:15:56 UTC
Here's part of an email I sent to somebody who was
interested in solving this problem. He had plenty of
code experience, but no kernel experience and little time.

I think these instructions should be simple to follow.
The only concerns are locking and cache line bouncing.
Aside from that, the code change is trivial.

-----------------------------------------------------------

> If you think there is something that I can do to help you in a 1 to 2
> day time frame, let me know.  I will work on it at night and over the
> weekend.

No, but you can produce a 90%-correct hack for yourself
in a day. This involves the kernel source, not procps.

In fs/proc/base.c find the proc_pident_lookup function.
Change this:
                case PROC_TID_STAT:
                case PROC_TGID_STAT:
                        inode->i_fop = &proc_info_file_operations;
                        ei->op.proc_read = proc_pid_stat;
                        break;

Into this:
                case PROC_TID_STAT:
                        inode->i_fop = &proc_info_file_operations;
                        ei->op.proc_read = proc_tid_stat;
                        break;
                case PROC_TGID_STAT:
                        inode->i_fop = &proc_info_file_operations;
                        ei->op.proc_read = proc_tgid_stat;
                        break;

In fs/proc/array.c, copy proc_pid_stat to make a second
function. Name one copy proc_tid_stat, and the other
copy proc_tgid_stat. There's also an extern declaration
that you need to duplicate in the fs/proc/base.c file.
Use grep to see if I missed anything.

Now you have separate per-process and per-thread functions,
but they show the same thing. Modify the per-process (tgid)
one to take CPU usage data from some new variables. You can
put these variables in the task_struct with the others,
causing non-leader tasks to have some wasted space. The
fancy solution would involve the signal struct, but that
requires different locking.
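
As a rough illustration of the split described above, here is a userspace model, not kernel code. struct model_task is a made-up stand-in for task_struct, and group_utime / group_stime are hypothetical names for the "new variables"; the real field names and layout would be up to whoever writes the patch. The two reader functions correspond to the numbers that the proc_tid_stat and proc_tgid_stat copies would report.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's task_struct; NOT the real layout. */
struct model_task {
    unsigned long utime;        /* per-thread user time, in ticks */
    unsigned long stime;        /* per-thread system time, in ticks */
    unsigned long group_utime;  /* hypothetical: whole-process user time */
    unsigned long group_stime;  /* hypothetical: whole-process system time */
    struct model_task *group_leader;  /* the leader points at itself */
};

/* Per-thread view: only the calling thread's own counters. */
unsigned long model_tid_cputime(const struct model_task *t)
{
    return t->utime + t->stime;
}

/* Per-process view: the new totals, which are meaningful only in the
 * leader (in non-leader tasks the two fields are wasted space). */
unsigned long model_tgid_cputime(const struct model_task *t)
{
    const struct model_task *leader = t->group_leader;
    assert(leader != NULL);
    return leader->group_utime + leader->group_stime;
}

/* Demo: a mostly idle leader plus one busy worker thread, queried
 * through the worker. The leader's totals are filled in by hand here. */
unsigned long model_demo(int per_process)
{
    struct model_task leader = { .utime = 5, .stime = 1,
                                 .group_utime = 40, .group_stime = 7 };
    struct model_task worker = { .utime = 35, .stime = 6 };
    leader.group_leader = &leader;
    worker.group_leader = &leader;
    return per_process ? model_tgid_cputime(&worker)
                       : model_tid_cputime(&worker);
}
```

In the demo, the per-thread view of the worker is 41 ticks while the per-process view is 47 ticks, because the latter also includes the 6 ticks used by the leader itself.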

That leaves only one thing. You need to ensure that
your new struct members get updated. Find the places where
this happens for the per-thread variables. Either use the
group_leader pointer for this (in case you placed the new
data in the task_struct of the leader) or... you're on your
own for using the signal struct.

There. Done, except for some rare glitches when multiple
threads try to update the counter at the exact same moment.
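
The update path and the glitch mentioned above can be sketched in userspace as follows. Everything here is an assumption made for illustration: struct model_task and group_utime are invented names, and a C11 atomic stands in for whatever primitive or lock a real kernel patch would use. Each thread charges a tick to its own counter and, through group_leader, to the shared per-process total; making the shared add atomic is one way to avoid lost updates when two threads hit the counter at the same moment.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Userspace stand-in for task_struct; NOT the real kernel layout. */
struct model_task {
    unsigned long utime;       /* per-thread ticks, written only by its own thread */
    atomic_ulong group_utime;  /* hypothetical per-process total, used in the leader */
    struct model_task *group_leader;
};

/* Charge one tick to the thread and, via group_leader, to the process. */
void model_account_user_tick(struct model_task *t)
{
    t->utime += 1;
    atomic_fetch_add_explicit(&t->group_leader->group_utime, 1,
                              memory_order_relaxed);
}

enum { MODEL_TICKS = 100000 };

static void *model_worker(void *arg)
{
    struct model_task *t = arg;
    for (int i = 0; i < MODEL_TICKS; i++)
        model_account_user_tick(t);
    return NULL;
}

/* Run two threads of one "process" that both charge ticks at the same
 * time, then return the per-process total kept in the leader. */
unsigned long model_demo_concurrent_total(void)
{
    struct model_task leader = { .utime = 0 };
    struct model_task worker = { .utime = 0 };
    atomic_init(&leader.group_utime, 0);
    atomic_init(&worker.group_utime, 0);
    leader.group_leader = &leader;
    worker.group_leader = &leader;

    pthread_t a, b;
    int rc = pthread_create(&a, NULL, model_worker, &leader);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, model_worker, &worker);
    assert(rc == 0);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return atomic_load(&leader.group_utime);
}
```

With the atomic add, the total comes out as exactly twice MODEL_TICKS. With a plain unsigned long and += instead, some increments could be lost. The cost is that every thread writes to the same counter in the leader, which is the cache line bouncing concern raised earlier in this comment.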


Comment 3 Dave Jones 2005-04-16 04:45:19 UTC
Fedora Core 2 has now reached end of life, and no further updates will be
provided by Red Hat.  The Fedora Legacy project will be producing further kernel
updates for security problems only.

If this bug has not been fixed in the latest Fedora Core 2 update kernel, please
try to reproduce it under Fedora Core 3, and reopen if necessary, changing the
product version accordingly.

Thank you.