Description of problem:
A process with a child which eats a lot of CPU time and then dies is being reported as having used very little CPU. Under previous versions, the S option to ps gave summary information, including adding CPU time used by children to the parent info. This was very useful, and no longer seems to be working. (The specific process I am thinking of is the Folding@Home client.)

Version-Release number of selected component (if applicable):
3.2.0-1.1

How reproducible:
Always

Steps to Reproduce:
1. Run a long-running process that starts children that eat CPU and then die.
2. Run ps uS to see how much CPU has been used by the process and its children.

Actual Results:
Very low CPU usage reported

Expected Results:
Very high CPU usage reported
Sorry to reply to my own report. It's just that I noticed most processes with children are being reported correctly, so I wonder if the trouble is threading. Which might mean it is related to bug number 120460. Just a thought.
Here's part of an email I sent to somebody who was interested in solving this problem. He had plenty of code experience, but no kernel experience and little time. I think these instructions should be simple to follow. The only concerns are locking and cache line bouncing. Aside from that, the code change is trivial.

-----------------------------------------------------------

> If you think there is something that I can do to help you in a 1 to 2
> day time frame, let me know. I will work on it at night and over the
> weekend.

No, but you can produce a 90%-correct hack for yourself in a day.
This involves the kernel source, not procps.

In fs/proc/base.c find the proc_pident_lookup function. Change this:

    case PROC_TID_STAT:
    case PROC_TGID_STAT:
        inode->i_fop = &proc_info_file_operations;
        ei->op.proc_read = proc_pid_stat;
        break;

Into this:

    case PROC_TID_STAT:
        inode->i_fop = &proc_info_file_operations;
        ei->op.proc_read = proc_tid_stat;
        break;
    case PROC_TGID_STAT:
        inode->i_fop = &proc_info_file_operations;
        ei->op.proc_read = proc_tgid_stat;
        break;

In fs/proc/array.c, copy proc_pid_stat to make a second function. Name one copy proc_tid_stat, and the other copy proc_tgid_stat. There's also an extern declaration that you need to duplicate in the fs/proc/base.c file. Use grep to see if I missed anything.

Now you have separate per-process and per-thread functions, but they show the same thing. Modify the per-process (tgid) one to take CPU usage data from some new variables. You can put these variables in the task_struct with the others, causing non-leader tasks to have some wasted space. The fancy solution would involve the signal struct, but that requires different locking.

That leaves only one thing. You need to ensure that your new struct members get updated. Find the places where this happens for the per-thread variables. Either use the group_leader pointer for this (in case you placed the new data in the task_struct of the leader) or... you're on your own for using the signal struct.

There. Done, except for some rare glitches when multiple threads try to update the counter at the exact same moment.
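To make the bookkeeping described above concrete, here is a small user-space sketch of the idea. It is not kernel code: struct task is a simplified stand-in for task_struct, and the names group_utime, group_stime, account_user_tick, account_system_tick, tid_utime and tgid_utime are invented for illustration only. The point is just that every tick is charged both to the thread and, through group_leader, to the leader's new counters, so that the per-process reader sees the whole group.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's task_struct. */
struct task {
    struct task *group_leader;  /* the leader points to itself */
    unsigned long utime;        /* per-thread user ticks */
    unsigned long stime;        /* per-thread system ticks */
    unsigned long group_utime;  /* new field; only used in the leader */
    unsigned long group_stime;  /* new field; only used in the leader */
};

/* Set up a task; pass NULL as leader to make it its own group leader. */
void task_init(struct task *t, struct task *leader)
{
    assert(t != NULL);
    t->group_leader = leader ? leader : t;
    t->utime = t->stime = 0;
    t->group_utime = t->group_stime = 0;
}

/* Charge one user tick to the thread and to its group leader.
 * The leader update is deliberately unlocked, which is the source
 * of the rare glitches mentioned above. */
void account_user_tick(struct task *t)
{
    t->utime++;
    t->group_leader->group_utime++;
}

/* Same as above, for system time. */
void account_system_tick(struct task *t)
{
    t->stime++;
    t->group_leader->group_stime++;
}

/* What a per-thread (tid) stat reader would report. */
unsigned long tid_utime(const struct task *t) { return t->utime; }
unsigned long tid_stime(const struct task *t) { return t->stime; }

/* What a per-process (tgid) stat reader would report. */
unsigned long tgid_utime(const struct task *t)
{
    return t->group_leader->group_utime;
}

unsigned long tgid_stime(const struct task *t)
{
    return t->group_leader->group_stime;
}
```

In this sketch the group totals live in the leader, so they stay available after a worker thread is gone. The non-leader copies of group_utime and group_stime are the wasted space mentioned above.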
Fedora Core 2 has now reached end of life, and no further updates will be provided by Red Hat. The Fedora legacy project will be producing further kernel updates for security problems only. If this bug has not been fixed in the latest Fedora Core 2 update kernel, please try to reproduce it under Fedora Core 3, and reopen if necessary, changing the product version accordingly. Thank you.