Bug 116783
Summary: | Java threads cpu time not counted in ps, top, etc. | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Brian Krahmer <brian> | ||||||||||
Component: | procps | Assignee: | Karel Zak <kzak> | ||||||||||
Status: | CLOSED CURRENTRELEASE | QA Contact: | Brian Brock <bbrock> | ||||||||||
Severity: | medium | Docs Contact: | |||||||||||
Priority: | medium | ||||||||||||
Version: | rawhide | CC: | gbailey, hrunting, idht4n | ||||||||||
Target Milestone: | --- | ||||||||||||
Target Release: | --- | ||||||||||||
Hardware: | i686 | ||||||||||||
OS: | Linux | ||||||||||||
Whiteboard: | |||||||||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||||||||
Doc Text: | Story Points: | --- | |||||||||||
Clone Of: | Environment: | ||||||||||||
Last Closed: | 2005-06-30 13:33:08 UTC | Type: | --- | ||||||||||
Regression: | --- | Mount Type: | --- | ||||||||||
Documentation: | --- | CRM: | |||||||||||
Verified Versions: | Category: | --- | |||||||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||||
Embargoed: | |||||||||||||
Attachments: |
|
Description
Brian Krahmer
2004-02-25 04:58:27 UTC
Created attachment 98029 [details]
Test case
Run this with 'time java proctest', then run again and watch it with ps and top
Here is a fix that works on my system. It only addresses the issue at hand... Before: [brian@selkirk /]$ ps -ef brian 5277 5276 0 Mar04 pts/0 00:00:00 java -server ... After: [brian@selkirk /]$ ps -ef brian 5277 5276 0 Mar04 pts/0 00:02:04 java -server ... Here is the diff: [root@selkirk proc]# diff readproc.h readproc.h.bak 215,217d214 < //add the task times to our process time < extern void add_task_info(PROCTAB *restrict const PT, const proc_t *restrict const p); < [root@selkirk proc]# diff readproc.h readproc.h.bak 215,217d214 < //add the task times to our process time < extern void add_task_info(PROCTAB *restrict const PT, const proc_t *restrict const p); < [root@selkirk proc]# diff readproc.c readproc.c.bak 554,558d553 < if (p->nlwp > 1) < { < add_task_info(PT, p); < } < 564,579d558 < static void add_task_info(PROCTAB *restrict const PT, const proc_t *restrict const p) < { < proc_t *task; < < for (;;) < { < task = readtask(PT, p, NULL); < if (task == NULL) < break; < p->utime += task->utime; < p->stime += task->stime; < p->utime += task->cutime; < p->stime += task->cstime; < } < } < BTW, these patches are against 3.2.0-1 Since I've done the hard work, please consider fixing this! :) added patch to procps-3_2_0-3 Created attachment 98923 [details]
Test to ensure procps works properly
Removed the patch because of comments from upstream maintainer. >+static void add_task_info(PROCTAB *restrict const PT, >> + const proc_t *restrict const p) >> +{ >> + proc_t *task; >> + for (;;) >> + { >> + task = readtask(PT, p, NULL); >> + if (task == NULL) >> + break; >> + p->utime += task->utime; >> + p->stime += task->stime; >> + p->utime += task->cutime; >> + p->stime += task->cstime; >> + } >> +} >> + >> ////////////////////////////////////////////////////////////////////////////////// >> // This reads /proc/*/task/* data, for one task. >> // p is the POSIX process (task group summary) (not needed by THIS implementation) >> --- procps-3.2.0/proc/readproc.h~ 2004-02-01 18:22:39.000000000 -0500 >> +++ procps-3.2.0/proc/readproc.h 2004-03-25 15:30:56.539717488 -0500 >> @@ -212,6 +212,9 @@ >> //fill out a proc_t for a single task >> extern proc_t * get_proc_stats(pid_t pid, proc_t *p); >> >> +//add the task times to our process time >> +extern void add_task_info(PROCTAB *restrict const PT, const proc_t *restrict const p); >> + >> // openproc/readproctab: >> // >> // Return PROCTAB* / *proc_t[] or NULL on error ((probably) "/proc" cannot be First, "static void" and "extern void" don't match. Second, you completely broke thread support. Third, performance has surely gotten worse. I can see what you're trying to do, but this really needs kernel support. For that matter, %CPU on Linux has never worked. The BSD and SysV kernels maintain a per-process decaying average; Linux only does this for system load. I'll attach a copy of the procps test suite. Untar it over the procps source, run "make" at the top level, then cd to the test directory and run "./runtests". Here are the failing tests: FAIL ps/thread-nosort-H err-1 FAIL ps/thread-nosort-HL err-1 FAIL ps/thread-nosort-L err-1 FAIL ps/thread-nosort-m eval-fn FAIL ps/thread-nosort-mL eval-fn FAIL ps/thread-sorted-H err-1 FAIL ps/thread-sorted-HL err-1 FAIL ps/thread-sorted-L err-1 FAIL ps/thread-sorted-m eval-fn FAIL ps/thread-sorted-mL eval-fn >+static void add_task_info(PROCTAB *restrict const PT, >> + const proc_t *restrict const p) >> +{ >> + proc_t *task; >> + for (;;) >> + { >> + task = readtask(PT, p, NULL); >> + if (task == NULL) >> + break; >> + p->utime += task->utime; >> + p->stime += task->stime; >> + p->utime += task->cutime; >> + p->stime += task->cstime; >> + } >> +} In addition to the problems listed in my other email: 4. huge memory leak 5. double-counting of the first task Created attachment 98924 [details]
Test cases for procps.
I'll outline a solution below. While non-kernel hacks
are possible, they are unacceptably slow -- they nearly
wipe out the whole reason for having threads in task
directories. If I accept a user-space hack, there will
be far less motivation for fixing the kernel.
Here's part of an email I sent to somebody who was
interested in solving this problem. He had plenty of
code experience, but no kernel experience and little time.
I think these instructions should be simple to follow.
The only concerns are locking and cache line bouncing.
Aside from that, the code change is trivial.
-----------------------------------------------------------
> If you think there is something that I can do to help you in a 1 to 2
> day time frame, let me know. I will work on it at night and over the
> weekend.
No, but you can produce a 90%-correct hack for yourself
in a day. This involves the kernel source, not procps.
In fs/proc/base.c find the proc_pident_lookup function.
Change this:
case PROC_TID_STAT:
case PROC_TGID_STAT:
inode->i_fop = &proc_info_file_operations;
ei->op.proc_read = proc_pid_stat;
break;
Into this:
case PROC_TID_STAT:
inode->i_fop = &proc_info_file_operations;
ei->op.proc_read = proc_tid_stat;
break;
case PROC_TGID_STAT:
inode->i_fop = &proc_info_file_operations;
ei->op.proc_read = proc_tgid_stat;
break;
In fs/proc/array.c, copy proc_pid_stat to make a second
function. Name one copy proc_tid_stat, and the other
copy proc_tgid_stat. There's also an extern declaration
that you need to duplicate in the fs/proc/base.c file.
Use grep to see if I missed anything.
Now you have separate per-process and per-thread functions,
but they show the same thing. Modify the per-process (tgid)
one to take CPU usage data from some new variables. You can
put these variables in the task_struct with the others,
causing non-leader tasks to have some wasted space. The
fancy solution would involve the signal struct, but that
requires different locking.
That leaves only one thing left. You need to insure that
your new struct members get updated. Find the places where
this happens for the per-thread variables. Either use the
group_leader pointer for this (in case you placed the new
data in the task_stuct of the leader) or... you're on your
own for using the signal struct.
There. Done, except for some rare glitches when multiple
threads try to update the counter at the exact same moment.
*** Bug 120460 has been marked as a duplicate of this bug. *** Created attachment 103381 [details]
kernel-side fix
This should apply to a very recent kernel
from Linus's BitKeeper tree, and maybe to
the upcoming 2.6.9 release.
There may need to be an additional small
adjustment to the code that updates the
values on the clock tick, to make sure that
the per-process ones are done then instead
of on thread death.
the 2.6.10 kernel has a fix I'm running 2.6.11-1.14_FC3 and I still don't see per-thread CPU usage in top. Am I doing something wrong? David, the problem wasn't in "per-thread CPU usage" in the top, but the problem was in the kernel <2.6.10. It doesn't include thread specific data into sum of information about process ("parent" of threads). It means thread specific CPU usage was hidden for top or ps. The top from procps <= 3.2.5 doesn't display per-thread (/proc/<pid>/tash/<tid>/*) data. It displays information about main process ("parent" of threads) only. The support for separate tasks is new in procps-3.2.6. And it will be available in FC5. |