Description of problem:

The process accounting code uses the current thread data to calculate run time, and eventually the start time (as now - elapsed, to allow for time changes). For anything other than the group leader the start_time is not correct.

Current code in kernel/acct.c has something like

    /* calculate run_time in nsec*/
    do_posix_clock_monotonic_gettime(&uptime);
    run_time = (u64)uptime.tv_sec*NSEC_PER_SEC + uptime.tv_nsec;
    run_time -= (u64)current->start_time.tv_sec*NSEC_PER_SEC
                + current->start_time.tv_nsec;

later kernels change this to

    /* calculate run_time in nsec*/
    do_posix_clock_monotonic_gettime(&uptime);
    run_time = (u64)uptime.tv_sec*NSEC_PER_SEC + uptime.tv_nsec;
    run_time -= (u64)current->group_leader->start_time.tv_sec * NSEC_PER_SEC
                + current->group_leader->start_time.tv_nsec;
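To make the difference between the two calculations concrete, here is a small user-space model. It is only a sketch, not kernel code: `struct model_timespec`, `struct model_task`, `ts_to_ns()`, `run_time_thread()` and `run_time_leader()` are hypothetical names invented for illustration, and the model assumes nothing beyond the two subtractions quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical stand-ins for the kernel's timespec and task_struct. */
struct model_timespec {
    int64_t tv_sec;
    int64_t tv_nsec;
};

struct model_task {
    struct model_timespec start_time;  /* when this thread was created */
    struct model_task *group_leader;   /* first thread of the process */
};

/* Convert a (sec, nsec) pair to nanoseconds, as the quoted code does. */
static uint64_t ts_to_ns(struct model_timespec ts)
{
    return (uint64_t)ts.tv_sec * NSEC_PER_SEC + (uint64_t)ts.tv_nsec;
}

/* Older calculation: elapsed time since the exiting thread started. */
uint64_t run_time_thread(const struct model_task *t,
                         struct model_timespec uptime)
{
    return ts_to_ns(uptime) - ts_to_ns(t->start_time);
}

/* Later calculation: elapsed time since the group leader started. */
uint64_t run_time_leader(const struct model_task *t,
                         struct model_timespec uptime)
{
    assert(t->group_leader != NULL);
    return ts_to_ns(uptime) - ts_to_ns(t->group_leader->start_time);
}
```

For the group leader both functions return the same value; for any other thread the first one reports only that thread's lifetime, so a start time derived as now - elapsed lands later than the process actually started.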
Well, the subject is not right: ->start_time is initialized correctly for any thread. Yes, later kernels use ->group_leader->start_time. But RHEL4 can't do this because it calls acct_process() per thread, not per process, and that per-thread accounting is what is actually wrong. The real fix should first change do_exit() to call acct_process() only if group_dead is true; this change is obviously correct. Unfortunately, it is also very much user-visible and can confuse the current users of BSD accounting.
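The proposed do_exit() change can be sketched in user space as follows. This is a simplified model, not the kernel's do_exit(): `struct model_signal`, `struct exiting_task`, `model_acct_process()` and `model_do_exit()` are hypothetical names, and the model assumes that group_dead simply means "the last live thread of the group is exiting".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-process state shared by all threads of a group. */
struct model_signal {
    int live;          /* threads in the group that have not exited yet */
    int acct_records;  /* accounting records written for the group */
};

struct exiting_task {
    struct model_signal *signal;
};

/* Stand-in for acct_process(): just count the records it would write. */
static void model_acct_process(struct exiting_task *tsk)
{
    tsk->signal->acct_records++;
}

/* Write an accounting record only when the whole group is dead. */
bool model_do_exit(struct exiting_task *tsk)
{
    assert(tsk->signal->live > 0);

    bool group_dead = (--tsk->signal->live == 0);
    if (group_dead)
        model_acct_process(tsk);

    return group_dead;
}
```

With three threads in a group, the first two exits write nothing and only the last one produces a record. That is exactly the user-visible difference: existing consumers of the accounting file currently see one record per thread.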
Correcting this would imply too large a change in the userspace interface.