Description of problem:
My laptop was booted about 2 days 15 hours ago and since then has been in APM suspend mode for three overnight periods totalling about 30 hours 16 minutes (so it has actually been running for about 33 hours). Now the process start time reported by "ps" for new processes lags behind by about 30 hours 16 minutes. The system date is set correctly. The output for process 1 is also displayed correctly. The command "uptime" reports the real time elapsed since the machine booted.

On an old laptop still running 2.2.19 the reverse happens: newly started processes have the correct start time, while old processes are reported to have started later than they actually did. The "uptime" command reports the time since the machine booted excluding time spent in suspend mode. This behaviour is "better" because I prefer to see correct info for the currently running processes on the system, whereas for old processes the exact start time often doesn't matter. In an ideal world, all processes would be reported correctly.

Version-Release number of selected component (if applicable):
kernel-2.6.6-1.435
procps-3.2.0-1.1

How reproducible:
Always

Steps to Reproduce:
1. Boot Fedora Core 2 on laptop (acpi=off)
2. Place laptop in APM suspend for several hours
3. sh -c 'date;ps -o lstart $$'

Actual results:
Tue Jul 6 12:54:11 BST 2004
                 STARTED
Mon Jul 5 06:38:16 2004

Expected results:
The two timestamps shown should be more or less identical.

Additional info:
May or may not be related to bug 125372.
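As a quick check of the figures above, the gap between the two timestamps in "Actual results" can be computed directly. This is only a sketch: the helper name start_time_lag is made up for illustration, and the "BST" zone is dropped from the "date" output because "ps -o lstart" prints local time without one.

```python
from datetime import datetime

# Format of "ps -o lstart" output; the date(1) line is given here without its zone.
FMT = "%a %b %d %H:%M:%S %Y"

def start_time_lag(now_text, started_text):
    """Return how far the reported start time lags behind the current time."""
    now = datetime.strptime(now_text, FMT)
    started = datetime.strptime(started_text, FMT)
    return now - started

# Timestamps taken from the "Actual results" section of the report.
lag = start_time_lag("Tue Jul 6 12:54:11 2004", "Mon Jul 5 06:38:16 2004")
hours, rem = divmod(int(lag.total_seconds()), 3600)
minutes, seconds = divmod(rem, 60)
print(f"{hours}h {minutes}m {seconds}s")
```

The result is within a few seconds of the 30 hours 16 minutes the laptop spent suspended.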
Ooooh, that's a nasty one! I suppose that the kernel should update jiffies when the laptop wakes up. This would solve many problems. I suppose it would also have to ensure that the suspended time counts toward %idle. I don't see how procps could handle this alone or even help, so this is another mis-filed kernel problem.
procps could work around this by finding out the current jiffy count and working backwards (rather than working forwards from boot time), which I assume is what it did on the older box. But yes, it's probably better fixed in the kernel.
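The two approaches mentioned here can be compared with a small model. This is a sketch under stated assumptions, not the procps code: the function names are hypothetical, HZ = 100 is picked arbitrarily, and the model assumes jiffies simply stop advancing while the machine is suspended. The durations come from the original report (about 63 hours since boot, 30 hours 16 minutes of that suspended).

```python
HZ = 100  # assumed tick rate for this model only

def start_forward(boot_epoch, start_jiffies, hz=HZ):
    """Work forwards from boot: boot time plus the tick at which the process started."""
    return boot_epoch + start_jiffies / hz

def start_backward(now_epoch, now_jiffies, start_jiffies, hz=HZ):
    """Work backwards from now: current time minus the process age in ticks."""
    return now_epoch - (now_jiffies - start_jiffies) / hz

boot = 0                      # wall-clock second at which the machine booted
now = 63 * 3600               # 2 days 15 hours later
suspended = 30 * 3600 + 16 * 60
now_jiffies = (now - suspended) * HZ  # assumption: no ticks while suspended

# A process started just now, and process 1 started at boot.
new_proc_jiffies = now_jiffies
init_jiffies = 0

forward_new = start_forward(boot, new_proc_jiffies)
backward_new = start_backward(now, now_jiffies, new_proc_jiffies)
forward_init = start_forward(boot, init_jiffies)
backward_init = start_backward(now, now_jiffies, init_jiffies)

print("new process, forward error :", now - forward_new, "s too early")
print("new process, backward error:", now - backward_new, "s")
print("init, forward error        :", forward_init - boot, "s")
print("init, backward error       :", backward_init - boot, "s too late")
```

In this model, working forwards gets process 1 right and new processes wrong by the suspend time, while working backwards does the opposite, which matches the behaviour described for the two laptops.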
I'm not sure that this is related, but the bug title matches my problem, so.... I'm having a problem where ps shows process creation times that differ from the actual process start time. On some machines, the time shown by ps lags the actual time by a few minutes. On other machines, ps shows times in the future. (!)

Here's a dual Xeon system running FC2 2.6.6-1.435.2.3smp:

> date; ps aux|tail -2; date
Wed Jul 28 20:28:53 CDT 2004
tibbs     1519  0.0  0.0  3772  876 pts/11   R    20:31   0:00 ps aux
tibbs     1520  0.0  0.0  4584  488 pts/11   S    20:31   0:00 tail -2
Wed Jul 28 20:28:53 CDT 2004

Here's a dual Opteron system running FC2-x86_64 2.6.6-1.435.2.3smp:

> date; ps aux|tail -2; date
Wed Jul 28 20:28:28 CDT 2004
tibbs    12341  0.0  0.0  7308  912 pts/65   R    20:26   0:00 ps aux
tibbs    12342  0.0  0.0  5544  360 pts/65   R    20:26   0:00 tail -2
Wed Jul 28 20:28:28 CDT 2004

I'm not even sure if this is anything to worry about. I can't find any sign of this problem on any machine running FC1, but I don't have any FC1 machines running identical hardware.
Jason Tibbits has a mostly unrelated problem. It's caused by recent kernels having two ideas of the HZ value (for example, 1000 and 999.98703). I think the -mm tree has a fix for this, which involves using a real (non-jiffies) timestamp for process start times. The problem is also fixable by having the HZ to USER_HZ conversion handle the non-integer nature of the jiffies tick frequency. Fixing problems of this sort is like playing a whack-a-mole game, and has been ever since the POSIX timers code introduced sub-jiffy timing and an HZ that is not really HZ.
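To give a feel for the size of this effect, here is a rough sketch of the drift that results from counting ticks at one rate and converting them with another. The two HZ values are the ones quoted above; which side uses which value, and therefore the sign of the error, is an assumption of this model, and start_time_error is a made-up name.

```python
NOMINAL_HZ = 1000.0    # the value used to convert ticks back to seconds
ACTUAL_HZ = 999.98703  # the rate at which ticks really accumulate

def start_time_error(uptime_seconds, nominal_hz=NOMINAL_HZ, actual_hz=ACTUAL_HZ):
    """Seconds of error after converting ticks counted at actual_hz using nominal_hz."""
    ticks = uptime_seconds * actual_hz
    return uptime_seconds - ticks / nominal_hz

one_day = 24 * 3600
print(f"drift after one day of uptime: {start_time_error(one_day):.3f} s")
```

The error grows linearly with uptime in this model, so a long-running machine would show a larger offset than a freshly booted one.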
Pavel Machek posted a patch to linux-kernel for this.

-----------------------------------------
From: Pavel Machek (pavel)
Subject: swsusp: fix process start times after resume
Newsgroups: linux.kernel
Date: 2004-10-04 06:30:15 PST

Hi!

Currently, process start times change after swsusp (because they are derived from jiffies and current time, oops). This should fix it. Please apply,

Pavel

Index: linux/arch/i386/kernel/time.c
===================================================================
--- linux.orig/arch/i386/kernel/time.c 2004-10-01 12:24:26.000000000 +0200
+++ linux/arch/i386/kernel/time.c 2004-10-01 00:53:07.000000000 +0200
@@ -319,7 +319,7 @@
     return retval;
 }
 
-static long clock_cmos_diff;
+static long clock_cmos_diff, sleep_start;
 
 static int time_suspend(struct sys_device *dev, u32 state)
 {
@@ -328,6 +328,7 @@
      */
     clock_cmos_diff = -get_cmos_time();
     clock_cmos_diff += get_seconds();
+    sleep_start = get_cmos_time();
     return 0;
 }
 
@@ -335,10 +336,13 @@
 {
     unsigned long flags;
     unsigned long sec = get_cmos_time() + clock_cmos_diff;
+    unsigned long sleep_length = get_cmos_time() - sleep_start;
+
     write_seqlock_irqsave(&xtime_lock, flags);
     xtime.tv_sec = sec;
     xtime.tv_nsec = 0;
     write_sequnlock_irqrestore(&xtime_lock, flags);
+    jiffies += sleep_length * HZ;
     return 0;
 }
 
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!
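The effect of the patch can be illustrated with a toy userspace model. This is not kernel code: the class ToyClock and the function jiffies_after_cycle are hypothetical, HZ = 1000 is assumed, and only the names sleep_start and sleep_length are borrowed from the diff. The model treats the CMOS clock as the only thing that keeps running during suspend.

```python
HZ = 1000  # assumed tick rate for this model

class ToyClock:
    """Toy model: a CMOS clock that always runs and a jiffies counter that does not."""

    def __init__(self):
        self.cmos = 0         # seconds, keeps counting during suspend
        self.jiffies = 0      # ticks, only counted while awake
        self.sleep_start = 0

    def run(self, seconds):
        self.cmos += seconds
        self.jiffies += seconds * HZ

    def suspend(self):
        # Mirrors the line added to time_suspend(): remember when sleep began.
        self.sleep_start = self.cmos

    def sleep(self, seconds):
        self.cmos += seconds  # no ticks are counted while suspended

    def resume(self, patched):
        sleep_length = self.cmos - self.sleep_start
        if patched:
            # Mirrors the line added on resume: credit the slept time to jiffies.
            self.jiffies += sleep_length * HZ

def jiffies_after_cycle(patched):
    """Run one hour, sleep two hours, resume; return jiffies expressed in seconds."""
    clock = ToyClock()
    clock.run(3600)
    clock.suspend()
    clock.sleep(7200)
    clock.resume(patched)
    return clock.jiffies // HZ

print("unpatched:", jiffies_after_cycle(False), "s of jiffies after 10800 s of wall time")
print("patched:  ", jiffies_after_cycle(True), "s of jiffies after 10800 s of wall time")
```

With the adjustment applied, jiffies track wall-clock time across the suspend, so a start time derived from boot time plus jiffies no longer lags in this model.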
Regarding comment #2 by Ian Collier: The procps code is unchanged. If it were changed, it would start to fail for 2.4.xx kernels. Also, fixing this problem would lead to other problems. The times are simply inconsistent. Process start time and process lifetime ought to both work OK, along with %idle, uptime, boot time, and so on.
Is this problem any better with the latest updates?
Yes. Actually, now you mention it, it seems to have been fixed in 2.6.9. (APM is a lot healthier in 2.6.10 too, but that's another matter.)
Strike that last remark from the record. :-( "ps" seems fine though; you can close this.
Fedora Core 2 has now reached end of life, and no further updates will be provided by Red Hat. The Fedora legacy project will be producing further kernel updates for security problems only. If this bug has not been fixed in the latest Fedora Core 2 update kernel, please try to reproduce it under Fedora Core 3, and reopen if necessary, changing the product version accordingly. Thank you.