From Bugzilla Helper:
User-Agent: Mozilla/4.79 [en] (X11; U; SunOS 5.8 sun4u)

Description of problem:
Incorrect date causes automounter to stop working. At that time if you type "date" you will see the date of:

Wed Aug 4 09:01:29 PST 586562

The month and day will change as our calendar day moves forward, but the year is always consistent with the reported one above. If you look in /var/log/messages, you will see the time stamp of the time above. However, if you run hwclock, the date is correct.

The correct date of the machine is: Tue Feb 3 09:01:58 PST 2004
The reported date of the machine is: Wed Aug 4 09:01:29 PST 586562

Version-Release number of selected component (if applicable):
coreutils-4.5.3-26

How reproducible:
Sometimes

Steps to Reproduce:
1. See the automounter no longer mounts NFS mount points
2. Run the "/bin/date" command and you will see the skewed date
3. Run the "/sbin/hwclock" command to verify correct date

Additional info:
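The check in the reproduction steps (is the system date wildly off?) could be automated with something like the following. This is only a rough sketch: the function names and the 2000..2100 plausibility window are assumptions made for illustration, and the year is estimated arithmetically because Python's datetime cannot represent years past 9999.

```python
import time

# Mean Gregorian year in seconds (365.2425 days); good enough for a sanity check.
SECONDS_PER_YEAR = 31_556_952


def approx_year(epoch_seconds):
    """Rough calendar year for a Unix timestamp, usable far beyond year 9999."""
    return 1970 + int(epoch_seconds // SECONDS_PER_YEAR)


def clock_is_plausible(epoch_seconds, min_year=2000, max_year=2100):
    """True if the timestamp falls inside the (assumed) plausible window."""
    return min_year <= approx_year(epoch_seconds) <= max_year


if __name__ == "__main__":
    now = time.time()  # system (kernel) time, the same clock "date" reports
    print(approx_year(now), clock_is_plausible(now))
```

On an affected machine a check like this would be expected to report an implausible year while hwclock still shows the right date.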
If it is causing other problems, the error isn't in 'date' but in the kernel's time-keeping.
Could you tell me what platform(s) this problem is showing up on, and attach boot logs from these system(s)? I suspect that the problem is one that we saw some time ago with TSC timer handling and that we thought we had fixed. The reason we may not have tripped over it until now is that nearly all current AMD64 systems have HPET timers, which have been working correctly (and far more accurately) as far as we know. If the RHEL kernel is not recognizing the HPET timer on your system, I want to know why :) If it turns out that the TSC bug is still not fixed, then getting the HPET working will solve your immediate problem. If it turns out to be a problem in HPET handling (unlikely) then we'll have to address that ASAP; in either case console logs will help enormously in finding out exactly what's going on.
After reviewing some logs sent to me by Mentor: It's interesting that you're seeing the same thing with both SLES8 and RHEL3... that suggests a rather fundamental problem. When I look at roach.log, one thing jumps out at me: with only a couple of exceptions, every time ntpdate runs it believes that the clock is slow by either 4293 seconds (71 min. 33 sec) or about 140,466,900 seconds (1625 days, 18 hours, 16 min, 40 sec). The quantum jumps up into the 587th millennium are occurring because somehow the pattern of the 4293-second update is showing up in the *high-order* longword of the 64-bit time_t. For any dates that we mortals might see in our lifetimes, the high-order longword should always be zero. I'm curious as to why the date seems to be reported as so consistently off, even with updates. I notice that the strange jumps occur right when ntpdate is run, too. This suggests to me that there might be an issue with *updating* the date, rather than timekeeping per se. Frequent updating of the date along with lots of reading thereof might trigger a latent race condition. I'll keep investigating.
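To make the high-order longword argument concrete, here is a small arithmetic sketch. It assumes, purely for illustration, that the value 4293 ends up in the upper 32 bits of an otherwise correct 64-bit time_t; the timestamp constant and function names are not taken from the logs, and the year is estimated with a mean-year approximation.

```python
# Mean Gregorian year in seconds (365.2425 days); approximate by design.
SECONDS_PER_YEAR = 31_556_952


def split_time_t(t):
    """Return the (high, low) 32-bit halves of a 64-bit time_t value."""
    return (t >> 32) & 0xFFFFFFFF, t & 0xFFFFFFFF


def approx_year(t):
    """Rough calendar year for a Unix timestamp of any magnitude."""
    return 1970 + t // SECONDS_PER_YEAR


# Roughly Tue Feb 3 09:01:58 PST 2004 as seconds since the epoch.
correct = 1_075_827_718

# Assumption: the 4293-second offset lands in the high-order longword.
corrupt = (4293 << 32) | correct

print(split_time_t(correct))  # high half is zero for any present-day date
print(split_time_t(corrupt))  # high half now carries 4293
print(approx_year(corrupt))   # a year in the 586,000s
```

Under that assumption the low longword is untouched, which fits the observation that the time of day stays close to correct while the year is wildly wrong. It does not reproduce the exact reported year.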
Update from customer contact: booting with the "notsc" parameter so as to use only the HPET timer seems to be a good workaround; stability testing is continuing. I still want to get to the bottom of this, though.
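For anyone verifying that the workaround is actually in effect, the kernel command line can be inspected after boot. A minimal sketch, assuming a Linux system that exposes /proc/cmdline; the helper names are hypothetical:

```python
import os


def has_boot_param(cmdline, param="notsc"):
    """True if param appears as a standalone word on the kernel command line."""
    return param in cmdline.split()


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read()


if __name__ == "__main__":
    # Only attempt the read where the proc file exists (i.e. on Linux).
    if os.path.exists("/proc/cmdline"):
        print("notsc active:", has_boot_param(read_cmdline()))
```

This only confirms that the parameter was passed at boot; it says nothing about which timer the kernel ended up using.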
A work-around for this problem has just been committed to the RHEL 3 U2 patch pool (in the internal Engineering build of kernel version 2.4.21-9.17.EL). I'll leave this bug in "assigned" state so that Jim can pursue a final resolution in the future.
An erratum has been issued which should help with the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2004-188.html