Created attachment 446248
ps strace

Description of problem:
"Unknown HZ value! (93) Assume 1024" is printed by some procps commands (ps, uptime, ...). The 93 is sometimes 85.

Version-Release number of selected component (if applicable):
procps-3.2.8-11.fc14.x86_64
kernel-2.6.36-0.18.rc3.git1.fc15.x86_64

Attaching the strace of the issue, which has the full contents of /proc/stat and /proc/uptime at that time. As the uptime increased, the problem went away.
I can see: Unknown HZ value! (94) Assume 1024. Unknown HZ value! (94) Assume 1024. I think it appeared with procps-3.2.8-11.fc14.x86_64. I'm using latest Fedora 14.
I found this, which seems to describe the same problem: http://lkml.indiana.edu/hypermail/linux/kernel/0202.2/0403.html It's an awfully old thread though, so I'm not sure whether it's really the same issue.
I suspect this is more likely a kernel issue than a procps issue, re-assigning. dmesg output may be useful, and testing with 'nohz=off' and 'clocksource=acpi_pm' parameters. Do you see any problems that seem to be related to the message or is it just an odd message you noticed and thought to report? -- Fedora Bugzappers volunteer triage team https://fedoraproject.org/wiki/BugZappers
I'm not sure if it's a kernel problem, maybe yes, but maybe new procps just made it visible. I'm using 2.6.35.4-12.fc14.x86_64 and there were no such messages after installing this kernel.
I'm in the habit of shutting off my computer every night and booting up in the morning, so I'm confident that this message appeared as a result of a procps upgrade, not a kernel upgrade (I'm on 2.6.35.4-12.fc14.x86_64, fwiw). AFAICT, this is just an annoying message. The default ~/.bashrc sources /etc/bashrc, which ends up invoking ps at some point; that means that whenever I start up a new terminal, I see this error message printed out as the first thing in the terminal. Other than that minor annoyance, I haven't actually observed any real problems.
The fact that it arrived with a procps update doesn't mean it's a 'bug' in procps; it could just be a new informational message that it didn't print before. Given the content, which is talking about kernel timer ticks as I mentioned, I'm still pretty sure it has something to do with the kernel; it'd be nice if a kernel dev could look in :) Chuck?
(In reply to comment #3)
> I suspect this is more likely a kernel issue than a procps issue, re-assigning.
> dmesg output may be useful, and testing with 'nohz=off' and
> 'clocksource=acpi_pm' parameters. Do you see any problems that seem to be
> related to the message or is it just an odd message you noticed and thought to
> report?

Resetting needinfo? flag from comment #3, since Adam hasn't been answered yet.
Interestingly, the message seems to go away when my machine has been up for a day or so.
I found this, a fix for that problem from Aug 2008; it says it will be in the next procps but it's not. And it's not in ours either. http://www.mail-archive.com/debian-bugs-dist@lists.debian.org/msg553706.html
I'm still not sure what's going on there. Is it failing to find the AT_CLKTCK ELF note and falling back to the old hack? Or is it finding it with some strange value?
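One way to check the first half of that question from userspace is to read AT_CLKTCK directly. This is only a minimal sketch, assuming a glibc new enough to provide getauxval() (2.16 or later, so newer than the procps build discussed here, which uses its own find_elf_note()). The function names clk_tck_from_auxv and report_clk_tck are made up for illustration.

```c
#include <elf.h>        /* AT_CLKTCK */
#include <stdio.h>
#include <sys/auxv.h>   /* getauxval(), assumed available (glibc >= 2.16) */
#include <unistd.h>     /* sysconf() */

/* Returns the AT_CLKTCK entry of the auxiliary vector, or 0 if it is absent. */
unsigned long clk_tck_from_auxv(void)
{
    return getauxval(AT_CLKTCK);
}

/* Prints the auxv value next to sysconf(_SC_CLK_TCK) so they can be compared.
   Returns 0 if the note is present and both values agree, 1 otherwise. */
int report_clk_tck(void)
{
    unsigned long aux = clk_tck_from_auxv();
    long sc = sysconf(_SC_CLK_TCK);

    printf("AT_CLKTCK=%lu sysconf(_SC_CLK_TCK)=%ld\n", aux, sc);
    return (aux != 0 && (long)aux == sc) ? 0 : 1;
}
```

If report_clk_tck() prints a sane value, that would suggest the kernel is passing the note correctly and the odd number comes from the fallback path rather than from the note itself.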
From procps's proc/sysinfo.c:init_libproc():

    static void init_libproc(void){
      ...
      if(linux_version_code > LINUX_VERSION(2, 4, 0)){
        Hertz = find_elf_note(AT_CLKTCK);
        if(Hertz!=NOTE_NOT_FOUND) return;
        fputs("2.4+ kernel w/o ELF notes? -- report this\n", stderr);
      }
      old_Hertz_hack();
    }

If it did not detect the ELF note, it would print the message "2.4+ kernel w/o ELF notes? -- report this\n". So the linux_version_code test must have failed. How could it fail? The function is not called explicitly, it is just declared as a constructor:

    static void init_libproc(void) __attribute__((constructor));

And so is the function which sets linux_version_code:

    static void init_Linux_version(void) __attribute__((constructor));

I bet the constructors are called in an unexpected (to the author) order. Really, why depend on constructors instead of plain, simple, exactly defined function calls?
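The dependency between the two constructors can be sketched in isolation. This is not the procps code: version_code, hertz_source, init_version, init_hertz and VERSION_CODE are hypothetical stand-ins, and the priority arguments are a GCC extension used here only to pin the order so the example behaves the same on every build.

```c
#include <stdio.h>

/* Hypothetical stand-in for procps's LINUX_VERSION() macro. */
#define VERSION_CODE(x, y, z) (0x10000 * (x) + 0x100 * (y) + (z))

static int version_code;   /* stays 0 until init_version() has run */
static int hertz_source;   /* 1 = ELF note path taken, 2 = fallback taken */

/* With GCC, a lower priority number runs earlier; 0..100 are reserved. */
static void init_version(void) __attribute__((constructor(101)));
static void init_hertz(void)   __attribute__((constructor(102)));

static void init_version(void)
{
    version_code = VERSION_CODE(2, 6, 36);   /* pretend kernel 2.6.36 */
}

static void init_hertz(void)
{
    /* Same shape as the test in init_libproc(): the ELF note is only
       consulted if the version code is already known to be above 2.4.0. */
    if (version_code > VERSION_CODE(2, 4, 0))
        hertz_source = 1;
    else
        hertz_source = 2;   /* reached when version_code is still 0 */
}

/* Accessor so callers can see which path the constructor took. */
int hertz_source_used(void)
{
    return hertz_source;
}
```

With the priorities as written, hertz_source_used() returns 1. Swapping 101 and 102 makes init_hertz() run while version_code is still 0, so it returns 2, which is the silent fallback suspected above. Without any priorities, the order is whatever the toolchain and link order happen to produce.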
I'm not sure if this helps or not, but I've never seen this message on kernel-2.6.35.4-12. I do see it whenever I boot into kernel-2.6.35.4-28, and I'm pretty sure I saw it in the -25 kernel as well. This is on my F14 laptop. Gene
old_Hertz_hack() is simply buggy. For instance:
- It does not take iowait time into account, so if you have I/O load, it underestimates HZ.
- It does not account for the possibility of CPUs going offline - in such a case it overestimates HZ.

But the main point stands: old_Hertz_hack() should not be called *at all*. The bug is due to undefined execution order of constructors, which may change with linking order. A minimal fix would assign priorities to the constructors for their order to be defined. An optimal fix would get rid of the usage of constructors.
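To make the iowait point concrete, here is a small sketch of the kind of estimate the fallback performs. It is a simplification, not the procps function: estimate_hz is a hypothetical name, and the assumption is that the estimate is the sum of the user, nice, system and idle counters from the "cpu" line of /proc/stat, divided by the uptime in seconds and the number of CPUs. The figures in the note below are invented.

```c
#include <stdio.h>

/* Estimates HZ from the first four "cpu" counters of /proc/stat.
   Ticks spent in iowait are a separate counter and are deliberately not
   passed in, mirroring the omission described above. */
unsigned estimate_hz(unsigned long long user, unsigned long long nice,
                     unsigned long long sys, unsigned long long idle,
                     double uptime_seconds, unsigned ncpus)
{
    unsigned long long jiffies = user + nice + sys + idle;

    return (unsigned)((double)jiffies / uptime_seconds / ncpus);
}
```

Take one CPU ticking at 100 Hz that has been up for 1000 seconds, so 100000 ticks in total, of which 6000 were iowait during boot. Then estimate_hz(5000, 0, 3000, 86000, 1000.0, 1) gives 94, the sort of value seen in this report. If those same 6000 iowait ticks are spread over 100000 seconds of uptime, estimate_hz(500000, 0, 300000, 9194000, 100000.0, 1) gives 99. Assuming the fallback only accepts values close to a known HZ, that would be consistent with the message going away after the machine has been up for a day or so.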
BTW, I filed bug 635607 for make, as I believe the link order change is unexpected in GNU make 3.82. But procps has to be fixed in any case.
*** Bug 636738 has been marked as a duplicate of this bug. ***
I found this: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=460331 but my English is not good.
Thank you for your comments. This issue should be fixed in procps-3.2.8-12.fc15 by a slightly modified Debian patch. Could you try it, please? I am not able to reproduce this bug.
procps-3.2.8-12.fc15 works fine for me and the patch looks fine.
Ok, closing.
It also affects F-14 (procps-3.2.8-11.fc14). Please issue an update. Thanks.
procps-3.2.8-12.fc14 has been submitted as an update for Fedora 14. https://admin.fedoraproject.org/updates/procps-3.2.8-12.fc14
procps-3.2.8-12.fc14 has been pushed to the Fedora 14 stable repository. If problems still persist, please make note of it in this bug report.