Bug 632236

Summary: confused old_Hertz_hack - Unknown HZ value! (93) Assume 1024
Product: [Fedora] Fedora
Component: procps
Version: rawhide
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: low
Reporter: Yanko Kaneti <yaneti>
Assignee: Jan Görig <jgorig>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: anton, aquini, awilliam, cyrusyzgtt, dougsland, evan, gansalmon, genes1122, h1k6zn2m, itamar, jgorig, jonathan, kernel-maint, madhu.chinakonda, maurizio.antillon, mcepl, misek, mschmidt, notting, nphilipp, orion, stephent98, tomek
Fixed In Version: procps-3.2.8-12.fc14
Doc Type: Bug Fix
Last Closed: 2010-09-23 15:03:55 UTC
Attachments: ps strace

Description Yanko Kaneti 2010-09-09 13:31:18 UTC
Created attachment 446248 [details]
ps strace

Description of problem:
"Unknown HZ value! (93) Assume 1024" on some procps commands (ps,uptime..)

93 is sometimes 85

Version-Release number of selected component (if applicable):
procps-3.2.8-11.fc14.x86_64
kernel-2.6.36-0.18.rc3.git1.fc15.x86_64

Attaching the strace of the issue, which has the full contents of /proc/stat and /proc/uptime at that time. As the uptime increased, the problem went away.

Comment 1 Vaclav "sHINOBI" Misek 2010-09-10 18:39:12 UTC
I can see: Unknown HZ value! (94) Assume 1024.

Unknown HZ value! (94) Assume 1024.

I think it appeared with procps-3.2.8-11.fc14.x86_64. I'm using latest Fedora 14.

Comment 2 Evan Klitzke 2010-09-10 20:03:43 UTC
I found this, which seems to describe the same problem: http://lkml.indiana.edu/hypermail/linux/kernel/0202.2/0403.html

It's an awfully old thread though, so I'm not sure whether it's really the same issue.

Comment 3 Adam Williamson 2010-09-12 19:20:36 UTC
I suspect this is more likely a kernel issue than a procps issue, re-assigning. dmesg output may be useful, and testing with 'nohz=off' and 'clocksource=acpi_pm' parameters. Do you see any problems that seem to be related to the message or is it just an odd message you noticed and thought to report?



-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 4 Vaclav "sHINOBI" Misek 2010-09-12 21:02:47 UTC
I'm not sure if it's a kernel problem, maybe yes, but maybe new procps just made it visible. I'm using 2.6.35.4-12.fc14.x86_64 and there were no such messages after installing this kernel.

Comment 5 Adam Williamson 2010-09-12 22:01:41 UTC


Comment 6 Evan Klitzke 2010-09-13 03:14:11 UTC
I'm in the habit of shutting off my computer every night, and booting up in the morning, so I'm confident that this message happened as a result of a procps upgrade, not a kernel upgrade (I'm on 2.6.35.4-12.fc14.x86_64, fwiw). AFAICT, this is just an annoying message.

The default ~/.bashrc sources /etc/bashrc, which ends up invoking ps at some point; that means that whenever I start up a new terminal, I see this error message printed out as the first thing in the terminal. Other than that minor annoyance, I haven't actually observed any real problems.

Comment 7 Adam Williamson 2010-09-13 05:47:35 UTC
The fact that it arrived with a procps update doesn't mean it's a 'bug' in procps; it could just be a new informational message that it didn't print before. Given the content, which is talking about kernel timer ticks as I mentioned, I'm still pretty sure it has something to do with the kernel; it'd be nice if a kernel dev could look in :) Chuck?




Comment 8 Jonathan Kamens 2010-09-13 17:57:39 UTC
(In reply to comment #3)
> I suspect this is more likely a kernel issue than a procps issue, re-assigning.
> dmesg output may be useful, and testing with 'nohz=off' and
> 'clocksource=acpi_pm' parameters. Do you see any problems that seem to be
> related to the message or is it just an odd message you noticed and thought to
> report?

Resetting needinfo? flag from comment #3, since Adam hasn't been answered yet.

Comment 9 Jonathan Kamens 2010-09-13 17:58:15 UTC
Interestingly, the message seems to go away when my machine has been up for a day or so.

Comment 10 Chuck Ebbert 2010-09-14 15:56:16 UTC
I found this, a fix for that problem from Aug 2008; it says it will be in the next procps but it's not. And it's not in ours either.

http://www.mail-archive.com/debian-bugs-dist@lists.debian.org/msg553706.html

Comment 11 Chuck Ebbert 2010-09-14 16:05:31 UTC
I'm still not sure what's going on there. Is it failing to find the AT_CLKTCK ELF note and falling back to the old hack? Or is it finding it with some strange value?
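
For reference, a minimal sketch of how the clock tick value can be queried on a current system. It assumes glibc 2.16 or later, where getauxval() is available; that call did not exist when this bug was filed, and procps's own find_elf_note() is not shown in this thread. The helper names hertz_from_auxv and hertz_from_sysconf are made up for illustration.

```c
#include <elf.h>        /* AT_CLKTCK */
#include <sys/auxv.h>   /* getauxval(), glibc 2.16 or later */
#include <unistd.h>     /* sysconf() */

/* Tick rate as passed by the kernel in the ELF auxiliary vector.
 * getauxval() returns 0 if the entry is not present. */
unsigned long hertz_from_auxv(void)
{
    return getauxval(AT_CLKTCK);
}

/* POSIX way of asking for the same number. */
long hertz_from_sysconf(void)
{
    return sysconf(_SC_CLK_TCK);
}
```

A zero from hertz_from_auxv() would correspond to the "note not found" case asked about above; any other value is what the kernel reported.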

Comment 12 Michal Schmidt 2010-09-17 12:54:46 UTC
From procps's proc/sysinfo.c:init_libproc():

static void init_libproc(void){
...
  if(linux_version_code > LINUX_VERSION(2, 4, 0)){ 
    Hertz = find_elf_note(AT_CLKTCK);
    if(Hertz!=NOTE_NOT_FOUND) return;
    fputs("2.4+ kernel w/o ELF notes? -- report this\n", stderr);
  }
  old_Hertz_hack();
}

If it did not detect the ELF note, it would print the message "2.4+ kernel w/o ELF notes? -- report this\n".
So the linux_version_code test must have failed. How could it fail? The function is not called explicitly, it is just declared as a constructor:

static void init_libproc(void) __attribute__((constructor));

And so is the function which sets linux_version_code:

static void init_Linux_version(void) __attribute__((constructor));

I bet the constructors are called in an unexpected (to the author) order.

Really, why depend on constructors instead of plain simple exactly defined function calls?
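
A rough sketch of what "plain function calls" could look like, not the procps code and not the patch that was eventually applied. The names init_linux_version, init_libproc and get_hertz, the LINUX_VERSION macro as written here, and the use of sysconf() as a stand-in for the ELF note lookup are all assumptions made for the example.

```c
#include <stdio.h>
#include <sys/utsname.h>
#include <unistd.h>

/* Illustrative encoding of a kernel version into one integer. */
#define LINUX_VERSION(x, y, z) (0x10000 * (x) + 0x100 * (y) + (z))

static int linux_version_code;   /* stays 0 until init_linux_version() runs */
static long hertz;

static void init_linux_version(void)
{
    struct utsname u;
    int x = 0, y = 0, z = 0;

    /* Parse "major.minor[.patch]" from the release string. */
    if (uname(&u) != 0 || sscanf(u.release, "%d.%d.%d", &x, &y, &z) < 2)
        x = y = z = 0;           /* unparsable: treat as unknown */
    linux_version_code = LINUX_VERSION(x, y, z);
}

static void init_libproc(void)
{
    init_linux_version();        /* explicit call, so the order is fixed */
    if (linux_version_code > LINUX_VERSION(2, 4, 0))
        hertz = sysconf(_SC_CLK_TCK);   /* stand-in for the ELF note lookup */
    else
        hertz = -1;                     /* stand-in for the old fallback path */
}

/* Callers go through this accessor; initialisation happens on first use. */
long get_hertz(void)
{
    static int initialised;

    if (!initialised) {
        init_libproc();
        initialised = 1;
    }
    return hertz;
}
```

The guard in get_hertz() is not thread-safe as written; it only shows that the dependency between the two init steps is visible in the code instead of being left to the linker.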

Comment 13 Gene Snider 2010-09-18 19:52:25 UTC
I'm not sure if this helps or not, but I've never seen this message on kernel-2.6.35.4-12.  I do see it whenever I boot into kernel-2.6.35.4-28, and I'm pretty sure I saw it in the -25 kernel as well.  This is on my F14 laptop.

Gene

Comment 14 Michal Schmidt 2010-09-20 09:50:28 UTC
old_Hertz_hack() is simply buggy. For instance:
 - It does not take iowait time into account, so if you have I/O load,
   it underestimates HZ.
 - It does not account for the possibility of CPUs going offline -
   in such a case it overestimates HZ.
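
To make the two points concrete, here is a simplified model of such an estimate: jiffies summed from the "cpu" line of /proc/stat, divided by uptime and by the number of CPUs. It is consistent with the two points above but is not the procps source, and all figures are made up for illustration.

```c
/* Simplified model, not the procps implementation. */
unsigned estimate_hz(unsigned long long jiffies, double uptime_seconds,
                     unsigned ncpus)
{
    return (unsigned)((double)jiffies / uptime_seconds / ncpus);
}

/* Hypothetical machine: 1 CPU, real HZ = 100, up for 1000 s, so 100000
 * jiffies in total: user 20000, nice 0, system 5000, idle 68000,
 * iowait 7000. */
unsigned hz_ignoring_iowait(void)
{
    return estimate_hz(20000ULL + 0 + 5000 + 68000, 1000.0, 1);          /* 93 */
}

unsigned hz_counting_iowait(void)
{
    return estimate_hz(20000ULL + 0 + 5000 + 68000 + 7000, 1000.0, 1);   /* 100 */
}

/* Hypothetical: two CPUs ticked for 1000 s (200000 jiffies), but only
 * one CPU is online when the estimate is made. */
unsigned hz_after_cpu_offline(void)
{
    return estimate_hz(200000ULL, 1000.0, 1);                            /* 200 */
}
```

In this model, leaving out the iowait jiffies pulls the result below the real value, and dividing by too few CPUs pushes it above.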

But the main point stands: old_Hertz_hack() should not be called *at all*.
The bug is due to undefined execution order of constructors,
which may change with linking order.
A minimal fix would assign priorities to the constructors for their order
to be defined. An optimal fix would get rid of the usage of constructors.
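
A sketch of the "minimal fix" idea using GCC's constructor priorities, where a lower number runs earlier and 0 to 100 are reserved for the implementation. The variable and function names and the hard-coded version value are placeholders, and this is not necessarily how the eventual patch works.

```c
/* Illustrative encoding of a kernel version into one integer. */
#define LINUX_VERSION(x, y, z) (0x10000 * (x) + 0x100 * (y) + (z))

static int version_code;                 /* 0 until init_version() has run */
static int version_seen_by_libproc = -1; /* what init_libproc() observed */

/* Priority 101 runs before priority 102, regardless of link order. */
static void init_version(void) __attribute__((constructor(101)));
static void init_libproc(void) __attribute__((constructor(102)));

/* Defined "in the wrong order" on purpose; the priorities decide. */
static void init_libproc(void)
{
    version_seen_by_libproc = version_code;
}

static void init_version(void)
{
    version_code = LINUX_VERSION(2, 6, 36);   /* placeholder value */
}

/* Accessor so callers can check what the second constructor saw. */
int libproc_saw_version(void)
{
    return version_seen_by_libproc;
}
```

With the priorities in place, libproc_saw_version() returns the initialised version code by the time main() starts. Without them, the value could be 0 depending on the order in which the constructors happen to run.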

Comment 15 Michal Schmidt 2010-09-20 10:24:25 UTC
BTW, I filed bug 635607 for make, as I believe the link order change is unexpected in GNU make 3.82.
But procps has to be fixed in any case.

Comment 16 Roman Rakus 2010-09-23 08:48:27 UTC
*** Bug 636738 has been marked as a duplicate of this bug. ***

Comment 17 cyrushmh 2010-09-23 08:59:03 UTC
I found this http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=460331
but my En not good....

Comment 18 Jan Görig 2010-09-23 14:23:21 UTC
Thank you for your comments. This issue should be fixed in procps-3.2.8-12.fc15 by a slightly modified Debian patch. Could you try it, please? I am not able to reproduce this bug.

Comment 19 Michal Schmidt 2010-09-23 14:55:44 UTC
procps-3.2.8-12.fc15 works fine for me and the patch looks fine.

Comment 20 Jan Görig 2010-09-23 15:03:55 UTC
Ok, closing.

Comment 21 Michal Schmidt 2010-09-23 15:47:19 UTC
It also affects F-14 (procps-3.2.8-11.fc14). Please issue an update. Thanks.

Comment 22 Fedora Update System 2010-09-23 16:41:12 UTC
procps-3.2.8-12.fc14 has been submitted as an update for Fedora 14.
https://admin.fedoraproject.org/updates/procps-3.2.8-12.fc14

Comment 23 Fedora Update System 2010-09-25 05:32:23 UTC
procps-3.2.8-12.fc14 has been pushed to the Fedora 14 stable repository.  If problems still persist, please make note of it in this bug report.