Bug 324081
Summary: Intermittent infinite loop in init_cacheinfo on Intel Pentium D
Product: [Fedora] Fedora
Component: glibc
Version: 7
Hardware: x86_64
OS: Linux
Status: CLOSED NOTABUG
Severity: high
Priority: low
Reporter: Steve Mead <steve.mead>
Assignee: Jakub Jelinek <jakub>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: drepper, hongjiu.lu
Doc Type: Bug Fix
Last Closed: 2007-10-16 11:38:49 UTC

Description
Steve Mead
2007-10-09 00:27:15 UTC
Created attachment 220381 [details]
Contents of /proc/cpuinfo
Can you please run the attached program?

Created attachment 220771 [details]
cacheinfo.c
After running the program multiple times, I see three distinct sets of output. The most common outputs are:

    shared 1048576 level 2 max_cpuid 5
    cpuid (4, 0) = 04000121
    cpuid (4, 1) = 04000143

And:

    shared 1048576 level 2 max_cpuid 3

Twice it has printed the following, where the second number inside the parentheses keeps incrementing until I hit CTRL-C:

    shared 1048576 level 2 max_cpuid 5
    cpuid (4, 0) = 04000121
    cpuid (4, 1) = 00000000
    cpuid (4, 2) = 00000000
    cpuid (4, 3) = 00000000
    ...

My suspicion is that this is a CPU bug. The program should not be able to print different values in different runs. The only thing that can vary in the code is the content of the other registers, but those (apart from eax and ecx) should be irrelevant. The return of max_cpuid == 3 is especially suspicious. I've asked Intel to look at this. In upstream glibc I've implemented a workaround.

Looking at the /proc/cpuinfo dump, CPU 0 has "cpuid level : 3" and no physical id/siblings/core id/cpu cores lines. So it makes sense: sometimes you get "shared 1048576 level 2 max_cpuid 3" when the process is scheduled on CPU 0, or "shared 1048576 level 2 max_cpuid 5" when the process is scheduled on CPU 1, and if it is rescheduled between that query and the following loop, it can loop forever. I wonder whether max_cpuid can be tweaked, e.g. in the BIOS, and the BIOS wrongly tweaks it only for the boot CPU and not the other CPUs; or of course it can be a CPU bug. Certainly, different CPUs having different cpuid levels means the cpuid insn is completely unusable in userland unless the process is pinned to just one CPU (but that is very much undesirable for libc initialization).

What does dmidecode report?

Created attachment 227861 [details]
Output of dmidecode
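
For readers following the diagnosis above, here is a minimal sketch of why a leaf-4 scan can spin forever and how a bounded scan avoids it. It is not the glibc code and not the attached cacheinfo.c; the names `leaf4_query_fn`, `find_cache_level`, `healthy_core` and `after_migration` are hypothetical. The cpuid instruction is replaced by callbacks that replay the EAX values quoted in this report, so the sketch assumes nothing about the machine it runs on. It relies on the documented layout of CPUID leaf 4: EAX bits 4:0 hold the cache type (0 means no more caches) and bits 7:5 hold the cache level.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for "cpuid with eax=4, ecx=subleaf"; returns EAX. */
typedef uint32_t (*leaf4_query_fn)(unsigned subleaf);

/* CPUID leaf 4: EAX[4:0] = cache type (0 = no more caches),
   EAX[7:5] = cache level. */
static unsigned cache_type(uint32_t eax)  { return eax & 0x1f; }
static unsigned cache_level(uint32_t eax) { return (eax >> 5) & 0x7; }

/* Return the subleaf that describes wanted_level, or -1 if the
   enumeration ends first. A loop that only tests the level would
   never terminate on an all-zero answer; the type check bounds it. */
static int find_cache_level(leaf4_query_fn query, unsigned wanted_level)
{
    assert(query != NULL);
    for (unsigned i = 0; ; i++) {
        uint32_t eax = query(i);
        if (cache_type(eax) == 0)
            return -1;              /* no more caches: give up */
        if (cache_level(eax) == wanted_level)
            return (int)i;
    }
}

/* Simulated core with cpuid level 5: values from the common output. */
static uint32_t healthy_core(unsigned subleaf)
{
    static const uint32_t eax[] = { 0x04000121u, 0x04000143u };
    return subleaf < 2 ? eax[subleaf] : 0;
}

/* Simulated hang case: first answer is valid, later ones are zero,
   as in the output that kept incrementing until CTRL-C. */
static uint32_t after_migration(unsigned subleaf)
{
    return subleaf == 0 ? 0x04000121u : 0;
}
```

With these simulated answers, `find_cache_level(healthy_core, 2)` finds the level 2 entry at subleaf 1, while `find_cache_level(after_migration, 2)` returns -1 instead of looping.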
You have BIOS A01. Can you try BIOS A03 at http://support.dell.com/support/downloads/download.aspx?c=us&l=en&s=gen&releaseid=R129670&SystemID=DIM_P4_9100&servicetag=&os=WW1&osl=en&deviceid=308&devlib=0&typecnt=0&vercnt=3&catid=-1&impid=-1&formatcnt=1&libid=1&fileid=172852

The BIOS upgrade seems to have fixed the problem. Now /proc/cpuinfo shows both cores with a cpuid level of 5, and the cacheinfo.c program reliably gives output of:

    shared 1048576 level 2 max_cpuid 5
    cpuid (4, 0) = 04000121
    cpuid (4, 1) = 04000143

I also ran a few stress tests and didn't see any hung processes. Thank you all for your help, and I'm sorry it was something as mundane as an old BIOS.

Closing. rawhide glibc has a workaround in case other people have a buggy BIOS, though of course it would be best if people upgraded their BIOSes to fixed ones.

May I pick rawhide glibc for testing? On F7, I mean.
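
For anyone who wants to check whether a machine shows the same symptom before and after a BIOS upgrade, the following is a rough sketch that compares the "cpuid level" lines in /proc/cpuinfo text. The function name `cpuid_levels_consistent` is hypothetical, and the sketch assumes the "cpuid level : N" line format quoted earlier in this report. It takes the text as a string, so reading the file is left to the caller.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if every "cpuid level" line in text carries the same value,
   0 if two lines disagree, and -1 if no such line is found. */
static int cpuid_levels_consistent(const char *text)
{
    static const char key[] = "cpuid level";
    int first = 0, seen = 0;

    assert(text != NULL);
    for (const char *line = text; *line != '\0'; ) {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);

        /* Only look at lines that start with the key. */
        if (len > sizeof key - 1 && strncmp(line, key, sizeof key - 1) == 0) {
            const char *colon = memchr(line, ':', len);
            int level;
            if (colon != NULL && sscanf(colon + 1, "%d", &level) == 1) {
                if (seen && level != first)
                    return 0;       /* cores report different levels */
                first = level;
                seen = 1;
            }
        }
        if (end == NULL)
            break;                  /* last line had no trailing newline */
        line = end + 1;
    }
    return seen ? 1 : -1;
}
```

Fed the situation described in this report (one core at level 3 and the other at level 5), the function returns 0; with both cores at level 5 it returns 1.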