Red Hat Bugzilla – Bug 166437
Frequent SMP kernel lockups with Athlon X2
Last modified: 2015-01-04 17:21:36 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.7.10) Gecko/20050720 Fedora/1.0.6-1.1.fc4 Firefox/1.0.6
Description of problem:
New Athlon X2 system runs stably under uniprocessor kernel but experiences random SMP kernel failures (lockups) after roughly 5-10 minutes of activity. Reset switch is the only remedy.
Related bugs suspect powernow-k8, asus-acpi, etc. but I don't see a pattern yet.
This ASUS A8N-SLI system uses the nVidia nForce 4 SLI chipset and a 7800GTX PCI-e video card. Whether SMP or not, the kernel complains that the aperture is too small (it suggests enabling the IOMMU option in the BIOS, at a cost of 64 MB of RAM) and reports a BIOS error (no PSB nor ACPI _PSS objects).
Version-Release number of selected component (if applicable):
There is one potential fix for powernow-k8 in the latest errata kernel currently
in updates-testing. Give that a try.
I'll try, but my current SMP kernel (kernel-smp-2.6.12-1.1398_FC4) locks up even
after disabling the cpuspeed service and rebooting. Lockups seem related to
system load, but the kernel doesn't provide any information.
FWIW, I went straight to kernel-smp-2.6.12-1.1435_FC4 in updates-testing when I
swapped in my new ASUS A8N-SLI Premium (BIOS 1005) and X2 3800+ CPU yesterday
and it's stable, but power management doesn't work at all:
powernow-k8: MP systems not supported by PSB BIOS structure
powernow-k8: MP systems not supported by PSB BIOS structure
Interestingly, my APC UPS load meter says that system power consumption while
idle isn't much higher than my previous Athlon 64 3000+ with working power
The only problem is that Sun Java 1.5.0_04 crashes, which it didn't do on the
1398 uniprocessor kernel. That could be a Sun bug though. Everything else
seems to work.
The issue on my system is a BIOS setting which moves the memory block usually
obscured by PCI above the 4GB mark, assuming that the OS supports PAE. The BIOS
does this in SW, and on E0+ revision CPUs, also in HW. The benefit is that all
4GB can be used despite the PCI hole.
My Linux kernel seems to be OK with this in uniprocessor mode but not in SMP
mode. There may be a memory setup bug in the SMP kernel. BTW, is
CONFIG_HIGHMEM64G gone in x86-64 kernels?
The stream benchmark uses a large static array. Arrays of 114 MB and smaller don't
lock up my SMP kernel. Arrays of 229 MB and larger do. Turning off memory
remapping in BIOS restores proper operation.
This BIOS change can only be a temporary fix, because instead of 4GB, the OS
now gets only 3GB. Linux SMP memory handling needs to be fixed to allow BIOS
memory remapping so that the full 4GB is usable.
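For anyone who wants to repeat the array-size test without building the stream benchmark, here is a rough stand-in. It is only a sketch and not the stream benchmark itself: the function name touch_memory is made up for illustration, and it assumes that simply allocating a block and writing one byte per page is enough to force the kernel to back the whole block with physical memory.

```python
import mmap

PAGE = mmap.PAGESIZE  # page size reported by the running system


def touch_memory(size_mb):
    """Allocate size_mb MiB and write one byte per page.

    Hypothetical helper, not part of the stream benchmark. Returns the
    number of pages that were written.
    """
    size = size_mb * 1024 * 1024
    buf = bytearray(size)              # zero-filled allocation
    for offset in range(0, size, PAGE):
        buf[offset] = 1                # writing forces the page to be backed
    return sum(buf[::PAGE])            # count the pages marked above
```

Calling touch_memory(114) and then touch_memory(229) mirrors the two sizes mentioned above. Whether this triggers the same lockup as the stream benchmark is untested.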
Interesting. I only have 1GB RAM and I bet you have more? I remember seeing
that remapping BIOS setting and I'm pretty sure it defaulted to On.
I just installed kernel 1447. Power management still doesn't work but Java
doesn't blow up anymore.
Kernel 1447 doesn't help -- the SMP kernel still locks up when running a 229MB
stream benchmark with memory remapping enabled in BIOS (to see all 4GB of RAM).
With memory remapping disabled in BIOS, the SMP kernel doesn't lock up, but
only 3GB (out of installed 4GB) are visible.
This affects only the SMP kernel. The uniprocessor kernel runs fine with memory
remapping enabled in BIOS and uses all 4GB.
This bug affects only Fedora kernels. Freshly compiled kernel 184.108.40.206 from
kernel.org runs fine in SMP mode and sees all 4GB of RAM with memory remapping
enabled in BIOS.
Could this be some kind of Fedora-specific SMP kernel configuration bug?
> Freshly compiled kernel 220.127.116.11 from kernel.org
What if you build it using Fedora's /boot/config-VERSION?
Kernel 18.104.22.168 from kernel.org built with Fedora's
/boot/config-2.6.12-1.1456_FC4smp runs OK, but of course "make oldconfig" had to
add a number of new configuration options. I took the default on all of them,
including memory related ones (*_DISCONTIGMEM_*).
Hypothesis: Fedora's 2.6.12-1.1456_FC4smp kernel had a memory handling bug and
should be updated (e.g. to 22.214.171.124 base from kernel.org), but the configuration
You're in luck: 2.6.13-1.1526_FC4 just got pushed out, which is based on 126.96.36.199.
Please give it a try, and let me know if that works.
Will do. Meanwhile, kernel 2.6.14-rc2 from kernel.org *again* has the lockup
problem. This may be an IOMMU issue. With 188.8.131.52, I see (note that this
system doesn't have AGP, it's PCI-Express only, and there are no IOMMU options
in its BIOS):
Linux version 184.108.40.206_FC4smp [...]
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009e800 (usable)
BIOS-e820: 000000000009e800 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 00000000bfff0000 (usable)
BIOS-e820: 00000000bfff0000 - 00000000bfff3000 (ACPI NVS)
BIOS-e820: 00000000bfff3000 - 00000000c0000000 (ACPI data)
BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
BIOS-e820: 0000000100000000 - 0000000140000000 (usable)
CPU 0: aperture @ 1a60000000 size 32 MB
Aperture from northbridge cpu 0 too small (32 MB)
No AGP bridge found
Your BIOS doesn't leave a aperture memory hole
Please enable the IOMMU option in the BIOS setup
This costs you 64 MB of RAM
Mapping aperture over 65536 KB of RAM @ 8000000
PCI-DMA: Disabling AGP.
PCI-DMA: aperture base @ 8000000 size 65536 KB
PCI-DMA: Reserving 64MB of IOMMU area in the AGP aperture
...and the above works fine. Switching to kernel 2.6.14-rc2, I get a total
lockup about 30 seconds after boot when PCI memory remapping is enabled in BIOS.
This kernel boots only if BIOS remapping is disabled, but then I see only 3GB
(not 4GB) of RAM. The IOMMU related messages become:
Linux version 2.6.14-rc2_smp
PCI-DMA: Disabling IOMMU.
Note that CONFIG_GART_IOMMU=y was used in both kernels (the second kernel had
fewer configuration options turned on).
ASUS BIOS 1007 doesn't have any IOMMU options, but the first example used PCI
memory remapping, while the second one crashed unless remapping was off. I
believe that AMD's IOMMU is needed to see full 4GB, so the fact that it is
disabled in the second example is bad news.
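As a cross-check on the e820 map quoted earlier in this comment, the usable ranges can be summed to see how much RAM the BIOS reports and how much of it sits above the 4GB mark. This is a minimal sketch: the names E820_LOG and usable_bytes are invented for illustration, and the regular expression assumes the "BIOS-e820: start - end (type)" line format shown in the log.

```python
import re

# e820 lines copied from the boot log quoted in this comment
E820_LOG = """\
BIOS-e820: 0000000000000000 - 000000000009e800 (usable)
BIOS-e820: 000000000009e800 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 00000000bfff0000 (usable)
BIOS-e820: 00000000bfff0000 - 00000000bfff3000 (ACPI NVS)
BIOS-e820: 00000000bfff3000 - 00000000c0000000 (ACPI data)
BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
BIOS-e820: 0000000100000000 - 0000000140000000 (usable)
"""

LINE = re.compile(r"BIOS-e820: ([0-9a-f]+) - ([0-9a-f]+) \((.+)\)")
FOUR_GB = 1 << 32


def usable_bytes(log):
    """Return (total usable bytes, usable bytes at or above 4GB)."""
    total = high = 0
    for start, end, kind in LINE.findall(log):
        if kind != "usable":
            continue
        lo, hi = int(start, 16), int(end, 16)
        total += hi - lo
        # only the part of the range at or above 4GB counts as "high"
        high += max(0, hi - max(lo, FOUR_GB))
    return total, high


total, high = usable_bytes(E820_LOG)
print(total // (1 << 20), "MiB usable,", high // (1 << 20), "MiB above 4GB")
```

With remapping enabled, the last usable range (0x100000000 to 0x140000000) is exactly 1GB placed above the 4GB mark, which matches the 1GB that disappears when remapping is turned off.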
2.6.14rc isn't exactly in fantastic shape right now, so it doesn't surprise me
you hit problems with it. For the sake of this bug though, let's focus on the
Fedora kernel alone.
btw, it'd be great if you could report that bug upstream to http://bugme.osdl.org
let me know how things go with the errata kernel.
The latest kernel-smp-2.6.13-1.1526_FC4 can run 2GB+ stream tests and can see
4GB of RAM when hardware memory remapping is enabled in BIOS.
I suggest closing this bug for now.
This bug is back: the latest kernel-smp-2.6.14-1.1637_FC4 crashes on boot after
failing to find the IOMMU.
FYI, both kernel-smp-2.6.13-1.1532_FC4 and kernel-smp-2.6.14-1.1637_FC4 worked
fine. This was broken in 2.6.14 kernels...
Given that dual core Athlons and ASUS motherboards are now popular, and that
many users can afford 4GB of RAM, this "crash-on-boot" phenomenon is a critical
One correction: The "FYI" line in my previous message needs to be fixed.
Only the 2.6.13-based kernels (e.g. kernel-smp-2.6.13-1.1532_FC4) worked fine.
All 2.6.14-based kernels (e.g. kernel-smp-2.6.14-1.1637_FC4) are broken and fail
Hint: http://lkml.org/lkml/2005/11/6/54 suggests booting with "iommu=soft
swiotlb=65536" (which I haven't tried yet). Also, bug #169115 may be a
duplicate of this bug.
While "iommu=soft swiotlb=65536" works, it may be better to use "pci=nommconf"
as recommended by Andi Kleen at http://bugzilla.kernel.org/show_bug.cgi?id=5343
-- the problem he reports is that the MCFG table provided by ACPI BIOS is broken
and needs a fix. He is developing a workaround.
The MCFG table describes the memory mapped PCI configuration space, which is
required for the MMCONF form of access to devices on the PCI-Express bus
(otherwise, one must address them through BIOS or directly).
Anyway, using the boot line option "pci=nommconf" works for me, and I'll use
this until Andi's workaround arrives.
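To confirm that a workaround option actually reached the running kernel, the kernel command line can be inspected. The following is a sketch only: parse_cmdline and workarounds_present are hypothetical helper names, and the parsing assumes simple space-separated key=value tokens, with "pci=" possibly carrying a comma-separated list.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {option: value}.

    Bare flags such as "quiet" map to None.
    """
    opts = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        opts[key] = value if sep else None
    return opts


def workarounds_present(cmdline):
    """Report which of the two workarounds from this bug are on the line."""
    opts = parse_cmdline(cmdline)
    pci_values = (opts.get("pci") or "").split(",")
    return {
        "pci=nommconf": "nommconf" in pci_values,
        "iommu=soft": opts.get("iommu") == "soft",
    }


# Example line; the options other than pci=nommconf are placeholders.
print(workarounds_present("ro rhgb quiet pci=nommconf"))
```

On a running Linux system the string to check can be read from /proc/cmdline.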
This is a mass-update to all currently open kernel bugs.
A new kernel update has been released (Version: 2.6.15-1.1830_FC4)
based upon a new upstream kernel release.
Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.
This bug has been placed in NEEDINFO_REPORTER state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks time, it will be closed.
Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.
If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.
Closing per previous comment.
I believe I'm having this same issue using kernel version
kernel-2.6.17-1.2139_FC5. I have an Athlon X2 processor and an ASUS A8N-SLI
motherboard, and I'm having the same problems described above. I found this
blog entry when I googled the problem:
This suggests to me that I'm not the only one who has had this problem since the
bug was closed. I suggest reopening the bug.
P.S. In spite of what the blog entry above suggests, I continued to have kernel
panics even when only software remapping was enabled.
Incidentally, I tried booting using both the kernel parameters "iommu=soft
swiotlb=65536" and "pci=nommconf", and neither workaround proved to be
satisfactory. (My system didn't lock up, but it was clearly unstable. I kept
having applications crash for no apparent reason.) The only way I can get my
system to boot and operate normally is to disable both software and hardware
remapping in the BIOS (and thereby lose access to 1 GB of memory).
Are there any plans to reopen this bug? If not, should I file a new bug report?
I hate to knowingly file a duplicate report, but I don't know what else to do.
As I noted above, I'm still having this problem using the latest release of the
kernel, and neither of the workarounds suggested above resolved the problem.
Just thought I'd update. Still having problems with this on FC5
(2.6.17-1.2174). Tried stock kernel (220.127.116.11), same problem.
Have tried passing (grub) options:
kernel /vmlinuz-18.104.22.168 ro root=/dev/VolGroup00/LogVol00 rhgb quiet acpi=no
I have also tried disabling apic support. No luck.
This is an ABIT AV8 mb, VIA K8T800 Pro/VT8237, Athlon X2 4400+, 4GB
DDR400 (running at 333). Lockups occur every 1-2 days, typically under high load.
I had the same experience of kernel-smp-2.6.13-1.1532_FC4 running fine.