Red Hat Bugzilla – Bug 336281
Panic and Oops immediately at kernel initialization (x86_64)
Last modified: 2007-11-30 17:12:18 EST
Description of problem:
On an AMD Solo system, booting the current Fedora 7 kernel update
"kernel-220.127.116.11-91.fc7" fails: the x86_64 kernel panics at the very start
of kernel initialization and dumps an Oops on the screen. I'm currently
using the 2.6.20-2936.fc7xen kernel as a workaround for the time being.
Version-Release number of selected component (if applicable):
How reproducible: 100%
Steps to Reproduce:
1. Have a fully up to date Fedora 7 x86_64 system running on AMD Solo hardware.
2. Boot to standard non-Xen kernel, and get kernel panic and Oops report.
This is an original AMD Solo system with an Athlon 64 3000+ (1.6GHz) which boots
the above mentioned Xen kernel ok, as well as some previous OS releases. I've
taken a digital picture of the Oops, and will attach it and any other
potentially useful info I can think of below.
Please let me know if there are any kernel command-line options, etc. which might
be useful either as a temporary workaround or to aid in further diagnosis, and
I'll be happy to try to help track down the cause.
Created attachment 229931 [details]
Kernel Oops digipic
Created attachment 230211 [details]
lspci -v output
Same problem occurs with the kernel-18.104.22.168-4.fc7 from Fedora 7 updates-testing.
Booting 22.214.171.124-4.fc7 from F7 updates-testing with the "agp=off" option seems
to work around the problem also. Would be nice to have working AGP though. ;o)
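For anyone trying the same workaround, here is a minimal sketch for confirming that the option actually reached the running kernel. It assumes a Linux system exposing /proc/cmdline; the helper name has_boot_option is hypothetical and not part of any tool mentioned in this report.

```shell
# Hypothetical helper: succeed if OPTION appears as a whole token on the
# kernel command line. Usage: has_boot_option OPTION [CMDLINE_FILE]
has_boot_option() {
    opt=$1
    file=${2:-/proc/cmdline}
    # Split the command line into one token per line, then match the
    # option exactly (fixed string, whole line).
    tr ' ' '\n' < "$file" | grep -Fqx -- "$opt"
}

if has_boot_option agp=off; then
    echo "agp=off is on the kernel command line for this boot"
else
    echo "agp=off is not on the kernel command line"
fi
```

The optional second argument only exists so the check can be run against a saved copy of the command line instead of the live system.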
[root@hammer RPMS]# cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 4
model name : AMD Athlon(tm) 64 Processor 3000+
stepping : 0
cpu MHz : 1595.454
cache size : 1024 KB
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext lm 3dnowext 3dnow up
bogomips : 3193.44
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts ttp
Update: Additional workaround that might be useful to anyone else having this
problem, so you can have your DRI and eat it too... ;oP
Booting with agp=off has the unfortunate side effect of making DRI go bye-bye by
default. However, with the Xorg "radeon" driver, you can force it to use pcigart
instead of expecting AGP to be available. Pcigart is still disabled by default,
because historically it wasn't very stable on many PCI Radeon
card+motherboard combinations, but it can be enabled manually by editing
xorg.conf and adding this to the Device section for the Radeon:
Option "BusType" "PCI"
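For context, a sketch of how that line might sit inside a complete Device section. The Identifier value is hypothetical (keep whatever your xorg.conf already uses); only the Driver name and the BusType option come from this comment.

```
Section "Device"
    Identifier "Radeon0"         # hypothetical name; keep your existing one
    Driver     "radeon"
    Option     "BusType" "PCI"   # use pcigart instead of expecting AGP
EndSection
```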
When the X server is restarted, DRI should now be enabled via pcigart. To
confirm DRI is working, run:
[mharris@hammer ~]$ glxinfo |grep renderer
OpenGL renderer string: Mesa DRI R300 20060815 TCL
That shows DRI is working. Then check the X server log file to see that PCI is
being used instead of AGP:
(II) RADEON(0): AGP card detected
(**) RADEON(0): Forced into PCI mode
(II) RADEON(0): [drm] DRM interface version 1.3
(II) RADEON(0): [drm] created "radeon" driver at busid "pci:0000:01:00.0"
(II) RADEON(0): [drm] added 8192 byte SAREA at 0x1efff000
(II) RADEON(0): [drm] mapped SAREA 0x1efff000 to 0x2aaab50d2000
(II) RADEON(0): [drm] framebuffer handle = 0xc0000000
(II) RADEON(0): [drm] added 1 reserved context for kernel
(II) RADEON(0): [pci] 8192 kB allocated with handle 0x0039c200
(II) RADEON(0): [pci] ring handle = 0x2efff000
(II) RADEON(0): [pci] Ring mapped at 0x2aaab50d4000
(II) RADEON(0): [pci] Ring contents 0x00000000
(II) RADEON(0): [pci] ring read ptr handle = 0x1f000000
(II) RADEON(0): [pci] Ring read ptr mapped at 0x2aaab51d5000
(II) RADEON(0): [pci] Ring read ptr contents 0x00000000
(II) RADEON(0): [pci] vertex/indirect buffers handle = 0x2f000000
(II) RADEON(0): [pci] Vertex/indirect buffers mapped at 0x2aaab51d6000
(II) RADEON(0): [pci] Vertex/indirect buffers contents 0x00000000
(II) RADEON(0): [pci] GART texture map handle = 0x2f001000
(II) RADEON(0): [pci] GART Texture map mapped at 0x2aaab53d6000
That shows PCI gart is being used. I must admit that I am kind of surprised it
still works, but it does. I've been running for about an hour
with no stability problems, and I gave it a good bashing with an OpenGL first
person shooter game for a good 20 minutes. I have not tested it with
desktop-effects as I'm in KDE.
Anyhow, I thought I'd share this workaround here as most people are probably
unfamiliar with forcing the Radeon driver to use pcigart mode for DRI when AGP
is flaky or not available for some reason.
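The two checks above can be scripted. This is a rough sketch, assuming the X server log lives at the conventional /var/log/Xorg.0.log and that a "DRI" renderer string means direct rendering is active, as in the glxinfo output above. The function names are hypothetical.

```shell
# Hypothetical helper: succeed if the X log shows the radeon driver was
# forced into PCI mode. Usage: radeon_forced_pci [LOG_FILE]
radeon_forced_pci() {
    log=${1:-/var/log/Xorg.0.log}   # assumed default log location
    grep -Fq 'RADEON(0): Forced into PCI mode' "$log"
}

# Hypothetical helper: read glxinfo output on stdin and succeed if the
# renderer string mentions DRI.
renderer_uses_dri() {
    grep 'OpenGL renderer string' | grep -q 'DRI'
}

# Run the checks only where the log and glxinfo are actually available.
if [ -r /var/log/Xorg.0.log ]; then
    if radeon_forced_pci; then
        echo "X log: radeon forced into PCI mode"
    else
        echo "X log: no forced PCI mode line found"
    fi
fi
if command -v glxinfo >/dev/null 2>&1; then
    if glxinfo 2>/dev/null | renderer_uses_dri; then
        echo "glxinfo: renderer string reports DRI"
    else
        echo "glxinfo: renderer string does not report DRI"
    fi
fi
```

Both helpers take their input from a file or stdin, so they can be pointed at a saved log or saved glxinfo output from another machine.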
Bug 338551 has a similar-sounding problem, with the curious workaround that running
the debug build seems to make the problem go away.
It would be an interesting data point to know whether the kernel-debug in F7 makes
the problem go away for you too.
This really looks the same as bug 249174. Quite a bit of information
can be found in comments there.
A hack of booting with 'mem=510M' (there is 512M on a test machine),
as described in comments to bug 249174, still works for me with
kernel-126.96.36.199-4.fc7, although it stopped working with rawhide
kernels 188.8.131.52-11.fc8 and 184.108.40.206-26.fc8, as noted in bug 338551.
I'd say this is definitely a dupe. Do you have a VIA chipset by any chance?
Everyone on 249174 and 338551, myself included, seems to have a VIA chipset.
> Do you have a VIA chipset ...
If this was a question to Mike Harris, then the output from lspci
is in an attachment to comment #2. This does not look like VIA.
Even without VIA I think this is probably a dupe of bug 249174. Try the patch or
*** This bug has been marked as a duplicate of 249174 ***
This system is an AMD Solo motherboard - AMD chipset.
This bug is filed against Fedora 7, but closed as a dupe of a Fedora devel
(Fedora 8) bug. I assume whatever the bug fix is determined to be will end up
making it to the F7 kernel also, but I thought I'd point it out here, in case
you wanted to track it separately for F7 also.
I'll follow up in the master bug for now.
Upgraded to kernel-220.127.116.11-10.fc7 and it also crashes right away on startup,
only with a very different Oops message. It's too big to fit on the screen and
scrolls off the top. I rebooted 3 times and got a slightly different Oops each
time, so it seems there is some randomness to it. ;o/
I was able to take 3 pictures, and will upload them soonish. The new Oops could
very well be a totally different and possibly unrelated issue as well. I can
file a separate bug report for it if desired, just let me know.
Fix is not released yet, setting status to modified.
Updating just so there is full history here...
I updated to kernel-18.104.22.168-17.fc7 and the problem has now vanished, and I am
able to boot up with AGP enabled, and have DRI working in X with AGP. I haven't
done any extensive testing yet, but it appears on the surface that this problem
is now resolved.
I'll be testing the -19 kernel next also.