336281 – Panic and Oops immediately at kernel initialization (x86_64)

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	7
Hardware:	x86_64
OS:	Linux
Priority:	low
Severity:	high
Target Milestone:	---
Assignee:	Dave Airlie
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2007-10-17 15:23 UTC by Mike A. Harris
Modified:	2007-11-30 22:12 UTC
CC List:	4 users
Fixed In Version:	2.6.23.1-21.fc7
Clone Of:
Environment:
Last Closed:	2007-11-07 01:21:57 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
Kernel Oops digipic (1.30 MB, image/jpeg) 2007-10-17 15:38 UTC, Mike A. Harris
lspci -v output (4.73 KB, text/plain) 2007-10-17 18:41 UTC, Mike A. Harris

Description Mike A. Harris 2007-10-17 15:23:19 UTC

Description of problem:
On an AMD Solo system, booting the current Fedora 7 kernel update
"kernel-2.6.22.9-91.fc7" fails: the x86_64 kernel panics immediately at the
start of kernel initialization and dumps an Oops on the screen.  I'm currently
using the 2.6.20-2936.fc7xen kernel as a workaround for the time being.

Version-Release number of selected component (if applicable):

kernel-2.6.22.9-91.fc7 x86_64


How reproducible: 100%

Steps to Reproduce:
1. Have a fully up to date Fedora 7 x86_64 system running on AMD Solo hardware.
2. Boot the standard non-Xen kernel; it panics and prints an Oops report.
  
Additional info:

This is an original AMD Solo system with an Athlon 64 3000+ (1.6GHz) which boots
the above mentioned Xen kernel ok, as well as some previous OS releases.  I've
taken a digital picture of the Oops, and will attach it and any other
potentially useful info I can think of below.

Please let me know if there are any kernel command-line options, etc. which might
be useful either as a temporary workaround or to aid in further diagnosis, and
I'll be happy to try to help track down the cause.

Comment 1 Mike A. Harris 2007-10-17 15:38:09 UTC

Created attachment 229931
Kernel Oops digipic

Comment 2 Mike A. Harris 2007-10-17 18:41:22 UTC

Created attachment 230211
lspci -v output

Comment 3 Mike A. Harris 2007-10-18 20:15:01 UTC

Update:

Same problem occurs with the kernel-2.6.23.1-4.fc7 from Fedora 7 updates-testing.

Comment 4 Mike A. Harris 2007-10-19 20:01:30 UTC

Booting 2.6.23.1-4.fc7 from F7 updates-testing with the "agp=off" option seems
to work around the problem also.  Would be nice to have working AGP though. ;o)
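
For anyone trying the same workaround, here is a minimal Python sketch for confirming that the running kernel was actually booted with "agp=off". It is an illustration, not part of the original report: the function names parse_cmdline and agp_disabled are hypothetical, and the parser assumes a simple space-separated command line with no quoted values.

```python
from typing import Dict, Optional


def parse_cmdline(cmdline: str) -> Dict[str, Optional[str]]:
    """Split a kernel command line into {option: value}.

    Bare flags such as "quiet" map to None. Quoted values containing
    spaces are NOT handled; this is a simplifying assumption.
    """
    options: Dict[str, Optional[str]] = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        options[key] = value if sep else None
    return options


def agp_disabled(cmdline: str) -> bool:
    """Return True if the command line contains agp=off."""
    return parse_cmdline(cmdline).get("agp") == "off"


# Typical use on a running Linux system (not executed here):
#   with open("/proc/cmdline") as f:
#       print(agp_disabled(f.read()))
```

The same parse_cmdline helper can be used to check any other key=value boot option on the command line.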

Comment 5 Mike A. Harris 2007-10-20 19:28:28 UTC

[root@hammer RPMS]# cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 4
model name      : AMD Athlon(tm) 64 Processor 3000+
stepping        : 0
cpu MHz         : 1595.454
cache size      : 1024 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext lm 3dnowext 3dnow up
bogomips        : 3193.44
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts ttp

Comment 6 Mike A. Harris 2007-10-22 08:54:27 UTC

Update:  Additional workaround that might be useful to anyone else having this
problem, so you can have your DRI and eat it too...  ;oP

Booting with agp=off has the unfortunate side effect of making DRI go bye-bye by
default.  However, with the Xorg "radeon" driver, you can force it to use pcigart
instead of expecting AGP to be available.  Pcigart is still disabled by default,
because historically it wasn't very stable on many PCI Radeon
card+motherboard combinations, but it can be enabled manually by editing
xorg.conf and adding the following to the Device section for the Radeon card:

Option "BusType" "PCI"

When the X server is restarted, DRI should now be enabled via pcigart.  To
confirm DRI is working, run:

[mharris@hammer ~]$ glxinfo |grep renderer
OpenGL renderer string: Mesa DRI R300 20060815 TCL

That shows DRI is working.  Then check the X server log file to see that PCI is
being used instead of AGP:

[/var/log/Xorg.0.log]
(II) RADEON(0): AGP card detected
(**) RADEON(0): Forced into PCI mode
                ^^^^^^^^^^^^^^^^^^^^
[SNIP]
(II) RADEON(0): [drm] DRM interface version 1.3
(II) RADEON(0): [drm] created "radeon" driver at busid "pci:0000:01:00.0"
(II) RADEON(0): [drm] added 8192 byte SAREA at 0x1efff000
(II) RADEON(0): [drm] mapped SAREA 0x1efff000 to 0x2aaab50d2000
(II) RADEON(0): [drm] framebuffer handle = 0xc0000000
(II) RADEON(0): [drm] added 1 reserved context for kernel
(II) RADEON(0): [pci] 8192 kB allocated with handle 0x0039c200
(II) RADEON(0): [pci] ring handle = 0x2efff000
(II) RADEON(0): [pci] Ring mapped at 0x2aaab50d4000
(II) RADEON(0): [pci] Ring contents 0x00000000
(II) RADEON(0): [pci] ring read ptr handle = 0x1f000000
(II) RADEON(0): [pci] Ring read ptr mapped at 0x2aaab51d5000
(II) RADEON(0): [pci] Ring read ptr contents 0x00000000
(II) RADEON(0): [pci] vertex/indirect buffers handle = 0x2f000000
(II) RADEON(0): [pci] Vertex/indirect buffers mapped at 0x2aaab51d6000
(II) RADEON(0): [pci] Vertex/indirect buffers contents 0x00000000
(II) RADEON(0): [pci] GART texture map handle = 0x2f001000
(II) RADEON(0): [pci] GART Texture map mapped at 0x2aaab53d6000


That shows PCI gart is being used.  I must admit that I am kind of surprised it
still works, but it does.  I've been running for about an hour
with no stability problems, and I gave it a good bashing with an OpenGL first
person shooter game for a good 20 minutes.  I have not tested it with
desktop-effects as I'm in KDE.

Anyhow, I thought I'd share this workaround here as most people are probably
unfamiliar with forcing the Radeon driver to use pcigart mode for DRI when AGP
is flaky or not available for some reason.
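
The two manual checks above (the glxinfo renderer string and the "Forced into PCI mode" line in the X server log) can be sketched as a small Python helper. This is only a sketch under stated assumptions: the function names are hypothetical, and the patterns are modelled on the exact log lines quoted in this comment, so other driver versions may word them differently.

```python
import re
from typing import Optional

# Patterns taken from the Xorg.0.log excerpt quoted above.
FORCED_PCI = re.compile(r"\(\*\*\) RADEON\(\d+\): Forced into PCI mode")
PCI_GART = re.compile(r"\(II\) RADEON\(\d+\): \[pci\] ")


def uses_pcigart(log_text: str) -> bool:
    """True if the log shows the driver forced into PCI mode
    and at least one [pci] allocation line."""
    lines = log_text.splitlines()
    forced = any(FORCED_PCI.search(line) for line in lines)
    gart = any(PCI_GART.search(line) for line in lines)
    return forced and gart


def renderer_string(glxinfo_output: str) -> Optional[str]:
    """Return the OpenGL renderer string from glxinfo output, or None."""
    prefix = "OpenGL renderer string:"
    for line in glxinfo_output.splitlines():
        if line.startswith(prefix):
            return line[len(prefix):].strip()
    return None
```

To use it, pass the contents of /var/log/Xorg.0.log to uses_pcigart and the output of glxinfo to renderer_string; a renderer string containing "DRI", as in the output shown above, is what this comment treats as evidence that DRI is working.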

Comment 7 Dave Jones 2007-10-22 18:40:44 UTC

Bug 338551 has a similar-sounding problem, with the curious workaround that running
the debug build seems to make the problem go away.
It would be an interesting data point to know whether the kernel-debug in F7 makes the
problem go away for you too.

Comment 8 Michal Jaegermann 2007-10-22 19:26:46 UTC

This really looks the same as bug 249174.  Quite a bit of information
can be found in comments there.

Comment 9 Michal Jaegermann 2007-10-22 20:37:03 UTC

A hack of booting with 'mem=510M' (there is 512M on a test machine),
as described in comments to bug 249174, still works for me with
kernel-2.6.23.1-4.fc7, although it stopped working with rawhide
kernels 2.6.23.1-11.fc8 and 2.6.23.1-26.fc8, as noted in bug 338551.

Comment 10 Callum Lerwick 2007-10-23 02:53:18 UTC

I'd say this is definitely a dupe. Do you have a VIA chipset by any chance? Me
and everyone else on 249174 and 338551 seem to have VIA chipsets.

Comment 11 Michal Jaegermann 2007-10-23 16:12:40 UTC

> Do you have a VIA chipset ...
If this was a question to Mike Harris then an output from lspci
is in an attachment to comment #2.  This does not look like VIA.

Comment 12 Martin Ebourne 2007-10-28 21:06:55 UTC

Even without VIA I think this is probably a dupe of bug 249174. Try the patch or
RPMs there.

Comment 13 Callum Lerwick 2007-10-29 04:02:22 UTC


*** This bug has been marked as a duplicate of 249174 ***

Comment 14 Mike A. Harris 2007-10-29 18:15:39 UTC

This system is an AMD Solo motherboard - AMD chipset.

Comment 15 Mike A. Harris 2007-10-29 18:18:22 UTC

This bug is filed against Fedora 7, but closed as a dupe of a Fedora devel
(Fedora 8) bug.  I assume whatever the bug fix is determined to be, will end up
making it to the F7 kernel also, but I thought I'd point it out here, in case
you wanted to track it separately for F7 also.

I'll follow up in the master bug for now.

Comment 16 Mike A. Harris 2007-10-31 01:18:26 UTC

Upgraded to kernel-2.6.23.1-10.fc7 and it crashes on startup right away also,
only with a very different Oops message.  It's too big to fit on the screen and
scrolls off the top.  I rebooted 3 times and got slightly different oops, so it
seems there is some randomness to it.  ;o/

I was able to take 3 pictures, and will upload them soonish.  The new Oops could
very well be a totally different and possibly unrelated issue as well.  I can
file a separate bug report for it if desired, just let me know.

Comment 17 Chuck Ebbert 2007-10-31 01:35:06 UTC

Fix is not released yet, setting status to modified.

Comment 18 Mike A. Harris 2007-10-31 05:40:08 UTC

Updating just so there is full history here...

I updated to kernel-2.6.23.1-17.fc7 and the problem has now vanished, and I am
able to boot up with AGP enabled, and have DRI working in X with AGP.  I haven't
done any extensive testing yet, but it appears on the surface that this problem
is now resolved.

I'll be testing the -19 kernel next also.

Thanks guys.
