Red Hat Bugzilla – Bug 230524
Machine freezes when trying to start X using radeon driver and 9800XT
Last modified: 2007-11-30 17:11:58 EST
Description of problem:
Linux galahad.bludgeon.org 2.6.19-1.2911.fc6 #1 SMP
Happens every time until 'alias radeon off' is added to modprobe.conf
Steps to Reproduce:
1. Install FC6 + all latest errata
2. Use default xorg.conf (as detected)
3. Start X (either with startx or runlevel 5)
Actual results:
DVI LCD shows out of scan range. VGA LCD enters power save. System is hard
locked (cannot access remotely, does not respond to ping). Must hard reset.
Expected results:
X starts normally.
This is with an ASUS Radeon 9800 XT (ATI clone) w/ 256MB of memory. ASUS A7V333
As soon as I enter 'alias radeon off' into modprobe.conf, X starts up and works
fine. However I lose DRI.
Have not tried with rawhide.
Created attachment 148993 [details]
Xorg log with DRI enabled.
Created attachment 148994 [details]
Created attachment 148995 [details]
Maybe similar to bug 230025. However 230025 is for rawhide.
Comment on attachment 148993 [details]
Xorg log with DRI enabled.
This is a log from a "freeze" (note that DRI is enabled)
Created attachment 148996 [details]
Xorg log with DRI disabled
X starts successfully in this case (alias radeon off in modprobe.conf)
So, this is expected behavior? Any reason why this was closed?
This is fairly odd. It looks like fi1236_drv.so is hanging the box; no idea why
that would be related to DRM though. Can you start successfully with DRI
enabled if you move /usr/lib/xorg/modules/multimedia/fi1236_drv.so to /tmp?
Also, try this. Boot the machine in runlevel 3 with DRI enabled and the fi1236
driver in the normal place (not moved out to /tmp). ssh in from another machine
and run 'X -verbose', and see if the last line of the output is different from
how the log you gave ends. I suspect the module in question is just buggy and
is trying to call a nonexistent function, which is instantly fatal.
Moving fi1236_drv.so does not enable me to start X with DRI enabled.
I'll attach the resulting Xorg.log from that effort and also the Xorg.log which
resulted from starting X from a remote console with X -verbose and the
fi1236_drv.so in place.
Created attachment 149431 [details]
Xorg log; DRI enabled; fi1236_drv loaded; result of X -verbose
Created attachment 149432 [details]
Xorg log; DRI enabled; fi1236_drv NOT loaded
I noticed that when starting X on the two previous attempts (with DRI enabled),
I was able to remotely ssh into the box while X was "hung" (screen = blank/black).
Could not attach strace to the X process. lsof did not show anything
particularly useful for the X process and neither did /proc/pid_of_X/fd. Could
not kill the X process with either kill or kill -9.
Could you attach gdb to the hung X process?
If not, it would be useful to get the output of 'ps awxl' for the X server
process itself. It's probably hung in a DRM ioctl for some reason.
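A process that ignores kill -9 and cannot be traced is usually in uninterruptible sleep (STAT beginning with "D"), and the WCHAN column of 'ps awxl' then names the kernel function it is waiting in. Below is a minimal sketch for picking such processes out of saved 'ps awxl' output. The function name is made up for illustration, and it assumes the usual 13-column long format (F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND) with no empty columns.

```python
def uninterruptible(ps_output):
    """Return (pid, wchan, command) for every process whose STAT starts with 'D'.

    Assumes `ps awxl` long format: a header row followed by rows of
    F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND.
    """
    hits = []
    lines = ps_output.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        # COMMAND may contain spaces, so split at most 12 times.
        fields = line.split(None, 12)
        if len(fields) < 13:
            continue  # malformed or truncated row
        pid, wchan, stat, command = fields[2], fields[8], fields[9], fields[12]
        if stat.startswith("D"):
            hits.append((int(pid), wchan, command))
    return hits
```

Feeding it the text of a saved 'ps awxl' capture and looking at the WCHAN value for the X server row would show whether the wait is in a DRM-related kernel function, if the kernel exports symbol names for WCHAN.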
I am not able to attach gdb to the hung X process. I run gdb /usr/bin/X <pid>
and it does start up, but cannot attach to the process (just sits there
hanging). I have to ctrl-z out and reboot the machine.
I'll attach a ps awxl.
In addition, I tried running gdb /usr/bin/X and then starting X from gdb. This
didn't appear to give me any additional helpful information, but I'll attach
what I captured from the screen. Once the server locks up, there's no way to
detach to get a backtrace unless you can tell me some gdb tricks to try. I
should note that this is with the -debug RPM's installed for the Xorg-server and
I also captured the strace -f output of /usr/bin/X (run as strace -f /usr/bin/X
&> /tmp/strace.log). I will attach that as well in case it is useful.
Created attachment 149547 [details]
Result of ps awxl after starting X normally and then attempting to attach to it
with gdb /usr/bin/X <pidofX>
Created attachment 149548 [details]
Attempt to start X from within gdb.
Ran gdb /usr/bin/X and then 'c' from the gdb> prompt. This is stdout.
Created attachment 149549 [details]
Output of strace -f /usr/bin/X &> /tmp/strace.log
Created attachment 149550 [details]
Output of ps afxwl after attempted startup of X from strace
That seems to show us getting hung up on the xkbcomp fork. Which seems...
unrelated? Really weird if so.
With DRI enabled, try starting X from the command line as:
X -kb :0
If _that_ comes up then we know we can blame something in the xkbcomp fork going
wacky. Otherwise we're back to looking for DRI bugs.
Nope, X still didn't come up. Got some interesting strace output from it
though. Will upload both that and the Xorg.log file.
Created attachment 149581 [details]
strace -f /usr/bin/X -kb :0 &> /tmp/strace3.log
Created attachment 149582 [details]
X -kb :0
Comment on attachment 149581 [details]
strace -f /usr/bin/X -kb :0 &> /tmp/strace3.log
I should note that it appeared to be just looping on FD 9 and I couldn't reboot
with 'reboot' so I hard reset. Maybe I should have tried attaching with gdb
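For anyone reading an strace log like the one above, here is a rough sketch for confirming a "looping on one FD" impression by counting calls per (syscall, file descriptor) pair. The function name is hypothetical, and it only looks at calls whose first argument is a plain decimal descriptor, with an optional "[pid N]" prefix as strace -f prints it.

```python
import re
from collections import Counter

# Matches lines such as "[pid  2301] ioctl(9, ..." or "close(3) = 0".
# Calls whose first argument is not a decimal number (e.g. open("...")) are skipped.
CALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+)?(\w+)\((\d+)[,)]")


def calls_by_fd(strace_text):
    """Count occurrences of each (syscall name, first-argument fd) pair."""
    counts = Counter()
    for line in strace_text.splitlines():
        match = CALL_RE.match(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts
```

Calling calls_by_fd() on the contents of the log and then most_common(5) on the result lists the busiest pairs; a single ioctl pair dominating the tail of the log would be consistent with a retry loop on that descriptor.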
Anyone out there? :-) This actually also happens using the Ubuntu Feisty Fawn
Live CD. Guess I could file a bug with them. Wonder if it's the fault of Xorg
or the radeon module in the kernel.
ajax, tried to find dups on b.f.o, but there well ... There are two leads
https://bugs.freedesktop.org/show_bug.cgi?id=5341 (bug looks pretty similar, but
it is in some rather uncertain state, so I would hesitate to call it upstream of
this bug), and then the only things I found were not 100% same (IMHO)
One of the b.f.o bugs pointed (indirectly) to
which says, that it is actually
(but that looks very different)
and https://bugs.freedesktop.org/show_bug.cgi?id=1360 (even less likely).
There is also
which points to https://bugs.freedesktop.org/show_bug.cgi?id=3606 (looks different).
It is weird, because all of these bugs (except for b.f.o 5341) happen DURING the
running of Xorg (mostly when glxgears is involved), but here we have a problem on
startup of X.
Keep in mind guys that this is an Asus-branded 9800XT. Don't know what
differences this would result in from a hardware perspective. Would be
interesting to find someone else out there with an identical card to see if they
have the same issue. There's one on eBay currently for $200. :-)
Have had an interesting breakthrough here.
Due to a discussion on the dri-devel list here:
I changed my aperture settings in BIOS from 64MB to 128MB. I can now start Xorg
without having alias radeon off in modprobe.conf. I actually came across this
suggestion also on a Gentoo ATI page.
Still seems like this should work regardless of aperture settings.
Michel Danzer has suggested that the following patch may address the issue:
See this post:
I don't even see a shared-core/radeon_cp.c in libdrm-2.3.0-1.fc6.src.rpm so I'm
not sure how easily this could be back-ported.
Created attachment 152460 [details]
Kernel drm patch for AGP aperture size w/ Radeon
With the above patch applied to my FC6 kernel (.2933), I can now start Xorg with
the old 64MB aperture setting and DRI enabled. The following appears in dmesg:
[drm] Initialized drm 1.1.0 20060810
ACPI: PCI Interrupt 0000:01:00.0[A] -> GSI 16 (level, low) -> IRQ 17
[drm] Initialized radeon 1.25.0 20060524 on minor 0
agpgart: Found an AGP 2.0 compliant device at 0000:00:00.0.
agpgart: Putting AGP V2 device at 0000:00:00.0 into 4x mode
agpgart: Putting AGP V2 device at 0000:01:00.0 into 4x mode
[drm] Setting GART location based on new memory map
[drm] Can't use AGP base @0xf8000000, won't fit
[drm] Loading R300 Microcode
[drm] writeback test succeeded in 1 usecs
Note the "won't fit" error above. This will go away if I re-up my aperture
setting to 128MB, but at least the system doesn't hang. I don't know what the
performance consequences are for this.
Any chance of getting this backported into FC6 or FC7?
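To check other machines for the same condition, a small sketch like the following could scan saved dmesg text for the "won't fit" message quoted above and return the AGP base addresses that the driver rejected. The function name is an assumption for illustration; the pattern is taken only from the single message in this report, so other kernel versions may word it differently.

```python
import re

# Pattern copied from the dmesg line in this report:
#   [drm] Can't use AGP base @0xf8000000, won't fit
WONT_FIT_RE = re.compile(r"\[drm\] Can't use AGP base @(0x[0-9a-fA-F]+), won't fit")


def rejected_agp_bases(dmesg_text):
    """Return each AGP base address (as an int) that the driver reported as not fitting."""
    return [int(m.group(1), 16) for m in WONT_FIT_RE.finditer(dmesg_text)]
```

An empty list means the message did not appear, which is what would be expected here after raising the aperture to 128MB.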
Reporter, as a protection against soon-to-come end of support for FC6, could you
please confirm (and attach the logs) that you can reproduce this bug on F7 or on
*** Bug 200718 has been marked as a duplicate of this bug. ***
I _believe_ the patch I mentioned above is actually in the F7 kernels. I
thought I'd made a note of that here, but apparently not.
Michel Danzer applied this to the kernel upstream here:
Back in Feb of this year. So this is included in F7 kernels that include this
You can close this out as far as I am concerned....
Closing per reporter's request.