Bug 230524

Summary: Machine freezes when trying to start X using radeon driver and 9800XT
Product: [Fedora] Fedora
Component: xorg-x11-drv-ati
Version: 6
Hardware: i386
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: medium
Reporter: Ray Van Dolson <rayvd>
Assignee: Dave Airlie <airlied>
CC: mcepl, pedro.madeira, xgl-maint
Keywords: Reopened
Fixed In Version: f7
Doc Type: Bug Fix
Last Closed: 2007-10-05 22:33:44 UTC
Attachments (flags: none on all):
Xorg log with DRI enabled.
lspci output
Base xorg.conf
Xorg log with DRI disabled
Xorg log; DRI enabled; fi1236_drv loaded; result of X -verbose
Xorg log; DRI enabled; fi1236_drv NOT loaded
ps awxl
Attempt to start X from within gdb.
Output of strace -f /usr/bin/X &> /tmp/strace.log
Output of ps afxwl after attempted startup of X from strace
strace -f /usr/bin/X -kb :0 &> /tmp/strace3.log
X -kb :0
Kernel drm patch for AGP aperture size w/ Radeon

Description Ray Van Dolson 2007-03-01 06:54:40 UTC
Description of problem:

Linux galahad.bludgeon.org 2.6.19-1.2911.fc6 #1 SMP
xorg-x11-drv-ati-6.6.3-1.fc6
xorg-x11-server-Xorg-1.1.1-47.6.fc6


How reproducible:
Happens every time until 'alias radeon off' is added to modprobe.conf

Steps to Reproduce:
1. Install FC6 + all latest errata
2. Use default xorg.conf (as detected)
3. Start X (either with startx or runlevel 5)
  
Actual results:
DVI LCD shows out of scan range.  VGA LCD enters power save.  System is hard
locked (cannot access remotely, does not respond to ping).  Must hard reset.

Expected results:
X starts normally.

Additional info:
This is with an ASUS Radeon 9800 XT (ATI clone) w/ 256MB of memory.  ASUS A7V333
motherboard.

As soon as I enter 'alias radeon off' into modprobe.conf, X starts up and works
fine.  However I lose DRI.

Have not tried with rawhide.

Comment 1 Ray Van Dolson 2007-03-01 06:54:40 UTC
Created attachment 148993 [details]
Xorg log with DRI enabled.

Comment 2 Ray Van Dolson 2007-03-01 06:55:31 UTC
Created attachment 148994 [details]
lspci output

Comment 3 Ray Van Dolson 2007-03-01 06:56:36 UTC
Created attachment 148995 [details]
Base xorg.conf

Comment 4 Ray Van Dolson 2007-03-01 06:59:23 UTC
Maybe similar to bug 230025.  However 230025 is for rawhide.

Comment 5 Ray Van Dolson 2007-03-01 07:01:31 UTC
Comment on attachment 148993 [details]
Xorg log with DRI enabled.

This is a log from a "freeze" (note that DRI is enabled)

Comment 6 Ray Van Dolson 2007-03-01 07:02:39 UTC
Created attachment 148996 [details]
Xorg log with DRI disabled

X starts successfully in this case (alias radeon off in modprobe.conf)

Comment 7 Ray Van Dolson 2007-03-02 17:55:44 UTC
So, this is expected behavior?  Any reason why this was closed?

Comment 8 Adam Jackson 2007-03-06 16:25:34 UTC
This is fairly odd.  It looks like fi1236_drv.so is hanging the box; no idea why
that would be related to DRM though.  Can you start successfully with DRI
enabled if you move /usr/lib/xorg/modules/multimedia/fi1236_drv.so to /tmp?

Also, try this.  Boot the machine in runlevel 3 with DRI enabled and the fi1236
driver in the normal place (not moved out to /tmp).  ssh in from another machine
and run 'X -verbose', and see if the last line of the output is different from
how the log you gave ends.  I suspect the module in question is just buggy and
is trying to call a nonexistent function, which is instantly fatal.

Comment 9 Ray Van Dolson 2007-03-07 06:36:29 UTC
Moving fi1236_drv.so does not enable me to start X with DRI enabled.

I'll attach the resulting Xorg.log from that effort and also the Xorg.log which
resulted from starting X from a remote console with X -verbose and the
fi1236_drv.so in place.

Comment 10 Ray Van Dolson 2007-03-07 06:38:56 UTC
Created attachment 149431 [details]
Xorg log; DRI enabled; fi1236_drv loaded; result of X -verbose

Comment 11 Ray Van Dolson 2007-03-07 06:39:44 UTC
Created attachment 149432 [details]
Xorg log; DRI enabled; fi1236_drv NOT loaded

Comment 12 Ray Van Dolson 2007-03-07 06:42:27 UTC
I noticed that when starting X on the two previous attempts (with DRI enabled),
I was able to remotely ssh into the box while X was "hung" (screen = blank/black).
Could not attach strace to the X process.  lsof did not show anything
particularly useful for the X process and neither did /proc/pid_of_X/fd.  Could
not kill the X process with either kill or kill -9.

Comment 13 Adam Jackson 2007-03-07 16:19:05 UTC
Could you attach gdb to the hung X process?

If not, it would be useful to get the output of 'ps awxl' for the X server
process itself.  It's probably hung in a DRM ioctl for some reason.
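
A process that ignores kill -9 and cannot be attached to with strace or gdb is consistent with uninterruptible sleep (state D), which is what a call stuck inside a kernel driver looks like from user space. The sketch below pulls the state letter out of a /proc/<pid>/stat line. The sample line is hypothetical and not taken from the reporter's machine; the function names are made up for illustration.

```python
def parse_proc_stat_state(stat_line: str) -> str:
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST closing parenthesis.
    after_comm = stat_line.rsplit(")", 1)[1]
    # The first field after the command name is the process state.
    return after_comm.split()[0]


def is_uninterruptible(stat_line: str) -> bool:
    """True if the process is in state D (uninterruptible sleep)."""
    return parse_proc_stat_state(stat_line) == "D"


# Hypothetical sample line (assumption, for illustration only).
sample = "2345 (X) D 1 2345 2345 1031 2345 4194560 0 0 0 0"
print(parse_proc_stat_state(sample))  # D
print(is_uninterruptible(sample))     # True
```

On a live system the input would come from reading /proc/<pid>/stat for the X server's pid; the same state letter appears in the STAT column of ps awxl.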

Comment 14 Ray Van Dolson 2007-03-08 05:32:02 UTC
I am not able to attach gdb to the hung X process.  I run gdb /usr/bin/X <pid>
and it does start up, but cannot attach to the process (just sits there
hanging).  I have to ctrl-z out and reboot the machine.

I'll attach a ps awxl.

In addition, I tried running gdb /usr/bin/X and then starting X from gdb.  This
didn't appear to give me any additional helpful information, but I'll attach
what I captured from the screen.  Once the server locks up, there's no way to
detach to get a backtrace unless you can tell me some gdb tricks to try.  I
should note that this is with the -debug RPMs installed for the Xorg-server and
the Xorg-ati-drv.

I also captured the strace -f output of /usr/bin/X (run as strace -f /usr/bin/X
&> /tmp/strace.log).  I will attach that as well in case it is useful.

Comment 15 Ray Van Dolson 2007-03-08 05:35:50 UTC
Created attachment 149547 [details]
ps awxl

Result of ps awxl after starting X normally and then attempting to attach to it
with gdb /usr/bin/X <pidofX>

Comment 16 Ray Van Dolson 2007-03-08 05:36:59 UTC
Created attachment 149548 [details]
Attempt to start X from within gdb.

Ran gdb /usr/bin/X and then 'c' from the gdb> prompt.  This is stdout.

Comment 17 Ray Van Dolson 2007-03-08 05:38:50 UTC
Created attachment 149549 [details]
Output of strace -f /usr/bin/X &> /tmp/strace.log

Comment 18 Ray Van Dolson 2007-03-08 05:39:43 UTC
Created attachment 149550 [details]
Output of ps afxwl after attempted startup of X from strace

Comment 19 Adam Jackson 2007-03-08 15:59:51 UTC
That seems to show us getting hung up on the xkbcomp fork.  Which seems...
unrelated?  Really weird if so.

With DRI enabled, try starting X from the command line as:

X -kb :0

If _that_ comes up then we know we can blame something in the xkbcomp fork going
wacky.  Otherwise we're back to looking for DRI bugs.

Comment 20 Ray Van Dolson 2007-03-08 16:27:44 UTC
Nope, X still didn't come up.  Got some interesting strace output from it
though.  Will upload both that and the Xorg.log file.

Comment 21 Ray Van Dolson 2007-03-08 16:29:47 UTC
Created attachment 149581 [details]
strace -f /usr/bin/X -kb :0 &> /tmp/strace3.log

Comment 22 Ray Van Dolson 2007-03-08 16:30:44 UTC
Created attachment 149582 [details]
X -kb :0

Comment 23 Ray Van Dolson 2007-03-08 16:32:06 UTC
Comment on attachment 149581 [details]
strace -f /usr/bin/X -kb :0 &> /tmp/strace3.log

I should note that it appeared to be just looping on FD 9 and I couldn't reboot
with 'reboot' so I hard reset.  Maybe I should have tried attaching with gdb
first.
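
One rough way to confirm a "looping on one FD" impression from a large strace log is to count calls per (syscall, first integer argument) pair and look at the most frequent one. The sketch below does that. The sample log lines are made up for illustration and are not from the attached strace3.log; the function name is hypothetical. The first integer argument is not always a file descriptor (select() and kill() are counter-examples), so the counts are only a hint.

```python
import re
from collections import Counter

# Matches "name(<int>," or "name(<int>)": a call whose first argument is a
# plain integer. That integer is often, but not always, a file descriptor.
CALL_RE = re.compile(r"\b([a-z_0-9]+)\((\d+)[,)]")


def count_calls_by_fd(log_text: str) -> Counter:
    """Count strace lines per (syscall name, first integer argument)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = CALL_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


# Made-up sample lines (assumption): NOT real output from this bug.
sample_log = """\
[pid  2345] open("/dev/dri/card0", O_RDWR) = 9
[pid  2345] ioctl(9, 0x1234, 0xbf000000) = 0
[pid  2345] ioctl(9, 0x1234, 0xbf000000) = 0
[pid  2345] read(3, "", 32) = 0
[pid  2345] ioctl(9, 0x1234, 0xbf000000) = 0
"""

for (name, fd), n in count_calls_by_fd(sample_log).most_common(3):
    print(f"{name} on {fd}: {n} calls")
```

If one pair dominates the tail of the log, checking which file that descriptor refers to (from the earlier open() line in the same log) shows whether it is the DRM device.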

Comment 24 Ray Van Dolson 2007-03-30 13:54:04 UTC
Anyone out there? :-)  This actually also happens using the Ubuntu Feisty Fawn
Live CD.  Guess I could file a bug with them.  Wonder if it's the fault of Xorg
or the radeon module in the kernel.

Comment 25 Matěj Cepl 2007-03-30 15:04:58 UTC
ajax, I tried to find dups on b.f.o, but, well ... There are two leads:
https://bugs.freedesktop.org/show_bug.cgi?id=5341 (the bug looks pretty similar, but
it is in a rather uncertain state, so I would hesitate to call it upstream of
this bug), and then the only other things I found were not 100% the same (IMHO):
https://bugs.freedesktop.org/show_bug.cgi?id=2581
https://bugs.freedesktop.org/show_bug.cgi?id=8243

One of the b.f.o bugs pointed (indirectly) to
http://www.openoffice.org/issues/show_bug.cgi?id=49902
which says, that it is actually
https://bugs.freedesktop.org/show_bug.cgi?id=1204
(but that looks very different)
and https://bugs.freedesktop.org/show_bug.cgi?id=1360 (even less likely).

There is also
https://launchpad.net/ubuntu/+source/xfree86/+bug/15219
which points to https://bugs.freedesktop.org/show_bug.cgi?id=3606 (looks different).

It is weird, because all of these bugs (except for b.f.o 5341) happen DURING the
running of Xorg (mostly when glxgears is involved), but here we have a problem at
startup of X.

Comment 26 Ray Van Dolson 2007-04-10 07:12:45 UTC
Keep in mind guys that this is an Asus-branded 9800XT.  Don't know what
differences this would result in from a hardware perspective.  Would be
interesting to find someone else out there with an identical card to see if they
have the same issue.  There's one on eBay currently for $200. :-)

Comment 27 Ray Van Dolson 2007-04-12 05:24:03 UTC
Have had an interesting breakthrough here.

Due to a discussion on the dri-devel list here:

http://marc.info/?t=117634797400002&r=1&w=2

I changed my aperture settings in BIOS from 64MB to 128MB.  I can now start Xorg
without having alias radeon off in modprobe.conf.  I actually came across this
suggestion also on a Gentoo ATI page.

Still seems like this should work regardless of aperture settings.

Comment 28 Ray Van Dolson 2007-04-12 07:15:04 UTC
Michel Danzer has suggested that the following patch may address the issue:

http://gitweb.freedesktop.org/?p=mesa/drm.git;a=commitdiff;h=8ff026723cf170034173052a58c650c8c1f28c0b

See this post:

http://marc.info/?l=dri-devel&m=117635985514789&w=2

I don't even see a shared-core/radeon_cp.c in libdrm-2.3.0-1.fc6.src.rpm so I'm
not sure how easily this could be back-ported.

Comment 29 Ray Van Dolson 2007-04-12 15:08:01 UTC
Created attachment 152460 [details]
Kernel drm patch for AGP aperture size w/ Radeon

http://gitweb.freedesktop.org/?p=mesa/drm.git;a=commit;h=8ff026723cf170034173052a58c650c8c1f28c0b

Comment 30 Ray Van Dolson 2007-04-12 15:09:49 UTC
With the above patch applied to my FC6 kernel (.2933), I can now start Xorg with
the old 64MB aperture setting and DRI enabled.  The following appears in dmesg:

[drm] Initialized drm 1.1.0 20060810
ACPI: PCI Interrupt 0000:01:00.0[A] -> GSI 16 (level, low) -> IRQ 17
[drm] Initialized radeon 1.25.0 20060524 on minor 0
agpgart: Found an AGP 2.0 compliant device at 0000:00:00.0.
agpgart: Putting AGP V2 device at 0000:00:00.0 into 4x mode
agpgart: Putting AGP V2 device at 0000:01:00.0 into 4x mode
[drm] Setting GART location based on new memory map
[drm] Can't use AGP base @0xf8000000, won't fit
[drm] Loading R300 Microcode
[drm] writeback test succeeded in 1 usecs

Note the "won't fit" error above.  This will go away if I re-up my aperture
setting to 128MB, but at least the system doesn't hang.  I don't know what the
performance consequences are for this.
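
One possible reading of the "won't fit" line, which is an assumption and is not confirmed by the patch or by anyone in this thread, is that the driver found that placing the GART at the AGP bus address would collide with another range in the card's address space (for example the framebuffer) and picked a different location instead. The sketch below only illustrates that kind of range-overlap arithmetic. The AGP base comes from the dmesg output above and the 64MB aperture and 256MB card memory from this report; the framebuffer location is a hypothetical value chosen for the example.

```python
MB = 1024 * 1024


def ranges_overlap(base_a: int, size_a: int, base_b: int, size_b: int) -> bool:
    """True if [base_a, base_a+size_a) and [base_b, base_b+size_b) intersect."""
    return base_a < base_b + size_b and base_b < base_a + size_a


agp_base = 0xF8000000   # from the "[drm] Can't use AGP base" line above
aperture = 64 * MB      # BIOS aperture setting mentioned in this report
fb_base = 0xF0000000    # ASSUMPTION: hypothetical framebuffer location
fb_size = 256 * MB      # card memory size from the report

# Last address covered by the aperture if it were placed at the AGP base.
print(hex(agp_base + aperture - 1))                           # 0xfbffffff
print(ranges_overlap(agp_base, aperture, fb_base, fb_size))   # True
```

With these example numbers the two ranges intersect, so a driver doing such a check would have to relocate one of them. Whether that matches what radeon actually does here would need to be checked against the patched source.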

Comment 31 Ray Van Dolson 2007-06-04 06:47:05 UTC
Any chance of getting this backported into FC6 or FC7?

Comment 32 Matěj Cepl 2007-10-05 22:07:42 UTC
Reporter, as a protection against soon-to-come end of support for FC6, could you
please confirm (and attach the logs) that you can reproduce this bug on F7 or on
Rawhide? Thanks.

Comment 33 Matěj Cepl 2007-10-05 22:09:28 UTC
*** Bug 200718 has been marked as a duplicate of this bug. ***

Comment 34 Ray Van Dolson 2007-10-05 22:21:18 UTC
I _believe_ the patch I mentioned above is actually in the F7 kernels.  I
thought I'd made a note of that here, but apparently not.

Michel Danzer applied this to the kernel upstream here:

http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.22.y.git;a=commitdiff;h=80b2c386f3d8c3367533a8600b599f8686c9d386

Back in Feb of this year.  So the fix is included in any F7 kernel built from
sources that contain this commit.

You can close this out as far as I am concerned....

Comment 35 Matěj Cepl 2007-10-05 22:33:44 UTC
Closing per reporter's request.