Bug 177773

Summary: radeon driver hangs machine hard
Product: [Fedora] Fedora
Reporter: Erwin Rol <redhatbugs>
Component: xorg-x11-drv-ati
Assignee: X/OpenGL Maintenance List <xgl-maint>
Status: CLOSED DUPLICATE
Severity: medium
Priority: medium
Version: rawhide
CC: linux
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2006-03-06 22:39:20 UTC
Bug Blocks: 150222    
Attachments:
DRI enabled and all XaaNo* options mentioned in xorg.conf (flags: none)

Description Erwin Rol 2006-01-13 20:50:23 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.8) Gecko/20060103 Fedora/1.5-4 Firefox/1.5

Description of problem:
The radeon driver hangs the machine in such a way that only a reset or power-off/on helps. With the old X driver (6.8 monolithic tree) this hardware did not cause any hangs. 

There seem to be a few things that change the probability of a hang:
1. A patch from Benjamin Herrenschmidt http://lists.freedesktop.org/archives/xorg/2005-December/011678.html
makes it possible to work for at least several hours before a hang; without the patch it hangs within minutes.
2. There seems to be a difference in hang frequency depending on color depth. 16-bit color seems to hang less than 24-bit color, but this could just be a "feeling".
3. Turning off all acceleration seems to prevent the hang, but makes X extremely slow (read: unusable).

The card is:
01:00.0 VGA compatible controller: ATI Technologies Inc RV370 5B60 [Radeon X300 (PCIE)]
01:00.1 Display controller: ATI Technologies Inc RV370 [Radeon X300SE]

Kernel/machine:
Linux xpc.home.erwinrol.com 2.6.15-1.1826.2.10_FC5 #1 SMP Wed Jan 11 18:13:37 EST 2006 x86_64 x86_64 x86_64 GNU/Linux



Version-Release number of selected component (if applicable):
kernel-2.6.15-1.1826.2.10_FC5 xorg-x11-drv-ati-6.5.7.2-1

How reproducible:
Couldn't Reproduce

Steps to Reproduce:
It just seems to happen; there seems to be no way to force it to happen.

Additional info:

Comment 1 Ralph Loader 2006-01-15 02:40:57 UTC
This may be related to changes to radeon DRM in the kernel.

I had similar symptoms that went away around the time of:

commit 281ab031a8c9e5b593142eb4ec59a87faae8676a
Author: Benjamin Herrenschmidt <benh.org>
Date:   Fri Dec 16 16:52:22 2005 +1100

    [PATCH] radeon drm: fix agp aperture map offset

    This finally fixes the radeon memory mapping bug that was incorrectly
    fixed by the previous patch.  This time, we use the actual vram size as
    the size to calculate how far to move the AGP aperture from the
    framebuffer in card's memory space.

    If there are still issues with this patch, they are due to bugs in the X
    driver that I'm working on fixing too.

    Signed-off-by: Benjamin Herrenschmidt <benh.org>
    Cc: Mark M. Hoffman <mhoffman>
    Cc: Paul Mackerras <paulus>
    Signed-off-by: Linus Torvalds <torvalds>

and then re-appeared with

commit 392c14beaca2ee85a98d0c6b453501be67423a20
Author: Linus Torvalds <torvalds.org>
Date:   Thu Dec 29 13:01:54 2005 -0800

    Revert radeon AGP aperture offset changes

    This reverts the series of commits

        67dbb4ea33731415fe09c62149a34f472719ac1d
        281ab031a8c9e5b593142eb4ec59a87faae8676a
        47807ce381acc34a7ffee2b42e35e96c0f322e52

    that changed the GART VM start offset.  It fixed some machines, but
    seems to continually interact badly with some X versions.

(it's possible that my problems had another cause and the timing correlation was
just coincidence).

Disabling DRI stops the hangs for me.

Comment 2 Erwin Rol 2006-01-15 14:45:34 UTC
I don't think DRI is supported on my radeon; xdriinfo returns the following:

Xlib:  extension "XFree86-DRI" missing on display ":0.0".
Screen 0: not direct rendering capable.

What I did notice is that the hangs almost always happen when I am away from the
machine. I only had one hang when I was actually using the machine. Maybe some
power management causes the hang?

Comment 3 Brian G. Anderson 2006-01-18 05:30:22 UTC
I too have hard machine lockups using the "ati" driver with DRI enabled.  The
machine will lock up hard, usually between 2 and 15 minutes after X is
started.  It has locked up in the middle of my typing, so I don't think it is a
power management problem.  There are no messages in any log I can find.

Disabling DRI makes the problem go away, but as a previous poster noted, the
display is very slow now.

This behavior was seen on a clean install of FC5T2 with 'yum update' to 1/17. 
So xorg-x11-drv-ati is at version 6.5.7.2-1.  The chip is reported as "ATI
Technologies Inc M10 NT [FireGL Mobility T2] rev 128, Mem@ 0xe0000000/27,
0xc0100000/16, I/O @ 0x3000/0"

Comment 4 Mike A. Harris 2006-02-01 00:35:48 UTC
(In reply to comment #2)
> I don't think DRI is supported on my radeon, xdriinfo returns the following;

Correct, DRI is not supported on R300 or newer Radeon chipsets in Fedora
development.  There is an experimental driver, but it is not built, and
not ready for mainstream usage currently.

> Xlib:  extension "XFree86-DRI" missing on display ":0.0".
> Screen 0: not direct rendering capable.

That is to be expected for this hardware.
 
> what i did notice is that the hangs almost always happen when i am away from the
> machine. I only had one hang when i was actually using the machine. Maybe some
> powermanagement causes the hang ? 

It is possible that the screensaver is kicking in while you're away, and
that that is triggering a bug in the driver.  A lot of the acceleration
codepaths in the drivers are not used by normal software nowadays, and
many of the accel primitives seem to be used only by various screen savers
and other special-purpose applications, such as CAD software.  Try disabling
2D acceleration by using the following in the xorg.conf device section:

    Option "noaccel"

That will be horribly slow, but it is useful for diagnosis of the problem.
Indicate whether this helps at all; you may need to use it this way for
a few hours or perhaps even a day.  If it does seem to resolve the issue,
then we can conclude that it is likely faulty acceleration, in
which case the next step is to try the various "XaaNo" options from the
xorg.conf manpage one at a time, or in combinations to try to narrow down
the problematic acceleration.
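
As a rough sketch of that one-at-a-time approach, the Device section might
look like the following. This assumes the radeon driver name; the Identifier
value is a hypothetical placeholder, and the three XaaNo* options shown are
only examples taken from the XAA option list in the xorg.conf manpage. Enable
one option (or a small group), restart X, and see whether the hang recurs:

    Section "Device"
        Identifier "Videocard0"                # hypothetical identifier
        Driver     "radeon"                    # assumed driver name
    #   Option     "noaccel"                   # first test: all 2D acceleration off
        Option     "XaaNoScreenToScreenCopy"   # then: disable one primitive at a time
    #   Option     "XaaNoSolidFillRect"
    #   Option     "XaaNoOffscreenPixmaps"
    EndSection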

Using the above diagnostic tests, what results are you able to obtain?


Comment 5 Mike A. Harris 2006-02-07 10:23:13 UTC
ping

Comment 6 Erwin Rol 2006-02-07 10:45:32 UTC
Sorry for the slow reply; I wanted to test it on another machine first, but
I'm having problems installing the latest rawhide.

The "noaccel" option seems to make it work, and yes, that is as fast as my first
512K Trident card :-)

It could not have been the screen saver, because that wasn't turned on at all,
and I also turned off all power saving in the BIOS.

I am now running only on my Xpress 200G (RS480), and that seems to work without a
hang with the normal rawhide ati driver.

I will try two other Radeons: a PCIe one in this x86_64 machine, and a PCI one
in a machine I am trying to get rawhide to install on.

I also had some success with the patches from Benjamin from the xorg list, but
the last one caused problems with VT switching.

Just wanted to let you know I didn't forget about the bug; I've just been too busy
with work to do more testing.

Comment 7 Brian G. Anderson 2006-02-08 20:06:29 UTC
Created attachment 124401 [details]
DRI enabled and all XaaNo* options mentioned in xorg.conf

Comment 8 Brian G. Anderson 2006-02-08 20:08:28 UTC
I have this exact problem on an IBM T42p ThinkPad with an ATI FireGL Mobility T2
(lspci shows: 01:00.0 VGA compatible controller: ATI Technologies Inc M10 NT
[FireGL Mobility T2] (rev 80))

So this is what I have found while playing around with xorg.conf:
* If I have "Load dri" the machine will lock hard as soon as X is started;
forget screen savers, I usually can't get past the gdm screen.
* If I comment out "Load dri" the machine is stable and windows can be moved
with no tearing.
* If I have "Load dri" with "Option noaccel" in the device section the system is
stable, but window response is very slow; moving a window results in lots of tearing.
* If I have "Load dri" with every XaaNo* option, the system still locks hard (I
also accidentally found out that "Load dri" commented out with every XaaNo*
results in very slow performance :)
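
For reference, the "Load dri" line toggled above lives in the Module section
of xorg.conf. A sketch of the variant reported as stable, assuming a Module
section that otherwise loads glx (keep whatever other Load lines your
existing config already has):

    Section "Module"
        Load "glx"       # assumed; keep your existing Load lines
    #   Load "dri"       # commented out: the setup reported as stable above
    EndSection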

I've attached a copy of my Xorg log when I had dri enabled and every XaaNo
option on in case that is of any help

Comment 9 Brian G. Anderson 2006-02-08 20:16:11 UTC
Oh, I forgot to mention that I have updated to rawhide as of 8:00am 2/8, and I
noticed that mesa reports that support for accelerated ATI drivers is
enabled.  I thought it odd that my log claims no acceleration is allowed for
the 9500/9700, saying that the mesa driver doesn't have support.  Perhaps the
message is outdated.

Comment 10 Erwin Rol 2006-02-08 20:25:51 UTC
I have a FireMV 2400 PCI (quad head) card that hangs the machine immediately when
used with the normal Rawhide X drivers. I have made a patched version of the
driver that makes it possible to use the card. I haven't tried DRI, which seems
to need another patch. Maybe someone could test whether the patched version works on
other hardware too.

The patched source RPM can be found here:
http://www.erwinrol.com/downloads/software/xorg-x11-drv-ati-6.5.7.3-2.ER.1.src.rpm

Since the latest RPM from rawhide has a higher version, you might need to --force it. 

Comment 11 Mike A. Harris 2006-02-08 21:35:18 UTC
(In reply to comment #9)
> Oh I forgot to mention that I have updated to rawhide as of 8:00am 2/8 and I
> noticed that mesa is mentioning that support for accelerated ATI drivers is
> enabled.  I thought it odd that when I read my log it claims that no
> accelleration is allowed for 9500/9700 it says that the mesa driver doesn't have
> support.  Perhaps the message is outdated.

Please read the mesa package changelog entry for 6.4.2-2, which references
the r300 driver request bug which also contains more information.

Comment 12 Erwin Rol 2006-02-16 23:39:47 UTC
Benjamin Herrenschmidt checked his radeon mmap patch into xorg CVS. I have
been running that patch for some weeks and things have been stable since. Would it
be possible to create a new ati driver RPM from the new CVS version?



Comment 13 Chris Adams 2006-02-17 15:40:45 UTC
Just to add a "me too"; with an up-to-date rawhide install on my Thinkpad Z60m
with "ATI Radeon Mobility X600 (M24) 3150 (PCIE)", X locks the system hard on
start.  Commenting out the 'Load "dri"' line or adding the 'Option "noaccel"'
line to the device section allows it to work.


Comment 14 Erwin Rol 2006-02-17 15:57:28 UTC
(In reply to comment #13)
> Just to add a "me too"; with an up-to-date rawhide install on my Thinkpad Z60m
> with "ATI Radeon Mobility X600 (M24) 3150 (PCIE)", X locks the system hard on
> start.  Commenting out the 'Load "dri"' line or adding the 'Option "noaccel"'
> line to the device section allows it to work.

Could you try the SRPM I posted in comment #10? It is still without DRI, but
acceleration might work with it.

Comment 15 Chris Adams 2006-02-17 16:43:47 UTC
I rebuilt the latest rawhide xorg-x11-drv-ati with the two patches.  My system
still locks on X start with DRI enabled in the config.


Comment 16 Mike A. Harris 2006-02-18 00:56:08 UTC
Adam Jackson is currently planning on doing an Xorg server 1.0.2 update release
from the stable branch of CVS (not HEAD).  If Ben's patch is considered stable
enough for inclusion into the stable branch of Xorg server CVS, it may become
part of the 1.0.2 release, or a release subsequent to that.

Once a stable upstream Xorg server update has been released by X.Org which
includes Ben's patch, we will consider including it in Fedora development.


Comment 17 Chris Adams 2006-02-18 01:14:49 UTC
Perhaps a patch is needed for system-config-display that will disable DRI for
R300+ cards for now.  Otherwise, a bunch of systems will be left unusable after
install.

Comment 18 Mike A. Harris 2006-02-18 02:16:15 UTC
(In reply to comment #17)
> Perhaps a patch is needed for system-config-display that will disable DRI for
> R300+ cards for now.  Otherwise, a bunch of systems will be left unusable after
> install.

If the config tool does it, then two problems are created:

1) When the driver is considered stable and reliable, aka "fixed", the config
tool will still disable it until the config tool is updated.  Creates more
work for everyone, and more frustration for the end user.

2) When the driver is fixed/stable and the user upgrades to the new driver,
their old configuration will still continue to needlessly disable the
particular feature.


Due to these types of problems, since around Red Hat Linux 7.2 or 7.3 we
started patching the video drivers and/or Xserver directly to change any
defaults as needed.  This is the preferred method for any changes from
upstream defaults, as the changes are then self contained within the
X server, or driver that has the problem to begin with, and can be removed
at the same time the problem is resolved, causing all systems to be in
sync with the fixes at the same time, and not requiring the user to
reconfigure or perform any other manual changes.

I might update radeon over the weekend to account for this.


None of this is really relevant to this particular bug report, however, as
the r300 DRI driver wasn't even enabled in rawhide until long after this
report was filed.  There is another bug tracking r300 DRI inclusion that
you might want to CC yourself on, however I don't have the bug ID handy.

HTH



Comment 20 Mike A. Harris 2006-02-22 08:17:28 UTC
It turns out that Option "nodri" does not seem to resolve DRI related
hangs on R300, from other reports we're getting.

This means that commenting out 'Load "dri"' seems to be the only way right
now to have a stable 2D-only setup if the r300 DRI driver is supplied.

We may decide to disable r300 dri between now and FC5 rather than risk any
further instability.

It isn't clear that any of this is related to the _original_ reporter's
bug report here, though.  The others who have CC'd themselves on the bug report
and added comments seem to be experiencing bug #177773 instead of this one;
however, it isn't 100% clear.

I'm told Ben H is working on other fixes related to the patch referred to above,
which seems to indicate the patches are in a state of flux.
I'm leery of including them until they've had adequate testing in the wild,
so we'll leave this one for a few more days to see if the patch situation
changes upstream.


Comment 21 Mike A. Harris 2006-02-22 08:18:15 UTC
In comment #20, I meant bug #182196 in the 4th paragraph.

Comment 22 Mike A. Harris 2006-03-06 22:39:20 UTC
After re-reviewing all comments in this bug, I believe it is a straight
dupe of bug #182196.

*** This bug has been marked as a duplicate of 182196 ***