Bug 149732 - Hang with radeon driver when DRM DRI active
Summary: Hang with radeon driver when DRM DRI active
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: All
OS: Linux
Priority: high
Severity: medium
Target Milestone: ---
Assignee: Don Howard
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks: RHEL3U8CanFix
Reported: 2005-02-25 20:39 UTC by Rod Macdonald
Modified: 2007-11-30 22:07 UTC (History)

Fixed In Version: RHSA-2006-0437
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-07-20 13:20:58 UTC
Target Upstream Version:
Embargoed:


Attachments
DRM driver patch fixing video stress hang on Dell server (806 bytes, patch)
2005-08-19 15:45 UTC, Rod Macdonald
DRM driver patch fixing video stress hang on Dell server (703 bytes, patch)
2005-12-01 21:52 UTC, Rod Macdonald
DRM driver patch fixing video stress hang on Dell server (703 bytes, patch)
2005-12-01 21:52 UTC, Rod Macdonald


Links
System: Red Hat Product Errata
ID: RHSA-2006:0437
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Updated kernel packages for Red Hat Enterprise Linux 3 Update 8
Last Updated: 2006-07-20 13:11:00 UTC

Description Rod Macdonald 2005-02-25 20:39:23 UTC
Platform: Dell Blade-server Poweredge 2850

The blade was either idle or running a bash script when it became 
unresponsive (hung).

FSB analyzer trace shows that the CPU is trying to read video config 
space and never gets a reply. 

DRI is enabled by the default installation (entry in /etc/X11/XF86Config).

Workaround:  
Disabling DRI eliminates this issue at the price of some video 
performance.

Edit the XF86Config file (found in /etc/X11) and remove the line
load "dri" from the Module section.

The failure does not occur in low-resolution modes: 640x480 with 64k colors 
did not hang, but 1024x768 with millions of colors does. 

This problem can be reproduced when all of the following conditions are true:
1. Dell system with an on-board 7000m.
(The problem does not occur with a 7000m PCI board on a regular 
system.)
2. DRI is enabled using PCIGART.
(The problem does not occur with DRI disabled.)
3. Only a single display controller is used.
(The problem does not occur if clone mode is enabled.)
4. Run eschertilerect100 many times.
(It may not hang the first couple of times.)

A patch to fix the drm kernel driver has been created for RHEL3u2 
(linux-2.4.21-15.EL); it appears under FIX below. The problem is in the 
drm driver inside the linux-2.4.21-15.ELsmp kernel tree that comes with 
RHEL3u2.

The problem was fixed in XFree86/Xorg some time ago.

Steps to reproduce:
1.  Install RH 3.0 Update 3
2.  Verify DRI is on by checking the file /etc/X11/XF86Config for the 
line load "dri"
3.  Verify the graphics are using 32 bit color ("Millions of colors").
4.  Run: x11perf -eschertilerect100
    This normally runs for about 1 minute and exits without errors,
but in this case it sometimes fails after about 30 seconds with a 
server hang.

FIX:  
Patch to fix drm kernel driver in RHEL3u2 linux-2.4.21-15.EL 
--- linux-2.4.21-15.EL/drivers/char/drm/radeon_cp.c.linux-2.4.21-15.EL_orig     2004-10-29 16:29:56.000000000 -0400
+++ linux-2.4.21-15.EL/drivers/char/drm/radeon_cp.c     2004-10-29 16:30:59.000000000 -0400
@@ -1347,7 +1347,7 @@

        LOCK_TEST_WITH_RETURN( dev );

-       if ( copy_from_user( &stop, (drm_radeon_init_t *)arg, sizeof(stop) ) )
+       if ( copy_from_user( &stop, (drm_radeon_cp_stop_t *)arg, sizeof(stop) ) )
                return -EFAULT;

        /* Flush any pending CP commands.  This ensures any outstanding
@@ -1592,7 +1592,7 @@

        for ( i = d->granted_count ; i < d->request_count ; i++ ) {
                buf = radeon_freelist_get( dev );
-               if ( !buf ) return -EAGAIN;
+               if ( !buf ) return -EBUSY;

                buf->pid = current->pid;

Comment 1 Matt Domsch 2005-04-07 13:25:23 UTC
Per Sue Denham, this came in too late to make RHEL3 Update 5; the fix is
deferred to Update 6.

Comment 3 Larry Troan 2005-07-25 17:57:53 UTC
Please attach patch rather than submitting inline.

Comment 7 Ernie Petrides 2005-08-01 20:48:54 UTC
U6 closed a couple of weeks ago.  Moving from U6 to U7 proposed list.

Comment 8 Amit Bhutani 2005-08-03 00:54:39 UTC
Rod-

Can you please provide the patch to RH as requested? Without it we are 
not making any progress, and we are still hoping to get this in for U6.

Comment 11 Rod Macdonald 2005-08-19 15:45:05 UTC
Created attachment 117917
DRM driver patch fixing video stress hang on Dell server

This patch is for the DRM driver in the RHEL3 2.4.21 Linux kernel, 
which uses an old version of the DRI code.

Comment 15 Don Howard 2005-09-29 00:16:33 UTC
Has ATI tested this on RHEL3?

The patch is in RHEL4 and upstream 2.6.  It is not in RHEL3 or upstream 2.4.

The first part of the patch looks obviously correct.  I'm wondering if the error
return update will cause problems for existing callers.  The 2.6 sources include:


		buf = radeon_freelist_get( dev );
		if ( !buf ) return DRM_ERR(EBUSY); /* NOTE: broken client */




Comment 18 Amit Bhutani 2005-10-13 00:15:06 UTC
Rod@ATI- Can you please answer the question posted in comment #15?

Comment 20 Peter Martuccelli 2005-10-28 15:31:21 UTC
Mustfix request is denied.  No PM ACK on this mustfix request, and no feedback
from ATI regarding the patch. 



Comment 22 Rod Macdonald 2005-11-21 16:44:04 UTC
RHEL3 U6:  Problem not reproducible.  Confirmed by test at ATI (32-bit OS, Dell 
PE6800 server, 20051027).

RHEL4 U2:  Problem not reproducible.  Confirmed by test at ATI (32-bit OS, Dell 
PE6800 server, 20051027).

This item can be closed.


Comment 23 Raghavendra Biligiri 2005-11-23 06:13:09 UTC
Defect still reproducible on PE6800 with RHEL3-U6 installed.

Comment 24 Raghavendra Biligiri 2005-11-24 06:17:10 UTC
Steps to reproduce the defect:

  1. Install RHEL3-U6 (kernel-2.4.21-37) 64-bit on PE6800.
  2. In GUI mode, run the command "x11perf -eschertilerect100".
  3. Run the above command a few times.
  4. System hangs.

 


Comment 25 Raghavendra Biligiri 2005-11-29 11:02:59 UTC
With reference to comment #22: ATI was not able to reproduce this defect because 
they were testing with RHEL3-U6 x86. The defect is reproducible only on 
RHEL3-U6 x86_64.
ATI confirms that they are now able to reproduce the defect on RHEL3-U6 x86_64 
on PE6800 and are investigating it further.

Comment 26 Rod Macdonald 2005-12-01 21:52:07 UTC
Created attachment 121713
DRM driver patch fixing video stress hang on Dell server

This patch has been updated for RHEL3 U6

Comment 27 Rod Macdonald 2005-12-01 21:52:29 UTC
Created attachment 121714
DRM driver patch fixing video stress hang on Dell server

This patch has been updated for RHEL3 U6

Comment 28 Rod Macdonald 2005-12-01 22:02:03 UTC
Apologies for the double attachment; both patches are the same.  

This patch has been tested by Dell and was shown to prevent the system hang 
that occurs when running x11perf.  

Regarding the question in #15 above:

In the 2.6 kernel source this issue is corrected as shown:

	buf = radeon_freelist_get( dev );
	if ( !buf ) return DRM_ERR(EBUSY); /* NOTE: broken client */

This will not cause problems for existing callers, since the current
behavior is itself a defect. If the function returns EAGAIN, the call
will be retried over and over, potentially locking up the system. The
correct return value here is EBUSY.


Comment 32 Raghavendra Biligiri 2005-12-27 09:42:21 UTC
Patch not applied in RHEL3-U7-Beta(kernel-2.4.21-38).
Defect not Fixed in RHEL3-U7-Beta(kernel-2.4.21-38).

Comment 34 Rod Macdonald 2006-01-09 15:32:19 UTC
I see the hardware field has been changed to x86_64.  Can you ensure that the 
code fix is also applied to 32 bit?  Although the problem is not reproducing 
there today (with RHEL3-U6), it did occur with earlier releases.

Comment 37 Don Howard 2006-01-24 20:28:13 UTC
The hardware field was updated in response to comments 24 and 25.
The patch is queued for inclusion in U8, and applies to all platforms.


Comment 38 Samuel Benjamin 2006-02-09 20:09:43 UTC
Raising priority to high based on Dell's U8 consideration.

Comment 39 Ernie Petrides 2006-02-16 00:48:17 UTC
A fix for this problem has just been committed to the RHEL3 U8
patch pool this evening (in kernel version 2.4.21-40.1.EL).


Comment 42 Joshua Giles 2006-05-30 16:12:15 UTC
A kernel has been released that contains a patch for this problem.  Please
verify if your problem is fixed with the latest available kernel from the RHEL3
public beta channel at rhn.redhat.com and post your results to this bugzilla.

Comment 43 Ernie Petrides 2006-05-30 20:24:07 UTC
Reverting to ON_QA.

Comment 45 Red Hat Bugzilla 2006-07-20 13:20:58 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0437.html


