Bug 487126 - r300: X livelock on resume when compiz is running
Status: CLOSED CURRENTRELEASE
Product: Fedora
Classification: Fedora
Component: mesa
Version: 10
Hardware: i686 Linux
Priority: low, Severity: medium
Assigned To: Adam Jackson
QA Contact: Fedora Extras Quality Assurance
Reported: 2009-02-24 07:30 EST by Roman Kagan
Modified: 2009-05-27 19:42 EDT (History)
Last Closed: 2009-05-27 19:42:01 EDT

Attachments
Xorg.0.log (90.45 KB, text/plain)
2009-02-24 07:30 EST, Roman Kagan

Description Roman Kagan 2009-02-24 07:30:29 EST
Created attachment 333038
Xorg.0.log

Description of problem:
h/w:
IBM ThinkPad T43p 2687-D5U
ATI Technologies Inc M24GL [Mobility FireGL V3200] (rev 80)

When running compiz, doing a suspend+resume results in a locked up X.

Version-Release number of selected component (if applicable):
The problem appeared throughout the F10 lifetime; the latest versions are:

mesa-dri-drivers-7.2-0.15.fc10.i386
libdrm-2.4.0-0.21.fc10.i386
xorg-x11-server-Xorg-1.5.3-6.fc10.i386
xorg-x11-drv-ati-6.10.0-2.fc10.i386
kernel-PAE-2.6.27.15-170.2.24.fc10.i686

How reproducible:
always

Steps to Reproduce:
1. boot with nomodeset (IIRC with KMS on the problem remains)
2. run gnome session with compiz (aka desktop effects on)
3. suspend + resume
  
Actual results:
X locks up:
- screen is not updated
- no response to keyboard
the system stays alive; connecting via ssh shows that
- X eats 100% CPU
- Xorg.0.log (attached) reports a detected infinite loop
- strace reports an endless loop of signals + sigreturn()
- gdb never succeeds in attaching to the running X


Expected results:
X continues to work from the point where it was suspended.

Additional info:
The system resumes fine with metacity (aka desktop effects off).
Comment 1 Roman Kagan 2009-02-24 07:36:32 EST
Enabling debug logging from drm via

# echo 1 > /sys/module/drm/parameters/debug

after the resume showed an endless stream of messages in /proc/kmsg

<7>[drm:drm_ioctl] pid=2502, cmd=0xc0086451, nr=0x51, dev 0xe200, auth=1
<7>[drm:radeon_cp_getparam] pid=2502
<7>[drm:drm_ioctl] pid=2502, cmd=0xc0086451, nr=0x51, dev 0xe200, auth=1
<7>[drm:radeon_cp_getparam] pid=2502
<7>[drm:drm_ioctl] pid=2502, cmd=0xc0086451, nr=0x51, dev 0xe200, auth=1
<7>[drm:radeon_cp_getparam] pid=2502
...
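
For reference, the cmd value in those messages can be split into its fields. The sketch below assumes the generic Linux ioctl number layout used on x86 (8 bits nr, 8 bits type, 14 bits size, 2 bits direction); the struct and function names are made up for illustration. Type 'd' is the DRM ioctl base, and driver-specific commands start at 0x40, so nr=0x51 is driver command 0x11, which as far as I can tell is DRM_RADEON_GETPARAM in radeon_drm.h. That agrees with the radeon_cp_getparam line that follows each drm_ioctl line.

```c
#include <stdint.h>
#include <stdio.h>

/* Fields of a Linux ioctl request number (generic layout, as on x86):
 * bits 0-7 nr, bits 8-15 type, bits 16-29 size, bits 30-31 direction.
 * The struct and function names are hypothetical, for illustration only. */
struct ioctl_fields {
    unsigned dir;   /* 1 = write, 2 = read, 3 = read and write */
    unsigned size;  /* size of the argument structure in bytes */
    unsigned type;  /* subsystem "magic" character */
    unsigned nr;    /* command number within the subsystem */
};

struct ioctl_fields decode_ioctl(uint32_t cmd)
{
    struct ioctl_fields f;
    f.nr   = cmd & 0xffu;
    f.type = (cmd >> 8) & 0xffu;
    f.size = (cmd >> 16) & 0x3fffu;
    f.dir  = (cmd >> 30) & 0x3u;
    return f;
}

/* Print the decoded fields of one request number. */
void print_ioctl(uint32_t cmd)
{
    struct ioctl_fields f = decode_ioctl(cmd);
    printf("dir=%u size=%u type='%c' nr=0x%02x\n",
           f.dir, f.size, (char)f.type, f.nr);
}
```

Calling print_ioctl(0xc0086451u) decodes the value seen in the log above.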
Comment 2 Roman Kagan 2009-02-24 07:55:07 EST
Relevant part of the X call trace extracted from attachment 333038 above (with addresses translated into source code line numbers with eu-addr2line):

11: /usr/lib/libdrm.so.2 [0x4d4d6cf]
    libdrm-20080930/libdrm/xf86drm.c:187
        in drmIoctl()

12: /usr/lib/libdrm.so.2(drmCommandWriteRead+0x34) [0x4d4d934]
    libdrm-20080930/libdrm/xf86drm.c:2342
        in drmCommandWriteRead()

13: /usr/lib/dri/r300_dri.so [0x2b677a]
    mesa-20081001/src/mesa/drivers/dri/r300/radeon_ioctl.c:69
        in radeonGetLastFrame()

14: /usr/lib/dri/r300_dri.so [0x2b690f]
    mesa-20081001/src/mesa/drivers/dri/r300/radeon_ioctl.c:135
        in radeonWaitForFrameCompletion()

15: /usr/lib/dri/r300_dri.so(radeonCopyBuffer+0xd2) [0x2b6c79]
    mesa-20081001/src/mesa/drivers/dri/r300/radeon_ioctl.c:189
        in radeonCopyBuffer()
Comment 3 Roman Kagan 2009-02-24 08:10:27 EST
The endless loop is in radeonWaitForFrameCompletion():

mesa-20081001/src/mesa/drivers/dri/r300/radeon_ioctl.c:135

...
                                while (radeonGetLastFrame(radeon) <
                                       sarea->last_frame) ;
...

Apparently the loop condition never goes false.

Doing a binary edit of /usr/lib/dri/r300_dri.so (I didn't have all the tools to rebuild a patched version from source) that changes the jump address so the loop exits after the first iteration, equivalent to the following patch,

--- a/src/mesa/drivers/dri/r300/radeon_ioctl.c
+++ b/src/mesa/drivers/dri/r300/radeon_ioctl.c
@@ -132,7 +132,7 @@
 	if (radeon->do_irqs) {
 		if (radeonGetLastFrame(radeon) < sarea->last_frame) {
 			if (!radeon->irqsEmitted) {
-				while (radeonGetLastFrame(radeon) <
+				if (radeonGetLastFrame(radeon) <
 				       sarea->last_frame) ;
 			} else {
 				UNLOCK_HARDWARE(radeon);

made it resume successfully.

I can't claim I understand the possible impact of the change; however, I have been running the patched version with compiz for 4 days now, and it has survived 14 suspend/resume cycles with no problems noticed.
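
A less drastic alternative to dropping the loop would be to keep polling but give up after a bounded number of attempts. The following is only a minimal sketch of that idea, not what mesa does: the callback type, the function names, and the max_polls parameter are all hypothetical, with the callback standing in for radeonGetLastFrame().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical callback standing in for radeonGetLastFrame(). */
typedef uint32_t (*frame_reader)(void *ctx);

/* Poll until reader(ctx) reaches target, but give up after max_polls
 * attempts instead of spinning forever. Returns true if the target
 * frame was reached, false if the poll budget ran out. */
bool wait_for_frame_bounded(frame_reader reader, void *ctx,
                            uint32_t target, unsigned max_polls)
{
    for (unsigned i = 0; i < max_polls; i++) {
        if (reader(ctx) >= target)
            return true;
    }
    return false;
}

/* Illustration helper: a frame counter that never moves, which is the
 * situation observed after resume. */
uint32_t stuck_reader(void *ctx)
{
    return *(uint32_t *)ctx;
}

/* Illustration helper: a frame counter that advances by one per read. */
uint32_t advancing_reader(void *ctx)
{
    uint32_t *counter = ctx;
    return (*counter)++;
}
```

With a stuck counter the function returns false once the budget is used up, so the caller can log the condition or fall back to another path rather than hang X.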
Comment 4 Roman Kagan 2009-02-26 10:05:30 EST
Forgot to note that in Fedora 9 suspend/resume with compiz on this machine worked just fine; the regression showed up when upgrading to F10 in November.
Comment 5 Roman Kagan 2009-03-02 12:39:27 EST
The drm-radeon-fix-upstream-suspend.patch included in the newer kernels addresses this issue from the right angle: by resetting the relevant members of the sarea data structure on resume.
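
To illustrate why resetting sarea state on resume would unblock the loop from comment 3, here is a toy model. It rests on an assumption I have not verified: that the frame counter reported by the hardware restarts from zero across suspend while sarea->last_frame keeps its old value. All struct and function names are hypothetical, and which members the kernel patch actually resets is not shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; not the real sarea or hardware structures. */
struct model_sarea { uint32_t last_frame; };  /* value kept in shared memory */
struct model_hw    { uint32_t last_frame; };  /* value reported by the hardware */

/* Same comparison as the loop condition in radeonWaitForFrameCompletion(). */
bool client_would_spin(const struct model_hw *hw,
                       const struct model_sarea *sarea)
{
    return hw->last_frame < sarea->last_frame;
}

/* Assumption: the hardware-side counter is lost across suspend. */
void model_suspend_resume(struct model_hw *hw)
{
    hw->last_frame = 0;
}

/* Model of the kernel-side fix: reset the stale shared value on resume. */
void model_resume_fixup(struct model_sarea *sarea)
{
    sarea->last_frame = 0;
}
```

In this model the condition is false before suspend, becomes permanently true after model_suspend_resume(), and is false again once model_resume_fixup() has run.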

Now suspend/resume with compiz works for me with unmodified mesa and the latest kernel from koji:

# rpmverify -f /usr/lib/dri/r300_dri.so 
# uname -r
2.6.27.19-170.2.38.fc10.i686.PAE
# grep -c -i resume /var/log/Xorg.0.log
8

Feel free to close the bug.
