Bug 487126 - r300: X livelock on resume when compiz is running
Product: Fedora
Classification: Fedora
Component: mesa
Hardware: i686 Linux
Priority: low
Severity: medium
Assigned To: Adam Jackson
QA Contact: Fedora Extras Quality Assurance
Reported: 2009-02-24 07:30 EST by Roman Kagan
Modified: 2009-05-27 19:42 EDT
Doc Type: Bug Fix
Last Closed: 2009-05-27 19:42:01 EDT

Attachments
Xorg.0.log (90.45 KB, text/plain)
2009-02-24 07:30 EST, Roman Kagan

Description Roman Kagan 2009-02-24 07:30:29 EST
Created attachment 333038

Description of problem:
IBM ThinkPad T43p 2687-D5U
ATI Technologies Inc M24GL [Mobility FireGL V3200] (rev 80)

When running compiz, doing a suspend+resume results in a locked up X.

Version-Release number of selected component (if applicable):
The problem appeared throughout the F10 lifetime; the latest versions are:


How reproducible:

Steps to Reproduce:
1. boot with nomodeset (IIRC with KMS on the problem remains)
2. run gnome session with compiz (aka desktop effects on)
3. suspend + resume
Actual results:
X locks up:
- screen is not updated
- no response to keyboard
the system stays alive; connecting via ssh shows that
- X eats 100% CPU
- Xorg.0.log (attached) reports a detected infinite loop
- strace reports an endless loop of signals + sigreturn()
- gdb never succeeds in attaching to the running X

Expected results:
X continues to work at the point where it was suspended.

Additional info:
The system resumes fine with metacity (aka desktop effects off).
Comment 1 Roman Kagan 2009-02-24 07:36:32 EST
Enabling debug logging from drm via

# echo 1 > /sys/module/drm/parameters/debug

after the resume showed an endless stream of messages in /proc/kmsg:

<7>[drm:drm_ioctl] pid=2502, cmd=0xc0086451, nr=0x51, dev 0xe200, auth=1
<7>[drm:radeon_cp_getparam] pid=2502
<7>[drm:drm_ioctl] pid=2502, cmd=0xc0086451, nr=0x51, dev 0xe200, auth=1
<7>[drm:radeon_cp_getparam] pid=2502
<7>[drm:drm_ioctl] pid=2502, cmd=0xc0086451, nr=0x51, dev 0xe200, auth=1
<7>[drm:radeon_cp_getparam] pid=2502
Comment 2 Roman Kagan 2009-02-24 07:55:07 EST
Relevant part of the X call trace from attachment 333038 above (with addresses resolved using eu-addr2line):

11: /usr/lib/libdrm.so.2 [0x4d4d6cf]
        in drmIoctl()

12: /usr/lib/libdrm.so.2(drmCommandWriteRead+0x34) [0x4d4d934]
        in drmCommandWriteRead()

13: /usr/lib/dri/r300_dri.so [0x2b677a]
        in radeonGetLastFrame()

14: /usr/lib/dri/r300_dri.so [0x2b690f]
        in radeonWaitForFrameCompletion()

15: /usr/lib/dri/r300_dri.so(radeonCopyBuffer+0xd2) [0x2b6c79]
        in radeonCopyBuffer()
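
The kernel debug output in Comment 1 and this trace appear to describe the same call. The cmd=0xc0086451 value can be split into the standard Linux ioctl fields, and nr=0x51 then corresponds to the radeon GETPARAM command that radeonGetLastFrame() issues through drmCommandWriteRead(). The following is a minimal sketch of that decoding. The two constants are assumptions copied from memory of the DRM headers (drm.h and radeon_drm.h), and `ioctl_fields` and `decode_ioctl` are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Assumed values from the DRM headers; they do not appear in this report. */
#define DRM_COMMAND_BASE     0x40  /* first driver-specific command number */
#define DRM_RADEON_GETPARAM  0x11  /* radeon "get parameter" command */

/* Fields of a Linux ioctl request number (generic layout used on x86). */
struct ioctl_fields {
    unsigned dir;   /* bits 30-31: 1 = write, 2 = read, 3 = read and write */
    unsigned size;  /* bits 16-29: size of the argument struct in bytes */
    unsigned type;  /* bits 8-15: driver "magic" character, 'd' for DRM */
    unsigned nr;    /* bits 0-7: command number */
};

/* Split a raw ioctl request number into its four fields. */
static struct ioctl_fields decode_ioctl(uint32_t cmd)
{
    struct ioctl_fields f;
    f.dir  = (cmd >> 30) & 0x3;
    f.size = (cmd >> 16) & 0x3fff;
    f.type = (cmd >> 8) & 0xff;
    f.nr   = cmd & 0xff;
    return f;
}
```

Decoding 0xc0086451 this way gives a read/write ioctl of type 'd' with an 8-byte argument and command number 0x51, which is DRM_COMMAND_BASE + DRM_RADEON_GETPARAM under the assumed constants. That matches the radeon_cp_getparam lines in the log.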
Comment 3 Roman Kagan 2009-02-24 08:10:27 EST
The endless loop is in radeonWaitForFrameCompletion():


                                while (radeonGetLastFrame(radeon) <
                                       sarea->last_frame) ;

Apparently the loop condition never goes false.

Doing a binary edit of /usr/lib/dri/r300_dri.so (I didn't have all the tools to rebuild a patched version from source) that changes the jump address so the loop exits after the first iteration, equivalent to the following patch,

--- a/src/mesa/drivers/dri/r300/radeon_ioctl.c
+++ b/src/mesa/drivers/dri/r300/radeon_ioctl.c
@@ -132,7 +132,7 @@
 	if (radeon->do_irqs) {
 		if (radeonGetLastFrame(radeon) < sarea->last_frame) {
 			if (!radeon->irqsEmitted) {
-				while (radeonGetLastFrame(radeon) <
+				if (radeonGetLastFrame(radeon) <
 				       sarea->last_frame) ;
 			} else {

made it resume successfully.

I can't claim I understand the possible impact of the change; however, I have been running the patched version with compiz for 4 days now, and it has survived 14 suspend/resume cycles with no problems noticed.
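
To illustrate the failure mode in this comment: a busy-wait on a frame counter never terminates if the counter stops advancing or the target value is stale, which is what seems to happen after resume. The sketch below shows one possible way to bound such a wait. It is not the mesa code and not the fix that was eventually adopted (see Comment 5). `wait_for_frame_bounded`, `read_frame_fn`, `read_fixed_counter`, and the `max_polls` limit are all hypothetical names introduced for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical callback returning the last frame the hardware completed. */
typedef uint32_t (*read_frame_fn)(void *ctx);

/*
 * Poll the frame counter until it reaches `target`, giving up after
 * `max_polls` reads. Returns true if the target was reached, false if
 * the poll budget ran out first.
 */
static bool wait_for_frame_bounded(read_frame_fn read_frame, void *ctx,
                                   uint32_t target, unsigned max_polls)
{
    for (unsigned i = 0; i < max_polls; i++) {
        if (read_frame(ctx) >= target)
            return true;
    }
    return false;
}

/* Stand-in for the hardware: always reports the value stored in *ctx. */
static uint32_t read_fixed_counter(void *ctx)
{
    return *(const uint32_t *)ctx;
}
```

With a counter stuck at 5 and a target of 10, `wait_for_frame_bounded(read_fixed_counter, &counter, 10, 1000)` returns false after 1000 reads instead of spinning forever. The caller would still have to decide what to do on a timeout, which this sketch does not address.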
Comment 4 Roman Kagan 2009-02-26 10:05:30 EST
Forgot to note that in Fedora 9 suspend/resume with compiz on this machine worked just fine; the regression showed up when upgrading to F10 in November.
Comment 5 Roman Kagan 2009-03-02 12:39:27 EST
drm-radeon-fix-upstream-suspend.patch, included in the newer kernels, addresses this issue from the right angle: by resetting the relevant members of the sarea data structure on resume.

Now suspend/resume with compiz works for me with unmodified mesa and the latest kernel from koji:

# rpmverify -f /usr/lib/dri/r300_dri.so 
# uname -r
# grep -c -i resume /var/log/Xorg.0.log

Feel free to close the bug.
