Bug 487126
Summary: | r300: X livelock on resume when compiz is running | |
---|---|---|---
Product: | [Fedora] Fedora | Reporter: | Roman Kagan <rkagan>
Component: | mesa | Assignee: | Adam Jackson <ajax>
Status: | CLOSED CURRENTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa>
Severity: | medium | Docs Contact: |
Priority: | low | |
Version: | 10 | CC: | ajax, cra, xgl-maint
Target Milestone: | --- | |
Target Release: | --- | |
Hardware: | i686 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2009-05-27 23:42:01 UTC | Type: | ---
Enabling debug logging from drm via

# echo 1 > /sys/module/drm/parameters/debug

after the resume showed an endless stream of messages in /proc/kmsg:

<7>[drm:drm_ioctl] pid=2502, cmd=0xc0086451, nr=0x51, dev 0xe200, auth=1
<7>[drm:radeon_cp_getparam] pid=2502
<7>[drm:drm_ioctl] pid=2502, cmd=0xc0086451, nr=0x51, dev 0xe200, auth=1
<7>[drm:radeon_cp_getparam] pid=2502
<7>[drm:drm_ioctl] pid=2502, cmd=0xc0086451, nr=0x51, dev 0xe200, auth=1
<7>[drm:radeon_cp_getparam] pid=2502
...

Relevant part of the X call trace extracted from attachment 333038 (Xorg.0.log), with addresses translated into source code line numbers with eu-addr2line:
11: /usr/lib/libdrm.so.2 [0x4d4d6cf]
libdrm-20080930/libdrm/xf86drm.c:187
in drmIoctl()
12: /usr/lib/libdrm.so.2(drmCommandWriteRead+0x34) [0x4d4d934]
libdrm-20080930/libdrm/xf86drm.c:2342
in drmCommandWriteRead()
13: /usr/lib/dri/r300_dri.so [0x2b677a]
mesa-20081001/src/mesa/drivers/dri/r300/radeon_ioctl.c:69
in radeonGetLastFrame()
14: /usr/lib/dri/r300_dri.so [0x2b690f]
mesa-20081001/src/mesa/drivers/dri/r300/radeon_ioctl.c:135
in radeonWaitForFrameCompletion()
15: /usr/lib/dri/r300_dri.so(radeonCopyBuffer+0xd2) [0x2b6c79]
mesa-20081001/src/mesa/drivers/dri/r300/radeon_ioctl.c:189
in radeonCopyBuffer()
The endless loop is in radeonWaitForFrameCompletion():

mesa-20081001/src/mesa/drivers/dri/r300/radeon_ioctl.c:135
...
    while (radeonGetLastFrame(radeon) < sarea->last_frame) ;
...

Apparently the loop condition never becomes false.

I did a binary edit of /usr/lib/dri/r300_dri.so (I didn't have all the tools to rebuild a patched version from source), changing the jump address so that the loop quits after the first iteration. This is equivalent to the following patch:

--- a/src/mesa/drivers/dri/r300/radeon_ioctl.c
+++ b/src/mesa/drivers/dri/r300/radeon_ioctl.c
@@ -132,7 +132,7 @@
    if (radeon->do_irqs) {
       if (radeonGetLastFrame(radeon) < sarea->last_frame) {
          if (!radeon->irqsEmitted) {
-            while (radeonGetLastFrame(radeon) <
+            if (radeonGetLastFrame(radeon) <
                    sarea->last_frame) ;
          } else {
             UNLOCK_HARDWARE(radeon);

With this change the machine resumes successfully. I can't claim I understand the possible impact of the change; however, I have been running the patched version with compiz for 4 days now, and it has survived 14 suspend/resume cycles with no problems noticed.

Forgot to note that in Fedora 9 suspend/resume with compiz worked just fine on this machine; the regression showed up when upgrading to F10 in November.

drm-radeon-fix-upstream-suspend.patch, included in the newer kernels, addresses this issue from the right angle: by resetting the relevant members of the sarea data structure on resume. Suspend/resume with compiz now works for me with unmodified mesa and the latest kernel from koji:

# rpmverify -f /usr/lib/dri/r300_dri.so
# uname -r
2.6.27.19-170.2.38.fc10.i686.PAE
# grep -c -i resume /var/log/Xorg.0.log
8

Feel free to close the bug.
Created attachment 333038
Xorg.0.log

Description of problem:

h/w: IBM ThinkPad T43p 2687-D5U
     ATI Technologies Inc M24GL [Mobility FireGL V3200] (rev 80)

When running compiz, doing a suspend+resume results in a locked-up X.

Version-Release number of selected component (if applicable):

The problem appeared throughout the F10 lifetime; the latest versions are:

mesa-dri-drivers-7.2-0.15.fc10.i386
libdrm-2.4.0-0.21.fc10.i386
xorg-x11-server-Xorg-1.5.3-6.fc10.i386
xorg-x11-drv-ati-6.10.0-2.fc10.i386
kernel-PAE-2.6.27.15-170.2.24.fc10.i686

How reproducible:

always

Steps to Reproduce:
1. boot with nomodeset (IIRC the problem remains with KMS on)
2. run gnome session with compiz (aka desktop effects on)
3. suspend + resume

Actual results:

X locks up:
- screen is not updated
- no response to keyboard

The system stays alive; connecting via ssh shows that
- X eats 100% CPU
- Xorg.0.log (attached) reports a detected infinite loop
- strace reports an endless loop of signals + sigreturn()
- gdb never succeeds in attaching to the running X

Expected results:

X continues to work from the point where it was suspended.

Additional info:

The system resumes fine with metacity (aka desktop effects off).