Bug 472536 - X hangs after a while (RS690M, KMS, EXA)
Status: CLOSED CURRENTRELEASE
Product: Fedora
Classification: Fedora
Component: xorg-x11-drv-ati
Version: 10
Hardware: All
OS: Linux
Priority: high
Severity: medium
Assigned To: Dave Airlie
QA Contact: Fedora Extras Quality Assurance
Reported: 2008-11-21 11:03 EST by Michal Schmidt
Modified: 2009-10-22 08:56 EDT
CC List: 5 users

Fixed In Version: xorg-x11-drv-ati-6.9.0-58.fc10.x86_64
Doc Type: Bug Fix
Last Closed: 2008-11-26 17:59:29 EST


Attachments
Xorg.0.log (includes a backtrace) (113.67 KB, text/plain)
2008-11-21 11:06 EST, Michal Schmidt
dmesg (40.06 KB, text/plain)
2008-11-21 11:06 EST, Michal Schmidt
workaround patch (408 bytes, patch)
2008-11-24 11:41 EST, Michal Schmidt

Description Michal Schmidt 2008-11-21 11:03:07 EST
Description of problem:
With default Fedora 10 settings (mode setting enabled, no xorg.conf) Xorg gets stuck after a few minutes of working in Gnome.
The machine is not dead, I can ssh into it. top shows the CPUs are idle. I got a backtrace of Xorg with gdb:
...
Loaded symbols for /lib64/libnss_files.so.2
pixman_fill_mmx (bits=<value optimized out>, stride=192, 
    bpp=<value optimized out>, x=<value optimized out>, 
    y=<value optimized out>, width=<value optimized out>, height=17, xor=0)
    at pixman-mmx.c:1799
1799                __asm__ (
Missing separate debuginfos, use: debuginfo-install expat-2.0.1-5.x86_64 freetype-2.3.7-1.fc10.x86_64 libcap-2.10-2.fc10.x86_64 mesa-dri-drivers-7.2-0.13.fc10.x86_64 xorg-x11-drv-ati-6.9.0-55.fc10.x86_64 xorg-x11-drv-evdev-2.0.7-3.fc10.x86_64 xorg-x11-drv-fbdev-0.3.1-7.fc9.x86_64 xorg-x11-drv-synaptics-0.15.2-1.fc10.x86_64 xorg-x11-drv-vesa-2.0.0-1.fc10.x86_64 zlib-1.2.3-18.fc9.x86_64
(gdb) bt
#0  pixman_fill_mmx (bits=<value optimized out>, stride=192, 
    bpp=<value optimized out>, x=<value optimized out>, 
    y=<value optimized out>, width=<value optimized out>, height=17, xor=0)
    at pixman-mmx.c:1799
#1  0x000000302322640d in pixman_fill (bits=0x0, stride=0, bpp=-80224256, x=0, 
    y=-80224256, width=0, height=18, xor=0) at pixman-utils.c:175
#2  0x000000000103fcd2 in fbFill (pDrawable=<value optimized out>, 
    pGC=<value optimized out>, x=0, y=0, width=36, height=18) at fbfill.c:48
#3  0x000000000103ff36 in fbPolyFillRect (pDrawable=0x146e3e0, pGC=0x1429590, 
    nrect=0, prect=<value optimized out>) at fbfillrect.c:77
#4  0x00007fcc02320b74 in ExaCheckPolyFillRect (pDrawable=0x146e3e0, 
    pGC=0x1429590, nrect=1, prect=0x151dad0) at exa_unaccel.c:229
#5  0x00007fcc02319954 in exaPolyFillRect (pDrawable=0x146e3e0, pGC=0x1429590, 
    nrect=1, prect=0x151dad0) at exa_accel.c:778
#6  0x0000000000529b06 in damagePolyFillRect (pDrawable=0x146e3e0, 
    pGC=0x1429590, nRects=1, pRects=0x151dad0) at damage.c:1337
#7  0x0000000000443b76 in ProcPolyFillRectangle (client=0x151c000)
    at dispatch.c:1795
#8  0x00000000004468d4 in Dispatch () at dispatch.c:454
#9  0x000000000042cd1d in main (argc=9, argv=0x7fff0a7eb7a8, 
    envp=<value optimized out>) at main.c:441


Note the odd argument values passed to pixman_fill in frame #1 (bits=0x0, stride=0, and negative bpp and y).
The hardware is:
ATI Technologies Inc RS690M [Radeon X1200 Series].

Version-Release number of selected component (if applicable):
kernel-2.6.27.5-120.fc10.x86_64
xorg-x11-server-Xorg-1.5.3-5.fc10.x86_64
xorg-x11-drv-ati-6.9.0-55.fc10.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot with kernel modesetting, default X configuration (no xorg.conf).
2. Run gtkperf with lots of iterations (10000).
  
Actual results:
In less than a minute gtkperf stops drawing. Everything else in X is stuck too. The mouse pointer is stuck for half a minute or so, then starts reacting to mouse movement again. But the rest of Xorg will not resume working.

Expected results:
X should not hang.
Comment 1 Michal Schmidt 2008-11-21 11:06:03 EST
Created attachment 324314
Xorg.0.log (includes a backtrace)
Comment 2 Michal Schmidt 2008-11-21 11:06:40 EST
Created attachment 324315
dmesg
Comment 3 Matěj Cepl 2008-11-21 12:51:34 EST
Lovely!

Backtrace:
0: /usr/bin/Xorg(xorg_backtrace+0x26) [0x4e7a26]
1: /usr/bin/Xorg(mieqEnqueue+0x291) [0x4c8591]
2: /usr/bin/Xorg(xf86PostMotionEventP+0xc4) [0x491494]
3: /usr/bin/Xorg(xf86PostMotionEvent+0xa9) [0x491669]
4: /usr/lib64/xorg/modules/input//synaptics_drv.so [0x7fcbfbc12832]
5: /usr/lib64/xorg/modules/input//synaptics_drv.so [0x7fcbfbc14de2]
6: /usr/bin/Xorg [0x47a765]
7: /usr/bin/Xorg [0x46b307]
8: /lib64/libc.so.6 [0x3554432f60]
9: /usr/lib64/libpixman-1.so.0 [0x3023229d10]
10: /usr/lib64/libpixman-1.so.0(pixman_fill+0x3d) [0x302322640d]
11: /usr/lib64/xorg/modules//libfb.so(fbFill+0x482) [0x103fcd2]
12: /usr/lib64/xorg/modules//libfb.so(fbPolyFillRect+0x1c6) [0x103ff36]
13: /usr/lib64/xorg/modules//libexa.so(ExaCheckPolyFillRect+0x44) [0x7fcc02320b74]
14: /usr/lib64/xorg/modules//libexa.so [0x7fcc02319954]
15: /usr/bin/Xorg [0x529b06]
16: /usr/bin/Xorg(ProcPolyFillRectangle+0xe6) [0x443b76]
17: /usr/bin/Xorg(Dispatch+0x364) [0x4468d4]
18: /usr/bin/Xorg(main+0x45d) [0x42cd1d]
19: /lib64/libc.so.6(__libc_start_main+0xe6) [0x355441e546]
20: /usr/bin/Xorg [0x42c0f9]
Comment 4 Dave Airlie 2008-11-23 22:58:38 EST
I've just kicked off a new kernel build in koji.

kernel-2.6.27.5-123.fc10

it'll appear here when finished.

http://kojipkgs.fedoraproject.org/packages/kernel/2.6.27.5/123.fc10/

Can you install it and see if it helps?
Comment 5 Michal Schmidt 2008-11-24 07:11:35 EST
kernel-2.6.27.5-123.fc10.x86_64 made no difference. I can still easily reproduce the hang, the backtrace is the same and I see no new messages in dmesg or Xorg.0.log.


Then I upgraded to xorg-x11-drv-ati-6.9.0-56.fc10. The test hangs again, but the backtrace is now different:

(gdb) bt
#0  0x00000035544ddff7 in ioctl () from /lib64/libc.so.6
#1  0x000000356d603023 in drmIoctl (fd=9, request=3222299750, arg=0x7fff6b29c520) at xf86drm.c:186
#2  0x000000356d60326c in drmCommandWriteRead (fd=9, drmCommandIndex=<value optimized out>, data=0x7fff6b29c520, size=18446744073709551615) at xf86drm.c:2342
#3  0x0000000007b23116 in RADEONCSFlushIndirect (pScrn=0x1590780, discard=<value optimized out>) at radeon_accel.c:629
#4  0x0000000007b2300b in RADEONCPFlushIndirect (pScrn=0x1590780, discard=1) at radeon_accel.c:794
#5  0x0000000007b729ea in R300TextureSetupCP (pPict=0x183f0a0, pPix=0x1828810, unit=1) at radeon_exa_render.c:1238
#6  0x0000000007b73226 in R300PrepareCompositeCP (op=3, pSrcPicture=0x1833ec0, pMaskPicture=0x183f0a0, pDstPicture=0x1837620, pSrc=0x1833cf0, pMask=0x1828810, pDst=0x18339e0)
    at radeon_exa_render.c:1447
#7  0x0000000005490932 in exaTryDriverComposite (op=3 '\003', pSrc=0x1833ec0, pMask=0x183f0a0, pDst=0x1837620, xSrc=3, ySrc=5, xMask=<value optimized out>, 
    yMask=<value optimized out>, xDst=<value optimized out>, yDst=<value optimized out>, width=<value optimized out>, height=<value optimized out>) at exa_render.c:671
#8  0x00000000054912d5 in exaComposite (op=3 '\003', pSrc=0x1833ec0, pMask=0x183f0a0, pDst=0x1837620, xSrc=3, ySrc=5, xMask=0, yMask=0, xDst=3, yDst=5, width=11, height=11)
    at exa_render.c:936
#9  0x00000000005291b8 in damageComposite (op=9 '\t', pSrc=0x1833ec0, pMask=0x183f0a0, pDst=0x1837620, xSrc=3, ySrc=5, xMask=0, yMask=<value optimized out>, 
    xDst=<value optimized out>, yDst=<value optimized out>, width=<value optimized out>, height=<value optimized out>) at damage.c:576
#10 0x0000000005490554 in exaTrapezoids (op=9 '\t', pSrc=0x1833ec0, pDst=0x1837620, maskFormat=0x159c158, xSrc=3, ySrc=5, ntrap=0, traps=0x17e1530) at exa_render.c:1122
#11 0x000000000051a83d in ProcRenderTrapezoids (client=0x17f0dc0) at render.c:791
#12 0x00000000004468d4 in Dispatch () at dispatch.c:454
#13 0x000000000042cd1d in main (argc=9, argv=0x7fff6b29cd48, envp=<value optimized out>) at main.c:441
Comment 6 Michal Schmidt 2008-11-24 07:37:22 EST
On another try I received a backtrace almost identical to the one in https://bugzilla.redhat.com/show_bug.cgi?id=472314#c13 . The two bugs may be duplicates.


(gdb) bt
#0  0x00000035544ddff7 in ioctl () from /lib64/libc.so.6
#1  0x000000356d603023 in drmIoctl (fd=9, request=3222299750, arg=0x7fff7eb30ee0) at xf86drm.c:186
#2  0x000000356d60326c in drmCommandWriteRead (fd=9, drmCommandIndex=<value optimized out>, data=0x7fff7eb30ee0, size=18446744073709551615) at xf86drm.c:2342
#3  0x000000000755c116 in RADEONCSFlushIndirect (pScrn=0x1de9600, discard=<value optimized out>) at radeon_accel.c:629
#4  0x000000000755c31d in RADEONCSReleaseIndirect (pScrn=0x9) at radeon_accel.c:703
#5  0x000000000755c3fd in RADEONCPReleaseIndirect (pScrn=0x9) at radeon_accel.c:833
#6  0x00000000075a20e8 in RADEONLeaveServer () at radeon_dri.c:560
#7  RADEONDRISwapContext (pScreen=<value optimized out>, syncType=<value optimized out>, oldContextType=<value optimized out>, oldContext=0xffffffffffffffff, 
    newContextType=<value optimized out>, newContext=0x1de8660) at radeon_dri.c:585
#8  0x0000000005440809 in DRIDoBlockHandler (screenNum=<value optimized out>, blockData=<value optimized out>, pTimeout=<value optimized out>, 
    pReadmask=<value optimized out>) at dri.c:1655
#9  0x000000000543f8e6 in DRIBlockHandler (blockData=0x0, pTimeout=0x7fff7eb31268, pReadmask=0x7dc7c0) at dri.c:1622
#10 0x000000000044a355 in BlockHandler (pTimeout=0x7fff7eb31268, pReadmask=0x7dc7c0) at dixutils.c:387
#11 0x00000000004e4eb1 in WaitForSomething (pClientsReady=0x1eeee20) at WaitFor.c:223
#12 0x00000000004465ef in Dispatch () at dispatch.c:375
#13 0x000000000042cd1d in main (argc=9, argv=0x7fff7eb31438, envp=<value optimized out>) at main.c:441
Comment 7 Michal Schmidt 2008-11-24 11:41:13 EST
Created attachment 324503
workaround patch

The first version in which the problem appears for me is xorg-x11-drv-ati-6.9.0-41. The attached patch reverts the change made in that version.
A scratch build is here:
http://koji.fedoraproject.org/koji/taskinfo?taskID=948177

I don't know what the change does, but I can't reproduce the hang with this patch.
Comment 8 Dave Airlie 2008-11-24 15:42:25 EST
Weird, I was getting a hang in gtkperf here, but the kernel I built fixed it, and gtkperf now completes fine.

I'll take a look at why the workaround avoids hitting the weird case.
Comment 9 Michal Schmidt 2008-11-25 03:31:54 EST
I upgraded to xorg-x11-drv-ati-6.9.0-57.fc10.x86_64. The hang is still reproducible, but there is a small change in the best way to reproduce it.

With previous versions, running the GtkCheckButton test with 10000 iterations was very likely to cause the hang. With -57 I haven't been able to reproduce the hang during this test.

With -57, I can instead trigger the hang with the GtkEntry test at 10000 iterations. Alternatively, I can run all tests with the default 100 iterations; the hang then most often occurs during the GtkTextView scroll test.
Comment 10 Bug Zapper 2008-11-26 00:43:51 EST
This bug appears to have been reported against 'rawhide' during the Fedora 10 development cycle.
Changing version to '10'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 11 Michal Schmidt 2008-11-26 02:42:31 EST
xorg-x11-drv-ati-6.9.0-58.fc10.x86_64 looks good! I can't reproduce this hang anymore.
Comment 12 Florent Le Coz 2008-11-26 03:35:30 EST
(In reply to comment #11)
> xorg-x11-drv-ati-6.9.0-58.fc10.x86_64 looks good! I can't reproduce this hang
> anymore.

Same for me: with -58 I can't reproduce this freeze, even when:
- scrolling very quickly in Firefox
- playing large 3D games
- generating rapidly scrolling text in a terminal

Good news!
Comment 13 Matěj Cepl 2008-11-26 17:59:29 EST
Cool, closing per the reporter's comment 11.
