Bug 527452 (xserver-lockup)

Summary: Error in drmIoctl at xf86drm.c / Cannot render with Radeon Cards / Freezes the machine / No error messages
Product: [Fedora] Fedora
Reporter: Joshua Covington <joshuacov>
Component: xorg-x11-drv-ati
Assignee: Jérôme Glisse <jglisse>
Status: CLOSED RAWHIDE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: low
Version: 11
CC: awilliam, mcepl, rodd, xgl-maint
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Hardware: All
OS: Linux
Whiteboard: card_IGP300/miI
Doc Type: Bug Fix
Last Closed: 2009-11-06 20:45:34 UTC
Attachments (Description | Flags):
Xorg.0.log | none
dmesg | none
Xorg.0.log-1.6.99.901.log (f12) | none
dmesg-2.6.31-23.fc12.x86_64 (f12) | none
dmesg-2.6.31.5-96.fc12.x86_64 | none
Xorg.0.log from Xorg-server-1.7.0-1.fc12.x86_64 | none

Description Joshua Covington 2009-10-06 13:31:51 UTC
Description of problem:
I have a problem with xorg-x11-server-Xorg-1.6.4-0.1.fc11.x86_64. I followed the instructions at http://www.x.org/wiki/Development/Documentation/ServerDebugging and the output is below. The X server goes into an infinite loop and consumes 100% of my CPU, so the whole system freezes and I have to reboot it manually. This happens every time I use Firefox, on random sites. After the restart there is nothing in Xorg.0.log, and everything repeats itself when I start Firefox again, so I have to remove savesession.js from the Firefox home directory.

On the second machine I also turned this on: handle SIGUSR1 nostop, handle SIGUSR2 nostop, handle SIGPIPE nostop.


Version-Release number of selected component (if applicable):
kernel-debuginfo-2.6.30.8-64.fc11.x86_64
kernel-debuginfo-common-x86_64-2.6.30.8-64.fc11.x86_64
kernel-2.6.30.8-64.fc11.x86_64

xorg-x11-server-utils-7.4-7.1.fc11.x86_64
xorg-x11-server-common-1.6.4-0.1.fc11.x86_64
xorg-x11-server-Xorg-1.6.4-0.1.fc11.x86_64
xorg-x11-server-debuginfo-1.6.4-0.1.fc11.x86_64

xorg-x11-drv-ati-debuginfo-6.12.2-18.fc11.x86_64
xorg-x11-drv-ati-6.12.2-18.fc11.x86_64

mesa-dri-drivers-7.6-0.2.fc11.x86_64
mesa-libGLU-7.6-0.2.fc11.x86_64
mesa-libGL-7.6-0.2.fc11.x86_64
mesa-debuginfo-7.6-0.1.fc11.x86_64

libdrm-debuginfo-2.4.11-2.fc11.x86_64
libdrm-2.4.11-2.fc11.x86_64

How reproducible:
This happens when I start Firefox on random sites. It doesn't happen instantly, but it happens 100% of the time. The number of open tabs seems to be irrelevant, and I don't think it's related to Firefox. Maybe xorg-x11-drv-ati is the problem, or even something in the DRM?


Steps to Reproduce:
N/A
  
Actual results:

(gdb) handle SIGUSR1 nostop
Signal        Stop      Print   Pass to program Description
SIGUSR1       No        Yes     Yes             User defined signal 1
(gdb) handle SIGUSR2 nostop
Signal        Stop      Print   Pass to program Description
SIGUSR2       No        Yes     Yes             User defined signal 2
(gdb) handle SIGPIPE nostop
Signal        Stop      Print   Pass to program Description
SIGPIPE       No        Yes     Yes             Broken pipe
(gdb) cont
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x0000003cc3cd6827 in ioctl () from /lib64/libc.so.6
(gdb) bt
#0  0x0000003cc3cd6827 in ioctl () from /lib64/libc.so.6
#1  0x0000003cd6003113 in drmIoctl (fd=8, request=3221775460, arg=0x7fff78cabbc0) at xf86drm.c:187
#2  0x0000003cd600335c in drmCommandWriteRead (fd=8, drmCommandIndex=<value optimized out>, data=0x7fff78cabbc0, size=<value optimized out>)   at xf86drm.c:2363
#3  0x00007f6c6a6b3f08 in radeon_bufmgr_gem_wait_rendering (buf=<value optimized out>) at radeon_bufmgr_gem.c:282
#4  0x00007f6c6a69a51a in RADEONPrepareAccess (pPix=0x243c2d0, index=0) at radeon_exa.c:279
#5  0x00007f6c69be43b4 in ExaDoPrepareAccess (pDrawable=0x243c2d0, index=0) at exa.c:523
#6  0x00007f6c69be44b8 in exaPrepareAccessReg (pDrawable=0x243c2d0, index=0, pReg=0x0) at exa.c:543
#7  0x00007f6c69beceac in ExaCheckComposite (op=<value optimized out>, pSrc=0x24430a0, pMask=0x2397610, pDst=0x27a04b0, xSrc=<value optimized out>, ySrc=<value optimized out>, xMask=0, yMask=0, xDst=19, yDst=85, width=55, height=18) at exa_unaccel.c:342
#8  0x00007f6c69beb564 in exaComposite (op=<value optimized out>, pSrc=0x24430a0, pMask=0x2397610, pDst=0x27a04b0, xSrc=<value optimized out>, ySrc=<value optimized out>, xMask=0, yMask=0, xDst=19, yDst=85, width=55, height=18) at exa_render.c:967
#9  0x000000000052eb90 in damageComposite (op=8 '\b', pSrc=<value optimized out>, pMask=<value optimized out>, pDst=0x27a04b0, xSrc=1, ySrc=0, xMask=<value optimized out>, yMask=<value optimized out>, xDst=19, yDst=85, width=55, height=<value optimized out>) at damage.c:643
#10 0x000000000052720c in ProcRenderComposite (client=0x2625310) at render.c:720
#11 0x00000000004471d4 in Dispatch () at dispatch.c:456
#12 0x000000000042d205 in main (argc=<value optimized out>, argv=0x7fff78cac198, envp=<value optimized out>) at main.c:397
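
Note for readers of the backtrace: the request value passed to drmIoctl in frame #1 can be decoded with the generic Linux ioctl bit layout. The sketch below assumes the x86/x86_64 encoding (nr in bits 0-7, type in bits 8-15, size in bits 16-29, direction in bits 30-31) and the DRM convention that driver-specific commands start at DRM_COMMAND_BASE (0x40). The function name decode_ioctl is made up for illustration.

```python
# Decode a Linux ioctl request number, assuming the generic _IOC layout
# used on x86/x86_64. decode_ioctl is an illustrative helper, not a libdrm API.

DRM_COMMAND_BASE = 0x40  # first driver-specific DRM ioctl number


def decode_ioctl(request):
    direction = (request >> 30) & 0x3
    return {
        "nr": request & 0xFF,                  # command number
        "type": chr((request >> 8) & 0xFF),    # ioctl "magic" character
        "size": (request >> 16) & 0x3FFF,      # argument size in bytes
        "read": bool(direction & 2),           # _IOC_READ
        "write": bool(direction & 1),          # _IOC_WRITE
    }


fields = decode_ioctl(3221775460)  # request value from frame #1
driver_index = fields["nr"] - DRM_COMMAND_BASE
print(hex(3221775460), fields, hex(driver_index))
```

The value is 0xc0086464: a read/write ioctl of type 'd' (the DRM ioctl base), command number 100, with an 8-byte argument, which gives driver command index 0x24. Which radeon command that index names depends on the radeon_drm.h shipped with this kernel and has not been checked here.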


Expected results:
It works normally


Additional info:
01:05.0 VGA compatible controller: ATI Technologies Inc RS482 [Radeon Xpress 200M] (prog-if 00 [VGA controller])
        Subsystem: Acer Incorporated [ALI] Device 010f
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 66 (2000ns min), Cache Line Size: 32 bytes
        Interrupt: pin A routed to IRQ 17  
        Region 0: Memory at c8000000 (32-bit, prefetchable) [size=128M]      
        Region 1: I/O ports at 9000 [size=256]  
        Region 2: Memory at c0100000 (32-bit, non-prefetchable) [size=64K]
        [virtual] Expansion ROM at c0120000 [disabled] [size=128K]    
        Capabilities: [50] Power Management version 2 
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-          
        Kernel driver in use: radeon
        Kernel modules: radeon, radeonfb

I've also attached Xorg.0.log and dmesg. They don't show any errors.

But once I found this in dmesg (never seen since then, so it may be unrelated):
[drm:drm_buffer_object_validate] *ERROR* Out of aperture space or DRM memory quota.
[drm:drm_buffer_object_validate] *ERROR* Failed moving buffer.
ffff880142183000 19219 20000a7 10000a7
[drm:drm_buffer_object_validate] *ERROR* Out of aperture space or DRM memory quota.
[drm:drm_buffer_object_validate] *ERROR* Failed moving buffer.
ffff880142183000 19219 20000a7 10000a7
[drm:drm_buffer_object_validate] *ERROR* Out of aperture space or DRM memory quota.
[drm:drm_buffer_object_validate] *ERROR* Failed moving buffer.
ffff880142183000 19219 20000a7 10000a7

And once I saw this in the Xorg.0.log (never seen since then, maybe not related):
(EE) RADEON(0): ADVANCE_RING count != expected (14 vs 16) at radeon_textured_videofuncs.c:1623
(EE) RADEON(0): ADVANCE_RING count != expected (14 vs 16) at radeon_textured_videofuncs.c:1623
(EE) RADEON(0): ADVANCE_RING count != expected (14 vs 16) at radeon_textured_videofuncs.c:1623
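
Since these messages showed up only once, it may help to scan saved dmesg and Xorg.0.log files for them automatically. The following is a minimal sketch, assuming only the two line formats quoted above ("[drm:<function>] *ERROR* ..." and "(EE) ..."); find_errors and the sample text are illustrative, and other error formats are not covered.

```python
import re

# Patterns derived from the two excerpts above; search() is used so that
# lines carrying a leading timestamp would still match.
DRM_ERROR = re.compile(r"\[drm:(?P<func>\w+)\] \*ERROR\* (?P<msg>.+)")
XORG_ERROR = re.compile(r"\(EE\) (?P<msg>.+)")


def find_errors(text):
    """Return the message part of every DRM or Xorg error line in text."""
    hits = []
    for line in text.splitlines():
        match = DRM_ERROR.search(line) or XORG_ERROR.search(line)
        if match:
            hits.append(match.group("msg").strip())
    return hits


sample = """\
[drm:drm_buffer_object_validate] *ERROR* Out of aperture space or DRM memory quota.
[drm:drm_buffer_object_validate] *ERROR* Failed moving buffer.
ffff880142183000 19219 20000a7 10000a7
(EE) RADEON(0): ADVANCE_RING count != expected (14 vs 16) at radeon_textured_videofuncs.c:1623
"""
errors = find_errors(sample)
for message in errors:
    print(message)
```

To check a real log, pass the file contents instead of the sample, for example find_errors(open("/var/log/Xorg.0.log").read()).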

Comment 1 Joshua Covington 2009-10-06 13:33:37 UTC
Created attachment 363836 [details]
Xorg.0.log

No errors shown.

Comment 2 Joshua Covington 2009-10-06 13:34:18 UTC
Created attachment 363837 [details]
dmesg

No errors shown.

Comment 3 Joshua Covington 2009-10-06 20:07:52 UTC
I tried the latest F12-Snap3-x86_64-Live-KDE from 18.09.2009. Installed are:

xorg-x11-drv-ati-debuginfo-6.13.0-0.4.20090908git651fe5a47.fc12.x86_64
xorg-x11-server-debuginfo-1.6.99.901-2.fc12.x86_64
mesa-debuginfo-7.6-0.11.fc12.x86_64
libdrm-debuginfo-2.4.12-0.10.fc12.x86_64
kernel-2.6.31-23.fc12.x86_64

I couldn't make Firefox work with Adobe's libflashplayer.so. It crashed after I installed the .so and tried to visit a site with Flash on it. Afterwards it wasn't possible to start anything, but I think this problem is not connected to the X server.

Since I use Firefox to trigger this behaviour, I cannot confirm that the problem doesn't exist in F12. After Firefox crashed, the X server received SIGPIPE and everything froze. The gdb backtrace is:

(gdb) cont
Continuing.

Program received signal SIGPIPE, Broken pipe.
^C
Program received signal SIGINT, Interrupt.
0x00007f2dfcb8d0a3 in __select_nocancel () from /lib64/libc.so.6
(gdb) bt
#0  0x00007f2dfcb8d0a3 in __select_nocancel () from /lib64/libc.so.6
#1  0x000000000045c4ea in WaitForSomething (
    pClientsReady=<value optimized out>) at WaitFor.c:229
#2  0x000000000042c382 in Dispatch () at dispatch.c:381
#3  0x0000000000421caa in main (argc=<value optimized out>,
    argv=<value optimized out>, envp=<value optimized out>) at main.c:285

Even my ssh session was terminated and I couldn't log in again. This was a total freeze. I've also attached the Xorg.0.log and the dmesg.

PS: maybe this whole behaviour isn't connected to the X server, but I wanted to see whether the problem persists in the upcoming F12. For me it still occurs in F12.

Comment 4 Joshua Covington 2009-10-06 20:09:41 UTC
Created attachment 363890 [details]
Xorg.0.log-1.6.99.901.log (f12)

Comment 5 Joshua Covington 2009-10-06 20:11:53 UTC
Created attachment 363891 [details]
dmesg-2.6.31-23.fc12.x86_64 (f12)

Comment 6 Rodd Clarkson 2009-10-13 10:14:43 UTC
Is this bug related?: https://bugzilla.redhat.com/show_bug.cgi?id=528593

Comment 7 Joshua Covington 2009-10-22 07:41:46 UTC
pcie_aspm=off isn't working for me. After passing this on the kernel line the lockup still happens. I even backported xserver-1.7.0-exa-looping-forever-is-evil.patch to the f11 server and the problem still occurs.
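
When testing a boot option such as pcie_aspm=off, it is worth confirming that the running kernel actually received it. A minimal sketch, assuming a Linux system where /proc/cmdline holds the boot command line; has_kernel_param, running_cmdline and the example line are made up for illustration.

```python
def has_kernel_param(cmdline, name, value=None):
    """Return True if name (or name=value, when value is given) is present."""
    for token in cmdline.split():
        key, _, val = token.partition("=")
        if key == name and (value is None or val == value):
            return True
    return False


def running_cmdline(path="/proc/cmdline"):
    """Read the command line of the currently running kernel (Linux only)."""
    with open(path) as handle:
        return handle.read()


# Made-up command line, used only to show the parsing.
example = "ro root=/dev/mapper/vg-root rhgb quiet pcie_aspm=off"
print(has_kernel_param(example, "pcie_aspm", "off"))
```

On the affected machine the check would be has_kernel_param(running_cmdline(), "pcie_aspm", "off").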

This appears to be fixed in the current f12-beta. To trigger the lockup I use Firefox with more than 30 open tabs with Flash content. In f11 it locks up, but in f12-beta it works fine (it has other problems, though).

Comment 8 Adam Williamson 2009-11-01 21:55:10 UTC
Joshua's bug is different from 528593. 528593 is the widely-noted r600+/ICH9+ combination issue, Joshua has an older r400-generation chipset (Xpress 200m).

I believe this is likely the same bug as #521512 as far as F12 is concerned, but let's leave this separate as it's filed on F11.

-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 9 Joshua Covington 2009-11-02 00:04:14 UTC
I've been testing with the latest nightly builds from http://alt.fedoraproject.org/pub/alt/nightly-composes/kde/.

This bug appears to be fixed in current rawhide. I tried to reproduce it with

kernel-2.6.31.5-96.fc12.x86_64
mesa-libGLU-7.6-0.13.fc12.x86_64
mesa-dri-drivers-7.6-0.13.fc12.x86_64
mesa-libGL-7.6-0.13.fc12.x86_64
libdrm-2.4.15-1.fc12.x86_64
xorg-x11-server-Xorg-1.7.0-1.fc12.x86_64
xorg-x11-server-common-1.7.0-1.fc12.x86_64
xorg-x11-drv-ati-6.13.0-0.10.20091006git457646d73.fc12.x86_64

but I couldn't. It would be nice if someone figured this out and ported the fix back to f11.

Comment 10 Joshua Covington 2009-11-02 00:07:23 UTC
Created attachment 367055 [details]
dmesg-2.6.31.5-96.fc12.x86_64

dmesg-2.6.31.5-96.fc12.x86_64. It shows other problems, though.

Comment 11 Joshua Covington 2009-11-02 00:11:21 UTC
Created attachment 367056 [details]
Xorg.0.log from Xorg-server-1.7.0-1.fc12.x86_64

I have to add that now my display isn't 1280x800 but 1***x800, but this is another issue.

As I said I hope this can be ported back to f11.

Comment 12 Matěj Cepl 2009-11-05 18:38:41 UTC
Since this bugzilla report was filed, there have been several major updates in various components of the Xorg system, which may have resolved this issue. Users who have experienced this problem are encouraged to upgrade their system to the latest version of their packages. For packages from the updates-testing repository you can use the command

yum upgrade --enablerepo='*-updates-testing'

Alternatively, you can test whether this bug is reproducible with the upcoming Fedora 12 distribution by downloading the F12 Beta LiveMedia available at http://alt.fedoraproject.org/pub/alt/nightly-composes/ . That way you get all the latest packages without needing to install anything on your computer. For more information on using LiveMedia take a look at https://fedoraproject.org/wiki/FedoraLiveCD .

Please, if you experience this problem on the up-to-date system, let us know in a comment on this bug, or tell us whether the upgraded system works for you.

If you won't be able to reply in one month, I will have to close this bug as INSUFFICIENT_DATA. Thank you.

[This is a bulk message for all open Fedora Rawhide Xorg-related bugs. I'm adding myself to the CC list for each bug, so I'll see any comments you make after this and do my best to make sure every issue gets proper attention.]

Comment 13 Rodd Clarkson 2009-11-06 00:10:34 UTC
Let's close this bug.  I think all these issues have been addressed recently in f12.

Comment 14 Joshua Covington 2009-11-06 09:41:34 UTC
That was too fast.

I can say that this kind of problem cannot be triggered with the current -115 kernel in f12 but there are other issues with this kernel like:

1. my display is too wide (1280x800) on the right side (like 2***x800) and the mouse goes out of the display on the right side
2. there is a usb-key problem with this kernel (oops) and my usbs are useless when copying large files
3. problem with intel-hda-audio (kernel-oops with alc883)

Because of these problems I still cannot update to f12, and now I'm stuck with a closed bug and no option to update!

Comment 15 Joshua Covington 2009-11-06 12:56:50 UTC
I have to add to the previous comment that the vt-switching isn't working either with kernel -115.

Overall the current state of f12 isn't suitable to replace the current f11. Therefore I'll reopen this bug.

Comment 16 Adam Williamson 2009-11-06 20:45:34 UTC
that's not how bug reporting works. you file one bug per bug report, we fix one bug per bug report. otherwise everyone gets extremely confused.

please test with kernel -122 and xorg-x11-server 1.7.1-7 - and all other updates from today's Rawhide - and if you can reproduce the issues you mentioned in #14, please file *new* bugs for them.

Comment 17 Joshua Covington 2009-11-07 00:14:35 UTC
Adam, this is not a fix for this bug. Pointing my repos to rawhide and updating my machine brings me new issues.

I don't think a fix should break something else. Actually I haven’t had a trouble-free experience with my rs482 since the days of the ati-6.9.0 driver. In every “new” release “new” features are introduced (new api, mesa-3d, gallium, exa optimizations, etc) and I’m supposed to have a better experience with my almost 3-years-old card.  But it still cannot work flawlessly.  There’s always something that breaks with this card.

Over time I filed some bugs and all of them were closed with “RAWHIDE – please update to the next version”. Some of them didn’t even receive an answer. This is how I moved from f9 to f11 (skipping f10 (again) because of the card issues). Now with f11 I have some “workable” combination of mesa, libdrm, ati-drv, xserver, kernel and this small issue.

Updating to the current -115 brought 320 new packages (even kde was updated :-D). I experienced x-server crashes and other kernel issues. All of this is just because of the “possible fix” for my issue.

The truth is that this hasn’t been fixed. The new code has broken other things that worked fine before this. This is just postponing the problem.

Maybe I should file a bug titled “make this 3-years-old card finally work with linux, before you turn your eyes to the latest GPUs…”.

Comment 18 Rodd Clarkson 2009-11-07 02:41:05 UTC
adam,

taking a better look at this bug, I shouldn't have been suggesting that we close it.  I had thought it a different bug that I had posted myself.

sorry about that.


Rodd

Comment 19 Adam Williamson 2009-11-07 04:43:09 UTC
look, it's very simple. it's impossible to track multiple problems in one bug report. the tools are not designed to work for that. if you try and do that, it makes it harder to fix _any_ of the problems, let alone all of them. therefore you get sadder and your hardware doesn't get fixed.

i'm not trying to be an asshole here, i'm trying to help you. the way to have issues worked on efficiently via bugzilla is to file one bug report per issue and close one issue per bug report. that's how things work well. if you try and do anything else, pain and confusion are the only results. it's not like i'm suggesting you take a hike or jump through burning rings of fire, just please isolate separate issues into separate reports.
