Bug 445331

Summary: Radeon - DRM locking issue when disabling/changing composite with KDE kwin window manager
Product: Fedora
Component: mesa
Version: 11
Status: CLOSED WONTFIX
Keywords: Reopened
Severity: medium
Priority: medium
Hardware: All
OS: Linux
Reporter: Shawn Starr <shawn.starr>
Assignee: Dave Airlie <airlied>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: kmcmartin, mcepl, philipp, vanmeeuwen+fedora, xgl-maint
Doc Type: Bug Fix
Last Closed: 2010-06-28 10:36:01 UTC
Attachments:
  Xorg gdb backtrace with drmCommandNone as breakpoint (flags: none)
  Better Debug now with Mesa DRI symbols (flags: none)

Description Shawn Starr 2008-05-06 08:35:51 UTC
Description of problem:

When using KDE kwin's composite effects, the kernel radeon DRM driver gets stuck:
commands keep being issued to it without the DRM lock being held.


Version-Release number of selected component (if applicable):

kernel-2.6.25-14
xorg-x11-server-Xorg-1.4.99.901-26.20080415


How reproducible:


Steps to Reproduce:
1. Enable/disable kwin composite effects
2. Observe that the system is unusable from within X
  
Actual results:

The system logs DRM errors:

May  6 04:01:15 segfault kernel: [drm:radeon_cp_idle] *ERROR* radeon_cp_idle called without lock held, held  0 owner f4c8c480 f6b08d00
May  6 04:01:16 segfault kernel: [drm:radeon_cp_reset] *ERROR* radeon_cp_reset called without lock held, held  0 owner f4c8c480 f6b08d00
May  6 04:01:16 segfault kernel: [drm:radeon_cp_start] *ERROR* radeon_cp_start called without lock held, held  0 owner f4c8c480 f6b08d00

These messages repeat over and over.


Expected results:

Composite effects should enable and disable without driver errors.

Additional info:

One note: the KDE window manager I am using comes from KDE SVN trunk (4.1).
However, neither the kernel driver nor Xorg should ever exhibit this behaviour,
as kwin does not make DRM locking calls directly.

Debugging note (for me): set a breakpoint in the failure path of drmCommandNone(),
as suggested by Michel Dänzer, to get more information.

Comment 1 Dave Airlie 2008-05-06 09:04:53 UTC
Okay, I've played around with kwin from F9 on my X300 hardware and am not seeing this.

Please attach your Xorg log and config files.


Comment 2 Matěj Cepl 2008-05-06 13:00:56 UTC
Thanks for the bug report.  We have reviewed the information you have provided
above, and there is some additional information we require that will be helpful
in our diagnosis of this issue.

Please attach your X server config file (/etc/X11/xorg.conf) and X server log
file (/var/log/Xorg.*.log) to the bug report as individual uncompressed file
attachments using the bugzilla file attachment link below.

Could you please also try to run without any /etc/X11/xorg.conf whatsoever and
let X11 autodetect your display and video card? Attach to this bug
/var/log/Xorg.0.log from this attempt as well, please.

We will review this issue again once you've had a chance to attach this information.

Thanks in advance.

Comment 3 Shawn Starr 2008-05-14 05:11:20 UTC
Created attachment 305329 [details]
Xorg gdb backtrace with drmCommandNone as breakpoint

Here is the gdb debug output with a backtrace. To trigger the bug, I adjusted
kwin's texture properties and then clicked Apply. We then get stuck in a loop.

Comment 4 Shawn Starr 2008-05-14 05:28:41 UTC
Created attachment 305330 [details]
Better Debug now with Mesa DRI symbols

Better Xorg gdb backtrace w/ drmCommandNone set as breakpoint

Comment 5 Bug Zapper 2008-05-14 10:42:23 UTC
Changing version to '9' as part of upcoming Fedora 9 GA.
More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 6 Shawn Starr 2008-05-30 23:39:44 UTC
Please look at this when r300 DRI driver development picks up. This
seriously impacts KDE 4.1 and composite effects for Fedora 10+.

There's no point in enabling/disabling composite if it causes a DRI lock/looping issue.

Comment 7 Teodosiy Kirilov 2008-06-07 14:27:50 UTC
I've got the same problem.
The strange thing is that if I start compiz and then kwin again, the problem
is gone (I can change the compositing options without a lock).

Using F9 with a radeon 9250
kdebase-workspace-4.0.4-4.fc9.i386
xorg-x11-server-Xorg-1.4.99.901-29.20080415.fc9.i386
mesa-libGL-7.1-0.31.fc9.i386
xorg-x11-drv-ati-6.8.0-14.fc9.i386

Comment 8 Shawn Starr 2008-08-25 03:22:55 UTC
I have not reproduced this in a while with the new kernel DRM and Mesa DRI, but I'd keep this open a little longer, just to be sure.

Comment 9 Matěj Cepl 2008-09-09 20:13:02 UTC
Let's close it now; if it happens again, you can certainly reopen it. This is a better way to avoid all of us forgetting about this bug and making a mess of our bugzilla.

Comment 10 Shawn Starr 2008-09-10 02:34:14 UTC
Reopening; this is still occurring. Maybe DRI2 will help solve this once radeon has DRI2 support?

This is on rawhide.

Comment 11 Matěj Cepl 2008-09-11 22:41:55 UTC
Just for the sake of completeness, please attach your X server config file (/etc/X11/xorg.conf) and X server log file (/var/log/Xorg.*.log) to the bug report as individual uncompressed file attachments using the bugzilla file attachment link below.

Could you please also try to run without any /etc/X11/xorg.conf whatsoever and let X11 autodetect your display and video card? Attach to this bug /var/log/Xorg.0.log from this attempt as well, please.

We will review this issue again once you've had a chance to attach this information.

Thanks in advance.

Comment 12 Shawn Starr 2008-09-13 04:48:14 UTC
I will provide them soon. I am triggering this so often that I cannot use composite with KDE (unless I use compiz).

Something kwin does when it enables composite is causing the locking problem, but we don't know what.

Comment 13 Shawn Starr 2008-10-14 04:11:11 UTC
There is no longer a locking problem, but the bug remains.

From chatting on IRC, the likely solution is to modify the X server's GLX code to handle the destruction of windows.

I'm afraid this one will be around for a while.

Comment 14 Shawn Starr 2008-10-14 04:14:54 UTC
This is the latest backtrace
============================

Program received signal SIGSEGV, Segmentation fault.
dixLookupPrivate (privates=0x15c, key=0x65c570) at privates.c:131
131         PrivateRec *rec = *privates;                         
(gdb) bt f
#0  dixLookupPrivate (privates=0x15c, key=0x65c570) at privates.c:131
        rec = <value optimized out>                                  
        ptr = <value optimized out>                                  
#1  0x00655e0e in DRIGetDrawableInfo (pScreen=0xc, pDrawable=0x9cb15a8, index=0x9777028, 
    stamp=0x9777030, X=0x9777034, Y=0x9777038, W=0x977703c, H=0x9777040,                 
    numClipRects=0x9777044, pClipRects=0xbfd68ec8, backX=0x977704c, backY=0x9777050,     
    numBackClipRects=0x9777058, pBackClipRects=0xbfd68ec4) at dri.c:1386                 
        pDRIDrawablePriv = <value optimized out>                                         
        i = <value optimized out>                                                        
#2  0x006a977e in getDrawableInfo (driDrawable=0x9777018, index=0x9777028, stamp=0x9777030, 
    x=0x9777034, y=0x9777038, width=0x977703c, height=0x9777040, numClipRects=0x9777044,    
    ppClipRects=0x9777048, backX=0x977704c, backY=0x9777050, numBackClipRects=0x9777058,    
    ppBackClipRects=0x977705c, data=0x9063628) at glxdri.c:746                              
        pScreen = (ScreenPtr) 0xc                                                           
        pClipRects = <value optimized out>                                                  
        pBackClipRects = <value optimized out>                                              
        retval = 0 '\0'                                                                     
        size = <value optimized out>                                                        
#3  0x0080ceb8 in __driUtilUpdateDrawableInfo (pdp=0x9777018) at ../common/dri_util.c:254   
        psp = (__DRIscreenPrivate *) 0x8facb20                                              
#4  0x0081223a in radeonGetLock (rmesa=0x901b750, flags=0) at radeon_lock.c:112             
        hwContext = 3                                                                       
        drawable = (__DRIdrawablePrivate * const) 0x9777018                                 
        readable = (__DRIdrawablePrivate * const) 0x9777018                                 
        sPriv = (__DRIscreenPrivate *) 0x8facb20                                            
        sarea = (drm_radeon_sarea_t *) 0xa8dd7898                                           
        __PRETTY_FUNCTION__ = "radeonGetLock"                                               
#5  0x0081cd47 in r300FlushCmdBuf (r300=0x901b750, caller=0x8410a8 "r300DestroyContext")    
    at r300_cmdbuf.c:153                                                                    
        __ret = 112 'p'                                                                     
        ret = <value optimized out>                                                         
#6  0x008169c9 in r300DestroyContext (driContextPriv=0x90622d8) at r300_context.c:410       
        r300 = (r300ContextPtr) 0x901b750                                                   
        current = <value optimized out>                                                     
        __PRETTY_FUNCTION__ = "r300DestroyContext"                                          
        __FUNCTION__ = "r300DestroyContext"                                                 
#7  0x0080fc6e in radeonDestroyContext (driContextPriv=0x65c570) at radeon_screen.c:1432    
No locals.                                                                                  
#8  0x0080cd7d in driDestroyContext (pcp=0x90622d8) at ../common/dri_util.c:516             
No locals.                                                                                  
#9  0x006ab419 in __glXDRIcontextDestroy (baseContext=0x9062228) at glxdri.c:297            
No locals.                                                                                  
#10 0x006a0289 in __glXFreeContext (cx=0x9062228) at glxext.c:149                           
No locals.                                                                                  
#11 0x006a02d7 in ContextGone (cx=0x15c, id=20971551) at glxext.c:98                        
No locals.
#12 0x0806d5e2 in FreeResourceByType (id=20971551, type=51, skipFree=0) at resource.c:597
        cid = <value optimized out>
        res = (ResourcePtr) 0x9538c28
        prev = <value optimized out>
        head = <value optimized out>
#13 0x0069c5b7 in __glXDisp_DestroyContext (cl=0x9065174, pc=0x9cb0570 "\232\004\002")
    at glxcmds.c:338
        client = (ClientPtr) 0x9071e00
        gcId = 20971551
#14 0x006a063a in __glXDispatch (client=0x9071e00) at glxext.c:512
        stuff = (xGLXSingleReq *) 0x9cb0570
        opcode = 4 '\004'
        cl = (__GLXclientState *) 0x9065174
        retval = 1
#15 0x08085e8f in Dispatch () at dispatch.c:454
        result = <value optimized out>
        client = (ClientPtr) 0x9071e00
        nready = 0
        start_tick = 12040
#16 0x0806b68d in main (argc=10, argv=0xbfd69264, envp=Cannot access memory at address 0x164
) at main.c:441
        i = <value optimized out>
        error = 136257204
        xauthfile = <value optimized out>
        alwaysCheckForInput = {0, 1}

Comment 15 Bug Zapper 2008-11-26 02:14:56 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 10 development cycle.
Changing version to '10'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 16 Shawn Starr 2009-03-19 06:01:51 UTC
This bug is now invalid.

Comment 17 Philip Prindeville 2009-10-10 07:29:35 UTC
I'm running FC11, updated, and seeing this currently.

xorg-x11-drv-ati-6.12.2-14.fc11.x86_64

Please reopen.

Comment 18 Philip Prindeville 2009-10-10 16:00:39 UTC
(In reply to comment #17)
> I'm running FC11, updated, and seeing this currently.
> 
> xorg-x11-drv-ati-6.12.2-14.fc11.x86_64
> 
> Please reopen.  

It's worth noting that my /var/log/messages file contains 20796649 lines of error messages, or 3161090718 bytes of same.  This is a real liability.  A machine could be put into a denial-of-service state by running out of space on /var.

I'm also seeing this when using Gnome, not KDE.

There's a reasonable workaround for this:

diff --git a/include/drm/drmP.h b/include/drm/drmP.h
index 1c1b13e..1107361 100644
--- a/include/drm/drmP.h
+++ b/include/drm/drmP.h
@@ -162,7 +162,8 @@ struct drm_device;
  * \param arg arguments
  */
 #define DRM_ERROR(fmt, arg...) \
-     printk(KERN_ERR "[" DRM_NAME ":%s] *ERROR* " fmt , __func__ , ##arg)
+     if (printk_ratelimit()) \
+             printk(KERN_ERR "[" DRM_NAME ":%s] *ERROR* " fmt , __func__ , ##arg)


That will not fix the problem, but it will at least keep one's log file from overflowing.

Comment 19 Jeroen van Meeuwen 2009-10-12 19:17:21 UTC
Reopening on behalf of philipp64 (question in #fedora)

Comment 20 Kyle McMartin 2009-10-20 18:29:02 UTC
This patch is silly. It means that any real DRM error can't issue more than 10 messages per five seconds. I think this is an unacceptable cost to paper over a real bug that nobody can seem to be bothered fixing. The *kernel* isn't filling /var, crappy syslog is.

If airlied says it's ok, then that's another story, but to my untrained eyes, it seems like an undue burden.

Comment 21 Kyle McMartin 2009-10-20 18:48:39 UTC
http://userweb.kernel.org/~kyle/shut-up-LOCK_TEST_WITH_RETURN.diff

would make me happier... which limits the scope to this lock test...

Comment 22 Philip Prindeville 2009-10-20 19:13:56 UTC
Also relevant:

http://bugzilla.adiscon.com/show_bug.cgi?id=37

Comment 23 Philip Prindeville 2009-10-20 19:52:29 UTC
(In reply to comment #21)
> http://userweb.kernel.org/~kyle/shut-up-LOCK_TEST_WITH_RETURN.diff
> 
> would make me happier... which limits the scope to this lock test...  

I can live with this.  Can we get it into the next FC11 kernel release as a patch?

Comment 24 Kyle McMartin 2009-10-21 05:55:38 UTC
http://koji.fedoraproject.org/koji/taskinfo?taskID=1758793

Please try this scratch build.

Comment 25 Philip Prindeville 2009-10-21 18:15:41 UTC
I'm remote this week from the location where the misbehaving machine is.

Will try it late next week when I'm back on that site.

Comment 26 Chuck Ebbert 2009-10-22 19:54:27 UTC
The ratelimit patch went into 2.6.30.9-92.

Comment 27 Fedora Update System 2009-11-05 05:05:53 UTC
kernel-2.6.30.9-96.fc11 has been submitted as an update for Fedora 11.
http://admin.fedoraproject.org/updates/kernel-2.6.30.9-96.fc11

Comment 28 Matěj Cepl 2009-11-05 18:19:32 UTC
Since this bugzilla report was filed, there have been several major updates in various components of the Xorg system, which may have resolved this issue. Users who have experienced this problem are encouraged to upgrade their system to the latest version of their packages. For packages from the updates-testing repository you can use the command

yum upgrade --enablerepo='*-updates-testing'

Alternatively, you can also try to test whether this bug is reproducible with the upcoming Fedora 12 distribution by downloading LiveMedia of F12 Beta available at http://alt.fedoraproject.org/pub/alt/nightly-composes/ . By using that you get all the latest packages without need to install anything on your computer. For more information on using LiveMedia take a look at https://fedoraproject.org/wiki/FedoraLiveCD .

Please let us know in a comment on this bug whether you still experience this problem on the up-to-date system, or whether the upgraded system works for you.

If you are unable to reply within one month, I will have to close this bug as INSUFFICIENT_DATA. Thank you.

[This is a bulk message for all open Fedora Rawhide Xorg-related bugs. I'm adding myself to the CC list for each bug, so I'll see any comments you make after this and do my best to make sure every issue gets proper attention.]

Comment 29 Fedora Update System 2009-11-06 00:03:08 UTC
kernel-2.6.30.9-96.fc11 has been pushed to the Fedora 11 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 30 Bug Zapper 2010-04-27 12:01:58 UTC
This message is a reminder that Fedora 11 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 11.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '11'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 11's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 11 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 31 Bug Zapper 2010-06-28 10:36:01 UTC
Fedora 11 changed to end-of-life (EOL) status on 2010-06-25. Fedora 11 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 32 Shawn Starr 2013-04-09 08:08:10 UTC
This is so old; close it.