Bug 1217844

Summary: (ati) Plasma 5 Screen Freezes, dri3/xcb deadlock?
Product: [Fedora] Fedora Reporter: Gerald Cox <gbcox>
Component: xorg-x11-drv-ati Assignee: X/OpenGL Maintenance List <xgl-maint>
Status: CLOSED NOTABUG QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high Docs Contact:
Priority: unspecified    
Version: 22 CC: awilliam, danofsatx, ed.greshko, erecio, gbcox, germano.massullo, hansecke, jgrulich, kde-sig, kevin, mike, mkyral, mwoehlke.floss, pahan, paul.lipps, rdieter, rmj, robatino, satellitgo, than, xgl-maint
Target Milestone: --- Keywords: Reopened
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-04-19 22:57:49 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1193742    
Bug Blocks:    
Attachments:
Description Flags
Plasmashell backtrace with full debug symbols installed
none
A frozen plasma display
none
Output from gdb with frozen plasmashell
none
a good gdb.txt while plasma frozen
none
gdb of frozen state
none
gdb of frozen state - 15228
none
gdb frozen - for comparison with 1063552
none
gdb plasmashell freeze - after sitting overnight
none

Description Gerald Cox 2015-05-01 21:22:14 UTC
Description of problem:
Plasma screen freezes at random times.  You can move the mouse pointer around, but nothing else works.  The system is still running and I'm able to ssh in from another system.  To recover I reboot via:  /sbin/shutdown now -r

Version-Release number of selected component (if applicable):
plasma-workspace-5.3.0-3.fc22.x86_64
plasma-desktop-5.3.0-3.fc22.x86_64


How reproducible:
I haven't been able to figure out what is causing this behavior.


Steps to Reproduce:
N/A... system freezes at random times.

Actual results:
Screen freezes


Expected results:
Normal operation


Additional info:
I did find this bug report for Kubuntu which seems to describe the same issue:
https://bugs.launchpad.net/kubuntu-ppa/+bug/1384512
I did see another bug report describing plasma freezes, but it appeared specific to Intel display drivers, and I'm running AMD.
https://bugzilla.redhat.com/show_bug.cgi?id=1193742

Comment 1 Gerald Cox 2015-05-01 21:44:23 UTC
Video Information follows:

03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Pitcairn PRO [Radeon HD 7850] (prog-if 00 [VGA controller])
        Subsystem: ASUSTeK Computer Inc. Device 042c
        Flags: bus master, fast devsel, latency 0, IRQ 62
        Memory at c0000000 (64-bit, prefetchable) [size=256M]
        Memory at fe400000 (64-bit, non-prefetchable) [size=256K]
        I/O ports at b000 [size=256]
        Expansion ROM at fe440000 [disabled] [size=128K]
        Capabilities: [48] Vendor Specific Information: Len=08 <?>
        Capabilities: [50] Power Management version 3
        Capabilities: [58] Express Legacy Endpoint, MSI 00
        Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Capabilities: [150] Advanced Error Reporting
        Capabilities: [270] #19
        Capabilities: [2b0] Address Translation Service (ATS)
        Capabilities: [2c0] #13
        Capabilities: [2d0] #1b
        Kernel driver in use: radeon
        Kernel modules: radeon

Comment 2 Rex Dieter 2015-05-01 23:44:35 UTC
does this help:

ALT-F2 (krunner):  kquitapp plasmashell ; plasmashell

if not, try harder:

ALT-F2 : killall plasmashell; plasmashell

Comment 3 Gerald Cox 2015-05-02 22:47:57 UTC
Rex,

Prior to your posting I tried rm -rf ~/.cache as suggested in  
http://bugs.kde.org/show_bug.cgi?id=338999
and the problem has yet to re-occur.  If it does, I will try your suggestion.

I remember a few years ago I had some issues upgrading between kde versions and had to remove .kde and .cache - this looks like it might be along the same lines... weird...

Comment 4 Rex Dieter 2015-05-02 22:56:53 UTC
Thanks, closing->worksforme (for now).

fwiw, broken ~/.cache files can also happen when one uses 'sudo <kdeapp>' which can cause ownership/permissions problems too.

(Qt is working on a solution to avoid that problem)
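
One way to check for the ownership problem described above is to look for cache entries that belong to someone other than the logged-in user. This is only a minimal sketch: the helper name foreign_cache_files is made up for illustration, and it assumes the cache lives in ~/.cache.

```shell
# Hypothetical helper: list entries under a cache directory that are not
# owned by the given user (for example, files created by 'sudo <kdeapp>').
# Usage: foreign_cache_files [dir] [user]
foreign_cache_files() {
    dir=${1:-$HOME/.cache}
    owner=${2:-$(id -un)}
    # '! -user NAME' matches anything whose owner is not NAME.
    find "$dir" ! -user "$owner" -print
}

# Example:
# foreign_cache_files              # inspect ~/.cache for the current user
```

If it prints nothing, ownership is probably not the issue. Anything it does print could be handed back to the user with chown, or the cache could simply be removed as in comment 3.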

Comment 5 Gerald Cox 2015-05-06 16:30:24 UTC
Rex,

Well, I spoke prematurely... the problem is still occurring, albeit not as frequently.  At times I can release with an ALT-TAB, but just now I needed to
resort to your suggestion of:

ALT-F2 : killall plasmashell ; plasmashell
which released the hang.  

Do you have a suggestion I could use to try to gather some information that could be used to get this fixed?

Thanks!

Comment 6 Rex Dieter 2015-05-06 17:34:33 UTC
Try to get a backtrace of the stuck plasmashell process.

That means first running:

debuginfo-install plasma-workspace

Then, when/if it freezes, open a (root) shell and run

$ gdb
...
(gdb) attach <processid_of_plasmashell>
...
(gdb) thread apply all bt

and post the output.
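
For anyone who would rather not page through the interactive session, the same backtrace can probably be captured in one non-interactive step and written straight to a file for attaching. This is a minimal sketch, not a tested procedure for this bug: the helper name save_backtrace and the default file name gdb.txt are made up for illustration, and it assumes gdb is installed and that the command is run as root or as the user who owns the process.

```shell
# Hypothetical helper: dump a backtrace of every thread of a running
# process into a file, without the interactive "continue or quit" prompts.
# Usage: save_backtrace <pid> [output-file]
save_backtrace() {
    pid=$1
    out=${2:-gdb.txt}   # illustrative default file name
    # -p attaches to the process, -batch exits once the commands have run,
    # and each -ex runs one gdb command in order.
    gdb -p "$pid" -batch \
        -ex "set pagination off" \
        -ex "thread apply all bt" > "$out" 2>&1
}

# Example (pgrep -o picks the oldest matching process):
# save_backtrace "$(pgrep -o plasmashell)" plasmashell-bt.txt
```

The process is detached again when gdb exits, so plasmashell is left in whatever state it was in before.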

Comment 7 Dan Mossor [danofsatx] 2015-05-07 20:07:05 UTC
I am tempted to reopen this one, but I can't totally match my problems with this one. I will keep monitoring it, however.

Comment 8 Dan Mossor [danofsatx] 2015-05-09 20:04:25 UTC
System is still hanging, even with the qt5-qtdeclarative update applied.

Symptoms: I locked the session manually with ctrl-alt-L at 16:20 local time. plasmashell hung at 16:35 local time. I know the shell hung at 16:35 as that is the time displayed on the unresponsive lock screen. There are no entries in /var/log/Xorg.0.log or in ~/.xsession-errors. The only information available is in the attached backtrace.


There is no entry in the journal other than the lock event:

May 08 16:20:26 dmfedora.rez.lcl org.kde.kglobalaccel[1721]: kglobalaccel-runtime: Got XKeyPress event
May 08 16:20:26 dmfedora.rez.lcl org.kde.kglobalaccel[1721]: kglobalaccel-runtime: "Ctrl+Alt+L" = "Lock Session"
May 08 16:25:26 dmfedora.rez.lcl dbus[836]: [system] Activating service name='org.kde.powerdevil.backlighthelper' (using servicehelper)
May 08 16:25:26 dmfedora.rez.lcl dbus[836]: [system] Successfully activated service 'org.kde.powerdevil.backlighthelper'
May 08 16:27:56 dmfedora.rez.lcl dbus[836]: [system] Activating service name='org.kde.powerdevil.backlighthelper' (using servicehelper)
May 08 16:27:56 dmfedora.rez.lcl dbus[836]: [system] Successfully activated service 'org.kde.powerdevil.backlighthelper'
May 08 16:30:26 dmfedora.rez.lcl dbus[836]: [system] Activating service name='org.kde.powerdevil.backlighthelper' (using servicehelper)
May 08 16:30:26 dmfedora.rez.lcl dbus[836]: [system] Successfully activated service 'org.kde.powerdevil.backlighthelper'
May 08 16:31:13 dmfedora.rez.lcl sssd_be[879]: GSSAPI client step 1
May 08 16:31:13 dmfedora.rez.lcl sssd_be[879]: GSSAPI client step 1
May 08 16:31:15 dmfedora.rez.lcl sssd_be[879]: GSSAPI client step 1
May 08 16:31:15 dmfedora.rez.lcl sssd_be[879]: GSSAPI client step 2
May 08 16:46:15 dmfedora.rez.lcl sssd_be[879]: GSSAPI client step 1
May 08 16:46:15 dmfedora.rez.lcl sssd_be[879]: GSSAPI client step 1
May 08 16:46:16 dmfedora.rez.lcl sssd_be[879]: GSSAPI client step 1
May 08 16:46:16 dmfedora.rez.lcl sssd_be[879]: GSSAPI client step 2

Comment 9 Dan Mossor [danofsatx] 2015-05-09 20:05:40 UTC
Created attachment 1023846 [details]
Plasmashell backtrace with full debug symbols installed

Comment 10 Fedora Blocker Bugs Application 2015-05-09 20:11:32 UTC
Proposed as a Blocker for 22-final by Fedora user dmossor using the blocker tracking app because:

 Proposed as a conditional blocker as it will cause *all* desktop tasks to fail if the desktop doesn't work. Applicable to Intel and NVIDIA GPUs with no real idea of what is happening from upstream developers. Plasma 5 cannot be released with this condition that makes it unusable for the users.

Comment 11 Rex Dieter 2015-05-09 21:21:05 UTC
Same backtrace that the xorg/dri3 freedesktop.org bug talks about essentially,
http://bugs.freedesktop.org/show_bug.cgi?id=84252

Comment 12 Dan Mossor [danofsatx] 2015-05-09 21:51:27 UTC
So, should this be redirected to an xorg component then?

Comment 13 Rex Dieter 2015-05-09 23:01:09 UTC
I think mesa, at least that's the code being touched in the candidate patch in the freedesktop bug,
https://bugs.freedesktop.org/attachment.cgi?id=115200

Comment 14 Rex Dieter 2015-05-10 12:34:33 UTC
Based on the backtrace this is indeed a dup of previous bug #1193742, which is currently assigned to the intel driver. I'll move that one over to mesa and mark this as dependent.

Comment 15 Dan Mossor [danofsatx] 2015-05-11 19:39:40 UTC
Discussed at the 2015-05-11 blocker review meeting[0]. Voted as punt (delay decision) on blocker, AcceptedFreezeException.

AGREED: 1217844 - punt (delay decision) on blocker, AcceptedFreezeException - this certainly seems bad enough to be worth fixing during freeze, we will wait for a bit more data on what is causing it and how frequently it is encountered (especially when booted live) to make a determination on blocker status (adamw, 17:41:42)

[0]: http://meetbot.fedoraproject.org/fedora-blocker-review/2015-05-11/

Comment 16 Ed Greshko 2015-05-11 23:56:59 UTC
I believe I am seeing either the same or a similar plasma freeze within a VM.  When I start firewall-config I get the dialog for entering the password but then a transparent window appears in the notifications area and everything on the systray is non-responsive.

If you want, I could supply gdb output...but not being a gdb user, please tell me how to save the output to a file for attaching.  I tried the procedure in comment #6 but it seems there is quite a bit of output and I keep getting prompted to continue or quit.

I've added an attachment showing the screen when things are frozen.

Comment 17 Ed Greshko 2015-05-11 23:58:00 UTC
Created attachment 1024401 [details]
A frozen plasma display

Comment 18 Gerald Cox 2015-05-12 03:46:40 UTC
(In reply to Fedora Blocker Bugs Application from comment #10)
> Proposed as a Blocker for 22-final by Fedora user dmossor using the blocker
> tracking app because:
> 
>  Proposed as a conditional blocker as it will cause *all* desktop tasks to
> fail if the desktop doesn't work. Applicable to Intel and NVIDIA GPUs with
> no real idea of what is happening from upstream developers. Plasma 5 cannot
> be released with this condition that makes it unusable for the users.

Just to be clear, this isn't just Intel and NVIDIA... I have AMD Radeon and have been experiencing the issue.  See comment #1.

As of late however, I have been able to clear the freeze condition by using ALT-TAB as described in the KDE Tracker.

Comment 19 Ed Greshko 2015-05-12 04:27:27 UTC
Created attachment 1024450 [details]
Output from gdb with frozen plasmashell

Of course I could use google to learn how to save gdb output to a file.  :-)

Comment 20 Rex Dieter 2015-05-12 10:10:22 UTC
Seems to be missing debuginfo, did you do the debuginfo-install step?

Comment 21 Ed Greshko 2015-05-12 11:58:16 UTC
Yes...  But before creating the crash log I did an "update" and enabled the wrong repo.  I'll redo soon....

Comment 22 Ed Greshko 2015-05-12 12:08:20 UTC
Created attachment 1024555 [details]
a good gdb.txt while plasma frozen

This one should be good.  Enabled the debuginfo repo for testing this time...

Comment 23 Rex Dieter 2015-05-12 13:51:57 UTC
Offhand, this one looks very different to the other linked bug, thread 1 seems to be the only one not polling/waiting:


Thread 1 (Thread 0x7f7b817a4900 (LWP 1478)):
#0  0x00007f7b7c131220 in qstrcmp(QByteArray const&, QByteArray const&)@plt ()
   from /lib64/libKF5ConfigCore.so.5
#1  0x00007f7b7c13b724 in operator< (k2=..., k1=...)
    at ../../../src/core/kconfigdata.h:132
#2  qMapLessThanKey<KEntryKey> (key2=..., key1=...)
    at /usr/include/qt5/QtCore/qmap.h:67
#3  lowerBound (akey=..., this=<optimized out>)
    at /usr/include/qt5/QtCore/qmap.h:131
#4  QMapData<KEntryKey, KEntry>::findNode (this=<optimized out>, akey=...)
    at /usr/include/qt5/QtCore/qmap.h:287
#5  0x00007f7b7c13df2d in constFind (akey=..., this=0x16dcaa8)
    at /usr/include/qt5/QtCore/qmap.h:827
#6  find (akey=..., this=0x16dcaa8) at /usr/include/qt5/QtCore/qmap.h:834
#7  KEntryMap::findEntry (this=this@entry=0x16dcaa8, group=..., key=..., 
    flags=..., flags@entry=...) at ../../../src/core/kconfigdata.cpp:74
#8  0x00007f7b7c134497 in KConfigPrivate::lookupData (this=0x16dca80, group=..., 
    key=key@entry=0x5c630a8 "Natural_hint-bottom-margin__1Size", flags=..., 
    flags@entry=...) at ../../../src/core/kconfig.cpp:952
#9  0x00007f7b7c14f959 in KConfigGroup::readEntry (
    this=this@entry=0x7fff16573850, 
    key=key@entry=0x5c630a8 "Natural_hint-bottom-margin__1Size", aDefault=...)
    at ../../../src/core/kconfiggroup.cpp:730
#10 0x00007f7b7f81f1d6 in readEntry<QRectF> (defaultValue=..., 
    key=<optimized out>, this=0x7fff16573850)
    at /usr/include/KF5/KConfigCore/kconfiggroup.h:723
#11 readEntry<QRectF> (aDefault=..., key=..., this=0x7fff16573850)
    at /usr/include/KF5/KConfigCore/kconfiggroup.h:248
#12 Plasma::Theme::findInRectsCache (this=0x3db4ed0, image=..., element=..., 
    rect=...) at ../../../src/plasma/theme.cpp:350
#13 0x00007f7b7f816452 in elementRect (elementId=..., this=0x3db5080)
    at ../../../src/plasma/svg.cpp:517
#14 Plasma::Svg::hasElement (this=<optimized out>, elementId=...)
    at ../../../src/plasma/svg.cpp:869
#15 0x00007f7b7f80824b in Plasma::FrameSvgPrivate::updateSizes (this=0x3db3c90)
    at ../../../src/plasma/framesvg.cpp:1030
#16 0x00007f7b7f809f98 in Plasma::FrameSvg::resizeFrame (this=0x3db4d10, 
    size=...) at ../../../src/plasma/framesvg.cpp:380
#17 0x00007f7b5a3c7b2f in Plasma::FrameSvgItem::geometryChanged (this=this@entry=
    0x3db3890, newGeometry=..., oldGeometry=...)

Comment 24 Martin Kyral 2015-05-14 13:33:19 UTC
I experience this kind of freeze too. Restarting kwin helps: switch to a tty and from there run: pkill kwin ; DISPLAY=:0 kwin

Comment 25 Mike Chambers 2015-05-17 05:45:02 UTC
I have also experienced these freezes and this is on an nvidia graphics card with nouveau driver.

Comment 26 Rex Dieter 2015-05-17 12:14:52 UTC
re-linking often related mesa bug #1193742

Comment 27 Mike Chambers 2015-05-23 17:00:45 UTC
I don't know if a fix has been found, or if this just works better.

But instead of a fresh install of F22, I did a fresh install of F21, fully updated it, then did a dnf update to F22 (now fully updated), and so far no issues with the screen freezing up.

Comment 28 Paul Lipps 2015-05-24 01:51:13 UTC
I'm getting the freezes as well with an nvidia graphics card using the nouveau driver.

Comment 29 Gerald Cox 2015-05-26 00:15:14 UTC
Created attachment 1029696 [details]
gdb of frozen state

created by:
gdb
(gdb) attach <processid_of_plasmashell>
(gdb) thread apply all bt

sorry about the formatting.  I cut and pasted what was in the terminal window.  I had to ssh into the machine.  When I tried the ALT-F2 which usually works, this time it didn't.  

Hopefully, this will provide a clue as to what is happening.

Comment 30 Rex Dieter 2015-05-26 00:45:14 UTC
The last backtrace is the intel/dri3 thing, see bug #1193742 , and more importantly, linked bug #1223477 that references an xorg-x11-intel-drv update,
https://admin.fedoraproject.org/updates/FEDORA-2015-8616

Comment 31 Gerald Cox 2015-05-26 01:35:01 UTC
(In reply to Rex Dieter from comment #30)
> The last backtrace is the intel/dri3 thing, see bug #1193742 , and more
> importantly, linked bug #1223477 that references an xorg-x11-intel-drv
> update,
> https://admin.fedoraproject.org/updates/FEDORA-2015-8616

I had already installed your update to libxcb a few days ago... and unfortunately, I'm not using xorg-x11-drv-intel, I'm using xorg-x11-drv-ati...

Comment 32 Gerald Cox 2015-05-26 01:38:04 UTC
(In reply to Gerald Cox from comment #31)
> 
> I had already installed your update to libxcb a few days ago... 
correction, found that I also have the i686 version on my system and it wasn't also upgraded... I just did that... hopefully, that will help...

Comment 33 Rex Dieter 2015-05-26 01:50:26 UTC
Interesting,

#0  0x000000368100c530 in pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x000000368500a3d9 in _xcb_conn_wait () at /lib64/libxcb.so.1
#2  0x000000368500b651 in xcb_wait_for_special_event () at /lib64/libxcb.so.1
#3  0x000000368fc4ecd2 in dri3_find_back (c=c@entry=0x21da940, priv=priv@entry=0x5ed6770) at dri3_glx.c:1263
#4  0x000000368fc4fa9f in dri3_get_buffers (driDrawable=<optimized out>, loaderPrivate=0x5ed6770, buffer_type=dri3_buffer_back, format=4099) at dri3_glx.c:1289
#5  0x000000368fc4fa9f in dri3_get_buffers (driDrawable=<optimized out>, format=4099, stamp=0x64804b0, loaderPrivate=0x5ed6770, buffer_mask=<optimized out>, buffers=0x7ffc0075a900) at dri3_glx.c:1466
#6  0x00007fa96d7509df in dri2_allocate_textures (statts_count=<optimized out>, statts=<optimized out>, images=<optimized out>, drawable=<optimized out>) at dri2.c:254
#7  0x00007fa96d7509df in dri2_allocate_textures (ctx=0x22c1c90, drawable=0x64804b0, statts=0x5b9e260, statts_count=2) at dri2.c:377

I thought I had been assured that the intel driver was the only one using dri3, though maybe I'm misreading the backtrace here.

The symptoms are very similar, it's a deadlock in xcb's xcb_wait_for_special_event call?

Let's triage this one to ati then.
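
To tell this signature apart from the unrelated traces attached earlier, a saved backtrace can be searched for the two frames quoted above. This is only a rough sketch: the helper name has_dri3_wait is hypothetical, and finding both function names in a file is a hint that it is the same wait, not proof of a deadlock.

```shell
# Hypothetical helper: report whether a saved "thread apply all bt" dump
# contains the DRI3/xcb wait seen in the frames quoted above.
# Usage: has_dri3_wait <backtrace-file>
has_dri3_wait() {
    trace=$1
    # Both frames must be present: the xcb wait and the Mesa DRI3 caller.
    grep -q "xcb_wait_for_special_event" "$trace" &&
        grep -q "dri3_find_back" "$trace"
}

# Example:
# has_dri3_wait gdb.txt && echo "looks like the dri3/xcb wait"
```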

Comment 34 Adam Williamson 2015-06-10 22:31:40 UTC
Clearing F22 accepted / nominated freeze exception status as F22 has shipped, per https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Trackers . You may nominate as an Alpha, Beta or Final freeze exception for F23 if desired using the web application - https://qa.fedoraproject.org/blockerbugs/propose_bug (though it is not currently set up for F23) - or by marking the bug as blocking AlphaFreezeException, BetaFreezeException, or FinalFreezeException.

Comment 35 Gerald Cox 2015-08-06 15:54:20 UTC
Is there any more information that I can provide so this can be worked?  It's getting a bit annoying to need to hit ALT/F2 several times so I can enter:
"killall plasmashell ; plasmashell" several times a day.  This has been going on for three months now....

Comment 36 Rex Dieter 2015-08-06 18:12:39 UTC
Now that
https://admin.fedoraproject.org/updates/FEDORA-2015-10920/libxcb-1.11-8.fc22
went to stable updates, can you make sure you have that, and get a fresh backtrace?

That update should have fixed at least one of the causes of the deadlocks reported here.
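
Since comment 32 turned up a stale i686 copy of libxcb, it may be worth listing every installed build together with its architecture before taking the new backtrace. A minimal sketch, assuming an rpm-based system; the helper name installed_builds is made up for illustration.

```shell
# Hypothetical helper: print every installed build of a package together
# with its architecture, so that a stale i686 copy is easy to spot.
# Usage: installed_builds <package>
installed_builds() {
    rpm -q --qf '%{NAME}-%{VERSION}-%{RELEASE}.%{ARCH}\n' "$1"
}

# Example:
# installed_builds libxcb
```

Each installed architecture should show the same version and release as the update referenced above.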

Comment 37 Gerald Cox 2015-08-06 20:24:04 UTC
Thanks Rex, will do...

Comment 38 Gerald Cox 2015-08-16 16:23:33 UTC
Rex,
Well, I forgot to update all the debug info, so this run wasn't any good, however 
I got this strange message:

(gdb) attach 32086

Attaching to process 32086

/usr/bin/plasmashell (deleted): No such file or directory.


Was this because I had a debug info mismatch... seems like a strange message...

Comment 39 Rex Dieter 2015-08-16 18:50:49 UTC
> /usr/bin/plasmashell (deleted): No such file or directory.

means you (probably) upgraded the package owning /usr/bin/plasmashell , since the last time plasmashell started
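
The same condition can be checked before attaching gdb, because on Linux the /proc/<pid>/exe link gains a " (deleted)" suffix once the original binary is gone. This is a sketch under that assumption; the helper name exe_replaced is hypothetical.

```shell
# Hypothetical helper: succeed if the executable behind a running process
# has been removed or replaced on disk since the process started.
# Linux-specific: relies on the /proc/<pid>/exe symbolic link.
# Usage: exe_replaced <pid>
exe_replaced() {
    target=$(readlink "/proc/$1/exe") || return 2
    case $target in
        *" (deleted)") return 0 ;;
        *) return 1 ;;
    esac
}

# Example:
# exe_replaced "$(pgrep -o plasmashell)" && echo "restart plasmashell first"
```

If it reports a replaced binary, restarting plasmashell before the next trace should let the installed debuginfo match the running process.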

Comment 40 Gerald Cox 2015-08-16 20:12:24 UTC
Created attachment 1063552 [details]
gdb of frozen state - 15228

Hope this helps... I've got everything set up to do more traces if needed.

Comment 41 Gerald Cox 2015-08-17 05:30:43 UTC
Created attachment 1063623 [details]
gdb frozen - for comparison with 1063552

Hope this helps...

Comment 42 Gerald Cox 2015-08-17 16:52:17 UTC
Created attachment 1064015 [details]
gdb plasmashell freeze - after sitting overnight

Additional backtrace for comparison purposes.

Comment 43 Gerald Cox 2015-08-17 16:54:01 UTC
If there is anything more I can do to help track this down, please let me know.

Comment 44 Gerald Cox 2015-10-20 12:33:58 UTC
Removed reference to upstream 338999 which has been marked as WONTFIX since it contains multiple issues.

Let's keep the focus of this bug on the plasmashell freeze issue which I originally reported.

Added 354126 upstream which reflects only the Plasmashell freeze issue.

Comment 45 Gerald Cox 2015-10-20 12:52:21 UTC
Rex, there was a reply to a comment:
https://bugs.freedesktop.org/show_bug.cgi?id=84252#c58

Did you do this?  If so, could you reference the bug number in the external tracker?

Comment 46 Rex Dieter 2015-10-20 12:56:03 UTC
Ideally, the bug should be filed by someone experiencing the bug, so that they can give feedback.

Comment 47 Gerald Cox 2015-10-20 13:14:57 UTC
(In reply to Rex Dieter from comment #46)
> Ideally, the bug should be filed by someone experiencing the bug, so that
> they can give feedback.

I wasn't aware that it was being requested.  I wasn't on copy to that bug, you asked the question and they replied to you.  If you wanted me to do it, it would have been nice for you to tell me about it back in July.

Comment 48 Rex Dieter 2015-10-20 13:18:17 UTC
Sorry. :(

Comment 49 Gerald Cox 2015-10-20 13:22:05 UTC
No worries.  I get frustrated sometimes which is my bad.  I really do very much appreciate all the work you do for KDE and for that matter Fedora.

Comment 51 Gerald Cox 2016-04-19 22:57:49 UTC
No longer occurs with the latest kde, plasma, qt updates.  Closing.