Bug 755892 - sometimes gnome-shell hangs and only sighup solves it
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: gnome-shell
Version: 18
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Owen Taylor
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 822481
Depends On:
Blocks:
Reported: 2011-11-22 10:47 UTC by Gianluca Cecchi
Modified: 2014-02-05 22:43 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-02-05 22:43:25 UTC
Type: ---
Embargoed:


Attachments
output during a gdb backtrace when freeze going through (7.17 KB, application/x-gzip)
2011-11-22 10:51 UTC, Gianluca Cecchi
gdb of Xorg pid during freeze (8.87 KB, text/plain)
2011-11-22 13:32 UTC, Gianluca Cecchi
message when I switch to console window (95.88 KB, image/jpeg)
2011-11-22 14:11 UTC, Gianluca Cecchi
backtraces of gnome-shell and Xorg (10.39 KB, text/plain)
2012-07-27 00:42 UTC, James Livingston

Description Gianluca Cecchi 2011-11-22 10:47:39 UTC
Description of problem:
During normal work, gnome-shell hangs.

Version-Release number of selected component (if applicable):
3.2.1-2.fc16.x86_64

How reproducible:
Almost always 

Steps to Reproduce:
1. Enter a GNOME session and work for about 2-3 hours.
2. The session freezes: the mouse pointer still moves and menu items seem clickable, but clicks have no effect; the keyboard does not respond, so neither Alt+F2 nor reloading gnome-shell is possible.
3. Press Ctrl+Alt+F2 and, in the console session, run
kill -SIGHUP <pid of gnome-shell process>
then press Ctrl+Alt+F1 to come back; everything works again.
Sometimes the SIGHUP causes gnome-shell to crash, and the whole desktop session is lost.
  
Actual results:
gnome session hangs

Expected results:
To be able to work normally

Additional info:
I think I began to have this behaviour after kernel-3.1.1-1.fc16.x86_64

I'm using this option at boot (but I was using it with 3.1.0 kernel too)
i915.i915_enable_rc6=1

Also, as I have a laptop (Asus U36SD) with so-called Optimus technology, I disable the NVIDIA discrete card in /etc/rc.d/rc.local with the commands
#!/bin/bash
echo "Disabling Nvidia video adapter..." | tee -a /var/log/nvida_disabled.log
/sbin/modprobe acpi_call
echo '\_SB.PCI0.PEG0.GFX0.DOFF' > /proc/acpi/call

I compile the acpi_call kernel module myself after each kernel upgrade, using the source:
acpi-call_240611.orig.tar.gz

$ cat /proc/cmdline 
BOOT_IMAGE=/boot/vmlinuz-3.1.1-2.fc16.x86_64 root=UUID=ce058d6c-d2ed-49e5-9869-965799f246a5 ro rd.md=0 rd.lvm=0 rd.dm=0 KEYTABLE=us quiet SYSFONT=latarcyrheb-sun16 rhgb rd.luks=0 LANG=en_US.UTF-8 i915.i915_enable_rc6=1 elevator=deadline


00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09) (prog-if 00 [VGA controller])
       Subsystem: ASUSTeK Computer Inc. Device 1682
       Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
       Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
       Latency: 0
       Interrupt: pin A routed to IRQ 48
       Region 0: Memory at dc400000 (64-bit, non-prefetchable) [size=4M]
       Region 2: Memory at b0000000 (64-bit, prefetchable) [size=256M]
       Region 4: I/O ports at e000 [size=64]
       Expansion ROM at <unassigned> [disabled]
       Capabilities: <access denied>
       Kernel driver in use: i915
       Kernel modules: i915

Any way to debug this?
The only lines I find in Xorg.0.log are:
[ 10911.891] (II) intel(0): Printing DDC gathered Modelines:
[ 10911.891] (II) intel(0): Modeline "1366x768"x0.0   69.30  1366 1425 1464 1472  768 773 782 785 -hsync -vsync (47.1 kHz)
[ 16044.192] (II) AIGLX: Suspending AIGLX clients for VT switch
[ 16079.534] (II) AIGLX: Resuming AIGLX clients after VT switch
[ 16079.870] (II) intel(0): EDID vendor "COR", prod id 6104
[ 16079.870] (II) intel(0): Printing DDC gathered Modelines:
[ 16079.870] (II) intel(0): Modeline "1366x768"x0.0   69.30  1366 1425 1464 1472  768 773 782 785 -hsync -vsync (47.1 kHz)
[ 16079.973] (**) Option "Device" "/dev/input/event4"
[ 16079.973] (--) synaptics: SynPS/2 Synaptics TouchPad: touchpad found
[ 16088.180] (II) AIGLX: Suspending AIGLX clients for VT switch <---- when I press Ctrl+Alt+F2
[ 16116.682] (II) AIGLX: Resuming AIGLX clients after VT switch <---- when I come back to X


In messages:
Nov 18 13:28:08 ope46 kernel: [15397.459003] CIFS VFS: Received no data, expecting 4
Nov 18 13:29:08 ope46 kernel: [15457.454946] CIFS VFS: Received no data, expecting 4
Nov 18 13:30:08 ope46 kernel: [15517.450810] CIFS VFS: Received no data, expecting 4
Nov 18 13:40:40 ope46 gnome-session[5954]: WARNING: Application 'gnome-shell.desktop' killed by signal <--- when I run the kill -SIGHUP command
Nov 18 13:56:09 ope46 kernel: [17077.935596] CIFS VFS: Received no data, expecting 4
Nov 18 13:57:08 ope46 kernel: [17137.338785] CIFS VFS: Received no data, expecting 4

Comment 1 Gianluca Cecchi 2011-11-22 10:50:24 UTC
Based on suggestions by Adam Jackson in Fedora test mailing list 
"
If you debuginfo-install gnome-shell, attach with gdb instead of sending SIGHUP, and run 'thread apply all backtrace', what do you get?
"
I ran it the next time the problem occurred. I'm going to attach the output of the session, saved with the "script" command.
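
For anyone who wants to capture the same backtrace non-interactively, here is a minimal Python sketch. It only builds the gdb command line and does not run it. `gdb_batch_argv` is a hypothetical helper name and PID 1609 is just an example value; the sketch assumes gdb's `-p`, `-batch` and `-ex` options and that the gnome-shell debuginfo is installed.

```python
import shlex


def gdb_batch_argv(pid, commands):
    """Build an argv list that attaches gdb to `pid`, runs `commands`, then exits."""
    argv = ["gdb", "-p", str(pid), "-batch"]
    for command in commands:
        # Each -ex argument is executed in order after gdb has attached.
        argv += ["-ex", command]
    return argv


# Example PID only; substitute the PID of the frozen gnome-shell process.
shell_argv = gdb_batch_argv(1609, ["thread apply all backtrace"])

# Print a copy-pasteable shell command line.
print(shlex.join(shell_argv))
```

The list could be handed to `subprocess.run(shell_argv)` from a console or ssh login while the shell is frozen; in batch mode gdb should exit, and thereby detach, once the commands have run.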

Comment 2 Gianluca Cecchi 2011-11-22 10:51:19 UTC
Created attachment 534992
output during a gdb backtrace when freeze going through

Comment 3 Gianluca Cecchi 2011-11-22 10:52:08 UTC
Comment from Adam after my post:

"
The interesting part seems to be:

Thread 1 (Thread 0x7fbf4aa8c9c0 (LWP 1609)):
#0  0x0000003cb7ee6443 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87
#1  0x0000003cbc208ba2 in ?? () from /usr/lib64/libxcb.so.1
#2  0x0000003cbc2090ff in ?? () from /usr/lib64/libxcb.so.1
#3  0x0000003cbc209184 in xcb_writev () from /usr/lib64/libxcb.so.1
#4  0x0000003cbc6456e7 in _XSend (dpy=0xb47a30, data=<optimized out>, size=<optimized out>) at xcb_io.c:436
#5  0x0000003cbc639d55 in SendZImage (dest_scanline_pad=0, dest_bits_per_pixel=32, req_yoffset=<optimized out>, req_xoffset=0, image=0x7fffe21a7240, req=<optimized out>, dpy=0xb47a30) at PutImage.c:802

This is showing gnome-shell trying to write an image to the X server,
but blocking because the socket to the X server does not appear to be
ready for writing.

So there's (at least) three things that could be going wrong here, from
probably most to least likely:

1) the write queue to the X server really might be blocked
2) libxcb could have a logic bug that's getting stuck here
3) the kernel might have a bug in poll()

#1 typically only happens in two cases: either the X server is stuck
away from the dispatch loop, or it's explicitly ignoring you because
there's a grab in process.  In the former case SIGHUP wouldn't help,
simply reloading the shell won't un-stick the X server.  But in the
latter case, it might; if the grab is from one of the shell's other
threads, then closing all of the shell's display connections would reset
the grab.

So my next intuition would be to gdb the X server and see what's up.  If
you find it waiting patiently on a call to select(), then the second
case is more likely, and 'print AllClients' should show you an fd_set
with only one bit set.

- ajax
"

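To illustrate the first case in Adam's list (a write queue that really is blocked), here is a small Python sketch. It uses a local socket pair on Linux as a stand-in for the X connection, not libxcb or a real X server, and the helper names are made up for the illustration. One end plays a client that keeps writing while the other end never reads: once the kernel buffer fills, polling for writability times out, which is the state the backtrace shows gnome-shell waiting in. When the peer drains its side, the socket becomes writable again.

```python
import select
import socket


def fill_until_blocked(sock, chunk=b"x" * 65536):
    """Send on a non-blocking socket until the kernel refuses more data."""
    sock.setblocking(False)
    queued = 0
    try:
        while True:
            queued += sock.send(chunk)
    except BlockingIOError:
        return queued


def is_writable(sock, timeout_ms=200):
    """Return True if poll() reports the socket ready for writing in time."""
    poller = select.poll()
    poller.register(sock, select.POLLOUT)
    return bool(poller.poll(timeout_ms))


def drain(sock):
    """Read everything currently queued on the socket without blocking."""
    sock.setblocking(False)
    total = 0
    try:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            total += len(data)
    except BlockingIOError:
        pass
    return total


# "client" stands in for gnome-shell, "server" for a peer that is not reading.
client, server = socket.socketpair()
queued = fill_until_blocked(client)
stuck = not is_writable(client)   # poll() times out: a write would block
drained = drain(server)           # the peer finally services its queue
recovered = is_writable(client)   # writing is possible again
client.close()
server.close()
print(queued, stuck, drained, recovered)
```

This only demonstrates the symptom; it says nothing about why the real X server would stop reading from the shell's connection.
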
Comment 4 Gianluca Cecchi 2011-11-22 10:53:14 UTC
Comment by Alon Levy:
"
I think I have the same problem here, I've followed it once, gdbing the
server, it was in select, so maybe I'll try to do it again and do the
'print AllClients' - for me reproducing is 100% by doing a chvt /
suspend and resume. To get back to work (i.e.  workaround) I chvt to
some console, do "killall -9 gnome-shell; sleep 5; DISPLAY=:0.0
gnome-shell" and quickly change back. Recently gnome-shell started to
get unstuck occasionally if I wait about 10-20 seconds, but I'm not
always that patient.
"

Comment 5 Gianluca Cecchi 2011-11-22 10:55:44 UTC
In the meantime, this morning I updated xorg-x11-drv-intel to:

xorg-x11-drv-intel-2.17.0-1.fc16.x86_64

The previous version was the default shipped with F16: 2.16.0-2
I will report whether I still have the problem, as it normally happens 1-2 times a day.

Comment 6 Gianluca Cecchi 2011-11-22 13:31:31 UTC
After installing
# debuginfo-install xorg-x11-server-Xorg
# debuginfo-install expat libfontenc libgcc libstdc++ xorg-x11-drv-evdev xorg-x11-drv-fbdev xorg-x11-drv-intel xorg-x11-drv-synaptics xorg-x11-drv-vesa zlib

and hitting the problem again with the 2.17.0 Intel Xorg driver, I got this after attaching gdb to the Xorg process:
...
Loaded symbols for /lib64/libnss_files.so.2
0x00007f5e0f1f8213 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:82
82      T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb) print AllClients
$1 = {fds_bits = {140702960320512, 0 <repeats 15 times>}}
(gdb) print AllClients
$2 = {fds_bits = {140702960320512, 0 <repeats 15 times>}}

I'm going to fully attach the gdb session.

Gianluca

Comment 7 Gianluca Cecchi 2011-11-22 13:32:57 UTC
Created attachment 535024
gdb of Xorg pid during freeze

After attaching gdb I ran "print AllClients".

Comment 8 Gianluca Cecchi 2011-11-22 14:11:49 UTC
Created attachment 535041
message when I switch to console window

When I switch to a console window I currently get the message shown in the image.
But this always happens, not only during a freeze, so I don't know whether it is related.

Comment 9 Gianluca Cecchi 2011-11-29 10:24:11 UTC
Any news on this?
It is rather tedious.
I think I have found a correlation between the applications running and the problem appearing.
It is probably related to remmina and opening RDP sessions from it.

Until now I had only experienced the problem at the office, where I always use an external monitor connected through the laptop's VGA adapter, and never at home, where I use only the laptop display.
So I thought the external monitor could interfere or be part of the cause.

However, today I got it twice in 45 minutes while working at home, with only the laptop display in use.
The new factor was that I was using remmina at home for the first time.
And I use it constantly at work.

Currently installed related packages are:
remmina-plugins-telepathy-0.9.2-2.fc15.x86_64
remmina-plugins-nx-0.9.2-2.fc15.x86_64
remmina-0.9.3-3.fc16.x86_64
remmina-plugins-common-0.9.2-2.fc15.x86_64
remmina-plugins-xdmcp-0.9.2-2.fc15.x86_64
remmina-plugins-rdp-0.9.2-2.fc15.x86_64
remmina-plugins-vnc-0.9.2-2.fc15.x86_64

Any suggestion with this new information?

Comment 10 Adam Jackson 2011-11-29 15:05:40 UTC
(gdb) print AllClients
$1 = {fds_bits = {140702960320512, 0 <repeats 15 times>}}

That's showing the server listening to more than one client, so this is not a server grab deadlock.
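
For reference, the value can be decoded by hand. A rough Python sketch, assuming the glibc x86_64 fd_set layout (16 words of 64 bits each, where bit b of word w stands for file descriptor w*64 + b); `fds_from_bits` is a name made up for this illustration:

```python
NFDBITS = 64  # assumed width of one fds_bits word (glibc on x86_64)


def fds_from_bits(fds_bits, nfdbits=NFDBITS):
    """Return the file descriptors whose bits are set in an fd_set dump."""
    fds = []
    for word_index, word in enumerate(fds_bits):
        for bit in range(nfdbits):
            if (word >> bit) & 1:
                fds.append(word_index * nfdbits + bit)
    return fds


# Value copied from the gdb output: one non-zero word followed by 15 zero words.
all_clients = [140702960320512] + [0] * 15
client_fds = fds_from_bits(all_clients)
print(len(client_fds), client_fds)
```

For the value above this gives 25 descriptors (19-24, 26, 28-34 and 36-46), so the server is still selecting on many client connections rather than on a single grabbing one.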

Comment 11 Mamoru TASAKA 2012-05-18 01:17:08 UTC
*** Bug 822481 has been marked as a duplicate of this bug. ***

Comment 12 Mamoru TASAKA 2012-05-18 01:19:44 UTC
(In reply to comment #11)
> *** Bug 822481 has been marked as a duplicate of this bug. ***

The above is from molecule (one of xscreensaver's hacks).

Comment 13 James Livingston 2012-07-27 00:42:08 UTC
Created attachment 600639
backtraces of gnome-shell and Xorg

I'm seeing this a lot when running Eclipse on F17. I can reproduce it semi-consistently by trying to open the Debug As sub-menu from a context menu.

gnome-shell-3.4.1-5.fc17
xorg-x11-server-Xorg-1.12.2-4.fc17


I'm attaching the backtrace (with debug symbols) of gnome-shell and Xorg, which seems to be the same as the other reported ones.

"print AllClients" reports the same as Comment #10. Is there anything else I can gather to help find the cause of this?

Comment 14 Fedora End Of Life 2013-01-16 16:25:23 UTC
This message is a reminder that Fedora 16 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 16. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '16'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 16's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 16 is end of life. If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora, you are encouraged to click on 
"Clone This Bug" and open it against that version of Fedora.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 15 Fedora End Of Life 2013-02-13 20:04:25 UTC
Fedora 16 changed to end-of-life (EOL) status on 2013-02-12. Fedora 16 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 16 Kamil Páral 2013-02-14 07:57:08 UTC
Moving to F17 per comment 13.

Comment 17 Naipaul Ojar 2013-03-05 00:04:39 UTC
I have also seen this bug both on F16 and now on F18.

I have hit this issue a number of times with kernels from 3.7.1 through 3.8.1 and gnome-shell-3.6.0.2.

root       917     1  0 17:36 ?        00:00:00 /usr/sbin/abrtd -d -s
root       920     1  0 17:36 ?        00:00:00 /usr/bin/abrt-watch-log -F BUG corruption stack overflow protection fault WARNING: at nable to handle ouble fault: RTNL: assertion failed eek! page_mapcount(page) went negative! adness at NETDEV WATCHDOG ysctl table check failed INFO: possible recursive locking detected : nobody cared IRQ handler type mismatch /var/log/messages -- /usr/bin/abrt-dump-oops -xD
root       930     1  0 17:36 ?        00:00:00 /usr/bin/abrt-watch-log -F Backtrace /var/log/Xorg.0.log -- /usr/bin/abrt-dump-xorg -xD
nojar     2421  2054  0 17:36 ?        00:00:00 abrt-applet

Comment 18 Rui Gouveia 2013-06-10 19:46:15 UTC
Hi,

This is happening 3 or 4 times a day with:

[root@localhost log]# cat /etc/fedora-release 
Fedora release 18 (Spherical Cow)

[root@localhost log]# uname -a
Linux localhost.localdomain 3.9.4-200.fc18.x86_64 #1 SMP Fri May 24 20:10:49 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

[root@localhost log]# rpm -qa xorg-x11\*
xorg-x11-server-Xephyr-1.13.3-3.fc18.x86_64
xorg-x11-drv-synaptics-1.6.3-3.fc18.x86_64
xorg-x11-server-Xorg-1.13.3-3.fc18.x86_64
xorg-x11-fonts-misc-7.5-6.fc18.noarch
xorg-x11-utils-7.5-7.fc18.x86_64
xorg-x11-drv-cirrus-1.5.1-3.fc18.x86_64
xorg-x11-drv-openchrome-0.3.3-1.fc18.x86_64
xorg-x11-drv-vesa-2.3.2-2.fc18.x86_64
xorg-x11-drv-void-1.4.0-12.fc18.x86_64
xorg-x11-drv-intel-2.21.8-1.fc18.x86_64
xorg-x11-xkb-utils-7.7-5.fc18.x86_64
xorg-x11-drv-vmware-12.0.2-3.20120718gite5ac80d8f.fc18.x86_64
xorg-x11-font-utils-7.5-11.fc18.x86_64
xorg-x11-xauth-1.0.7-2.fc18.x86_64
xorg-x11-drv-qxl-0.0.22-5.20120718gitde6620788.fc18.x86_64
xorg-x11-drv-wacom-0.16.1-2.fc18.x86_64
xorg-x11-server-common-1.13.3-3.fc18.x86_64
xorg-x11-drv-fbdev-0.4.3-3.fc18.x86_64
xorg-x11-drv-ati-7.1.0-5.20130408git6e74aacc5.fc18.x86_64
xorg-x11-drv-evdev-2.7.3-5.fc18.x86_64
xorg-x11-xinit-1.3.2-7.fc18.x86_64
xorg-x11-drv-ast-0.97.0-2.fc18.x86_64
xorg-x11-drv-nouveau-1.0.7-1.fc18.x86_64
xorg-x11-drv-mga-1.6.2-6.fc18.x86_64
xorg-x11-server-utils-7.5-16.fc18.x86_64
xorg-x11-fonts-Type1-7.5-6.fc18.noarch
xorg-x11-drv-dummy-0.3.6-2.fc18.x86_64
xorg-x11-glamor-0.5.0-5.20130401git81aadb8.fc18.x86_64
xorg-x11-drv-vmmouse-13.0.0-1.fc18.x86_64

Please, let me know if more information is needed.

Thank you

Comment 19 Rui Gouveia 2013-06-10 19:49:52 UTC
Hi,

Me again. I forgot to say: this started after upgrading from F17 to F18 with fedup.

Thanks.

Comment 20 Fedora End Of Life 2013-07-04 06:23:49 UTC
This message is a reminder that Fedora 17 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 17. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '17'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 17's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 17 is end of life. If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora, you are encouraged to change the 
'version' to a later Fedora version prior to Fedora 17's end of life.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 21 Gianluca Cecchi 2013-07-04 06:30:15 UTC
I still have this problem.
I noticed that on both Fedora 17 and Fedora 18 it happens more often when using remmina to connect via RDP to a Windows server.
Using remmina I can reproduce it at least once an hour.
Can anyone else confirm if they are using remmina too?

Comment 22 Samuel Sieb 2013-07-04 06:38:02 UTC
I use remmina sometimes.  Fullscreen mode (unscaled) causes some really weird interactions with gnome-shell, but I don't recall it actually hanging it.  Maybe crashing it though.  I will try it again and see what happens.

Comment 23 Fedora End Of Life 2013-12-21 14:58:18 UTC
This message is a reminder that Fedora 18 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 18. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '18'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 18's end of life.

Thank you for reporting this issue and we are sorry that we may not be 
able to fix it before Fedora 18 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version prior to Fedora 18's end of life.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 24 Fedora End Of Life 2014-02-05 22:43:25 UTC
Fedora 18 changed to end-of-life (EOL) status on 2014-01-14. Fedora 18 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

