Bug 751641

Summary: gdm hangs on user switch
Product: [Fedora] Fedora
Reporter: Bojan Smojver <bojan>
Component: gdm
Assignee: Ray Strode [halfline] <rstrode>
Status: CLOSED WONTFIX
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: unspecified
Version: 18
CC: benjavalero, collura, jason, jon.dufresne, kelevel+redhat, kevin.russell, kirtis.bakalarczyk, michael_stevens, quentin, rstrode, runekl, scattol, txn2tahx3v, ufospoke
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2014-02-05 11:51:36 UTC
Attachments:
Description | Flags
Screenshot with gdm freezed | none
Server log of possibly hung Xorg process | none
gdm logs when gdm looses Xdisplays and eventually freezes | none
Xorg logs related to when gdm looses Xdisplays and eventually freezes | none
script to delete Xorgs (display higher than 5) to alleviate the issue of leaked Xorgs caused by thei bug | none
log files for display :32 from /var/log/gdm | none

Description Bojan Smojver 2011-11-06 22:33:37 UTC
Description of problem:
I have two users (both using fallback mode, if it matters). Occasionally, when switch user is attempted, gdm paints the frame of the box in the middle of the screen where user names are supposed to go and hangs. I can still do Ctrl + Alt + Fn to go to the session of the user that was logged in, type in the password and log in.

Version-Release number of selected component (if applicable):
gdm-3.2.1.1-6.fc16.x86_64

How reproducible:
Sometimes.

Steps to Reproduce:
1. Switch user.

  
Actual results:
gdm (best guess) hangs.

Expected results:
Shouldn't hang, but display users and continue.

Additional info:

Comment 1 Rune Kleveland 2011-11-07 07:20:58 UTC
I have the same problem. Switching users works after a reboot, but after using the system for a little while the dialogue showing the available users becomes empty. Pressing Ctrl-Alt-F2 and using the Alt-arrow keys takes me back to the user I tried to switch from.

gdm-3.2.1.1-8.fc16.x86_64

Comment 2 Kirtis Bakalarczyk 2011-11-08 16:13:07 UTC
I'm having this problem as well.  The login screen shows my user icon but won't allow any input.  I can click the power icon in the top right corner and it 'lights up' indicating that the menu should be appearing, but nothing happens.

Comment 3 cblaauw 2011-11-09 11:39:33 UTC
Same here; CTRL-ALT-BACKSPACE solves it for me.

When hung, GDM shows just one user picture instead of a list and does not accept any input.

Comment 4 cblaauw 2011-11-14 18:15:06 UTC
One more thing...

At times, when waking from suspend, the computer asks for the password but does not accept any input (keyboard or mouse). CTRL-ALT-BACKSPACE solves this too, but it does not seem to be the correct way of handling it.

Comment 5 Jason Brooks 2011-11-26 20:37:19 UTC
I'm experiencing the same bug -- have to hit ctrl-alt-backspace to get back in business.

Comment 6 Benjamín Valero Espinosa 2011-12-03 10:03:19 UTC
Created attachment 539936 [details]
Screenshot with gdm freezed

This is a screenshot. Ctrl + Alt + Backspace also solves the problem for me.

Comment 7 Benjamín Valero Espinosa 2011-12-03 14:47:30 UTC
(In reply to comment #6)
> Created attachment 539936 [details]
> Screenshot with gdm freezed
> 
> This is a screenshot. For me, Ctrl + Alt + Backspace also solves the problem
> for me.

Ctrl + Alt + F2 also works sometimes.

Comment 8 Mike Stevens 2011-12-07 03:41:55 UTC
Created attachment 541684 [details]
Server log of possibly hung Xorg process

Comment 9 Mike Stevens 2011-12-07 03:45:31 UTC
I've noticed that when the switch user hangs, there are two Xorg processes running. The first was running the active user's session, while the second server (actually started a day earlier) seemed to be hung. Killing the second Xorg process allowed switch user to proceed.

Comment 10 collura 2012-03-21 06:00:43 UTC
I'm having similar experiences when logging out after running the update GUI (it doesn't seem to happen after a command-line yum):

  https://bugzilla.redhat.com/show_bug.cgi?id=804361

It asks for the admin password to log out. If it freezes except for the mouse, then Alt-F2 'restart' to restart the window manager sometimes works, but sometimes Ctrl-Alt-Backspace is needed to restart the session.

Comment 11 scattol 2012-09-18 10:12:40 UTC
This bug is annoying. It is occurring multiple times a week on my Fedora server.

Comment 12 scattol 2012-10-09 07:43:31 UTC
This is happening multiple times a day. Running "ps -ef | grep gdm" gives 3 or more pages of processes with up to 30 displays. It occurs frequently, and the restart works, but it eats up the machine's resources like crazy. I can't keep the machine up for a week.
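
A minimal sketch (not from the original report) of how the X servers and their display numbers could be listed without paging through the full "ps -ef" output. It assumes the X server runs as "Xorg", "X" or "Xorg.bin" and receives its display as a ":N" argument; the function names are hypothetical.

```python
import os
import re
import subprocess

# Executable names an X server may run under (an assumption; adjust as needed).
XORG_NAMES = {"X", "Xorg", "Xorg.bin"}
# The display is assumed to be passed as a bare ":N" argument.
DISPLAY_ARG = re.compile(r"^:(\d+)$")


def parse_xorg_displays(ps_output):
    """Return sorted (display, pid, ppid) tuples for each X server found.

    Expects the text produced by `ps -eo pid,ppid,args`.
    """
    servers = []
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) < 3 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue  # header line or something unexpected
        if os.path.basename(fields[2]) not in XORG_NAMES:
            continue  # not an X server
        for arg in fields[3:]:
            match = DISPLAY_ARG.match(arg)
            if match:
                servers.append((int(match.group(1)), int(fields[0]), int(fields[1])))
                break
    return sorted(servers)


def list_xorg_displays():
    """Run ps and return the X servers that are currently alive."""
    output = subprocess.run(
        ["ps", "-eo", "pid,ppid,args"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_xorg_displays(output)
```

Calling list_xorg_displays() repeatedly after each user switch would show whether a new, higher-numbered display appears every time.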

Comment 13 scattol 2012-10-09 07:45:20 UTC
The machine is recent, with NVIDIA drivers. It's definitely not too slow.

Comment 14 Jon Dufresne 2012-11-08 05:36:50 UTC
I recently upgraded to F18 and this is still happening frequently. Bumping version.

Comment 15 scattol 2012-11-09 02:26:57 UTC
It looks like once it starts happening, it happens with every subsequent user switch done through GDM, with an extra Xorg display being created at each switch.

Furthermore, each leaked Xorg uses the next higher display number (I am currently at display :45). The TTYs attached to the Xorg processes aren't leaking, though; those seem to get recycled, as the highest one I have is tty8.

Currently, killing the process tree starting at the parent of each leaked Xorg process frees the memory and keeps the machine up and running.
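
A rough sketch of the "kill the process tree starting at the parent" idea, under the assumption that a list of (pid, ppid) pairs has already been collected (for example from "ps -eo pid,ppid"). This is an illustration only, not the script the reporter used, and all function names are hypothetical.

```python
import os
import signal
from collections import defaultdict


def build_children_map(pid_ppid_pairs):
    """Map each parent pid to the list of its direct children."""
    children = defaultdict(list)
    for pid, ppid in pid_ppid_pairs:
        children[ppid].append(pid)
    return children


def process_tree(root_pid, children):
    """Return root_pid and all its descendants, children before parents."""
    ordered = []
    stack = [root_pid]
    while stack:
        pid = stack.pop()
        ordered.append(pid)
        stack.extend(children.get(pid, []))
    # Reversing a pre-order walk puts every descendant before its ancestor.
    return ordered[::-1]


def kill_tree(root_pid, children, sig=signal.SIGTERM):
    """Signal every process in the tree rooted at root_pid (needs root)."""
    for pid in process_tree(root_pid, children):
        try:
            os.kill(pid, sig)
        except ProcessLookupError:
            pass  # the process already exited
```

Signalling children before their parents is a design choice meant to reduce the chance that a parent respawns a child while the tree is being torn down.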

Comment 16 scattol 2012-11-09 02:36:50 UTC
It looks like the reason the machine freezes is that it eventually runs out of memory. Each Xorg process tree takes RAM, and when I began characterising this issue I noticed that after the machine had been running a while the swap file was used to near capacity and the machine swapped more than it used to. My conjecture is that gdm eventually freezes when the OS runs short of virtual memory and something seizes up.

Comment 17 scattol 2012-11-18 18:44:15 UTC
It looks like the extra Xorg processes (the ones that leak) appear when someone switches back into an account they already have running. That behavior starts immediately after reboot, so it's not as if it works flawlessly and then something snaps and it's in error. This condition exists from the start.

I don't know what the extra Xorg does, but you are put back into your environment, suggesting that, in the end, you do reconnect with the Xorg you originally logged in on. Furthermore, deleting the extra Xorg process trees does not cause problems for the existing logged-in users, suggesting that these Xorgs are extra.

Comment 18 scattol 2012-11-18 18:56:59 UTC
It looks like running out of RAM or swap might not be the actual issue, as I just had a freeze when the machine was clean thanks to a cron job deleting the Xorg process trees. It looks like it gets locked up of its own accord.

Comment 19 scattol 2012-11-18 19:09:31 UTC
Requesting that severity be raised to HIGH, considering how many resources are leaked and how quickly, which renders this feature nearly useless.

Comment 20 scattol 2012-11-18 19:12:54 UTC
Is this bug related to bug 799671 (https://bugzilla.redhat.com/show_bug.cgi?id=799671)? The steps are different, but in both cases gdm is rendered useless within a matter of days, essentially by logging in and out.

Comment 21 scattol 2012-11-18 19:25:01 UTC
Created attachment 647274 [details]
gdm logs when gdm looses Xdisplays and eventually freezes

Here are the gdm logs for the last while. The machine froze on Nov 18 2012, which is the day I packaged the logs. This issue would have been occurring all week before that date.

Comment 22 scattol 2012-11-18 19:27:29 UTC
Created attachment 647275 [details]
Xorg logs related to  when gdm looses Xdisplays and eventually freezes

These are the Xorg logs that correspond to the gdm logs uploaded at the same time. Note that the displays active with real users on them are 0 to 5. Displays above 5 are extraneous and are deleted nightly by a cron job. Presumably the logs reflect this, but that's the result of the workaround, not of the problem.

Comment 23 scattol 2012-11-18 19:34:23 UTC
Repro steps are straightforward:
   log in the users (I have 5)
   switch between users using the switch user drop-down menu entry under the name in the top left corner of the menu bar
   log in as another of the users in the list of users
   after doing this enough times, the display will eventually stop responding
   CTRL-ALT-BACKSPACE will kill the display and restart it, but eventually that is going to fail and the machine needs a hard reset. By "eventually" I mean after a few days with probably 5 or 10 user switches per day.

Comment 24 Frédéric 2012-11-29 17:43:50 UTC
I also have this bug with F17, and it seems to be worse with kernel 3.6.7. It is very annoying; I have to hard reboot very often. My children keep telling me that Linux is full of bugs and that I should use Windows! Please do something, Gromit! If I can help debugging this, I would be delighted to do but I do.

Comment 25 scattol 2012-11-30 00:07:58 UTC
Created attachment 654660 [details]
script to delete Xorgs (display higher than 5) to alleviate the issue of leaked Xorgs caused by thei bug

This is a script to delete the Xorgs that have a display number greater than 5, which (in my use case) are duplicates leaked by this bug.

Run this script as a cron job once a night.

The use case is to reboot (or do "init 3" followed by "init 5"), then log all the users in and let this go.

ER: find a way to determine which displays are not attached to a logged-in user.
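
One possible direction for the enhancement request above, sketched under a loud assumption: that "who" prints the X display of each local graphical login as a trailing "(:N)" or "(:N.S)" field. This has not been checked against the affected gdm versions, and the function names are hypothetical.

```python
import re

# Assumed format: `who` ends a local X session line with "(:N)" or "(:N.S)".
WHO_DISPLAY = re.compile(r"\(:(\d+)(?:\.\d+)?\)\s*$")


def displays_with_users(who_output):
    """Return the set of display numbers that `who` ties to a logged-in user."""
    used = set()
    for line in who_output.splitlines():
        match = WHO_DISPLAY.search(line)
        if match:
            used.add(int(match.group(1)))
    return used


def unattached_displays(running_displays, who_output):
    """Return the running display numbers that no logged-in user is attached to."""
    return sorted(set(running_displays) - displays_with_users(who_output))
```

Note that a display currently showing the login greeter has no user attached either, so it would also be reported; anything acting on this list would need to spare the active greeter.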

Comment 26 scattol 2012-11-30 00:10:47 UTC
@frederic Bron

I've uploaded the script that I use to clean up the extra Xorg. It delays the problem but does not fix it. 

It looks like more gnome processes are leaked and not cleaned up by the script. The script relies on the shell script killtree.sh, found on the web.

Comment 27 scattol 2012-12-01 05:51:58 UTC
The above strategy doesn't fix the problem; it just delays it. When the keyboard freezes up, the workaround is to telnet in from another machine and issue an "init 2" command as root, followed by an "init 5". The display will then be working again.

Comment 28 Frédéric 2012-12-01 06:00:28 UTC
Is it possible to remove gdm and use another program that works? I have the problem every day after only a few minutes of use, as soon as I switch users without closing the first open session.
I also get a freeze after typing the password, just when the wallpaper appears (this time Ctrl+Alt+Backspace works).

Any idea if this is specific to Fedora? Any idea how Ubuntu works?

Comment 29 scattol 2012-12-01 13:38:06 UTC
/var/log/gdm contains log files for all the X displays opened. In my case the high-numbered ones are all associated with this problem, as there are never any users on them.

In the :32-greeter.log we see the following error:
    Fatal IO error 11 (Resource temporarily unavailable) on X server :32.


This appears twice in the file, and it's sometimes followed by:
      Window manager warning: Log level 16: gnome-shell: Fatal IO error 0 (Success) on X server :32. 



In some of them we also see the following error:
   (gnome-settings-daemon:2537): media-keys-plugin-WARNING **: Unable to get default sink


It's unclear if this is related but it certainly appears in the current displays that are problematic
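
A small sketch of how the greeter logs could be scanned for the "Fatal IO error" lines quoted above. The regular expression is derived only from those two sample lines, and the "*-greeter.log" file pattern is inferred from the ":32-greeter.log" name; both are assumptions, as are the function names.

```python
import re
from pathlib import Path

# Pattern based on the two "Fatal IO error" lines quoted in this comment.
FATAL_IO = re.compile(r"Fatal IO error (\d+) \(([^)]*)\) on X server (:\d+)")


def find_fatal_io_errors(log_text):
    """Return (errno, message, display) for each fatal IO error in the text."""
    return [
        (int(m.group(1)), m.group(2), m.group(3))
        for m in FATAL_IO.finditer(log_text)
    ]


def scan_greeter_logs(log_dir="/var/log/gdm"):
    """Map each greeter log file name to the fatal IO errors found in it."""
    results = {}
    for path in sorted(Path(log_dir).glob("*-greeter.log")):
        errors = find_fatal_io_errors(path.read_text(errors="replace"))
        if errors:
            results[path.name] = errors
    return results
```

Comparing the displays reported here against the displays that have real users on them might help confirm whether the errors only show up on the leaked displays.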

Comment 30 scattol 2012-12-01 13:53:30 UTC
Created attachment 655546 [details]
log files for display :32 from /var/log/gdm

These are sample log files that show the problems described in the previous entry, with various X errors in the gdm logs.

Comment 31 Frédéric 2012-12-01 17:59:43 UTC
Could it be related to the NVIDIA proprietary drivers? I see a lot of NVIDIA in /var/log/gdm. Does anyone have these problems without them?

Comment 32 scattol 2012-12-01 18:52:05 UTC
My recollection is that, on that same machine, using the nouveau drivers that shipped with the OS, the machine was also unusable/unstable. I won't go as far as to swear that it occurred with the nouveau drivers, but I am pretty sure it did, as in my quest to stabilise my machine the first thing I tried was to get the NVIDIA drivers. I had to work hard to install the proprietary drivers (and make them stick), and part of the reason was to have a more stable machine.

Comment 33 Frédéric 2012-12-01 19:00:21 UTC
Can we increase the severity to high = "Problem due to crashes, loss of data, severe memory leak, etc."?

Comment 34 scattol 2012-12-01 19:18:18 UTC
Updated to the NVIDIA 310 drivers. It still looks like it's leaking Xorg processes (which so far seems to be an indicator of the problem).

Comment 35 Fedora End Of Life 2013-12-21 08:30:09 UTC
This message is a reminder that Fedora 18 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 18. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '18'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 18's end of life.

Thank you for reporting this issue and we are sorry that we may not be 
able to fix it before Fedora 18 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version prior to Fedora 18's end of life.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 36 scattol 2013-12-30 12:55:31 UTC
This issue also exists with Fedora 19.

Comment 37 Fedora End Of Life 2014-02-05 11:51:44 UTC
Fedora 18 changed to end-of-life (EOL) status on 2014-01-14. Fedora 18 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 38 John Hein 2014-02-06 17:20:17 UTC
Maybe the version should have been bumped by someone who can edit this bug (and then reopen the bug).  It was reported that it happens in F19.