Description of problem: When running 'telinit 3' from a gnome terminal, gdm-simple-slave crashes. X and the user session then fails to properly exit (and if X was on tty1, bad things then ensue with both X and the newly spawned mingetty trying to run on it.) Attempting to trace yields: Program received signal SIGTERM, Terminated. 0x00f26422 in __kernel_vsyscall () (gdb) cont Continuing. Program received signal SIGTERM, Terminated. 0x00f26422 in __kernel_vsyscall () (gdb) Continuing. Program received signal SIGSEGV, Segmentation fault. 0x08054b89 in _gdm_session_conversation_stopped (session=0x90e6e70, service_name=0x90e6e78 "xC\f\tpassword") at gdm-session.c:714 714 g_return_if_fail (GDM_IS_SESSION (session)); (gdb) bt Cannot access memory at address 0xbf9ed54c Relevant messages: May 5 17:44:56 localhost gdm-simple-slave[3899]: GLib-GObject-CRITICAL: g_object_ref: assertion `G_IS_OBJECT (object)' failed May 5 17:44:56 localhost gdm-simple-slave[3899]: GLib-GObject-WARNING: invalid unclassed pointer in cast to `GdmSession' Version-Release number of selected component (if applicable): gdm-2.26.1-4.fc11.i586 How reproducible: 50% or so? Steps to Reproduce: 1. Log in 2. Start a terminal 3. su to root 4. run telinit 3 Actual results: gdm-simple-slave crashes Additional info: I think this happens occasionally on shutdown/reboot from the menu as well, but since that also terminates gnome-session, the more-or-less right thing happens.
Looks like it ran out of stack space: 00000420 <__kernel_vsyscall>: 420: 51 push %ecx 421: 52 push %edx 422: 55 push %ebp
Created attachment 342927 [details] gnome-simple-slave --debug log Here's the log from gnome-simple-slave --debug when it crashes. The interesting part is: gdm-simple-slave-real[13660]: DEBUG(+): GdmSessionDirect: Handling SessionStarted gdm-simple-slave-real[13660]: DEBUG(+): GdmSessionDirect: Emitting 'session-started' signal with pid '13785' gdm-simple-slave-real[13660]: DEBUG(+): using ut_user foo gdm-simple-slave-real[13660]: DEBUG(+): Writing login record gdm-simple-slave-real[13660]: DEBUG(+): using ut_type USER_PROCESS gdm-simple-slave-real[13660]: DEBUG(+): using ut_tv time 1241723783 gdm-simple-slave-real[13660]: DEBUG(+): using ut_pid 13766 gdm-simple-slave-real[13660]: DEBUG(+): using ut_id :0 gdm-simple-slave-real[13660]: DEBUG(+): using ut_host :0 gdm-simple-slave-real[13660]: DEBUG(+): using ut_line tty7 gdm-simple-slave-real[13660]: DEBUG(+): Writing wtmp session record to /var/log/wtmp gdm-simple-slave-real[13660]: DEBUG(+): Adding new utmp record gdm-simple-slave-real[13660]: DEBUG(+): GdmSimpleSlave: session started 13785 gdm-simple-slave-real[13660]: DEBUG(+): GdmSlave: Trying script /etc/gdm/PreSession/:0 gdm-simple-slave-real[13660]: DEBUG(+): GdmSlave: script /etc/gdm/PreSession/:0 not found; skipping gdm-simple-slave-real[13660]: DEBUG(+): GdmSlave: Trying script /etc/gdm/PreSession/Default gdm-simple-slave-real[13660]: DEBUG(+): GdmSlave: Running process: /etc/gdm/PreSession/Default gdm-simple-slave-real[13660]: DEBUG(+): GdmSlave: script environment: XAUTHORITY=/var/run/gdm/auth-for-gdm-ZhuZU0/database gdm-simple-slave-real[13660]: DEBUG(+): GdmSlave: script environment: X_SERVERS=/var/gdm/:0.Xservers gdm-simple-slave-real[13660]: DEBUG(+): GdmSlave: script environment: LOGNAME=foo gdm-simple-slave-real[13660]: DEBUG(+): GdmSlave: script environment: PWD=/home/foo gdm-simple-slave-real[13660]: DEBUG(+): GdmSlave: script environment: HOME=/home/foo gdm-simple-slave-real[13660]: DEBUG(+): GdmSlave: script environment: DISPLAY=:0 gdm-simple-slave-real[13660]: DEBUG(+): GdmSlave: script environment: SHELL=/bin/bash gdm-simple-slave-real[13660]: DEBUG(+): GdmSlave: script environment: RUNNING_UNDER_GDM=true gdm-simple-slave-real[13660]: DEBUG(+): GdmSlave: script environment: USER=foo gdm-simple-slave-real[13660]: DEBUG(+): GdmSlave: script environment: USERNAME=foo gdm-simple-slave-real[13660]: DEBUG(+): GdmSlave: Process exit status: 0 gdm-simple-slave-real[13660]: DEBUG(+): GdmSessionWorkerJob: child (pid:13766) done (status:0) gdm-simple-slave-real[13660]: DEBUG(+): GdmSessionDirect: Worker job exited: 0 gdm-simple-slave-real[13660]: DEBUG(+): using ut_user foo gdm-simple-slave-real[13660]: DEBUG(+): Writing failed session attempt record gdm-simple-slave-real[13660]: DEBUG(+): using ut_type USER_PROCESS gdm-simple-slave-real[13660]: DEBUG(+): using ut_tv time 1241723806 gdm-simple-slave-real[13660]: DEBUG(+): using ut_pid 13766 gdm-simple-slave-real[13660]: DEBUG(+): using ut_id :0 gdm-simple-slave-real[13660]: DEBUG(+): using ut_host :0 gdm-simple-slave-real[13660]: DEBUG(+): using ut_line tty7 gdm-simple-slave-real[13660]: DEBUG(+): Writing btmp failed session attempt record to /var/log/btmp gdm-simple-slave-real[13660]: DEBUG(+): GreeterServer: Sending Problem (Unable to authenticate user) gdm-simple-slave-real[13660]: DEBUG(+): GreeterServer: There is no valid connection gdm-simple-slave-real[13660]: DEBUG(+): GreeterServer: Could not send Problem signal gdm-simple-slave-real[13660]: DEBUG(+): GdmSessionDirect: stopping conversation gdm-simple-slave-real[13660]: DEBUG(+): GdmSessionDirect: Emitting conversation-stopped signal gdm-simple-slave-real[13660]: DEBUG(+): GdmSimpleSlave: conversation stopped gdm-simple-slave-real[13660]: DEBUG(+): GdmSessionDirect: Emitting conversation-stopped signal i..e, it decides that there was a failed session login for some reason when it gets the exit signal.
I pushed gdm-2.26.1-8.fc11 to fix the bogus failed session login reporting, but that didn't solve this issue. I need to investigate more.
Created attachment 343152 [details] patch! We initially debugged this happening during a user session, in which case the 'failed session attempt' is completely bogus. Ray tracked this down to is_authenticated never being set for the session. This fixed that case, but it still crashes if telinit is called when you're at the greeter, with the same trace, as is_authenticated isn't set then. What's happening is that the cleanup function has: static void worker_exited (GdmSessionWorkerJob *job, int code, GdmSessionConversation *conversation) { g_debug ("GdmSessionDirect: Worker job exited: %d", code); g_object_ref (conversation->job); if (!conversation->session->priv->is_authenticated) { char *msg; msg = g_strdup_printf (_("worker exited with status %d"), code); _gdm_session_authentication_failed (GDM_SESSION (conversation->session), conversation->service_name, ms g_free (msg); } else if (conversation->session->priv->is_running) { _gdm_session_session_exited (GDM_SESSION (conversation->session), code); } g_debug ("GdmSessionDirect: Emitting conversation-stopped signal"); _gdm_session_conversation_stopped (GDM_SESSION (conversation->session), conversation->service_name); g_object_unref (conversation->job); } When we're not authenticated, we send out the authentication-failed signal here. This calls gdm_session_stop_conversation (gdm-simple-slave.c:on_session_authentication_failed), which frees the conversation from the hash table. So, when we try and stop the conversation again at the end of worker_exited, we crash. The correct fix may be to take another reference somewhere so the hash entry isn't freed, or to fix the conversation_stop function so it doesn't explode on an already-stopped-and-freed conversation. But the simple hack is to not send the authentication-failed signal in the first place when the worker exits/crashes - it seems somewhat fishy to me in that in this case, it's not that authentication failed; it hasn't started to begin with. Stupid patch attached; it fixes the issue for me.
I tested with gdm-2.26.1-9.fc11.x86_64 from Koji which seems to have the above patch according to the changelog on my system (Lenovo Thinkpad R61, Core2Duo, Intel GM965 video) and I still get logged out of Gnome and kicked back to the GDM login screen after a few suspend/resume cycles.
May I ask why this is no longer a blocker?
Re: #6 (Disclaimer: I am neither QA nor Rel-Eng.) Blockers are things that would delay release. The effect of the original bug was a total lockup when trying to go from runlevel 5 to runlevel 3. It would be bad to release an OS that can't change runlevels without locking up. OTOH, now there's a less than perfect patch that at least keeps hitting the power button from being your only option to get anywhere once the bug happens. It still needs to work better, so it *should* be fixed ASAP (Tracker), but it's not so bad that the world needs to stop (Blocker). Is getting kicked out of a logged in session on suspend/resume really the same bug, though?
Oops - I may have my bugs mixed up. There are a few GDM bugs I have been tracking which were marked as duplicates (shutdown errors vs suspend/resume errors) and I'm not sure which is which at this point.
Created attachment 343858 [details] messages with no gdm-simple-slave crash, but ... I snagged gdm-2.26.1-10 and gdm-user-switch-applet-2.26.1-10 (for consistency) from Koji. It looks like -8, -9, and -10 should fix the bug. I suppose the good news is that gdm-simple-slave is no longer dumping errors into /var/log/messages. The bad news is that the patch doesn't fix the deadly bug symptom. "telinit 3" from inside gnome-terminal still kills the console, leaving the power button and ACPI the only option to get anywhere. Basically the screen blanks like it's going into text mode, but it stays blank, and A-C-Fn (or A-Fn) doesn't switch to tty[2-6]. In messages, mingetty on tty1 complains about invalid characters, dies, and respawns, which is too familiar by now. I'll attach Xorg.0.log, too. The final few lines (warnings as X exits) might be a hint.
Created attachment 343860 [details] Xorg.0.log with WW at X exit
That likely has nothing to do with gdm. (Although I could be wrong.)
In fact, I suspect that case is: - runlevel 5 is exited - prefdm is killed - gdm starts shutting down the sesssion, X.org, etc. - runlevel 3 starts - runlevel 3 setup finishes - mingetty is started on tty1 - X.org finishes exiting Which is a race that could have happened irrespective of whatever gdm is doing at session close, and probably could have happened/can happen in F-10.
Allen, so my guess what's happening for you is: init 3 causes upstart to send SIGTERM gdm gdm sends SIGTERM to gdm-simple-slave gdm-simple-slave sends SIGTERM to Xorg and somewhere along the line we aren't doing a waitpid() so then upstart starts a getty on the vt X is using and all hell breaks loose. If gdm *was* crashing that would explain why we aren't doing the waitpid, but you said it's not. Anyway, would you mind filing that as a separate issue? The latest packages fix the simple-slave crashes bill reported.
(there was a midair collision between me and Bill, but i posted my comment anyway)
Given the above, it sort of makes me wonder if successful exits from runlevel 5 previously depended on gdm crashing, so that X exited completely before mingetty started. Oh, well. Ryokai. Bug 500750
*** Bug 500908 has been marked as a duplicate of this bug. ***
This bug appears to have been reported against 'rawhide' during the Fedora 11 development cycle. Changing version to '11'. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Is anyone still experiencing this problem despite installing the latest packages, or can this bug be closed?
Fairly sure this is all fixed now.