Bug 499272 - gdm-simple-slave crashes
gdm-simple-slave crashes
Status: CLOSED CURRENTRELEASE
Product: Fedora
Classification: Fedora
Component: gdm (Show other bugs)
11
All Linux
low Severity medium
: ---
: ---
Assigned To: jmccann
Fedora Extras Quality Assurance
:
: 500908 (view as bug list)
Depends On:
Blocks: F11Target
  Show dependency treegraph
 
Reported: 2009-05-05 17:58 EDT by Bill Nottingham
Modified: 2015-01-14 18:22 EST (History)
13 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2009-07-31 14:14:25 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:


Attachments (Terms of Use)
gnome-simple-slave --debug log (35.15 KB, text/plain)
2009-05-07 15:39 EDT, Bill Nottingham
no flags Details
patch! (1.46 KB, patch)
2009-05-08 15:49 EDT, Bill Nottingham
no flags Details | Diff
messages with no gdm-simple-slave crash, but ... (1.19 KB, text/plain)
2009-05-13 15:40 EDT, Allen Kistler
no flags Details
Xorg.0.log with WW at X exit (67.32 KB, text/plain)
2009-05-13 15:41 EDT, Allen Kistler
no flags Details

  None (edit)
Description Bill Nottingham 2009-05-05 17:58:25 EDT
Description of problem:

When running 'telinit 3' from a gnome terminal, gdm-simple-slave crashes. X and the user session then fails to properly exit (and if X was on tty1, bad things then ensue with both X and the newly spawned mingetty trying to run on it.)

Attempting to trace yields:
Program received signal SIGTERM, Terminated.
0x00f26422 in __kernel_vsyscall ()
(gdb) cont
Continuing.

Program received signal SIGTERM, Terminated.
0x00f26422 in __kernel_vsyscall ()
(gdb) 
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x08054b89 in _gdm_session_conversation_stopped (session=0x90e6e70, 
    service_name=0x90e6e78 "xC\f\tpassword") at gdm-session.c:714
714	        g_return_if_fail (GDM_IS_SESSION (session));
(gdb) bt
Cannot access memory at address 0xbf9ed54c

Relevant messages:

May  5 17:44:56 localhost gdm-simple-slave[3899]: GLib-GObject-CRITICAL: g_object_ref: assertion `G_IS_OBJECT (object)' failed
May  5 17:44:56 localhost gdm-simple-slave[3899]: GLib-GObject-WARNING: invalid unclassed pointer in cast to `GdmSession'

Version-Release number of selected component (if applicable):

gdm-2.26.1-4.fc11.i586

How reproducible:

50% or so?

Steps to Reproduce:
1. Log in
2. Start a terminal
3. su to root
4. run telinit 3
  
Actual results:

gdm-simple-slave crashes

Additional info:

I think this happens occasionally on shutdown/reboot from the menu as well, but since that also terminates gnome-session, the more-or-less right thing happens.
Comment 1 Chuck Ebbert 2009-05-06 23:02:03 EDT
Looks like it ran out of stack space:

00000420 <__kernel_vsyscall>:
 420:   51                      push   %ecx
 421:   52                      push   %edx
 422:   55                      push   %ebp
Comment 2 Bill Nottingham 2009-05-07 15:39:54 EDT
Created attachment 342927 [details]
gnome-simple-slave --debug log

Here's the log from gnome-simple-slave --debug when it crashes.

The interesting part is:

gdm-simple-slave-real[13660]: DEBUG(+): GdmSessionDirect: Handling SessionStarted
gdm-simple-slave-real[13660]: DEBUG(+): GdmSessionDirect: Emitting 'session-started' signal with pid '13785'
gdm-simple-slave-real[13660]: DEBUG(+): using ut_user foo
gdm-simple-slave-real[13660]: DEBUG(+): Writing login record
gdm-simple-slave-real[13660]: DEBUG(+): using ut_type USER_PROCESS
gdm-simple-slave-real[13660]: DEBUG(+): using ut_tv time 1241723783
gdm-simple-slave-real[13660]: DEBUG(+): using ut_pid 13766
gdm-simple-slave-real[13660]: DEBUG(+): using ut_id :0
gdm-simple-slave-real[13660]: DEBUG(+): using ut_host :0
gdm-simple-slave-real[13660]: DEBUG(+): using ut_line tty7
gdm-simple-slave-real[13660]: DEBUG(+): Writing wtmp session record to /var/log/wtmp
gdm-simple-slave-real[13660]: DEBUG(+): Adding new utmp record
gdm-simple-slave-real[13660]: DEBUG(+): GdmSimpleSlave: session started 13785
gdm-simple-slave-real[13660]: DEBUG(+): GdmSlave: Trying script /etc/gdm/PreSession/:0
gdm-simple-slave-real[13660]: DEBUG(+): GdmSlave: script /etc/gdm/PreSession/:0 not found; skipping
gdm-simple-slave-real[13660]: DEBUG(+): GdmSlave: Trying script /etc/gdm/PreSession/Default
gdm-simple-slave-real[13660]: DEBUG(+): GdmSlave: Running process: /etc/gdm/PreSession/Default
gdm-simple-slave-real[13660]: DEBUG(+): GdmSlave: script environment: XAUTHORITY=/var/run/gdm/auth-for-gdm-ZhuZU0/database
gdm-simple-slave-real[13660]: DEBUG(+): GdmSlave: script environment: X_SERVERS=/var/gdm/:0.Xservers
gdm-simple-slave-real[13660]: DEBUG(+): GdmSlave: script environment: LOGNAME=foo
gdm-simple-slave-real[13660]: DEBUG(+): GdmSlave: script environment: PWD=/home/foo
gdm-simple-slave-real[13660]: DEBUG(+): GdmSlave: script environment: HOME=/home/foo
gdm-simple-slave-real[13660]: DEBUG(+): GdmSlave: script environment: DISPLAY=:0
gdm-simple-slave-real[13660]: DEBUG(+): GdmSlave: script environment: SHELL=/bin/bash
gdm-simple-slave-real[13660]: DEBUG(+): GdmSlave: script environment: RUNNING_UNDER_GDM=true
gdm-simple-slave-real[13660]: DEBUG(+): GdmSlave: script environment: USER=foo
gdm-simple-slave-real[13660]: DEBUG(+): GdmSlave: script environment: USERNAME=foo
gdm-simple-slave-real[13660]: DEBUG(+): GdmSlave: Process exit status: 0
gdm-simple-slave-real[13660]: DEBUG(+): GdmSessionWorkerJob: child (pid:13766) done (status:0)
gdm-simple-slave-real[13660]: DEBUG(+): GdmSessionDirect: Worker job exited: 0
gdm-simple-slave-real[13660]: DEBUG(+): using ut_user foo
gdm-simple-slave-real[13660]: DEBUG(+): Writing failed session attempt record
gdm-simple-slave-real[13660]: DEBUG(+): using ut_type USER_PROCESS
gdm-simple-slave-real[13660]: DEBUG(+): using ut_tv time 1241723806
gdm-simple-slave-real[13660]: DEBUG(+): using ut_pid 13766
gdm-simple-slave-real[13660]: DEBUG(+): using ut_id :0
gdm-simple-slave-real[13660]: DEBUG(+): using ut_host :0
gdm-simple-slave-real[13660]: DEBUG(+): using ut_line tty7
gdm-simple-slave-real[13660]: DEBUG(+): Writing btmp failed session attempt record to /var/log/btmp
gdm-simple-slave-real[13660]: DEBUG(+): GreeterServer: Sending Problem (Unable to authenticate user)
gdm-simple-slave-real[13660]: DEBUG(+): GreeterServer: There is no valid connection
gdm-simple-slave-real[13660]: DEBUG(+): GreeterServer: Could not send Problem signal
gdm-simple-slave-real[13660]: DEBUG(+): GdmSessionDirect: stopping conversation
gdm-simple-slave-real[13660]: DEBUG(+): GdmSessionDirect: Emitting conversation-stopped signal
gdm-simple-slave-real[13660]: DEBUG(+): GdmSimpleSlave: conversation stopped
gdm-simple-slave-real[13660]: DEBUG(+): GdmSessionDirect: Emitting conversation-stopped signal

i..e, it decides that there was a failed session login for some reason when it gets the exit signal.
Comment 3 Ray Strode [halfline] 2009-05-07 17:56:24 EDT
I pushed gdm-2.26.1-8.fc11 to fix the bogus failed session login reporting, but that didn't solve this issue. I need to investigate more.
Comment 4 Bill Nottingham 2009-05-08 15:49:31 EDT
Created attachment 343152 [details]
patch!

We initially debugged this happening during a user session, in which case
the 'failed session attempt' is completely bogus. Ray tracked this down to
is_authenticated never being set for the session. This fixed that case,
but it still crashes if telinit is called when you're at the greeter, with
the same trace, as is_authenticated isn't set then.

What's happening is that the cleanup function has:

static void
worker_exited (GdmSessionWorkerJob *job,
               int                  code,
               GdmSessionConversation *conversation)
{
        g_debug ("GdmSessionDirect: Worker job exited: %d", code);

        g_object_ref (conversation->job);
        if (!conversation->session->priv->is_authenticated) {
                char *msg;

                msg = g_strdup_printf (_("worker exited with status %d"), code);
                _gdm_session_authentication_failed (GDM_SESSION (conversation->session), conversation->service_name, ms
                g_free (msg);
        } else if (conversation->session->priv->is_running) {
                _gdm_session_session_exited (GDM_SESSION (conversation->session), code);
        }

        g_debug ("GdmSessionDirect: Emitting conversation-stopped signal");
        _gdm_session_conversation_stopped (GDM_SESSION (conversation->session),
                                           conversation->service_name);
        g_object_unref (conversation->job);
}

When we're not authenticated, we send out the authentication-failed signal
here. This calls gdm_session_stop_conversation
(gdm-simple-slave.c:on_session_authentication_failed), which frees the
conversation from the hash table. So, when we try and stop the conversation
again at the end of worker_exited, we crash.

The correct fix may be to take another reference somewhere so the hash entry
isn't freed, or to fix the conversation_stop function so it doesn't explode
on an already-stopped-and-freed conversation. But the simple hack is
to not send the authentication-failed signal in the first place when the
worker exits/crashes - it seems somewhat fishy to me in that in this case,
it's not that authentication failed; it hasn't started to begin with.

Stupid patch attached; it fixes the issue for me.
Comment 5 Hezekiah M. Carty 2009-05-11 22:14:00 EDT
I tested with gdm-2.26.1-9.fc11.x86_64 from Koji which seems to have the above patch according to the changelog on my system (Lenovo Thinkpad R61, Core2Duo, Intel GM965 video) and I still get logged out of Gnome and kicked back to the GDM login screen after a few suspend/resume cycles.
Comment 6 Hezekiah M. Carty 2009-05-12 15:25:06 EDT
May I ask why this is no longer a blocker?
Comment 7 Allen Kistler 2009-05-12 16:08:52 EDT
Re: #6

(Disclaimer:  I am neither QA nor Rel-Eng.)

Blockers are things that would delay release.  The effect of the original bug was a total lockup when trying to go from runlevel 5 to runlevel 3.  It would be bad to release an OS that can't change runlevels without locking up.

OTOH, now there's a less than perfect patch that at least keeps hitting the power button from being your only option to get anywhere once the bug happens.  It still needs to work better, so it *should* be fixed ASAP (Tracker), but it's not so bad that the world needs to stop (Blocker).

Is getting kicked out of a logged in session on suspend/resume really the same bug, though?
Comment 8 Hezekiah M. Carty 2009-05-12 16:13:39 EDT
Oops - I may have my bugs mixed up.  There are a few GDM bugs I have been tracking which were marked as duplicates (shutdown errors vs suspend/resume errors) and I'm not sure which is which at this point.
Comment 9 Allen Kistler 2009-05-13 15:40:22 EDT
Created attachment 343858 [details]
messages with no gdm-simple-slave crash, but ...

I snagged gdm-2.26.1-10 and gdm-user-switch-applet-2.26.1-10 (for consistency) from Koji.  It looks like -8, -9, and -10 should fix the bug.  I suppose the good news is that gdm-simple-slave is no longer dumping errors into /var/log/messages.  The bad news is that the patch doesn't fix the deadly bug symptom.

"telinit 3" from inside gnome-terminal still kills the console, leaving the power button and ACPI the only option to get anywhere.  Basically the screen blanks like it's going into text mode, but it stays blank, and A-C-Fn (or A-Fn) doesn't switch to tty[2-6].  In messages, mingetty on tty1 complains about invalid characters, dies, and respawns, which is too familiar by now.

I'll attach Xorg.0.log, too.  The final few lines (warnings as X exits) might be a hint.
Comment 10 Allen Kistler 2009-05-13 15:41:25 EDT
Created attachment 343860 [details]
Xorg.0.log with WW at X exit
Comment 11 Bill Nottingham 2009-05-13 16:08:33 EDT
That likely has nothing to do with gdm. (Although I could be wrong.)
Comment 12 Bill Nottingham 2009-05-13 16:30:26 EDT
In fact, I suspect that case is:

- runlevel 5 is exited
- prefdm is killed
- gdm starts shutting down the sesssion, X.org, etc.
- runlevel 3 starts
- runlevel 3 setup finishes
- mingetty is started on tty1
- X.org finishes exiting

Which is a race that could have happened irrespective of whatever gdm is doing at session close, and probably could have happened/can happen in F-10.
Comment 13 Ray Strode [halfline] 2009-05-13 16:39:31 EDT
Allen,

so my guess what's happening for you is:

init 3 causes upstart to send SIGTERM gdm
gdm sends SIGTERM to gdm-simple-slave
gdm-simple-slave sends SIGTERM to Xorg

and somewhere along the line we aren't doing a waitpid() so then upstart starts a getty on the vt X is using and all hell breaks loose.

If gdm *was* crashing that would explain why we aren't doing the waitpid, but you said it's not.  Anyway, would you mind filing that as a separate issue?

The latest packages fix the simple-slave crashes bill reported.
Comment 14 Ray Strode [halfline] 2009-05-13 16:39:54 EDT
(there was a midair collision between me and Bill, but i posted my comment anyway)
Comment 15 Allen Kistler 2009-05-13 18:37:49 EDT
Given the above, it sort of makes me wonder if successful exits from runlevel 5 previously depended on gdm crashing, so that X exited completely before mingetty started.  Oh, well.

Ryokai.  Bug 500750
Comment 16 Norbert Fabritius 2009-05-16 12:30:00 EDT
*** Bug 500908 has been marked as a duplicate of this bug. ***
Comment 17 Bug Zapper 2009-06-09 11:13:15 EDT
This bug appears to have been reported against 'rawhide' during the Fedora 11 development cycle.
Changing version to '11'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 18 Christopher Beland 2009-07-31 12:25:29 EDT
Is anyone still experiencing this problem despite installing the latest packages, or can this bug be closed?
Comment 19 Bill Nottingham 2009-07-31 14:14:25 EDT
Fairly sure this is all fixed now.

Note You need to log in before you can comment on or make changes to this bug.