Bug 499272
Summary: gdm-simple-slave crashes
Product: [Fedora] Fedora
Component: gdm
Version: 11
Hardware: All
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: low
Reporter: Bill Nottingham <notting>
Assignee: jmccann
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: ackistler, beland, benjavalero, bnocera, cschalle, dcantrell, hez, jmccann, maurizio.antillon, mtasaka, nori.f, rstrode, rvokal
Doc Type: Bug Fix
Last Closed: 2009-07-31 18:14:25 UTC
Bug Blocks: 446451
Description
Bill Nottingham
2009-05-05 21:58:25 UTC
Looks like it ran out of stack space:
00000420 <__kernel_vsyscall>:
420: 51 push %ecx
421: 52 push %edx
422: 55 push %ebp

Created attachment 342927 [details]
gnome-simple-slave --debug log
Here's the log from gdm-simple-slave --debug when it crashes.
The interesting part is:
gdm-simple-slave-real[13660]: DEBUG(+): GdmSessionDirect: Handling SessionStarted
gdm-simple-slave-real[13660]: DEBUG(+): GdmSessionDirect: Emitting 'session-started' signal with pid '13785'
gdm-simple-slave-real[13660]: DEBUG(+): using ut_user foo
gdm-simple-slave-real[13660]: DEBUG(+): Writing login record
gdm-simple-slave-real[13660]: DEBUG(+): using ut_type USER_PROCESS
gdm-simple-slave-real[13660]: DEBUG(+): using ut_tv time 1241723783
gdm-simple-slave-real[13660]: DEBUG(+): using ut_pid 13766
gdm-simple-slave-real[13660]: DEBUG(+): using ut_id :0
gdm-simple-slave-real[13660]: DEBUG(+): using ut_host :0
gdm-simple-slave-real[13660]: DEBUG(+): using ut_line tty7
gdm-simple-slave-real[13660]: DEBUG(+): Writing wtmp session record to /var/log/wtmp
gdm-simple-slave-real[13660]: DEBUG(+): Adding new utmp record
gdm-simple-slave-real[13660]: DEBUG(+): GdmSimpleSlave: session started 13785
gdm-simple-slave-real[13660]: DEBUG(+): GdmSlave: Trying script /etc/gdm/PreSession/:0
gdm-simple-slave-real[13660]: DEBUG(+): GdmSlave: script /etc/gdm/PreSession/:0 not found; skipping
gdm-simple-slave-real[13660]: DEBUG(+): GdmSlave: Trying script /etc/gdm/PreSession/Default
gdm-simple-slave-real[13660]: DEBUG(+): GdmSlave: Running process: /etc/gdm/PreSession/Default
gdm-simple-slave-real[13660]: DEBUG(+): GdmSlave: script environment: XAUTHORITY=/var/run/gdm/auth-for-gdm-ZhuZU0/database
gdm-simple-slave-real[13660]: DEBUG(+): GdmSlave: script environment: X_SERVERS=/var/gdm/:0.Xservers
gdm-simple-slave-real[13660]: DEBUG(+): GdmSlave: script environment: LOGNAME=foo
gdm-simple-slave-real[13660]: DEBUG(+): GdmSlave: script environment: PWD=/home/foo
gdm-simple-slave-real[13660]: DEBUG(+): GdmSlave: script environment: HOME=/home/foo
gdm-simple-slave-real[13660]: DEBUG(+): GdmSlave: script environment: DISPLAY=:0
gdm-simple-slave-real[13660]: DEBUG(+): GdmSlave: script environment: SHELL=/bin/bash
gdm-simple-slave-real[13660]: DEBUG(+): GdmSlave: script environment: RUNNING_UNDER_GDM=true
gdm-simple-slave-real[13660]: DEBUG(+): GdmSlave: script environment: USER=foo
gdm-simple-slave-real[13660]: DEBUG(+): GdmSlave: script environment: USERNAME=foo
gdm-simple-slave-real[13660]: DEBUG(+): GdmSlave: Process exit status: 0
gdm-simple-slave-real[13660]: DEBUG(+): GdmSessionWorkerJob: child (pid:13766) done (status:0)
gdm-simple-slave-real[13660]: DEBUG(+): GdmSessionDirect: Worker job exited: 0
gdm-simple-slave-real[13660]: DEBUG(+): using ut_user foo
gdm-simple-slave-real[13660]: DEBUG(+): Writing failed session attempt record
gdm-simple-slave-real[13660]: DEBUG(+): using ut_type USER_PROCESS
gdm-simple-slave-real[13660]: DEBUG(+): using ut_tv time 1241723806
gdm-simple-slave-real[13660]: DEBUG(+): using ut_pid 13766
gdm-simple-slave-real[13660]: DEBUG(+): using ut_id :0
gdm-simple-slave-real[13660]: DEBUG(+): using ut_host :0
gdm-simple-slave-real[13660]: DEBUG(+): using ut_line tty7
gdm-simple-slave-real[13660]: DEBUG(+): Writing btmp failed session attempt record to /var/log/btmp
gdm-simple-slave-real[13660]: DEBUG(+): GreeterServer: Sending Problem (Unable to authenticate user)
gdm-simple-slave-real[13660]: DEBUG(+): GreeterServer: There is no valid connection
gdm-simple-slave-real[13660]: DEBUG(+): GreeterServer: Could not send Problem signal
gdm-simple-slave-real[13660]: DEBUG(+): GdmSessionDirect: stopping conversation
gdm-simple-slave-real[13660]: DEBUG(+): GdmSessionDirect: Emitting conversation-stopped signal
gdm-simple-slave-real[13660]: DEBUG(+): GdmSimpleSlave: conversation stopped
gdm-simple-slave-real[13660]: DEBUG(+): GdmSessionDirect: Emitting conversation-stopped signal
i.e., it decides that there was a failed session login for some reason when it gets the exit signal.
I pushed gdm-2.26.1-8.fc11 to fix the bogus failed session login reporting, but that didn't solve this issue. I need to investigate more.

Created attachment 343152 [details]
patch!
We initially debugged this happening during a user session, in which case
the 'failed session attempt' is completely bogus. Ray tracked this down to
is_authenticated never being set for the session. This fixed that case,
but it still crashes if telinit is called when you're at the greeter, with
the same trace, as is_authenticated isn't set then.
What's happening is that the cleanup function has:
static void
worker_exited (GdmSessionWorkerJob *job,
               int code,
               GdmSessionConversation *conversation)
{
        g_debug ("GdmSessionDirect: Worker job exited: %d", code);
        g_object_ref (conversation->job);
        if (!conversation->session->priv->is_authenticated) {
                char *msg;
                msg = g_strdup_printf (_("worker exited with status %d"), code);
                _gdm_session_authentication_failed (GDM_SESSION (conversation->session), conversation->service_name, ms
                g_free (msg);
        } else if (conversation->session->priv->is_running) {
                _gdm_session_session_exited (GDM_SESSION (conversation->session), code);
        }
        g_debug ("GdmSessionDirect: Emitting conversation-stopped signal");
        _gdm_session_conversation_stopped (GDM_SESSION (conversation->session),
                                           conversation->service_name);
        g_object_unref (conversation->job);
}
When we're not authenticated, we send out the authentication-failed signal
here. This calls gdm_session_stop_conversation
(gdm-simple-slave.c:on_session_authentication_failed), which frees the
conversation from the hash table. So, when we try and stop the conversation
again at the end of worker_exited, we crash.
The correct fix may be to take another reference somewhere so the hash entry
isn't freed, or to fix the conversation_stop function so it doesn't explode
on an already-stopped-and-freed conversation. But the simple hack is
to not send the authentication-failed signal in the first place when the
worker exits/crashes - it seems somewhat fishy to me in that in this case,
it's not that authentication failed; it hasn't started to begin with.
Stupid patch attached; it fixes the issue for me.
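The second option mentioned above (a stop function that tolerates an already-stopped conversation) can be sketched roughly as follows. This is only an illustration in plain C, not the gdm code: Session, Conversation, session_start, session_stop, on_authentication_failed and worker_exited_sketch are all hypothetical stand-ins, and a fixed array takes the place of the real hash table. The idea is to copy the key before emitting anything and to look the conversation up again instead of trusting a pointer that a signal handler may have freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; none of these names come from gdm. */
#define MAX_CONVERSATIONS 4

typedef struct {
    char *service_name;   /* owned copy, freed when the conversation stops */
    bool  in_use;
} Conversation;

typedef struct {
    Conversation table[MAX_CONVERSATIONS];  /* stands in for the hash table */
    int          stops_emitted;             /* counts real stop events */
} Session;

Conversation *session_lookup(Session *s, const char *name)
{
    for (size_t i = 0; i < MAX_CONVERSATIONS; i++) {
        if (s->table[i].in_use &&
            strcmp(s->table[i].service_name, name) == 0)
            return &s->table[i];
    }
    return NULL;
}

bool session_start(Session *s, const char *name)
{
    for (size_t i = 0; i < MAX_CONVERSATIONS; i++) {
        if (!s->table[i].in_use) {
            char *copy = malloc(strlen(name) + 1);
            if (copy == NULL)
                return false;
            strcpy(copy, name);
            s->table[i].service_name = copy;
            s->table[i].in_use = true;
            return true;
        }
    }
    return false;
}

/* Idempotent stop: a second call for the same name is a harmless no-op. */
bool session_stop(Session *s, const char *name)
{
    Conversation *c = session_lookup(s, name);
    if (c == NULL)
        return false;               /* already stopped and freed */
    free(c->service_name);
    c->service_name = NULL;
    c->in_use = false;
    s->stops_emitted++;
    return true;
}

/* Models a handler that stops the conversation when authentication fails. */
void on_authentication_failed(Session *s, const char *name)
{
    session_stop(s, name);
}

void worker_exited_sketch(Session *s, const char *name, bool is_authenticated)
{
    char key[64];

    assert(s != NULL && name != NULL);

    /* Copy the key first: the handler below may free the conversation
     * that owns the string `name` points at. */
    strncpy(key, name, sizeof key - 1);
    key[sizeof key - 1] = '\0';

    if (!is_authenticated)
        on_authentication_failed(s, key);

    /* Safe even if the handler already stopped the conversation. */
    session_stop(s, key);
}
```

With this shape, calling worker_exited_sketch on an unauthenticated session stops the conversation exactly once, and the trailing stop finds nothing to free. Whether that is preferable to holding an extra reference is a design choice the comment above leaves open.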
I tested with gdm-2.26.1-9.fc11.x86_64 from Koji which seems to have the above patch according to the changelog on my system (Lenovo Thinkpad R61, Core2Duo, Intel GM965 video) and I still get logged out of Gnome and kicked back to the GDM login screen after a few suspend/resume cycles.

May I ask why this is no longer a blocker?

Re: #6 (Disclaimer: I am neither QA nor Rel-Eng.) Blockers are things that would delay release. The effect of the original bug was a total lockup when trying to go from runlevel 5 to runlevel 3. It would be bad to release an OS that can't change runlevels without locking up. OTOH, now there's a less than perfect patch that at least keeps hitting the power button from being your only option to get anywhere once the bug happens. It still needs to work better, so it *should* be fixed ASAP (Tracker), but it's not so bad that the world needs to stop (Blocker).

Is getting kicked out of a logged in session on suspend/resume really the same bug, though?

Oops - I may have my bugs mixed up. There are a few GDM bugs I have been tracking which were marked as duplicates (shutdown errors vs suspend/resume errors) and I'm not sure which is which at this point.

Created attachment 343858 [details]
messages with no gdm-simple-slave crash, but ...
I snagged gdm-2.26.1-10 and gdm-user-switch-applet-2.26.1-10 (for consistency) from Koji. It looks like -8, -9, and -10 should fix the bug. I suppose the good news is that gdm-simple-slave is no longer dumping errors into /var/log/messages. The bad news is that the patch doesn't fix the deadly bug symptom.
"telinit 3" from inside gnome-terminal still kills the console, leaving the power button and ACPI the only option to get anywhere. Basically the screen blanks like it's going into text mode, but it stays blank, and A-C-Fn (or A-Fn) doesn't switch to tty[2-6]. In messages, mingetty on tty1 complains about invalid characters, dies, and respawns, which is too familiar by now.
I'll attach Xorg.0.log, too. The final few lines (warnings as X exits) might be a hint.
Created attachment 343860 [details]
Xorg.0.log with WW at X exit
That likely has nothing to do with gdm. (Although I could be wrong.) In fact, I suspect that case is:
- runlevel 5 is exited
- prefdm is killed
- gdm starts shutting down the session, X.org, etc.
- runlevel 3 starts
- runlevel 3 setup finishes
- mingetty is started on tty1
- X.org finishes exiting
Which is a race that could have happened irrespective of whatever gdm is doing at session close, and probably could have happened/can happen in F-10.

Allen, so my guess at what's happening for you is:
init 3 causes upstart to send SIGTERM to gdm
gdm sends SIGTERM to gdm-simple-slave
gdm-simple-slave sends SIGTERM to Xorg
and somewhere along the line we aren't doing a waitpid() so then upstart starts a getty on the vt X is using and all hell breaks loose.
If gdm *was* crashing that would explain why we aren't doing the waitpid, but you said it's not. Anyway, would you mind filing that as a separate issue? The latest packages fix the simple-slave crashes Bill reported.

(there was a midair collision between me and Bill, but I posted my comment anyway)

Given the above, it sort of makes me wonder if successful exits from runlevel 5 previously depended on gdm crashing, so that X exited completely before mingetty started. Oh, well. Ryokai. Bug 500750

*** Bug 500908 has been marked as a duplicate of this bug. ***

This bug appears to have been reported against 'rawhide' during the Fedora 11 development cycle. Changing version to '11'. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Is anyone still experiencing this problem despite installing the latest packages, or can this bug be closed?

Fairly sure this is all fixed now.
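
For the SIGTERM chain discussed in the comments above, the missing step would be to wait for the child to exit before letting anything else proceed. A minimal sketch of that pattern, using only standard POSIX calls, might look like this. The helper names spawn_idle_child and terminate_and_reap are hypothetical and are not taken from gdm or upstart.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that idles until it is signalled.
 * It stands in for a long-running process such as an X server. */
pid_t spawn_idle_child(void)
{
    pid_t pid = fork();
    if (pid == 0) {
        for (;;)
            pause();            /* child: sleep until a signal arrives */
    }
    return pid;                 /* parent: child pid, or -1 on failure */
}

/* Hypothetical helper: send SIGTERM and block until the child is reaped.
 * Returns the terminating signal number, or -1 on error or if the child
 * exited some other way. */
int terminate_and_reap(pid_t pid)
{
    int status = 0;

    assert(pid > 0);

    if (kill(pid, SIGTERM) != 0)
        return -1;

    /* Without this wait, the caller could carry on (for example by
     * reporting "done" to its own parent) while the child still holds
     * its resources. Retry if the wait is interrupted by a signal. */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }

    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

Only after terminate_and_reap returns is it known that the child has fully exited, which is the ordering guarantee the race described above lacks.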