Bug 214649
Summary: | gnome-session stops (goes into T-state) for unknown (yet) reason | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Kris Rusocki <kszysiu> | ||||||||||
Component: | gnome-session | Assignee: | Ray Strode [halfline] <rstrode> | ||||||||||
Status: | CLOSED RAWHIDE | QA Contact: | |||||||||||
Severity: | medium | Docs Contact: | |||||||||||
Priority: | medium | ||||||||||||
Version: | 6 | CC: | ajax, hp, johnp, kszysiu, roland | ||||||||||
Target Milestone: | --- | ||||||||||||
Target Release: | --- | ||||||||||||
Hardware: | i386 | ||||||||||||
OS: | Linux | ||||||||||||
Whiteboard: | |||||||||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||||||||
Doc Text: | Story Points: | --- | |||||||||||
Clone Of: | Environment: | ||||||||||||
Last Closed: | 2007-09-19 11:33:52 UTC | Type: | --- | ||||||||||
Regression: | --- | Mount Type: | --- | ||||||||||
Documentation: | --- | CRM: | |||||||||||
Verified Versions: | Category: | --- | |||||||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||||
Embargoed: | |||||||||||||
Attachments: |
|
Description
Kris Rusocki
2006-11-08 19:08:15 UTC
Created attachment 140696 [details]
strace output just before stop
Ooookay. Investigated problem a little bit more... everything (dbus-launch, gnome-session, gconfd, gnome-keyring-daemon) goes nuts (T) after input on Xorg VT. Steps to reproduce: 1. boot FC6 (without any display manager) 2. log in 3. startx 4. switch back to the VT 5. press enter 6. ps auxw | grep T This is really quite odd. Normally SIGTTIN would be sent to a process if it tried to read from its standard input while not being the foreground process of the controlling terminal. The fact that we get it when you press enter in the controlling terminal makes me wonder if gnome-session is calling poll() on /dev/tty and then doing a read() or something. That would be crazy though. So I tried to gdb attach to the process when it was stopped, but gdb didn't like that too much (bug 214740). I then added a patch to make gnome-session raise(SIGSEGV) when getting SIGTTIN. I then gdb attached to it when it segfaulted. First off, two gnome-session processes segfaulted at the same time. One was in a read() call alright, but it was for a freshly created pipe, not for the standard input. The other process wasn't even in a read() call it was in the middle forking. very strange.
> Normally SIGTTIN would be sent to a process if it tried to read from its
> standard input while not being the foreground process of the controlling
> terminal.
How about scenario where you have several processess run on a tty
none of which is foreground process and only *one* of them tries to
read from tty? (Should TTIN be deliverd to the very one which reads
or maybe? to all of them?)
I've found a definitely offending code in dbus (dbus-launch.c:428).
There's a loop that consumes any tty input with non-blocking read.
We get there (:428) because (guessing) select() returns tty_fd for reading...
I'll try removing this code in dbus-launch and see what difference
it makes.
Blocking read that was, sorry. Anyway... will report back on that dbus-launch code in few hours. Yes. So.. yes.... I just did a proof-of-concept workaround in dbus-launch (dbus-launch.c:366). I explicitly set tty_fd to -1 and that seem to prevent all Bad Things from happenning. Q1: Why does dbus-launch look at tty input in the first place? (it discards it after reading so why bother?) It looks (does it?) like my assumption was correct - only dbus-launch got tty fd selected for reading, only dbus-launch tried to read from tty fd... BUT some other processess hooked up to this tty also got TTIN (?. Q2: Why? and is it expected behavior? Created attachment 140847 [details]
Prevent dbus-launch from looking at tty input
It actually was :361.. aanyway.
so what dbus-launch is doing looks pretty bogus. I think the idea is to try to handle terminal sessions exiting or something (and it has a separate code path for X sessions exiting). On the other hand, I don't understand why it's causing the behavior we're seeing here. Roland, do you have any idea what's going on? As an aside, gnome-session should probably set it's stdin to /dev/null anyway since it's not going to use it (and maybe lead its own session). Also, maybe xinitrc should be redirecting /dev/null to standard input before running dbus-launch. dbus-launch is looking for eof from the terminal, so if you e.g. ran it from an xterm then closed the xterm (but not the X session) dbus-launch would exit. Not sure of the right fix. I believe it is looking for eof because HUP didn't always happen, or something. I don't really remember, maybe there's something in the ChangeLog. When a background process tries a read, SIGTTIN is sent to the entire process group. (Similarly, a ^Z or ^C sends SIGTSTP or SIGINT to the whole pgrp.) However, when a background process has ignored or blocked SIGTTIN and calls read, there is no SIGTTIN generated at all and the call just fails with EIO. So a process could ignore SIGTTN before reading, but that is not actually useful for anything other than masking the error. If the process in fact should be reading at all, then it needs to be the foreground process group when it reads. That makes sense, someone has to "own" stdin on a terminal at any given time or who knows who will read each byte. A sledgehammer fix might be for dbus-launch to give all its child processes /dev/null for stdin instead of sharing the tty with them - i.e. just take over the terminal input exclusively. That's what we want whenever launching a GUI app for sure (I think g_spawn_* does this by default even). However, that would break useful things like "dbus-launch bash" Maybe the right behavior is something like dbus-launch should only look at EOF on stdin if its non-bus-daemon child exits (safe to assume that any child terminal app would exit if the terminal closes?) or if it has no child process. Dumb (but straightforward) 'fix' would be introducing command-line parameter that would tell dbus-launch not to pay attention to stdin at all... (and one could put it /the parameter/ into xinitrc). But I bet that it's not satisfactory, or is it? For the record: I browsed cvs a little bit and it seems that this stdin-read-loop was in the very first functional version commited (1.2). So if there's any story to it - it's not in the cvs.. At our current level of understanding, I think that command line option would amount to --do-random-other-wrong-thing-besides-the-default-wrong-thing First things first - we should understand what the situations are where dbus- launch is used and how they should work and how they currently do work. So like I mentioned in comment 8, we could workaround this by adding < /dev/null to the end of our dbus-launch lines in /etc/X11/xinit/xinitrc but I don't think we should have to do that. I think this is really an xinit bug though. It should set stdin to /dev/null and start a new session for the client it starts. I talked to Adam a bit about this and he thinks its probably okay to fix things up there. I still want to brain dump a bit though... I think trapping for SIGHUP instead of reading for EOF *should* work. xterm is going to run bash and bash is going to start a new session. I think that when xterm exits it sends a SIGHUP to every member of the process group that bash is running in (which will probably be just the lone bash process I guess). Looking at the vte code, that's what it does. So, bash will then die from the hang up signal. Now, taking a step backward... if you type "dbus-launch" at a bash prompt then bash is going to create a new process group for dbus-launch and make dbus-launch the process group leader of that process group. When bash (the session leader) dies, dbus-launch is going to be orphaned. /sbin/init will become dbus-launch's parent, and the entire dbus-launch process group (including dbus-launch itself) should get sent SIGHUP, because orphaned process groups are supposed to get SIGHUP (according to my stevens book). So I'm a little surprised the hangup technique didn't pan out. Havoc, do you remember any details about why it wasn't working? Created attachment 140864 [details] xinit-1.0.2-client-session.patch this patch makes the changes to xinit I mentioned in comment 15 Basically no, I don't remember at all what I was testing or why I did it that way... We can try just removing the read-from-stdin code and see what happens. I guess it might be good to test that dbus-launch properly kills the launched bus in some common situations with that change, though; e.g. from linux console, from an xterm, from gnome-terminal, and maybe from an ssh session that gets dropped. Krzysztof, I've built xorg-x11-xinit-1.0.2-14.fc6 which should be showing up in updates-testing sometime soon. When it does, can you test if it address this problem? Ray, no, it does not seem to help; btw, the only difference between stock xorg-x11-xinit-1.0.2-12 and -14 are four bytes in the xinit binary. Are you sure that your modification went in? (which, as I understand, should touch scripts, right?) When I copied the fix over from rawhide to fc6-updates I didn't actually apply it to the build. Sorry about that. -15.fc6 is building now. Created attachment 141015 [details]
dont crash the sitter after trying to open the display
Hi Havoc,
So I briefly had a look at removing the read-from-stdin code tonight. One
thing I ran into right away is that the existing code didn't seem to work. If
I opened a terminal and ran dbus-launch --exit-with-session bash and then
looked the ps output, dbus-launch wasn't running. It turns out it crashes
right after opening the display because the x11_init function tries to
pre-create some atoms needed by the autolaunch code path without having a
machine uuid (so a NULL value gets passed to strlen() somewhere). A possible
fix for that is above.
After removing the read-from-stdin code, things still worked okay with xterms
and gnome-terminals and ssh connections, but not so well for virtual consoles.
I haven't figured out why yet.
Can you post the crash fix to the dbus list? Thiago should probably see it as well as John. Ray, -15 works as expected. Krzysztof, thanks, I'll push the update out of testing and close this then. (and I'll move the rest of the dbus discussion to the dbus list). fc6 update contains latest packages. Closing bug |