Red Hat Bugzilla – Bug 214649
gnome-session stops (goes into T-state) for unknown (yet) reason
Last modified: 2011-07-26 14:03:34 EDT
Description of problem: every now and then (unable to reproduce reliably)
gnome-session stops (probably after getting SIGSTOP?) ang goes to T-state.
As a result many applications won't run and will hang on reading from
gnome-session's ICE socket.
(Ain't an X guy, sorry couldn't provide more info)
Version-Release number of selected component (if applicable):
How reproducible: Unable to reproduce reliably.
Steps to Reproduce:
Actual results: gnome-session stops
Expected results: gnome-session doesn't stop
Additional info: killall -CONT gnome-session does the trick.
Created attachment 140696 [details]
strace output just before stop
Ooookay. Investigated problem a little bit more... everything (dbus-launch,
gnome-session, gconfd, gnome-keyring-daemon) goes nuts (T) after input on Xorg VT.
Steps to reproduce:
1. boot FC6 (without any display manager)
2. log in
4. switch back to the VT
5. press enter
6. ps auxw | grep T
This is really quite odd.
Normally SIGTTIN would be sent to a process if it tried to read from its
standard input while not being the foreground process of the controlling terminal.
The fact that we get it when you press enter in the controlling terminal makes
me wonder if gnome-session is calling poll() on /dev/tty and then doing a read()
or something. That would be crazy though.
So I tried to gdb attach to the process when it was stopped, but gdb didn't like
that too much (bug 214740). I then added a patch to make gnome-session
raise(SIGSEGV) when getting SIGTTIN. I then gdb attached to it when it
segfaulted. First off, two gnome-session processes segfaulted at the same time.
One was in a read() call alright, but it was for a freshly created pipe, not
for the standard input. The other process wasn't even in a read() call it was
in the middle forking.
> Normally SIGTTIN would be sent to a process if it tried to read from its
> standard input while not being the foreground process of the controlling
How about scenario where you have several processess run on a tty
none of which is foreground process and only *one* of them tries to
read from tty? (Should TTIN be deliverd to the very one which reads
or maybe? to all of them?)
I've found a definitely offending code in dbus (dbus-launch.c:428).
There's a loop that consumes any tty input with non-blocking read.
We get there (:428) because (guessing) select() returns tty_fd for reading...
I'll try removing this code in dbus-launch and see what difference
Blocking read that was, sorry. Anyway... will report back on that dbus-launch
code in few hours.
Yes. So.. yes.... I just did a proof-of-concept workaround in dbus-launch
(dbus-launch.c:366). I explicitly set tty_fd to -1 and that seem to prevent
all Bad Things from happenning.
Q1: Why does dbus-launch look at tty input in the first place? (it discards it
after reading so why bother?)
It looks (does it?) like my assumption was correct - only dbus-launch got
tty fd selected for reading, only dbus-launch tried to read from tty fd... BUT
some other processess hooked up to this tty also got TTIN (?.
Q2: Why? and is it expected behavior?
Created attachment 140847 [details]
Prevent dbus-launch from looking at tty input
It actually was :361.. aanyway.
so what dbus-launch is doing looks pretty bogus. I think the idea is to try to
handle terminal sessions exiting or something (and it has a separate code path
for X sessions exiting). On the other hand, I don't understand why it's causing
the behavior we're seeing here.
Roland, do you have any idea what's going on?
As an aside, gnome-session should probably set it's stdin to /dev/null anyway
since it's not going to use it (and maybe lead its own session). Also, maybe
xinitrc should be redirecting /dev/null to standard input before running
dbus-launch is looking for eof from the terminal, so if you e.g. ran it from an
xterm then closed the xterm (but not the X session) dbus-launch would exit.
Not sure of the right fix. I believe it is looking for eof because HUP didn't
always happen, or something. I don't really remember, maybe there's something
in the ChangeLog.
When a background process tries a read, SIGTTIN is sent to the entire process
group. (Similarly, a ^Z or ^C sends SIGTSTP or SIGINT to the whole pgrp.)
However, when a background process has ignored or blocked SIGTTIN and calls
read, there is no SIGTTIN generated at all and the call just fails with EIO. So
a process could ignore SIGTTN before reading, but that is not actually useful
for anything other than masking the error. If the process in fact should be
reading at all, then it needs to be the foreground process group when it reads.
That makes sense, someone has to "own" stdin on a terminal at any given time or
who knows who will read each byte.
A sledgehammer fix might be for dbus-launch to give all its child
processes /dev/null for stdin instead of sharing the tty with them - i.e. just
take over the terminal input exclusively. That's what we want whenever
launching a GUI app for sure (I think g_spawn_* does this by default even).
However, that would break useful things like "dbus-launch bash"
Maybe the right behavior is something like dbus-launch should only look at EOF
on stdin if its non-bus-daemon child exits (safe to assume that any child
terminal app would exit if the terminal closes?) or if it has no child process.
Dumb (but straightforward) 'fix' would be introducing command-line parameter
that would tell dbus-launch not to pay attention to stdin at all... (and
one could put it /the parameter/ into xinitrc).
But I bet that it's not satisfactory, or is it?
For the record: I browsed cvs a little bit and it seems that this
stdin-read-loop was in the very first functional version commited (1.2).
So if there's any story to it - it's not in the cvs..
At our current level of understanding, I think that command line option would
amount to --do-random-other-wrong-thing-besides-the-default-wrong-thing
First things first - we should understand what the situations are where dbus-
launch is used and how they should work and how they currently do work.
So like I mentioned in comment 8, we could workaround this by adding
< /dev/null to the end of our dbus-launch lines in /etc/X11/xinit/xinitrc
but I don't think we should have to do that. I think this is really an xinit
bug though. It should set stdin to /dev/null and start a new session for the
client it starts. I talked to Adam a bit about this and he thinks its probably
okay to fix things up there.
I still want to brain dump a bit though... I think trapping for SIGHUP instead
of reading for EOF *should* work. xterm is going to run bash and bash is going
to start a new session. I think that when xterm exits it sends a SIGHUP to
every member of the process group that bash is running in (which will probably
be just the lone bash process I guess). Looking at the vte code, that's what it
does. So, bash will then die from the hang up signal.
Now, taking a step backward... if you type "dbus-launch" at a bash prompt then
bash is going to create a new process group for dbus-launch and make dbus-launch
the process group leader of that process group.
When bash (the session leader) dies, dbus-launch is going to be orphaned.
/sbin/init will become dbus-launch's parent, and the entire dbus-launch process
group (including dbus-launch itself) should get sent SIGHUP, because orphaned
process groups are supposed to get SIGHUP (according to my stevens book).
So I'm a little surprised the hangup technique didn't pan out. Havoc, do you
remember any details about why it wasn't working?
Created attachment 140864 [details]
this patch makes the changes to xinit I mentioned in comment 15
Basically no, I don't remember at all what I was testing or why I did it that
We can try just removing the read-from-stdin code and see what happens. I guess
it might be good to test that dbus-launch properly kills the launched bus in
some common situations with that change, though; e.g. from linux console, from
an xterm, from gnome-terminal, and maybe from an ssh session that gets dropped.
Krzysztof, I've built xorg-x11-xinit-1.0.2-14.fc6 which should be showing up in
updates-testing sometime soon. When it does, can you test if it address this
Ray, no, it does not seem to help; btw, the only difference between stock
xorg-x11-xinit-1.0.2-12 and -14 are four bytes in the xinit binary. Are
you sure that your modification went in? (which, as I understand, should
touch scripts, right?)
When I copied the fix over from rawhide to fc6-updates I didn't actually apply
it to the build. Sorry about that. -15.fc6 is building now.
Created attachment 141015 [details]
dont crash the sitter after trying to open the display
So I briefly had a look at removing the read-from-stdin code tonight. One
thing I ran into right away is that the existing code didn't seem to work. If
I opened a terminal and ran dbus-launch --exit-with-session bash and then
looked the ps output, dbus-launch wasn't running. It turns out it crashes
right after opening the display because the x11_init function tries to
pre-create some atoms needed by the autolaunch code path without having a
machine uuid (so a NULL value gets passed to strlen() somewhere). A possible
fix for that is above.
After removing the read-from-stdin code, things still worked okay with xterms
and gnome-terminals and ssh connections, but not so well for virtual consoles.
I haven't figured out why yet.
Can you post the crash fix to the dbus list? Thiago should probably see it as
well as John.
Ray, -15 works as expected.
thanks, I'll push the update out of testing and close this then.
(and I'll move the rest of the dbus discussion to the dbus list).
fc6 update contains latest packages.