Bug 214649 - gnome-session stops (goes into T-state) for unknown (yet) reason
gnome-session stops (goes into T-state) for unknown (yet) reason
Status: CLOSED RAWHIDE
Product: Fedora
Classification: Fedora
Component: gnome-session (Show other bugs)
6
i386 Linux
medium Severity medium
: ---
: ---
Assigned To: Ray Strode [halfline]
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2006-11-08 14:08 EST by Kris Rusocki
Modified: 2011-07-26 14:03 EDT (History)
5 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2007-09-19 07:33:52 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
strace output just before stop (2.44 KB, text/plain)
2006-11-08 14:25 EST, Kris Rusocki
no flags Details
Prevent dbus-launch from looking at tty input (256 bytes, text/x-patch)
2006-11-09 17:15 EST, Kris Rusocki
no flags Details
xinit-1.0.2-client-session.patch (659 bytes, patch)
2006-11-10 00:39 EST, Ray Strode [halfline]
no flags Details | Diff
dont crash the sitter after trying to open the display (1.74 KB, patch)
2006-11-13 00:01 EST, Ray Strode [halfline]
no flags Details | Diff

  None (edit)
Description Kris Rusocki 2006-11-08 14:08:15 EST
Description of problem: every now and then (unable to reproduce reliably)
gnome-session stops (probably after getting SIGSTOP?) ang goes to T-state.

As a result many applications won't run and will hang on reading from
gnome-session's ICE socket.

(Ain't an X guy, sorry couldn't provide more info)

Version-Release number of selected component (if applicable):
gnome-session-2.16.0-3.fc6


How reproducible: Unable to reproduce reliably.


Steps to Reproduce:
1.
2.
3.
  
Actual results: gnome-session stops


Expected results: gnome-session doesn't stop


Additional info: killall -CONT gnome-session does the trick.
Comment 1 Kris Rusocki 2006-11-08 14:25:22 EST
Created attachment 140696 [details]
strace output just before stop
Comment 2 Kris Rusocki 2006-11-08 22:18:51 EST
Ooookay. Investigated problem a little bit more... everything (dbus-launch,
gnome-session, gconfd, gnome-keyring-daemon) goes nuts (T) after input on Xorg VT.

Steps to reproduce:
1. boot FC6 (without any display manager)
2. log in
3. startx
4. switch back to the VT
5. press enter
6. ps auxw | grep T
Comment 3 Ray Strode [halfline] 2006-11-09 00:30:26 EST
This is really quite odd.

Normally SIGTTIN would be sent to a process if it tried to read from its
standard input while not being the foreground process of the controlling terminal.

The fact that we get it when you press enter in the controlling terminal makes
me wonder if gnome-session is calling poll() on /dev/tty and then doing a read()
or something.  That would be crazy though.

So I tried to gdb attach to the process when it was stopped, but gdb didn't like
that too much (bug 214740).  I then added a patch to make gnome-session
raise(SIGSEGV) when getting SIGTTIN.  I then gdb attached to it when it
segfaulted.  First off, two gnome-session processes segfaulted at the same time.
 One was in a read() call alright, but it was for a freshly created pipe, not
for the standard input.  The other process wasn't even in a read() call it was
in the middle forking.

very strange.

Comment 4 Kris Rusocki 2006-11-09 10:38:58 EST
> Normally SIGTTIN would be sent to a process if it tried to read from its
> standard input while not being the foreground process of the controlling
> terminal.

How about scenario where you have several processess run on a tty
none of which is foreground process and only *one* of them tries to
read from tty? (Should TTIN be deliverd to the very one which reads
or maybe? to all of them?)

I've found a definitely offending code in dbus (dbus-launch.c:428).
There's a loop that consumes any tty input with non-blocking read.
We get there (:428) because (guessing) select() returns tty_fd for reading...

I'll try removing this code in dbus-launch and see what difference
it makes.
Comment 5 Kris Rusocki 2006-11-09 13:10:08 EST
Blocking read that was, sorry. Anyway... will report back on that dbus-launch
code in few hours.
Comment 6 Kris Rusocki 2006-11-09 17:09:57 EST
Yes. So.. yes.... I just did a proof-of-concept workaround in dbus-launch
(dbus-launch.c:366). I explicitly set tty_fd to -1 and that seem to prevent
all Bad Things from happenning.

Q1: Why does dbus-launch look at tty input in the first place? (it discards it
after reading so why bother?)

It looks (does it?) like my assumption was correct - only dbus-launch got
tty fd selected for reading, only dbus-launch tried to read from tty fd... BUT
some other processess hooked up to this tty also got TTIN (?.

Q2: Why? and is it expected behavior?

Comment 7 Kris Rusocki 2006-11-09 17:15:16 EST
Created attachment 140847 [details]
Prevent dbus-launch from looking at tty input

It actually was :361.. aanyway.
Comment 8 Ray Strode [halfline] 2006-11-09 18:01:00 EST
so what dbus-launch is doing looks pretty bogus.  I think the idea is to try to
handle terminal sessions exiting or something (and it has a separate code path
for X sessions exiting).  On the other hand, I don't understand why it's causing
the behavior we're seeing here. 

Roland, do you have any idea what's going on?

As an aside, gnome-session should probably set it's stdin to /dev/null anyway
since it's not going to use it (and maybe lead its own session).  Also, maybe
xinitrc should be redirecting /dev/null to standard input before running
dbus-launch.
Comment 9 Havoc Pennington 2006-11-09 18:18:35 EST
dbus-launch is looking for eof from the terminal, so if you e.g. ran it from an 
xterm then closed the xterm (but not the X session) dbus-launch would exit.

Not sure of the right fix. I believe it is looking for eof because HUP didn't 
always happen, or something. I don't really remember, maybe there's something 
in the ChangeLog.


Comment 10 Roland McGrath 2006-11-09 18:48:45 EST
When a background process tries a read, SIGTTIN is sent to the entire process
group.  (Similarly, a ^Z or ^C sends SIGTSTP or SIGINT to the whole pgrp.)
However, when a background process has ignored or blocked SIGTTIN and calls
read, there is no SIGTTIN generated at all and the call just fails with EIO.  So
a process could ignore SIGTTN before reading, but that is not actually useful
for anything other than masking the error.  If the process in fact should be
reading at all, then it needs to be the foreground process group when it reads.
Comment 11 Havoc Pennington 2006-11-09 19:15:17 EST
That makes sense, someone has to "own" stdin on a terminal at any given time or 
who knows who will read each byte.

A sledgehammer fix might be for dbus-launch to give all its child 
processes /dev/null for stdin instead of sharing the tty with them - i.e. just 
take over the terminal input exclusively. That's what we want whenever 
launching a GUI app for sure (I think g_spawn_* does this by default even).

However, that would break useful things like "dbus-launch bash"

Maybe the right behavior is something like dbus-launch should only look at EOF 
on stdin if its non-bus-daemon child exits (safe to assume that any child 
terminal app would exit if the terminal closes?) or if it has no child process.
Comment 12 Kris Rusocki 2006-11-09 20:36:06 EST
Dumb (but straightforward) 'fix' would be introducing command-line parameter
that would tell dbus-launch not to pay attention to stdin at all... (and
one could put it /the parameter/ into xinitrc).

But I bet that it's not satisfactory, or is it?
Comment 13 Kris Rusocki 2006-11-09 21:26:09 EST
For the record: I browsed cvs a little bit and it seems that this
stdin-read-loop was in the very first functional version commited (1.2).

So if there's any story to it - it's not in the cvs..
Comment 14 Havoc Pennington 2006-11-09 21:52:33 EST
At our current level of understanding, I think that command line option would 
amount to --do-random-other-wrong-thing-besides-the-default-wrong-thing

First things first - we should understand what the situations are where dbus-
launch is used and how they should work and how they currently do work.
Comment 15 Ray Strode [halfline] 2006-11-09 23:40:29 EST
So like I mentioned in comment 8, we could workaround this by adding 
< /dev/null to the end of our dbus-launch lines in /etc/X11/xinit/xinitrc
but I don't think we should have to do that.  I think this is really an xinit
bug though.  It should set stdin to /dev/null and start a new session for the
client it starts.  I talked to Adam a bit about this and he thinks its probably
okay to fix things up there.
Comment 16 Ray Strode [halfline] 2006-11-09 23:47:12 EST
I still want to brain dump a bit though...  I think trapping for SIGHUP instead
of reading for EOF *should* work.  xterm is going to run bash and bash is going
to start a new session.  I think that when xterm exits it sends a SIGHUP to
every member of the process group that bash is running in (which will probably
be just the lone bash process I guess).  Looking at the vte code, that's what it
does.  So, bash will then die from the hang up signal.

Now, taking a step backward... if you type "dbus-launch" at a bash prompt then
bash is going to create a new process group for dbus-launch and make dbus-launch
the process group leader of that process group.

When bash (the session leader) dies, dbus-launch is going to be orphaned. 
/sbin/init will become dbus-launch's parent, and the entire dbus-launch process
group (including dbus-launch itself) should get sent SIGHUP, because orphaned
process groups are supposed to get SIGHUP (according to my stevens book).

So I'm a little surprised the hangup technique didn't pan out.  Havoc, do you
remember any details about why it wasn't working?
Comment 17 Ray Strode [halfline] 2006-11-10 00:39:30 EST
Created attachment 140864 [details]
xinit-1.0.2-client-session.patch

this patch makes the changes to xinit I mentioned in comment 15
Comment 18 Havoc Pennington 2006-11-10 00:56:53 EST
Basically no, I don't remember at all what I was testing or why I did it that 
way...

We can try just removing the read-from-stdin code and see what happens. I guess 
it might be good to test that dbus-launch properly kills the launched bus in 
some common situations with that change, though; e.g. from linux console, from 
an xterm, from gnome-terminal, and maybe from an ssh session that gets dropped.


Comment 19 Ray Strode [halfline] 2006-11-10 10:58:13 EST
Krzysztof, I've built xorg-x11-xinit-1.0.2-14.fc6 which should be showing up in
updates-testing sometime soon.   When it does, can you test if it address this
problem?
Comment 20 Kris Rusocki 2006-11-11 03:36:21 EST
Ray, no, it does not seem to help; btw, the only difference between stock
xorg-x11-xinit-1.0.2-12 and -14 are four bytes in the xinit binary. Are
you sure that your modification went in? (which, as I understand, should
touch scripts, right?)
Comment 21 Ray Strode [halfline] 2006-11-11 10:52:44 EST
When I copied the fix over from rawhide to fc6-updates I didn't actually apply
it to the build.  Sorry about that.  -15.fc6 is building now.
Comment 22 Ray Strode [halfline] 2006-11-13 00:01:49 EST
Created attachment 141015 [details]
dont crash the sitter after trying to open the display

Hi Havoc,

So I briefly had a look at removing the read-from-stdin code tonight.  One
thing I ran into right away is that the existing code didn't seem to work.  If
I opened a terminal and ran dbus-launch --exit-with-session bash and then
looked the ps output, dbus-launch wasn't running.  It turns out it crashes
right after opening the display because the x11_init function tries to
pre-create some atoms needed by the autolaunch code path without having a
machine uuid (so a NULL value gets passed to strlen() somewhere).  A possible
fix for that is above.

After removing the read-from-stdin code, things still worked okay with xterms
and gnome-terminals and ssh connections, but not so well for virtual consoles.

I haven't figured out why yet.
Comment 23 Havoc Pennington 2006-11-13 00:28:45 EST
Can you post the crash fix to the dbus list? Thiago should probably see it as 
well as John.

Comment 24 Kris Rusocki 2006-11-13 03:44:24 EST
Ray, -15 works as expected.
Comment 25 Ray Strode [halfline] 2006-11-13 10:38:34 EST
Krzysztof,

thanks, I'll push the update out of testing and close this then.

(and I'll move the rest of the dbus discussion to the dbus list).
Comment 26 A S Alam 2007-09-19 07:33:52 EDT
fc6 update contains latest packages.
Closing bug

Note You need to log in before you can comment on or make changes to this bug.