Red Hat Bugzilla – Bug 127048
Processes stuck in a system call
Last modified: 2015-01-04 17:07:28 EST
Description of problem:
An attempt to ssh to a machine that had been sitting idle for a while
(on the order of two hours) resulted in a stuck bash
process. The ps output showed the following:
root 2098 Ss 12:42 0:00 /usr/sbin/sshd
root 9589 Ss 16:22 0:00 \_ sshd: michal [priv]
michal 9591 S 16:22 0:00 \_ sshd: michal@notty
michal 10622 Ss 16:22 0:00 \_ -bash
while an attempt to strace gave this:
Process 10622 attached - interrupt to quit
and that was all.
An attempt to open a new terminal window, for a
change, bogged down in gconfd-2. A new window showed up
but it remained empty, while ps reported
root 10663 S 16:25 0:00 /usr/libexec/gconfd-2 14
with strace like this:
Process 10663 attached - interrupt to quit
Getting new login shells on a text console luckily was
not a problem. The whole "Show State" from sysrq is attached.
The trouble is not obvious to reproduce. After
rebooting into another kernel (2.6.7-1.457) I could
open terminal windows and ssh to the machine in question
without further troubles but ... I can say the same
after I rebooted with 2.6.7-1.459 again.
It is quite possible that I misattributed what I
reported in bug #126814 and that this was really
another manifestation of the same trouble. That one,
for a change, showed up immediately after a yum run
when a bunch of updates was performed. I have not noticed
such behaviour under any circumstances on an i386
installation (but maybe I was just "lucky").
Version-Release number of selected component (if applicable):
Created attachment 101548 [details]
output from "sysrq t" with stuck processes
I got myself today in the same situation but running 2.6.7-1.469.
Symptoms are the same, i.e. sys_read() is sitting and not reading
anything. I have the results of "Show State" saved if somebody wants
to have a look.
The same problem struck again after running a bunch of recent
updates. This time the kernel was 2.6.7-1.492. Symptoms do not
change. Cannot ssh to a stricken machine as bash is blocked on
'read()', cannot open a new gnome terminal window because
gconfd-2 waits for 'poll()' to return. No problems with a new
login on a local text console.
The only possible clue is that a spawned sshd process shows
'sshd: michal@notty' instead of an expected 'sshd: michal@pts/0'
or something like that. But devpts still seems to be mounted
on /dev/pts and nothing appears to be wrong with it.
Although running yum with a bunch of updates seems like a pretty
good way to trigger that condition, it happens on other occasions
too, though I have no way to produce it on demand.
(The other comment was supposed to be mailed yesterday but it got
stuck somewhere and I failed to notice :-).
The same happened again after updates from fedora-3-0.20040723.
But this time I tried logging out of X and, on a console,
unmounting devpts and mounting it again on /dev/pts. After that operation
I can log in remotely via ssh and from a terminal window on
a console again. There is still something weird, though.
Namely 'who' shows this:
michal :0 Jul 23 13:05
michal pts/0 Jul 23 13:05 (:0.0)
root pts/1 Jul 23 12:57 (:0.0)
michal pts/1 Jul 23 13:06 ("remote")
"root" login at 12:57 is from _before_ things got tied into a knot;
all others are _after_ /dev/pts got remounted. No root login is
active at this moment and both /dev/pts/0 and /dev/pts/1, which
are the only pseudo-ttys at the moment, are owned by me.
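
The oddity in the 'who' listing above (one pts line claimed by two
users) can be checked mechanically. This is only a sketch under the
assumption that the first two whitespace-separated fields of each
'who' line are the user and the line name; the function name is
invented for the example.

```python
from collections import defaultdict


def ttys_with_multiple_users(who_text):
    """Map each tty to its users, keeping only ttys listed for more than one user."""
    owners = defaultdict(list)
    for line in who_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        user, tty = fields[0], fields[1]
        if user not in owners[tty]:
            owners[tty].append(user)
    return {tty: users for tty, users in owners.items() if len(users) > 1}
```

A stale utmp record left behind by a session that never logged out
cleanly would show up here the same way, so this says nothing about
who really owns the device node.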
OTOH 'last' has this to say about that "root" login in question:
root pts/1 :0.0 Fri Jul 23 12:57 - 13:06 (0:09)
with a logout timestamp coinciding with the moment when pts/1 was
"taken over" by my remote login.
I got hit by the problem again (kernel 2.6.7-1.494 this time).
I can only add that I was able to remount devpts, making it read-only
and read-write again, but this did not have any effect. Only
unmounting devpts and mounting it again helps.
When the problem struck I also tried, from an already existing
window, something like 'gnome-terminal -e tcsh'. Although a terminal
window eventually opens, after a very long wait, I could not find
any traces of csh in a ps output even after quite some time. This
is likely not very surprising, as 'tcsh' seems to be started in such
a situation as a child of bash (or rather probably of $SHELL), which
is waiting for a pseudo-tty that never comes. Pseudo-ttys already
in use are fine.
There was also this report:
(which seems to be for x86) but I have no idea if it is related.
I have experienced this as well several times, only since upgrading
to FC3/rawhide, on kernel .492 and probably earlier ones as well. It
always strikes after the system has been up for some time; almost 12
days the last time it occurred, but I normally open a bunch of shells
right after I boot and then don't open any more, so I couldn't say
when the condition actually started.
Note that I am running the i686 kernels on an Athlon XP, so this
definitely affects more than just x86_64.
Before, I had no idea how to approach the problem... the next time it happens
I'll try to verify the various tests that Michal tried above.
Less than 24 hours after my latest reboot (updated to stuff from
20040729), the problem hit again.
(I'm not entirely sure which of the last four are germane; you'll see
why I listed /dev-related stuff shortly)
After some testing, I'm beginning to think that I may not have the
same problem as Michal. LMK if I should file this as a separate bug.
First, the problem occurs for me when trying to open another tab in
Konsole (haven't tried ssh'ing in, don't have sshd configured). The
tab opens, but you just get the cursor, no shell prompt.
Unlike what Michal reports, I don't get another process hanging; I
don't get another process at all. I did a "strace -f -ff -F -ttt -o
konsole.strace konsole" to see what was going on; that's attached (I
got no .pid files so either I'm running it wrong or it didn't get
around to forking a bash). In the strace, after all the KDE Krap,
you'll notice it spends a good ten seconds trying to open a pty, with
As Michal reports, I can still login fine on the text console. Also,
"bash" or "sh" or whatever from one of the already-open shells in
konsole works fine.
All the /dev/pt* are 0666, so it doesn't appear to be a permissions
xterm: Error 32, errno 2: No such file or directory
Reason: get_pty: not enough ptys
... which also points to some kind of pty exhaustion.
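
A quick way to test pty allocation without going through xterm or
konsole might be to ask for a pty directly. The sketch below uses
Python's os.openpty(); the function name probe_pty is just an
illustration. On a machine in the bad state the call could
conceivably block rather than fail, so it is probably best run from
a shell where it can be interrupted.

```python
import os


def probe_pty():
    """Try to allocate one pty pair; return (ok, detail)."""
    try:
        master, slave = os.openpty()
    except OSError as exc:
        # Allocation failed outright, e.g. no free pty or devpts trouble.
        return False, f"errno {exc.errno}: {exc.strerror}"
    try:
        name = os.ttyname(slave)
    except OSError:
        name = "(unnamed pty)"
    finally:
        # Always release both ends so the probe does not consume a pty.
        os.close(master)
        os.close(slave)
    return True, name
```

On a healthy system the detail string should be the slave device
name; on failure it carries the errno text for comparison with the
xterm message.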
I'm not sure what else to check. If you can't reproduce this and
being able to telnet in (yeah I really should set up sshd) would be
helpful, let me know; my system is still otherwise stable so I'm
keeping it running in this state.
Created attachment 102342 [details]
strace of konsole startup
I never said that this is permission problem; only that unmounting
devpts and mounting it again makes things work. Just remount is
The fact that Wes got "not enough ptys" from xterm, where likely
only a few were really in use, seems to indicate that this is another
manifestation of the same trouble.
The next time I run into that I will check whether I can get the
same error from 'xterm' and also what 'sysctl kernel.pty.nr'
and 'sysctl kernel.pty.max' have to say.
Would also be interesting to know if 'who' shows anything funny ...
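
The pty accounting check mentioned above could be scripted roughly
like this. It reads the same two files under /proc/sys/kernel/pty
that the sysctl names refer to; the function names are hypothetical,
and the "exhausted" test (nr >= max) is a simple assumption about
what exhaustion would look like.

```python
from pathlib import Path

PTY_DIR = Path("/proc/sys/kernel/pty")


def pty_usage(nr_text, max_text):
    """Parse the contents of pty/nr and pty/max; return (nr, max, exhausted)."""
    nr, limit = int(nr_text.strip()), int(max_text.strip())
    return nr, limit, nr >= limit


def read_pty_usage(base=PTY_DIR):
    """Read the live counters from /proc (Linux only)."""
    return pty_usage((base / "nr").read_text(), (base / "max").read_text())
```

If the counters look sane while new ptys still cannot be obtained,
that would point away from plain exhaustion.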
(system is still in the bad state)
$ cat /proc/sys/kernel/pty/nr
$ cat /proc/sys/kernel/pty/max
doesn't look like pty exhaustion, assuming the accounting is
Output of "who" is completely normal, and does not change no matter
how many hung-trying-to-start-shell konsole tabs I have open--but I
haven't used Michal's unmount-mount devpts trick.
I just ran an update through yum which updated 25 packages. After that
I got stuck in "no-new-shell mode" again although mozilla, for
example, got up without any problems. 'who' says
root :0 Aug 2 12:20
root pts/0 Aug 2 12:20 (:0.0)
root pts/1 Aug 2 12:25 (:0.0)
(with the current local time being roughly an hour later), and
kernel.pty.nr = 2
kernel.pty.max = 4096
Nothing unusual in permissions and no visible changes in mount.
Shucks! Forgot to look directly at what 'cat /proc/mounts' has
to say but probably nothing special.
After logging out, unmounting devpts and mounting it again I am
back in business. More and more baffled with every incident.
I got into the same funk once again. In the meantime I
ran rsync to grab some more packages; other than that the machine
was sitting idle for a while. 'cat /proc/mounts' does not show
anything unusual, and shows "none /dev/pts devpts rw 0 0" in particular.
An output of 'who' is a bit more interesting:
root tty1 Aug 2 13:18
root :0 Aug 2 13:19
root pts/0 Aug 2 13:19 (:0.0)
root pts/1 Aug 2 12:25 (:0.0)
michal pts/1 Aug 2 13:19 ("remote")
but this could be just 'who' being a bit confused.
BTW - kernel this time is 2.6.7-1.501
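
The /proc/mounts check for devpts can be done with a few lines of
Python as well. This is a sketch assuming the usual /proc/mounts
layout (device, mount point, filesystem type, options, then two
numbers); the function name is invented here.

```python
def devpts_mounts(mounts_text):
    """Return (mount_point, options) for every devpts entry in /proc/mounts text."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[2] == "devpts":
            result.append((fields[1], fields[3].split(",")))
    return result
```

An empty list, or a mount point other than /dev/pts, would be worth
mentioning in a report; a normal-looking entry, as in this case,
only rules out the obvious.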
I should add that after logging out and unmounting /dev/pts
'w' says "3 users" but it shows only the current login on tty1.
OTOH 'who' "remembers" the now non-existent logins from "remote"
and from ":0.0". In any case these numbers are far from
This pty bug is already in bugzilla somewhere...
Do you mean bug #128154? Surely it was not there when I filed
the original report. :-)
I haven't experienced this bug in a while. Is anyone else (who like
me is keeping up with new kernel builds) still seeing it? Over on <a
Warren he hasn't seen it since 525... Maybe time to close both of
these bugs with resolution rawhide?
any recurrence of this bug with the last few update kernels?
No recurrences here yet; it still appears to be fixed. I'm currently using
2.6.10-1.1076_FC4, haven't had a chance to reboot to today's rawhide kernel
(1087) yet. Rest assured you'll hear about it if I do see the bug again ;-)
ok, I'll close this, feel free to reopen if it reoccurs.
> any reoccurance of this bug with the last few update kernels ?
I am not sure whether bug #145021 is a new instance of the same
problem or something new.