Bug 128154 - cannot spawn new pseudo tty (xterm, gnome-terminal, ssh)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Dave Jones
QA Contact:
URL:
Whiteboard:
Duplicates: 126772 127902 128346 128558 129416 130595 131214 133128 135051
Depends On:
Blocks: FC3Target FC3BugWeekTracker FC4Target
Reported: 2004-07-19 13:33 UTC by James Laska
Modified: 2015-01-04 22:08 UTC
CC List: 36 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-10-06 05:23:18 UTC
Type: ---
Embargoed:


Attachments
strace xterm (53.95 KB, text/plain)
2004-07-19 13:35 UTC, James Laska
A testcase to show the memory leak and (276 bytes, text/plain)
2004-09-21 16:43 UTC, H.J. Lu

Description James Laska 2004-07-19 13:33:44 UTC
KERNEL: 2.6.7-1.492
ARCH: i386

I can log in fine from any physical console tty1-7.  However, after my
machine has been up for a while, and I have several terminals open
(some with multiple tabs), my display hangs for a while (~1 min) when
I attempt to open a new terminal or a tab in an existing terminal.  The
same hang occurs if I try to ssh into my machine.

I have straced xterm in an attempt to show where the bottleneck is
occurring.  I will attach the output after filing the bug.

I currently have the following tty's in use:

$ finger
Login     Name             Tty      Idle  Login Time   Office     Office Phone
jlaska    James A. Laska  *:0             Jul 19 07:39
jlaska    James A. Laska   pts/0    1:06  Jul 19 07:42 (:0.0)
jlaska    James A. Laska   pts/1       4  Jul 19 07:54 (:0.0)
jlaska    James A. Laska   pts/2    1:06  Jul 19 07:55 (:0.0)
jlaska    James A. Laska   pts/3       8  Jul 19 07:55 (:0.0)
jlaska    James A. Laska   pts/4          Jul 19 08:36 (:0.0)

Please let me know if there is additional information I can provide to
assist in debugging.

Comment 1 James Laska 2004-07-19 13:35:00 UTC
Created attachment 102033
strace xterm

Comment 2 Warren Togami 2004-07-19 14:15:09 UTC
This may be related to the strange behavior that I have seen with
gnome-terminal for about the past 1.5 weeks while using rawhide.  When
I open a new gnome-terminal window, or a new tab in an existing
session, something uses 100% CPU and the system appears to be
deadlocked for a while (mouse unable to move).  After roughly 30-60
seconds the system returns, and CPU usage goes back down.
gnome-terminal does open a new terminal or tab, but the shell fails to
start.  Running konsole behaves similarly, with 100% CPU briefly and a
failed shell startup, but the entire system does not lock up the way
it does with gnome-terminal.

In order to recover, I only need to fully close all gnome-terminal
sessions within my GNOME session.  Then newly opened terminals work.

What seems to trigger this behavior is something that happens during
certain RPM upgrades from rawhide, or the daily prelink.

Also note that the 100% CPU usage when attempting to start a terminal
does not appear on top or ps output.  This may be an indication that
something is rapidly creating a new process or thread, and it dies
quickly.

Comment 3 Warren Togami 2004-07-19 14:17:54 UTC
One thing that I have not tried yet:
It would be good if we could better isolate the problem by trying the
latest FC2 update kernel with rawhide userspace.  If it is a kernel
problem like we suspect, then this problem should go away completely
by using only the FC2 kernel.

Comment 4 James Laska 2004-07-19 14:59:58 UTC
warren: the problem you described EXACTLY matches what I've been
seeing the last week or so from rawhide.  I can try rolling back to
the latest stable FC2 kernel ... however it will take a bit since this
is my primary work machine.  I will post back with results...

Comment 5 James Laska 2004-07-22 12:54:25 UTC
warren:  I'm currently running with the FC2 updated kernel
2.6.6-1.435.2.3 and all other packages from rawhide-latest.  I will
post if I encounter the issue again later today.

Comment 6 Warren Togami 2004-07-23 01:28:09 UTC
Try this procedure:
1) Boot the FC2 kernel.
2) Start a gnome-terminal.
3) From a VT, erase /etc/prelink.cache
4) Run /etc/cron.daily/prelink
Allow it to complete.

At this point I suspect this will trigger the problem.  See whether
the behavior differs between the FC2 and FC3 kernels.

Comment 7 Warren Togami 2004-07-23 02:57:39 UTC
*** Bug 128346 has been marked as a duplicate of this bug. ***

Comment 8 James Laska 2004-07-23 17:48:03 UTC
Nice reproducer.  I am hitting this, or a similar problem, after making
the above prelink changes on 2.6.6-1.435.2.3 (FC2).  The gnome-terminal
hangs and does not refresh the screen.  Any new gnome-terminals hang as
well.  However, after several minutes the terminal comes back.  This
is slightly different from what I am seeing on the rawhide-latest
kernel in that when the terminal finally became available, there was
no shell on the terminal.

So I'm not certain this is the exact problem.  I will move back up to
FC3 kernel and see if the prelink changes trigger the problem immediately.

Comment 9 Ray Strode [halfline] 2004-07-26 16:59:21 UTC
*** Bug 128558 has been marked as a duplicate of this bug. ***

Comment 10 pascal kolijn 2004-07-27 07:15:13 UTC
I've even experienced it when I try to log in remotely (ssh) to the
box: I get no shell. But the ssh tunnels are there.....

My kernel is rawhide 2.6.7-1.494smp, and I'm not sure whether it is
prelink related; prelink is a nightly job, right? It seems to me as if
the ptys are all used up at a certain moment!

Comment 11 Mihai Ibanescu 2004-07-29 17:54:42 UTC
Trying to start xterm from a terminal will print:

xterm: Error 32, errno 2: No such file or directory
Reason: get_pty: not enough ptys


Stracing it:

...
open("/dev/ptyeb", O_RDWR)              = -1 ENXIO (No such device or address)
open("/dev/ptyec", O_RDWR)              = -1 ENXIO (No such device or address)
open("/dev/ptyed", O_RDWR)              = -1 ENXIO (No such device or address)
open("/dev/ptyee", O_RDWR)              = -1 ENXIO (No such device or address)
open("/dev/ptyef", O_RDWR)              = -1 ENXIO (No such device or address)
write(2, "xterm: Error 32, errno 2: ", 26xterm: Error 32, errno 2: ) = 26
open("/usr/share/locale/locale.alias", O_RDONLY) = 5
fstat(5, {st_mode=S_IFREG|0644, st_size=2528, ...}) = 0
...

Comment 12 Mihai Ibanescu 2004-07-29 19:22:19 UTC
Actually, the place where it goes haywire is:

close(5)                                = 0
open("/dev/ptmx", O_RDWR)               = -1 EIO (Input/output error)
open("/dev/ptyp0", O_RDWR)              = -1 ENXIO (No such device or address)
open("/dev/ptyp1", O_RDWR)              = -1 ENXIO (No such device or address)


/dev/ptmx normally opens just fine, and then the code path is different:
it gets /dev/pts and then goes on its merry way.

/me keeps searching

Comment 13 Barry K. Nathan 2004-07-30 00:02:57 UTC
2.6.8-rc2-mm1 (and, more recently, 2.6.8-rc2-bk8) have a patch to fix
some kind of pty-related leak. I don't know if that's the fix for this
bug, but it could be.

Comment 14 Alexandre Oliva 2004-08-01 22:27:45 UTC
FWIW, I don't seem to run into this problem with kernel 1.488, but
1.494, 1.499 and 1.501 all fail in the same way as described above.

Comment 15 Rik van Riel 2004-08-01 23:57:42 UTC
1.499 and newer have 2.6.8-rc2-bk8, so they should have the patch
mentioned by Barry.  I'm trying to reproduce the problem here (using a
script to log in, do stuff and log out again) but I'm not seeing any
use of ptys beyond the needed ones.

This is with kernel 1.499

Comment 16 Barry K. Nathan 2004-08-02 07:11:42 UTC
I saw this problem on PowerPC months before seeing it on x86. The PPC
machine in question is no longer mine (for the foreseeable future
anyway) and is now running Mac OS X for its new duties. But, when I
had it, I had this problem with (I think) kernel 1.456. Unfortunately
I didn't have a chance to report it at the time.

(It's possible that I had the problem with an even earlier kernel,
possibly as early as 1.422. I don't remember for sure; what I am
absolutely certain about, however, is that the problem appeared before
1.460.)

On a related note, comment 11 of bug 127902 may be worth looking at.
(Maybe someone should copy-and-paste it into this bug, in fact.)

Comment 17 Mihai Ibanescu 2004-08-02 20:26:46 UTC
Still happens with kernel 1.499

[misa@abulafia python]$ python -c 'open("/dev/ptmx", "rw")'
Traceback (most recent call last):
  File "<string>", line 1, in ?
IOError: [Errno 5] Input/output error: '/dev/ptmx'

[misa@abulafia python]$ uname -a
Linux abulafia.devel.redhat.com 2.6.7-1.499 #1 Wed Jul 28 12:11:10 EDT 2004 x86_64 x86_64 x86_64 GNU/Linux

[misa@abulafia python]$ uptime
 16:25:24 up 3 days, 20:22, 29 users,  load average: 0.45, 0.49, 0.36

No easy way to reproduce this bug, other than leaving the box running
for about 3 days.

Just confirmed that my other rawhide box running 499 is having the
same problem.

The weird problem is that the problem goes away after gnome-terminal
is restarted.
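
For anyone wanting to check a box for this state without starting a
terminal, the one-liner above can be stretched into a small probe. This
is only a sketch: probe_pty_master is a made-up helper name, and all it
does is try to open the given path and report the errno name, so EIO on
/dev/ptmx would correspond to the failure shown in this comment.

```python
import errno
import os


def probe_pty_master(path="/dev/ptmx"):
    """Try to open the pty multiplexer device and report what happened.

    Returns (True, None) on success, or (False, errno_name) on failure.
    """
    try:
        fd = os.open(path, os.O_RDWR | os.O_NOCTTY)
    except OSError as exc:
        # Map the numeric errno to its symbolic name (e.g. "EIO", "ENOENT").
        return False, errno.errorcode.get(exc.errno, str(exc.errno))
    # The open succeeded; release the master immediately.
    os.close(fd)
    return True, None


if __name__ == "__main__":
    ok, err = probe_pty_master()
    print("ptmx open ok" if ok else f"ptmx open failed: {err}")
```

Run from cron or by hand, this gives a quick yes/no on whether new ptys
can still be allocated, without opening a terminal window.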


Comment 18 Barry K. Nathan 2004-08-02 23:24:39 UTC
> The weird problem is that the problem goes away after gnome-terminal
> is restarted.

Heh... not for me it doesn't. :( In my case I can certainly kill X and
that doesn't cure the problem. IIRC I can even go from runlevel 5 to 3
(so X doesn't restart) and the pty's will still stay unavailable.

Comment 19 Alexandre Oliva 2004-08-03 01:50:41 UTC
On comment #14, I said I didn't get the problem with 1.488.  I lied. 
Just got it for the first time.

Comment 20 Jens Petersen 2004-08-05 13:14:59 UTC
*** Bug 126772 has been marked as a duplicate of this bug. ***

Comment 21 Bill Nottingham 2004-08-05 18:27:48 UTC
*** Bug 127902 has been marked as a duplicate of this bug. ***

Comment 22 Michal Jaegermann 2004-08-05 22:12:46 UTC
See also bug #127048 with a list of quite detailed reports in it.
It looks remarkably similar.

In short the only "cure" I found is to get down to logins only
on a console, unmount /dev/pts and mount it again; even that is
only a temporary solution.  On my rig this is really a killer bug
but I did not run into it on an x86 installation.

Comment 23 G.Wolfe Woodbury 2004-08-08 20:17:50 UTC
Just got bit by this bug with FC3t1 kernel 509 on a rawhide install
from 2004-08-07 (clean rawhide install)

Comment 24 Geoff Reedy 2004-08-09 12:46:55 UTC
I've finally gotten the bug to happen while I've got a kernel running
with some printk's to try and diagnose this thing.  Wherever someone
could be returning EIO to tty_open (and thus passing it back to user
space) I put in a printk to show what condition was causing the
problem.

My logs show the following:

Aug  8 11:07:47 localhost kernel: pty_open failed with -EIO closed: 0 lock: 0 count: 0
Aug  8 15:45:38 localhost kernel: init_dev returning -EIO because tty->count != 0, instead it was 1
Aug  8 15:45:38 localhost kernel: tty_open init_dev failed with -5 the id was 0

Here is my guess at what is happening:

The first thing I have in my logs is that pty_open is returning EIO
because tty->link->count is 0 instead of 1.  I don't recall trying
to get a new pty at the time that event is logged (if I were trying to
get one, this whole explanation falls apart, but the conclusions still
hold, which is kind of strange), so let's assume that pty_open was
called opening something other than /dev/ptmx (perhaps the slave end,
which happens whenever a new program is started on a given pseudo
terminal).  This return value gets back to tty_open, which reacts to any
failure code by de-allocating the pty id specified in the index
variable, even if one was not allocated in this call.

On subsequent attempts to open /dev/ptmx idr_get_new happily returns
an id of zero since it has been previously freed.  init_dev asks the
devpts driver if the tty already exists.  Since we are dealing with
/dev/pts/0 now and I still have my terminal window open it looks at
the existing tty object.  Pty masters are restricted to only be opened
once (otherwise people might somehow be able to inject keystrokes or
something like that) and gnome-terminal still has the master for
/dev/pts/0 open so the check fails.

There are a couple of conclusions here:

1. Something is broken if pty_open is hitting the condition it is.
You can only call pty_open on a master pty once, in which case
tty->link will be the slave and it should only have a count of one
from init_dev incrementing the slave's count.  If you are pty_opening
the slave, tty->link is the master, which should only be able to be
opened once.

2. The part of tty_open that calls idr_remove on a failure should
probably have an if (device == MKDEV(TTYAUX_MAJOR)) around it so that
it is called only if idr_get_new was also called on that invocation of
tty_open.  Just doing this should fix the symptom we're seeing, and is
necessary since things will also get messed up if for some reason the
console device fails to open or something like that.  But someone
should try and figure out why tty->link->count went to zero.

3. A work around to allow you to open more terminals when you get bit
by this bug:  Close whatever is using /dev/pts/0.  Then when
idr_get_new comes up with id 0 it is actually unused and things will
work normally until the bug happens again.
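
The sequence guessed at above can be played through with a toy model.
This is not kernel code: ToyPtyTable and its methods are hypothetical
names, and the model assumes only what this comment describes (an index
allocator that hands out the lowest free id, a master that may be opened
only once, and a failure path that frees an id it did not allocate).

```python
import errno


class ToyPtyTable:
    """Toy model of pty index allocation as described in this comment."""

    def __init__(self):
        self.allocated = set()     # ids the allocator believes are taken
        self.open_masters = set()  # ids whose master side is really open

    def _lowest_free(self):
        index = 0
        while index in self.allocated:
            index += 1
        return index

    def open_ptmx(self):
        """Model opening /dev/ptmx: allocate an id, then open its master."""
        index = self._lowest_free()
        self.allocated.add(index)
        if index in self.open_masters:
            # The master is already open elsewhere, so the open fails and
            # the failure path frees the id again, ready to be reissued.
            self.allocated.discard(index)
            raise OSError(errno.EIO, "Input/output error")
        self.open_masters.add(index)
        return index

    def failed_slave_open(self, index, buggy=True):
        """Model a failed open of an existing slave such as /dev/pts/0."""
        if buggy:
            # Suspected bug: the id is freed although this call never
            # allocated it, while its master is still open.
            self.allocated.discard(index)

    def close(self, index):
        """Model the real owner closing its terminal (workaround 3)."""
        self.open_masters.discard(index)
        self.allocated.discard(index)


def demo(buggy):
    table = ToyPtyTable()
    first = table.open_ptmx()    # id 0
    table.open_ptmx()            # id 1
    table.failed_slave_open(first, buggy=buggy)
    try:
        return ("ok", table.open_ptmx())
    except OSError as exc:
        return ("error", exc.errno)
```

In the model, demo(buggy=True) fails with EIO on every later open until
the owner of id 0 calls close(0), which matches workaround 3;
demo(buggy=False) simply hands out the next id.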

Comment 25 H.J. Lu 2004-08-09 15:31:21 UTC
*** Bug 129416 has been marked as a duplicate of this bug. ***

Comment 26 Michal Jaegermann 2004-08-09 17:10:51 UTC
> 3. A work around to allow you to open more terminals when you get
> bit by this bug:  Close whatever is using /dev/pts/0.

My empirical observations seem to indicate that this is not enough.
If something is holding open /dev/pts/0 then obviously you cannot
unmount devpts.  But closing it is not enough.  Only unmounting
/dev/pts and mounting it again clears the condition.  At least for
a while.  OTOH I could not get anything unusual with 'lsof'; but
all that probably means that something is indeed miscounted and that
this status persists.  Just remounting /dev/pts is insufficient -
at least in my case.

Comment 27 Geoff Reedy 2004-08-09 17:23:58 UTC
Oh yeah, specifically closing /dev/pts/0 was specific to my situation.  I just had to close 
that one, I could leave all my other terminals open and did not have to unmount /dev/pts.

Perhaps there is a slightly different but related problem that involves the devpts 
filesystem.

As far as I can tell from the code and my logs the pty that has been mistakenly marked as 
available needs to be closed.  Without having extra logging in the kernel there is no way to 
know which one this is.  To totally clear the problem (at least until it spontaneously 
happens again) you would have to close all the terminals to be sure to get the one that is 
hung up.  Although closing any terminals before the one that is incorrectly marked as free 
should allow you to open exactly that many terminals before encountering the problem 
again.

Comment 28 Michal Jaegermann 2004-08-09 20:00:51 UTC
Just to be sure I ran 'strace xterm' on x86_64; with kernel
2.6.7-1.509 this time.  It is remarkably similar to what James
put in an attachment from comment #1.  One sees

open("/dev/ptmx", O_RDWR)               = -1 EIO (Input/output error)

and it is downhill from that moment on.  So I do not see much point
in dropping it here too; but if somebody thinks that this would be
a good idea for completeness then give a shout and I will add it
to this report.

Comment 29 Michal Jaegermann 2004-08-13 20:34:44 UTC
There is something here
ftp://ftp.linux.org.uk/pub/people/viro/ptmx-delta
Not tested, and I did not even look yet how it fits inot
Fedora kernels, but it seems to have the right smell. :-)

Comment 30 Michal Jaegermann 2004-08-13 20:37:37 UTC
Of course s/inot/into/ above.  Reading before hitting "Commit"
has some advantages.

Comment 31 Barry K. Nathan 2004-08-15 03:10:37 UTC
The patch mentioned in comment #29 is now in kernel-2.6.8-1.520. (I
just looked at several previous kernel SRPMS; the patch is present
from 515 forward, but not in 509.)

Comment 32 Michal Jaegermann 2004-08-15 03:23:50 UTC
kernel-2.6.8-1.520 is a test kernel for FC2 (i.e. a different compiler)
but indeed so far I was not hit by the problem with kernel-2.6.8-1.517.
I am afraid that I cannot say one way or another about 2.6.8-1.515.

Comment 33 Ray Strode [halfline] 2004-08-23 20:32:44 UTC
*** Bug 130595 has been marked as a duplicate of this bug. ***

Comment 34 Detmar Meurers 2004-08-26 16:44:09 UTC
With kernel-2.6.8-1.521 I'm getting what looks like it might be the
same bug:

I'm trying to run code which opens a pty in the following way, which 
used to work fine:

int get_tty(void) {
  static unsigned char ptyc3[] = "pqrstuvwxyz";
  static unsigned char ptyc4[] = "0123456789abcdef";
  unsigned char *s3, *s4;  

  int ourpty = -1;
  for (s3 = ptyc3; *s3 != 0; s3++) {
    for (s4 = ptyc4; *s4 != 0; s4++) {
      ptynam[8] = ttynam[8] = *s3;
      ptynam[9] = ttynam[9] = *s4;
      if ((ourpty = open(ptynam,O_RDWR)) >= 0) {

The call to open in the last line now always fails with errno 6 (No
such device or address).

Any ideas for a permanent or temporary fix would be much appreciated.



Comment 35 Barry K. Nathan 2004-08-26 22:40:24 UTC
Re: comment #34

Are you saying that your problem happens:
+ always with 521 and
+ never with previous kernels? (i.e. 521 has a regression for you)

Or are you saying:
+ it's been happening intermittently (or always) with the last few kernels
+ and 521 fails to fix it?


Comment 36 Detmar Meurers 2004-08-26 23:00:02 UTC
The latter.

Following a helpful suggestion Misa sent me, I've since rewritten my
code (which e.g. on SuSe 9.0 worked fine) to use the Unix98 setup
(opening /dev/ptmx, then calling ptsname) instead of probing for
/dev/ptyXY myself. That works, apparently independent of the kernel I use.

I've also reverted from 2.6.8-1.521 to the old 2.6.6-1.435 though
since ssh was hanging for me when using the 521 kernel and I haven't
had time to boil down what the problem is there.
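
A rough sketch of the Unix98-style allocation, for comparison with the
/dev/ptyXY probing loop in comment #34. It is written in Python rather
than C and uses os.openpty(), which leaves the choice of mechanism to
the C library instead of spelling out the /dev/ptmx and ptsname steps;
open_unix98_pty is a made-up helper name, and this is not the rewritten
code referred to above.

```python
import os


def open_unix98_pty():
    """Ask the system for a pty pair instead of probing /dev/ptyXY names.

    Returns (master_fd, slave_fd, slave_name), or None if no pty could
    be allocated.
    """
    try:
        master_fd, slave_fd = os.openpty()
    except OSError:
        # Allocation failed, e.g. with the EIO seen in this bug.
        return None
    try:
        slave_name = os.ttyname(slave_fd)
    except OSError:
        # Could not resolve the slave's device name; clean up and give up.
        os.close(master_fd)
        os.close(slave_fd)
        return None
    return master_fd, slave_fd, slave_name


if __name__ == "__main__":
    pair = open_unix98_pty()
    if pair is None:
        print("no pty available")
    else:
        print("allocated", pair[2])
        os.close(pair[0])
        os.close(pair[1])
```

The caller never constructs device names itself, so nothing depends on
which legacy /dev/ptyXY nodes exist.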






Comment 37 Colin Walters 2004-08-30 01:37:19 UTC
I can definitely reproduce this with kernel-2.6.8-1.521; I just
upgraded my FC2 server to it, and it seems to happen within a day or so.

In particular, running screen and trying to allocate a new pty in it
seems to more or less immediately kill the box (well, its ability to
allocate new ptys anyway).

People have been mentioning 520 fixes it - is the patch not in 521?

Comment 38 Warren Togami 2004-08-30 01:54:38 UTC
I haven't had this problem for a while running 525+.  Tried that?

Comment 39 Warren Togami 2004-08-30 05:01:30 UTC
*** Bug 131214 has been marked as a duplicate of this bug. ***

Comment 40 Boris Glawe 2004-09-02 15:47:03 UTC
switching to runlevel 1 and back to 5 is a workaround. Opening one or
more terminals/ptys will work again, after having returned to the
previous runlevel. Maybe this information helps in finding the problem !?

greets

Comment 41 Mike Soh 2004-09-19 11:50:07 UTC
I'm running kernel 2.6.8.1

Still having this problem.  I found some leads to changing some
CONFIG_PTY_COUNTS (or something like that) setting.  I couldn't find
it in my .config.

I'm running Fedora Core 2 with a vanilla kernel 2.6.8.1

The only resolution that I've been able to find is to restart X.

Comment 42 Doncho Gunchev 2004-09-19 12:25:02 UTC
I had this problem with FC3t1, but it vanished in the last 3-5 days. I
use kernel 2.6.8-1.533 (540 and 541 do not boot on my smp pc, but I
guess I have to mkinitrd for them).

Comment 43 Barry K. Nathan 2004-09-19 13:22:50 UTC
Re: comment #41

It might be interesting to see if this problem still happens in
2.6.9-rc2 or later.

Comment 44 Warren Togami 2004-09-19 21:07:49 UTC
This has been fixed a while ago as I have noted in Comment #38. 
vanilla kernels are NOT supported and you are on your own.

Comment 45 H.J. Lu 2004-09-21 15:14:45 UTC
Could someone please take a look at bug 132617 and bug 132621?
At least one of them looks exactly the same as this bug.
There is a testcase with a patch in each bug report. You can run
the testcases to see if the bugs have been fixed or not.

Comment 46 Michal Jaegermann 2004-09-21 16:34:10 UTC
> Could someone please take a look at bug 132617 and bug 132621.
Hm, got "not authorized" on both so it is hard to take a look.

Comment 47 H.J. Lu 2004-09-21 16:43:15 UTC
Created attachment 104074
A testcase to show the memory leak and 

Do

# gcc x.c -lutil
# ./a.out

The machine will slowly lose memory until it runs out of memory and
stops responding. It will also cause the machine to refuse ssh and
telnet logins.
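
The attachment itself is not reproduced in this report. As a loose
illustration of the kind of load involved, and assuming (from the -lutil
link flag) that the testcase exercises pty allocation, a bounded
open/close loop might look like the sketch below. pty_churn is a
hypothetical name and this is not H.J. Lu's testcase; unlike a loop
that runs until the machine stops responding, it halts after a fixed
number of iterations.

```python
import os


def pty_churn(iterations=1000):
    """Open and immediately close pty pairs; return how many succeeded."""
    succeeded = 0
    for _ in range(iterations):
        try:
            master_fd, slave_fd = os.openpty()
        except OSError:
            # Allocation started failing; stop rather than keep hammering.
            break
        os.close(slave_fd)
        os.close(master_fd)
        succeeded += 1
    return succeeded


if __name__ == "__main__":
    print("pty pairs opened and closed:", pty_churn())
```

Watching free memory before and after a run would be one way to see
whether a given kernel still leaks on this path.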

Comment 48 H.J. Lu 2004-09-23 16:35:20 UTC
When I rlogin into a machine running 1-584 kernel, kernel reports

Sep 23 09:35:39 gnu-64 login: FATAL: can't reopen tty: No such file or directory
Sep 23 09:35:41 gnu-64 pam_rhosts_auth[2701]: allowed to hjl.intel.com as hjl
Sep 23 09:35:41 gnu-64 login: FATAL: can't reopen tty: No such file or directory
Sep 23 09:35:47 gnu-64 su(pam_unix)[2704]: session opened for user root by hjl(uid=500)
Sep 23 09:36:06 gnu-64 pam_rhosts_auth[2725]: allowed to hjl.intel.com as hjl
Sep 23 09:36:06 gnu-64 login: FATAL: can't reopen tty: No such file or directory
Sep 23 09:36:07 gnu-64 pam_rhosts_auth[2727]: allowed to hjl.intel.com as hjl
Sep 23 09:36:07 gnu-64 login(pam_unix)[2728]: session opened for user hjl by (uid=0)
Sep 23 09:36:07 gnu-64 login -- hjl[2728]: LOGIN ON pts/1 BY hjl FROM gnu-d

I can only rlogin after a few tries:
gnu-d:pts/12[2]> rlogin gnu-64
rlogin: connection closed.
gnu-d:pts/12[2]> rlogin gnu-64
rlogin: connection closed.
gnu-d:pts/12[2]> rlogin gnu-64
rlogin: connection closed.
gnu-d:pts/12[2]> rlogin gnu-64
Last login: Thu Sep 23 09:35:18 from gnu-d



Comment 49 Jef Spaleta 2004-09-28 03:07:15 UTC
Okay, I'm somewhat conflicted about this, but I'm going to set this
report to reassign because I think comment 47 is important enough
that the right people should at least see it; it was posted after the
bug was marked modified.

I ran the test code in comment 47 and I definitely saw memory leaking.
This issue probably needs to be addressed, but I'm not sure whether it
needs to be refiled under glibc or stay as kernel. Considering it's
just opening/closing ptys, it certainly seems related to the initially
reported symptoms.  I'll let someone more knowledgeable make a final
determination as to what to do next with this report. But I think
comment 47 deserves a review. With kernel 451 in normal usage
situations I can't reproduce the initially reported problem.

-jef


Comment 50 Ivica Milosevic 2004-09-29 03:59:27 UTC
Hi,

I have the same problem when I try to ssh. When I straced the sshd
process I saw an I/O error on /dev/ptmx. Then I ran ssh machine
/bin/bash -i and there was one screen, but the system seems to have
forgotten about its pts, so I ran fuser -k /dev/pts/1. Then everything
worked again. It seems that the system forgets about some pts-s, and
when someone tries to ssh (or use anything else which needs pts-s) it
tries to allocate a pts which is in use, and there we go: I/O error...

Comment 51 Ray Strode [halfline] 2004-10-08 13:46:10 UTC
*** Bug 135051 has been marked as a duplicate of this bug. ***

Comment 52 Bryan Wright 2004-10-14 18:39:07 UTC
For what it's worth, I'm currently running into this problem with
a machine running the 2.6.8-1.521 kernel.  The machine is a laptop,
and sometimes when I boot it and log in, xterms just won't start.
The error in .xsession-errors is "get_pty: not enough ptys".  After
a reboot, things will be fine.  

I've been configuring this laptop over the last week or so, and 
I've rebooted it dozens of times.  I've only run into the problem 
twice, but in both cases the only solution I've found is a reboot.

When it's there, the problem seems to be present immediately 
after booting.  (I.e., it's not something that starts happening
after I've been logged in for a while.)

A solution or reliable work-around would be much appreciated.

Comment 53 Boris Glawe 2004-10-14 18:56:13 UTC
@ Bryan Wright:
I've posted a workaround above. You can switch to runlevel 1 and back
to runlevel 5 (or whatever runlevel you have previously been in).
Command: "init 1" and "init 5" as root. This is very fast in
comparison to a reboot.

Comment 54 HR 2004-10-20 10:05:50 UTC
This just happened on my FC2 release-yummified with devel kernel
2.6.8-1.541 (installed due to comments #38 and #44), so I would
definitely say that 525+ does NOT solve the problem unless 541 has
reintroduced it.

/var/log/secure:
Oct 20 10:09:56 slugger sshd[6283]: error: openpty: No such file or directory
Oct 20 10:09:56 slugger sshd[6285]: error: session_pty_req: session 0 alloc failed

The server is close to idle, meaning it's up, running a range of server
services along with Xorg+gnome and such, and cruising at load 0.0x.
Since I only have remote access during work hours, I couldn't check
any status with regard to open pts's and whatnot. I need access to
the server from work, so I had to ask someone at the location to flip
the PSU switch. The person in question had just logged into gnome at
the console, which may very well be what triggered the problem. Logins
through ssh worked fine a few hours ago.

I often get this message in messages during boot, after rc5 is complete:

Oct 15 23:06:48 slugger init: open(/dev/pts/0): No such file or directory

AFAICT the only thing starting up at this point would be X...

The uptime of the server had been less than 12 hours due to what I
expect was a deadlock issue with cryptoloop, so this bug really blows.
(Yes I know cryptoloop blows too, but it's the only released crypto
alternative, ain't it...?) Any solution requiring console access is
worthless to me, and to anyone else using FC2 in a server setup, I reckon.

Dunno if its related at all, but the server was subject to a series of
login(breakin) attempts on ssh during the night.

Comment 55 Greg Hartman 2004-10-20 11:21:03 UTC
I've been using FC2 in a location where I don't have console access.
I've been working around the problem with ssh root@<server> reboot.

Apparently, ssh doesn't allocate a pty if it has a command to run. I
suspect that I could get a shell by doing the same with /bin/bash
instead of reboot.

I admit that rebooting the machine twice a day isn't a very good
solution...
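
A small sketch of the same trick in script form. It assumes the OpenSSH
client, whose -T option disables pseudo-terminal allocation; the helper
names and the host string are placeholders, not anything from my setup.

```python
import subprocess


def ssh_command_without_pty(host, remote_command):
    """Build an ssh argv that runs one command and requests no pty."""
    # -T tells the OpenSSH client not to allocate a pseudo-terminal.
    return ["ssh", "-T", host, remote_command]


def run_without_pty(host, remote_command, timeout=30):
    """Run the command remotely and return the CompletedProcess."""
    return subprocess.run(
        ssh_command_without_pty(host, remote_command),
        capture_output=True,
        text=True,
        timeout=timeout,
    )


if __name__ == "__main__":
    # "root@server" is a placeholder host; uptime is a harmless command.
    print(ssh_command_without_pty("root@server", "uptime"))
```

Since no pty is requested on the remote side, this path should keep
working even when pty allocation on the server is broken.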

Comment 56 Satish Balay 2004-10-21 14:15:53 UTC
The test code from comment #47 appears to work (without hogging system
memory) on kernel-2.6.9-1.639. (run more than 15 min)

perhaps this issue is fixed now? <assuming the test code is a proper
indicator of the original bug reported>

Comment 57 Oliver Falk 2004-10-27 08:11:03 UTC
I had a similar problem, after upgrading kudzu, hal, udev, some
dependency that was resolved by yum... Please note, that I still ran
dev, instead of udev at this time. The newest udev, seems to obsolete
dev (please correct me if I'm wrong; But I believe so, since dev was
no longer installed after I upgraded...). udev mounted /dev and
afterwards the mountpoint /dev/pts was no longer accessible - as you
can guess... => ssh login => cannot allocate pty stuff.
So as I understand it, it might be good to remount /dev/pts...

Comment 58 Nalin Dahyabhai 2004-11-02 01:36:32 UTC
*** Bug 133128 has been marked as a duplicate of this bug. ***

Comment 59 H.J. Lu 2004-11-10 00:28:59 UTC
FYI, I still see the random rlogin failure under RHEL 4 beta 2 on
ia64.

Comment 60 Panu Matilainen 2005-05-06 07:44:11 UTC
I'm seeing something similar to this on RHEL 4 x86_64. After the system has been
up for some time (I typically only log in to this from console once every week
or so, so dunno how quickly it starts happening) starting new
xterm/gnome-terminal fails, yet I can ssh into the system just fine. Strace of
gnome-terminal says this:

27044 open("/dev/ptmx", O_RDWR)         = 3
27044 statfs("/dev/pts", {f_type="DEVPTS_SUPER_MAGIC", f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=255, f_frsize=4096}) = 0
27044 ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
27044 ioctl(3, TIOCGPTN, [1])           = 0
27044 stat("/dev/pts/1", {st_mode=S_IFCHR|0600, st_rdev=makedev(136, 1), ...}) = 0
27044 statfs("/dev/pts/1", {f_type="DEVPTS_SUPER_MAGIC", f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=255, f_frsize=4096}) = 0
27044 ioctl(3, TIOCSPTLCK, [0])         = 0
27044 ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
27044 ioctl(3, TIOCGPTN, [1])           = 0
27044 stat("/dev/pts/1", {st_mode=S_IFCHR|0600, st_rdev=makedev(136, 1), ...}) = 0
27044 open("/dev/pts/1", O_RDWR|O_NOCTTY) = -1 EACCES (Permission denied)

..which seems to be the problem. 'mount -o remount /dev/pts' seems to have cured
the thing for now.

Comment 61 Kenny Yu 2005-06-01 13:37:36 UTC
FC3 for x86_64 was fine after the initial installation, but after upgrading
OpenSSH this problem shows up once the system has been up for several hours.

ssh can execute any command on the FC3 host, but dies with the following message
for a simple ssh login.

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.

After examination I found that the directory /dev/pts was gone for no reason.

My current solution is:

ssh root@FC3_host "mkdir /dev/pts;mount /dev/pts"

and everything goes back to normal.
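
To notice the missing mount before logins start failing, something
along these lines could be run periodically. It is a sketch that
assumes the usual /proc/mounts layout (device, mount point, filesystem
type, options, ...); devpts_mounted and check_system are made-up names.

```python
def devpts_mounted(mounts_text, mount_point="/dev/pts"):
    """Return True if the mounts listing shows devpts on mount_point."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # fields: device, mount point, filesystem type, options, ...
        if len(fields) >= 3 and fields[1] == mount_point and fields[2] == "devpts":
            return True
    return False


def check_system(mounts_path="/proc/mounts"):
    """Check the running system; returns None if the file is unreadable."""
    try:
        with open(mounts_path) as handle:
            return devpts_mounted(handle.read())
    except OSError:
        return None


if __name__ == "__main__":
    print("devpts mounted on /dev/pts:", check_system())
```

A False result would be the cue to recreate and mount /dev/pts as above.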


Comment 62 Maria Ellison 2005-06-03 19:53:44 UTC
I'm still plagued by this problem, running FC2 2.6.8.1. The workarounds 
detailed thus far are too user intrusive for my environment. Are there any less 
intrusive workarounds now?  Or is a fix on the near horizon?

If I kill processes associated with any pts's, umount /dev/pts, mount /dev/pts, 
the problem does go away for a while.



Comment 63 Dave Jones 2005-10-06 02:37:25 UTC
Is this still a problem in the current rawhide kernel?

Comment 64 Mihai Ibanescu 2005-10-06 02:59:27 UTC
WORKSFORME on FC3/FC4/Rawhide.

Comment 65 Michal Jaegermann 2005-10-06 03:23:45 UTC
I did not see that for quite a while on various machines around.

Comment 66 Alexandre Oliva 2005-10-06 05:19:22 UTC
I think it's gone, indeed.

Comment 67 Michael L. Artz 2006-06-06 03:00:52 UTC
I'm getting this bug (ssh openpty error) on RHEL4 AS Update 2 ... should it be
fixed by now?

Comment 68 Barry K. Nathan 2006-06-06 09:10:58 UTC
Well, just FWIW, I used to be one of the people who saw this very frequently,
but I haven't ever seen it with RHEL 4 Update 1 or 2. (I don't remember whether
I ever saw it with RHEL before Update 1, but I definitely haven't seen it since
Update 1.)

Comment 69 Michael L. Artz 2006-06-06 13:02:34 UTC
Ah, should have also mentioned that it's on x86_64 (SMP Opteron) and the remount
trick didn't work.

