Bug 128154
Summary: | cannot spawn new psuedo tty (xterm, gnome-terminal, ssh)
---|---
Product: | [Fedora] Fedora
Component: | kernel
Version: | rawhide
Hardware: | All
OS: | Linux
Status: | CLOSED ERRATA
Severity: | medium
Priority: | medium
Reporter: | James Laska <jlaska>
Assignee: | Dave Jones <davej>
CC: | awm, balay, barryn, benny+bugzilla, byte, dgunchev, dickey, dm, geoff+fedora, george, gghartma, harkness, hongjiu.lu, jturner, ken, liblit, mattdm, mellison, michal, mihai.ibanescu, mishu, nikosx, oliva, oliver, pascal, pfrields, pmatilai, public, redwolfe, rjwalsh, rstrode, slyph, tim, tmraz, twaugh, wtogami
Doc Type: | Bug Fix
Last Closed: | 2005-10-06 05:23:18 UTC
Bug Blocks: | 123268, 133398, 136451
Description
James Laska
2004-07-19 13:33:44 UTC
Created attachment 102033 [details]
strace xterm
This may be related to the strange behavior that I have seen with gnome-terminal for about the past 1.5 weeks while using rawhide. When opening a new gnome-terminal window, or a new tab in an existing session, something uses 100% CPU and the system appears to be deadlocked for a while (mouse unable to move). After roughly 30-60 seconds the system returns, and CPU usage goes back down. gnome-terminal does open a new terminal or tab, but the shell fails to start. Attempting to run konsole is similar, with 100% CPU briefly and failed startup of the shell, but the entire system does not lock up the way it does with gnome-terminal.

In order to recover, I only need to fully close all gnome-terminal sessions within my GNOME session. Then newly opened terminals work. What seems to trigger this behavior is something that happens during certain RPM upgrades from rawhide, or the daily prelink.

Also note that the 100% CPU usage when attempting to start a terminal does not appear in top or ps output. This may be an indication that something is rapidly creating a new process or thread, and it dies quickly.

One thing that I have not tried yet: It would be good if we could better isolate the problem by trying the latest FC2 update kernel with rawhide userspace. If it is a kernel problem like we suspect, then this problem should go away completely by using only the FC2 kernel.

warren: the problem you described EXACTLY matches what I've been seeing the last week or so from rawhide. I can try rolling back to the latest stable FC2 kernel ... however it will take a bit since this is my primary work machine. I will post back with results...

warren: I'm currently running with the FC2 updated kernel 2.6.6-1.435.2.3 and all other packages from rawhide-latest. I will post if I encounter the issue again later today.

Try this procedure:
1) Boot the FC2 kernel.
2) Start a gnome-terminal.
3) From a VT, erase /etc/prelink.cache
4) Run /etc/cron.daily/prelink
Allow it to complete. At this point I suspect this will trigger the problem. See if the behavior is any different between the FC2 and FC3 kernels.

*** Bug 128346 has been marked as a duplicate of this bug. ***

Nice reproducer. I am hitting this, or a similar, problem after making the above prelink changes on 2.6.6-1.435.2.3 (FC2). The gnome-terminal will hang and not refresh the screen. Any new gnome-terminals hang as well. However, after several minutes the terminal comes back. This is slightly different from what I am seeing on the rawhide-latest kernel, in that there, when the terminal finally became available, there was no shell on the terminal. So I'm not certain this is the exact problem. I will move back up to the FC3 kernel and see if the prelink changes trigger the problem immediately.

*** Bug 128558 has been marked as a duplicate of this bug. ***

I've even experienced it if I want to log in remotely (ssh) to the box: I get no shell. But the ssh tunnels are there..... My kernel is rawhide 2.6.7-1.494smp, and I'm not sure if it is prelink related; prelink is a nightly job, right?

It seems to me as if the number of ptys at a certain moment are all used up!

Trying to start xterm from a terminal will print:
xterm: Error 32, errno 2: No such file or directory
Reason: get_pty: not enough ptys

Stracing it:
...
open("/dev/ptyeb", O_RDWR) = -1 ENXIO (No such device or address)
open("/dev/ptyec", O_RDWR) = -1 ENXIO (No such device or address)
open("/dev/ptyed", O_RDWR) = -1 ENXIO (No such device or address)
open("/dev/ptyee", O_RDWR) = -1 ENXIO (No such device or address)
open("/dev/ptyef", O_RDWR) = -1 ENXIO (No such device or address)
write(2, "xterm: Error 32, errno 2: ", 26xterm: Error 32, errno 2: ) = 26
open("/usr/share/locale/locale.alias", O_RDONLY) = 5
fstat(5, {st_mode=S_IFREG|0644, st_size=2528, ...}) = 0
...

Actually, the place where it goes haywire is:
close(5) = 0
open("/dev/ptmx", O_RDWR) = -1 EIO (Input/output error)
open("/dev/ptyp0", O_RDWR) = -1 ENXIO (No such device or address)
open("/dev/ptyp1", O_RDWR) = -1 ENXIO (No such device or address)

/dev/ptmx normally opens just fine, and then the codepath is different - it gets /dev/pts and then goes on its merry way.
/me keeps searching

2.6.8-rc2-mm1 (and, more recently, 2.6.8-rc2-bk8) have a patch to fix some kind of pty-related leak. I don't know if that's the fix for this bug, but it could be.

FWIW, I don't seem to run into this problem with kernel 1.488, but 1.494, 1.499 and 1.501 all fail in the same way as described above.

1.499 and newer have 2.6.8-rc2-bk8, so they should have the patch mentioned by Barry. I'm trying to reproduce the problem here (using a script to log in, do stuff and log out again) but I'm not seeing any use of ptys beyond the needed ones. This is with kernel 1.499.

I saw this problem on PowerPC months before seeing it on x86. The PPC machine in question is no longer mine (for the foreseeable future anyway) and is now running Mac OS X for its new duties. But, when I had it, I had this problem with (I think) kernel 1.456. Unfortunately I didn't have a chance to report it at the time. (It's possible that I had the problem with an even earlier kernel, possibly as early as 1.422. I don't remember for sure; what I am absolutely certain about, however, is that the problem appeared before 1.460.)

On a related note, comment 11 of bug 127902 may be worth looking at. (Maybe someone should copy-and-paste it into this bug, in fact.)

Still happens with kernel 1.499
[misa@abulafia python]$ python -c 'open("/dev/ptmx", "rw")'
Traceback (most recent call last):
File "<string>", line 1, in ?
IOError: [Errno 5] Input/output error: '/dev/ptmx'
[misa@abulafia python]$ uname -a
Linux abulafia.devel.redhat.com 2.6.7-1.499 #1 Wed Jul 28 12:11:10 EDT 2004 x86_64 x86_64 x86_64 GNU/Linux
[misa@abulafia python]$ uptime
16:25:24 up 3 days, 20:22, 29 users, load average: 0.45, 0.49, 0.36

No easy way to reproduce this bug, other than leaving the box running for about 3 days.

Just confirmed that my other rawhide box running 499 is having the same problem. The weird thing is that the problem goes away after gnome-terminal is restarted.

> The weird problem is that the problem goes away after gnome-terminal
> is restarted.
Heh... not for me it doesn't. :( In my case I can certainly kill X and
that doesn't cure the problem. IIRC I can even go from runlevel 5 to 3
(so X doesn't restart) and the pty's will still stay unavailable.
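
For anyone repeating the `python -c` check from earlier in this thread on a current system: Python 3 rejects the "rw" mode string, so the one-liner no longer runs as written. A minimal sketch of an equivalent probe follows. The function name `probe_ptmx` is made up for illustration, and the probe only reports what open() returns; it does not diagnose why.

```python
import errno
import os


def probe_ptmx(path="/dev/ptmx"):
    """Try to open the pty multiplexer; return "ok" or the errno name."""
    try:
        fd = os.open(path, os.O_RDWR | os.O_NOCTTY)
    except OSError as exc:
        # The failure reported in this bug shows up here as EIO.
        return errno.errorcode.get(exc.errno, str(exc.errno))
    os.close(fd)  # closing the master releases the pty again
    return "ok"


if __name__ == "__main__":
    print(probe_ptmx())
```

On a machine in the broken state described here this would be expected to print EIO, matching the strace output above.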
On comment #14, I said I didn't get the problem with 1.488. I lied. Just got it for the first time.

*** Bug 126772 has been marked as a duplicate of this bug. ***

*** Bug 127902 has been marked as a duplicate of this bug. ***

See also bug #127048 with a list of quite detailed reports in it. It looks remarkably similar. In short, the only "cure" I found is to get down to logins only on a console, unmount /dev/pts and mount it again; even that is only a temporary solution. On my rig this is really a killer bug, but I did not run into it on an x86 installation.

Just got bit by this bug with FC3t1 kernel 509 on a rawhide install from 2004-08-07 (clean rawhide install).

I've finally gotten the bug to happen while I've got a kernel running with some printk's to try and diagnose this thing. Wherever someone could be returning EIO to tty_open (and thus passing it back to user space) I put in a printk to show what condition was causing the problem. My logs show the following:

Aug 8 11:07:47 localhost kernel: pty_open failed with -EIO closed: 0 lock: 0 count: 0
Aug 8 15:45:38 localhost kernel: init_dev returning -EIO because tty->count != 0, instead it was 1
Aug 8 15:45:38 localhost kernel: tty_open init_dev failed with -5 the id was 0

Here is my guess at what is happening:

The first thing I have in my logs is that pty_open is returning EIO because tty->link->count is 0 instead of one. I don't recall trying to get a new pty at the time that event is logged (if I were trying to get one, this whole explanation falls apart, but the conclusions still hold, which is kind of strange), so let's assume that pty_open was called opening something other than /dev/ptmx (perhaps the slave end, which happens whenever a new program is started on a given pseudo terminal). This return value gets back to tty_open, which reacts to any failure code by de-allocating the pty id specified in the index variable, even if one was not allocated in this call.

On subsequent attempts to open /dev/ptmx, idr_get_new happily returns an id of zero since it has been previously freed. init_dev asks the devpts driver if the tty already exists. Since we are dealing with /dev/pts/0 now and I still have my terminal window open, it looks at the existing tty object. Pty masters are restricted to only be opened once (otherwise people might somehow be able to inject keystrokes or something like that) and gnome-terminal still has the master for /dev/pts/0 open, so the check fails.

There are a couple of conclusions here:

1. Something is broken if pty_open is hitting the condition it is. You can only call pty_open on a master pty once, in which case tty->link will be the slave and it should only have a count of one from init_dev incrementing the slave's count. If you are pty_opening the slave, tty->link is the master, which should only be able to be opened once.

2. The part of tty_open that calls idr_remove on a failure should probably have an if (device == MKDEV(TTYAUX_MAJOR)) around it so that it is called only if idr_get_new was also called on that invocation of tty_open. Just doing this should fix the symptom we're seeing, and is necessary since things will also get messed up if for some reason the console device fails to open or something like that. But someone should try and figure out why tty->link->count went to zero.

3. A workaround to allow you to open more terminals when you get bit by this bug: Close whatever is using /dev/pts/0. Then when idr_get_new comes up with id 0 it is actually unused and things will work normally until the bug happens again.

*** Bug 129416 has been marked as a duplicate of this bug. ***

> 3. A work around to allow you to open more terminals when you get
> bit by this bug: Close whatever is using /dev/pts/0.
My empirical observations seem to indicate that this is not enough.
If something is holding open /dev/pts/0 then obviously you cannot
unmount devpts. But closing it is not enough. Only unmounting
/dev/pts and mounting it again clears the condition. At least for
a while. OTOH I could not get anything unusual with 'lsof'; but
all that probably means that something is indeed miscounted and that
this status persists. Just remounting /dev/pts is insufficient -
at least in my case.
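
The sequence proposed in the printk analysis (a failed open releasing a pty id that the failing call never allocated, after which /dev/ptmx keeps being handed an id that is still in use) can be modelled in user space. The following is only a toy sketch of that hypothesis, not kernel code: the class `ToyPtyTable`, its methods and the `guard_release` flag are all invented for illustration, and it does not model the devpts behaviour discussed in the reply above.

```python
import errno


class ToyPtyTable:
    """Toy user-space model of pty id bookkeeping (hypothetical names)."""

    def __init__(self, guard_release):
        self.guard_release = guard_release  # True models the suggested fix
        self.free_ids = set()               # ids the allocator believes are free
        self.next_id = 0
        self.open_masters = set()           # ids whose master is really open

    def _get_new_id(self):
        # Stand-in for idr_get_new(): lowest id believed to be free.
        if self.free_ids:
            new_id = min(self.free_ids)
            self.free_ids.remove(new_id)
            return new_id
        new_id = self.next_id
        self.next_id += 1
        return new_id

    def open_ptmx(self):
        new_id = self._get_new_id()
        if new_id in self.open_masters:
            # A master may only be opened once, so this surfaces as EIO.
            # The error path gives back the id it just took, so the same
            # stale id is handed out again on the next attempt.
            self.free_ids.add(new_id)
            raise OSError(errno.EIO, "master already open", "/dev/ptmx")
        self.open_masters.add(new_id)
        return new_id

    def failed_slave_open(self, pty_id):
        # Models an open of /dev/pts/<id> that fails. Without the guard,
        # the id is released although this call never allocated it.
        if not self.guard_release:
            self.free_ids.add(pty_id)
        raise OSError(errno.EIO, "pty_open failed", f"/dev/pts/{pty_id}")

    def close_master(self, pty_id):
        # Workaround 3 from the analysis: really close the stale pty.
        self.open_masters.discard(pty_id)
        self.free_ids.add(pty_id)


def demo(guard_release):
    table = ToyPtyTable(guard_release)
    first = table.open_ptmx()  # e.g. a terminal holding /dev/pts/0
    try:
        table.failed_slave_open(first)
    except OSError:
        pass
    try:
        return table.open_ptmx()
    except OSError as exc:
        return errno.errorcode[exc.errno]
```

In this model `demo(False)` ends in EIO on every later open of /dev/ptmx, while `demo(True)` simply hands out the next id; calling `close_master` on the stale id also lets the unguarded table recover, which is consistent with the "close whatever is using /dev/pts/0" observation.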
Oh yeah, closing /dev/pts/0 was specific to my situation. I just had to close that one; I could leave all my other terminals open and did not have to unmount /dev/pts. Perhaps there is a slightly different but related problem that involves the devpts filesystem. As far as I can tell from the code and my logs, the pty that has been mistakenly marked as available needs to be closed. Without having extra logging in the kernel there is no way to know which one this is. To totally clear the problem (at least until it spontaneously happens again) you would have to close all the terminals to be sure to get the one that is hung up. Although closing any terminals before the one that is incorrectly marked as free should allow you to open exactly that many terminals before encountering the problem again.

Just to be sure, I ran 'strace xterm' on x86_64; with kernel 2.6.7-1.509 this time. It is remarkably similar to what James put in an attachment from comment #1. One sees
open("/dev/ptmx", O_RDWR) = -1 EIO (Input/output error)
and it is downhill from that moment on. So I do not see much point in dropping it here too; but if somebody thinks that this would be a good idea for completeness then give a shout and I will add it to this report.

There is something here
ftp://ftp.linux.org.uk/pub/people/viro/ptmx-delta
Not tested, and I did not even look yet how it fits inot Fedora kernels, but it seems to have the right smell. :-)

Of course s/inot/into/ above. Reading before hitting "Commit" has some advantages.

The patch mentioned in comment #29 is now in kernel-2.6.8-1.520. (I just looked at several previous kernel SRPMS; the patch is present from 515 forward, but not in 509.)

kernel-2.6.8-1.520 is a test kernel for FC2 (i.e. a different compiler) but indeed so far I was not hit by the problem with kernel-2.6.8-1.517. I am afraid that I cannot say one way or another about 2.6.8-1.515.

*** Bug 130595 has been marked as a duplicate of this bug. ***

With kernel-2.6.8-1.521 I'm getting what looks like it might be the same bug: I'm trying to run code which opens a pty in the following way, which used to work fine:

int get_tty(void)
{
    static unsigned char ptyc3[] = "pqrstuvwxyz";
    static unsigned char ptyc4[] = "0123456789abcdef";
    unsigned char *s3, *s4;
    int ourpty = -1;
    for (s3 = ptyc3; *s3 != 0; s3++) {
        for (s4 = ptyc4; *s4 != 0; s4++) {
            ptynam[8] = ttynam[8] = *s3;
            ptynam[9] = ttynam[9] = *s4;
            if ((ourpty = open(ptynam,O_RDWR)) >= 0) {

The call to open in the last line now always fails with errno 6 (No such device or address). Any ideas for a permanent or temporary fix would be much appreciated.

Re: comment #34
Are you saying that your problem happens:
+ always with 521 and
+ never with previous kernels?
(i.e. 521 has a regression for you)
Or are you saying:
+ it's been happening intermittently (or always) with the last few kernels
+ and 521 fails to fix it?

The latter. Following a helpful suggestion Misa sent me, I've since rewritten my code (which e.g. on SuSe 9.0 worked fine) to use the Unix98 setup (opening /dev/ptmx, then calling ptsname) instead of probing for /dev/ptyXY myself. That works, apparently independent of the kernel I use. I've also reverted from 2.6.8-1.521 to the old 2.6.6-1.435 though, since ssh was hanging for me when using the 521 kernel and I haven't had time to boil down what the problem is there.

I can definitely reproduce this with kernel-2.6.8-1.521; I just upgraded my FC2 server to it, and it seems to happen within a day or so. In particular, running screen and trying to allocate a new pty in it seems to more or less immediately kill the box (well, its ability to allocate new ptys anyways). People have been mentioning 520 fixes it - is the patch not in 521?

I haven't had this problem for a while running 525+. Tried that?

*** Bug 131214 has been marked as a duplicate of this bug. ***

Switching to runlevel 1 and back to 5 is a workaround. Opening one or more terminals/ptys will work again after having returned to the previous runlevel. Maybe this information helps in finding the problem!?
greets

I'm running kernel 2.6.8.1
Still having this problem. I found some leads to changing some CONFIG_PTY_COUNTS (or something like that) setting. I couldn't find it in my .config. I'm running Fedora Core 2 with a vanilla kernel 2.6.8.1. The only resolution that I've been able to find is to restart X.

I had this problem with FC3t1, but it vanished in the last 3-5 days. I use kernel 2.6.8-1.533 (540 and 541 do not boot on my smp pc, but I guess I have to mkinitrd for them).

Re: comment #41
It might be interesting to see if this problem still happens in 2.6.9-rc2 or later.

This has been fixed a while ago, as I have noted in Comment #38. Vanilla kernels are NOT supported and you are on your own.

Could someone please take a look at bug 132617 and bug 132621? At least one of them looks exactly the same as this bug. There is a testcase with a patch in each bug report. You can run the testcases to see if the bugs have been fixed or not.

> Could someone please take a look at bug 132617 and bug 132621.
Hm, got "not authorized" on both so it is hard to take a look.
Created attachment 104074 [details]
A testcase to show the memory leak and
Do
# gcc x.c -lutil
# ./a.out
The machine will slowly lose memory until it runs out of memory and stops responding.
It will also cause the machine to refuse ssh and telnet logins.
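
The attached x.c is not reproduced in this report. As a rough idea of the kind of loop such a testcase might contain, here is a sketch that repeatedly allocates and releases a pty pair. It is an assumption, based only on the -lutil link flag above, that the original uses openpty(); the function name `pty_churn` and the iteration count are invented for illustration.

```python
import os


def pty_churn(iterations=1000):
    """Open and close a pty pair repeatedly; return how many opens succeeded.

    Assumption: this only approximates the attached testcase, which is
    not shown in the report.
    """
    succeeded = 0
    for _ in range(iterations):
        try:
            master_fd, slave_fd = os.openpty()
        except OSError:
            continue  # allocation failed, e.g. when no pty can be obtained
        succeeded += 1
        os.close(slave_fd)
        os.close(master_fd)
    return succeeded


if __name__ == "__main__":
    print(pty_churn())
```

The leak described above would not be visible from the return value; it would have to be watched separately, for example by comparing free memory before and after a long run.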
When I rlogin into a machine running 1-584 kernel, kernel reports

Sep 23 09:35:39 gnu-64 login: FATAL: can't reopen tty: No such file or directory
Sep 23 09:35:41 gnu-64 pam_rhosts_auth[2701]: allowed to hjl.intel.com as hjl
Sep 23 09:35:41 gnu-64 login: FATAL: can't reopen tty: No such file or directory
Sep 23 09:35:47 gnu-64 su(pam_unix)[2704]: session opened for user root by hjl(uid=500)
Sep 23 09:36:06 gnu-64 pam_rhosts_auth[2725]: allowed to hjl.intel.com as hjl
Sep 23 09:36:06 gnu-64 login: FATAL: can't reopen tty: No such file or directory
Sep 23 09:36:07 gnu-64 pam_rhosts_auth[2727]: allowed to hjl.intel.com as hjl
Sep 23 09:36:07 gnu-64 login(pam_unix)[2728]: session opened for user hjl by (uid=0)
Sep 23 09:36:07 gnu-64 login -- hjl[2728]: LOGIN ON pts/1 BY hjl FROM gnu-d

I can only rlogin after a few tries:

gnu-d:pts/12[2]> rlogin gnu-64 ~
rlogin: connection closed.
gnu-d:pts/12[2]> rlogin gnu-64
rlogin: connection closed.
gnu-d:pts/12[2]> rlogin gnu-64
rlogin: connection closed.
gnu-d:pts/12[2]> rlogin gnu-64
Last login: Thu Sep 23 09:35:18 from gnu-d

Okay, I'm somewhat conflicted about this, but I'm going to set this report to reassign because I think comment 47 is important enough to make sure the right people at least see that comment, which was posted after the bug was marked modified. I ran the test code in comment 47 and I definitely saw memory leaking. This issue probably needs to be addressed, but I'm not sure if this needs to be refiled under glibc or stay as kernel. Considering it's just opening/closing ptys, it certainly seems like a related problem to the initially reported symptoms. I'll let someone more knowledgeable make a final determination as to what to do next with this report. But I think comment 47 deserves a review. With kernel 451 in normal usage situations I can't reproduce the initially reported problem.
-jef

Hi, I have the same problem when I try to ssh. When I straced the sshd process I saw an I/O error on /dev/ptmx. Then I ran 'ssh machine /bin/bash -i' and there was one screen, but the system seems to have forgotten about its pts, so I ran 'fuser -k /dev/pts/1'. Then everything worked again. It seems that the system forgets about some pts's, and when someone tries to ssh (or anything else which uses pts's) it tries to allocate a pts which is in use, and there we go, I/O error...

*** Bug 135051 has been marked as a duplicate of this bug. ***

For what it's worth, I'm currently running into this problem with a machine running the 2.6.8-1.521 kernel. The machine is a laptop, and sometimes when I boot it and log in, xterms just won't start. The error in .xsession-errors is "get_pty: not enough ptys". After a reboot, things will be fine. I've been configuring this laptop over the last week or so, and I've rebooted it dozens of times. I've only run into the problem twice, but in both cases the only solution I've found is a reboot. When it's there, the problem seems to be present immediately after booting. (I.e., it's not something that starts happening after I've been logged in for a while.) A solution or reliable work-around would be much appreciated.

@ Bryan Wright: I've posted a workaround above. You can switch to runlevel 1 and back to runlevel 5 (or whatever runlevel you have previously been in). Command: "init 1" and "init 5" as root. This is very fast in comparison to a reboot.

This just happened on my FC2 release-yummified with devel kernel 2.6.8-1.541 (installed due to comments #38 and #44), so I would definitely say that 525+ does NOT solve the problem unless 541 has reintroduced it.

/var/log/secure:
Oct 20 10:09:56 slugger sshd[6283]: error: openpty: No such file or directory
Oct 20 10:09:56 slugger sshd[6285]: error: session_pty_req: session 0 alloc failed

The server is close to idle, meaning it's up, running a range of server services along with Xorg+gnome and such, and cruising at load 0.0x. Since I only have remote access during work hours, I couldn't check any status with regard to open pts's and whatnot. I need access to the server from work, so I had to ask someone at the location to flip the PSU switch. The person in question had just logged into gnome at the console, which may very well be what triggered the problem. Logins through ssh worked fine a few hours ago.

I often get this message in messages during boot, after rc5 is complete:
Oct 15 23:06:48 slugger init: open(/dev/pts/0): No such file or directory
AFAICT the only thing starting up at this point would be X...

The uptime of the server had been less than 12 hours due to what I expect was a deadlock issue with cryptoloop, so this bug really blows. (Yes, I know cryptoloop blows too, but it's the only released crypto alternative, ain't it...?) Any solution requiring console access is worthless to me - and to anyone else using FC2 in a server setup, I reckon. Dunno if it's related at all, but the server was subject to a series of login (break-in) attempts on ssh during the night.

I've been using FC2 in a location where I don't have console access. I've been working around the problem with ssh root@<server> reboot. Apparently, ssh doesn't allocate a pty if it has a command to run. I suspect that I could get a shell by doing the same with /bin/bash instead of reboot. I admit that rebooting the machine twice a day isn't a very good solution...

The test code from comment #47 appears to work (without hogging system memory) on kernel-2.6.9-1.639 (run for more than 15 min). Perhaps this issue is fixed now? <assuming the test code is a proper indicator of the original bug reported>

I had a similar problem after upgrading kudzu, hal, udev, and some dependency that was resolved by yum... Please note that I was still running dev, instead of udev, at this time. The newest udev seems to obsolete dev (please correct me if I'm wrong, but I believe so, since dev was no longer installed after I upgraded...). udev mounted /dev and afterwards the mountpoint /dev/pts was no longer accessible - as you can guess... => ssh login => cannot allocate pty stuff. So for my understanding, it might be good to remount /dev/pts...

*** Bug 133128 has been marked as a duplicate of this bug. ***

FYI, I still see the random rlogin failure under RHEL 4 beta 2 on ia64.

I'm seeing something similar to this on RHEL 4 x86_64. After the system has been up for some time (I typically only log in to this from console once every week or so, so dunno how quickly it starts happening) starting a new xterm/gnome-terminal fails, yet I can ssh into the system just fine. Strace of gnome-terminal says this:

27044 open("/dev/ptmx", O_RDWR) = 3
27044 statfs("/dev/pts", {f_type="DEVPTS_SUPER_MAGIC", f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=255, f_frsize=4096}) = 0
27044 ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
27044 ioctl(3, TIOCGPTN, [1]) = 0
27044 stat("/dev/pts/1", {st_mode=S_IFCHR|0600, st_rdev=makedev(136, 1), ...}) = 0
27044 statfs("/dev/pts/1", {f_type="DEVPTS_SUPER_MAGIC", f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=255, f_frsize=4096}) = 0
27044 ioctl(3, TIOCSPTLCK, [0]) = 0
27044 ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
27044 ioctl(3, TIOCGPTN, [1]) = 0
27044 stat("/dev/pts/1", {st_mode=S_IFCHR|0600, st_rdev=makedev(136, 1), ...}) = 0
27044 open("/dev/pts/1", O_RDWR|O_NOCTTY) = -1 EACCES (Permission denied)

..which seems to be the problem. 'mount -o remount /dev/pts' seems to have cured the thing for now.

FC3 for X86_64 was fine for an initial installation, but after upgrading OpenSSH this problem showed up after the system had been up for several hours. ssh can execute any command on the FC3 host, but dies with the following message for a simple ssh login:
Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
After examination I found that the directory /dev/pts was gone for no reason. My current solution is:
ssh root@FC3_host "mkdir /dev/pts;mount /dev/pts"
and everything goes back to normal.

I'm still plagued by this problem, running FC2 2.6.8.1. The workarounds detailed thus far are too user intrusive for my environment. Are there any less intrusive workarounds now? Or is a fix on the near horizon? If I kill processes associated with any pts's, umount /dev/pts, mount /dev/pts, the problem does go away for a while.

Is this still a problem in the current rawhide kernel?

WORKSFORME on FC3/FC4/Rawhide.

I did not see that for quite a while on various machines around. I think it's gone, indeed.

I'm getting this bug (ssh openpty error) on RHEL4 AS Update 2 ... should it be fixed by now?

Well, just FWIW, I used to be one of the people who saw this very frequently, but I haven't ever seen it with RHEL 4 Update 1 or 2. (I don't remember whether I ever saw it with RHEL before Update 1, but I definitely haven't seen it since Update 1.)

Ah, should have also mentioned that it's on x86_64 (SMP Opteron) and the remount trick didn't work.
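
For readers who still probe /dev/ptyXY by hand, as in the get_tty code earlier in this report, the difference from the Unix98 route mentioned in the follow-up can be sketched as below. This is an illustration only: it uses Python's os.openpty() and os.ttyname() as stand-ins for the "open /dev/ptmx, then call ptsname" sequence (on Linux with glibc, openpty normally goes through /dev/ptmx, but that is an assumption about the platform), and both function names are invented.

```python
import itertools
import os


def legacy_pty_names():
    """Yield the BSD-style master names that get_tty-style code probes."""
    for c3, c4 in itertools.product("pqrstuvwxyz", "0123456789abcdef"):
        yield f"/dev/pty{c3}{c4}"


def open_unix98_pty():
    """Allocate a pty pair via the C library.

    Returns (master_fd, slave_name), or None if allocation fails,
    for example with the EIO seen in this report.
    """
    try:
        master_fd, slave_fd = os.openpty()
    except OSError:
        return None
    try:
        slave_name = os.ttyname(slave_fd)  # typically /dev/pts/N on Linux
    except OSError:
        os.close(master_fd)
        return None
    finally:
        os.close(slave_fd)  # the slave can be reopened later by name
    return master_fd, slave_name


if __name__ == "__main__":
    print(sum(1 for _ in legacy_pty_names()), "legacy names to probe")
    print(open_unix98_pty())
```

The legacy scheme walks a fixed list of device names and fails with ENXIO on each one when those devices are not configured, whereas the Unix98 route asks the system for the next free pty and is told its name.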