Bug 201461
Summary: | ext3 filesystem problems | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 3 | Reporter: | Karl O. Pinc <kop> | ||||
Component: | kernel | Assignee: | Eric Sandeen <esandeen> | ||||
Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Brian Brock <bbrock> | ||||
Severity: | medium | Docs Contact: | |||||
Priority: | medium | ||||||
Version: | 3.0 | CC: | petrides | ||||
Target Milestone: | --- | ||||||
Target Release: | --- | ||||||
Hardware: | athlon | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2006-09-20 20:28:33 UTC | Type: | --- | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Attachments: |
Description
Karl O. Pinc
2006-08-05 18:22:36 UTC
Created attachment 133696
find /proc/ide -type f -exec bash -c 'echo {} ; cat {} ; echo' \;
The slow logging-in was due to a bad nameserver elsewhere in the network.

Last night this message showed up on the target system when copying from the problem system:

Aug 7 04:31:20 jcp-backup rsyncd[23849]: readlink jcp/u/ems/C6.06.241/Editor/Refltr/rltr.rr1.-1.C6.06.241.20Jul06.txt: Input/output error

The /jcp/u/ems/C6.06.241/ directory is the one mentioned at the very beginning of this bug report: the directory that contained only a "." and no "..". As of today I see no problems while poking about that part of the filesystem. (There's nothing in the problem system's logs. Clocks are synchronized.)

Today I'm able to reproduce some sort of rsync problem when rsync-ing _to_ the box with the problem, using 2.4.21-40.EL kernels on both sides, without having to run out of inodes. (The server-side rsync process seems to disappear.) Log messages on the server side (the machine with the reported problem) are:

Aug 7 13:05:14 jcp rsyncd[32555]: rsync allowed access on module jcp-backup from jcp-backup.uchicago.edu (128.135.44.143)
Aug 7 13:05:14 jcp rsyncd[32555]: rsync to jcp-backup from root.edu (128.135.44.143)
Aug 7 13:06:03 jcp rsyncd[32555]: write failed on etc/ld.so.cache : Success
Aug 7 13:06:03 jcp rsyncd[32555]: rsync error: error in file IO (code 11) at receiver.c(272)
Aug 7 13:06:03 jcp rsyncd[32555]: rsync: connection unexpectedly closed (4255086 bytes read so far)
Aug 7 13:06:03 jcp rsyncd[32555]: rsync error: error in rsync protocol data stream (code 12) at io.c(165)

Note that this is a completely different filesystem on the problem machine. Maybe the rsync issue is unrelated as well? Or maybe there's something else going on. I've scheduled downtime for a memory test.

Ran memtest86+ v1.65 on both the source and destination machines and everything passed.

The rsync errors from the last post are from filling up the filesystem (blocks, not inodes). Sorry. I still have no explanation for the original problem, the directory that showed up with just a "." as its contents. There may yet be a problem with ext3 when you run out of inodes. My plan now is to switch back to the 2.4.21-47.EL kernel and let you guys worry about possible ext3 problems.

So, to summarize: the one actual problem is the missing entries from "ls -lah", and this problem disappears after a remount? The slowness was due to a bad nameserver and the rsync errors to a full filesystem, right? Do these missing entries happen only after the filesystem runs out of space? Does it always take a remount for the missing entries to reappear? Is it always the same directory that has this problem?

Thanks,
-Eric

Sorry about the confused bug report; I wanted to get in any info that might be relevant. Yes, the one problem is that I did an "ls -lah" and got only ".". The missing entries reappeared after a reboot and fsck -f (which reported no errors). I did not try just remounting, sorry. I can't say more, as I've not seen the problem recur.

The only other thing to say is that when the problem occurred there were two other unusual occurrences. (Neither of which should matter. :) The system that had the "." problem had a full filesystem (out of blocks) on another partition, not the partition with the "." problem. An rsync was periodically running and failing while trying to put more data on the full partition. (I suppose it's remotely possible that the directory where I discovered the problem was on the full partition. You know how these things get to be a blur after a while.) Meanwhile, another rsync was periodically running, copying the partition that had the "." problem to a remote machine, and the remote machine's partition ran out of inodes. (See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=381486) Three machines are involved here: the one with the problem is a RH AS3, as is the one copying to the problem machine. The third, receiving data from the problem machine, is a Debian box.

So, I don't know what to tell you here. Unlikely as it seems, maybe if the system's busy dealing with a full partition it will somehow, sometimes, show something weird happening in a directory on another partition? The strange thing is that fsck -f reported no problems. The most likely answer is that I've made some sort of mistake in my reporting. But I really did see a directory with only a "." in it.

Given what's being backed up where, it is possible that the directory with the "." problem was on the full partition. The filesystem path would differ by only one component between what I thought I was looking at and the full partition. Still, why did fsck report no errors? (Next time I'll remember to use script when recovering a system.) I'm willing to answer more questions, but would not blame you if you wanted to close the bug. I'm afraid the system's in production use, so I can't run trials filling up the disk.

One more thought. I was running rsync with the option that preserves hard links. Maybe it can leave a directory in a strange state when the filesystem fills?

I think I'm going to have to close this one. If you see this again and can come up with a clearer path to reproduction, please do reopen.

Thanks,
-Eric |
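This thread separates a partition that ran out of blocks from one that ran out of inodes. On ext3 the inode count is fixed when the filesystem is created, so either resource can be exhausted while the other still has room, and `df` reports the two separately. The following is a minimal sketch that checks both for the filesystem holding a given path. It assumes GNU coreutils `df`, which accepts `-i` together with `-P`. The function name `fs_usage` and the default path `/` are hypothetical and do not come from this report.

```shell
#!/bin/sh
# Hypothetical helper: print block and inode usage for the filesystem
# that holds the given path. Assumes GNU coreutils df (-P with -i).

fs_usage() {
    target="$1"
    # -P forces one line per filesystem, so the use% is always field 5.
    blocks=$(df -P "$target" | awk 'NR == 2 { print $5 }')
    # With -i, df reports inodes instead of blocks; field 5 is IUse%.
    inodes=$(df -P -i "$target" | awk 'NR == 2 { print $5 }')
    echo "blocks used: $blocks"
    echo "inodes used: $inodes"
}

# Check the path given as the first argument, or / if none is given.
fs_usage "${1:-/}"
```

A filesystem showing `blocks used: 100%` matches the "write failed" rsync errors above. A filesystem showing `inodes used: 100%` while blocks remain free matches the remote Debian box's situation.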