Description of problem:
Corrupted filesystem seen on a dual Xeon box with a 3ware controller. The logs say:

Jun 25 09:10:34 x kernel: attempt to access beyond end of device
Jun 25 09:10:34 x kernel: sda2: rw=0, want=7534144408, limit=614405925

An rsync of the data failed with I/O errors on one file, and that file was corrupt (old size < 1K, new size over 1 TB). When I removed this file, the ext3 filesystem remounted read-only (which I guess is a very good thing, as it saves me from more problems). fsck seems to have fixed the broken file. This problem of getting corrupt files has now occurred twice in 10 days.

Version-Release number of selected component (if applicable):

How reproducible:
Server system, which NFS-exports the data to Linux clients. I haven't isolated when the corruption starts.

Actual results:
Files get corrupted.

Expected results:

Additional info:
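For a sense of scale, the two numbers in the kernel message can be compared directly. The following is only a minimal sketch: the helper name `parse_beyond_end` is hypothetical, and the byte conversion assumes both values are counted in 512-byte sectors, which may not hold for every kernel version.

```python
import re

# Kernel message copied from the report above.
LINE = "Jun 25 09:10:34 x kernel: sda2: rw=0, want=7534144408, limit=614405925"

# Assumption: both values are counted in 512-byte sectors.
# Other kernel versions may report a different unit.
SECTOR_BYTES = 512


def parse_beyond_end(line):
    """Return (device, want, limit) from a 'want=..., limit=...' line, or None."""
    m = re.search(r"(\w+): rw=\d+, want=(\d+), limit=(\d+)", line)
    if m is None:
        return None
    return m.group(1), int(m.group(2)), int(m.group(3))


device, want, limit = parse_beyond_end(LINE)
overshoot = want - limit  # how far past the end of the device the request reached

print(f"{device}: request ends {overshoot} units past the device limit")
print(f"want/limit ratio: {want / limit:.1f}")
print(f"device size under the 512-byte assumption: {limit * SECTOR_BYTES / 1e9:.0f} GB")
```

The requested offset is more than twelve times the size of the partition, which fits a corrupt block pointer or file size better than an off-by-one at the end of the device.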
I think I've experienced a similar problem. This morning my data directory was mounted read-only, even for root. It was not until I rebooted that I could see the ext3 filesystem was corrupt. On reboot it said it could not repair the filesystem and dropped me to a command line, where I had to run fsck manually and allow it to repair. It seems fine so far, but I'm nervous.

I was using:
Fedora Core 2 (2.6.6.1-435)
Samba 2.0.3-5
Ext3 filesystem is /dev/md3 (hda6,hdc6)

For further info, see my post on the samba list: http://article.gmane.org/gmane.network.samba.general/46564
I'd really need to see full logs from the kernel, from before you started noticing the problem, to have any hope of getting further with this.
Created attachment 101655: /var/log/messages.1 with smbd and nmbd entries removed.
OK, I've attached the file /var/log/messages.1. I used 'grep -v' to strip out all of the smbd and nmbd messages. I hope this still gives you the info you need. In my /var/log/messages.1 file, I found the following kernel errors:

Jul 2 04:02:16 fwinsites logrotate: ALERT exited abnormally with [1]
Jul 2 04:04:21 fwinsites kernel: EXT3-fs error (device md3): ext3_find_entry: bad entry in directory #5046464: inode out of bounds - offset=8192, inode=16777216, rec_len=32, name_len=22
Jul 2 04:04:21 fwinsites kernel: Aborting journal on device md3.
Jul 2 04:04:21 fwinsites kernel: ext3_abort called.
Jul 2 04:04:21 fwinsites kernel: EXT3-fs abort (device md3): ext3_journal_start: Detected aborted journal
Jul 2 04:04:21 fwinsites kernel: Remounting filesystem read-only

Another concern is that it did not email this info to me. Shouldn't it have let me know there was a problem? Every day I receive a status email, so I know the log notification is working.
I have the same problem, and at the same time?

/var/log/messages:
Nov 30 19:00:54 localhost kernel: mtrr: type mismatch for f6000000,800000 old: write-back new: write-combining
Nov 30 19:00:54 localhost kernel: mtrr: type mismatch for f6000000,800000 old: write-back new: write-combining
Dec 1 04:02:02 localhost logrotate: ALERT exited abnormally with [1]
Dec 1 11:53:52 localhost login(pam_unix)[2678]: authentication failure; logname= uid=0 euid=0 tty=pts/1 ruser= rhost=10.10.19.73 user=oracle10
> ext3_find_entry: bad entry in directory #5046464: inode out of bounds - offset=8192, inode=16777216, rec_len=32, name_len=22

That is still just a sign that something went bad. Is there any other sign of anything going bad in the logs? What's the hardware?

It's basically impossible to diagnose this sort of thing remotely. 95% of the time or more, bad hardware is the root cause. memtest86 is a useful start, as is "dt" to test the disks. But without a reproducible pattern of failure, it's impossible to say why your systems are failing in ways that we never see during extensive testing.
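As a side note on the quoted error, the numeric fields can be pulled out and inspected. The following is a rough sketch with a hypothetical helper name, `parse_fields`; it only reformats values already present in the log line and is not a diagnostic tool.

```python
import re

# Kernel message copied from the attached log.
LINE = ("EXT3-fs error (device md3): ext3_find_entry: bad entry in directory "
        "#5046464: inode out of bounds - offset=8192, inode=16777216, "
        "rec_len=32, name_len=22")


def parse_fields(line):
    """Collect the key=value integer fields from an ext3 error line."""
    return {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", line)}


fields = parse_fields(LINE)
inode = fields["inode"]
set_bits = bin(inode).count("1")  # number of 1 bits in the inode number

print(f"inode={inode} hex={inode:#x} set_bits={set_bits}")
```

The out-of-bounds inode number is exactly 2^24, a value with a single bit set. That pattern is at most suggestive of a flipped bit or misplaced byte in the directory entry; on its own it does not show where the corruption came from.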
Please reopen if this can be reproduced on FC3.