Description of problem:
Running fsck on a disk with sector errors (as reported by smartd) causes the system to hang.

Version-Release number of selected component (if applicable):
e2fsprogs version 1.40.7 (in Fedora 9 beta)

How reproducible:
Every time, for the filesystem built on top of the bad sectors. The hang occurs whether fsck is run from linux rescue or after booting with forcefsck. Fsck runs OK on filesystems where there are no errors.

Steps to Reproduce:
1. Create an ext3 filesystem on a logical volume (not sure how crucial the LVM2 aspect is)
2. Create (or wait for) some bad sectors on the disk
3. Run fsck

Actual results:
Fsck runs OK on filesystems on good disk partitions but hangs the system when run on the filesystem built on the faulty disk partition.

Expected results:
Fsck should either report a problem it can't handle and exit gracefully, or ask for user input. It should not hang the system.

Additional info:
The hang occurs both when fsck is run in linux rescue mode using the installation DVD and when the system is rebooted after a "touch /forcefsck". Using a combination of tools (smartctl, pvdisplay, lvdisplay etc.) I can determine the logical volume that uses the partition with the disk blocks where the smartd errors are reported. Only this logical volume causes a problem with fsck; it can otherwise be mounted. The disk errors are reported as read errors by a smartctl long test; the short test runs OK. The errors ought to be ones that can be mapped out as bad blocks, though I expect to have to recreate the filesystem to do this and have not yet verified that this is the case.
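For reference, a minimal sketch of how the bad sectors can be traced back to a logical volume with the tools mentioned above (the device and volume group names, /dev/sdc and /dev/sdc2, are just this system's; adjust to taste):

# smartctl -l selftest /dev/sdc
# fdisk -lu /dev/sdc
# pvdisplay -m /dev/sdc2

The self-test log gives the LBA of the first read error, fdisk -lu lists each partition's start and end sectors so the failing LBA can be placed in a partition, and pvdisplay -m shows which logical volumes occupy which physical extents of that partition.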
Further testing shows that mkfs.ext3 -c <dev> also hangs the system. I therefore tried upgrading e2fsprogs to 1.40.8-2.fc9 (i.e. the latest); mkfs.ext3 -c <dev> still hangs the system (even in run level 1). Needless to say, this doesn't leave many options for fixing the disk problem (which I presume to be a modest one), and replacing the disk would require a complete system reinstall (because other partitions on the disk are more critical). The bug therefore seems to me fairly serious, though the problem won't be that common.
Off the top of my head, I would guess that the "hang" is the IO layers in the kernel (re)trying the reads and timing out, not e2fsck itself doing anything wrong. IOW, *any* application trying to do IO to a busted disk will behave this way. You will need to take some other action to repair your IO problems before you can expect to have a usable system.

But if you could strace fsck (strace -t -o fsck_trace e2fsck /dev/whatever) up to the point where it's been hanging for a "long" time (whatever that might be) and attach the strace here, as well as /var/log/messages w/ timestamps from the same time, we can see exactly what is happening.

Thanks,
-Eric
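One way to capture both at once (a sketch; /dev/whatever is the placeholder from above) is to run the traced fsck in one terminal and watch the syslog in a second:

# strace -t -o fsck_trace e2fsck -f /dev/whatever
# tail -f /var/log/messages

The -f flag to e2fsck forces a check even if the filesystem is marked clean, and strace's -t timestamps each syscall so the trace can be lined up against the syslog entries afterwards.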
Created attachment 303837 [details]
strace output of mkfs.ext3 -c

The file system no longer exists, so I attach instead the strace output of a mkfs -c <dev> command. This definitely hangs the system (evidenced by networking stopping and, in graphics mode, display features freezing); it is not just a case of the command waiting.

The messages file has no relevant output whatsoever related to the mkfs command. The last message was at 08:33:25 (before the strace mkfs command was run at 08:34:50) and the system was rebooted at 10:13:32. Messages does contain smartd errors such as the following every half hour:

Apr 26 08:21:22 ws1 smartd[3230]: Device: /dev/sdc, 326 Currently unreadable (pending) sectors
Apr 26 08:21:22 ws1 smartd[3230]: Device: /dev/sdc, 326 Offline uncorrectable sectors

smartctl shows the following:

# smartctl -l selftest /dev/sdc
smartctl version 5.38 [x86_64-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure      30%         11713         415265197
# 2  Short offline       Completed without error      00%         11695         -

This is a SATA disk and is otherwise operational, i.e. other partitions (including the boot partition!) and other logical volumes using unaffected sectors are working OK. Some relevant parts of the system boot logging follow:

Apr 26 10:13:33 ws1 kernel: scsi 2:0:0:0: Direct-Access     ATA      ST3250823AS      3.03 PQ: 0 ANSI: 5
Apr 26 10:13:33 ws1 kernel: ata3: DMA mask 0xFFFFFFFFFFFFFFFF, segment boundary 0xFFFFFFFF, hw segs 61
Apr 26 10:13:33 ws1 kernel: sd 2:0:0:0: [sdc] 488397168 512-byte hardware sectors (250059 MB)
Apr 26 10:13:33 ws1 kernel: sd 2:0:0:0: [sdc] Write Protect is off
Apr 26 10:13:33 ws1 kernel: sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Apr 26 10:13:33 ws1 kernel: sd 2:0:0:0: [sdc] 488397168 512-byte hardware sectors (250059 MB)
Apr 26 10:13:33 ws1 kernel: sd 2:0:0:0: [sdc] Write Protect is off
Apr 26 10:13:33 ws1 kernel: sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Apr 26 10:13:33 ws1 kernel: sdc: sdc1 sdc2
Apr 26 10:13:33 ws1 kernel: sd 2:0:0:0: [sdc] Attached SCSI disk

My expectation is that I should be able to map out the bad blocks and continue using the disk, regardless of whether the disk is actually failing or has just developed a few bad blocks. I have done this before with a SCSI disk on a Solaris system (on a mounted filesystem, if memory serves me correctly). Even if this is not possible with Linux (contrary to my understanding of the purpose of mkfs -c), I don't think the system should hang when I attempt it.
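As an aside, the LBA_of_first_error above can in principle be mapped to a filesystem block, along the lines of the smartmontools bad-block HOWTO. A sketch, where START is a placeholder for the partition's start sector from fdisk -lu, and which ignores any extra offset introduced by the LVM extent mapping (pvdisplay -m would supply that):

fs_block = (415265197 - START) * 512 / 4096

# debugfs -R "icheck fs_block" /dev/VolGroup00/LogVol03

debugfs then reports which inode, if any, uses that block, so the affected file can be identified before anything destructive is done.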
Well, unfortunately I should have had you add -ff to the strace; the child badblocks command which is invoked didn't get traced. By default this will invoke badblocks in a read-only mode.

From your messages above, it appears that when you hit unreadable sectors, the drive itself is hanging up and no longer responding to requests from the rest of the system. There's nothing magical about badblocks; it just attempts to read or write sectors, keeping track of anything that fails. But if the drive fails out from under it, there's nothing it can do, and it looks like that is what's happening.

I would suggest manually running badblocks on the problematic partition in write (-w) mode, to give the drive a chance to remap bad sectors (this probably only happens on write).

Well, actually I would suggest getting a new disk! :) Especially if the drive is not able to remap when you do the write test.
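A minimal sketch of that write-mode run (destructive: -w overwrites the whole device, so it must be unmounted first; the LV path is a placeholder, and handing the resulting list to mkfs via -l is optional belt-and-braces on top of the drive's own remapping):

# umount /dev/VolGroup00/LogVol03
# badblocks -wsv -b 4096 -o badblks.txt /dev/VolGroup00/LogVol03
# mkfs.ext3 -b 4096 -l badblks.txt /dev/VolGroup00/LogVol03

Here -s shows progress, -v is verbose, and -o writes any blocks that still fail after the write pass to badblks.txt.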
So, to be clearer: I don't consider this to be an e2fsprogs bug, but I do agree that in general, Linux IO error handling needs to be more robust. If a drive has gone south, the OS shouldn't retry endlessly, queue up other IOs behind it, or whatnot... but this is a much larger problem which needs to be addressed and handled in upstream kernel development.

Are your messages logging to the same disk which has the bad sectors? I'd bet that the kernel is issuing messages about IO problems but they're being lost to the hung IO. Do you see any messages on the console?

I'm not quite sure how to disposition this bug; basically I think this is part of a larger kernel problem which is not well addressed even upstream.

Thanks,
-Eric
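For what it's worth, a quick way to check that (assuming the logs live under /var/log on an LVM-backed filesystem, as here) is to see which filesystem holds the logs and which physical disks back it:

# df /var/log
# lvs -o +devices

df shows the filesystem containing /var/log, and the devices column of lvs shows which physical volume(s) each logical volume's extents sit on.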
The mystery deepens, because the disk is now operational again!

I ran badblocks -w -b 4096 -o badblks and it completed. No bad blocks were reported in the file badblks. (?!) I then ran mkfs.ext3 -c -c -b 4096 /dev/VolGroup00/LogVol03 and this also completed. (Note the duplicated -c is deliberate and not a typo.) In the second case the -b option should not have been necessary, as 4096 is supposed to be the default, but I just wanted to make sure. I then ran mkfs.ext3 -c /dev/VolGroup00/LogVol03 (which failed before) and this also worked. fsck run on the new file system also worked. I have strace -t -ff output for all of these commands, but since they were successful I see no value in attaching any. The new uuid was found, fstab modified and the new filesystem successfully mounted. After a couple of hours there have been no smartd errors reported. I shall continue to monitor it.

Some comments now follow.
1) The original problem reported was very real.
2) The badblocks -w is probably what cured the disk sector problem, but I can't be certain.
3) Your recommendation to get a new disk is accepted, but the fact that this box is used for storing rsync backups, and that the faulty disk holds the boot partition as well as part of another (very large) logical volume, did not make this the most attractive first option.
4) You are probably right that the real problem was in lower-level IO rather than in one of the e2fsprogs.
5) It is quite possible that there was a fundamental issue with the disk itself, as you suggested, because I did hear it make some noise as the system hung.
6) Your comments about possible logging to the same disk are interesting, but I don't think they apply here. The first strace I did, which failed (and was posted here), was saved to the root filesystem, which also contains the /var/log hierarchy for normal system logging; that filesystem sits on another logical volume whose extents are all on another disk. The later commands which worked had their strace outputs saved to my home directory, which is on a separate filesystem on another logical volume, part of which does reside on the problem disk. Since this fail/success pattern of the filesystems is contrary to your hypothesis, I think we can discount it as a possibility.
7) Every time the previous failures occurred, the number of unreadable and uncorrectable sectors reported by smartd increased. I do not know enough about what was really being reported to know whether this is significant.
8) The disks were not physically touched prior to the problem being "resolved".

Given that I can no longer reproduce the error and provide data concerning it, and the likelihood that the problem is not actually in e2fsprogs, I would be content for this bug to be closed. I would however urge that notice of this problem be passed to the relevant IO developers for some consideration, because I do believe that, whatever caused the problem, the OS should have handled it more gracefully. If the problem recurs I would be happy to provide further data to whoever is most appropriate.
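A side note on point 7: the per-attribute SMART counters give a concrete way to watch this (the attribute names below are the standard ATA ones; exact numbering and naming can vary by drive vendor):

# smartctl -A /dev/sdc | egrep -i 'Reallocated|Pending|Uncorrectable'

A drop in Current_Pending_Sector together with a rise in Reallocated_Sector_Ct after the write test would be consistent with the drive having remapped the previously unreadable sectors; pending counters that fall to zero without any new reallocations, as apparently happened here, would suggest the sectors were simply rewritten in place successfully.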
Ok, I'm going to close as INSUFFICIENT_DATA, I think... we really need a message from some level to know for sure what is actually causing the hang. I'm not denying the problem exists ;) If lower levels are failing or infinitely retrying w/o actually issuing any messages, that would be a bug in itself. Thanks, -Eric