Bug 444119 - fsck hangs system when disk sector errors present
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: e2fsprogs
Version: rawhide
Hardware: x86_64
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Eric Sandeen
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2008-04-25 07:05 UTC by Paul Bickerstaff
Modified: 2008-04-27 22:51 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-04-27 22:51:10 UTC
Type: ---
Embargoed:


Attachments
strace output of mkfs.ext3 -c (18.67 KB, text/plain)
2008-04-25 23:41 UTC, Paul Bickerstaff

Description Paul Bickerstaff 2008-04-25 07:05:01 UTC
Description of problem:

Running fsck on a disk with sector errors (as reported by smartd) causes the
system to hang.

Version-Release number of selected component (if applicable):

e2fsprogs version 1.40.7 (in Fedora 9 beta)

How reproducible:

For the filesystem built on top of these errors it is every time. The hang
occurs whether running from linux rescue or booting with forcefsck. Fsck runs OK
on filesystems where there are no errors.

Steps to Reproduce:
1. Create an ext3 filesystem on a logical volume (not sure how crucial the LVM2
aspect is)
2. Create (or wait for) some bad sectors on the disk
3. Run fsck
  
Actual results:

Fsck runs OK on filesystems on good disk partitions but hangs the system when
running on the filesystem built on a faulty disk partition.

Expected results:

Fsck should either report a problem it can't handle and exit gracefully or ask
for user input. It should not hang the system.

Additional info:

The hang occurs both when fsck is run in linux rescue mode using the
installation DVD and when the system is rebooted after a "touch /forcefsck".

Using a combination of tools (smartctl, pvdisplay, lvdisplay etc) I can
determine the logical volume that uses the partition with the disk blocks where
the smartd errors are reported. Only this logical volume causes a problem with
fsck. 

The logical volume can otherwise be mounted. 

The disk errors are reported as read errors by a smartctl long test. The short
test runs OK. The errors ought to be ones that can be mapped out as bad blocks
though I expect to have to recreate the filesystem to do this and have not
verified that this is the case.

Comment 1 Paul Bickerstaff 2008-04-25 11:45:38 UTC
Further testing/work shows that mkfs.ext3 -c <dev> also hangs the system.

So I've tried upgrading e2fsprogs to 1.40.8-2.fc9 (i.e. the latest).

mkfs.ext3 -c <dev> still hangs the system (even in run level 1).

Needless to say, this doesn't leave many options for fixing the disk problem
(which I presume to be a modest one), and replacing the disk will require a
complete system reinstall (because other partitions on the disk are more
critical). The bug therefore seems to me to be fairly serious, though the
problem won't be that common.



Comment 2 Eric Sandeen 2008-04-25 12:53:04 UTC
Off the top of my head, I would guess that the "hang" is the IO layers in the
kernel (re)trying the reads and timing out, not e2fsck itself doing anything
wrong.  IOW, *any* application trying to do IO to a busted disk will behave this
way.

You will need to take some other action to repair your IO problems before you
can expect to have a usable system.

But, if you could strace fsck, (strace -t -o fsck_trace e2fsck /dev/whatever) up
to the point where it's been hanging for a "long" time (whatever that might be)
and attach the strace here, as well as /var/log/messages w/ timestamps from the
same time, we can see exactly what is happening.

Thanks,
-Eric

Comment 3 Paul Bickerstaff 2008-04-25 23:41:37 UTC
Created attachment 303837
strace output of mkfs.ext3 -c

The file system no longer exists so I append instead the strace output of a
mkfs -c <dev> command.

This definitely hangs the system (evidenced by networking stopping and, if in
graphics mode, display features stopping). It is not just a case of the command
waiting.

The messages file has no relevant output whatsoever related to the mkfs
command. The last message was at 08:33:25 (before the strace mkfs command was
run at 8:34:50) and the system was rebooted at 10:13:32.

Messages does contain smartd errors such as the following every half hour.
Apr 26 08:21:22 ws1 smartd[3230]: Device: /dev/sdc, 326 Currently unreadable (pending) sectors
Apr 26 08:21:22 ws1 smartd[3230]: Device: /dev/sdc, 326 Offline uncorrectable sectors

smartctl shows the following.

# smartctl -l selftest /dev/sdc
smartctl version 5.38 [x86_64-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       30%     11713         415265197
# 2  Short offline       Completed without error       00%     11695         -
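
For anyone wanting to tie the LBA_of_first_error value back to a partition, the following is a minimal Python sketch, not a tested tool. It pulls the failing LBA out of the self-test log above and checks which partition's sector range contains it. The partition start and end sectors in PARTITIONS are made-up placeholders (the real ones would come from the partition table), the helper names are hypothetical, and the sketch stops at the partition: it does not follow the LVM2 physical-extent mapping down to a logical volume.

```python
# Self-test log rows copied from the smartctl output in this report.
SELFTEST_LOG = """\
# 1  Extended offline    Completed: read failure       30%     11713         415265197
# 2  Short offline       Completed without error       00%     11695         -
"""

# ASSUMPTION: hypothetical partition layout in 512-byte sectors
# (start, end inclusive). These numbers are invented for illustration.
PARTITIONS = {
    "sdc1": (63, 401624),
    "sdc2": (401625, 488392064),
}


def first_error_lbas(log_text):
    """Return the LBA_of_first_error of every self-test row that has one."""
    lbas = []
    for line in log_text.splitlines():
        fields = line.split()
        # Rows start with '#'; the last column is either a number or '-'.
        if fields and fields[0] == "#" and fields[-1].isdigit():
            lbas.append(int(fields[-1]))
    return lbas


def locate(lba, partitions):
    """Return (partition name, sector offset inside it), or None if unmapped."""
    for name, (start, end) in partitions.items():
        if start <= lba <= end:
            return name, lba - start
    return None


def fs_block(offset_sectors, sector_size=512, block_size=4096):
    """Convert a sector offset into a block index of block_size bytes."""
    return offset_sectors * sector_size // block_size


if __name__ == "__main__":
    for lba in first_error_lbas(SELFTEST_LOG):
        hit = locate(lba, PARTITIONS)
        if hit is None:
            print(f"LBA {lba} is outside every listed partition")
        else:
            name, offset = hit
            print(f"LBA {lba} -> {name}, sector offset {offset}, "
                  f"4096-byte block {fs_block(offset)}")
```

The 4096-byte block index is only meaningful relative to the start of the partition; with LVM in between, the offset of the logical volume inside the physical volume would have to be subtracted as well.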


This is a SATA disk and is otherwise operational, i.e. other partitions (the
boot partition!) and other logical volumes using unaffected sectors are working
OK.

Some relevant parts from the system boot logging follow.

Apr 26 10:13:33 ws1 kernel: scsi 2:0:0:0: Direct-Access     ATA      ST3250823AS      3.03 PQ: 0 ANSI: 5
Apr 26 10:13:33 ws1 kernel: ata3: DMA mask 0xFFFFFFFFFFFFFFFF, segment boundary 0xFFFFFFFF, hw segs 61
Apr 26 10:13:33 ws1 kernel: sd 2:0:0:0: [sdc] 488397168 512-byte hardware sectors (250059 MB)
Apr 26 10:13:33 ws1 kernel: sd 2:0:0:0: [sdc] Write Protect is off
Apr 26 10:13:33 ws1 kernel: sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Apr 26 10:13:33 ws1 kernel: sd 2:0:0:0: [sdc] 488397168 512-byte hardware sectors (250059 MB)
Apr 26 10:13:33 ws1 kernel: sd 2:0:0:0: [sdc] Write Protect is off
Apr 26 10:13:33 ws1 kernel: sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Apr 26 10:13:33 ws1 kernel:  sdc: sdc1 sdc2
Apr 26 10:13:33 ws1 kernel: sd 2:0:0:0: [sdc] Attached SCSI disk


My expectation is that I should be able to map out the bad blocks and continue
using the disk, regardless of whether the disk is actually failing or has just
developed a few bad blocks. I have done this before with a scsi disk on a
solaris system (on a mounted filesystem if memory serves me correctly). Even if
this is not possible with linux (contrary to my understanding of the purpose of
mkfs -c) I don't think the system should be hanging when I attempt to do so.

Comment 4 Eric Sandeen 2008-04-26 03:34:28 UTC
Well, unfortunately I should have had you add -ff to the strace; the child
badblocks command which is invoked didn't get traced.

By default this will invoke badblocks in a read-only mode.  From your messages
above, it appears that when you hit unreadable sectors, the drive itself is
hanging up and no longer responding to requests from the rest of the system.

There's nothing magical about badblocks; it will just attempt to read or write
sectors, keeping track of anything that fails.  But if the drive fails out from
under it, there's nothing it can do.  It looks like that is what's happening.
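
To illustrate the "read each block and record what fails" idea, here is a rough Python sketch. It is not how badblocks is actually implemented; scan_read_only and FlakyDevice are hypothetical names, and the fake device merely simulates a drive that returns EIO for chosen blocks.

```python
import errno
import io


class FlakyDevice:
    """Stand-in for a block device: reads of selected blocks raise EIO."""

    def __init__(self, size_blocks, bad_blocks, block_size=4096):
        self._buf = io.BytesIO(bytes(size_blocks * block_size))
        self._bad = set(bad_blocks)
        self._block_size = block_size

    def seek(self, offset):
        return self._buf.seek(offset)

    def read(self, length):
        block = self._buf.tell() // self._block_size
        if block in self._bad:
            raise OSError(errno.EIO, "simulated unreadable sector")
        return self._buf.read(length)


def scan_read_only(dev, total_blocks, block_size=4096):
    """Read every block once; return the indexes of blocks that failed."""
    bad = []
    for index in range(total_blocks):
        try:
            dev.seek(index * block_size)
            data = dev.read(block_size)
        except OSError:
            # The read returned an error, so the block can be recorded.
            bad.append(index)
            continue
        if len(data) != block_size:
            # A short read is also treated as a failure in this sketch.
            bad.append(index)
    return bad


if __name__ == "__main__":
    device = FlakyDevice(size_blocks=16, bad_blocks={3, 9})
    print(scan_read_only(device, total_blocks=16))
```

The loop can only record a bad block when the read call comes back with an error. If the drive stops answering and the read never returns, a scanner like this simply blocks, which matches the behaviour described above.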

I would suggest manually running badblocks on the problematic partition in write
(-w) mode, to give the drive a chance to remap bad sectors (this probably only
happens on write).

Well, actually I would suggest getting a new disk! :)  Especially if the drive
is not able to remap when you do the write test.


Comment 5 Eric Sandeen 2008-04-26 19:05:26 UTC
So to be clearer: I don't consider this to be an e2fsprogs bug, but I do agree
that in general, Linux IO error handling needs to be more robust.  If a drive
has gone south, the OS shouldn't retry endlessly, queue up other IOs behind it,
or whatnot... but this is a much larger problem which needs to be addressed and
handled in upstream kernel development.

Are your messages logging to the same disk which has the bad sectors?  I'd bet
that the kernel is issuing messages about IO problems but they're being lost to
the hung IO.  Do you see any messages on the console?

I'm not quite sure how to disposition this bug; basically I think this is part
of a larger kernel problem which is not well addressed even upstream.

Thanks,
-Eric

Comment 6 Paul Bickerstaff 2008-04-26 23:47:17 UTC
The mystery deepens because the disk is now operational again!

I ran badblocks -w -b 4096 -o badblks and it completed. No bad blocks were
reported in the file badblks. (?!) 

I then ran mkfs.ext3 -c -c -b 4096 /dev/VolGroup00/LogVol03 and this also
completed. (Note the duplicated -c is deliberate and not a typo.)

In the second case anyway the -b option should not have been necessary as it is
supposed to be the default but I just wanted to make sure.

I then ran mkfs.ext3 -c /dev/VolGroup00/LogVol03 (which failed before) and this
also worked.

fsck run on the new file system also worked.

I have strace -t -ff output for all of these commands but since they were
successful I see no value in attaching any.

The new uuid was found, fstab modified and the new filesystem successfully mounted. 

After a couple of hours there have been no smartd errors reported. I shall
continue to monitor it.

Some comments now follow.

1) The original problem reported was very real.

2) The badblocks -w is probably what cured the disk sector problem, but I can't
be certain.

3) Your recommendation to get a new disk is accepted but the fact that this box
was used for storing rsync backups and that the faulty disk had the boot
partition on it as well as part of another (very large) logical volume did not
make this the most attractive first option.

4) You are probably right that the real problem was in lower level IO rather
than one of the e2fsprogs.

5) It is quite possible that there was a fundamental issue with the disk itself,
as you suggested, because I did hear it make some noise as the system hung.

6) Your comments about possible logging to the same disk are interesting but I
don't think they apply here. The first strace I did, which failed (and was posted
here), was saved to the root filesystem. This also contains the /var/log
hierarchy for normal system logging. This filesystem sits on another logical
volume whose extents are all on another disk. The later commands which worked
had their strace outputs saved to my home directory which is on a separate
filesystem on another logical volume. Part of this logical volume does reside on
the problem disk. Since the fail-success matching of the filesystems is contrary
to your hypothesis I think we can discount this as a possibility.

7) Every time the previous failures occurred, the number of unreadable and
uncorrectable sectors reported by smartd increased. I do not know enough about
what was really being reported to know whether this is significant.

8) The disks were not physically touched prior to the problem being "resolved".

Given that I can no longer reproduce the error and provide data concerning it,
and the likelihood that the problem is not actually in e2fsprogs, I would be
content if this bug was closed.

I would however urge that notice of this problem be passed to the relevant IO
developers for some consideration because I do believe that whatever caused the
problem the OS should have handled it more gracefully.

If the problem re-occurs I would be happy to provide further data to whomever is
the most appropriate.


Comment 7 Eric Sandeen 2008-04-27 22:51:10 UTC
Ok, I'm going to close as INSUFFICIENT_DATA, I think... we really need a message
from some level to know for sure what is actually causing the hang.  I'm not
denying the problem exists ;)  If lower levels are failing or infinitely
retrying w/o actually issuing any messages, that would be a bug in itself.

Thanks,
-Eric

