Red Hat Bugzilla – Bug 435358
iscsi root device never gets fscked when there's an i/o error
Last modified: 2013-09-02 02:24:02 EDT
Description of problem:
When a system with iscsi devices experiences a network glitch and is then
rebooted, rc.sysinit does not force an fsck for the iscsi devices as it
normally would for locally attached disks. As a result, the user must
manually run fsck after the system boots into read-only mode.
Version-Release number of selected component (if applicable):
Only tested once
Steps to Reproduce:
[Feb 28 14:50:30] < pjones> | 1) i/o error happens for whatever reason
(broken cable, act of god, etc.)
[Feb 28 14:50:38] < pjones> | 2) reboot happens
[Feb 28 14:50:50] < pjones> | 3) initrd mounts fs read-only
[Feb 28 14:51:06] < pjones> | 4) kernel sees that it's in an error state,
marks it as needing fsck, fixes the journal
[Feb 28 14:51:28] < pjones> | 5) initscripts runs "fsck -A", which doesn't
touch it because of _netdev in the options (which we can't take out for other
[Feb 28 14:51:53] < pjones> | 6) initscripts remounts it read-write
[Feb 28 14:52:10] < pjones> | 7) kernel sees the error still and forces it
back to read-only
[Feb 28 14:52:37] < pjones> | 8) netfs doesn't fsck it because it can't
touch the lockfile in /var/run so it never actually runs correctly
[Feb 28 14:55:51] < pjones> | (also lots of other stuff fails between 7 and 8)
[Feb 28 14:56:41] < pjones> | / *absolutely must* get fscked in
rc.sysinit, not a separate initscript.
- filesystem mounted in read-only mode, manual fsck required
- rc.sysinit should trigger an fsck upon boot.
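A quick way to tell whether a given box is exposed to step 5 above is to look at the options field of the / entry in fstab. The following is only a sketch: the helper names root_opts and root_is_netdev are made up for illustration, and it assumes the usual whitespace-separated fstab layout with the mount options in the fourth column.

```shell
#!/bin/sh
# Hypothetical helpers (not part of initscripts), assuming a standard fstab layout.

# root_opts FSTAB: print the mount options field (4th column) of the "/" entry,
# ignoring comment lines.
root_opts() {
    awk '$1 !~ /^#/ && $2 == "/" { print $4; exit }' "$1"
}

# root_is_netdev FSTAB: succeed if the "/" entry carries the _netdev option.
root_is_netdev() {
    case ",$(root_opts "$1")," in
        *,_netdev,*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use against the system fstab (path can be overridden via FSTAB).
fstab=${FSTAB:-/etc/fstab}
if [ -r "$fstab" ] && root_is_netdev "$fstab"; then
    echo "/ is marked _netdev: a check that excludes _netdev entries will skip it"
fi
```

If this prints the message, a filesystem check that excludes _netdev entries will not look at / during boot, which matches the behaviour described in the steps above.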
This isn't really anything new - this would be the case even in 5.0 GA if you
have a network root block device (iSCSI, GFS2, NBD) - they would all run into
Possible solutions, of varying quality:
- Remove _netdev from fstab for /. This would, however, break shutdown.
- Remove the case from rc.sysinit so that _netdev devices are fscked. This
would, however, break booting with *any* non-root network block devices, as they
wouldn't be found when fsck runs, including existing installations. (Also,
- Run fsck on / from the initrd. *ducks*
- Introduce Yet Another Magic Flag, honored by shutdown as 'root is a network
device', but different from _netdev so it would be fscked. Would be a rather
ridiculous hack, but may work.
As for introducing Yet Another Magic flag - _netdev is handled specifically by
mount(8) - so if we went that route to fix, it would require changes to (at a
minimum) initscripts, anaconda, and util-linux.
Another alternative is root-causing why fsck from netfs fails for /, and fixing
notting: I'm not sure how that'll help -- the fs is already failed to RO mode by
the kernel at that point, and can't ever be put back in RW mode.
At any rate, I'm pretty sure the reason netfs doesn't work is that
/var/lock/subsys/netfs is still present.
(In reply to comment #6)
> At any rate, I'm pretty sure the reason netfs doesn't work is that
> /var/lock/subsys/netfs is still present.
Oh, right. And at that point you can't get to the FS to fix it and behave
reliably. So we're back to the add-another-flag hack.
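For reference, the lock file problem can be sketched roughly as below. This assumes the runlevel script skips any start script whose subsystem lock file (or a ".init" variant of it) already exists under /var/lock/subsys. The function name should_start is hypothetical and the logic is a simplification, not the actual rc code.

```shell
#!/bin/sh
# Hypothetical sketch of the "already up" check, assuming start scripts are
# skipped when a lock file for the subsystem is present.

# should_start LOCKDIR SUBSYS: succeed only if no lock file marks SUBSYS as running.
should_start() {
    [ ! -f "$1/$2" ] && [ ! -f "$1/$2.init" ]
}

# Example: would netfs be started on this boot? (LOCKDIR can be overridden.)
lockdir=${LOCKDIR:-/var/lock/subsys}
if should_start "$lockdir" netfs; then
    echo "netfs would be started"
else
    echo "netfs would be skipped: lock file present in $lockdir"
fi
```

With / stuck read-only, a leftover lock file from the previous boot cannot be removed, so under this assumption the check keeps failing on every boot.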
Just to document some of the stuff pjones & I looked at....
for the -t option to fsck, "opts=" and "noopts=" options are specified as
comma-delimited, including the "opts/noopts" part.
i.e. -t opts=foo,noopts=bar
And they are cumulative; a filesystem must match each option (or no-option)
specification to be checked. Above, only filesystems with option foo *and*
without option bar will be checked.
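To make the cumulative matching concrete, here is a small emulation of the rule in shell. It is a sketch of the semantics as described above, not fsck's own code: the function names has_opt and would_check are invented for illustration, and the specifier spelling follows fsck(8), which documents it as "opts=" with an optional "no" prefix.

```shell
#!/bin/sh
# Hypothetical emulation of the fsck -t option-specifier matching rule.

# has_opt OPTS NAME: succeed if NAME appears in the comma-separated list OPTS.
has_opt() {
    case ",$1," in
        *",$2,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# would_check MOUNT_OPTS SPEC: succeed if a filesystem with MOUNT_OPTS satisfies
# every term of SPEC (each opts=X present, each noopts=X absent).
would_check() {
    mount_opts=$1
    old_ifs=$IFS
    IFS=,
    set -- $2          # split SPEC on commas into positional parameters
    IFS=$old_ifs
    for term in "$@"; do
        case "$term" in
            noopts=*) has_opt "$mount_opts" "${term#noopts=}" && return 1 ;;
            opts=*)   has_opt "$mount_opts" "${term#opts=}" || return 1 ;;
        esac
    done
    return 0
}

# Illustration with the _netdev exclusion discussed in this bug.
for opts in "defaults" "defaults,_netdev"; do
    if would_check "$opts" "noopts=_netdev"; then
        echo "$opts: checked"
    else
        echo "$opts: skipped"
    fi
done
```

With the spec opts=foo,noopts=bar, would_check succeeds for "defaults,foo" and fails for both "defaults,foo,bar" and "defaults", which is the "foo *and* not bar" behaviour noted above.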
... although at this point I guess it looks like we won't need to combine the
option specifiers in this way... but just in case.
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.
cmake-2.4.8-2.fc8 has been submitted as an update for Fedora 8