Bug 435358 - iscsi root device never gets fscked when there's an i/o error
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: initscripts
Version: 5.2
Hardware: All Linux
Priority: low Severity: low
Target Milestone: beta
Target Release: ---
Assigned To: initscripts Maintenance Team
QA Contact: Brock Organ
Depends On:
Blocks: 435716 435717 435722
Reported: 2008-02-28 15:26 EST by James Laska
Modified: 2013-09-02 02:24 EDT

See Also:
Fixed In Version: RHBA-2008-0443
Doc Type: Bug Fix
Last Closed: 2008-05-21 13:24:15 EDT
Description James Laska 2008-02-28 15:26:54 EST
Description of problem:

When a system with an iSCSI root device experiences a network glitch and is
then rebooted, rc.sysinit does not force an fsck for the iSCSI devices as it
normally would for locally attached disks.  As a result, the root filesystem
ends up mounted read-only after boot and the user must run fsck manually.

Version-Release number of selected component (if applicable):
RHEL5.2-Server-20080225.2 (initscripts-8.45.18.EL-1)

How reproducible:
Only tested once

Steps to Reproduce:
(From IRC, pjones, Feb 28 14:50-14:56)
1) i/o error happens for whatever reason (broken cable, act of god, etc.)
2) reboot happens
3) initrd mounts fs read-only
4) kernel sees that it's in an error state, marks it as needing fsck, fixes
   the journal
5) initscripts runs "fsck -A", which doesn't touch it because of _netdev in
   the options (which we can't take out for other reasons)
6) initscripts remounts it read-write
7) kernel sees the error still and forces it back to read-only
8) netfs doesn't fsck it because it can't touch the lockfile in /var/run, so
   it never actually runs correctly
(also lots of other stuff fails between 7 and 8)

/ *absolutely must* get fscked in rc.sysinit, not a separate initscript.
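
Step 5 above is the crux. As a rough illustration only (this is not the actual
rc.sysinit logic), the Python sketch below scans fstab-style text and reports
which entries would be skipped by an "fsck -A" pass that excludes _netdev
filesystems. The sample fstab contents, the device names, and the function name
skipped_by_netdev_filter are all assumptions made up for the example.

```python
def skipped_by_netdev_filter(fstab_text):
    """Return mount points whose options include _netdev.

    Illustrative only: models a pass that skips every _netdev entry.
    It does not run fsck and does not read the real /etc/fstab.
    """
    skipped = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # ignore blank lines and comments
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete fstab entry
        mount_point, options = fields[1], fields[3].split(",")
        if "_netdev" in options:
            skipped.append(mount_point)
    return skipped


# Hypothetical fstab: root lives on an iSCSI LUN, /boot on a local disk.
SAMPLE_FSTAB = """\
/dev/sda1  /      ext3  defaults,_netdev  1 1
/dev/hda1  /boot  ext3  defaults          1 2
"""

print(skipped_by_netdev_filter(SAMPLE_FSTAB))  # prints ['/']
```

With this sample input the root filesystem is the one that gets skipped, which
matches the behaviour described in step 5.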

Actual results:
 - filesystem mounted in read-only mode, manual fsck required

Expected results:
 - rc.sysinit should trigger an fsck upon boot.

Additional info:
Comment 2 Bill Nottingham 2008-02-28 23:33:15 EST
This isn't really anything new - this would be the case even in 5.0 GA if you
have a network root block device (iSCSI, GFS2, NBD) - they would all run into
this issue.

Possible solutions, of varying quality:
- Remove _netdev from fstab for /. This would, however, break shutdown.
(Obviously bad.)
- Remove the case from rc.sysinit so that _netdev devices are fscked. This
would, however, break booting with *any* non-root network block devices, as they
wouldn't be found when fsck runs, including existing installations. (Also,
obviously bad.)
- Run fsck on / from the initrd. *ducks*
- Introduce Yet Another Magic Flag, honored by shutdown as 'root is a network
device', but different from _netdev so it would be fscked. Would be a rather
ridiculous hack, but may work.
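
To make the last option concrete, here is a minimal Python sketch of the idea,
under stated assumptions: the flag name _example_rootnet is invented for this
illustration and is not a real mount option, and the two helper functions are
hypothetical stand-ins for the decisions shutdown and rc.sysinit would make.

```python
NETDEV = "_netdev"
# Hypothetical flag name, invented for this sketch only.
ROOT_NETDEV = "_example_rootnet"


def is_network_device(options):
    """What shutdown would ask: is this filesystem backed by the network?"""
    return NETDEV in options or ROOT_NETDEV in options


def checked_by_rc_sysinit(options):
    """What the early fsck pass would ask: only plain _netdev is skipped."""
    return NETDEV not in options


root_opts = ["defaults", ROOT_NETDEV]  # network root using the new flag
data_opts = ["defaults", NETDEV]       # ordinary network block device

print(is_network_device(root_opts), checked_by_rc_sysinit(root_opts))  # True True
print(is_network_device(data_opts), checked_by_rc_sysinit(data_opts))  # True False
```

The point of the design is that the two questions get different answers for
the root device: shutdown still treats it as network-backed, while the early
fsck pass no longer skips it.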
Comment 4 Bill Nottingham 2008-02-29 12:35:21 EST
As for introducing Yet Another Magic flag - _netdev is handled specifically by
mount(8) - so if we went that route to fix, it would require changes to (at a
minimum) initscripts, anaconda, and util-linux.
Comment 5 Bill Nottingham 2008-02-29 13:37:57 EST
Another alternative is root-causing why fsck from netfs fails for /, and fixing
that issue.
Comment 6 Peter Jones 2008-02-29 15:19:32 EST
notting: I'm not sure how that'll help -- the fs is already failed to RO mode by
the kernel at that point, and can't ever be put back in RW mode.

At any rate, I'm pretty sure the reason netfs doesn't work is that
/var/lock/subsys/netfs is still present.
Comment 8 Bill Nottingham 2008-02-29 18:52:24 EST
(In reply to comment #6)
> At any rate, I'm pretty sure the reason netfs doesn't work is that
> /var/lock/subsys/netfs is still present.

Oh, right. And at that point you can't get to the FS to fix it and behave
reliably. So we're back to the add-another-flag hack.
Comment 9 Eric Sandeen 2008-02-29 19:09:03 EST
Just to document some of the stuff pjones & I looked at....

for the -t option to fsck, "opt=" and "noopt=" options are specified as
comma-delimited, including the "opt/noopt" part.

i.e. -t opt=foo,noopt=bar

And they are cumulative; a filesystem must match each option (or no-option)
specification to be checked.  Above, only filesystems with option foo *and*
without option bar will be checked.

... although at this point I guess it looks like we won't need to combine the
option specifiers in this way... but just in case.

-Eric
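
The matching rule described above can be sketched in Python as follows. This is
a simplified model written from the description in this comment, not fsck's
actual code: the function name matches_spec is made up, and only "opt=" and
"noopt=" terms are handled (filesystem type names, which -t also accepts, are
ignored).

```python
def matches_spec(mount_options, spec):
    """Return True if mount_options satisfies every term in spec.

    mount_options: comma-separated fstab options, e.g. "rw,foo".
    spec: comma-separated terms, e.g. "opt=foo,noopt=bar".
    Terms are cumulative: every one of them must hold.
    """
    options = set(mount_options.split(","))
    for term in spec.split(","):
        if term.startswith("noopt="):
            # The named option must be absent.
            if term[len("noopt="):] in options:
                return False
        elif term.startswith("opt="):
            # The named option must be present.
            if term[len("opt="):] not in options:
                return False
    return True


print(matches_spec("rw,foo", "opt=foo,noopt=bar"))      # True
print(matches_spec("rw,foo,bar", "opt=foo,noopt=bar"))  # False
print(matches_spec("rw", "opt=foo,noopt=bar"))          # False
```

Under the same model, a filesystem with options "defaults,_netdev" does not
match the spec "noopt=_netdev", so it would be left unchecked.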
Comment 10 Bill Nottingham 2008-03-03 11:33:11 EST
8.45.19.EL-1 built.
Comment 14 errata-xmlrpc 2008-05-21 13:24:15 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0443.html
Comment 15 Fedora Update System 2008-06-05 16:33:22 EDT
cmake-2.4.8-2.fc8 has been submitted as an update for Fedora 8
