Bug 188657

Summary: When booting, each partition should be fsck'ed then mounted in turn
Product: [Fedora] Fedora
Reporter: JW <ohtmvyyn>
Component: initscripts
Assignee: Bill Nottingham <notting>
Status: CLOSED WONTFIX
QA Contact: Brock Organ <borgan>
Severity: medium
Priority: medium
Version: 5
CC: dwysocha, mitr, rvokal
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2006-06-06 00:07:00 UTC

Description JW 2006-04-12 00:41:58 UTC
Description of problem:
If, for example, a disk drive has been removed, boot will fail.  But when
rc.sysinit then drops to a shell, *all* filesystems are still mounted r/o,
even though the problem generally relates to only a single disk or
partition.

Version-Release number of selected component (if applicable):
initscripts-8.31.1-1

How reproducible:
Every time

Steps to Reproduce:
1. shutdown
2. remove a disk drive
3. power up
  
Actual results:
e2fsck returns an error (usually 8).
The user is given the option of a shell for repair.
Unfortunately, unless the user knows "mount -n -o remount,rw /", the poor
hapless user will be unable to recover from this recoverable error.

Expected results:
As each partition is fsck'ed it should be mounted.
There is no point fsck'ing all the partitions en masse before mounting any of
them.  Root fs should be checked, then mounted rw, and then each partition
should be checked, and mounted if ok.  Only after all of these fscks and
mounts have completed should user be given option of repairing ... and it
shouldn't always be necessary to reboot.

Additional info:
See above.

Comment 1 Miloslav Trmač 2006-06-05 23:10:00 UTC
Mounting a few partitions read-write won't help the inexperienced administrator
much; they'll still have to be able to cope with the basic command line.
Running fsck for all filesystems at once is actually important: it allows running
fsck in parallel on multiple physical devices, which can save a lot of time.

As for "it shouldn't be always necessary to reboot", the initscripts have no way
to determine when it is safe not to reboot.  After all, the time spent rebooting
is at most comparable to the time necessary to manually fix the problems, and is
usually much smaller than that.


Comment 2 JW 2006-06-05 23:43:01 UTC
(In reply to comment #1)
> Mounting a few partitions read-write won't help the inexperienced administrator
> much;

Yes it would.

> they'll still have to be able to cope with the basic command line.

But if a disk drive is physically removed why should fsck even run?
A non-existent disk drive cannot be corrected by dropping to command-line!

> Running fsck for all filesystems at once is actually important: it allows running
> fsck in parallel on multiple physical devices, which can save a lot of time.

But that is not at odds with running fsck then mounting in turn.
In other words:
    foreach filesystem
    ( fsck $fs; mount $fs) &

Instead of:
    foreach filesystem
    (fsck $fs) &
    foreach filesystem
    mount $fs
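
The first variant could be sketched in runnable form roughly as follows. This is only an illustration, not what rc.sysinit does: the function name check_then_mount_all and the CHECK/MOUNT variables are hypothetical, and both default to "echo" so that nothing touches a real disk. It also mounts only when the check exits with status 0, which is a simplification.

```shell
#!/bin/sh
# Dry-run defaults (assumption): print the commands instead of running them.
CHECK=${CHECK:-"echo fsck"}
MOUNT=${MOUNT:-"echo mount"}

# check_then_mount_all FS...
# Start one background subshell per filesystem; inside it, mount only
# if the check command exited with status 0.
check_then_mount_all() {
    for fs in "$@"; do
        ( $CHECK "$fs" && $MOUNT "$fs" ) &
    done
    wait    # block until every background subshell has finished
}
```

Called as "check_then_mount_all /dev/sda1 /dev/sdb1", the lines for different filesystems may interleave in any order, since the subshells run concurrently.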

> 
> As for "it shouldn't be always necessary to reboot", the initscripts have no way
> to determine when it is safe not to reboot.

That statement is incorrect. Fsck returns a range of exit codes
to assist initscripts in making such a determination.
If fsck returned a correct error code for "missing disk drive" then initscripts
could do something more appropriate.
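
For reference, fsck(8) documents its exit status as a bit mask (1, 2, 4, 8, 16, 32, 128), and the status 8 seen above means "operational error". A small sketch of decoding such a status is shown below; the function name describe_fsck_status is hypothetical and the wording of the messages is paraphrased from the man page.

```shell
#!/bin/sh
# describe_fsck_status STATUS
# Print a comma-separated description of every bit set in an fsck exit status.
describe_fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    out=""
    for pair in "1:errors corrected" "2:system should be rebooted" \
                "4:errors left uncorrected" "8:operational error" \
                "16:usage or syntax error" "32:cancelled by user" \
                "128:shared library error"; do
        bit=${pair%%:*}     # number before the first colon
        text=${pair#*:}     # description after the first colon
        if [ $(( status & bit )) -ne 0 ]; then
            out="${out:+$out, }$text"
        fi
    done
    echo "$out"
}
```

Note that none of these bits distinguishes a missing device from any other operational error.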

>  After all, the time spent rebooting
> is at most comparable to the time necessary to manually fix the problems, and is
> usually much smaller than that.
> 

It isn't really a matter of saving time.


Comment 3 Miloslav Trmač 2006-06-06 00:07:00 UTC
(In reply to comment #2)
> But if a disk drive is physically removed why should fsck even run?
> A non-existent disk drive cannot be corrected by dropping to command-line!
The OS can't determine whether the block device is missing because it was
intentionally removed and the administrator only forgot to update /etc/fstab,
or because the cable is loose, or the SAN is disconnected.  It has to delegate
the decision to a human.

Booting a system with some partitions unmounted can make the situation even
worse, e.g. writing to /home/foo when the /home partition is not mounted
will make the data inaccessible after the problem is fixed and /home is mounted
again.
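
To illustrate the hazard: a script that writes under /home would have to verify that the partition is really mounted first. A minimal sketch, assuming a Linux-style mount table in /proc/mounts; the function name is_mounted is hypothetical, and mount points containing spaces (which are escaped in that file) are not handled.

```shell
#!/bin/sh
# is_mounted DIR [MOUNTS_FILE]
# Succeed if DIR appears as a mount point (second field) in the mount table.
# MOUNTS_FILE defaults to /proc/mounts.
is_mounted() {
    dir=$1
    table=${2:-/proc/mounts}
    awk -v dir="$dir" '$2 == dir { found = 1 } END { exit !found }' "$table"
}
```

For example: is_mounted /home || echo "/home is not mounted, refusing to write" >&2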


> > Running fsck for all filesystems at once is actually important: it allows running
> > fsck in parallel on multiple physical devices, which can save a lot of time.
> 
> But that is not at odds with running fsck then mounting in turn.
> In other words:
>     foreach filesystem
>     ( fsck $fs; mount $fs) &
> 
> Instead of:
>     foreach filesystem
>     (fsck $fs) &
>     foreach filesystem
>     mount $fs
First, that's not at all what is happening; fsck contains code to determine
which file systems are stored on a single block device and runs checks in
parallel only on separate block devices; running two checks on a single
device would decrease performance instead of increasing it.
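
As a rough illustration of the grouping idea only (this is not fsck's actual code): partitions can be bucketed by parent disk, so that checks within one bucket run one after another and only different buckets run in parallel. The sketch assumes the simplistic rule that the parent disk is the device name with trailing digits removed, which does not hold for names like /dev/md0 or /dev/nvme0n1p1. The function names parent_disk and group_by_disk are hypothetical.

```shell
#!/bin/sh
# parent_disk DEVICE
# Guess the parent disk by stripping trailing digits (simplified assumption).
parent_disk() {
    printf '%s\n' "$1" | sed 's/[0-9]*$//'
}

# group_by_disk DEVICE...
# Print one line per disk in the form "disk: partition partition ...".
group_by_disk() {
    for dev in "$@"; do
        printf '%s %s\n' "$(parent_disk "$dev")" "$dev"
    done | sort | awk '
        $1 != disk { if (disk != "") print line; disk = $1; line = $1 ":" }
        { line = line " " $2 }
        END { if (disk != "") print line }'
}
```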

(fsck -A) also automatically collects the exit codes of its child processes;
doing the same within rc.sysinit would be complicated and error-prone, and
there is no clean way to write a similar top-level process handler for the
"mount one by one" case because several actions need to run between the
filesystem check and mounting.
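
To give an idea of the bookkeeping involved: fsck(8) documents that the exit code for multiple filesystems is the bit-wise OR of the individual exit codes. Reproducing just that part in shell might look like the sketch below. The function name run_checks_in_parallel is hypothetical, each argument is an arbitrary command string standing in for one filesystem check, and nothing here handles the per-device serialization or the steps between check and mount.

```shell
#!/bin/sh
# run_checks_in_parallel CMD...
# Run every command string in the background, then wait for each one and
# return the bitwise OR of their exit statuses.
run_checks_in_parallel() {
    pids=""
    for cmd in "$@"; do
        sh -c "$cmd" &
        pids="$pids $!"         # remember the PID of the job just started
    done
    combined=0
    for pid in $pids; do
        rc=0
        wait "$pid" || rc=$?    # exit status of that particular child
        combined=$(( combined | rc ))
    done
    return "$combined"
}
```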

> > As for "it shouldn't be always necessary to reboot", the initscripts have no way
> > to determine when it is safe not to reboot.
> 
> That is an incorrect statement. Fsck returns all sorts of error codes
> to assist initscripts in making such a determination.
> If fsck returned correct error code for "missing disk drive" then initscripts
> could do something more appropriate.
Even if fsck returned such an error code, initscripts would still have to
drop to a shell, as described above.  Then it would be necessary to somehow
determine whether all necessary filesystems are correctly mounted; rebooting
is a much simpler way to achieve guaranteed clean state, and the cost is
quite small.

> >  After all, the time spent rebooting
> > is at most comparable to the time necessary to manually fix the problems, and is
> > usually much smaller than that.
> It isn't really a matter of saving time.
In that case rebooting is the right thing because it guarantees the system
state doesn't depend on whether the system has required administrator
intervention during bootup.


initscripts really can't special-case every possible administrator error;
the code needs to be reliable, and therefore as simple as possible.


Comment 5 Dave Wysochanski 2007-01-17 15:21:16 UTC
*** Bug 222286 has been marked as a duplicate of this bug. ***