Bug 649089
Summary: | anaconda should not disable automatic filesystem checks on journaled ext3/4 | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | James Ralston <ralston> |
Component: | anaconda | Assignee: | David Lehman <dlehman> |
Status: | CLOSED RAWHIDE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | medium | Docs Contact: | |
Priority: | low | ||
Version: | rawhide | CC: | esandeen, jasonmc, jcm, jonathan, oliver.henshaw, sct, vanmeeuwen+fedora |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2011-02-15 22:19:41 UTC | Type: | --- |
Description
James Ralston
2010-11-02 22:39:10 UTC
I am inclined to think that anaconda should not be in the business of overriding default filesystem settings. It doesn't matter at all to me how long we have been doing it this way. Stephen, what do you think? This tune2fs call to remove the forced fsck was added per your request in August of 2001.
There are two completely different questions here... what should the default be, and how should we set the default. mke2fs.conf was only added in 2006, so our changes before that predate the ability to set meaningful system-wide defaults in the config file. In general, mke2fs.conf would seem to be a more appropriate place to be doing this these days.
As for what the default is, current behaviour is widely expected and easy enough to override if users want to. Online fs checking via snapshots is possible but can be slow; online correction is not yet possible, of course. And desktop users are not immune from the "too slow to boot" issue mentioned above for servers; these days desktops with TB-capacity disks are common, and unexpected slow boot can still be a serious problem (eg. booting a laptop to run a presentation and finding it takes half an hour to fsck... not good!)
Seems like the sort of change that would be better discussed on the fedora lists, though, as it's likely to garner a wide variety of opinions.
Eric, I'm adding you in since you maintain e2fsprogs. We have code in anaconda that calls 'tune2fs -c0 -i0' on all new ext[234] filesystems. It doesn't make sense for anaconda to override filesystem defaults, so I'm going to remove it (from rawhide and F15). If you think this should be default behavior for new ext[234] filesystems, please add something to /etc/mke2fs.conf or wherever you think is appropriate.
Please don't make the change until we've at least had a chance to discuss it more widely, it's a huge impact to the end user if we end up with the tuning gone from anaconda but not added to the mke2fs.conf.
Sorry for the late reply, was travelling a lot.
Let me try to tackle some of these ..
I'm sympathetic to the argument that anaconda shouldn't be overriding defaults; that's a good guiding principle.
To be honest I'd like to remove the forced fsck upstream as well, and have talked with Ted about it. The rationale in the original comment on this bug includes things like:
> Thus, the strength of a journaled filesystem (avoiding an exhaustive filesystem
> check if a filesystem isn't unmounted cleanly) is also its weakness: in the
> event that a crash is hardware-related, you absolutely, positively want to
> perform an exhaustive filesystem check, because it is the only way to find
> corruption.
and I totally agree - but it doesn't follow that therefore extN should be your nanny and (eventually) do it for you. In the case above you probably want to -immediately- run fsck, not wait until 6 months or 30 mounts have expired.
As for the creeping corruption argument, extN should be good at finding corruption runtime; for example we exhaustively check htree directories on every access - almost too often I think, at a performance penalty. If we don't catch existing on-disk data corruption on access, then we have a filesystem bug.
I'm not sold on the notion that semi-random forced full filesystem checks of journaling filesystems are a good thing. extN is the only one I know of which has this interesting feature.
Another note:
> This is why periodic forced filesystem checks are critically important for
> journaled filesystems: since a journaled filesystem will NEVER be marked as
> dirty, periodic forced filesystem checks are the only possible mechanism to
> find corruption in the filesystem before it can cascade.
This isn't quite correct. Any error which would trip the error handling (i.e. errors=remount-ro) behavior (for example the aforementioned directory tree consistency checking) will mark the fs as being in an error state, and the next fsck -will- do a full run.
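To make the triggers discussed above concrete, here is a minimal sketch of the decision a boot-time fsck makes on a cleanly unmounted (or journal-replayed) ext filesystem. It is a simplified model for illustration, not e2fsck's actual code: the `SuperblockState` class, its field names, and `full_check_reason` are hypothetical, and the "30 mounts" and "6 months" figures are just the example values mentioned in this thread.

```python
from dataclasses import dataclass
from typing import Optional

SECONDS_PER_DAY = 24 * 60 * 60


@dataclass
class SuperblockState:
    """Hypothetical, simplified stand-in for the superblock fields discussed here."""
    error_flagged: bool   # set when the kernel hit an error (e.g. errors=remount-ro tripped)
    mount_count: int      # mounts since the last full check
    max_mount_count: int  # <= 0 means the mount-based check is disabled (tune2fs -c0)
    last_check: int       # time of the last full check, seconds since the epoch
    check_interval: int   # seconds; 0 means the time-based check is disabled (tune2fs -i0)


def full_check_reason(sb: SuperblockState, now: int) -> Optional[str]:
    """Return why a full check would run at boot, or None if it would be skipped."""
    if sb.error_flagged:
        return "filesystem flagged with errors"
    if sb.max_mount_count > 0 and sb.mount_count >= sb.max_mount_count:
        return "maximum mount count reached"
    if sb.check_interval > 0 and now >= sb.last_check + sb.check_interval:
        return "check interval elapsed"
    return None


# Periodic checks enabled: 30 mounts or roughly 6 months, whichever comes first.
periodic = SuperblockState(False, 30, 30, 0, 180 * SECONDS_PER_DAY)
# What anaconda's 'tune2fs -c0 -i0' leaves behind: both periodic triggers off.
disabled = SuperblockState(False, 500, 0, 0, 0)
# Same settings, but the kernel has flagged an error at runtime.
flagged = SuperblockState(True, 500, 0, 0, 0)
```

In this model the `disabled` filesystem is never fully checked on a schedule, but the `flagged` one still is, which is the point being made above: the error state forces a full run independently of the mount-count and interval settings.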
A far better user experience would be to pop up a notification (or whatever will replace that in gnome-shell...) telling the user that they might want to do an fsck and offering them an option. You could create a flag file in /boot that indicates on next boot a full fsck should be performed. There. No need to check on some random interval or whatever, just if the user chooses to do so.
Jon, that'd be as awesome as Windows XP constantly asking me if I really want to do <insert many random things here> ;) If the fs detects corruption it'll shut down and fsck on next reboot. Why do we need more than this?
One of these days I'm going to go ahead and remove this from anaconda. If you guys want to get something else in place before then, I'd suggest you get to it.
Ted doesn't want to drop it, so I guess it is what it is.
Is it feasible to add lvcheck (for regular cron-scheduled checking of snapshots) and reverting to the default fsck intervals? That way, boot-time fscks are indefinitely delayed by successful snapshot-time fscks; a failed snapshot fsck triggers a real fsck on the next boot; finally, a boot-time fsck eventually happens even if the cron-scheduled fscks repeatedly fail to start, or complete.
I have committed and pushed a patch for rawhide (not F15) that removes all non-default setting of options via tune2fs. The non-default settings we applied were disabling time- and mount-based fsck intervals and enabling posix acls and user-defined xattrs.
(In reply to comment #11)
> Is it feasible to add lvcheck (for regular cron-scheduled checking of
> snapshots) and reverting to the default fsck intervals? That way, boot-time
> fscks are indefinitely delayed by successful snapshot-time fscks; a failed
> snapshot fsck triggers a real fsck on the next boot; finally, a boot-time fsck
> eventually happens even if the cron-scheduled fscks repeatedly fail to start,
> or complete.
lvcheck is not a viable solution since not all filesystems are on lvm storage.
Perhaps I misunderstand or the name is misleading? It'd work on any snapshottable storage, in theory, or could be expanded to do so, but it is only useful with snapshots. It's not a complete solution.
Oliver, lvcheck in fedora would be nice; someone should champion it as a feature in a future release, hint hint...
I agree that the lvcheck script is a very nice idea and is worthy of inclusion in Fedora -- it just doesn't fill _all_ of our fscking needs.
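
For readers unfamiliar with the snapshot-check idea discussed above, the following is a rough sketch of the sequence of commands such a cron job might run. It is not the actual lvcheck script: the function name `snapshot_check_plan`, the snapshot naming, the 1G snapshot size, and the mount-count value 16000 are all assumptions made for illustration. The function only builds the command lines and does not execute anything.

```python
from typing import Dict, List


def snapshot_check_plan(vg: str, lv: str, snap_size: str = "1G") -> Dict[str, List[str]]:
    """Build (but do not run) the commands for one snapshot-based check of vg/lv."""
    origin = f"/dev/{vg}/{lv}"
    snap_name = f"{lv}-fsck-snap"          # hypothetical naming scheme
    snapshot = f"/dev/{vg}/{snap_name}"
    return {
        # Take a snapshot so the live filesystem can stay mounted.
        "create": ["lvcreate", "--snapshot", "--size", snap_size,
                   "--name", snap_name, origin],
        # Forced, read-only check of the snapshot (-n answers "no" to all fixes).
        "check": ["e2fsck", "-f", "-n", snapshot],
        # Drop the snapshot whatever the outcome.
        "remove": ["lvremove", "-f", snapshot],
        # Clean result: reset mount count and last-check time on the origin,
        # which postpones the next periodic boot-time check.
        "on_success": ["tune2fs", "-C", "0", "-T", "now", origin],
        # Errors found: raise the mount count above the maximum so the next
        # boot runs a real check (assumes mount-based checks are enabled).
        "on_failure": ["tune2fs", "-C", "16000", origin],
    }


plan = snapshot_check_plan("vg0", "root")
for step in ("create", "check", "remove"):
    print(" ".join(plan[step]))
```

Note that the `on_failure` step only has an effect when the default fsck intervals are in place, which is why the proposal pairs lvcheck with reverting to those defaults. As the thread points out, this approach also covers only filesystems that sit on snapshottable storage.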