Red Hat Bugzilla – Bug 165051
mkfs creates too large reserved for root space.
Last modified: 2015-01-04 17:21:18 EST
By default, mkfs seems to create filesystems with a huge amount of disk space
'reserved for root' (according to the man page, the default for -m is 5%).
Given the size of today's disks, reserving a percentage of the disk size is a bad
idea, as it wastes a huge amount of space that is never used. It would be better
to clip the reservation at a certain size by default.
Having to do this by hand with tune2fs afterwards is a tedious step that could
be avoided. The installer team would prefer the defaults get fixed in mkfs
rather than kludge around this in the installer.
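To put rough numbers on the request above, here is a minimal sketch comparing a flat 5% reservation with a reservation clipped at a fixed size. The 100 MiB cap and the function names are hypothetical, chosen only for illustration; mke2fs has no such cap, and the real tool works in filesystem blocks rather than bytes.

```python
# Sketch: flat percentage reservation vs. a hypothetical size cap.
GIB = 1024 ** 3
MIB = 1024 ** 2

DEFAULT_PERCENT = 5.0          # default of -m, per the man page
HYPOTHETICAL_CAP = 100 * MIB   # illustrative value only; not an mke2fs setting


def reserved_bytes(fs_bytes, percent=DEFAULT_PERCENT):
    """Bytes set aside when reserving a flat percentage of the filesystem."""
    return int(fs_bytes * percent / 100)


def clipped_reserved_bytes(fs_bytes, percent=DEFAULT_PERCENT, cap=HYPOTHETICAL_CAP):
    """Same percentage reservation, but never more than `cap` bytes."""
    return min(reserved_bytes(fs_bytes, percent), cap)


if __name__ == "__main__":
    for size_gib in (1, 80, 500, 2048):
        fs = size_gib * GIB
        flat_gib = reserved_bytes(fs) / GIB
        clipped_mib = clipped_reserved_bytes(fs) / MIB
        print(f"{size_gib:>5} GiB fs: 5% = {flat_gib:7.2f} GiB, "
              f"clipped = {clipped_mib:6.1f} MiB")
```

On small filesystems the two policies agree; they diverge only once 5% exceeds the cap, which is the case this report is about.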
What magical number does the installer team think is right for this?
Good question. Jeremy, any ideas?
How much space is really 'useful' to be reserved? I don't think I've ever been
in a situation where I've _needed_ that reservation, though I'm sure that some
folks have found it useful.
Dave was the one suggesting there be a magic number ;-)
As for what should be right, my gut tells me that more than 100 megs is probably
(in most cases) overkill. I'm more than willing to be convinced otherwise and
am not tied to anything in particular.
My main objection was that sticking the magic number in the installer is
entirely the wrong place given the number of other ways people can end up
creating filesystems these days (think system-config-lvm, etc). Rather than
change *every* utility we ship, if we instead fix what they all call (mke2fs)
then we're more consistent.
The "reserved for root" thing is not really supposed to be reserved "for root";
the original purpose was to protect against excessive fragmentation, and the
"for root" is just a safety-valve that lets root processes get access to the
extra space in case of emergency.
So it's not a question of how much space is useful, it's a question of working
out all the performance and fragmentation dynamics of really, really large
filesystems. And that is _scary_.
My gut tells me that once your 2TB fs has less than 100 megs free, your disk
allocate/free performance and your fragmentation is going to be insanely bad.
The point about magic in the installer is well taken, though; we already create
confusion by setting the fsck count/elapsed time fields in the installer, which
causes unexpected fscks on manually-created filesystems. mke2fs is definitely
the right place to make the change; I am just nervous about making such a change
unilaterally, for performance reasons.
I was unaware of the performance reasons of this reservation.
Feel free to close this if you think it isn't worth changing; I can live with
manually doing a tune2fs after installations.
It probably _is_ worth changing; it's an upstream discussion that needs to
occur, though. I'll hang on to it for a while; if I don't get a chance we can
just DEFER it.
Wow, four years has flown by.
So now that we have 1TB drives (and even 2TB if you look hard enough) showing up in desktops, this is getting extremely wasteful.
Sigh, well, at least the filing predates my ownership for almost 3 of those years ;)
This has been talked about but alas, no action by me or by upstream. I'll try to push it along. We'll have to figure out what the "right" value is, esp. in light of the new allocation behavior. (Personally, I'd rather just set it to 0, to be honest, but I doubt that'll fly.)
Looking for some immediate traction here. I've asked our TAM to raise this up as well. On a 500GB LUN we are now wasting 33GB. LVM wastes 7GB and then mkfs wastes the rest.
Red Hat's current response indicates a performance impact could occur by not providing any space for fragmentation, so I'm sure some is needed, but it's doubtful that this much is needed.
(In reply to comment #9)
> Looking for some immediate traction here. I've asked our TAM to raise this up
> as well. On a 500GB LUN we are now wasting 33GB. LVM wastes 7GB and then mkfs
> wastes the rest.
In the meantime, you are welcome to specify something smaller on the mkfs command line, of course.
> Red Hat's current response indicates a performance impact could occur by not
> providing any space for fragmentation so I'm sure some is needed but it's
> doubtful that this much is needed.
Unfortunately, so much of this is aging- and workload-dependent that it'll probably be an exercise in handwaving in the end.
One thing people have asked for is to allow a fractional percent on the mkfs command line, since 1% of huge is still pretty big. I'll get a patch for that upstream as a start.
Eric, thanks for your note, but we really need some guidance here. I don't want to start changing this on the command line without some clear-cut performance recommendation. If there is some way to programmatically determine a "safe" percentage, that is what we are looking for. It's also important to think about this from a consistency perspective. I really don't want to have to manage all the differences between every file system, nor the differences between hundreds of servers. For this, it is preferred that mkfs determine the "right" reservation instead of us having to manage them all separately. We would also like tune2fs to report this percentage.
If it could be done programmatically at this point, it would be in e2fsprogs already. :)
I understand that we need consistency, but for now none of the developers have a good answer for what's necessary, esp. in the ext4 world where the allocator behavior is completely different.
In addition to a change to accept fractional percentages, a change to allow this to be configured at least system-wide in mke2fs.conf would be helpful.
However, I just don't have guidance for you at this point on what % would be recommended, other than what we have today.
Upstream will stay at 5% by default. Running a filesystem within 5% full is never really recommended, as allocator performance, and therefore filesystem performance, will degrade badly.
The reserved % is now tunable in less-than-full-percent increments, i.e. it accepts a decimal. So an administrator who wants to use right up to 99.83% full can do so if they choose.
Sorry for leaving this one open so long, but there's no reason to think at this point that the default will be changing upstream, and fine-tuning by an admin is quite possible when desired.
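For an admin who does want to fine-tune, the arithmetic behind a fractional value is simple. The sketch below converts between a percentage and a reserved block count, and works out which percentage corresponds to a desired absolute reserve. The function names and the 4 KiB block size are assumptions for illustration, and the rounding used by the actual tools may differ.

```python
# Sketch: arithmetic behind a fractional reserved-blocks percentage.
KIB = 1024
GIB = 1024 ** 3
TIB = 1024 ** 4

ASSUMED_BLOCK_SIZE = 4 * KIB  # assumption; check the real filesystem's block size


def reserved_blocks(total_blocks, percent):
    """Blocks reserved for a given (possibly fractional) percentage."""
    return int(total_blocks * percent / 100)


def percent_for_reserve(fs_bytes, reserve_bytes):
    """Percentage of the filesystem that yields roughly `reserve_bytes` reserved."""
    return 100.0 * reserve_bytes / fs_bytes


if __name__ == "__main__":
    fs_bytes = 1 * TIB
    total_blocks = fs_bytes // ASSUMED_BLOCK_SIZE

    for pct in (5, 1, 0.5, 0.1):
        blocks = reserved_blocks(total_blocks, pct)
        size_gib = blocks * ASSUMED_BLOCK_SIZE / GIB
        print(f"{pct:>4}% of 1 TiB -> {blocks} blocks (~{size_gib:.2f} GiB)")

    # Which percentage reserves about 1 GiB on this filesystem?
    print(f"1 GiB reserve is about {percent_for_reserve(fs_bytes, 1 * GIB):.4f}%")
```

With whole percentages only, the smallest non-zero reservation on a 1 TiB filesystem is about 10 GiB, which is why decimal values matter on large disks.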