Bug 2011104 - statfs reports wrong free space for small quotas
Summary: statfs reports wrong free space for small quotas
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: kernel
Version: ---
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: rc
Assignee: Jeff Layton
QA Contact: Yogesh Mane
Docs Contact: John Wilkins
URL:
Whiteboard:
Depends On:
Blocks: 2007283
 
Reported: 2021-10-06 01:27 UTC by Patrick Donnelly
Modified: 2024-12-20 21:18 UTC (History)
CC: 8 users

Fixed In Version: kernel-4.18.0-372.1.1.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-05-10 15:02:05 UTC
Type: Bug
Target Upstream Version:
Embargoed:




Links
- Gitlab: redhat/rhel/src/kernel rhel-8 merge request 2065 (last updated 2022-02-04 18:11:58 UTC)
- Red Hat Issue Tracker: RHELPLAN-98943 (last updated 2021-10-06 01:27:48 UTC)
- Red Hat Product Errata: RHSA-2022:1988 (last updated 2022-05-10 15:02:43 UTC)

Description Patrick Donnelly 2021-10-06 01:27:28 UTC
Description of problem:

See bz2007283. Reproduced easily with:

> $ ceph fs subvolume create a foo --size $((1<<20))
> $ ceph fs subvolume create a bar --size $((1<<25))
> $ sudo mount.ceph  :$(ceph fs subvolume getpath a foo) mnt.0 -o name=admin
> $ df -h ~/mnt/mnt.0
>Filesystem                                                                                                  Size  Used Avail Use% Mounted on
>127.0.0.1:40626,127.0.0.1:40628,127.0.0.1:40630:/volumes/_nogroup/foo/6b7c22fe-0543-42df-9f0a-91002ac9bfad   99G     0   99G   0% /home/pdonnell/mnt/mnt.0
> $ sudo mount.ceph  :$(ceph fs subvolume getpath a bar) ~/mnt/mnt.1 -o name=admin
> $ df -h ~/mnt/mnt.1/
>Filesystem                                                                                                  Size  Used Avail Use% Mounted on
>127.0.0.1:40626,127.0.0.1:40628,127.0.0.1:40630:/volumes/_nogroup/bar/97f97a5d-1141-40fa-b8d7-6f00c94791a2   32M     0   32M   0% /home/pdonnell/mnt/mnt.1

Comment 1 Jeff Layton 2021-10-06 13:46:35 UTC
I think I see the issue: when max_bytes is less than 4M (1 << CEPH_BLOCK_SHIFT), ceph_quota_update_statfs will end up returning false, and we fall back to using the info in the ceph_statfs structure instead of the quota info. Is it sane to allow subvolumes that are that small?

One thing you could do is change the f_bsize/f_frsize to be smaller in the cases where the maxsize is less than 4M, and recalculate f_blocks/f_used based on that blocksize.
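The arithmetic behind that failure mode is simple enough to model in a few lines. This is a sketch in Python, not the kernel code; the function name and return convention are stand-ins for illustration only:

```python
CEPH_BLOCK_SHIFT = 22            # 4M blocks, as reported by the ceph client
CEPH_BLOCK = 1 << CEPH_BLOCK_SHIFT

def quota_update_statfs(max_bytes, used_bytes):
    """Model of the quota-based statfs path: returns (total, free) in
    block units, or None when the caller must fall back to cluster-wide
    stats -- which is the bug for quotas below 4M."""
    total = max_bytes >> CEPH_BLOCK_SHIFT
    if total == 0:
        return None              # quota < 4M rounds down to 0 blocks
    used = used_bytes >> CEPH_BLOCK_SHIFT
    return total, total - used

# A 1M quota (the reproducer's "foo" subvolume) is not representable,
# so df shows the whole cluster's 99G instead:
assert quota_update_statfs(1 << 20, 0) is None
# A 32M quota ("bar") works: 8 blocks of 4M.
assert quota_update_statfs(1 << 25, 0) == (8, 8)
```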

Comment 4 Jeff Layton 2021-10-26 13:23:52 UTC
We need to consider how to handle this situation.

Currently we report a 4M blocksize in statfs, but that doesn't work for subvols that are smaller than that, so we'll need to use a smaller blocksize in those cases (at least). We can either just switch to a smaller blocksize when the quota is very small, or we could just use a smaller blocksize everywhere.

Which would be preferred? What smaller blocksize should we use? 1M? 1K? What's the smallest quota you can set?

Patrick, thoughts?

Comment 5 Patrick Donnelly 2021-10-26 17:38:13 UTC
(In reply to Jeff Layton from comment #4)
> We need to consider how to handle this situation.
> 
> Currently we report a 4M blocksize in statfs, but that doesn't work for
> subvols that are smaller than that, so we'll need to use a smaller blocksize
> in those cases (at least). We can either just switch to a smaller blocksize
> when the quota is very small, or we could just use a smaller blocksize
> everywhere.
> 
> Which would be preferred? What smaller blocksize should we use? 1M? 1K?
> What's the smallest quota you can set?
> 
> Patrick, thoughts?

Changing the block size could have unintended consequences I think. I do agree we should (with 20/20 hindsight) not allow such small quotas but that ship has sailed (although we could change it!). Is it possible to keep the 4MB blocksize but just have a very binary use/free: either you've used the full block or the whole block is free?

Comment 6 Jeff Layton 2021-10-26 18:04:50 UTC
(In reply to Patrick Donnelly from comment #5)
> 
> Changing the block size could have unintended consequences I think. I do
> agree we should (with 20/20 hindsight) not allow such small quotas but that
> ship has sailed (although we could change it!). Is it possible to keep the
> 4MB blocksize but just have a very binary use/free: either you've used the
> full block or the whole block is free?

It could have unintended consequences, but not much actually depends on the blocksize (particularly not with a netfs like ceph). We probably could change it and just see how it goes.

We could just go with the binary used/free like you suggest and leave the blocksize alone. It might make for weird accounting if you have a bunch of tiny quotas, but I guess it'd still be technically "correct".

Comment 7 Kotresh HR 2021-10-28 10:58:01 UTC
I don't quite see the benefit of going with binary use/free for subvolumes smaller than the blocksize (4M).

The quota calculation and validation happen at the byte level and work correctly. The issue is solely with the 'statfs' report. If we decide to go by binary use/free of a full block, then 'total=used=free=0' for subvolumes smaller than the block size (i.e. 4M). But the 'df' tool doesn't report mounts with 0 size. Is that the expectation?

Comment 8 Jeff Layton 2021-10-28 12:06:11 UTC
(In reply to Kotresh HR from comment #7)
> Is that the expectation?

FWIW, this is probably one of those problems where coding the fix won't be hard at all; figuring out what we should code up is the hard part.

Good point about df not reporting those filesystems. That will certainly be the result and that's probably not what we want. Also, a binary use/free like that is probably not what users want either. If they're declaring quotas that small, then they probably want to be able to see that reflected in "df" or similar applications.

Let's step back and think about what applications really want here. statfs comes from a simpler time when all UNIX had to deal with was block-based filesystems. All netfs's fake this up to some degree. In practice most applications don't care about the blocksize at all -- they just use it to multiply the other values from statfs(2) to get values in bytes.

There is no reason we have to report the same f_frsize everywhere. We could keep a 4M blocksize for "reasonably sized" volumes and report a smaller blocksize for inodes with tiny quotas.

Just looking around, NFS uses these values:

    #define NFS_MAX_FILE_IO_SIZE    (1048576U)
    #define NFS_DEF_FILE_IO_SIZE    (4096U)
    #define NFS_MIN_FILE_IO_SIZE    (1024U)


Maybe what we should do is switch to reporting a 4k blocksize in ceph_quota_update_statfs when ci->i_max_bytes is below some threshold. That threshold could be 4M, or we could make it break at a larger value (64M? 1G?).
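That scheme could look something like the following. This is a hedged sketch of the proposal (the 4K fallback shift and the threshold default are the open questions above, not decided values), modeled in Python rather than kernel C:

```python
CEPH_BLOCK_SHIFT = 22                  # normal 4M report blocksize
SMALL_QUOTA_SHIFT = 12                 # hypothetical 4K fallback for tiny quotas

def statfs_units(max_bytes, threshold=1 << CEPH_BLOCK_SHIFT):
    """Pick a report blocksize: 4M normally, 4K when the quota is below
    the threshold. Returns (f_frsize, f_blocks)."""
    shift = CEPH_BLOCK_SHIFT if max_bytes >= threshold else SMALL_QUOTA_SHIFT
    return 1 << shift, max_bytes >> shift

# The reproducer's 1M quota now gets a nonzero block count:
assert statfs_units(1 << 20) == (4096, 256)
# Larger quotas keep the familiar 4M blocksize:
assert statfs_units(1 << 25) == (1 << 22, 8)
```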

One caveat -- avoid using a blocksize that is so small that it causes other "count" values to overflow a 32-bit value. That can result in old 32-bit programs getting an EOVERFLOW from glibc when the values don't fit in a legacy statfs struct.
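To put a number on that caveat: with a 1K blocksize, any filesystem over 4 TiB produces a block count that no longer fits the 32-bit fields of the legacy statfs struct. A quick model (the helper name is made up for illustration):

```python
U32_MAX = 2**32 - 1

def fits_legacy_statfs(total_bytes, bsize):
    """Check whether the block count fits a 32-bit statfs field; when it
    doesn't, old 32-bit callers get EOVERFLOW from glibc."""
    return total_bytes // bsize <= U32_MAX

# A 1K blocksize overflows for a 5 TiB filesystem...
assert not fits_legacy_statfs(5 * 2**40, 1024)
# ...while 4K blocks keep the same filesystem representable (limit: 16 TiB).
assert fits_legacy_statfs(5 * 2**40, 4096)
```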

Comment 9 Patrick Donnelly 2021-10-29 13:15:12 UTC
(In reply to Kotresh HR from comment #7)
> I think I didn't get the benefit of going with binary use/free for
> subvolumes less than blocksize (4M)?
> 
> The quota calculation and validation happens at bytes level and it works
> correctly. The issue is
> solely related to 'statfs' report. If we decide to go by binary use/free of
> full block, the
> 'total=used=free=0' for subvolumes less than block size (i.e. 4M).

No, it would be total=free=4MB, used=0 for subvolumes with a quota <= 4MB that is not fully exhausted. If the quota is full, then total=used=4MB, free=0.
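The accounting Patrick describes can be modeled as follows. This is only a sketch of the suggestion in this comment (not the fix that eventually shipped), with made-up names, counting in 4M block units:

```python
CEPH_BLOCK = 1 << 22   # 4M

def binary_statfs(max_bytes, used_bytes):
    """Model of the binary used/free idea for quotas <= one 4M block:
    report one block that is either entirely free or entirely used."""
    if max_bytes > CEPH_BLOCK:
        raise ValueError("only meant for quotas of at most one block")
    if used_bytes >= max_bytes:        # quota exhausted
        return {"total": 1, "used": 1, "free": 0}
    return {"total": 1, "used": 0, "free": 1}

# 1M quota, untouched: total=free=1 block (4MB), used=0.
assert binary_statfs(1 << 20, 0) == {"total": 1, "used": 0, "free": 1}
# 1M quota, full: total=used=1 block, free=0.
assert binary_statfs(1 << 20, 1 << 20) == {"total": 1, "used": 1, "free": 0}
```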

Comment 33 errata-xmlrpc 2022-05-10 15:02:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: kernel security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1988

