Bug 1115201 - [xfs] can't create inodes in newly added space after xfs_growfs
Summary: [xfs] can't create inodes in newly added space after xfs_growfs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel
Version: 7.1
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Eric Sandeen
QA Contact: Zorro Lang
URL:
Whiteboard:
Duplicates: 1149912 (view as bug list)
Depends On:
Blocks:
 
Reported: 2014-07-01 20:59 UTC by Boris Ranto
Modified: 2019-05-20 11:13 UTC (History)
CC List: 13 users

Fixed In Version: kernel-3.10.0-210.el7
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-03-05 12:26:32 UTC
Target Upstream Version:
Embargoed:


Attachments: none


Links:
Red Hat Product Errata RHSA-2015:0290 (priority: normal, status: SHIPPED_LIVE): Important: kernel security, bug fix, and enhancement update - last updated 2015-03-05 16:13:58 UTC

Description Boris Ranto 2014-07-01 20:59:23 UTC
Description of problem:
If a writer is running in the background and I grow the file system while it is running, the inode space is not refreshed.

Version-Release number of selected component (if applicable):
kernel-3.10.0-123.el7.x86_64
xfsprogs-3.2.0-0.10.alpha2.el7.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a small xfs fs (128 MB is enough)
2. Run empty files worker in the background (to exhaust the inode space)
3. Grow the file system

Actual results:
The worker will run out of inode space as if the fs had not been grown.

Expected results:
The worker will continue to create new files in the grown fs.

Additional info:
I've tried these scenarios:

First, if I run the worker, wait for it to finish (run out of inode space), and then grow the fs (there are no open fds), I can run the worker again and it will continue to fill the fs. This works as expected.

However, if I run the worker in the background and grow the fs in the meantime (there is a high probability of an open fd), the worker stops as if the fs had not been grown and I can't create any new files. Running 'mount -o remount /mnt/point' helps and I can continue to fill the fs (so there is a workaround).

This seems to be a bug in xfs_growfs (or the kernel), which probably does the remount internally but fails if the fs can't be remounted (there are open fds).

The maxpct=0 option does not help (the inode space is bigger but it is still not refreshed).
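For context, maxpct caps the share of filesystem space that may be used for inodes; a quick sketch of how it is set (the device and mount point here are hypothetical):

mkfs.xfs -i maxpct=0 /dev/some_vg/test   # 0 removes the cap (all space may hold inodes)
xfs_growfs -m 0 /mnt/point               # imaxpct can also be changed on a mounted fs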

Comment 1 Eric Sandeen 2014-07-01 21:07:40 UTC
Thanks for the details, I'll take a look.

I have a hunch this is related to incore superblock counters.

What is the "empty files worker"?

If you change it from "empty files" to "1 byte files" does the behavior change?

Comment 2 Eric Sandeen 2014-07-01 21:08:39 UTC
Another question: does "stat -f /mnt/point" also get it going again?

Comment 3 Dan Mick 2014-07-01 21:11:37 UTC
Out of curiosity, how does one distinguish "ENOSPC inodes" vs. "ENOSPC blocks"?

Comment 4 Eric Sandeen 2014-07-01 21:14:11 UTC
fd = open(O_CREAT) = ENOSPC -> no inodes
write(fd, buf, size) = ENOSPC -> no blocks
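A rough shell-level version of that distinction (a sketch, assuming a filesystem mounted at /mnt/point):

touch /mnt/point/newfile                              # ENOSPC on create -> out of inodes
dd if=/dev/zero of=/mnt/point/newfile bs=1M count=1   # ENOSPC on write  -> out of blocks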

Comment 5 Boris Ranto 2014-07-01 21:18:11 UTC
Hi Eric,

the empty files worker is just something like this

(i=0; while touch /mnt/point/$i; do i=$((i+1)); done) &

but it can also be reproduced by untarring the kernel sources, so 1-byte files probably would not change the behavior.

I'll have to retest with the stat -f call.

btw: the reproducer script is fairly easy, just something like this:

lvcreate -L 128M -n test some_vg
mkfs.xfs /dev/mapper/some_vg-test
mount /dev/mapper/some_vg-test /mnt/point/
# exhaust the inode space in the background
(i=0; while touch /mnt/point/$i; do i=$((i+1)); done) &
lvextend -L 1280M /dev/mapper/some_vg-test
xfs_growfs /mnt/point    # xfs_growfs operates on the mount point of the mounted fs
# wait for the background worker to run out of inode space;
# then the following fails as well
touch /mnt/point/file
# remount will update the inode space
mount -o remount /mnt/point
# now we can continue to create new files
(i=0; while touch /mnt/point/$i; do i=$((i+1)); done) &
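Independent of the inode failure, one can confirm that the grow itself took effect by checking the geometry (a sketch, assuming the mount above):

xfs_info /mnt/point   # the data section's block count and agcount should reflect the new 1280M size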

Comment 6 Boris Ranto 2014-07-01 21:34:24 UTC
I've retested with the stat -f call and that did not get it going again.

Comment 7 Eric Sandeen 2014-07-01 21:37:49 UTC
Ok, different problem than the one I've encountered before, then.

Thanks,
-Eric

Comment 8 Dan Mick 2014-07-01 21:56:47 UTC
When we saw similar allocation failures in our testing of ICE, I noticed that sometimes I could get a fail-to-touch (inode allocation failure, I assume) and then, some time later, a success for the same path, with no reboot/remount.  Of course root is not static, and some background process might have been freeing inodes with file ops, but... just another data point.

Comment 9 Dan Mick 2014-07-02 23:20:03 UTC
I understand Eric has a bead on this issue and is working a fix.

Comment 10 Eric Sandeen 2014-07-02 23:27:25 UTC
Yes, it looks like it's a problem in rhel6 as well as in rhel7.

Comment 11 Dan Mick 2014-07-03 04:10:38 UTC
FWIW, closing a loop: "mount -o remount /" also solves the problem on my VM testcase.

Comment 12 Dan Mick 2014-07-03 04:12:45 UTC
Sorry, incorrect; it looked like it helped, but no.

Comment 13 Eric Sandeen 2014-07-03 04:20:41 UTC
I bet if you do:

# mount -o remount,inode64

it'll fix it.
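A minimal way to verify that workaround against the reproducer's mount (a sketch; note that df -i can report plenty of IFree even in the failing state, per comment 22, so probing with an actual create is more reliable):

touch /mnt/point/probe || echo "inode allocation still failing"
mount -o remount,inode64 /mnt/point
touch /mnt/point/probe && echo "creates succeed again"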

Comment 14 Eric Sandeen 2014-07-03 05:04:17 UTC
Sent a patch to the upstream list.

Comment 15 Dan Mick 2014-07-09 01:14:15 UTC
Finally managed to confirm that "remount" does not fix it, but "remount,inode64" does, in my test scenario.

Comment 16 Eric Sandeen 2014-07-09 21:52:31 UTC
(I take it back, it's not an issue in RHEL6)

Dan, is this something that needs attention & fixing prior to RHEL7.1's release, or should we just include the fix along with all the other RHEL7.1 xfs updates?

Comment 17 Dan Mick 2014-07-09 22:08:18 UTC
Well, our particular test case is "provision a VM from a cloud image using cloud-init", which is not going to be a core use case; however, I don't know how many other "provision from the minimal image" situations might matter more.

I'm pretty sure the original root image is distributed by Red Hat for cloud-provisioning purposes, and, given that its root is 6GB, it would not surprise me if most people that use it are growfs'ing root (for simplicity; who wants an evanescent cloud image with multiple disks?).

In short I guess I don't *know* of the various business segments that might get hit with this, but I feel like there are probably more than my sorta-weird case
(because the only thing really weird is the use of the cloud-init package itself, cloud-init being an Ubuntu-created utility).

Comment 18 Dan Mick 2014-07-09 22:20:33 UTC
Asking around here, I guess cloud-init is sorta integral to OpenStack and thus RHEL-OSP, so that is probably a vote for this being more important.

Comment 19 Eric Sandeen 2014-07-09 22:22:00 UTC
It's not just a matter of growing it, but growing it *and* then creating many more inodes, right?  Is the original image so tight that you can barely create any new inodes right out of the gate?

Anyway, the question comes down to whether you can document this away for now ("after growing this image, unmount and remount it to ensure that all space is available; this will be fixed in RHEL7.1") or if we need to ship a z-stream kernel with the fix before RHEL7.1 GA...

-Eric

Comment 20 Dan Mick 2014-07-09 22:26:15 UTC
Yes, it involves some inode creation, and I don't know how much; in my use case it was "three Ceph OSDs", which do create quite a deep filesystem hierarchy even by default, so it could be that it's more intense than some other uses.  But the failing number of inodes was reported as "1%" (presumably of the new size) by df -i, so I was assuming it wasn't all that many.

As for shipping urgency, I can't really speak to that.

Comment 21 Eric Sandeen 2014-07-09 22:30:28 UTC
Ok, well - barring information to the contrary, we'll just ship it with RHEL7.1.

-Eric

Comment 22 Dan Mick 2014-07-12 04:31:22 UTC
This is a failing state:


$ df -i
Filesystem        Inodes IUsed     IFree IUse% Mounted on
/dev/vda1      104856400 61504 104794896    1% /

Comment 23 Dan Mick 2014-07-12 04:33:01 UTC
and after the remount,inode64 and retrying the failed operation, this is how many inodes the failing operation was requesting:

$ df -i
Filesystem        Inodes IUsed     IFree IUse% Mounted on
/dev/vda1      104856400 61551 104794849    1% /
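For reference, the difference between the two snapshots is 61551 - 61504 = 47 inodes, i.e. the retried operation needed only a few dozen inodes once allocation worked again.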

Comment 25 Eric Sandeen 2014-10-08 22:39:41 UTC
*** Bug 1149912 has been marked as a duplicate of this bug. ***

Comment 28 Leif Maxfield 2014-11-18 21:52:13 UTC
I can't tell if this is the same bug as described in the XFS FAQ and was wondering if anyone could comment: http://xfs.org/index.php/XFS_FAQ#Q:_Why_do_I_receive_No_space_left_on_device_after_xfs_growfs.3F

Excerpt:
'Unfortunately, v3.7 also added a bug present from kernel v3.7 to v3.17 which caused new allocation groups added by growfs to be unavailable for inode allocation. This was fixed by commit 9de67c3b xfs: allow inode allocations in post-growfs disk space. in kernel v3.17. Without that commit, the problem can be worked around by doing a "mount -o remount,inode64" after the growfs operation.'

Since we are on kernel 3.10 of RHEL7, we fall into the range where the above bug occurs.

The filesystem in question was /usr, and I'm not aware that it was using an abnormally large number of inodes _prior_ to the growfs, which makes me question whether or not that is required to repro, or if the FAQ growfs bug simply differs from the bug being discussed here. What I do know is that after growing the filesystem, mkdir and touch would both fail in that filesystem with "no space left on device." I ran the remount as described in the FAQ, which appeared to successfully work around the problem. Unfortunately, I failed to make a note of inode stats before and after I ran the remount.
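For anyone landing here later, one illustrative way to check whether an installed RHEL 7 kernel already carries the fix (kernel-3.10.0-210.el7 or later, per comment 32; grepping for the bug number assumes it appears in the package changelog):

uname -r
rpm -q --changelog kernel | grep 1115201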

Comment 29 Eric Sandeen 2014-11-18 22:14:26 UTC
It is the same bug, fixed with the same patch.  This bug addresses the problem, and the next released kernel will have the fix.

It's not just that a large number of inodes was used prior - if all the space is used up for any reason (e.g. data in files!) and growfs doesn't present new space as available for new inodes, inode allocation will fail.

Comment 30 Leif Maxfield 2014-11-19 21:07:22 UTC
Thanks, Eric. Do you happen to know the ETA for the RHEL7 kernel release that will include the bug fix for this?

Comment 31 Eric Sandeen 2014-11-19 21:08:18 UTC
It's slated for RHEL7.1 but I don't know if I can divulge schedules.

Do you have a support/partner contact?  If it's critical, you could request a z-stream update.

Comment 32 Jarod Wilson 2014-11-25 13:20:45 UTC
Patch(es) available on kernel-3.10.0-210.el7

Comment 35 Zorro Lang 2015-01-05 08:42:21 UTC
Tested by running xfs/015: reproduced on kernel 200, test passed on kernel 220.
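For reference, a minimal sketch of invoking that case, assuming an xfstests checkout with TEST_DEV/TEST_DIR and SCRATCH_DEV/SCRATCH_MNT configured in local.config:

cd xfstests
./check xfs/015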

Comment 36 Dominique Martinet 2015-02-16 14:39:09 UTC
Duplicate bug (1149912) talks about commit 9de67c3ba9ea961ba420573d56479d09d33a7587 (xfs: allow inode allocations in post-growfs disk space), but it was pointed out to me on IRC that it might not be enough.

Was commit 7a1df1561609c14ac457d65d9a4a2b6c0f4204ad (xfs: fix premature enospc on inode allocation) planned for inclusion as well?
(I've hit the same problem overall on 7.0, but honestly can't say if the original fix is enough.)

Comment 37 Eric Sandeen 2015-02-16 15:01:02 UTC
The 2nd commit is not yet included in RHEL7; at the time this bug was filed and fixed that upstream commit didn't exist.

Comment 39 errata-xmlrpc 2015-03-05 12:26:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0290.html

