Bug 1115201
| Summary: | [xfs] can't create inodes in newly added space after xfs_growfs | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Boris Ranto <branto> |
| Component: | kernel | Assignee: | Eric Sandeen <esandeen> |
| Status: | CLOSED ERRATA | QA Contact: | Zorro Lang <zlang> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | | |
| Version: | 7.1 | CC: | bfoster, dchinner, dmick, eguan, esandeen, g.fhnrunznrqeqf, hamiller, icolle, jkt, leif, pasteur, szhao, vikumar |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | kernel-3.10.0-210.el7 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2015-03-05 12:26:32 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description (Boris Ranto, 2014-07-01 20:59:23 UTC)
Thanks for the details, I'll take a look. I have a hunch this is related to the incore superblock counters. What is the "empty files" worker? If you change it from empty files to 1-byte files, does the behavior change? Another question: does "stat -f /mnt/point" also get it going again? Out of curiosity, how does one distinguish "ENOSPC inodes" from "ENOSPC blocks"?

```
fd = open(O_CREAT) = ENOSPC      -> no inodes
write(fd, buf, size) = ENOSPC    -> no blocks
```

Hi Eric, the empty files worker is just something like this:

```shell
(i=0; while touch /mnt/point/$i; do i=$(($i+1)); done) &
```

but it could also be reproduced with untarred kernel sources, so 1-byte files probably would not help. I'll have to retest with the stat -f call. By the way, the reproducer script is fairly simple, just something like this:

```shell
lvcreate -L 128M -n test some_vg
mkfs.xfs /dev/mapper/some_vg-test
mount /dev/mapper/some_vg-test /mnt/point/
(i=0; while touch /mnt/point/$i; do i=$(($i+1)); done) &
lvextend -L 1280M /dev/mapper/some_vg-test
xfs_growfs /dev/mapper/some_vg-test
# wait for the background worker to run out of inode space;
# then the following also fails
touch /mnt/point/file
# a remount will update the inode space
mount -o remount /mnt/point
# now we can continue to create new files
(i=0; while touch /mnt/point/$i; do i=$(($i+1)); done) &
```

I've retested with the stat -f call, and that did not get it going again.

Ok, different problem than one I've encountered before, then. Thanks, -Eric

When we saw similar allocation failures in our testing of ICE, I noticed that sometimes I could get a fail-to-touch (an inode allocation failure, I assume) and then, some time later, a success for the same path, with no reboot or remount. Of course root is not static, and some background process might have been freeing inodes with file ops, but... just another data point.

I understand Eric has a bead on this issue and is working on a fix.

Yes, it looks like it's a problem in RHEL6 as well as in RHEL7.
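Eric's open()/write() distinction above can be approximated from the shell as well; here is a minimal sketch, assuming GNU coreutils `df` (the mount-point argument is a hypothetical illustration, not from this report):

```shell
#!/bin/sh
# Rough ENOSPC triage: when touch fails with "No space left on device" but
# free data blocks remain, suspect inode allocation rather than block
# allocation. Caveat from this very bug: df -i still showed IUse% of 1%,
# so a low free-inode count is not required to hit the growfs problem.
mnt="${1:-/}"                                    # mount point to inspect
free_blocks=$(df -P  "$mnt" | awk 'NR==2 {print $4}')
free_inodes=$(df -Pi "$mnt" | awk 'NR==2 {print $4}')
echo "$mnt: free blocks=$free_blocks, free inodes=$free_inodes"
```
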
FWIW, closing a loop: "mount -o remount /" also solves the problem on my VM testcase. Sorry, incorrect; it looked like it helped, but no. I bet if you do:

```shell
# mount -o remount,inode64
```

it'll fix it. Sent a patch to the upstream list.

Finally managed to confirm that "remount" does not fix it but "remount,inode64" does, in my test scenario. (I take it back; it's not an issue in RHEL6.)

Dan, is this something that needs attention and fixing prior to RHEL7.1's release, or should we just include the fix along with all the other RHEL7.1 xfs updates?

Well, our particular test case is "provision a VM from a cloud image using cloud-init", which is not going to be a core use case; however, I don't know how many other "provision from the minimal image" situations might matter more. I'm pretty sure the original root image is distributed by Red Hat for cloud-provisioning purposes, and, given that its root is 6 GB, it would not surprise me if most people who use it are growfs'ing root (for simplicity; who wants an evanescent cloud image with multiple disks?). In short, I guess I don't *know* of the various business segments that might get hit by this, but I feel like there are probably more than my sorta-weird case (the only thing really weird being the use of the cloud-init package itself, cloud-init being an Ubuntu-created utility). Asking around here, I gather cloud-init is fairly integral to OpenStack and thus RHEL-OSP, so that is probably a vote for this being more important.

It's not just a matter of growing the filesystem, but growing it *and* then creating many more inodes, right? Is the original image so tight that you can barely create any new inodes right out of the gate? Anyway, the question comes down to whether you can document this away for now ("after growing this image, unmount and remount it to ensure that all space is available; this will be fixed in RHEL7.1") or whether we need to ship a z-stream kernel with the fix before RHEL7.1 GA...
-Eric

Yes, it involves some inode creation, and I don't know how much; in my use case it was "three Ceph OSDs", which do create quite a deep filesystem hierarchy even by default, so it could be that it's more intense than some other uses. But the failing number of inodes in use was reported as "1%" (presumably of the new size) by df -i, so I was assuming it wasn't all that many. As for shipping urgency, I can't really speak to that.

Ok, well, barring information to the contrary, we'll just ship it with RHEL7.1. -Eric

This is a failing state:

```
$ df -i
Filesystem       Inodes IUsed     IFree IUse% Mounted on
/dev/vda1     104856400 61504 104794896    1% /
```

and after remount,inode64 and retrying the failed operation, this is how many inodes the failing operation had been requesting:

```
$ df -i
Filesystem       Inodes IUsed     IFree IUse% Mounted on
/dev/vda1     104856400 61551 104794849    1% /
```

*** Bug 1149912 has been marked as a duplicate of this bug. ***

I can't tell whether this is the same bug as described in the XFS FAQ, here, and was wondering if anyone could comment: http://xfs.org/index.php/XFS_FAQ#Q:_Why_do_I_receive_No_space_left_on_device_after_xfs_growfs.3F

Excerpt: "Unfortunately, v3.7 also added a bug, present from kernel v3.7 to v3.17, which caused new allocation groups added by growfs to be unavailable for inode allocation. This was fixed by commit 9de67c3b ("xfs: allow inode allocations in post-growfs disk space") in kernel v3.17. Without that commit, the problem can be worked around by doing a "mount -o remount,inode64" after the growfs operation."

Since we are on kernel 3.10 in RHEL7, we fall into the range where the above bug occurs. The filesystem in question was /usr, and I'm not aware that it was using an abnormally large number of inodes _prior_ to the growfs, which makes me question whether or not that is required to reproduce, or whether the FAQ growfs bug simply differs from the bug being discussed here.
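The two df -i listings above differ only in IFree; a trivial arithmetic check of the delta shows how few inodes the previously failing operation actually needed once the grown space became allocatable:

```shell
# IFree values copied from the two df -i snapshots in this report
before=104794896   # IFree while creates were failing with ENOSPC
after=104794849    # IFree after remount,inode64 and the successful retry
echo "inodes consumed: $((before - after))"   # prints: inodes consumed: 47
```

This supports the point above that the failure is not about running out of inodes in the usual sense: only a few dozen inodes were needed, yet allocation failed until the new allocation groups were made available.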
What I do know is that after growing the filesystem, mkdir and touch would both fail in that filesystem with "No space left on device". I ran the remount as described in the FAQ, which appeared to successfully work around the problem. Unfortunately, I failed to make a note of the inode stats before and after running the remount.

It is the same bug, fixed with the same patch. This bug addresses the problem, and the next released kernel will have the fix. It's not just a matter of a large number of inodes being used prior to the grow: if all the space is used up for any reason (i.e. data in files!) and growfs doesn't present new space as available for new inodes, inode allocation will fail.

Thanks, Eric. Do you happen to know the ETA for the RHEL7 kernel release that will include the fix for this?

It's slated for RHEL7.1, but I don't know if I can divulge schedules. Do you have a support/partner contact? If it's critical, they could request a z-stream update.

Patch(es) available in kernel-3.10.0-210.el7.

Tested by running xfs/015: reproduced on kernel -200, test passed on kernel -220.

The duplicate bug (1149912) mentions commit 9de67c3ba9ea961ba420573d56479d09d33a7587 ("xfs: allow inode allocations in post-growfs disk space"), but it was pointed out to me on IRC that this might not be enough. Was commit 7a1df1561609c14ac457d65d9a4a2b6c0f4204ad ("xfs: fix premature enospc on inode allocation") planned for inclusion as well? (I've hit the same overall problem on 7.0, but honestly can't say whether the original fix is enough.)

The second commit is not yet included in RHEL7; at the time this bug was filed and fixed, that upstream commit didn't exist.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-0290.html
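Pulling the version information in this report together, here is a rough sketch for checking whether a RHEL7 kernel release string already carries the fix (first fixed build: kernel-3.10.0-210.el7). It assumes `sort -V` is a good-enough approximation of RPM version ordering; rpmdev-vercmp would be more precise, and the sample release strings other than -210 are hypothetical illustrations.

```shell
#!/bin/sh
# Succeeds if the given kernel release sorts at or after the first fixed
# build. Only an approximation of full RPM NVR comparison.
fixed="3.10.0-210.el7"
have_fix() {
    running="$1"
    # the fix is present if the fixed version sorts first (or is equal)
    [ "$(printf '%s\n' "$fixed" "$running" | sort -V | head -n1)" = "$fixed" ]
}

have_fix "3.10.0-229.el7" && echo "3.10.0-229.el7: has the fix"
have_fix "3.10.0-123.el7" || echo "3.10.0-123.el7: affected"
```

Note that this only covers the first fix; per the closing comments above, the follow-up upstream commit 7a1df1561609 was not yet included in RHEL7 when this bug was closed.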