Bug 1565844

Summary: 65000 file heal limit on ext* file systems
Product: [Community] GlusterFS
Reporter: Jaco Kroon <jaco>
Component: selfheal
Assignee: bugs <bugs>
Status: CLOSED CURRENTRELEASE
Severity: urgent
Priority: unspecified
Version: mainline
CC: bugs, jaco, joe, pkarampu, ravishankar, srangana
Keywords: Reopened
Hardware: All
OS: Linux
Type: Bug
Last Closed: 2018-06-28 18:20:39 UTC

Description Jaco Kroon 2018-04-10 22:12:00 UTC
Description of problem:

When the underlying bricks are ext4-formatted, the heal process cannot have more than 65000 (64999?) entries queued for heal.

This is presumably due to the maximum link count on ext* being 65000 (one of those links is xattrop-${SOME_UUID}, which is presumably the file the queued entries are hard-linked against). The link count field in the ext inode structure itself is 16-bit; I can't remember why 65000 was chosen as the hard limit, but I'm sure it looked sane by design.

I found that all files in the .glusterfs/indices/xattrop directory reference the same inode, and that a stat of any of those files shows a link count of close to 65000 (at least 64000) at any point in time.
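One way to check this on a brick is to group the entries of the index directory by inode and compare each inode's link count with the ext4 limit. The Python sketch below assumes read access to the brick's .glusterfs/indices/xattrop directory. `link_headroom` is a hypothetical helper, and 65000 is the value of `EXT4_LINK_MAX` in the kernel's fs/ext4/ext4.h. The base xattrop-${UUID} file holds one of the links, so a headroom of zero means 64999 entries are already queued.

```python
import os
from collections import defaultdict

# Value of EXT4_LINK_MAX in the Linux kernel's fs/ext4/ext4.h.
EXT4_LINK_MAX = 65000


def link_headroom(index_dir, link_max=EXT4_LINK_MAX):
    """Hypothetical helper: group the entries of an index directory by inode
    and report, per inode, its names, link count, and remaining headroom."""
    by_inode = defaultdict(list)
    nlink = {}
    for name in os.listdir(index_dir):
        st = os.lstat(os.path.join(index_dir, name))
        key = (st.st_dev, st.st_ino)  # hard links share (device, inode)
        by_inode[key].append(name)
        nlink[key] = st.st_nlink
    return {
        key: {
            "names": sorted(names),
            "nlink": nlink[key],
            # Hard links this inode can still take before link() fails.
            "headroom": link_max - nlink[key],
        }
        for key, names in by_inode.items()
    }
```

For example, `link_headroom("/path/to/brick/.glusterfs/indices/xattrop")` would be expected to return a single inode whose `nlink` rises toward 65000 as the heal backlog grows.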

I've got a 2x2 replicate-distribute cluster with a total of approximately 9.8M files (per df -i). On Monday morning I was forced to replace the partition on which my bricks were residing (disk failure; neither pvmove nor rsync could get the data off onto the replacement drive), so I had to use reset-brick to get the bricks rebuilt. This immediately triggered a heal, and I could see directory structures and the like being recreated.

I also noticed that many (most) of the files were created owned by root:root and with no content. I discovered this after noticing that disk usage (df -h) grew much more slowly than the inode count (df -i): the former reached approximately 6% of that on the healthy brick in the same time as the inode count reached 26%. Currently these are at 127G/1.4T (~9%) and 3.5M/9.8M (~36%).

How reproducible:

This is not the first time I've seen root:root-owned, 0-size files. Previously I fixed it by using find to rm them all (including the gfid hard-linked file) and then running stat on each file via the FUSE mount.
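The cleanup above relies on each data file on a brick also having a hard link under .glusterfs (named after its gfid), which must be removed together with the file. Below is a read-only Python sketch of locating those partners. It assumes you already have a list of suspect paths on the brick, and it matches them by inode rather than by reading the trusted.gfid xattr. `gfid_links_for` is a hypothetical name. The sketch only lists paths. Removing them and re-stat'ing through the FUSE mount is left to the operator.

```python
import os


def gfid_links_for(brick, suspects):
    """Hypothetical helper: map each suspect file on `brick` to the hard links
    under `brick`/.glusterfs that share its inode. Nothing is modified."""
    wanted = {}
    for path in suspects:
        st = os.lstat(path)
        wanted[(st.st_dev, st.st_ino)] = path
    result = {path: [] for path in suspects}
    # os.walk does not descend into directory symlinks by default.
    for root, _dirs, files in os.walk(os.path.join(brick, ".glusterfs")):
        for name in files:
            full = os.path.join(root, name)
            st = os.lstat(full)
            key = (st.st_dev, st.st_ino)
            if key in wanted:
                result[wanted[key]].append(full)
    return result
```

Matching on (device, inode) avoids needing the privileges to read trusted.* xattrs, at the cost of walking the whole .glusterfs tree once.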

Steps to Reproduce:

1.  Set up glusterfs with ext4 backing the bricks (using replicate).
2.  Create a directory structure containing more than 65k files (possibly a lot more). The files should not be root-owned and should not be empty.
3.  Take a brick down, reformat it, and recover it using reset-brick.
4.  Watch the heal count increase, up to 65k.
5.  Run: find /path/to/brick -user root -size 0

Step 5 will hopefully reveal some files, which would be a problem.  This should not happen.
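Below is a rough Python equivalent of the step 5 check, for scripting it across bricks. It differs from `find` in three stated ways. It only considers regular files, while `find` without `-type` also matches directories and symlinks. It takes the uid as a parameter, defaulting to 0 (root). It can skip the internal .glusterfs tree, so that each file is not also reported through its gfid hard link. `root_owned_empty_files` is a hypothetical name.

```python
import os
import stat


def root_owned_empty_files(brick, uid=0, skip_glusterfs=True):
    """Hypothetical helper approximating `find BRICK -user root -size 0`,
    restricted to regular files; optionally skips `brick`/.glusterfs."""
    hits = []
    for root, dirs, files in os.walk(brick):
        # Pruning `dirs` in place stops os.walk from descending there.
        if skip_glusterfs and root == brick and ".glusterfs" in dirs:
            dirs.remove(".glusterfs")
        for name in files:
            path = os.path.join(root, name)
            st = os.lstat(path)
            if stat.S_ISREG(st.st_mode) and st.st_uid == uid and st.st_size == 0:
                hits.append(path)
    return sorted(hits)
```

On a healthy brick built as in step 2, the expected result of `root_owned_empty_files("/path/to/brick")` is an empty list.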

Possible fixes:

For files, it may be possible to hard-link the gfid that should be healed instead. This merely shifts the risk to that file reaching the hard-link limit, but that should be far less likely.

For folders (I'm not sure whether they end up in xattrop for healing), the current strategy may be fine.

Alternatively, when the xattrop-${UUID} file reaches the maximum link count, it could be unlinked and a new file (inode) created and used from then on. However, that would make the heal count much harder to calculate, since it could no longer be obtained from a simple stat of this one file.
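Below is a minimal Python sketch of that alternative, written to illustrate the trade-off. It is not how GlusterFS implements its index. `RotatingIndex` is a hypothetical class, and a local directory stands in for .glusterfs/indices/xattrop. While only one base file exists, the heal count is its link count minus one, which takes a single stat. Once a full base has been unlinked and replaced, the queued entries are spread across several inodes, so counting them means scanning the directory.

```python
import os
import uuid


class RotatingIndex:
    """Hypothetical sketch: queued entries are hard links to a base file;
    a full base is unlinked and replaced by a fresh one (a new inode)."""

    def __init__(self, index_dir, link_max=65000):
        self.index_dir = index_dir
        self.link_max = link_max
        self.base = self._new_base()

    def _new_base(self):
        path = os.path.join(self.index_dir, "xattrop-" + str(uuid.uuid4()))
        # O_EXCL makes creation fail rather than reuse an existing file.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
        os.close(fd)
        return path

    def add(self, gfid):
        # Rotate before link() would hit the filesystem's limit (EMLINK).
        if os.stat(self.base).st_nlink >= self.link_max:
            os.unlink(self.base)
            self.base = self._new_base()
        os.link(self.base, os.path.join(self.index_dir, gfid))

    def heal_count(self):
        # With one base this equals os.stat(self.base).st_nlink - 1; after a
        # rotation the older entries no longer share the current base inode,
        # so every directory entry has to be examined.
        return sum(
            1 for name in os.listdir(self.index_dir)
            if not name.startswith("xattrop-")
        )
```

The sketch counts by listing the directory, which is O(n) in the number of queued entries. A real implementation might instead keep a separate counter, at the cost of having to keep it consistent.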

Comment 1 Joe Julian 2018-04-10 22:28:23 UTC
The hard limit is defined in https://github.com/torvalds/linux/blob/master/fs/ext4/ext4.h#L233

Comment 2 Shyamsundar 2018-06-20 18:24:56 UTC
This bug was reported against a version of Gluster that is no longer maintained (or has been EOL'd). See https://www.gluster.org/release-schedule/ for the versions currently maintained.

As a result this bug is being closed.

If the bug persists on a maintained version of gluster or against the mainline gluster repository, request that it be reopened and the Version field be marked appropriately.

Comment 3 Jaco Kroon 2018-06-21 05:31:23 UTC
That's really a "convenient" way to just close stuff and really sends a "we don't care" message.

gluster.org is down at the moment so I can't check the release notes; however, since I've seen no progress on this other than the ticket simply being closed, I'm going to assume for now that the issue persists.

Comment 4 Shyamsundar 2018-06-26 19:28:15 UTC
(In reply to Jaco Kroon from comment #3)
> That's really a "convenient" way to just close stuff and really sends a "we
> don't care" message.

Bug reports lingering on EOL releases do not help. That said, triaging them and paying attention to them would have helped, and that was not done.

> 
> gluster.org is down at the moment so I can't check the release notes;
> however, since I've seen no progress on this other than the ticket simply
> being closed, I'm going to assume for now that the issue persists.

The bug is not part of any release notes, as it has been neither fixed nor addressed in any manner.

Changing the version of this to mainline, so that it stays open until it receives some attention.

Comment 5 Jaco Kroon 2018-06-27 07:20:48 UTC
Hi,

(In reply to Shyamsundar from comment #4)
> (In reply to Jaco Kroon from comment #3)
> > That's really a "convenient" way to just close stuff and really sends a "we
> > don't care" message.
> 
> Bug reports lingering on EOL releases do not help. That said, triaging
> them and paying attention to them would have helped, and that was not done.

I filed against 4.0 since that was the version I found the issue in. I expect this is pretty standard behaviour. I think I can grasp why bugs against EOL versions simply get closed, but I still feel that the process should involve some kind of verification from the developers' side. In other words, there could be a process that produces a list of bugs that can probably be closed, which someone then works through, with "unsure" bugs moved into a state where the people involved get a set number of days to respond before the bug is automatically closed.

I did not file against mainline since that would, for me at least, imply I tested against the git repository, which I did not.

Not sure if there is anything additional that I should (could) have done to avoid the ticket simply being closed.

Thank you for your response to my previous message, and for the additional information.

Kind Regards,
Jaco

Comment 6 Ravishankar N 2018-06-28 15:49:22 UTC
Jaco, 

Pranith had made a fix for this via https://review.gluster.org/#/c/19754/. I think the backport went in for the 4.0.2 release. Can you check if it occurs after upgrading to that version or later?

Thanks,
Ravi

Comment 7 Jaco Kroon 2018-06-28 18:20:39 UTC
The fix looks simple enough and the test case is adequate, so if that test passes I'm happy that the bug should be fixed. I'm not sure when I'll be able to rig a test of my own, but based on a review of the code I'm confident it should fix the problem. Thank you for looking at this.