Bug 1136810 - Inode leaks upon repeated touch/rm of files in a replicated volume
Summary: Inode leaks upon repeated touch/rm of files in a replicated volume
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: posix
Version: 3.4.2
Hardware: i686
OS: Linux
Priority: unspecified
Severity: low
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2014-09-03 11:15 UTC by Anirban Ghoshal
Modified: 2015-10-07 13:50 UTC (History)
2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-10-07 13:50:53 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Anirban Ghoshal 2014-09-03 11:15:50 UTC
Description of problem:
Suppose you have a replica 2 volume /testfs. You repeatedly do the following:
a) touch /testfs/tmp_file
b) rm /testfs/tmp_file

You will notice that the IUse% of the volume (and of the underlying bricks) keeps growing. This may be because the metadata kept for each file under the brick's .glusterfs directory is not cleaned up completely when the file is removed.
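
For reference, a quick way to watch this on one of the bricks (the brick path below is just an example; substitute your own):

# inode usage of the brick file-system, before and after the touch/rm loop
df -i /bricks/testfs-brick1

# number of two-character gfid prefix directories accumulated under .glusterfs
find /bricks/testfs-brick1/.glusterfs -mindepth 2 -maxdepth 2 -regextype posix-extended \
     -type d -regex '.*/[0-9a-f]{2}/[0-9a-f]{2}' | wc -l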

Version-Release number of selected component (if applicable):
Glusterfs version: 3.4.2
Linux version 2.6.34

How reproducible:
100%

Steps to Reproduce:
1. Create replica 2 volume testfs (preferably small, say, 32 MB?) and mount it (mount.glusterfs) at /testfs
2. check `df -ih /testfs/`
3. while true; do touch /test1/tmp_file; rm -f /test1/tmp_file; done
4. check `df -ih /testfs/` again.  

Actual results:

The IUse% of /testfs (and of the underlying bricks) increases as the touch/rm loop runs, even though no files are left on the volume.

Expected results:

If you create a set of files and then delete them, the IUse% should ideally return to the value it had before the files were created.

Additional info:
None.

Comment 1 Anirban Ghoshal 2014-09-03 11:19:20 UTC
Sorry, there was a typo in the 'Steps to Reproduce' (the mount point should read /testfs, not /test1). Here is the correct form:

Steps to Reproduce:
1. Create replica 2 volume testfs (preferably small, say, 32 MB?) and mount it (mount.glusterfs) at /testfs
2. check `df -ih /testfs/`
3. while true; do touch /testfs/tmp_file; rm -f /testfs/tmp_file; done
4. check `df -ih /testfs/` again.
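
If it helps, here is the same loop in a slightly more self-contained form that prints the inode usage before and after, so the growth is easy to see (the brick path is illustrative, and the loop is bounded so it terminates on its own):

#!/bin/bash
MNT=/testfs                     # glusterfs mount point
BRICK=/bricks/testfs-brick1     # one of the bricks backing testfs (adjust to your layout)

df -i "$BRICK"                  # free inodes before
for i in $(seq 1 1000); do
    touch "$MNT/tmp_file"
    rm -f "$MNT/tmp_file"
done
df -i "$BRICK"                  # free inodes after: IFree has dropped although no files remain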

Comment 2 Anirban Ghoshal 2014-09-03 13:33:19 UTC
One more thing I feel I ought to mention here: my bricks are XFS. Sorry if that's too many edits :-|

Comment 3 Anirban Ghoshal 2014-09-15 14:50:02 UTC
Upon further investigation, I find that the number of inodes 'leaked' in this manner cannot grow indefinitely. I came to understand the following: suppose we have a file whose gfid is 0x3b335d3e49a34fad8655cb08ed2802b8. The metadata stored for that file in each brick will then look like this:

.glusterfs/3b/33/3b335d3e-49a3-4fad-8655-cb08ed2802b8 -> original_file

This link is a hard link. The entire path within the .glusterfs directory is created on the fly when original_file is created. Now, when I delete original_file from the glusterfs volume, the link 3b335d3e-49a3-4fad-8655-cb08ed2802b8 disappears as well, but the directories 3b/ and 33/ remain. If a new file is later created whose gfid begins with 3b33, those directories do not need to be created again. A simple touch+rm of a file, however, can cause a net increase in the number of inodes used on the brick file-system, because the two prefix directories are left behind.
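
This is easy to confirm directly on a brick (paths are just examples). The gfid of a file can be read from its trusted.gfid xattr on the brick, and the prefix directories can be seen to survive the delete:

# read the gfid assigned to the file (run on the brick, as root)
getfattr -n trusted.gfid -e hex /bricks/testfs-brick1/original_file
# -> trusted.gfid=0x3b335d3e49a34fad8655cb08ed2802b8

# the hard link lives under the first two bytes of the gfid
ls -li /bricks/testfs-brick1/.glusterfs/3b/33/

# after 'rm /testfs/original_file' on the mount, the hard link is gone,
# but the directories 3b/ and 33/ are still there:
ls -ld /bricks/testfs-brick1/.glusterfs/3b /bricks/testfs-brick1/.glusterfs/3b/33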

Since the prefix directories are only 2 hex characters long at each of the two levels, a brick can hold at most 256 first-level and 65536 second-level such directories, so the number of inodes 'leaked' this way can never exceed 256 + 65536 = 65792. For large file-systems (even those that are just a few GBs in size) this is not too restrictive in any way. But for smaller file-systems (the one I was working with was just 19 MB), this may lead to complete inode exhaustion at some point. And yes, such small volumes of type replicate may be used in practical systems, if mirroring of data at runtime over the network is desired.

Thus, I have downgraded the priority from high to low, but I think it is better not to close it altogether. Maybe some heuristic could be introduced: if the volume size falls below a certain threshold, the metadata prefix directories could be deleted as well once they are empty, so that inode use stays optimal; a rough illustration is sketched below.
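
Purely as an illustration of what I mean (this is not something the posix translator does today, and it should only ever be tried on a brick whose volume is stopped), the empty prefix directories could in principle be pruned like this:

# remove empty second-level gfid prefix directories (e.g. .glusterfs/3b/33)
find /bricks/testfs-brick1/.glusterfs -mindepth 2 -maxdepth 2 -regextype posix-extended \
     -type d -empty -regex '.*/[0-9a-f]{2}/[0-9a-f]{2}' -delete

# then remove any first-level prefix directories that are now empty (e.g. .glusterfs/3b)
find /bricks/testfs-brick1/.glusterfs -mindepth 1 -maxdepth 1 -regextype posix-extended \
     -type d -empty -regex '.*/[0-9a-f]{2}' -delete

A proper fix would presumably do the equivalent rmdir from the brick's unlink path when the parent prefix directories become empty.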

Comment 4 Niels de Vos 2015-05-17 21:59:04 UTC
GlusterFS 3.7.0 has been released (http://www.gluster.org/pipermail/gluster-users/2015-May/021901.html), and the Gluster project maintains N-2 supported releases. The last two releases before 3.7 are still maintained, at the moment these are 3.6 and 3.5.

This bug has been filed against the 3.4 release, and will not get fixed in a 3.4 version any more. Please verify if newer versions are affected with the reported problem. If that is the case, update the bug with a note, and update the version if you can. In case updating the version is not possible, leave a comment in this bug report with the version you tested, and set the "Need additional information the selected bugs from" below the comment box to "bugs".

If there is no response by the end of the month, this bug will get automatically closed.

Comment 5 Kaleb KEITHLEY 2015-10-07 13:50:53 UTC
GlusterFS 3.4.x has reached end-of-life.

If this bug still exists in a later release please reopen this and change the version or open a new bug.

