Description of problem:
The delete operation does not free up space on the brick. Deleted files go into the .glusterfs/unlink directory and continue to occupy space after deletion.
Tested the scenario with both a replicated and a distributed volume; it is always reproducible.
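A quick way to confirm the symptom on the server side is to compare brick usage with the contents of the .glusterfs/unlink directory after deleting files from the client. A minimal sketch, assuming a hypothetical brick path /bricks/brick1:

# on the brick server, after the files were deleted from the client
df -h /bricks/brick1                       # used space has not gone down
du -sh /bricks/brick1/.glusterfs/unlink    # roughly matches the size of the deleted files
ls -lh /bricks/brick1/.glusterfs/unlink    # deleted files are kept here, named by gfid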
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Setup a RHGS 3.1.3 cluster
2. Create a volume and export it using nfs-ganesha
3. Mount the share from the client using nfs
4. Create some random files (1 GB or so in size, so the change in usage is easily noticeable after deletion)
5. Delete the file and check the .glusterfs/unlink directory on the brick (a command-level sketch of these steps follows below)
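A rough command-level sketch of steps 2-5 above (the volume name, brick paths, and the ganesha VIP are hypothetical; the export is enabled through the volume-level ganesha.enable option used by the RHGS 3.1 nfs-ganesha integration):

# 2. create and start the volume, then export it via nfs-ganesha
gluster volume create testvol server1:/bricks/b1 server2:/bricks/b2
gluster volume start testvol
gluster volume set testvol ganesha.enable on

# 3. mount the export on the client, via the ganesha VIP
mount -t nfs -o vers=4 ganesha-vip:/testvol /nfsclient1

# 4. create a large file so the change in usage is easy to notice
dd if=/dev/urandom of=/nfsclient1/file1 bs=1M count=1000

# 5. delete it from the client, then check the brick
rm -f /nfsclient1/file1
ls -lh /bricks/b1/.glusterfs/unlink/   # run on the brick server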
Actual results: the files are not deleted permanently.
Client mount point (after the delete):
[root@dhcp7-83 nfsclient1]# ll
[root@dhcp7-83 nfsclient1]# df -hT .
Filesystem Type Size Used Avail Use% Mounted on
nfs 1.9G 1.1G 906M 54% /nfsclient1
Server brick directory (.glusterfs/unlink):
[root@dhcp7-24 unlink]# pwd
[root@dhcp7-24 unlink]# ll -h
-rw-r--r--. 1 root root 1000M Sep 8 12:33 67b3055f-2fbf-47b0-893a-4f6b7d8f087c
Expected results: files should be removed completely from the volume after a delete operation.
Reproducible with both of the volumes below:
Volume Name: testgnsha1
Volume ID: a1b4bb75-5838-4a5f-8c7b-a691eafcbff1
Number of Bricks: 2
Volume Name: testgnsha2
Volume ID: 3d9b95d4-24f6-4bd7-9589-80a31b50fadd
Number of Bricks: 1 x 2 = 2
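For reference, the two layouts above could be created roughly as follows (host names and brick paths are hypothetical); testgnsha1 is a plain distributed volume and testgnsha2 is a 1x2 replicated volume:

# testgnsha1: distribute, 2 bricks
gluster volume create testgnsha1 server1:/bricks/b1 server2:/bricks/b2

# testgnsha2: replica 2 (1 x 2 = 2 bricks)
gluster volume create testgnsha2 replica 2 server1:/bricks/b3 server2:/bricks/b4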
This issue was reported on gluster-devel as well. The details are in the mail thread below -
There was an fd leak when a file is created using gfapi handleops (which
NFS-Ganesha uses). FWIU, if there is an open fd on a file being removed,
glusterfs-server moves it into the ".glusterfs/unlink" folder, where it
remains until its inode entry gets purged (when the inode table maintained
by the brick gets full) or until the brick process is restarted.
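One way to check whether a brick still holds such a leaked fd is to look at the open file descriptors of its glusterfsd process. A sketch, assuming a hypothetical volume name; <brick-pid> is the brick PID reported by volume status:

# find the PID of the brick process
gluster volume status testvol

# any fd still pointing into .glusterfs/unlink means the removed file is held open
ls -l /proc/<brick-pid>/fd | grep '\.glusterfs/unlink'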
The fix for "glfd" leak is already merged in master -
The fix is merged in upstream gluster releases and shall be available in RHGS 3.2 release.
The fix is available from glusterfs-3.7.13 version (bug1351877). The work-around is to restart brick process i.e, volume to delete those files under .unlink folder.
Upstream mainline : http://review.gluster.org/14532
Upstream 3.8 : http://review.gluster.org/14820
The fix is available in rhgs-3.2.0 as part of the rebase to GlusterFS 3.8.4.
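A sketch of that workaround (volume name and brick path are hypothetical; note that stopping and starting the volume briefly interrupts I/O for its clients):

# restarting the brick processes releases the leaked fds,
# after which the entries under .glusterfs/unlink are cleaned up
gluster volume stop testvol
gluster volume start testvol

# verify on the brick server
ls -lh /bricks/b1/.glusterfs/unlink/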
I will consider this hotfix as not approved.
Please let me know if you differ on this.
We will wait for the BZ #1379329 fix as well.
(In reply to Bipin Kunal from comment #17)
> I will consider this hotfix as not approved.
> Please let me know if you differ on this.
> We will wait for the BZ #1379329 fix as well.
Considering that issue #1379329 is seen only in a very specific test related to locks and will not be seen in normal scenarios, I would say we are good with the hotfix even if we defer #1379329 to the next release.
However, this can be confirmed once we have the RCA for the issue.
@Soumya can give more details on this.
As mentioned by Shashank above, we see a leak only in the scenario below -
1) lockA is taken on a file, and then
2) either lockA is upgraded/downgraded with the same owner, or
3) lockB (with the same owner), overlapping with lockA's range, is issued.
A fix for this issue is posted in BZ #1379329. But please note that this fix is not applicable to the current nfs-ganesha upstream 2.4 codebase, i.e., to RHGS 3.2, either. Hence it may be worth checking with the customer whether the above scenarios apply to their workload before further evaluating the additional time needed for the review and testing this fix requires.
The hotfix available at  is QE verified as per comments 16 to 23.
Considering the bug fix for https://bugzilla.redhat.com/show_bug.cgi?id=1379329
verified the fix in build,
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.