Description of problem:
The delete operation does not free up space on the brick. Deleted files go into the .glusterfs/unlink directory and continue to occupy space after deletion.
Tested the scenario with both a replicated and a distributed volume; it is always reproducible.
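A quick way to confirm the symptom on the server side is to compare brick usage with the contents of the .glusterfs/unlink directory after deleting files from the client. A minimal sketch, assuming a hypothetical brick path /bricks/brick1:

# on the brick server, after the files were deleted from the client
df -h /bricks/brick1                       # used space has not gone down
du -sh /bricks/brick1/.glusterfs/unlink    # roughly matches the size of the deleted files
ls -lh /bricks/brick1/.glusterfs/unlink    # deleted files are kept here, named by gfid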
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Setup a RHGS 3.1.3 cluster
2. Create a volume and export it using nfs-ganesha
3. Mount the share from the client using nfs
4. Create some random files (1 GB or so in size, so the change in usage is easily noticeable after deletion)
5. Delete the file and check the .glusterfs/unlink directory on the brick (a command-level sketch of these steps follows below)
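A rough command-level sketch of steps 2-5 above (the volume name, brick paths, and the ganesha VIP are hypothetical; the export is enabled through the volume-level ganesha.enable option used by the RHGS 3.1 nfs-ganesha integration):

# 2. create and start the volume, then export it via nfs-ganesha
gluster volume create testvol server1:/bricks/b1 server2:/bricks/b2
gluster volume start testvol
gluster volume set testvol ganesha.enable on

# 3. mount the export on the client, via the ganesha VIP
mount -t nfs -o vers=4 ganesha-vip:/testvol /nfsclient1

# 4. create a large file so the change in usage is easy to notice
dd if=/dev/urandom of=/nfsclient1/file1 bs=1M count=1000

# 5. delete it from the client, then check the brick
rm -f /nfsclient1/file1
ls -lh /bricks/b1/.glusterfs/unlink/   # run on the brick server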
Actual results: the files are not deleted permanently.
Client mount point (after the delete):
[root@dhcp7-83 nfsclient1]# ll
[root@dhcp7-83 nfsclient1]# df -hT .
Filesystem Type Size Used Avail Use% Mounted on
nfs 1.9G 1.1G 906M 54% /nfsclient1
Server brick directory (.glusterfs/unlink):
[root@dhcp7-24 unlink]# pwd
[root@dhcp7-24 unlink]# ll -h
-rw-r--r--. 1 root root 1000M Sep 8 12:33 67b3055f-2fbf-47b0-893a-4f6b7d8f087c
Expected results: files should be removed completely from the volume after a delete operation.
Reproducible with both of the volumes below:
Volume Name: testgnsha1
Volume ID: a1b4bb75-5838-4a5f-8c7b-a691eafcbff1
Number of Bricks: 2
Volume Name: testgnsha2
Volume ID: 3d9b95d4-24f6-4bd7-9589-80a31b50fadd
Number of Bricks: 1 x 2 = 2
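For reference, the two layouts above could be created roughly as follows (host names and brick paths are hypothetical); testgnsha1 is a plain distributed volume and testgnsha2 is a 1x2 replicated volume:

# testgnsha1: distribute, 2 bricks
gluster volume create testgnsha1 server1:/bricks/b1 server2:/bricks/b2

# testgnsha2: replica 2 (1 x 2 = 2 bricks)
gluster volume create testgnsha2 replica 2 server1:/bricks/b3 server2:/bricks/b4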
This issue was reported on gluster-devel as well. The details are in the mail thread below -
There was an fd leak when a file is created using gfapi handleops (which
NFS-Ganesha uses). FWIU, if there is an open fd on a file being removed,
glusterfs-server moves it into the ".glusterfs/unlink" folder, where it
remains until its inode entry gets purged (when the inode table maintained
by the brick gets full) or until the brick process is restarted.
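One way to check whether a brick still holds such a leaked fd is to look at the open file descriptors of its glusterfsd process. A sketch, assuming a hypothetical volume name; <brick-pid> is the brick PID reported by volume status:

# find the PID of the brick process
gluster volume status testvol

# any fd still pointing into .glusterfs/unlink means the removed file is held open
ls -l /proc/<brick-pid>/fd | grep '\.glusterfs/unlink'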
The fix for "glfd" leak is already merged in master -
The fix is merged in upstream gluster releases and shall be available in RHGS 3.2 release.
The fix is available from glusterfs-3.7.13 version (bug1351877). The work-around is to restart brick process i.e, volume to delete those files under .unlink folder.
Upstream mainline : http://review.gluster.org/14532
Upstream 3.8 : http://review.gluster.org/14820
The fix is available in rhgs-3.2.0 as part of the rebase to GlusterFS 3.8.4.
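A sketch of that workaround (volume name and brick path are hypothetical; note that stopping and starting the volume briefly interrupts I/O for its clients):

# restarting the brick processes releases the leaked fds,
# after which the entries under .glusterfs/unlink are cleaned up
gluster volume stop testvol
gluster volume start testvol

# verify on the brick server
ls -lh /bricks/b1/.glusterfs/unlink/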
I will consider this hotfix as not approved.
Please let me know if you differ on this.
We will wait for the BZ #1379329 fix as well.
(In reply to Bipin Kunal from comment #17)
> I will consider this hotfix as not approved.
> Please let me know if you differ on this.
> We will wait for the BZ #1379329 fix as well.
Considering that issue #1379329 is seen only in a very specific test related to locks and will not be seen in normal scenarios, I would say we are good with the hotfix even if we defer #1379329 to the next release.
However, this can be confirmed once we have the RCA for the issue.
@Soumya can give more details on this.
As mentioned by Shashank above, we see a leak only in the scenario below -
1) lockA is taken on a file, and then
2) either lockA is upgraded/downgraded with the same owner, or
3) lockB (with the same owner), overlapping with lockA's range, is issued.
A fix for this issue is posted in BZ #1379329. But please note that this fix is not applicable to the current nfs-ganesha upstream 2.4 codebase, i.e., to RHGS 3.2, either. Hence it may be worth checking with the customer whether the above scenarios apply to their workload before further evaluating the additional time needed for the review and testing this fix requires.
The hotfix available at  is QE verified as per comments 16 to 23.
Considering the bug fix for https://bugzilla.redhat.com/show_bug.cgi?id=1379329
verified the fix in build,
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.