Description of problem:
The delete operation does not free up space on the brick. Deleted files go into the .glusterfs/unlink directory and continue to occupy space after deletion.
Tested the scenario with both a replicated and a distributed volume; it is always reproducible.
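A quick way to confirm the symptom on the server side is to compare brick usage with the contents of the .glusterfs/unlink directory after deleting files from the client. A minimal sketch, assuming a hypothetical brick path /bricks/brick1:

# on the brick server, after the files were deleted from the client
df -h /bricks/brick1                       # used space has not gone down
du -sh /bricks/brick1/.glusterfs/unlink    # roughly matches the size of the deleted files
ls -lh /bricks/brick1/.glusterfs/unlink    # deleted files are kept here, named by gfid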
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Setup a RHGS 3.1.3 cluster
2. Create a volume and export it using nfs-ganesha
3. Mount the share from the client using nfs
4. Create some random files (1 GB or so in size, so the change in usage is easily noticeable after deletion)
5. Delete the file and check the .glusterfs/unlink directory on the brick (a command-level sketch of these steps follows below)
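A rough command-level sketch of steps 2-5 above (the volume name, brick paths, and the ganesha VIP are hypothetical; the export is enabled through the volume-level ganesha.enable option used by the RHGS 3.1 nfs-ganesha integration):

# 2. create and start the volume, then export it via nfs-ganesha
gluster volume create testvol server1:/bricks/b1 server2:/bricks/b2
gluster volume start testvol
gluster volume set testvol ganesha.enable on

# 3. mount the export on the client, via the ganesha VIP
mount -t nfs -o vers=4 ganesha-vip:/testvol /nfsclient1

# 4. create a large file so the change in usage is easy to notice
dd if=/dev/urandom of=/nfsclient1/file1 bs=1M count=1000

# 5. delete it from the client, then check the brick
rm -f /nfsclient1/file1
ls -lh /bricks/b1/.glusterfs/unlink/   # run on the brick server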
Actual results: the files are not deleted permanently.
Client mount point (after the delete):
[root@dhcp7-83 nfsclient1]# ll
[root@dhcp7-83 nfsclient1]# df -hT .
Filesystem Type Size Used Avail Use% Mounted on
nfs 1.9G 1.1G 906M 54% /nfsclient1
Server brick directory (.glusterfs/unlink):
[root@dhcp7-24 unlink]# pwd
[root@dhcp7-24 unlink]# ll -h
-rw-r--r--. 1 root root 1000M Sep 8 12:33 67b3055f-2fbf-47b0-893a-4f6b7d8f087c
Expected results: files should be removed completely from the volume after a delete operation.
Reproducible with both of the volumes below:
Volume Name: testgnsha1
Volume ID: a1b4bb75-5838-4a5f-8c7b-a691eafcbff1
Number of Bricks: 2
Volume Name: testgnsha2
Volume ID: 3d9b95d4-24f6-4bd7-9589-80a31b50fadd
Number of Bricks: 1 x 2 = 2
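For reference, the two layouts above could be created roughly as follows (host names and brick paths are hypothetical); testgnsha1 is a plain distributed volume and testgnsha2 is a 1x2 replicated volume:

# testgnsha1: distribute, 2 bricks
gluster volume create testgnsha1 server1:/bricks/b1 server2:/bricks/b2

# testgnsha2: replica 2 (1 x 2 = 2 bricks)
gluster volume create testgnsha2 replica 2 server1:/bricks/b3 server2:/bricks/b4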
This issue was reported on gluster-devel as well. The details are in the mail thread below -
There was an fd leak when a file is created using gfapi handleops (which
NFS-Ganesha uses). FWIU, if there is an open fd on a file being removed,
glusterfs-server moves it into the ".glusterfs/unlink" folder, where it
remains until its inode entry gets purged (when the inode table maintained
by the brick gets full) or until the brick process is restarted.
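One way to check whether a brick still holds such a leaked fd is to look at the open file descriptors of its glusterfsd process. A sketch, assuming a hypothetical volume name; <brick-pid> is the brick PID reported by volume status:

# find the PID of the brick process
gluster volume status testvol

# any fd still pointing into .glusterfs/unlink means the removed file is held open
ls -l /proc/<brick-pid>/fd | grep '\.glusterfs/unlink'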
The fix for "glfd" leak is already merged in master -
The fix is merged in upstream gluster releases and shall be available in RHGS 3.2 release.
The fix is available from glusterfs-3.7.13 version (bug1351877). The work-around is to restart brick process i.e, volume to delete those files under .unlink folder.
Upstream mainline : http://review.gluster.org/14532
Upstream 3.8 : http://review.gluster.org/14820
The fix is available in rhgs-3.2.0 as part of the rebase to GlusterFS 3.8.4.
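A sketch of that workaround (volume name and brick path are hypothetical; note that stopping and starting the volume briefly interrupts I/O for its clients):

# restarting the brick processes releases the leaked fds,
# after which the entries under .glusterfs/unlink are cleaned up
gluster volume stop testvol
gluster volume start testvol

# verify on the brick server
ls -lh /bricks/b1/.glusterfs/unlink/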
I will consider this hotfix as not approved.
Please let me know if you differ on this.
We will wait for the BZ #1379329 fix as well.
(In reply to Bipin Kunal from comment #17)
> I will consider this hotfix as not approved.
> Please let me know if you differ on this.
> We will wait for the BZ #1379329 fix as well.
Considering that issue #1379329 is seen only in a very specific test related to locks and will not be seen in normal scenarios, I would say we are good with the hotfix even if we defer #1379329 to the next release.
However, this can be confirmed once we have the RCA for the issue.
@Soumya can give more details on this.
As mentioned by Shashank above, we see a leak only in the scenario below -
1) lockA is taken on a file, and then
2) either lockA is upgraded/downgraded with the same owner, or
3) lockB (with the same owner), overlapping with lockA's range, is issued.
A fix for this issue is posted in BZ #1379329. But please note that this fix is not applicable to the current nfs-ganesha upstream 2.4 codebase, i.e., to RHGS 3.2, either. Hence it may be worth checking with the customer whether the above scenarios apply to their workload before further evaluating the additional time needed for the review and testing this fix requires.
The hotfix available at  is QE verified as per comments 16 to 23.
Considering the bug fix for https://bugzilla.redhat.com/show_bug.cgi?id=1379329
verified the fix in build,
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.