Bug 1631356

Summary: glusterfsd keeps fd open in index xlator after the volume is stopped
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Mohit Agrawal <moagrawa>
Component: core
Assignee: Mohit Agrawal <moagrawa>
Status: CLOSED WONTFIX
QA Contact: Rahul Hinduja <rhinduja>
Severity: urgent
Docs Contact:
Priority: urgent
Version: rhgs-3.4
CC: amukherj, apaladug, ndevos, pkarampu, rcyriac, rhs-bugs, sanandpa, sankarshan, storage-qa-internal
Target Milestone: ---
Keywords: ZStream
Target Release: ---
Hardware: All
OS: All
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1631357 1631372
Environment:
Last Closed: 2018-10-04 12:31:58 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1631357
Bug Blocks:

Description Mohit Agrawal 2018-09-20 12:05:31 UTC
Description of problem:
glusterfsd keeps fds open in the index xlator after the volume is stopped

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Enable brick_mux
2. Create 100 volumes (test1..test100) in a 1x3 environment
3. Start all the volumes
4. Stop volumes test2..test100
5. After stopping the volumes, check /proc for the brick PID:
  ls -lrth /proc/<brick_pid>/fd | grep ".glusterfs"
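
For scripting the check in step 5, a minimal Python sketch could read the /proc/<pid>/fd symlinks directly. This is not part of the original report: the function name open_fds_matching is hypothetical, and it assumes Linux and permission to read the brick process's fd directory.

```python
import os


def open_fds_matching(pid, needle=".glusterfs"):
    """Return {fd_number: target_path} for fds of `pid` whose link target contains `needle`.

    Hypothetical helper, roughly equivalent to:
        ls -l /proc/<pid>/fd | grep ".glusterfs"
    """
    fd_dir = f"/proc/{pid}/fd"
    matches = {}
    for name in os.listdir(fd_dir):
        try:
            # Each entry in /proc/<pid>/fd is a symlink to the open file.
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The fd was closed between listdir() and readlink(); skip it.
            continue
        if needle in target:
            matches[int(name)] = target
    return matches
```

With brick_mux enabled all bricks share one process, so after step 4 only entries under the brick paths of the volume that is still running (test1) would be expected. Any entry under a brick path of test2..test100 corresponds to the leak described here.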

Actual results:
After the volumes are stopped, /proc still shows .glusterfs fds held open
for bricks that are already stopped

Expected results:
No fd on an internal directory should remain open for a stopped brick

Additional info:

Comment 2 Atin Mukherjee 2018-09-20 12:40:15 UTC
upstream patch : https://review.gluster.org/21235

Comment 5 Pranith Kumar K 2018-09-24 13:28:53 UTC
Steps to re-create the issue:

There seems to be a race where, by the time fd_destroy() is called, the graph is already cleaned up. As a result the fds are not closed, because the xlator_release()/xlator_releasedir() functions don't get called.
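
The ordering problem can be illustrated with a toy model. This is only a sketch of the race as described in this comment, not GlusterFS code: the Graph and Fd classes and the simulate() function are hypothetical, and the real fd_destroy() logic is more involved.

```python
class Graph:
    """Toy stand-in for a brick graph; `active` becomes False once it is cleaned up."""

    def __init__(self):
        self.active = True
        self.released = []

    def release(self, name):
        # Plays the role of the release/releasedir callbacks that close the fd.
        self.released.append(name)


class Fd:
    """Toy refcounted fd whose underlying OS fd is closed only by the release callback."""

    def __init__(self, name, graph):
        self.name = name
        self.graph = graph
        self.refcount = 1
        self.os_fd_open = True

    def unref(self):
        self.refcount -= 1
        if self.refcount == 0:
            self._destroy()

    def _destroy(self):
        if self.graph.active:
            self.graph.release(self.name)
            self.os_fd_open = False
        # If the graph is already cleaned up, the callback is skipped
        # and the OS-level fd stays open (the leak).


def simulate(unref_before_cleanup):
    """Return True if the OS-level fd is leaked for the given ordering."""
    graph = Graph()
    fd = Fd("a", graph)
    if unref_before_cleanup:
        fd.unref()
        graph.active = False
    else:
        graph.active = False  # graph cleanup wins the race
        fd.unref()
    return fd.os_fd_open
```

In this model simulate(True) closes the fd, while simulate(False) leaves it open. The second ordering is the one that the sleep in the diff below forces.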

I was able to consistently re-create the issue with the following change:

$ git diff
diff --git a/xlators/protocol/server/src/server-helpers.c b/xlators/protocol/server/src/server-helpers.c
index c492ab164..29af4a946 100644
--- a/xlators/protocol/server/src/server-helpers.c
+++ b/xlators/protocol/server/src/server-helpers.c
@@ -249,6 +249,7 @@ server_connection_cleanup_flush_cbk (call_frame_t *frame, void *cookie,
         fd = frame->local;
         client = frame->root->client;
 
+        sleep (5);
         fd_unref (fd);
         frame->local = NULL;

Steps:
1) Start glusterd and set brick-mux to on
2) Create 2 plain replicate volumes and set open-behind off on the volume
3) Mount one of the volumes and on the mount execute "exec >a"
4) Confirm that the file is opened on the bricks
5) Execute "gluster volume stop <volname>"
6) Wait for a minute, just to be on the safe side, and check "ls /proc/<pid-of-brick>/fd". It still shows the file 'a'.