Description of problem:
glusterfsd keeps fds open in the index xlator after the volume is stopped.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Enable brick_mux.
2. Create 100 volumes (test1..test100) in a 1x3 environment.
3. Start all the volumes.
4. Stop volumes test2..test100.
5. After stopping the volumes, check the fds of the brick process in /proc (a full shell sketch is included under Additional info below):
   ls -lrth /proc/<brick_pid>/fd | grep ".glusterfs"

Actual results:
After the volumes are stopped, /proc shows that .glusterfs fds are still held open by bricks that have already been stopped.

Expected results:
No internal directory should be held open for a stopped brick.

Additional info:
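A minimal shell sketch of the reproduction loop described in the steps above. The peer names (server1..server3) and brick paths (/bricks/testN) are assumptions; adjust them to the local setup.

    # Assumed: a 3-node trusted storage pool (server1..server3) with brick
    # directories under /bricks. Volume names follow the steps above.
    gluster volume set all cluster.brick-multiplex on

    for i in $(seq 1 100); do
        gluster volume create test$i replica 3 \
            server1:/bricks/test$i server2:/bricks/test$i server3:/bricks/test$i force
        gluster volume start test$i
    done

    # Stop every volume except test1 so the multiplexed brick process stays alive.
    for i in $(seq 2 100); do
        gluster --mode=script volume stop test$i
    done

    # Check which fds the surviving brick process still holds.
    brick_pid=$(pgrep -f glusterfsd | head -n1)
    ls -lrth /proc/$brick_pid/fd | grep ".glusterfs"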
Upstream patch: https://review.gluster.org/21235
Steps to re-create the issue:

There seems to be a race whereby, by the time fd_destroy() is called, the graph is already cleaned up. Because of this, the fds are not closed, since the xlator_release()/xlator_releasedir() functions don't get called.

I was able to consistently re-create the issue with the following change:

18:46:22 :) ⚡ git diff
diff --git a/xlators/protocol/server/src/server-helpers.c b/xlators/protocol/server/src/server-helpers.c
index c492ab164..29af4a946 100644
--- a/xlators/protocol/server/src/server-helpers.c
+++ b/xlators/protocol/server/src/server-helpers.c
@@ -249,6 +249,7 @@ server_connection_cleanup_flush_cbk (call_frame_t *frame, void *cookie,
         fd = frame->local;
         client = frame->root->client;
+        sleep (5);
         fd_unref (fd);
         frame->local = NULL;

Steps:
1) Start glusterd and set brick-mux to on.
2) Create 2 plain replicate volumes and set open-behind off on them.
3) Mount one of the volumes and, on the mount, execute "exec >a".
4) Confirm that the file is opened on the bricks.
5) Execute "gluster volume stop <volname>".
6) Wait for a minute, just to be on the safer side, and check "ls /proc/<pid-of-brick>/fd". It still shows the file 'a'.
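For reference, a rough shell transcript of the steps above. The volume names (vol1, vol2), peer names (server1..server3), brick paths (/bricks/...), replica count, and mount point (/mnt/vol1) are all placeholders, not taken from the original report.

    gluster volume set all cluster.brick-multiplex on

    # Two plain replicate volumes; peer names and brick paths are placeholders.
    gluster volume create vol1 replica 3 server1:/bricks/vol1 server2:/bricks/vol1 server3:/bricks/vol1 force
    gluster volume create vol2 replica 3 server1:/bricks/vol2 server2:/bricks/vol2 server3:/bricks/vol2 force
    gluster volume start vol1
    gluster volume start vol2
    gluster volume set vol1 open-behind off
    gluster volume set vol2 open-behind off

    # On a client: mount vol1 and keep an fd open on file 'a'.
    mount -t glusterfs server1:/vol1 /mnt/vol1
    cd /mnt/vol1
    exec >a          # the shell now holds 'a' open on the mount

    # From another shell on a brick node: confirm the brick holds 'a',
    # stop the volume, wait, and check again.
    brick_pid=$(pgrep -f glusterfsd | head -n1)
    ls -l /proc/$brick_pid/fd | grep '/a$'
    gluster --mode=script volume stop vol1
    sleep 60
    ls -l /proc/$brick_pid/fd | grep '/a$'   # the fd for 'a' is still listed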