Bug 1005469

Summary: AFR: fd not closed on one of the bricks even after the fd opened for writes is closed from the mount.
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: spandura
Component: replicate
Assignee: Krutika Dhananjay <kdhananj>
Status: CLOSED EOL
QA Contact: storage-qa-internal <storage-qa-internal>
Severity: high
Priority: unspecified
Version: 2.1
CC: pkarampu, rhs-bugs, storage-qa-internal, vagarwal, vbellur
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Last Closed: 2015-12-03 17:22:13 UTC

Description spandura 2013-09-07 10:30:30 UTC
Description of problem:
===========================
On a 1 x 2 replicate volume, even after the file is closed from the mount, an fd on the file remains open on one of the bricks. On the other brick the fd is closed.

Version-Release number of selected component (if applicable):
===============================================================
glusterfs 3.4.0.32rhs built on Sep  6 2013 10:27:55

How reproducible:
====================
Often

Steps to Reproduce:
====================
1. Create a replicate volume (1 x 2). Start the volume. 

2. Create fuse, nfs and cifs mounts. 

3. From all the mount points, execute the following script (pass a different file name from each mount point):

test_script.sh <filename>
=========================
#!/bin/bash

# Append "Hello <n>" lines to the given file under an exclusive flock held on fd 200.
pwd=`pwd`
filename="${pwd}/$1"
(
	echo "Time before flock : `date`"
	# Take an exclusive lock on fd 200 (the file opened for append below).
	flock -x 200
	echo "Time after flock : `date`"
	echo -e "\nWriting to file : $filename"
	# Write one line per second for 100 seconds through the locked fd.
	for i in `seq 1 100`; do echo "Hello $i" >&200 ; sleep 1; done
	echo "Time after the writes are successful : `date`"
) 200>>$filename

4. While the writes are in progress, set any volume option that changes the client graph (gluster volume set <vol_name> write-behind off).

5. Immediately interrupt the script with "ctrl-c" on all the mount points (a command sketch of these steps follows below). 
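
For reference, a minimal command sketch of steps 1-5. Host names, volume name, brick paths, mount point and script path below are placeholders, not the exact values from this report:

reproduce_commands (sketch)
===========================
#!/bin/bash

# Step 1 (server node): create and start a 1 x 2 replicate volume.
gluster volume create vol_rep2 replica 2 server1:/rhs/bricks/b0 server2:/rhs/bricks/b1
gluster volume start vol_rep2

# Step 2 (client nodes): fuse-mount the volume (nfs/cifs mounts are set up similarly).
mount -t glusterfs server1:/vol_rep2 /mnt/gm1

# Step 3 (each mount point): run the locking/writing script on a different file.
cd /mnt/gm1 && /path/to/test_script.sh file4

# Step 4 (server node, while writes are in progress): set an option that
# reloads the client graph.
gluster volume set vol_rep2 performance.write-behind off

# Step 5: immediately interrupt test_script.sh with ctrl-c on every mount point,
# then compare "ls -l /proc/<brick_pid>/fd" on both bricks.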

Actual results:
===================
An fd that was opened for file writes from the fuse mount is not closed on one of the bricks, but it is closed on the other brick as soon as the script is interrupted. 

Executed the same test case on files "file4", "file5" and "file7". Following is the output of "ls -l /proc/<brick_pid>/fd" from both bricks: 
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
brick-0 : 
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
root@fan [Sep-07-2013-10:22:21] > ls -l /proc/`cat /var/lib/glusterd/vols/vol_dis_1_rep_2/run/fan.lab.eng.blr.redhat.com-rhs-bricks-vol_dis_1_rep_2_b0.pid`/fd
total 0
lr-x------ 1 root root 64 Sep  7 09:34 0 -> /dev/null
l-wx------ 1 root root 64 Sep  7 09:34 1 -> /dev/null
lrwx------ 1 root root 64 Sep  7 09:34 10 -> socket:[3980470]
lr-x------ 1 root root 64 Sep  7 09:34 11 -> /dev/urandom
lr-x------ 1 root root 64 Sep  7 09:34 12 -> /rhs/bricks/vol_dis_1_rep_2_b0
lrwx------ 1 root root 64 Sep  7 09:34 13 -> socket:[3980766]
lrwx------ 1 root root 64 Sep  7 09:34 15 -> socket:[3980829]
lrwx------ 1 root root 64 Sep  7 09:34 16 -> socket:[3980834]
lrwx------ 1 root root 64 Sep  7 09:36 18 -> socket:[4034855]
l-wx------ 1 root root 64 Sep  7 09:34 2 -> /dev/null
lrwx------ 1 root root 64 Sep  7 09:34 22 -> socket:[4037124]
l-wx------ 1 root root 64 Sep  7 09:56 23 -> socket:[4037125]
l-wx------ 1 root root 64 Sep  7 09:56 24 -> /rhs/bricks/vol_dis_1_rep_2_b0/testdir_gluster/file5
lrwx------ 1 root root 64 Sep  7 09:34 3 -> anon_inode:[eventpoll]
l-wx------ 1 root root 64 Sep  7 09:34 4 -> /var/log/glusterfs/bricks/rhs-bricks-vol_dis_1_rep_2_b0.log
lrwx------ 1 root root 64 Sep  7 09:34 5 -> /var/lib/glusterd/vols/vol_dis_1_rep_2/run/fan.lab.eng.blr.redhat.com-rhs-bricks-vol_dis_1_rep_2_b0.pid
lrwx------ 1 root root 64 Sep  7 09:34 6 -> socket:[3980454]
lrwx------ 1 root root 64 Sep  7 09:34 7 -> socket:[3980475]
lrwx------ 1 root root 64 Sep  7 09:34 8 -> socket:[3980463]
lrwx------ 1 root root 64 Sep  7 09:34 9 -> socket:[3980653]


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
brick-1 : 
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
root@mia [Sep-07-2013-10:17:53] > ls -l /proc/`cat /var/lib/glusterd/vols/vol_dis_1_rep_2/run/mia.lab.eng.blr.redhat.com-rhs-bricks-vol_dis_1_rep_2_b1.pid`/fd 
total 0
lr-x------ 1 root root 64 Sep  7 09:34 0 -> /dev/null
l-wx------ 1 root root 64 Sep  7 09:34 1 -> /dev/null
lrwx------ 1 root root 64 Sep  7 09:34 10 -> socket:[4026895]
lr-x------ 1 root root 64 Sep  7 09:34 11 -> /dev/urandom
lr-x------ 1 root root 64 Sep  7 09:34 12 -> /rhs/bricks/vol_dis_1_rep_2_b1
lrwx------ 1 root root 64 Sep  7 09:34 13 -> socket:[4065750]
lrwx------ 1 root root 64 Sep  7 09:34 14 -> socket:[4026940]
lrwx------ 1 root root 64 Sep  7 09:34 15 -> socket:[4026941]
lrwx------ 1 root root 64 Sep  7 09:34 16 -> socket:[4026949]
lrwx------ 1 root root 64 Sep  7 09:34 17 -> socket:[4026957]
l-wx------ 1 root root 64 Sep  7 09:34 2 -> /dev/null
l-wx------ 1 root root 64 Sep  7 09:34 20 -> /rhs/bricks/vol_dis_1_rep_2_b1/testdir_gluster/file4
lrwx------ 1 root root 64 Sep  7 09:34 21 -> socket:[4039993]
lrwx------ 1 root root 64 Sep  7 09:34 22 -> /rhs/bricks/vol_dis_1_rep_2_b1/testdir_gluster/file7
l-wx------ 1 root root 64 Sep  7 09:36 24 -> socket:[4078528]
lrwx------ 1 root root 64 Sep  7 09:57 25 -> socket:[4078532]
lrwx------ 1 root root 64 Sep  7 09:34 3 -> anon_inode:[eventpoll]
l-wx------ 1 root root 64 Sep  7 09:34 4 -> /var/log/glusterfs/bricks/rhs-bricks-vol_dis_1_rep_2_b1.log
lrwx------ 1 root root 64 Sep  7 09:34 5 -> /var/lib/glusterd/vols/vol_dis_1_rep_2/run/mia.lab.eng.blr.redhat.com-rhs-bricks-vol_dis_1_rep_2_b1.pid
lrwx------ 1 root root 64 Sep  7 09:34 6 -> socket:[4026879]
l-wx------ 1 root root 64 Sep  7 09:35 7 -> socket:[4078543]
lrwx------ 1 root root 64 Sep  7 09:34 8 -> socket:[4026888]
lrwx------ 1 root root 64 Sep  7 09:34 9 -> socket:[4026926]


Expected results:
==================
The fd should be closed on both bricks. 
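
One way to check for a leaked fd on each brick server after the writers exit (a sketch, assuming the brick processes are named glusterfsd and the test files live under testdir_gluster as above; adjust names to your setup):

check_brick_fds (sketch)
========================
#!/bin/bash

# List every open fd of every brick process on this server and keep only the
# ones pointing at the test files; after the writers exit, nothing should match.
for p in $(pgrep glusterfsd); do
	ls -l /proc/"$p"/fd
done | grep testdir_gluster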


Additional info:
===================
root@mia [Sep-07-2013-10:22:34] >gluster v info
 
Volume Name: vol_dis_1_rep_2
Type: Replicate
Volume ID: f5c43519-b5eb-4138-8219-723c064af71c
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: fan.lab.eng.blr.redhat.com:/rhs/bricks/vol_dis_1_rep_2_b0
Brick2: mia.lab.eng.blr.redhat.com:/rhs/bricks/vol_dis_1_rep_2_b1
Options Reconfigured:
server.allow-insecure: on
performance.stat-prefetch: off
performance.write-behind: off
cluster.self-heal-daemon: on


root@fan [Sep-07-2013-10:26:20] >gluster v status
Status of volume: vol_dis_1_rep_2
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick fan.lab.eng.blr.redhat.com:/rhs/bricks/vol_dis_1_
rep_2_b0						49152	Y	15259
Brick mia.lab.eng.blr.redhat.com:/rhs/bricks/vol_dis_1_
rep_2_b1						49152	Y	15997
NFS Server on localhost					2049	Y	15274
Self-heal Daemon on localhost				N/A	Y	15290
NFS Server on mia.lab.eng.blr.redhat.com		2049	Y	15593
Self-heal Daemon on mia.lab.eng.blr.redhat.com		N/A	Y	15599
 
There are no active volume tasks
root@fan [Sep-07-2013-10:26:24] >

Comment 1 spandura 2013-09-07 10:41:50 UTC
SOS Reports and statedumps : http://rhsqe-repo.lab.eng.blr.redhat.com/bugs_necessary_info/1005469/
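
For reference, per-brick statedumps like the ones archived above can be regenerated on a server node; a sketch, assuming the default statedump location:

generate_statedump (sketch)
===========================
#!/bin/bash

# Dump per-brick state (including the open fd table) for the volume; the dumps
# typically land under /var/run/gluster unless server.statedump-path is set.
gluster volume statedump vol_dis_1_rep_2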

++++++++++++++++++++++++++++++++++
Mount process info:
++++++++++++++++++++++++++++++++++
root@darrel [Sep-07-2013-10:37:57] >ps -ef | grep glusterfs
root      2335     1  0 07:35 ?        00:00:02 /usr/sbin/glusterfs --volfile-id=/vol_dis_1_rep_2 --volfile-server=mia /mnt/gm1
root      5526  4502  0 10:38 pts/1    00:00:00 grep glusterfs
root@darrel [Sep-07-2013-10:38:05] >

Comment 3 Vivek Agarwal 2015-12-03 17:22:13 UTC
Thank you for submitting this issue for consideration in Red Hat Gluster Storage. The release you asked us to review is now End of Life. Please see https://access.redhat.com/support/policy/updates/rhs/

If you can reproduce this bug against a currently maintained version of Red Hat Gluster Storage, please feel free to file a new report against the current release.