Description of problem:
=======================
In a replicated volume (1 x 2), a brick is replaced by taking the brick process offline, unmounting, formatting, and remounting the brick directory, then bringing the brick back online. "heal full" is triggered on the volume to self-heal the files/dirs, and the heal completes successfully. When we then write to a file from the mount point, the write succeeds on both bricks. (Writes on the replaced brick are performed via anonymous fds until the file is reopened on that brick; the file is reopened only after 1024 ops on it on the replaced brick. Refer to patch http://review.gluster.org/#/c/4358/ .) Even after the successful reopen of the file, the anonymous fd is not closed.

This bug was found while testing bug 853684.

Version-Release number of selected component (if applicable):
=============================================================
root@king [Aug-01-2013-17:41:35] >rpm -qa | grep glusterfs-server
glusterfs-server-3.4.0.14rhs-1.el6rhs.x86_64
root@king [Aug-01-2013-17:41:41] >gluster --version
glusterfs 3.4.0.14rhs built on Jul 30 2013 09:09:36

How reproducible:
=================
Often

Steps to Reproduce:
===================
1. Create a 1 x 2 replicate volume.
2. Start the volume.
3. Create a fuse mount.
4. From the fuse mount execute: "exec 5>>test_file" (to close the fd later, use: "exec 5>&-").
5. Kill all gluster processes on storage_node1 (killall glusterfs glusterfsd glusterd).
6. Get the extended attributes of the brick1 directory on storage_node1 (getfattr -d -e hex -m . <path_to_brick1>).
7. Remove the brick1 directory on storage_node1 (rm -rf <path_to_brick1>).
8. Create the brick1 directory on storage_node1 (mkdir <path_to_brick1>).
9. Set the extended attribute "trusted.glusterfs.volume-id" on brick1 on storage_node1 to the value captured at step 6.
10. Start glusterd on storage_node1 (service glusterd start).
11. Execute "gluster volume heal <volume_name> full" from any of the storage nodes.
    This will self-heal the file "test_file" from brick0 to brick1.
12. From the mount point execute: for i in `seq 1 1024` ; do echo "Hello World" >&5 ; done
13. Run "ls -l /proc/<brick_pid>/fd" on both the bricks.

Actual results:
===============
The anonymous fd is still open on brick1.

Storage_node1 output:
=====================
root@king [Aug-01-2013-17:43:42] >ls -liht1 --full-time /proc/22818/fd
total 0
3187540 lr-x------ 1 root root 64 2013-08-01 12:31:53.081000682 +0530 0 -> /dev/null
3187541 l-wx------ 1 root root 64 2013-08-01 12:31:53.081000682 +0530 1 -> /dev/null
3187550 lrwx------ 1 root root 64 2013-08-01 12:31:53.081000682 +0530 10 -> socket:[3185619]
3187551 lr-x------ 1 root root 64 2013-08-01 12:31:53.081000682 +0530 11 -> /dev/urandom
3187552 lr-x------ 1 root root 64 2013-08-01 12:31:53.081000682 +0530 12 -> /rhs/bricks/b0
3187553 lrwx------ 1 root root 64 2013-08-01 12:31:53.081000682 +0530 13 -> socket:[3255495]
3187554 lrwx------ 1 root root 64 2013-08-01 12:31:53.081000682 +0530 14 -> socket:[3264738]
3187555 lrwx------ 1 root root 64 2013-08-01 12:31:53.081000682 +0530 15 -> socket:[3264740]
3187556 lrwx------ 1 root root 64 2013-08-01 12:31:53.081000682 +0530 16 -> socket:[3220066]
3187557 lrwx------ 1 root root 64 2013-08-01 12:31:53.081000682 +0530 17 -> /rhs/bricks/b0/test_file
3187542 l-wx------ 1 root root 64 2013-08-01 12:31:53.081000682 +0530 2 -> /dev/null
3187543 lrwx------ 1 root root 64 2013-08-01 12:31:53.081000682 +0530 3 -> anon_inode:[eventpoll]
3187544 l-wx------ 1 root root 64 2013-08-01 12:31:53.081000682 +0530 4 -> /var/log/glusterfs/bricks/rhs-bricks-b0.log
3187545 lrwx------ 1 root root 64 2013-08-01 12:31:53.081000682 +0530 5 -> /var/lib/glusterd/vols/vol_rep/run/king-rhs-bricks-b0.pid
3187546 lrwx------ 1 root root 64 2013-08-01 12:31:53.081000682 +0530 6 -> socket:[3185603]
3187547 lrwx------ 1 root root 64 2013-08-01 12:31:53.081000682 +0530 7 -> socket:[3185630]
3187548 lrwx------ 1 root root 64 2013-08-01 12:31:53.081000682 +0530 8 -> socket:[3185612]
3187549 lrwx------ 1 root root 64 2013-08-01 12:31:53.081000682 +0530 9 -> socket:[3255396]

Storage_node2 output:
=====================
root@hicks [Aug-01-2013-17:43:51] >ls -liht1 --full-time /proc/26126/fd
total 0
3523876 lrwx------ 1 root root 64 2013-08-01 16:49:46.069001405 +0530 17 -> /rhs/bricks/b1/.glusterfs/23/47/23473c17-8776-43f4-9ee3-9a26e3a6c982
3523877 lrwx------ 1 root root 64 2013-08-01 16:49:46.069001405 +0530 18 -> /rhs/bricks/b1/test_file
3523417 lr-x------ 1 root root 64 2013-08-01 16:48:26.779001162 +0530 0 -> /dev/null
3523418 l-wx------ 1 root root 64 2013-08-01 16:48:26.779001162 +0530 1 -> /dev/null
3523427 lrwx------ 1 root root 64 2013-08-01 16:48:26.779001162 +0530 10 -> socket:[3523043]
3523428 lr-x------ 1 root root 64 2013-08-01 16:48:26.779001162 +0530 11 -> /dev/urandom
3523429 lr-x------ 1 root root 64 2013-08-01 16:48:26.779001162 +0530 12 -> /rhs/bricks/b1
3523430 lrwx------ 1 root root 64 2013-08-01 16:48:26.779001162 +0530 13 -> socket:[3523262]
3523431 lrwx------ 1 root root 64 2013-08-01 16:48:26.779001162 +0530 14 -> socket:[3523263]
3523432 lrwx------ 1 root root 64 2013-08-01 16:48:26.779001162 +0530 15 -> socket:[3523265]
3523433 lrwx------ 1 root root 64 2013-08-01 16:48:26.779001162 +0530 16 -> socket:[3523289]
3523419 l-wx------ 1 root root 64 2013-08-01 16:48:26.779001162 +0530 2 -> /dev/null
3523420 lrwx------ 1 root root 64 2013-08-01 16:48:26.779001162 +0530 3 -> anon_inode:[eventpoll]
3523421 l-wx------ 1 root root 64 2013-08-01 16:48:26.779001162 +0530 4 -> /var/log/glusterfs/bricks/rhs-bricks-b1.log
3523422 lrwx------ 1 root root 64 2013-08-01 16:48:26.779001162 +0530 5 -> /var/lib/glusterd/vols/vol_rep/run/hicks-rhs-bricks-b1.pid
3523423 lrwx------ 1 root root 64 2013-08-01 16:48:26.779001162 +0530 6 -> socket:[3522950]
3523424 lrwx------ 1 root root 64 2013-08-01 16:48:26.779001162 +0530 7 -> socket:[3523122]
3523425 lrwx------ 1 root root 64 2013-08-01 16:48:26.779001162 +0530 8 -> socket:[3522986]
3523426 lrwx------ 1 root root 64 2013-08-01 16:48:26.779001162 +0530 9 -> socket:[3523247]

root@king [Aug-01-2013-17:45:15] >gluster v status
Status of volume: vol_rep
Gluster process                                 Port    Online  Pid
------------------------------------------------------------------------------
Brick king:/rhs/bricks/b0                       49152   Y       22818
Brick hicks:/rhs/bricks/b1                      49152   Y       26126
NFS Server on localhost                         2049    Y       26570
Self-heal Daemon on localhost                   N/A     Y       26580
NFS Server on hicks                             2049    Y       26135
Self-heal Daemon on hicks                       N/A     Y       26139

root@king [Aug-01-2013-17:45:17] >gluster v info
Volume Name: vol_rep
Type: Replicate
Volume ID: c449b61f-f57d-4114-ac22-777d9d7f8e44
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: king:/rhs/bricks/b0
Brick2: hicks:/rhs/bricks/b1
Options Reconfigured:
cluster.self-heal-daemon: on
root@king [Aug-01-2013-17:45:35]

Expected results:
=================
The anonymous fds should be closed.
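The anonymous-fd mechanism described above (writes served through an anonymous fd until the client re-opens the file after 1024 ops, per patch 4358) can be sketched as a toy model. This is a minimal simulation of the *described* behaviour, not GlusterFS code; the class and attribute names are hypothetical:

```python
# Toy model of the anon-fd write path on a replaced brick: writes go through
# an anonymous fd until enough ops have been performed to trigger a re-open.
# REOPEN_THRESHOLD follows the report's figure of 1024 ops; all names here
# are hypothetical (the real logic lives in GlusterFS's C client xlator).

REOPEN_THRESHOLD = 1024

class ReplacedBrickFile:
    def __init__(self):
        self.op_count = 0
        self.reopened = False
        self.anon_fd_open = True  # anonymous fd serves writes initially

    def write(self, data):
        self.op_count += 1
        if not self.reopened and self.op_count >= REOPEN_THRESHOLD:
            self.reopened = True
            # Expected behaviour modeled here: close the anon fd once the
            # file has a real fd. The bug in this report is that the brick
            # never released the anon fd, so it stayed visible in /proc.
            self.anon_fd_open = False

f = ReplacedBrickFile()
for _ in range(1024):          # mirrors step 12: 1024 writes via fd 5
    f.write(b"Hello World\n")

assert f.reopened              # file re-opened on the replaced brick
assert not f.anon_fd_open      # expected (post-fix) state: anon fd closed
```

This mirrors steps 12 and 13 of the reproducer: after the 1024th write the file is re-opened, and the expected result is that only the re-opened fd remains in /proc/<brick_pid>/fd.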
Problem:
The client xlator issues finodelk using an anonymous fd when no fd is open on the file. This can also happen between attempts to re-open the file after a client disconnect: the lock may be taken using the anonymous fd, while the file is re-opened in the meantime, so the unlock arrives on the re-opened fd. Because the two fds differ, the lock-table entry is never removed, leaking the lk-table entry; the entry also holds a reference to the fd, which leads to an fd leak on the brick.

Fix:
Do not require the fds to be equal when tracking finodelks. An inodelk is identified by (gfid, connection, lk-owner), so fd equality is not needed.
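The leak and the fix can be illustrated with a small simulation of the lock-table matching logic. This is a sketch under stated assumptions: the real implementation is in GlusterFS's C xlators, and `LockTable`, `match_fd`, and the tuple layout below are all hypothetical:

```python
# Toy lock table illustrating the bug and the fix. An inodelk is identified
# by (gfid, connection, lk-owner); the buggy matcher additionally compared
# the fd the lock was taken on against the fd the unlock arrives on.

class LockTable:
    def __init__(self, match_fd):
        self.match_fd = match_fd  # True = buggy behaviour, False = fixed
        self.entries = []         # each entry holds a ref to the fd it used

    def lock(self, gfid, conn, lk_owner, fd):
        self.entries.append((gfid, conn, lk_owner, fd))

    def unlock(self, gfid, conn, lk_owner, fd):
        for e in self.entries:
            same_key = e[:3] == (gfid, conn, lk_owner)
            if same_key and (not self.match_fd or e[3] is fd):
                self.entries.remove(e)  # entry gone -> fd reference released
                return True
        return False  # no match: the entry (and its fd ref) leaks

anon_fd, real_fd = object(), object()

# Buggy: lock taken via the anon fd, unlock arrives on the re-opened fd.
buggy = LockTable(match_fd=True)
buggy.lock("gfid-1", "conn-1", "owner-1", anon_fd)
assert not buggy.unlock("gfid-1", "conn-1", "owner-1", real_fd)
assert len(buggy.entries) == 1  # leaked entry still pins the anon fd

# Fixed: match only on (gfid, connection, lk-owner).
fixed = LockTable(match_fd=False)
fixed.lock("gfid-1", "conn-1", "owner-1", anon_fd)
assert fixed.unlock("gfid-1", "conn-1", "owner-1", real_fd)
assert len(fixed.entries) == 0  # entry removed, fd reference released
```

With fd equality dropped from the match, the unlock arriving on the re-opened fd finds and removes the entry taken on the anonymous fd, so the brick no longer pins the anonymous fd open.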
https://code.engineering.redhat.com/gerrit/11684
Verified the fix on build:
==========================
glusterfs 3.4.0.33rhs built on Sep 8 2013 13:20:26

Bug is fixed. Moving bug to verified state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1262.html