Description of problem:
=======================
In a replicated volume (1 x 2), a brick is replaced by taking the brick process offline, unmounting, formatting, and remounting the brick directory, then bringing the brick back online. "heal full" is triggered on the volume to self-heal the files/dirs, and the heal completes successfully. When we then write to a file from the mount point, the write succeeds on both bricks. (Writes on the replaced brick are performed via anonymous fds until the file is reopened on that brick; the file is reopened only after 1024 ops on it on the replaced brick. Refer to patch http://review.gluster.org/#/c/4358/ .) Even after the successful reopen of the file, the anonymous fd is not closed.

This bug was found while testing bug 853684.

Version-Release number of selected component (if applicable):
=============================================================
root@king [Aug-01-2013-17:41:35] >rpm -qa | grep glusterfs-server
glusterfs-server-3.4.0.14rhs-1.el6rhs.x86_64
root@king [Aug-01-2013-17:41:41] >gluster --version
glusterfs 3.4.0.14rhs built on Jul 30 2013 09:09:36

How reproducible:
=================
Often

Steps to Reproduce:
===================
1. Create a 1 x 2 replicate volume.
2. Start the volume.
3. Create a fuse mount.
4. From the fuse mount execute: "exec 5>>test_file" (to close the fd later, use: "exec 5>&-").
5. Kill all gluster processes on storage_node1 (killall glusterfs glusterfsd glusterd).
6. Get the extended attributes of the brick1 directory on storage_node1 (getfattr -d -e hex -m . <path_to_brick1>).
7. Remove the brick1 directory on storage_node1 (rm -rf <path_to_brick1>).
8. Create the brick1 directory on storage_node1 (mkdir <path_to_brick1>).
9. Set the extended attribute "trusted.glusterfs.volume-id" on brick1 on storage_node1 to the value captured at step 6.
10. Start glusterd on storage_node1 (service glusterd start).
11. Execute "gluster volume heal <volume_name> full" from any of the storage nodes.
    This will self-heal the file "test_file" from brick0 to brick1.
12. From the mount point execute: for i in `seq 1 1024` ; do echo "Hello World" >&5 ; done
13. Run "ls -l /proc/<brick_pid>/fd" on both the bricks.

Actual results:
===============
The anonymous fd is still open on brick1.

Storage_node1 output:
=====================
root@king [Aug-01-2013-17:43:42] >ls -liht1 --full-time /proc/22818/fd
total 0
3187540 lr-x------ 1 root root 64 2013-08-01 12:31:53.081000682 +0530 0 -> /dev/null
3187541 l-wx------ 1 root root 64 2013-08-01 12:31:53.081000682 +0530 1 -> /dev/null
3187550 lrwx------ 1 root root 64 2013-08-01 12:31:53.081000682 +0530 10 -> socket:[3185619]
3187551 lr-x------ 1 root root 64 2013-08-01 12:31:53.081000682 +0530 11 -> /dev/urandom
3187552 lr-x------ 1 root root 64 2013-08-01 12:31:53.081000682 +0530 12 -> /rhs/bricks/b0
3187553 lrwx------ 1 root root 64 2013-08-01 12:31:53.081000682 +0530 13 -> socket:[3255495]
3187554 lrwx------ 1 root root 64 2013-08-01 12:31:53.081000682 +0530 14 -> socket:[3264738]
3187555 lrwx------ 1 root root 64 2013-08-01 12:31:53.081000682 +0530 15 -> socket:[3264740]
3187556 lrwx------ 1 root root 64 2013-08-01 12:31:53.081000682 +0530 16 -> socket:[3220066]
3187557 lrwx------ 1 root root 64 2013-08-01 12:31:53.081000682 +0530 17 -> /rhs/bricks/b0/test_file
3187542 l-wx------ 1 root root 64 2013-08-01 12:31:53.081000682 +0530 2 -> /dev/null
3187543 lrwx------ 1 root root 64 2013-08-01 12:31:53.081000682 +0530 3 -> anon_inode:[eventpoll]
3187544 l-wx------ 1 root root 64 2013-08-01 12:31:53.081000682 +0530 4 -> /var/log/glusterfs/bricks/rhs-bricks-b0.log
3187545 lrwx------ 1 root root 64 2013-08-01 12:31:53.081000682 +0530 5 -> /var/lib/glusterd/vols/vol_rep/run/king-rhs-bricks-b0.pid
3187546 lrwx------ 1 root root 64 2013-08-01 12:31:53.081000682 +0530 6 -> socket:[3185603]
3187547 lrwx------ 1 root root 64 2013-08-01 12:31:53.081000682 +0530 7 -> socket:[3185630]
3187548 lrwx------ 1 root root 64 2013-08-01 12:31:53.081000682 +0530 8 -> socket:[3185612]
3187549 lrwx------ 1 root root 64 2013-08-01 12:31:53.081000682 +0530 9 -> socket:[3255396]

Storage_node2 output:
=====================
root@hicks [Aug-01-2013-17:43:51] >ls -liht1 --full-time /proc/26126/fd
total 0
3523876 lrwx------ 1 root root 64 2013-08-01 16:49:46.069001405 +0530 17 -> /rhs/bricks/b1/.glusterfs/23/47/23473c17-8776-43f4-9ee3-9a26e3a6c982
3523877 lrwx------ 1 root root 64 2013-08-01 16:49:46.069001405 +0530 18 -> /rhs/bricks/b1/test_file
3523417 lr-x------ 1 root root 64 2013-08-01 16:48:26.779001162 +0530 0 -> /dev/null
3523418 l-wx------ 1 root root 64 2013-08-01 16:48:26.779001162 +0530 1 -> /dev/null
3523427 lrwx------ 1 root root 64 2013-08-01 16:48:26.779001162 +0530 10 -> socket:[3523043]
3523428 lr-x------ 1 root root 64 2013-08-01 16:48:26.779001162 +0530 11 -> /dev/urandom
3523429 lr-x------ 1 root root 64 2013-08-01 16:48:26.779001162 +0530 12 -> /rhs/bricks/b1
3523430 lrwx------ 1 root root 64 2013-08-01 16:48:26.779001162 +0530 13 -> socket:[3523262]
3523431 lrwx------ 1 root root 64 2013-08-01 16:48:26.779001162 +0530 14 -> socket:[3523263]
3523432 lrwx------ 1 root root 64 2013-08-01 16:48:26.779001162 +0530 15 -> socket:[3523265]
3523433 lrwx------ 1 root root 64 2013-08-01 16:48:26.779001162 +0530 16 -> socket:[3523289]
3523419 l-wx------ 1 root root 64 2013-08-01 16:48:26.779001162 +0530 2 -> /dev/null
3523420 lrwx------ 1 root root 64 2013-08-01 16:48:26.779001162 +0530 3 -> anon_inode:[eventpoll]
3523421 l-wx------ 1 root root 64 2013-08-01 16:48:26.779001162 +0530 4 -> /var/log/glusterfs/bricks/rhs-bricks-b1.log
3523422 lrwx------ 1 root root 64 2013-08-01 16:48:26.779001162 +0530 5 -> /var/lib/glusterd/vols/vol_rep/run/hicks-rhs-bricks-b1.pid
3523423 lrwx------ 1 root root 64 2013-08-01 16:48:26.779001162 +0530 6 -> socket:[3522950]
3523424 lrwx------ 1 root root 64 2013-08-01 16:48:26.779001162 +0530 7 -> socket:[3523122]
3523425 lrwx------ 1 root root 64 2013-08-01 16:48:26.779001162 +0530 8 -> socket:[3522986]
3523426 lrwx------ 1 root root 64 2013-08-01 16:48:26.779001162 +0530 9 -> socket:[3523247]

root@king [Aug-01-2013-17:45:15] >gluster v status
Status of volume: vol_rep
Gluster process                                 Port    Online  Pid
------------------------------------------------------------------------------
Brick king:/rhs/bricks/b0                       49152   Y       22818
Brick hicks:/rhs/bricks/b1                      49152   Y       26126
NFS Server on localhost                         2049    Y       26570
Self-heal Daemon on localhost                   N/A     Y       26580
NFS Server on hicks                             2049    Y       26135
Self-heal Daemon on hicks                       N/A     Y       26139

root@king [Aug-01-2013-17:45:17] >gluster v info
Volume Name: vol_rep
Type: Replicate
Volume ID: c449b61f-f57d-4114-ac22-777d9d7f8e44
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: king:/rhs/bricks/b0
Brick2: hicks:/rhs/bricks/b1
Options Reconfigured:
cluster.self-heal-daemon: on
root@king [Aug-01-2013-17:45:35]

Expected results:
=================
The anonymous fds should be closed.
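The anonymous-fd mechanism described above (writes served through an anonymous fd until the client re-opens the file after 1024 ops, per patch 4358) can be sketched as a toy model. This is a minimal simulation of the *described* behaviour, not GlusterFS code; the class and attribute names are hypothetical:

```python
# Toy model of the anon-fd write path on a replaced brick: writes go through
# an anonymous fd until enough ops have been performed to trigger a re-open.
# REOPEN_THRESHOLD follows the report's figure of 1024 ops; all names here
# are hypothetical (the real logic lives in GlusterFS's C client xlator).

REOPEN_THRESHOLD = 1024

class ReplacedBrickFile:
    def __init__(self):
        self.op_count = 0
        self.reopened = False
        self.anon_fd_open = True  # anonymous fd serves writes initially

    def write(self, data):
        self.op_count += 1
        if not self.reopened and self.op_count >= REOPEN_THRESHOLD:
            self.reopened = True
            # Expected behaviour modeled here: close the anon fd once the
            # file has a real fd. The bug in this report is that the brick
            # never released the anon fd, so it stayed visible in /proc.
            self.anon_fd_open = False

f = ReplacedBrickFile()
for _ in range(1024):          # mirrors step 12: 1024 writes via fd 5
    f.write(b"Hello World\n")

assert f.reopened              # file re-opened on the replaced brick
assert not f.anon_fd_open      # expected (post-fix) state: anon fd closed
```

This mirrors steps 12 and 13 of the reproducer: after the 1024th write the file is re-opened, and the expected result is that only the re-opened fd remains in /proc/<brick_pid>/fd.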
Problem:
The client xlator issues finodelk using an anonymous fd when no fd is open on the file. This can also happen between attempts to re-open the file after a client disconnect: the lock may be taken using the anonymous fd, while the file is re-opened in the meantime, so the unlock arrives on the re-opened fd. Because the two fds differ, the lock-table entry is never removed, leaking the lk-table entry; the entry also holds a reference to the fd, which leads to an fd leak on the brick.

Fix:
Do not require the fds to be equal when tracking finodelks. An inodelk is identified by (gfid, connection, lk-owner), so fd equality is not needed.
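The leak and the fix can be illustrated with a small simulation of the lock-table matching logic. This is a sketch under stated assumptions: the real implementation is in GlusterFS's C xlators, and `LockTable`, `match_fd`, and the tuple layout below are all hypothetical:

```python
# Toy lock table illustrating the bug and the fix. An inodelk is identified
# by (gfid, connection, lk-owner); the buggy matcher additionally compared
# the fd the lock was taken on against the fd the unlock arrives on.

class LockTable:
    def __init__(self, match_fd):
        self.match_fd = match_fd  # True = buggy behaviour, False = fixed
        self.entries = []         # each entry holds a ref to the fd it used

    def lock(self, gfid, conn, lk_owner, fd):
        self.entries.append((gfid, conn, lk_owner, fd))

    def unlock(self, gfid, conn, lk_owner, fd):
        for e in self.entries:
            same_key = e[:3] == (gfid, conn, lk_owner)
            if same_key and (not self.match_fd or e[3] is fd):
                self.entries.remove(e)  # entry gone -> fd reference released
                return True
        return False  # no match: the entry (and its fd ref) leaks

anon_fd, real_fd = object(), object()

# Buggy: lock taken via the anon fd, unlock arrives on the re-opened fd.
buggy = LockTable(match_fd=True)
buggy.lock("gfid-1", "conn-1", "owner-1", anon_fd)
assert not buggy.unlock("gfid-1", "conn-1", "owner-1", real_fd)
assert len(buggy.entries) == 1  # leaked entry still pins the anon fd

# Fixed: match only on (gfid, connection, lk-owner).
fixed = LockTable(match_fd=False)
fixed.lock("gfid-1", "conn-1", "owner-1", anon_fd)
assert fixed.unlock("gfid-1", "conn-1", "owner-1", real_fd)
assert len(fixed.entries) == 0  # entry removed, fd reference released
```

With fd equality dropped from the match, the unlock arriving on the re-opened fd finds and removes the entry taken on the anonymous fd, so the brick no longer pins the anonymous fd open.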
https://code.engineering.redhat.com/gerrit/11684
Verified the fix on build:
==========================
glusterfs 3.4.0.33rhs built on Sep 8 2013 13:20:26

Bug is fixed. Moving bug to verified state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1262.html