Bug 854656

Summary: glusterfs process not terminated after unmount
Product: [Red Hat Storage] Red Hat Gluster Storage
Component: glusterfs
Version: 2.0
Reporter: Vidya Sakar <vinaraya>
Assignee: Amar Tumballi <amarts>
QA Contact: spandura
CC: gluster-bugs, rfortier, rhs-bugs, sdharane, shwetha.h.panduranga, vbellur, vraman
Status: CLOSED ERRATA
Severity: high
Priority: medium
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Clone Of: 826975
Bug Depends On: 826975
Last Closed: 2013-09-23 22:33:19 UTC

Description Vidya Sakar 2012-09-05 13:49:10 UTC
+++ This bug was initially created as a clone of Bug #826975 +++

Description of problem:
----------------------
The glusterfs client process is not terminated after every unmount executed on the volume; repeated mount/unmount cycles leave stale client processes running.

Version-Release number of selected component (if applicable):
------------------------------------------------------------
3.3.0qa45

How reproducible:
-----------------
Often

Steps to Reproduce:
---------------------
1. Create a distributed-replicate volume (3x3).

2. From Node1 and Node2, continuously mount and unmount the volume using both FUSE and NFS mount types (a minimal loop is sketched below).
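
A minimal reproduction loop (sketch, reusing the volume name, server, and mount point that appear in the output below):

for i in $(seq 1 20); do
    mount -t glusterfs 10.16.159.184:/dstore /mnt/gfsc1
    sleep 2
    umount /mnt/gfsc1
done
# Any clients that survived their unmount will still show up here:
ps -ef | grep '[g]lusterfs'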

Actual results:
-----------------
[05/31/12 - 05:54:30 root@ARF-Client1 ~]# mount
/dev/mapper/vg_dhcp159180-lv_root on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw,rootcontext="system_u:object_r:tmpfs_t:s0")
/dev/vda1 on /boot type ext4 (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
/dev/vdb1 on /opt/export type xfs (rw)

[05/31/12 - 05:54:33 root@ARF-Client1 ~]# ps -ef | grep gluster
root      2968     1 11 05:26 ?        00:03:13 /usr/local/sbin/glusterfs --volfile-id=/dstore --volfile-server=10.16.159.184 /mnt/gfsc1
root      3155     1 11 05:27 ?        00:03:13 /usr/local/sbin/glusterfs --volfile-id=/dstore --volfile-server=10.16.159.184 /mnt/gfsc1
root      3247     1 12 05:28 ?        00:03:13 /usr/local/sbin/glusterfs --volfile-id=/dstore --volfile-server=10.16.159.184 /mnt/gfsc1
root      3340     1 12 05:29 ?        00:03:13 /usr/local/sbin/glusterfs --volfile-id=/dstore --volfile-server=10.16.159.184 /mnt/gfsc1
root      3941     1 15 05:34 ?        00:03:13 /usr/local/sbin/glusterfs --volfile-id=/dstore --volfile-server=10.16.159.184 /mnt/gfsc1
root      4753     1  0 05:40 ?        00:00:00 /usr/local/sbin/glusterfs --volfile-id=/dstore --volfile-server=10.16.159.184 /mnt/gfsc1
root      4815  4794  0 05:54 pts/0    00:00:00 grep gluster
[05/31/12 - 05:54:40 root@ARF-Client1 ~]# 

Node2:-
-------
[05/31/12 - 05:56:17 root@AFR-Client2 ~]# mount
/dev/mapper/vg_dhcp159192-lv_root on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw,rootcontext="system_u:object_r:tmpfs_t:s0")
/dev/vda1 on /boot type ext4 (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)

[05/31/12 - 05:56:19 root@AFR-Client2 ~]# ps -ef | grep gluster
root     13157     1  0 05:24 ?        00:00:00 /usr/local/sbin/glusterfs --volfile-id=/dstore --volfile-server=10.16.159.184 /mnt/gfsc1
root     13354     1  0 05:25 ?        00:00:00 /usr/local/sbin/glusterfs --volfile-id=/dstore --volfile-server=10.16.159.184 /mnt/gfsc1
root     13827     1  0 05:29 ?        00:00:00 /usr/local/sbin/glusterfs --volfile-id=/dstore --volfile-server=10.16.159.184 /mnt/gfsc1
root     13923     1  0 05:30 ?        00:00:00 /usr/local/sbin/glusterfs --volfile-id=/dstore --volfile-server=10.16.159.184 /mnt/gfsc1
root     14148     1  0 05:33 ?        00:00:00 /usr/local/sbin/glusterfs --volfile-id=/dstore --volfile-server=10.16.159.184 /mnt/gfsc1
root     14543     1  0 05:36 ?        00:00:00 /usr/local/sbin/glusterfs --volfile-id=/dstore --volfile-server=10.16.159.184 /mnt/gfsc1
root     14687     1  0 05:37 ?        00:00:00 /usr/local/sbin/glusterfs --volfile-id=/dstore --volfile-server=10.16.159.184 /mnt/gfsc1
root     14782 14759  0 05:56 pts/0    00:00:00 grep gluster

Expected results:
For every unmount, the glusterfs process should be terminated. 
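
A quick check (sketch, assuming the same mount point as above): after each unmount, no glusterfs client for that mount point should remain.

umount /mnt/gfsc1
# Should print no glusterfs client lines once the unmount has cleaned up:
ps -ef | grep '[g]lusterfs'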

Additional info:
------------------

[05/31/12 - 06:13:34 root@AFR-Server1 ~]# gluster v info
 
Volume Name: dstore
Type: Distributed-Replicate
Volume ID: ebb5f2a8-b35c-4583-855b-65814c5a1b6e
Status: Started
Number of Bricks: 3 x 3 = 9
Transport-type: tcp
Bricks:
Brick1: 10.16.159.184:/export_b1/dir1
Brick2: 10.16.159.188:/export_b1/dir1
Brick3: 10.16.159.196:/export_b1/dir1
Brick4: 10.16.159.184:/export_c1/dir1
Brick5: 10.16.159.188:/export_c1/dir1
Brick6: 10.16.159.196:/export_c1/dir1
Brick7: 10.16.159.184:/export_d1/dir1
Brick8: 10.16.159.188:/export_d1/dir1
Brick9: 10.16.159.196:/export_d1/dir1

--- Additional comment from amarts on 2012-05-31 08:15:56 EDT ---

Interesting. Shwetha, can you attach to one of these processes and see where it is hung?

gdb -p <PID>
(gdb) thread apply all bt full

That will help corner the issue.
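
A non-interactive variant of the same capture (sketch; <PID> is one of the leftover client PIDs, and the output file path is just an example):

gdb -p <PID> -batch -ex 'thread apply all bt full' > /tmp/glusterfs-bt.txt 2>&1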

--- Additional comment from shwetha.h.panduranga on 2012-05-31 08:35:04 EDT ---

Let's consider two nodes, Node1 and Node2. On the volume, auth.allow is set to allow only Node1.

The mount from Node1 succeeds. The mount from Node2 fails and an error message is reported, but a glusterfs client process is still started and left running.

Steps to recreate the issue:
----------------------------

[05/31/12 - 08:20:02 root@AFR-Server1 ~]# gluster v create vol1 10.16.159.184:/export11
Creation of volume vol1 has been successful. Please start the volume to access data.

[05/31/12 - 08:23:40 root@AFR-Server1 ~]# gluster v set vol1 auth.allow 10.16.159.180
Set volume successful

[05/31/12 - 08:23:58 root@AFR-Server1 ~]# gluster v info
 
Volume Name: vol1
Type: Distribute
Volume ID: f90a7384-f5d7-4f13-970f-6db6a01afce6
Status: Created
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: 10.16.159.184:/export11
Options Reconfigured:
auth.allow: 10.16.159.180

[05/31/12 - 08:28:05 root@AFR-Server1 ~]# gluster v start vol1
Starting volume vol1 has been successful

Client1 :- 10.16.159.180
--------------------------
[05/31/12 - 08:28:19 root@ARF-Client1 ~]# mount -t glusterfs 10.16.159.184:/vol1 /mnt/gfsc1
[05/31/12 - 08:28:26 root@ARF-Client1 ~]# ps -ef | grep gluster
root     15141     1  0 08:28 ?        00:00:00 /usr/local/sbin/glusterfs --volfile-id=/vol1 --volfile-server=10.16.159.184 /mnt/gfsc1
root     15154  4794  0 08:28 pts/0    00:00:00 grep gluster


Client2 :- 10.16.159.192
-------------------------
[05/31/12 - 08:28:33 root@AFR-Client2 ~]# mount -t glusterfs 10.16.159.184:/vol1 /mnt/gfsc1
Mount failed. Please check the log file for more details.
[05/31/12 - 08:28:40 root@AFR-Client2 ~]# ps -ef | grep gluster
root     23120     1  0 08:28 ?        00:00:00 /usr/local/sbin/glusterfs --volfile-id=/vol1 --volfile-server=10.16.159.184 /mnt/gfsc1
root     23134 14759  0 08:28 pts/0    00:00:00 grep gluster

--- Additional comment from sgowda on 2012-07-06 01:42:14 EDT ---

Can you please attach gdb to one of these processes and provide the backtrace? A statedump of one of these processes would also help.
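
One way to take the statedump (sketch; glusterfs processes dump their state on SIGUSR1, and the dump directory varies by build, typically /tmp or /var/run/gluster):

# Trigger a statedump of the client process:
kill -USR1 <PID>
# Look for the resulting glusterdump.<PID>* file:
ls -l /tmp/glusterdump.* /var/run/gluster/ 2>/dev/null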

Comment 4 spandura 2013-07-09 11:14:50 UTC
Verified whether the bug exists on the build:

root@king [Jul-09-2013-16:43:15] >rpm -qa | grep glusterfs
glusterfs-fuse-3.4.0.12rhs.beta3-1.el6rhs.x86_64
glusterfs-geo-replication-3.4.0.12rhs.beta3-1.el6rhs.x86_64
glusterfs-3.4.0.12rhs.beta3-1.el6rhs.x86_64
glusterfs-server-3.4.0.12rhs.beta3-1.el6rhs.x86_64
glusterfs-rdma-3.4.0.12rhs.beta3-1.el6rhs.x86_64
glusterfs-debuginfo-3.4.0.12rhs.beta3-1.el6rhs.x86_64
glusterfs-devel-3.4.0.12rhs.beta3-1.el6rhs.x86_64


root@king [Jul-09-2013-16:43:16] >gluster --version
glusterfs 3.4.0.12rhs.beta3 built on Jul  6 2013 14:35:18


This issue is no longer reproducible on this build.

Comment 5 Scott Haines 2013-09-23 22:33:19 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html