Description of problem:
-----------------------
The glusterfs client process is not terminated for every unmount executed on the volume.

Version-Release number of selected component (if applicable):
------------------------------------------------------------
3.3.0qa45

How reproducible:
-----------------
Often

Steps to Reproduce:
-------------------
1. Create a distribute-replicate volume (3x3).
2. From Node1 and Node2, continuously mount and unmount the volume with type "fuse/nfs".

Actual results:
---------------
Node1:
------
[05/31/12 - 05:54:30 root@ARF-Client1 ~]# mount
/dev/mapper/vg_dhcp159180-lv_root on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw,rootcontext="system_u:object_r:tmpfs_t:s0")
/dev/vda1 on /boot type ext4 (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
/dev/vdb1 on /opt/export type xfs (rw)

[05/31/12 - 05:54:33 root@ARF-Client1 ~]# ps -ef | grep gluster
root      2968     1 11 05:26 ?        00:03:13 /usr/local/sbin/glusterfs --volfile-id=/dstore --volfile-server=10.16.159.184 /mnt/gfsc1
root      3155     1 11 05:27 ?        00:03:13 /usr/local/sbin/glusterfs --volfile-id=/dstore --volfile-server=10.16.159.184 /mnt/gfsc1
root      3247     1 12 05:28 ?        00:03:13 /usr/local/sbin/glusterfs --volfile-id=/dstore --volfile-server=10.16.159.184 /mnt/gfsc1
root      3340     1 12 05:29 ?        00:03:13 /usr/local/sbin/glusterfs --volfile-id=/dstore --volfile-server=10.16.159.184 /mnt/gfsc1
root      3941     1 15 05:34 ?        00:03:13 /usr/local/sbin/glusterfs --volfile-id=/dstore --volfile-server=10.16.159.184 /mnt/gfsc1
root      4753     1  0 05:40 ?        00:00:00 /usr/local/sbin/glusterfs --volfile-id=/dstore --volfile-server=10.16.159.184 /mnt/gfsc1
root      4815  4794  0 05:54 pts/0    00:00:00 grep gluster
[05/31/12 - 05:54:40 root@ARF-Client1 ~]#

Node2:
------
[05/31/12 - 05:56:17 root@AFR-Client2 ~]# mount
/dev/mapper/vg_dhcp159192-lv_root on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw,rootcontext="system_u:object_r:tmpfs_t:s0")
/dev/vda1 on /boot type ext4 (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)

[05/31/12 - 05:56:19 root@AFR-Client2 ~]# ps -ef | grep gluster
root     13157     1  0 05:24 ?        00:00:00 /usr/local/sbin/glusterfs --volfile-id=/dstore --volfile-server=10.16.159.184 /mnt/gfsc1
root     13354     1  0 05:25 ?        00:00:00 /usr/local/sbin/glusterfs --volfile-id=/dstore --volfile-server=10.16.159.184 /mnt/gfsc1
root     13827     1  0 05:29 ?        00:00:00 /usr/local/sbin/glusterfs --volfile-id=/dstore --volfile-server=10.16.159.184 /mnt/gfsc1
root     13923     1  0 05:30 ?        00:00:00 /usr/local/sbin/glusterfs --volfile-id=/dstore --volfile-server=10.16.159.184 /mnt/gfsc1
root     14148     1  0 05:33 ?        00:00:00 /usr/local/sbin/glusterfs --volfile-id=/dstore --volfile-server=10.16.159.184 /mnt/gfsc1
root     14543     1  0 05:36 ?        00:00:00 /usr/local/sbin/glusterfs --volfile-id=/dstore --volfile-server=10.16.159.184 /mnt/gfsc1
root     14687     1  0 05:37 ?        00:00:00 /usr/local/sbin/glusterfs --volfile-id=/dstore --volfile-server=10.16.159.184 /mnt/gfsc1
root     14782 14759  0 05:56 pts/0    00:00:00 grep gluster

Expected results:
-----------------
For every unmount, the glusterfs process should be terminated.
Additional info:
----------------
[05/31/12 - 06:13:34 root@AFR-Server1 ~]# gluster v info

Volume Name: dstore
Type: Distributed-Replicate
Volume ID: ebb5f2a8-b35c-4583-855b-65814c5a1b6e
Status: Started
Number of Bricks: 3 x 3 = 9
Transport-type: tcp
Bricks:
Brick1: 10.16.159.184:/export_b1/dir1
Brick2: 10.16.159.188:/export_b1/dir1
Brick3: 10.16.159.196:/export_b1/dir1
Brick4: 10.16.159.184:/export_c1/dir1
Brick5: 10.16.159.188:/export_c1/dir1
Brick6: 10.16.159.196:/export_c1/dir1
Brick7: 10.16.159.184:/export_d1/dir1
Brick8: 10.16.159.188:/export_d1/dir1
Brick9: 10.16.159.196:/export_d1/dir1
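The mount/unmount cycle from "Steps to Reproduce" can be sketched as a small shell script. This is only a sketch built from the transcript above (server, volume, and mountpoint are taken from the ps output); `leftover_count` and `repro_loop` are hypothetical helper names, not gluster tooling:

```shell
#!/bin/sh
# leftover_count: given `ps -ef` output on stdin, count glusterfs client
# processes whose command line ends with the mountpoint passed as $1.
leftover_count() {
    grep -c "glusterfs.*$1\$"
}

# repro_loop: the continuous mount/unmount cycle from the steps above.
# Requires root and a started gluster volume, so it is defined here but
# not invoked.
repro_loop() {
    server=10.16.159.184 vol=dstore mnt=/mnt/gfsc1
    i=0
    while [ "$i" -lt 20 ]; do
        mount -t glusterfs "$server:/$vol" "$mnt"
        umount "$mnt"
        i=$((i + 1))
    done
    # On a fixed build this prints 0; on the affected build the count
    # grows with each iteration, as in the ps output above.
    ps -ef | leftover_count "$mnt"
}
```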
Interesting... Shwetha, can you attach gdb to one of these processes and see where it is hung?

gdb -p <PID>
(gdb) thread apply all bt full

That will help to corner the issue.
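A non-interactive form of the same gdb invocation, for collecting the trace in one shot (a sketch: the output path is illustrative, and the PID would be one of the lingering clients, e.g. 2968 from the ps output above):

```shell
# dump_backtrace: attach gdb to a lingering glusterfs client, record a
# full backtrace of every thread in batch mode, then detach. $1 is the
# PID of the client process.
dump_backtrace() {
    gdb -p "$1" --batch -ex 'thread apply all bt full' \
        > "/tmp/glusterfs-$1.bt" 2>&1
}
```

Run as root on the client and attach the resulting /tmp/glusterfs-&lt;PID&gt;.bt file to the bug.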
Let's consider 2 nodes, Node1 and Node2. On the volume, auth.allow is set to Node1. The mount from Node1 succeeds. The mount from Node2 fails and an error message is reported, but a glusterfs process is still started.

Steps to recreate the issue:
----------------------------
[05/31/12 - 08:20:02 root@AFR-Server1 ~]# gluster v create vol1 10.16.159.184:/export11
Creation of volume vol1 has been successful. Please start the volume to access data.

[05/31/12 - 08:23:40 root@AFR-Server1 ~]# gluster v set vol1 auth.allow 10.16.159.180
Set volume successful

[05/31/12 - 08:23:58 root@AFR-Server1 ~]# gluster v info

Volume Name: vol1
Type: Distribute
Volume ID: f90a7384-f5d7-4f13-970f-6db6a01afce6
Status: Created
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: 10.16.159.184:/export11
Options Reconfigured:
auth.allow: 10.16.159.180

[05/31/12 - 08:28:05 root@AFR-Server1 ~]# gluster v start vol1
Starting volume vol1 has been successful

Client1: 10.16.159.180
----------------------
[05/31/12 - 08:28:19 root@ARF-Client1 ~]# mount -t glusterfs 10.16.159.184:/vol1 /mnt/gfsc1
[05/31/12 - 08:28:26 root@ARF-Client1 ~]# ps -ef | grep gluster
root     15141     1  0 08:28 ?        00:00:00 /usr/local/sbin/glusterfs --volfile-id=/vol1 --volfile-server=10.16.159.184 /mnt/gfsc1
root     15154  4794  0 08:28 pts/0    00:00:00 grep gluster

Client2: 10.16.159.192
----------------------
[05/31/12 - 08:28:33 root@AFR-Client2 ~]# mount -t glusterfs 10.16.159.184:/vol1 /mnt/gfsc1
Mount failed. Please check the log file for more details.
[05/31/12 - 08:28:40 root@AFR-Client2 ~]# ps -ef | grep gluster
root     23120     1  0 08:28 ?        00:00:00 /usr/local/sbin/glusterfs --volfile-id=/vol1 --volfile-server=10.16.159.184 /mnt/gfsc1
root     23134 14759  0 08:28 pts/0    00:00:00 grep gluster
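One way to spot this state automatically is to compare /proc/mounts against the process table: the "unmounted but client still running" combination seen on Client2 is the bug. A sketch (`mount_state` is a hypothetical helper, not part of gluster):

```shell
# mount_state: print whether $1 appears in /proc/mounts and whether a
# glusterfs client process for it is still running. After a rejected
# mount the buggy output is "unmounted running"; a clean failure would
# print "unmounted gone".
mount_state() {
    mnt=$1
    if grep -q " $mnt " /proc/mounts; then m=mounted; else m=unmounted; fi
    if ps -ef | grep -q "[g]lusterfs.*$mnt"; then p=running; else p=gone; fi
    echo "$m $p"
}
```

The `[g]lusterfs` pattern is the usual trick to keep the grep process itself out of the ps match.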
Can you please attach gdb to any one of these processes and provide the backtrace? Also, a statedump of any one of these processes would help.
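For the statedump part, a glusterfs process writes one on receiving SIGUSR1, by default into /var/run/gluster (both details are the usual glusterfs conventions; worth double-checking on the build under test):

```shell
# take_statedump: ask the glusterfs client process with PID $1 to dump
# its internal state. The dump file appears under /var/run/gluster
# (the directory must exist and the sender needs permission to signal
# the process).
take_statedump() {
    kill -USR1 "$1"
}
```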
Shwetha, not happening anymore in upstream testing. Please re-open if seen again.