Description of problem:
------------------------
On a stress setup, I was running MTSH plus a heavy write workload on an EC volume over Ganesha mounts (v3/v4). After roughly 10 hours of workload, on one of the clients all new writes started to fail and the FS suddenly became read-only.

[root@gqac025 /]# mount -t nfs -o vers=4 192.168.97.121://testvol E
[root@gqac025 /]# cd E
[root@gqac025 E]# touch G
touch: cannot touch ‘G’: Read-only file system
[root@gqac025 E]#

Mount shows RW access:

192.168.97.121://testvol on /E type nfs4 (rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.97.185,local_lock=none,addr=192.168.97.121)

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
nfs-ganesha-2.4.1-6.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-12.el7rhgs.x86_64

How reproducible:
-----------------
1/1

Actual results:
---------------
FS became read-only.

Expected results:
-----------------
Writes continue to succeed; the mount stays read-write.

Additional info:
----------------
Status of volume: testvol
Gluster process                                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gqas009.sbu.lab.eng.bos.redhat.com:/bricks1/brick1    49158     0          Y       29725
Brick gqas010.sbu.lab.eng.bos.redhat.com:/bricks1/brick     49164     0          Y       24750
Brick gqas014.sbu.lab.eng.bos.redhat.com:/bricks1/brick     49164     0          Y       24867
Brick gqas015.sbu.lab.eng.bos.redhat.com:/bricks1/brick     49164     0          Y       25931
Brick gqas009.sbu.lab.eng.bos.redhat.com:/bricks3/brick     49165     0          Y       25201
Brick gqas010.sbu.lab.eng.bos.redhat.com:/bricks3/brick     49165     0          Y       24769
Brick gqas009.sbu.lab.eng.bos.redhat.com:/bricks2/brick1    49153     0          Y       29771
Brick gqas010.sbu.lab.eng.bos.redhat.com:/bricks2/brick     49166     0          Y       24788
Brick gqas014.sbu.lab.eng.bos.redhat.com:/bricks2/brick     49165     0          Y       24886
Brick gqas015.sbu.lab.eng.bos.redhat.com:/bricks2/brick     49165     0          Y       25950
Brick gqas014.sbu.lab.eng.bos.redhat.com:/bricks3/brick     49166     0          Y       24905
Brick gqas015.sbu.lab.eng.bos.redhat.com:/bricks3/brick     49166     0          Y       25969
Brick gqas009.sbu.lab.eng.bos.redhat.com:/bricks4/brick1    49154     0          Y       29827
Brick gqas010.sbu.lab.eng.bos.redhat.com:/bricks4/brick     49167     0          Y       24807
Brick gqas015.sbu.lab.eng.bos.redhat.com:/bricks4/brick     49167     0          Y       25988
Brick gqas014.sbu.lab.eng.bos.redhat.com:/bricks4/brick     49167     0          Y       24924
Brick gqas009.sbu.lab.eng.bos.redhat.com:/bricks5/brick     49168     0          Y       25258
Brick gqas010.sbu.lab.eng.bos.redhat.com:/bricks5/brick     49168     0          Y       24826
Brick gqas009.sbu.lab.eng.bos.redhat.com:/bricks6/brick     49169     0          Y       25277
Brick gqas010.sbu.lab.eng.bos.redhat.com:/bricks6/brick     49169     0          Y       24845
Brick gqas015.sbu.lab.eng.bos.redhat.com:/bricks6/brick     49168     0          Y       26007
Brick gqas014.sbu.lab.eng.bos.redhat.com:/bricks6/brick     49168     0          Y       24943
Brick gqas014.sbu.lab.eng.bos.redhat.com:/bricks5/brick     49169     0          Y       24962
Brick gqas015.sbu.lab.eng.bos.redhat.com:/bricks5/brick     49169     0          Y       26026
Brick gqas009.sbu.lab.eng.bos.redhat.com:/bricks7/brick1    49155     0          Y       29909
Brick gqas010.sbu.lab.eng.bos.redhat.com:/bricks7/brick     49170     0          Y       24864
Brick gqas015.sbu.lab.eng.bos.redhat.com:/bricks7/brick     49170     0          Y       26045
Brick gqas014.sbu.lab.eng.bos.redhat.com:/bricks7/brick     49170     0          Y       24981
Brick gqas010.sbu.lab.eng.bos.redhat.com:/bricks8/brick     49171     0          Y       24883
Brick gqas009.sbu.lab.eng.bos.redhat.com:/bricks8/brick     49171     0          Y       25315
Brick gqas009.sbu.lab.eng.bos.redhat.com:/bricks9/brick     49172     0          Y       25336
Brick gqas010.sbu.lab.eng.bos.redhat.com:/bricks9/brick     49172     0          Y       24902
Brick gqas015.sbu.lab.eng.bos.redhat.com:/bricks9/brick     49171     0          Y       26064
Brick gqas014.sbu.lab.eng.bos.redhat.com:/bricks9/brick     49171     0          Y       25000
Brick gqas014.sbu.lab.eng.bos.redhat.com:/bricks8/brick     49172     0          Y       25019
Brick gqas015.sbu.lab.eng.bos.redhat.com:/bricks8/brick     49172     0          Y       26083
Brick gqas009.sbu.lab.eng.bos.redhat.com:/bricks10/brick    49173     0          Y       25355
Brick gqas010.sbu.lab.eng.bos.redhat.com:/bricks10/brick    49173     0          Y       24921
Brick gqas015.sbu.lab.eng.bos.redhat.com:/bricks10/brick    49173     0          Y       26102
Brick gqas014.sbu.lab.eng.bos.redhat.com:/bricks10/brick    49173     0          Y       25038
Brick gqas009.sbu.lab.eng.bos.redhat.com:/bricks11/brick1   49156     0          Y       30009
Brick gqas010.sbu.lab.eng.bos.redhat.com:/bricks11/brick    49174     0          Y       24940
Brick gqas009.sbu.lab.eng.bos.redhat.com:/bricks12/brick    49175     0          Y       25393
Brick gqas010.sbu.lab.eng.bos.redhat.com:/bricks12/brick    49175     0          Y       24959
Brick gqas015.sbu.lab.eng.bos.redhat.com:/bricks12/brick    49174     0          Y       26121
Brick gqas014.sbu.lab.eng.bos.redhat.com:/bricks12/brick    49174     0          Y       25057
Brick gqas014.sbu.lab.eng.bos.redhat.com:/bricks11/brick    49175     0          Y       25076
Brick gqas015.sbu.lab.eng.bos.redhat.com:/bricks11/brick    49175     0          Y       26140
Self-heal Daemon on localhost                               N/A       N/A        Y       17960
Self-heal Daemon on gqas010.sbu.lab.eng.bos.redhat.com      N/A       N/A        Y       13756
Self-heal Daemon on gqas015.sbu.lab.eng.bos.redhat.com      N/A       N/A        Y       17415
Self-heal Daemon on gqas014.sbu.lab.eng.bos.redhat.com      N/A       N/A        Y       17200

Task Status of Volume testvol
------------------------------------------------------------------------------
Task                 : Rebalance
ID                   : 60d3e3e4-661c-4520-9f5f-482d95d81a82
Status               : in progress

[root@gqas009 ~]#
The setup is still in the same state in case someone wants to take a look.
The application side is impacted again with this. Setting blocker flag.
On the server node:

[root@gqas009 ~]# showmount -e
Export list for gqas009.sbu.lab.eng.bos.redhat.com:
/replicate (everyone)

On the client:

192.168.97.121://testvol on /E type nfs4 (rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.97.185,local_lock=none,addr=192.168.97.121)

The volume that was mounted on the client side is no longer exported on this server. Because of NFSv4 pseudo-FS behaviour, the mount is still accessible, but only with read-only access. The NFS-Ganesha servers on the other nodes still seem to be exporting testvol:

[root@gqas014 ~]# showmount -e
Export list for gqas014.sbu.lab.eng.bos.redhat.com:
/testvol (everyone)
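For reference, each node's active export list can also be queried straight from nfs-ganesha over DBus (the standard ExportMgr interface) rather than via showmount. A minimal sketch; it assumes the short node names resolve and passwordless ssh between the nodes:

# Check what each ganesha head is actually exporting right now.
for node in gqas009 gqas010 gqas014 gqas015; do
    echo "== ${node} =="
    ssh "${node}" showmount -e localhost
    ssh "${node}" dbus-send --type=method_call --print-reply --system \
        --dest=org.ganesha.nfsd /org/ganesha/nfsd/ExportMgr \
        org.ganesha.nfsd.exportmgr.ShowExports
done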
Soumya,

What would cause an unexport on only one node? Is it because of the same rpc issue/network disconnects (as in https://bugzilla.redhat.com/show_bug.cgi?id=1411219) that I am hitting here?
Could be. Not sure why the volume got unexported. It's not admin intervention for sure, as there is still an UP_THREAD running for that volume.

(gdb) f 1
#1  0x00007f0275198add in GLUSTERFSAL_UP_Thread (Arg=0x7f0270006e50)
    at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/FSAL_GLUSTER/fsal_up.c:152
152             rc = glfs_h_poll_upcall(glfsexport->gl_fs, &cbk);
(gdb) p *glfsexport
$5 = {gl_fs = 0x7f0270007d40, mount_path = 0x7f0270002c90 "/testvol",
  export_path = 0x7f0270006dc0 "/", saveduid = 0, savedgid = 0, export = {exports = {
      next = 0x7f02753a13a0 <GlusterFS+128>, prev = 0x7f0270007a50}, fsal = 0x7f02753a1390 <GlusterFS+112>,
    up_ops = 0x7f02700d23a0, exp_ops = {
      get_name = 0x7f03020ef5d0 <get_name>, unexport = 0x7f03020ef5e0 <export_unexport>,
      release = 0x7f0275192680 <export_release>, lookup_path = 0x7f02751927e0 <lookup_path>,
      lookup_junction = 0x7f03020ef600 <lookup_junction>, extract_handle = 0x7f0275192600 <extract_handle>,
      create_handle = 0x7f0275192440 <create_handle>, get_fs_dynamic_info = 0x7f02751922c0 <get_dynamic_info>,
      fs_supports = 0x7f02751922a0 <fs_supports>, fs_maxfilesize = 0x7f0275192280 <fs_maxfilesize>,
      fs_maxread = 0x7f0275192260 <fs_maxread>, fs_maxwrite = 0x7f0275192240 <fs_maxwrite>,
      fs_maxlink = 0x7f0275192220 <fs_maxlink>, fs_maxnamelen = 0x7f0275192200 <fs_maxnamelen>,
      fs_maxpathlen = 0x7f02751921e0 <fs_maxpathlen>, fs_lease_time = 0x7f02751921c0 <fs_lease_time>,
      fs_acl_support = 0x7f02751921a0 <fs_acl_support>, fs_supported_attrs = 0x7f0275192170 <fs_supported_attrs>,
      fs_umask = 0x7f0275192150 <fs_umask>, fs_xattr_access_rights = 0x7f0275192130 <fs_xattr_access_rights>,
      check_quota = 0x7f03020ef6d0 <check_quota>, get_quota = 0x7f03020ef9f0 <get_quota>,
      set_quota = 0x7f03020ef990 <set_quota>, getdevicelist = 0x7f03020ef6e0 <getdevicelist>,
      fs_layouttypes = 0x7f03020ef6f0 <fs_layouttypes>, fs_layout_blocksize = 0x7f03020ef700 <fs_layout_blocksize>,
      fs_maximum_segments = 0x7f03020ef710 <fs_maximum_segments>, fs_loc_body_size = 0x7f03020ef720 <fs_loc_body_size>,
      get_write_verifier = 0x7f03020ef860 <global_verifier>, alloc_state = 0x7f0275192af0 <glusterfs_alloc_state>,
      free_state = 0x7f03020f1430 <free_state>},
    sub_export = 0x0, super_export = 0x7f02700d2270}, acl_enable = false, pnfs_ds_enabled = true,
  pnfs_mds_enabled = false, destroy_mode = 0 '\000', up_thread = 139647710433920}
(gdb) c
Continuing.
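A quick way to double-check which FSAL_GLUSTER upcall threads are still alive is to dump all thread backtraces from the running ganesha.nfsd; a small sketch, assuming gdb and the nfs-ganesha debuginfo are installed on the node:

# List any GLUSTERFSAL_UP_Thread frames in the running process;
# one such frame is expected per gluster export that is still active.
gdb -batch -p "$(pgrep -x ganesha.nfsd)" \
    -ex 'set pagination off' \
    -ex 'thread apply all bt' 2>/dev/null | grep GLUSTERFSAL_UP_Thread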
22/01/2017 00:03:07 : epoch 98550000 : gqas009.sbu.lab.eng.bos.redhat.com : ganesha.nfsd-26220[dbus_heartbeat] glusterfs_create_export :FSAL :EVENT :Volume testvol exported at : '/'
22/01/2017 04:42:31 : epoch 98550000 : gqas009.sbu.lab.eng.bos.redhat.com : ganesha.nfsd-26220[work-180] posix2fsal_error :FSAL :CRIT :Mapping 77(default) to ERR_FSAL_SERVERFAULT
22/01/2017 04:42:31 : epoch 98550000 : gqas009.sbu.lab.eng.bos.redhat.com : ganesha.nfsd-26220[work-180] glusterfs_close_my_fd :FSAL :CRIT :Error : close returns with File descriptor in bad state
22/01/2017 04:42:31 : epoch 98550000 : gqas009.sbu.lab.eng.bos.redhat.com : ganesha.nfsd-26220[work-180] mdcache_lru_clean :INODE LRU :CRIT :Error closing file in cleanup: Undefined server error
22/01/2017 05:13:11 : epoch 98550000 : gqas009.sbu.lab.eng.bos.redhat.com : ganesha.nfsd-26220[dbus_heartbeat] glusterfs_create_export :FSAL :EVENT :Volume replicate exported at : '/'

[2017-01-23 03:09:18.811044] W [MSGID: 122019] [ec-helpers.c:354:ec_loc_gfid_check] 10-testvol-disperse-7: Mismatching GFID's in loc
[2017-01-23 03:09:18.811110] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 10-testvol-dht: Found anomalies in /file_srcdir/gqac010.sbu.lab.eng.bos.redhat.com/thrd_04/d_002/d_001/d_008 (gfid = 6d71d7bc-00b7-4d4e-ac82-347519ea31e6). Holes=1 overlaps=0
[2017-01-23 03:09:18.811155] W [MSGID: 109005] [dht-selfheal.c:2116:dht_selfheal_directory] 10-testvol-dht: Directory selfheal failed : 1 subvolumes have unrecoverable errors. path = /file_srcdir/gqac010.sbu.lab.eng.bos.redhat.com/thrd_04/d_002/d_001/d_008, gfid = 6d71d7bc-00b7-4d4e-ac82-347519ea31e6
[2017-01-23 03:09:18.822643] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 10-testvol-dht: no subvolume

[root@gqas009 ~]# grep -i "2017-01-23 05:" /var/log/ganesha-gfapi.log | grep -i "Not Connected"
[2017-01-23 05:57:11.799096] E [MSGID: 114031] [client-rpc-fops.c:1601:client3_3_finodelk_cbk] 0-replicate-client-1: remote operation failed [Transport endpoint is not connected]
[2017-01-23 05:57:11.805384] I [socket.c:3439:socket_submit_request] 0-replicate-client-1: not connected (priv->connected = 0)
The message "E [MSGID: 114031] [client-rpc-fops.c:1601:client3_3_finodelk_cbk] 0-replicate-client-1: remote operation failed [Transport endpoint is not connected]" repeated 231 times between [2017-01-23 05:57:11.799096] and [2017-01-23 05:57:11.862054]

[root@gqas009 ~]# grep -i "2017-01-23 05:" /var/log/ganesha-gfapi.log | grep -i "disconnect"
[2017-01-23 05:43:28.107190] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 10-testvol-client-15: disconnected from testvol-client-15. Client process will keep trying to connect to glusterd until brick's port is available
[2017-01-23 05:43:28.107256] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 10-testvol-client-14: disconnected from testvol-client-14. Client process will keep trying to connect to glusterd until brick's port is available
[2017-01-23 05:43:28.107272] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 10-testvol-client-13: disconnected from testvol-client-13. Client process will keep trying to connect to glusterd until brick's port is available
[2017-01-23 05:43:28.107352] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 10-testvol-client-17: disconnected from testvol-client-17. Client process will keep trying to connect to glusterd until brick's port is available
[2017-01-23 05:43:31.178745] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 10-testvol-client-39: disconnected from testvol-client-39. Client process will keep trying to connect to glusterd until brick's port is available
[2017-01-23 05:43:31.178788] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 10-testvol-client-38: disconnected from testvol-client-38. Client process will keep trying to connect to glusterd until brick's port is available
[2017-01-23 05:43:31.178824] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 10-testvol-client-37: disconnected from testvol-client-37. Client process will keep trying to connect to glusterd until brick's port is available
[2017-01-23 05:43:31.242689] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 10-testvol-client-41: disconnected from testvol-client-41. Client process will keep trying to connect to glusterd until brick's port is available
[2017-01-23 05:43:32.138710] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 10-testvol-client-35: disconnected from testvol-client-35. Client process will keep trying to connect to glusterd until brick's port is available
[2017-01-23 05:43:32.138719] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 10-testvol-client-32: disconnected from testvol-client-32. Client process will keep trying to connect to glusterd until brick's port is available

As you mentioned, there are disconnect messages for all of the sub-volumes. Do you happen to have noted the time at which the I/Os started throwing errors? If not, it seems difficult to match against the exact error messages in the logs.
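To line things up, something along these lines can pull the relevant entries from both logs around a suspected window on a node. The time window and the ganesha.log path below are only placeholders and assumptions; substitute the time the client first saw the write failures:

# ganesha-gfapi.log uses "YYYY-MM-DD hh:mm" timestamps,
# ganesha.log uses "DD/MM/YYYY hh:mm", hence the two different patterns.
grep "2017-01-23 05:4" /var/log/ganesha-gfapi.log | grep -Ei "disconnect|not connected"
grep "23/01/2017 05:4" /var/log/ganesha.log        | grep -Ei "export|:CRIT :"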
(In reply to Soumya Koduri from comment #8)
> Could be. Not sure why the volume got unexported. It's not admin intervention
> for sure, as there is still an UP_THREAD running for that volume.
>
> (gdb) f 1
> #1  0x00007f0275198add in GLUSTERFSAL_UP_Thread (Arg=0x7f0270006e50)
>     at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/FSAL_GLUSTER/fsal_up.c:152
> 152             rc = glfs_h_poll_upcall(glfsexport->gl_fs, &cbk);
> (gdb) p *glfsexport
> $5 = {gl_fs = 0x7f0270007d40, mount_path = 0x7f0270002c90 "/testvol",
>   export_path = 0x7f0270006dc0 "/", ... up_thread = 139647710433920}
> (gdb) c
> Continuing.

[root@gqas009 ~]# showmount -e
Export list for gqas009.sbu.lab.eng.bos.redhat.com:
/replicate (everyone)

Dang/Frank/Matt,

Could you please comment on the possible reasons for a volume entry to be removed from the exports list without any admin intervention (i.e. without an unexport operation being done explicitly)?
Unexport only seems to happen on shutdown or as a result of a DBUS operation.
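For reference, the runtime unexport path mentioned here is the ExportMgr RemoveExport DBus method, which is what the gluster/ganesha helper scripts invoke for dynamic unexports. A sketch of what such an operation looks like; the export id is only an example value and must match the Export_Id that ShowExports reports on that node:

# Dynamic unexport over DBus; running this on one node would produce
# exactly the single-node symptom seen in this bug.
EXPORT_ID=2   # example value only
dbus-send --print-reply --system \
    --dest=org.ganesha.nfsd /org/ganesha/nfsd/ExportMgr \
    org.ganesha.nfsd.exportmgr.RemoveExport uint16:${EXPORT_ID}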
(In reply to Daniel Gryniewicz from comment #10)
> Unexport only seems to happen on shutdown or as a result of a DBUS operation.

Hmm, okay. But unexport should be calling the FSAL's export release(), right? In that case we shouldn't still have an UP_THREAD running for that unexported entry.
There seems to have been a volume stop attempted from node 009.

@Ambarish,

Request you to note in the bug any gluster operations (and their sequence) that you perform while running these tests. It took several hours before we found out that a volume stop had been attempted at one point.

  233  gluster v set testvol cluster.shd-wait-qlength: 1000000000000000000000000000000000
  234  gluster v set testvol cluster.shd-wait-qlength 1000000000000000000000000000000000
  235  gluster v set testvol cluster.shd-wait-qlength 1000000000000000000000000000
  236  gluster v set testvol cluster.shd-wait-qlength 10000000000000
  237  gluster v stop testvol
  238  gluster v rebalance testvol status
  271  gluster v statedump testvol all

Since the volume stop operation is performed on the local host first, before being executed on the peer nodes, the volume got unexported on this node as part of the stop. But on the other nodes, the volume 'stop' seems to have failed with the errors below -

[2017-01-23 05:50:28.822386] I [MSGID: 106474] [glusterd-ganesha.c:456:check_host_list] 0-management: ganesha host found Hostname is gqas009.sbu.lab.eng.bos.redhat.com
[2017-01-23 05:50:29.786886] W [MSGID: 106285] [glusterd-volume-ops.c:1751:glusterd_op_stage_stop_volume] 0-management: rebalance session is in progress for the volume 'testvol'
[2017-01-23 05:50:29.786961] E [MSGID: 106301] [glusterd-syncop.c:1302:gd_stage_op_phase] 0-management: Staging of operation 'Volume Stop' failed on localhost : rebalance session is in progress for the volume 'testvol'

Which is the reason why the other nodes still seem to be exporting the volume. There doesn't seem to be any issue with how nfs-ganesha itself is behaving.

Request @Atin to confirm whether this is expected behaviour of glusterd, or whether it needs to execute vol stop on the other nodes only after the rebalance operation is done.

Current rebalance status shows -

[root@gqas009 ~]# gluster v rebalance testvol status
                                    Node  Rebalanced-files        size      scanned    failures    skipped        status   run time in h:m:s
                               ---------       -----------  ----------  -----------  ----------  ---------  ------------      --------------
                               localhost                12      11.5KB      3082934           0          4     completed             5:27:37
      gqas015.sbu.lab.eng.bos.redhat.com                 0      0Bytes            0           7          0        failed             0:15:38
      gqas014.sbu.lab.eng.bos.redhat.com                 0      0Bytes            0           0          0     completed             3:25:34
      gqas010.sbu.lab.eng.bos.redhat.com                 0      0Bytes            0           7          0        failed             0:13:18
volume rebalance: testvol: success
[root@gqas009 ~]#
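The scenario above should be reproducible in isolation; a sketch using the names from this setup (expected outcome, per the analysis, is a staging failure plus a locally unexported volume):

# Stop a ganesha-exported volume while a rebalance is running.
gluster volume rebalance testvol start
gluster --mode=script volume stop testvol          # expected to fail in staging: rebalance in progress
showmount -e localhost                             # testvol already missing on this node
ssh gqas014.sbu.lab.eng.bos.redhat.com showmount -e localhost   # peers still export it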
>
> Since the volume stop operation is performed on the local host first, before
> being executed on the peer nodes, the volume got unexported on this node as
> part of the stop. But on the other nodes, the volume 'stop' seems to have
> failed with the errors below -
>
> [2017-01-23 05:50:28.822386] I [MSGID: 106474]
> [glusterd-ganesha.c:456:check_host_list] 0-management: ganesha host found
> Hostname is gqas009.sbu.lab.eng.bos.redhat.com
> [2017-01-23 05:50:29.786886] W [MSGID: 106285]
> [glusterd-volume-ops.c:1751:glusterd_op_stage_stop_volume] 0-management:
> rebalance session is in progress for the volume 'testvol'
> [2017-01-23 05:50:29.786961] E [MSGID: 106301]
> [glusterd-syncop.c:1302:gd_stage_op_phase] 0-management: Staging of
> operation 'Volume Stop' failed on localhost : rebalance session is in
> progress for the volume 'testvol'

A correction here: the vol stop seems to have failed on localhost itself, but only after the volume was unexported; the rebalance check happens after the unexport. I am not sure what the right order should be. We need a wider audience and a discussion to decide the order (if at all possible) in which operations need to be performed for volume stop (or any other volume operation).
Soumya,

My sincere apologies for this.

The vol stop was done as a part of the setup, not during the tests, so I didn't mention it in the bug... but I should have known better :/

Atin/Soumya,

Ideally the CLI should have thrown an error, right, if I try to stop a volume while rebalance is in progress?

And if the vol stop itself failed on the localhost, why did the volume get unexported on that node in the first place?
(In reply to Soumya Koduri from comment #12)
> ...
> Since the volume stop operation is performed on the local host first, before
> being executed on the peer nodes, the volume got unexported on this node as
> part of the stop. But on the other nodes, the volume 'stop' seems to have
> failed with the errors below -
>
> [2017-01-23 05:50:29.786886] W [MSGID: 106285]
> [glusterd-volume-ops.c:1751:glusterd_op_stage_stop_volume] 0-management:
> rebalance session is in progress for the volume 'testvol'
> [2017-01-23 05:50:29.786961] E [MSGID: 106301]
> [glusterd-syncop.c:1302:gd_stage_op_phase] 0-management: Staging of
> operation 'Volume Stop' failed on localhost : rebalance session is in
> progress for the volume 'testvol'
>
> Request @Atin to confirm whether this is expected behaviour of glusterd, or
> whether it needs to execute vol stop on the other nodes only after the
> rebalance operation is done.

The volume stop command should fail here, given that the staging operation on one of the nodes has failed. We wouldn't even reach the commit phase, where the volume status is changed to stopped, so the volume should still be in the started state.

Now coming to the exporting/unexporting part: does this happen in the staging phase? If yes, that's the issue. Any commits (changes to the state of the volume) should be done in the commit phase, whereas validation should take place in the staging phase only.
(In reply to Ambarish from comment #14)
> Soumya,
>
> My sincere apologies for this.
>
> The vol stop was done as a part of the setup, not during the tests, so I
> didn't mention it in the bug... but I should have known better :/
>
> Atin/Soumya,
> Ideally the CLI should have thrown an error, right, if I try to stop a
> volume while rebalance is in progress?

Yes, the CLI should fail, as I mentioned in comment 15.

> And if the vol stop itself failed on the localhost, why did the volume get
> unexported on that node in the first place?

Please refer to my earlier comment: this is a flaw in how the ganesha (un)export functionality has been coded. Given that this is done in the staging phase and GlusterD doesn't have any rollback mechanism, this inconsistency can be seen.
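Until that flaw is addressed, the affected node can probably be brought back in line by re-adding the export by hand. This is a sketch only; the config path and the EXPORT selection expression below follow the usual RHGS shared-storage layout and are assumptions to be verified against this setup first:

# Manually re-add the export on the node that lost it.
CONF=/var/run/gluster/shared_storage/nfs-ganesha/exports/export.testvol.conf
dbus-send --print-reply --system \
    --dest=org.ganesha.nfsd /org/ganesha/nfsd/ExportMgr \
    org.ganesha.nfsd.exportmgr.AddExport \
    string:"$CONF" string:'EXPORT(Path=/testvol)'
showmount -e localhost   # confirm /testvol is listed again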
(In reply to Atin Mukherjee from comment #16)
> ...
> Now coming to the exporting/unexporting part: does this happen in the
> staging phase? If yes, that's the issue. Any commits (changes to the state
> of the volume) should be done in the commit phase, whereas validation should
> take place in the staging phase only.

The reason it was done that way is that unexporting a volume via ganesha is a two-step process:

1) Unexport the volume from the nfs-ganesha servers on all the nodes.
2) Only after that, remove the volume entry from the shared_storage.

(@Jiffin, please correct me) - But from what I know, the only way we can perform these two steps one after the other using glusterd is to perform step 1 in the stage phase and step 2 in the commit phase -
https://review.gluster.org/#/q/topic:bug-1355956
https://review.gluster.org/#/c/14908/

Could you please let us know if there is any other way this could be achieved using glusterd? We can then plan to fix it in the next release if possible.
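For context, the export entry referred to in step 2 is the per-volume config file kept on the shared storage; a quick way to inspect the Export_Id recorded there (the path is an assumption based on the standard shared-storage layout and may differ on this setup):

# The unexport step relies on the Export_Id stored in this file.
CONF=/var/run/gluster/shared_storage/nfs-ganesha/exports/export.testvol.conf
grep -E "Export_Id|Path|Volume" "$CONF"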
(In reply to Soumya Koduri from comment #17)
> ...
> The reason it was done that way is that unexporting a volume via ganesha is
> a two-step process:
>
> 1) Unexport the volume from the nfs-ganesha servers on all the nodes.
> 2) Only after that, remove the volume entry from the shared_storage.
>
> (@Jiffin, please correct me) - But from what I know, the only way we can
> perform these two steps one after the other using glusterd is to perform
> step 1 in the stage phase and step 2 in the commit phase -
> https://review.gluster.org/#/q/topic:bug-1355956
> https://review.gluster.org/#/c/14908/

Soumya, you are right; that is the current behaviour. We rely on the export file for the export id of the volume. We could do it another way (not in this release): query the details of the export entry from ganesha using a DBus message and then unexport the entry, instead of relying on the export file.

> Could you please let us know if there is any other way this could be
> achieved using glusterd? We can then plan to fix it in the next release if
> possible.
Thanks, Atin.

(In reply to Jiffin from comment #18)
> Soumya, you are right; that is the current behaviour. We rely on the export
> file for the export id of the volume. We could do it another way (not in
> this release): query the details of the export entry from ganesha using a
> DBus message and then unexport the entry, instead of relying on the export
> file.

Okay. For just a volume export/unexport that seems plausible, but there could be cases where the volume path is changed to a sub-directory path, or where multiple sub-directories are exported. We need to consider all those cases and check which approach fits best. But yes, this is something that can't be addressed at this point in the release.