Bug 1415630 - [Stress] : FS became read only on one of the Ganesha mounts on Stress setup.
Summary: [Stress] : FS became read only on one of the Ganesha mounts on Stress setup.
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: nfs-ganesha
Version: rhgs-3.2
Hardware: x86_64
OS: Linux
unspecified
high
Target Milestone: ---
: ---
Assignee: Kaleb KEITHLEY
QA Contact: Ambarish
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2017-01-23 10:00 UTC by Ambarish
Modified: 2017-01-25 13:24 UTC (History)
11 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-01-25 13:09:50 UTC
Target Upstream Version:


Attachments (Terms of Use)

Description Ambarish 2017-01-23 10:00:04 UTC
Description of problem:
------------------------

On Stress Setup,I was doing MTSH + lots of writes on EC over Ganesha mounts(v3/v4).

Almost after 10 hours of workload,one one of the clients,all new writes started to fail and the FS became read only all of a sudden.

[root@gqac025 /]# mount -t nfs -o vers=4 192.168.97.121://testvol E

[root@gqac025 /]# cd E

[root@gqac025 E]# touch G
touch: cannot touch ‘G’: Read-only file system
[root@gqac025 E]# 

Mount shows RW access :

192.168.97.121://testvol on /E type nfs4 (rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.97.185,local_lock=none,addr=192.168.97.121)



Version-Release number of selected component (if applicable):
-------------------------------------------------------------

nfs-ganesha-2.4.1-6.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-12.el7rhgs.x86_64


How reproducible:
-----------------

1/1


Actual results:
---------------

FS became read only.

Expected results:
----------------

Additional info:
---------------

 
Status of volume: testvol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gqas009.sbu.lab.eng.bos.redhat.com:/b
ricks1/brick1                               49158     0          Y       29725
Brick gqas010.sbu.lab.eng.bos.redhat.com:/b
ricks1/brick                                49164     0          Y       24750
Brick gqas014.sbu.lab.eng.bos.redhat.com:/b
ricks1/brick                                49164     0          Y       24867
Brick gqas015.sbu.lab.eng.bos.redhat.com:/b
ricks1/brick                                49164     0          Y       25931
Brick gqas009.sbu.lab.eng.bos.redhat.com:/b
ricks3/brick                                49165     0          Y       25201
Brick gqas010.sbu.lab.eng.bos.redhat.com:/b
ricks3/brick                                49165     0          Y       24769
Brick gqas009.sbu.lab.eng.bos.redhat.com:/b
ricks2/brick1                               49153     0          Y       29771
Brick gqas010.sbu.lab.eng.bos.redhat.com:/b
ricks2/brick                                49166     0          Y       24788
Brick gqas014.sbu.lab.eng.bos.redhat.com:/b
ricks2/brick                                49165     0          Y       24886
Brick gqas015.sbu.lab.eng.bos.redhat.com:/b
ricks2/brick                                49165     0          Y       25950
Brick gqas014.sbu.lab.eng.bos.redhat.com:/b
ricks3/brick                                49166     0          Y       24905
Brick gqas015.sbu.lab.eng.bos.redhat.com:/b
ricks3/brick                                49166     0          Y       25969
Brick gqas009.sbu.lab.eng.bos.redhat.com:/b
ricks4/brick1                               49154     0          Y       29827
Brick gqas010.sbu.lab.eng.bos.redhat.com:/b
ricks4/brick                                49167     0          Y       24807
Brick gqas015.sbu.lab.eng.bos.redhat.com:/b
ricks4/brick                                49167     0          Y       25988
Brick gqas014.sbu.lab.eng.bos.redhat.com:/b
ricks4/brick                                49167     0          Y       24924
Brick gqas009.sbu.lab.eng.bos.redhat.com:/b
ricks5/brick                                49168     0          Y       25258
Brick gqas010.sbu.lab.eng.bos.redhat.com:/b
ricks5/brick                                49168     0          Y       24826
Brick gqas009.sbu.lab.eng.bos.redhat.com:/b
ricks6/brick                                49169     0          Y       25277
Brick gqas010.sbu.lab.eng.bos.redhat.com:/b
ricks6/brick                                49169     0          Y       24845
Brick gqas015.sbu.lab.eng.bos.redhat.com:/b
ricks6/brick                                49168     0          Y       26007
Brick gqas014.sbu.lab.eng.bos.redhat.com:/b
ricks6/brick                                49168     0          Y       24943
Brick gqas014.sbu.lab.eng.bos.redhat.com:/b
ricks5/brick                                49169     0          Y       24962
Brick gqas015.sbu.lab.eng.bos.redhat.com:/b
ricks5/brick                                49169     0          Y       26026
Brick gqas009.sbu.lab.eng.bos.redhat.com:/b
ricks7/brick1                               49155     0          Y       29909
Brick gqas010.sbu.lab.eng.bos.redhat.com:/b
ricks7/brick                                49170     0          Y       24864
Brick gqas015.sbu.lab.eng.bos.redhat.com:/b
ricks7/brick                                49170     0          Y       26045
Brick gqas014.sbu.lab.eng.bos.redhat.com:/b
ricks7/brick                                49170     0          Y       24981
Brick gqas010.sbu.lab.eng.bos.redhat.com:/b
ricks8/brick                                49171     0          Y       24883
Brick gqas009.sbu.lab.eng.bos.redhat.com:/b
ricks8/brick                                49171     0          Y       25315
Brick gqas009.sbu.lab.eng.bos.redhat.com:/b
ricks9/brick                                49172     0          Y       25336
Brick gqas010.sbu.lab.eng.bos.redhat.com:/b
ricks9/brick                                49172     0          Y       24902
Brick gqas015.sbu.lab.eng.bos.redhat.com:/b
ricks9/brick                                49171     0          Y       26064
Brick gqas014.sbu.lab.eng.bos.redhat.com:/b
ricks9/brick                                49171     0          Y       25000
Brick gqas014.sbu.lab.eng.bos.redhat.com:/b
ricks8/brick                                49172     0          Y       25019
Brick gqas015.sbu.lab.eng.bos.redhat.com:/b
ricks8/brick                                49172     0          Y       26083
Brick gqas009.sbu.lab.eng.bos.redhat.com:/b
ricks10/brick                               49173     0          Y       25355
Brick gqas010.sbu.lab.eng.bos.redhat.com:/b
ricks10/brick                               49173     0          Y       24921
Brick gqas015.sbu.lab.eng.bos.redhat.com:/b
ricks10/brick                               49173     0          Y       26102
Brick gqas014.sbu.lab.eng.bos.redhat.com:/b
ricks10/brick                               49173     0          Y       25038
Brick gqas009.sbu.lab.eng.bos.redhat.com:/b
ricks11/brick1                              49156     0          Y       30009
Brick gqas010.sbu.lab.eng.bos.redhat.com:/b
ricks11/brick                               49174     0          Y       24940
Brick gqas009.sbu.lab.eng.bos.redhat.com:/b
ricks12/brick                               49175     0          Y       25393
Brick gqas010.sbu.lab.eng.bos.redhat.com:/b
ricks12/brick                               49175     0          Y       24959
Brick gqas015.sbu.lab.eng.bos.redhat.com:/b
ricks12/brick                               49174     0          Y       26121
Brick gqas014.sbu.lab.eng.bos.redhat.com:/b
ricks12/brick                               49174     0          Y       25057
Brick gqas014.sbu.lab.eng.bos.redhat.com:/b
ricks11/brick                               49175     0          Y       25076
Brick gqas015.sbu.lab.eng.bos.redhat.com:/b
ricks11/brick                               49175     0          Y       26140
Self-heal Daemon on localhost               N/A       N/A        Y       17960
Self-heal Daemon on gqas010.sbu.lab.eng.bos
.redhat.com                                 N/A       N/A        Y       13756
Self-heal Daemon on gqas015.sbu.lab.eng.bos
.redhat.com                                 N/A       N/A        Y       17415
Self-heal Daemon on gqas014.sbu.lab.eng.bos
.redhat.com                                 N/A       N/A        Y       17200
 
Task Status of Volume testvol
------------------------------------------------------------------------------
Task                 : Rebalance           
ID                   : 60d3e3e4-661c-4520-9f5f-482d95d81a82
Status               : in progress         
 
[root@gqas009 ~]#

Comment 3 Ambarish 2017-01-23 10:00:57 UTC
Setup is in same state in case someone wants to take a look.

Comment 4 Ambarish 2017-01-23 10:02:29 UTC
The application side is impacted again with this.
Setting blocker flag.

Comment 6 Soumya Koduri 2017-01-23 12:50:20 UTC
On server node - 
[root@gqas009 ~]# showmount -e
Export list for gqas009.sbu.lab.eng.bos.redhat.com:
/replicate (everyone)

On the client - 

192.168.97.121://testvol on /E type nfs4 (rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.97.185,local_lock=none,addr=192.168.97.121)


The volume which was mounted on the client-side is not exported. Hence because of NFSv4 pseudoFS behaviour, mount is still accessible but with read-only access. 

NFS-Ganesha servers on other nodes seem to be still  exporting testvol.

[root@gqas014 ~]# showmount -e
Export list for gqas014.sbu.lab.eng.bos.redhat.com:
/testvol   (everyone)

Comment 7 Ambarish 2017-01-23 13:25:55 UTC
Soumya,

What would cause an unexport on only one node?

Is it coz of the same rpc issue/network disconnects( as in https://bugzilla.redhat.com/show_bug.cgi?id=1411219) that I am hitting here?

Comment 8 Soumya Koduri 2017-01-23 16:58:16 UTC
Could be. Not sure why the volume got unexported. Its not admin intervention for sure as there is still UP_THREAD running for that volume.


(gdb) f 1
#1  0x00007f0275198add in GLUSTERFSAL_UP_Thread (Arg=0x7f0270006e50) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/FSAL_GLUSTER/fsal_up.c:152
152			rc = glfs_h_poll_upcall(glfsexport->gl_fs, &cbk);
(gdb) p *glfsexport
$5 = {gl_fs = 0x7f0270007d40, mount_path = 0x7f0270002c90 "/testvol", export_path = 0x7f0270006dc0 "/", saveduid = 0, savedgid = 0, export = {exports = {
      next = 0x7f02753a13a0 <GlusterFS+128>, prev = 0x7f0270007a50}, fsal = 0x7f02753a1390 <GlusterFS+112>, up_ops = 0x7f02700d23a0, exp_ops = {
      get_name = 0x7f03020ef5d0 <get_name>, unexport = 0x7f03020ef5e0 <export_unexport>, release = 0x7f0275192680 <export_release>, 
      lookup_path = 0x7f02751927e0 <lookup_path>, lookup_junction = 0x7f03020ef600 <lookup_junction>, extract_handle = 0x7f0275192600 <extract_handle>, 
      create_handle = 0x7f0275192440 <create_handle>, get_fs_dynamic_info = 0x7f02751922c0 <get_dynamic_info>, fs_supports = 0x7f02751922a0 <fs_supports>, 
      fs_maxfilesize = 0x7f0275192280 <fs_maxfilesize>, fs_maxread = 0x7f0275192260 <fs_maxread>, fs_maxwrite = 0x7f0275192240 <fs_maxwrite>, 
      fs_maxlink = 0x7f0275192220 <fs_maxlink>, fs_maxnamelen = 0x7f0275192200 <fs_maxnamelen>, fs_maxpathlen = 0x7f02751921e0 <fs_maxpathlen>, 
      fs_lease_time = 0x7f02751921c0 <fs_lease_time>, fs_acl_support = 0x7f02751921a0 <fs_acl_support>, fs_supported_attrs = 0x7f0275192170 <fs_supported_attrs>, 
      fs_umask = 0x7f0275192150 <fs_umask>, fs_xattr_access_rights = 0x7f0275192130 <fs_xattr_access_rights>, check_quota = 0x7f03020ef6d0 <check_quota>, 
      get_quota = 0x7f03020ef9f0 <get_quota>, set_quota = 0x7f03020ef990 <set_quota>, getdevicelist = 0x7f03020ef6e0 <getdevicelist>, 
      fs_layouttypes = 0x7f03020ef6f0 <fs_layouttypes>, fs_layout_blocksize = 0x7f03020ef700 <fs_layout_blocksize>, 
      fs_maximum_segments = 0x7f03020ef710 <fs_maximum_segments>, fs_loc_body_size = 0x7f03020ef720 <fs_loc_body_size>, 
      get_write_verifier = 0x7f03020ef860 <global_verifier>, alloc_state = 0x7f0275192af0 <glusterfs_alloc_state>, free_state = 0x7f03020f1430 <free_state>}, 
    sub_export = 0x0, super_export = 0x7f02700d2270}, acl_enable = false, pnfs_ds_enabled = true, pnfs_mds_enabled = false, destroy_mode = 0 '\000', 
  up_thread = 139647710433920}
(gdb) c
Continuing.



22/01/2017 00:03:07 : epoch 98550000 : gqas009.sbu.lab.eng.bos.redhat.com : ganesha.nfsd-26220[dbus_heartbeat] glusterfs_create_export :FSAL :EVENT :Volume testvol exported at : '/'
22/01/2017 04:42:31 : epoch 98550000 : gqas009.sbu.lab.eng.bos.redhat.com : ganesha.nfsd-26220[work-180] posix2fsal_error :FSAL :CRIT :Mapping 77(default) to ERR_FSAL_SERVERFAULT
22/01/2017 04:42:31 : epoch 98550000 : gqas009.sbu.lab.eng.bos.redhat.com : ganesha.nfsd-26220[work-180] glusterfs_close_my_fd :FSAL :CRIT :Error : close returns with File descriptor in bad state
22/01/2017 04:42:31 : epoch 98550000 : gqas009.sbu.lab.eng.bos.redhat.com : ganesha.nfsd-26220[work-180] mdcache_lru_clean :INODE LRU :CRIT :Error closing file in cleanup: Undefined server error
22/01/2017 05:13:11 : epoch 98550000 : gqas009.sbu.lab.eng.bos.redhat.com : ganesha.nfsd-26220[dbus_heartbeat] glusterfs_create_export :FSAL :EVENT :Volume replicate exported at : '/'




[2017-01-23 03:09:18.811044] W [MSGID: 122019] [ec-helpers.c:354:ec_loc_gfid_check] 10-testvol-disperse-7: Mismatching GFID's in loc
[2017-01-23 03:09:18.811110] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 10-testvol-dht: Found anomalies in /file_srcdir/gqac010.sbu.lab.eng.bos.redhat.com/thrd_04/d_002/d_001/d_008 (gfid = 6d71d7bc-00b7-4d4e-ac82-347519ea31e6). Holes=1 overlaps=0
[2017-01-23 03:09:18.811155] W [MSGID: 109005] [dht-selfheal.c:2116:dht_selfheal_directory] 10-testvol-dht: Directory selfheal failed : 1 subvolumes have unrecoverable errors. path = /file_srcdir/gqac010.sbu.lab.eng.bos.redhat.com/thrd_04/d_002/d_001/d_008, gfid = 6d71d7bc-00b7-4d4e-ac82-347519ea31e6
[2017-01-23 03:09:18.822643] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 10-testvol-dht: no subvolume


[root@gqas009 ~]# grep -i "2017-01-23 05:" /var/log/ganesha-gfapi.log | grep -i "Not Connected"
[2017-01-23 05:57:11.799096] E [MSGID: 114031] [client-rpc-fops.c:1601:client3_3_finodelk_cbk] 0-replicate-client-1: remote operation failed [Transport endpoint is not connected]
[2017-01-23 05:57:11.805384] I [socket.c:3439:socket_submit_request] 0-replicate-client-1: not connected (priv->connected = 0)
The message "E [MSGID: 114031] [client-rpc-fops.c:1601:client3_3_finodelk_cbk] 0-replicate-client-1: remote operation failed [Transport endpoint is not connected]" repeated 231 times between [2017-01-23 05:57:11.799096] and [2017-01-23 05:57:11.862054]
[root@gqas009 ~]# 
[root@gqas009 ~]# grep -E "[2017-01-23 05:5" /var/log/ganesha-gfapi.log | grep -i "Not Connected"^C
[root@gqas009 ~]# 
[root@gqas009 ~]# 
[root@gqas009 ~]# vim /var/log/ganesha-gfapi.log 
[root@gqas009 ~]# 
[root@gqas009 ~]# 
[root@gqas009 ~]# 
[root@gqas009 ~]# grep -i "2017-01-23 05:" /var/log/ganesha-gfapi.log | grep -i "disconnect"
[2017-01-23 05:43:28.107190] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 10-testvol-client-15: disconnected from testvol-client-15. Client process will keep trying to connect to glusterd until brick's port is available
[2017-01-23 05:43:28.107256] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 10-testvol-client-14: disconnected from testvol-client-14. Client process will keep trying to connect to glusterd until brick's port is available
[2017-01-23 05:43:28.107272] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 10-testvol-client-13: disconnected from testvol-client-13. Client process will keep trying to connect to glusterd until brick's port is available
[2017-01-23 05:43:28.107352] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 10-testvol-client-17: disconnected from testvol-client-17. Client process will keep trying to connect to glusterd until brick's port is available
[2017-01-23 05:43:31.178745] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 10-testvol-client-39: disconnected from testvol-client-39. Client process will keep trying to connect to glusterd until brick's port is available
[2017-01-23 05:43:31.178788] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 10-testvol-client-38: disconnected from testvol-client-38. Client process will keep trying to connect to glusterd until brick's port is available
[2017-01-23 05:43:31.178824] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 10-testvol-client-37: disconnected from testvol-client-37. Client process will keep trying to connect to glusterd until brick's port is available
[2017-01-23 05:43:31.242689] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 10-testvol-client-41: disconnected from testvol-client-41. Client process will keep trying to connect to glusterd until brick's port is available
[2017-01-23 05:43:32.138710] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 10-testvol-client-35: disconnected from testvol-client-35. Client process will keep trying to connect to glusterd until brick's port is available
[2017-01-23 05:43:32.138719] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 10-testvol-client-32: disconnected from testvol-client-32. Client process will keep trying to connect to glusterd until brick's port is available



As you have mentioned there are Disconnect messages for all the sub-volumes. Do you happen to have noted the time at which the I/Os started throwing error? If not it semms difficult to match with the exact error messages in the logs.

Comment 9 Soumya Koduri 2017-01-23 17:03:08 UTC
(In reply to Soumya Koduri from comment #8)
> Could be. Not sure why the volume got unexported. Its not admin intervention
> for sure as there is still UP_THREAD running for that volume.
> 
> 
> (gdb) f 1
> #1  0x00007f0275198add in GLUSTERFSAL_UP_Thread (Arg=0x7f0270006e50) at
> /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/FSAL_GLUSTER/fsal_up.c:152
> 152			rc = glfs_h_poll_upcall(glfsexport->gl_fs, &cbk);
> (gdb) p *glfsexport
> $5 = {gl_fs = 0x7f0270007d40, mount_path = 0x7f0270002c90 "/testvol",
> export_path = 0x7f0270006dc0 "/", saveduid = 0, savedgid = 0, export =
> {exports = {
>       next = 0x7f02753a13a0 <GlusterFS+128>, prev = 0x7f0270007a50}, fsal =
> 0x7f02753a1390 <GlusterFS+112>, up_ops = 0x7f02700d23a0, exp_ops = {
>       get_name = 0x7f03020ef5d0 <get_name>, unexport = 0x7f03020ef5e0
> <export_unexport>, release = 0x7f0275192680 <export_release>, 
>       lookup_path = 0x7f02751927e0 <lookup_path>, lookup_junction =
> 0x7f03020ef600 <lookup_junction>, extract_handle = 0x7f0275192600
> <extract_handle>, 
>       create_handle = 0x7f0275192440 <create_handle>, get_fs_dynamic_info =
> 0x7f02751922c0 <get_dynamic_info>, fs_supports = 0x7f02751922a0
> <fs_supports>, 
>       fs_maxfilesize = 0x7f0275192280 <fs_maxfilesize>, fs_maxread =
> 0x7f0275192260 <fs_maxread>, fs_maxwrite = 0x7f0275192240 <fs_maxwrite>, 
>       fs_maxlink = 0x7f0275192220 <fs_maxlink>, fs_maxnamelen =
> 0x7f0275192200 <fs_maxnamelen>, fs_maxpathlen = 0x7f02751921e0
> <fs_maxpathlen>, 
>       fs_lease_time = 0x7f02751921c0 <fs_lease_time>, fs_acl_support =
> 0x7f02751921a0 <fs_acl_support>, fs_supported_attrs = 0x7f0275192170
> <fs_supported_attrs>, 
>       fs_umask = 0x7f0275192150 <fs_umask>, fs_xattr_access_rights =
> 0x7f0275192130 <fs_xattr_access_rights>, check_quota = 0x7f03020ef6d0
> <check_quota>, 
>       get_quota = 0x7f03020ef9f0 <get_quota>, set_quota = 0x7f03020ef990
> <set_quota>, getdevicelist = 0x7f03020ef6e0 <getdevicelist>, 
>       fs_layouttypes = 0x7f03020ef6f0 <fs_layouttypes>, fs_layout_blocksize
> = 0x7f03020ef700 <fs_layout_blocksize>, 
>       fs_maximum_segments = 0x7f03020ef710 <fs_maximum_segments>,
> fs_loc_body_size = 0x7f03020ef720 <fs_loc_body_size>, 
>       get_write_verifier = 0x7f03020ef860 <global_verifier>, alloc_state =
> 0x7f0275192af0 <glusterfs_alloc_state>, free_state = 0x7f03020f1430
> <free_state>}, 
>     sub_export = 0x0, super_export = 0x7f02700d2270}, acl_enable = false,
> pnfs_ds_enabled = true, pnfs_mds_enabled = false, destroy_mode = 0 '\000', 
>   up_thread = 139647710433920}
> (gdb) c
> Continuing.
> 
> 

[root@gqas009 ~]# showmount -e
Export list for gqas009.sbu.lab.eng.bos.redhat.com:
/replicate (everyone)

Dang/Frank/Matt,

Could you please comment on the possible reasons for a volume entry to be removed from the exports list without any admin intervention (i.e without unexport operation being done explicitly)

Comment 10 Daniel Gryniewicz 2017-01-23 18:07:54 UTC
Unexport only seems to happen on shudown or as a result of a DBUS operation.

Comment 11 Soumya Koduri 2017-01-23 18:47:34 UTC
(In reply to Daniel Gryniewicz from comment #10)
> Unexport only seems to happen on shudown or as a result of a DBUS operation.

Hmm Okay.. But unexport should be calling FSAL's export release() right? In that case we shouldn't be having UP_THREAD still running for that unexported entry.

Comment 12 Soumya Koduri 2017-01-23 19:36:45 UTC
There seem to volume stop performed from node 009.

@Ambarish,

Request you to update the gluster operations (any and in the sequence) as well you may perform while running these tests in the bug. It took almost several hours before we found out that the volume was tried to be stopped at one point.


 233  gluster v set testvol cluster.shd-wait-qlength: 1000000000000000000000000000000000
  234  gluster v set testvol cluster.shd-wait-qlength 1000000000000000000000000000000000
  235  gluster v set testvol cluster.shd-wait-qlength 1000000000000000000000000000
  236  gluster v set testvol cluster.shd-wait-qlength 10000000000000
  237  gluster v stop testvol
  238  gluster v rebalance testvol status
  271  gluster v statedump testvol all


Since the volume stop operation shall be performed on the local host first before executing on peer nodes, as part of stop, the volume got unexported. But on other nodes, volume 'stop' seem to have failed with below errors-

[2017-01-23 05:50:28.822386] I [MSGID: 106474] [glusterd-ganesha.c:456:check_host_list] 0-management: ganesha host found Hostname is gqas009.sbu.lab.eng.bos.redhat.com
[2017-01-23 05:50:29.786886] W [MSGID: 106285] [glusterd-volume-ops.c:1751:glusterd_op_stage_stop_volume] 0-management: rebalance session is in progress for the volume 'testvol'
[2017-01-23 05:50:29.786961] E [MSGID: 106301] [glusterd-syncop.c:1302:gd_stage_op_phase] 0-management: Staging of operation 'Volume Stop' failed on localhost : rebalance session is in progress for the volume 'testvol'

Which the reason why other nodes still seem to be exporting the volume. There doesn't seem to be any issue with how nfs-ganesha is behaving. 

Request @Atin to confirm if this is expected behaviour of glusterd or if it needs to execute vol stop on other nodes post rebalance operation is done.

Current rebalance status shows -

[root@gqas009 ~]# gluster v rebalance testvol status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost               12        11.5KB       3082934             0             4            completed        5:27:37
      gqas015.sbu.lab.eng.bos.redhat.com                0        0Bytes             0             7             0               failed        0:15:38
      gqas014.sbu.lab.eng.bos.redhat.com                0        0Bytes             0             0             0            completed        3:25:34
      gqas010.sbu.lab.eng.bos.redhat.com                0        0Bytes             0             7             0               failed        0:13:18
volume rebalance: testvol: success
[root@gqas009 ~]#

Comment 13 Soumya Koduri 2017-01-23 19:42:52 UTC
> 
> Since the volume stop operation shall be performed on the local host first
> before executing on peer nodes, as part of stop, the volume got unexported.
> But on other nodes, volume 'stop' seem to have failed with below errors-
> 
> [2017-01-23 05:50:28.822386] I [MSGID: 106474]
> [glusterd-ganesha.c:456:check_host_list] 0-management: ganesha host found
> Hostname is gqas009.sbu.lab.eng.bos.redhat.com
> [2017-01-23 05:50:29.786886] W [MSGID: 106285]
> [glusterd-volume-ops.c:1751:glusterd_op_stage_stop_volume] 0-management:
> rebalance session is in progress for the volume 'testvol'
> [2017-01-23 05:50:29.786961] E [MSGID: 106301]
> [glusterd-syncop.c:1302:gd_stage_op_phase] 0-management: Staging of
> operation 'Volume Stop' failed on localhost : rebalance session is in
> progress for the volume 'testvol'
> 


A correction here. Vol stop seem to have failed on localhost itself but after the volume in unexported. The check for rebalance is post unexport. I am not sure what should be the right order. We need wider audience and discussion to decide the order (if at all possible) in which operations need to be performed for volume stop (or any other vol operation)

Comment 14 Ambarish 2017-01-24 03:31:54 UTC
Soumya,

My sincere apologies for this.

The vol stop was done as a part of the setup,not during the tests,so I din't mention about those in the bug..But I should have known better :/

Atin/Soumya,
Ideally,CLI should have thrown error right?If I try to stop a volume when rebalance is in progress?

And if vol stop itself failed on the localhost,why did the volume get unexported on that node in the first place?

Comment 15 Atin Mukherjee 2017-01-24 04:38:21 UTC
(In reply to Soumya Koduri from comment #12)
> There seem to volume stop performed from node 009.
> 
> @Ambarish,
> 
> Request you to update the gluster operations (any and in the sequence) as
> well you may perform while running these tests in the bug. It took almost
> several hours before we found out that the volume was tried to be stopped at
> one point.
> 
> 
>  233  gluster v set testvol cluster.shd-wait-qlength:
> 1000000000000000000000000000000000
>   234  gluster v set testvol cluster.shd-wait-qlength
> 1000000000000000000000000000000000
>   235  gluster v set testvol cluster.shd-wait-qlength
> 1000000000000000000000000000
>   236  gluster v set testvol cluster.shd-wait-qlength 10000000000000
>   237  gluster v stop testvol
>   238  gluster v rebalance testvol status
>   271  gluster v statedump testvol all
> 
> 
> Since the volume stop operation shall be performed on the local host first
> before executing on peer nodes, as part of stop, the volume got unexported.
> But on other nodes, volume 'stop' seem to have failed with below errors-
> 
> [2017-01-23 05:50:28.822386] I [MSGID: 106474]
> [glusterd-ganesha.c:456:check_host_list] 0-management: ganesha host found
> Hostname is gqas009.sbu.lab.eng.bos.redhat.com
> [2017-01-23 05:50:29.786886] W [MSGID: 106285]
> [glusterd-volume-ops.c:1751:glusterd_op_stage_stop_volume] 0-management:
> rebalance session is in progress for the volume 'testvol'
> [2017-01-23 05:50:29.786961] E [MSGID: 106301]
> [glusterd-syncop.c:1302:gd_stage_op_phase] 0-management: Staging of
> operation 'Volume Stop' failed on localhost : rebalance session is in
> progress for the volume 'testvol'
> 
> Which the reason why other nodes still seem to be exporting the volume.
> There doesn't seem to be any issue with how nfs-ganesha is behaving. 
> 
> Request @Atin to confirm if this is expected behaviour of glusterd or if it
> needs to execute vol stop on other nodes post rebalance operation is done.

volume stop command should fail here given the staging operation in one of the node has failed. We wouldn't be even reaching to commit phase where the volume status is changed to stopped, so the volume should still be in started state. Now coming to the exporting/unexporting part, does this happen at the staging part? If yes, that's the issue, any commits (change to the state of the volume) should be done at the commit where as the validation should take place at the staging phase only.

> 
> Current rebalance status shows -
> 
> [root@gqas009 ~]# gluster v rebalance testvol status
>                                     Node Rebalanced-files          size     
> scanned      failures       skipped               status  run time in h:m:s
>                                ---------      -----------   -----------  
> -----------   -----------   -----------         ------------    
> --------------
>                                localhost               12        11.5KB     
> 3082934             0             4            completed        5:27:37
>       gqas015.sbu.lab.eng.bos.redhat.com                0        0Bytes     
> 0             7             0               failed        0:15:38
>       gqas014.sbu.lab.eng.bos.redhat.com                0        0Bytes     
> 0             0             0            completed        3:25:34
>       gqas010.sbu.lab.eng.bos.redhat.com                0        0Bytes     
> 0             7             0               failed        0:13:18
> volume rebalance: testvol: success
> [root@gqas009 ~]#

Comment 16 Atin Mukherjee 2017-01-24 04:40:14 UTC
(In reply to Ambarish from comment #14)
> Soumya,
> 
> My sincere apologies for this.
> 
> The vol stop was done as a part of the setup,not during the tests,so I din't
> mention about those in the bug..But I should have known better :/
> 
> Atin/Soumya,
> Ideally,CLI should have thrown error right?If I try to stop a volume when
> rebalance is in progress?

Yes, CLI should fail as I mentioned in comment 15.

> 
> And if vol stop itself failed on the localhost,why did the volume get
> unexported on that node in the first place?

Please refer to my earlier comment, this is a flaw on how the ganesha (un)export functionality has been coded. Given this has been done at the staging and GlusterD doesn't have any rollback mechanism, this inconsistency can be seen.

Comment 17 Soumya Koduri 2017-01-24 07:29:18 UTC
(In reply to Atin Mukherjee from comment #16)
> (In reply to Ambarish from comment #14)
> > Soumya,
> > 
> > My sincere apologies for this.
> > 
> > The vol stop was done as a part of the setup,not during the tests,so I din't
> > mention about those in the bug..But I should have known better :/
> > 
> > Atin/Soumya,
> > Ideally,CLI should have thrown error right?If I try to stop a volume when
> > rebalance is in progress?
> 
> Yes, CLI should fail as I mentioned in comment 15.
> 
> > 
> > And if vol stop itself failed on the localhost,why did the volume get
> > unexported on that node in the first place?
> 
> Please refer to my earlier comment, this is a flaw on how the ganesha
> (un)export functionality has been coded. Given this has been done at the
> staging and GlusterD doesn't have any rollback mechanism, this inconsistency
> can be seen.


The reason it was done that way is because unexporting volume via ganesha is a 2-step process. 

1) Unexport the volume from the nfs-ganesha servers on all the nodes
2) Only after that remove the volume entry from the shared_storage.

(@Jiffin, please correct me) -
But from what I know of, the only way we can perform these two steps one after the other using glusterd is to perform step1 in stage phase and 2nd one in commit phase -
https://review.gluster.org/#/q/topic:bug-1355956
https://review.gluster.org/#/c/14908/

Could you please let us know if there is any other way this could be achieved using glusterd? We can then plan to fix it in the next release if possible.

Comment 18 Jiffin 2017-01-24 07:36:50 UTC
(In reply to Soumya Koduri from comment #17)
> (In reply to Atin Mukherjee from comment #16)
> > (In reply to Ambarish from comment #14)
> > > Soumya,
> > > 
> > > My sincere apologies for this.
> > > 
> > > The vol stop was done as a part of the setup,not during the tests,so I din't
> > > mention about those in the bug..But I should have known better :/
> > > 
> > > Atin/Soumya,
> > > Ideally,CLI should have thrown error right?If I try to stop a volume when
> > > rebalance is in progress?
> > 
> > Yes, CLI should fail as I mentioned in comment 15.
> > 
> > > 
> > > And if vol stop itself failed on the localhost,why did the volume get
> > > unexported on that node in the first place?
> > 
> > Please refer to my earlier comment, this is a flaw on how the ganesha
> > (un)export functionality has been coded. Given this has been done at the
> > staging and GlusterD doesn't have any rollback mechanism, this inconsistency
> > can be seen.
> 
> 
> The reason it was done that way is because unexporting volume via ganesha is
> a 2-step process. 
> 
> 1) Unexport the volume from the nfs-ganesha servers on all the nodes
> 2) Only after that remove the volume entry from the shared_storage.
> 
> (@Jiffin, please correct me) -
> But from what I know of, the only way we can perform these two steps one
> after the other using glusterd is to perform step1 in stage phase and 2nd
> one in commit phase -
> https://review.gluster.org/#/q/topic:bug-1355956
> https://review.gluster.org/#/c/14908/

Soumya you are right. Current behavior is this. We rely on export file for the export id of the volume. We can do it another way (not in this release).
Check the details of export entry with ganesha using dbus message and then unexport the entry instead of relying export file

> Could you please let us know if there is any other way this could be
> achieved using glusterd? We can then plan to fix it in the next release if
> possible.

Comment 20 Soumya Koduri 2017-01-24 07:52:13 UTC
Thanks Atin.

(In reply to Jiffin from comment #18)

> 
> Soumya you are right. Current behavior is this. We rely on export file for
> the export id of the volume. We can do it another way (not in this release).
> Check the details of export entry with ganesha using dbus message and then
> unexport the entry instead of relying export file
> 
Okay. For just a volume export/unexport, that seems plausible but there could be cases where in volume path can be changed to sub-dir path or multiple sub-dirs could be exported. We need to consider all those cases and check which approach fits in the most. But yes this is something which we cant be addressed at this point of the release.


Note You need to log in before you can comment on or make changes to this bug.