Bug 1415630
| Summary: | [Stress] : FS became read only on one of the Ganesha mounts on Stress setup. | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Ambarish <asoman> |
| Component: | nfs-ganesha | Assignee: | Kaleb KEITHLEY <kkeithle> |
| Status: | CLOSED NOTABUG | QA Contact: | Ambarish <asoman> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | rhgs-3.2 | CC: | amukherj, asoman, bturner, dang, ffilz, jthottan, mbenjamin, rhinduja, rhs-bugs, skoduri, storage-qa-internal |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2017-01-25 13:09:50 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Ambarish
2017-01-23 10:00:04 UTC
Setup is in same state in case someone wants to take a look. The application side is impacted again with this. Setting blocker flag. On server node - [root@gqas009 ~]# showmount -e Export list for gqas009.sbu.lab.eng.bos.redhat.com: /replicate (everyone) On the client - 192.168.97.121://testvol on /E type nfs4 (rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.97.185,local_lock=none,addr=192.168.97.121) The volume which was mounted on the client-side is not exported. Hence because of NFSv4 pseudoFS behaviour, mount is still accessible but with read-only access. NFS-Ganesha servers on other nodes seem to be still exporting testvol. [root@gqas014 ~]# showmount -e Export list for gqas014.sbu.lab.eng.bos.redhat.com: /testvol (everyone) Soumya, What would cause an unexport on only one node? Is it coz of the same rpc issue/network disconnects( as in https://bugzilla.redhat.com/show_bug.cgi?id=1411219) that I am hitting here? Could be. Not sure why the volume got unexported. Its not admin intervention for sure as there is still UP_THREAD running for that volume.
(gdb) f 1
#1 0x00007f0275198add in GLUSTERFSAL_UP_Thread (Arg=0x7f0270006e50) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/FSAL_GLUSTER/fsal_up.c:152
152 rc = glfs_h_poll_upcall(glfsexport->gl_fs, &cbk);
(gdb) p *glfsexport
$5 = {gl_fs = 0x7f0270007d40, mount_path = 0x7f0270002c90 "/testvol", export_path = 0x7f0270006dc0 "/", saveduid = 0, savedgid = 0, export = {exports = {
next = 0x7f02753a13a0 <GlusterFS+128>, prev = 0x7f0270007a50}, fsal = 0x7f02753a1390 <GlusterFS+112>, up_ops = 0x7f02700d23a0, exp_ops = {
get_name = 0x7f03020ef5d0 <get_name>, unexport = 0x7f03020ef5e0 <export_unexport>, release = 0x7f0275192680 <export_release>,
lookup_path = 0x7f02751927e0 <lookup_path>, lookup_junction = 0x7f03020ef600 <lookup_junction>, extract_handle = 0x7f0275192600 <extract_handle>,
create_handle = 0x7f0275192440 <create_handle>, get_fs_dynamic_info = 0x7f02751922c0 <get_dynamic_info>, fs_supports = 0x7f02751922a0 <fs_supports>,
fs_maxfilesize = 0x7f0275192280 <fs_maxfilesize>, fs_maxread = 0x7f0275192260 <fs_maxread>, fs_maxwrite = 0x7f0275192240 <fs_maxwrite>,
fs_maxlink = 0x7f0275192220 <fs_maxlink>, fs_maxnamelen = 0x7f0275192200 <fs_maxnamelen>, fs_maxpathlen = 0x7f02751921e0 <fs_maxpathlen>,
fs_lease_time = 0x7f02751921c0 <fs_lease_time>, fs_acl_support = 0x7f02751921a0 <fs_acl_support>, fs_supported_attrs = 0x7f0275192170 <fs_supported_attrs>,
fs_umask = 0x7f0275192150 <fs_umask>, fs_xattr_access_rights = 0x7f0275192130 <fs_xattr_access_rights>, check_quota = 0x7f03020ef6d0 <check_quota>,
get_quota = 0x7f03020ef9f0 <get_quota>, set_quota = 0x7f03020ef990 <set_quota>, getdevicelist = 0x7f03020ef6e0 <getdevicelist>,
fs_layouttypes = 0x7f03020ef6f0 <fs_layouttypes>, fs_layout_blocksize = 0x7f03020ef700 <fs_layout_blocksize>,
fs_maximum_segments = 0x7f03020ef710 <fs_maximum_segments>, fs_loc_body_size = 0x7f03020ef720 <fs_loc_body_size>,
get_write_verifier = 0x7f03020ef860 <global_verifier>, alloc_state = 0x7f0275192af0 <glusterfs_alloc_state>, free_state = 0x7f03020f1430 <free_state>},
sub_export = 0x0, super_export = 0x7f02700d2270}, acl_enable = false, pnfs_ds_enabled = true, pnfs_mds_enabled = false, destroy_mode = 0 '\000',
up_thread = 139647710433920}
(gdb) c
Continuing.
22/01/2017 00:03:07 : epoch 98550000 : gqas009.sbu.lab.eng.bos.redhat.com : ganesha.nfsd-26220[dbus_heartbeat] glusterfs_create_export :FSAL :EVENT :Volume testvol exported at : '/'
22/01/2017 04:42:31 : epoch 98550000 : gqas009.sbu.lab.eng.bos.redhat.com : ganesha.nfsd-26220[work-180] posix2fsal_error :FSAL :CRIT :Mapping 77(default) to ERR_FSAL_SERVERFAULT
22/01/2017 04:42:31 : epoch 98550000 : gqas009.sbu.lab.eng.bos.redhat.com : ganesha.nfsd-26220[work-180] glusterfs_close_my_fd :FSAL :CRIT :Error : close returns with File descriptor in bad state
22/01/2017 04:42:31 : epoch 98550000 : gqas009.sbu.lab.eng.bos.redhat.com : ganesha.nfsd-26220[work-180] mdcache_lru_clean :INODE LRU :CRIT :Error closing file in cleanup: Undefined server error
22/01/2017 05:13:11 : epoch 98550000 : gqas009.sbu.lab.eng.bos.redhat.com : ganesha.nfsd-26220[dbus_heartbeat] glusterfs_create_export :FSAL :EVENT :Volume replicate exported at : '/'
[2017-01-23 03:09:18.811044] W [MSGID: 122019] [ec-helpers.c:354:ec_loc_gfid_check] 10-testvol-disperse-7: Mismatching GFID's in loc
[2017-01-23 03:09:18.811110] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 10-testvol-dht: Found anomalies in /file_srcdir/gqac010.sbu.lab.eng.bos.redhat.com/thrd_04/d_002/d_001/d_008 (gfid = 6d71d7bc-00b7-4d4e-ac82-347519ea31e6). Holes=1 overlaps=0
[2017-01-23 03:09:18.811155] W [MSGID: 109005] [dht-selfheal.c:2116:dht_selfheal_directory] 10-testvol-dht: Directory selfheal failed : 1 subvolumes have unrecoverable errors. path = /file_srcdir/gqac010.sbu.lab.eng.bos.redhat.com/thrd_04/d_002/d_001/d_008, gfid = 6d71d7bc-00b7-4d4e-ac82-347519ea31e6
[2017-01-23 03:09:18.822643] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 10-testvol-dht: no subvolume
[root@gqas009 ~]# grep -i "2017-01-23 05:" /var/log/ganesha-gfapi.log | grep -i "Not Connected"
[2017-01-23 05:57:11.799096] E [MSGID: 114031] [client-rpc-fops.c:1601:client3_3_finodelk_cbk] 0-replicate-client-1: remote operation failed [Transport endpoint is not connected]
[2017-01-23 05:57:11.805384] I [socket.c:3439:socket_submit_request] 0-replicate-client-1: not connected (priv->connected = 0)
The message "E [MSGID: 114031] [client-rpc-fops.c:1601:client3_3_finodelk_cbk] 0-replicate-client-1: remote operation failed [Transport endpoint is not connected]" repeated 231 times between [2017-01-23 05:57:11.799096] and [2017-01-23 05:57:11.862054]
[root@gqas009 ~]#
[root@gqas009 ~]# grep -E "[2017-01-23 05:5" /var/log/ganesha-gfapi.log | grep -i "Not Connected"^C
[root@gqas009 ~]#
[root@gqas009 ~]#
[root@gqas009 ~]# vim /var/log/ganesha-gfapi.log
[root@gqas009 ~]#
[root@gqas009 ~]#
[root@gqas009 ~]#
[root@gqas009 ~]# grep -i "2017-01-23 05:" /var/log/ganesha-gfapi.log | grep -i "disconnect"
[2017-01-23 05:43:28.107190] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 10-testvol-client-15: disconnected from testvol-client-15. Client process will keep trying to connect to glusterd until brick's port is available
[2017-01-23 05:43:28.107256] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 10-testvol-client-14: disconnected from testvol-client-14. Client process will keep trying to connect to glusterd until brick's port is available
[2017-01-23 05:43:28.107272] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 10-testvol-client-13: disconnected from testvol-client-13. Client process will keep trying to connect to glusterd until brick's port is available
[2017-01-23 05:43:28.107352] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 10-testvol-client-17: disconnected from testvol-client-17. Client process will keep trying to connect to glusterd until brick's port is available
[2017-01-23 05:43:31.178745] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 10-testvol-client-39: disconnected from testvol-client-39. Client process will keep trying to connect to glusterd until brick's port is available
[2017-01-23 05:43:31.178788] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 10-testvol-client-38: disconnected from testvol-client-38. Client process will keep trying to connect to glusterd until brick's port is available
[2017-01-23 05:43:31.178824] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 10-testvol-client-37: disconnected from testvol-client-37. Client process will keep trying to connect to glusterd until brick's port is available
[2017-01-23 05:43:31.242689] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 10-testvol-client-41: disconnected from testvol-client-41. Client process will keep trying to connect to glusterd until brick's port is available
[2017-01-23 05:43:32.138710] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 10-testvol-client-35: disconnected from testvol-client-35. Client process will keep trying to connect to glusterd until brick's port is available
[2017-01-23 05:43:32.138719] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 10-testvol-client-32: disconnected from testvol-client-32. Client process will keep trying to connect to glusterd until brick's port is available
As you have mentioned there are Disconnect messages for all the sub-volumes. Do you happen to have noted the time at which the I/Os started throwing error? If not it semms difficult to match with the exact error messages in the logs.
(In reply to Soumya Koduri from comment #8) > Could be. Not sure why the volume got unexported. Its not admin intervention > for sure as there is still UP_THREAD running for that volume. > > > (gdb) f 1 > #1 0x00007f0275198add in GLUSTERFSAL_UP_Thread (Arg=0x7f0270006e50) at > /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/FSAL_GLUSTER/fsal_up.c:152 > 152 rc = glfs_h_poll_upcall(glfsexport->gl_fs, &cbk); > (gdb) p *glfsexport > $5 = {gl_fs = 0x7f0270007d40, mount_path = 0x7f0270002c90 "/testvol", > export_path = 0x7f0270006dc0 "/", saveduid = 0, savedgid = 0, export = > {exports = { > next = 0x7f02753a13a0 <GlusterFS+128>, prev = 0x7f0270007a50}, fsal = > 0x7f02753a1390 <GlusterFS+112>, up_ops = 0x7f02700d23a0, exp_ops = { > get_name = 0x7f03020ef5d0 <get_name>, unexport = 0x7f03020ef5e0 > <export_unexport>, release = 0x7f0275192680 <export_release>, > lookup_path = 0x7f02751927e0 <lookup_path>, lookup_junction = > 0x7f03020ef600 <lookup_junction>, extract_handle = 0x7f0275192600 > <extract_handle>, > create_handle = 0x7f0275192440 <create_handle>, get_fs_dynamic_info = > 0x7f02751922c0 <get_dynamic_info>, fs_supports = 0x7f02751922a0 > <fs_supports>, > fs_maxfilesize = 0x7f0275192280 <fs_maxfilesize>, fs_maxread = > 0x7f0275192260 <fs_maxread>, fs_maxwrite = 0x7f0275192240 <fs_maxwrite>, > fs_maxlink = 0x7f0275192220 <fs_maxlink>, fs_maxnamelen = > 0x7f0275192200 <fs_maxnamelen>, fs_maxpathlen = 0x7f02751921e0 > <fs_maxpathlen>, > fs_lease_time = 0x7f02751921c0 <fs_lease_time>, fs_acl_support = > 0x7f02751921a0 <fs_acl_support>, fs_supported_attrs = 0x7f0275192170 > <fs_supported_attrs>, > fs_umask = 0x7f0275192150 <fs_umask>, fs_xattr_access_rights = > 0x7f0275192130 <fs_xattr_access_rights>, check_quota = 0x7f03020ef6d0 > <check_quota>, > get_quota = 0x7f03020ef9f0 <get_quota>, set_quota = 0x7f03020ef990 > <set_quota>, getdevicelist = 0x7f03020ef6e0 <getdevicelist>, > fs_layouttypes = 0x7f03020ef6f0 <fs_layouttypes>, fs_layout_blocksize > = 0x7f03020ef700 <fs_layout_blocksize>, > fs_maximum_segments = 0x7f03020ef710 <fs_maximum_segments>, > fs_loc_body_size = 0x7f03020ef720 <fs_loc_body_size>, > get_write_verifier = 0x7f03020ef860 <global_verifier>, alloc_state = > 0x7f0275192af0 <glusterfs_alloc_state>, free_state = 0x7f03020f1430 > <free_state>}, > sub_export = 0x0, super_export = 0x7f02700d2270}, acl_enable = false, > pnfs_ds_enabled = true, pnfs_mds_enabled = false, destroy_mode = 0 '\000', > up_thread = 139647710433920} > (gdb) c > Continuing. > > [root@gqas009 ~]# showmount -e Export list for gqas009.sbu.lab.eng.bos.redhat.com: /replicate (everyone) Dang/Frank/Matt, Could you please comment on the possible reasons for a volume entry to be removed from the exports list without any admin intervention (i.e without unexport operation being done explicitly) Unexport only seems to happen on shudown or as a result of a DBUS operation. (In reply to Daniel Gryniewicz from comment #10) > Unexport only seems to happen on shudown or as a result of a DBUS operation. Hmm Okay.. But unexport should be calling FSAL's export release() right? In that case we shouldn't be having UP_THREAD still running for that unexported entry. There seem to volume stop performed from node 009.
@Ambarish,
Request you to update the gluster operations (any and in the sequence) as well you may perform while running these tests in the bug. It took almost several hours before we found out that the volume was tried to be stopped at one point.
233 gluster v set testvol cluster.shd-wait-qlength: 1000000000000000000000000000000000
234 gluster v set testvol cluster.shd-wait-qlength 1000000000000000000000000000000000
235 gluster v set testvol cluster.shd-wait-qlength 1000000000000000000000000000
236 gluster v set testvol cluster.shd-wait-qlength 10000000000000
237 gluster v stop testvol
238 gluster v rebalance testvol status
271 gluster v statedump testvol all
Since the volume stop operation shall be performed on the local host first before executing on peer nodes, as part of stop, the volume got unexported. But on other nodes, volume 'stop' seem to have failed with below errors-
[2017-01-23 05:50:28.822386] I [MSGID: 106474] [glusterd-ganesha.c:456:check_host_list] 0-management: ganesha host found Hostname is gqas009.sbu.lab.eng.bos.redhat.com
[2017-01-23 05:50:29.786886] W [MSGID: 106285] [glusterd-volume-ops.c:1751:glusterd_op_stage_stop_volume] 0-management: rebalance session is in progress for the volume 'testvol'
[2017-01-23 05:50:29.786961] E [MSGID: 106301] [glusterd-syncop.c:1302:gd_stage_op_phase] 0-management: Staging of operation 'Volume Stop' failed on localhost : rebalance session is in progress for the volume 'testvol'
Which the reason why other nodes still seem to be exporting the volume. There doesn't seem to be any issue with how nfs-ganesha is behaving.
Request @Atin to confirm if this is expected behaviour of glusterd or if it needs to execute vol stop on other nodes post rebalance operation is done.
Current rebalance status shows -
[root@gqas009 ~]# gluster v rebalance testvol status
Node Rebalanced-files size scanned failures skipped status run time in h:m:s
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
localhost 12 11.5KB 3082934 0 4 completed 5:27:37
gqas015.sbu.lab.eng.bos.redhat.com 0 0Bytes 0 7 0 failed 0:15:38
gqas014.sbu.lab.eng.bos.redhat.com 0 0Bytes 0 0 0 completed 3:25:34
gqas010.sbu.lab.eng.bos.redhat.com 0 0Bytes 0 7 0 failed 0:13:18
volume rebalance: testvol: success
[root@gqas009 ~]#
>
> Since the volume stop operation shall be performed on the local host first
> before executing on peer nodes, as part of stop, the volume got unexported.
> But on other nodes, volume 'stop' seem to have failed with below errors-
>
> [2017-01-23 05:50:28.822386] I [MSGID: 106474]
> [glusterd-ganesha.c:456:check_host_list] 0-management: ganesha host found
> Hostname is gqas009.sbu.lab.eng.bos.redhat.com
> [2017-01-23 05:50:29.786886] W [MSGID: 106285]
> [glusterd-volume-ops.c:1751:glusterd_op_stage_stop_volume] 0-management:
> rebalance session is in progress for the volume 'testvol'
> [2017-01-23 05:50:29.786961] E [MSGID: 106301]
> [glusterd-syncop.c:1302:gd_stage_op_phase] 0-management: Staging of
> operation 'Volume Stop' failed on localhost : rebalance session is in
> progress for the volume 'testvol'
>
A correction here. Vol stop seem to have failed on localhost itself but after the volume in unexported. The check for rebalance is post unexport. I am not sure what should be the right order. We need wider audience and discussion to decide the order (if at all possible) in which operations need to be performed for volume stop (or any other vol operation)
Soumya, My sincere apologies for this. The vol stop was done as a part of the setup,not during the tests,so I din't mention about those in the bug..But I should have known better :/ Atin/Soumya, Ideally,CLI should have thrown error right?If I try to stop a volume when rebalance is in progress? And if vol stop itself failed on the localhost,why did the volume get unexported on that node in the first place? (In reply to Soumya Koduri from comment #12) > There seem to volume stop performed from node 009. > > @Ambarish, > > Request you to update the gluster operations (any and in the sequence) as > well you may perform while running these tests in the bug. It took almost > several hours before we found out that the volume was tried to be stopped at > one point. > > > 233 gluster v set testvol cluster.shd-wait-qlength: > 1000000000000000000000000000000000 > 234 gluster v set testvol cluster.shd-wait-qlength > 1000000000000000000000000000000000 > 235 gluster v set testvol cluster.shd-wait-qlength > 1000000000000000000000000000 > 236 gluster v set testvol cluster.shd-wait-qlength 10000000000000 > 237 gluster v stop testvol > 238 gluster v rebalance testvol status > 271 gluster v statedump testvol all > > > Since the volume stop operation shall be performed on the local host first > before executing on peer nodes, as part of stop, the volume got unexported. > But on other nodes, volume 'stop' seem to have failed with below errors- > > [2017-01-23 05:50:28.822386] I [MSGID: 106474] > [glusterd-ganesha.c:456:check_host_list] 0-management: ganesha host found > Hostname is gqas009.sbu.lab.eng.bos.redhat.com > [2017-01-23 05:50:29.786886] W [MSGID: 106285] > [glusterd-volume-ops.c:1751:glusterd_op_stage_stop_volume] 0-management: > rebalance session is in progress for the volume 'testvol' > [2017-01-23 05:50:29.786961] E [MSGID: 106301] > [glusterd-syncop.c:1302:gd_stage_op_phase] 0-management: Staging of > operation 'Volume Stop' failed on localhost : rebalance session is in > progress for the volume 'testvol' > > Which the reason why other nodes still seem to be exporting the volume. > There doesn't seem to be any issue with how nfs-ganesha is behaving. > > Request @Atin to confirm if this is expected behaviour of glusterd or if it > needs to execute vol stop on other nodes post rebalance operation is done. volume stop command should fail here given the staging operation in one of the node has failed. We wouldn't be even reaching to commit phase where the volume status is changed to stopped, so the volume should still be in started state. Now coming to the exporting/unexporting part, does this happen at the staging part? If yes, that's the issue, any commits (change to the state of the volume) should be done at the commit where as the validation should take place at the staging phase only. > > Current rebalance status shows - > > [root@gqas009 ~]# gluster v rebalance testvol status > Node Rebalanced-files size > scanned failures skipped status run time in h:m:s > --------- ----------- ----------- > ----------- ----------- ----------- ------------ > -------------- > localhost 12 11.5KB > 3082934 0 4 completed 5:27:37 > gqas015.sbu.lab.eng.bos.redhat.com 0 0Bytes > 0 7 0 failed 0:15:38 > gqas014.sbu.lab.eng.bos.redhat.com 0 0Bytes > 0 0 0 completed 3:25:34 > gqas010.sbu.lab.eng.bos.redhat.com 0 0Bytes > 0 7 0 failed 0:13:18 > volume rebalance: testvol: success > [root@gqas009 ~]# (In reply to Ambarish from comment #14) > Soumya, > > My sincere apologies for this. > > The vol stop was done as a part of the setup,not during the tests,so I din't > mention about those in the bug..But I should have known better :/ > > Atin/Soumya, > Ideally,CLI should have thrown error right?If I try to stop a volume when > rebalance is in progress? Yes, CLI should fail as I mentioned in comment 15. > > And if vol stop itself failed on the localhost,why did the volume get > unexported on that node in the first place? Please refer to my earlier comment, this is a flaw on how the ganesha (un)export functionality has been coded. Given this has been done at the staging and GlusterD doesn't have any rollback mechanism, this inconsistency can be seen. (In reply to Atin Mukherjee from comment #16) > (In reply to Ambarish from comment #14) > > Soumya, > > > > My sincere apologies for this. > > > > The vol stop was done as a part of the setup,not during the tests,so I din't > > mention about those in the bug..But I should have known better :/ > > > > Atin/Soumya, > > Ideally,CLI should have thrown error right?If I try to stop a volume when > > rebalance is in progress? > > Yes, CLI should fail as I mentioned in comment 15. > > > > > And if vol stop itself failed on the localhost,why did the volume get > > unexported on that node in the first place? > > Please refer to my earlier comment, this is a flaw on how the ganesha > (un)export functionality has been coded. Given this has been done at the > staging and GlusterD doesn't have any rollback mechanism, this inconsistency > can be seen. The reason it was done that way is because unexporting volume via ganesha is a 2-step process. 1) Unexport the volume from the nfs-ganesha servers on all the nodes 2) Only after that remove the volume entry from the shared_storage. (@Jiffin, please correct me) - But from what I know of, the only way we can perform these two steps one after the other using glusterd is to perform step1 in stage phase and 2nd one in commit phase - https://review.gluster.org/#/q/topic:bug-1355956 https://review.gluster.org/#/c/14908/ Could you please let us know if there is any other way this could be achieved using glusterd? We can then plan to fix it in the next release if possible. (In reply to Soumya Koduri from comment #17) > (In reply to Atin Mukherjee from comment #16) > > (In reply to Ambarish from comment #14) > > > Soumya, > > > > > > My sincere apologies for this. > > > > > > The vol stop was done as a part of the setup,not during the tests,so I din't > > > mention about those in the bug..But I should have known better :/ > > > > > > Atin/Soumya, > > > Ideally,CLI should have thrown error right?If I try to stop a volume when > > > rebalance is in progress? > > > > Yes, CLI should fail as I mentioned in comment 15. > > > > > > > > And if vol stop itself failed on the localhost,why did the volume get > > > unexported on that node in the first place? > > > > Please refer to my earlier comment, this is a flaw on how the ganesha > > (un)export functionality has been coded. Given this has been done at the > > staging and GlusterD doesn't have any rollback mechanism, this inconsistency > > can be seen. > > > The reason it was done that way is because unexporting volume via ganesha is > a 2-step process. > > 1) Unexport the volume from the nfs-ganesha servers on all the nodes > 2) Only after that remove the volume entry from the shared_storage. > > (@Jiffin, please correct me) - > But from what I know of, the only way we can perform these two steps one > after the other using glusterd is to perform step1 in stage phase and 2nd > one in commit phase - > https://review.gluster.org/#/q/topic:bug-1355956 > https://review.gluster.org/#/c/14908/ Soumya you are right. Current behavior is this. We rely on export file for the export id of the volume. We can do it another way (not in this release). Check the details of export entry with ganesha using dbus message and then unexport the entry instead of relying export file > Could you please let us know if there is any other way this could be > achieved using glusterd? We can then plan to fix it in the next release if > possible. Thanks Atin. (In reply to Jiffin from comment #18) > > Soumya you are right. Current behavior is this. We rely on export file for > the export id of the volume. We can do it another way (not in this release). > Check the details of export entry with ganesha using dbus message and then > unexport the entry instead of relying export file > Okay. For just a volume export/unexport, that seems plausible but there could be cases where in volume path can be changed to sub-dir path or multiple sub-dirs could be exported. We need to consider all those cases and check which approach fits in the most. But yes this is something which we cant be addressed at this point of the release. |