+++ This bug was initially created as a clone of Bug #1636218 +++
+++ This bug was initially created as a clone of Bug #1635050 +++

Description of problem:

With brick multiplexing enabled, a snapshot restore leads to a snapshot brick receiving the client volume file for the corresponding snapshot volume instead of the correct brick volume file.

[2018-09-28 23:34:10.405953] I [glusterfsd-mgmt.c:279:glusterfs_handle_terminate] 0-glusterfs: detaching not-only child /run/gluster/snaps/1534bc2e092341d7bb7d940e9728c3ca/brick1/mirror
[2018-09-28 23:34:10.406009] I [glusterfsd-mgmt.c:279:glusterfs_handle_terminate] 0-glusterfs: detaching not-only child /run/gluster/snaps/1534bc2e092341d7bb7d940e9728c3ca/brick2/mirror

The two lines above indicate that this is a brick process. Because of the restore, the original snapshot volume's bricks are stopped, leading to the brick detach logged above.

[2018-09-28 23:34:10.407276] E [rpcsvc.c:1542:rpcsvc_program_unregister_portmap] 0-rpc-service: Could not unregister with portmap
[2018-09-28 23:34:10.407276] E [rpcsvc.c:1542:rpcsvc_program_unregister_portmap] 0-rpc-service: Could not unregister with portmap
[2018-09-28 23:34:10.407299] E [rpcsvc.c:1670:rpcsvc_program_unregister] 0-rpc-service: portmap unregistration of program failed
[2018-09-28 23:34:10.407310] E [rpcsvc.c:1670:rpcsvc_program_unregister] 0-rpc-service: portmap unregistration of program failed
[2018-09-28 23:34:10.407319] E [rpcsvc.c:1720:rpcsvc_program_unregister] 0-rpc-service: Program unregistration failed: GlusterFS Changelog, Num: 1885957735, Ver: 1, Port: 0
[2018-09-28 23:34:10.407323] E [rpcsvc.c:1720:rpcsvc_program_unregister] 0-rpc-service: Program unregistration failed: GlusterFS Changelog, Num: 1885957735, Ver: 1, Port: 0
[2018-09-28 23:34:10.409365] I [barrier.c:665:fini] 0-1534bc2e092341d7bb7d940e9728c3ca-barrier: Disabling barriering and dequeuing all the queued fops
[2018-09-28 23:34:10.409382] I [barrier.c:665:fini] 0-1534bc2e092341d7bb7d940e9728c3ca-barrier: Disabling barriering and dequeuing all the queued fops
[2018-09-28 23:34:10.413063] I [io-stats.c:3937:fini] 0-1534bc2e092341d7bb7d940e9728c3ca-io-stats: io-stats translator unloaded
[2018-09-28 23:34:10.413151] I [rpcsvc.c:2054:rpcsvc_spawn_threads] 0-rpc-service: terminating 1 threads for program 'GlusterFS 4.x v1'
[2018-09-28 23:34:10.413178] I [rpcsvc.c:2054:rpcsvc_spawn_threads] 0-rpc-service: terminating 1 threads for program 'GlusterFS 3.3'
[2018-09-28 23:34:10.413182] I [rpcsvc.c:1993:rpcsvc_request_handler] 0-rpc-service: program 'GlusterFS 4.x v1' thread terminated; total count:3
[2018-09-28 23:34:10.413195] I [rpcsvc.c:1993:rpcsvc_request_handler] 0-rpc-service: program 'GlusterFS 3.3' thread terminated; total count:3
[2018-09-28 23:34:10.413700] I [io-stats.c:3937:fini] 0-1534bc2e092341d7bb7d940e9728c3ca-io-stats: io-stats translator unloaded
[2018-09-28 23:34:10.413775] I [rpcsvc.c:2054:rpcsvc_spawn_threads] 0-rpc-service: terminating 1 threads for program 'GlusterFS 4.x v1'
[2018-09-28 23:34:10.413786] I [rpcsvc.c:2054:rpcsvc_spawn_threads] 0-rpc-service: terminating 1 threads for program 'GlusterFS 3.3'
[2018-09-28 23:34:10.413805] I [rpcsvc.c:1993:rpcsvc_request_handler] 0-rpc-service: program 'GlusterFS 4.x v1' thread terminated; total count:2
[2018-09-28 23:34:10.413807] I [rpcsvc.c:1993:rpcsvc_request_handler] 0-rpc-service: program 'GlusterFS 3.3' thread terminated; total count:2
[2018-09-28 23:34:10.923188] I [glusterfsd-mgmt.c:58:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2018-09-28 23:34:10.923350] I [MSGID: 101191] [event-epoll.c:653:event_dispatch_epoll_worker] 0-epoll: Exited thread with index 3
[2018-09-28 23:34:11.031917] I [MSGID: 101191] [event-epoll.c:653:event_dispatch_epoll_worker] 0-epoll: Exited thread with index 4
[2018-09-28 23:34:57.645690] I [MSGID: 101097] [xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on /usr/local/lib/glusterfs/6dev/xlator/protocol/client.so: undefined symbol: xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.646288] I [MSGID: 101097] [xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on /usr/local/lib/glusterfs/6dev/xlator/cluster/replicate.so: undefined symbol: xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.646714] I [MSGID: 101097] [xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on /usr/local/lib/glusterfs/6dev/xlator/features/read-only.so: undefined symbol: xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.646929] I [MSGID: 101097] [xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on /usr/local/lib/glusterfs/6dev/xlator/performance/write-behind.so: undefined symbol: xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.647103] I [MSGID: 101097] [xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on /usr/local/lib/glusterfs/6dev/xlator/performance/read-ahead.so: undefined symbol: xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.647274] I [MSGID: 101097] [xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on /usr/local/lib/glusterfs/6dev/xlator/performance/readdir-ahead.so: undefined symbol: xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.647470] I [MSGID: 101097] [xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on /usr/local/lib/glusterfs/6dev/xlator/performance/io-cache.so: undefined symbol: xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.647817] I [MSGID: 101097] [xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on /usr/local/lib/glusterfs/6dev/xlator/performance/open-behind.so: undefined symbol: xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.648021] I [MSGID: 101097] [xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on /usr/local/lib/glusterfs/6dev/xlator/debug/io-stats.so: undefined symbol: xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.649370] I [MSGID: 101097] [xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on /usr/local/lib/glusterfs/6dev/xlator/cluster/replicate.so: undefined symbol: xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.649654] I [MSGID: 101097] [xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on /usr/local/lib/glusterfs/6dev/xlator/features/read-only.so: undefined symbol: xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.649858] I [MSGID: 101097] [xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on /usr/local/lib/glusterfs/6dev/xlator/performance/write-behind.so: undefined symbol: xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.650027] I [MSGID: 101097] [xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on /usr/local/lib/glusterfs/6dev/xlator/performance/read-ahead.so: undefined symbol: xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.650192] I [MSGID: 101097] [xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on /usr/local/lib/glusterfs/6dev/xlator/performance/readdir-ahead.so: undefined symbol: xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.650381] I [MSGID: 101097] [xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on /usr/local/lib/glusterfs/6dev/xlator/performance/io-cache.so: undefined symbol: xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.650716] I [MSGID: 101097] [xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on /usr/local/lib/glusterfs/6dev/xlator/performance/open-behind.so: undefined symbol: xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.650914] I [MSGID: 101097] [xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on /usr/local/lib/glusterfs/6dev/xlator/debug/io-stats.so: undefined symbol: xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.652878] W [MSGID: 101174] [graph.c:397:_log_if_unknown_option] 7-mirror-client-1: option 'transport.address-family' is not recognized
[2018-09-28 23:34:57.652911] W [MSGID: 101174] [graph.c:397:_log_if_unknown_option] 7-mirror-client-1: option 'transport.tcp-user-timeout' is not recognized
[2018-09-28 23:34:57.652919] W [MSGID: 101174] [graph.c:397:_log_if_unknown_option] 7-mirror-client-1: option 'transport.socket.keepalive-time' is not recognized
[2018-09-28 23:34:57.652927] W [MSGID: 101174] [graph.c:397:_log_if_unknown_option] 7-mirror-client-1: option 'transport.socket.keepalive-interval' is not recognized
[2018-09-28 23:34:57.652934] W [MSGID: 101174] [graph.c:397:_log_if_unknown_option] 7-mirror-client-1: option 'transport.socket.keepalive-count' is not recognized
[2018-09-28 23:34:57.652946] W [MSGID: 101174] [graph.c:397:_log_if_unknown_option] 7-mirror-client-0: option 'transport.address-family' is not recognized
[2018-09-28 23:34:57.652954] W [MSGID: 101174] [graph.c:397:_log_if_unknown_option] 7-mirror-client-0: option 'transport.tcp-user-timeout' is not recognized
[2018-09-28 23:34:57.652962] W [MSGID: 101174] [graph.c:397:_log_if_unknown_option] 7-mirror-client-0: option 'transport.socket.keepalive-time' is not recognized
[2018-09-28 23:34:57.652969] W [MSGID: 101174] [graph.c:397:_log_if_unknown_option] 7-mirror-client-0: option 'transport.socket.keepalive-interval' is not recognized
[2018-09-28 23:34:57.652976] W [MSGID: 101174] [graph.c:397:_log_if_unknown_option] 7-mirror-client-0: option 'transport.socket.keepalive-count' is not recognized
[2018-09-28 23:34:57.683329] I [MSGID: 114020] [client.c:2354:notify] 7-mirror-client-1: parent translators are ready, attempting connect on transport

Final graph:
+------------------------------------------------------------------------------+
  1: volume mirror-client-0
  2:     type protocol/client
  3:     option ping-timeout 42
  4:     option remote-host workspace
  5:     option remote-subvolume /run/gluster/snaps/31717a7458a543dca4bffc8e6b1017cc/brick1/mirror
  6:     option transport-type socket
  7:     option transport.address-family inet
  8:     option username 8b527b11-a2e6-45e3-a12a-99b46dc636ab
  9:     option password 01f72c35-9b3a-47b3-852b-95bdb481e66a
 10:     option transport.tcp-user-timeout 0
 11:     option transport.socket.keepalive-time 20
 12:     option transport.socket.keepalive-interval 2
 13:     option transport.socket.keepalive-count 9
 14:     option send-gids true
 15: end-volume
 16:
 17: volume mirror-client-1
 18:     type protocol/client
 19:     option ping-timeout 42
 20:     option remote-host workspace
 21:     option remote-subvolume /run/gluster/snaps/31717a7458a543dca4bffc8e6b1017cc/brick2/mirror
 22:     option transport-type socket
 23:     option transport.address-family inet
 24:     option username 8b527b11-a2e6-45e3-a12a-99b46dc636ab
 25:     option password 01f72c35-9b3a-47b3-852b-95bdb481e66a
 26:     option transport.tcp-user-timeout 0
 27:     option transport.socket.keepalive-time 20
 28:     option transport.socket.keepalive-interval 2
 29:     option transport.socket.keepalive-count 9
 30:     option send-gids true
 31: end-volume
 32:
 33: volume 31717a7458a543dca4bffc8e6b1017cc-replicate-0
 34:     type cluster/replicate
 35:     option afr-pending-xattr mirror-client-0,mirror-client-1
 36:     option use-compound-fops off
 37:     subvolumes mirror-client-0 mirror-client-1
 38: end-volume
 39:
 40: volume 31717a7458a543dca4bffc8e6b1017cc-dht
 41:     type cluster/distribute
 42:     option lock-migration off
 43:     option force-migration off
 44:     subvolumes 31717a7458a543dca4bffc8e6b1017cc-replicate-0
 45: end-volume
 46:
 47: volume 31717a7458a543dca4bffc8e6b1017cc-read-only
 48:     type features/read-only
 49:     option read-only on
 50:     subvolumes 31717a7458a543dca4bffc8e6b1017cc-dht
 51: end-volume
 52:
 53: volume 31717a7458a543dca4bffc8e6b1017cc-write-behind
 54:     type performance/write-behind
 55:     subvolumes 31717a7458a543dca4bffc8e6b1017cc-read-only
 56: end-volume
 57:
 58: volume 31717a7458a543dca4bffc8e6b1017cc-read-ahead
 59:     type performance/read-ahead
 60:     subvolumes 31717a7458a543dca4bffc8e6b1017cc-write-behind
 61: end-volume
 62:
 63: volume 31717a7458a543dca4bffc8e6b1017cc-readdir-ahead
 64:     type performance/readdir-ahead
 65:     option parallel-readdir off
 66:     option rda-request-size 131072
 67:     option rda-cache-limit 10MB
 68:     subvolumes 31717a7458a543dca4bffc8e6b1017cc-read-ahead
 69: end-volume
 70:
 71: volume 31717a7458a543dca4bffc8e6b1017cc-io-cache
 72:     type performance/io-cache
 73:     subvolumes 31717a7458a543dca4bffc8e6b1017cc-readdir-ahead
 74: end-volume
 75:
 76: volume 31717a7458a543dca4bffc8e6b1017cc-quick-read
 77:     type performance/quick-read
 78:     subvolumes 31717a7458a543dca4bffc8e6b1017cc-io-cache
 79: end-volume
 80:
 81: volume 31717a7458a543dca4bffc8e6b1017cc-open-behind
 82:     type performance/open-behind
 83:     subvolumes 31717a7458a543dca4bffc8e6b1017cc-quick-read
 84: end-volume
 85:
 86: volume 31717a7458a543dca4bffc8e6b1017cc-md-cache
 87:     type performance/md-cache
 88:     subvolumes 31717a7458a543dca4bffc8e6b1017cc-open-behind
 89: end-volume
 90:
 91: volume 31717a7458a543dca4bffc8e6b1017cc
 92:     type debug/io-stats
 93:     option log-level INFO
 94:     option latency-measurement off
 95:     option count-fop-hits off
 96:     subvolumes 31717a7458a543dca4bffc8e6b1017cc-md-cache
 97: end-volume
 98:
+------------------------------------------------------------------------------+

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Create a couple of snapshots of a gluster volume.
2. Do a snapshot restore.

Actual results:
The client volfile is sent to the snapshot brick.

Expected results:
The actual brick volfile should be sent.

Additional info:

--- Additional comment from Worker Ant on 2018-10-01 18:03:32 EDT ---

REVIEW: https://review.gluster.org/21314 (mgmt/glusterd: use proper path to the volfile) posted (#1) for review on master by Raghavendra Bhat

--- Additional comment from Mohammed Rafi KC on 2018-10-03 07:04:01 EDT ---

I tried to reproduce the issue, but I wasn't able to hit it with the given steps to reproduce. Am I missing anything here?

--- Additional comment from Raghavendra Bhat on 2018-10-03 08:47:30 EDT ---

Rafi, once you do the snapshot restore, please check the log file of the multiplexed snapshot brick. It prints the client volfile; IMO it should not happen.
In fact, due to another memory corruption bug, the brick process crashed after it got the client volfile (the corruption bug is in the client stack).

--- Additional comment from Worker Ant on 2018-10-04 01:03:07 EDT ---

REVIEW: https://review.gluster.org/21314 (mgmt/glusterd: use proper path to the volfile) posted (#4) for review on master by Atin Mukherjee

--- Additional comment from Worker Ant on 2018-10-04 14:33:01 EDT ---

REVIEW: https://review.gluster.org/21348 (mgmt/glusterd: use proper path to the volfile) posted (#1) for review on release-4.1 by Raghavendra Bhat

--- Additional comment from Worker Ant on 2018-10-05 10:26:07 EDT ---

COMMIT: https://review.gluster.org/21348 committed in release-4.1 by "Raghavendra Bhat" <raghavendra> with the following commit message:

mgmt/glusterd: use proper path to the volfile

Till now, glusterd was generating the volfile path for the snapshot volume's bricks like this:

    /snaps/<snap name>/<brick volfile>

But in reality, the path to the brick volfile for a snapshot volume is:

    /snaps/<snap name>/<snap volume name>/<brick volfile>

The first form was a workaround used to distinguish between a mount command mounting the snapshot volume and a brick of the snapshot volume, so that, based on which of the two was asking, glusterd could return the proper volfile (the client volfile for the former and the brick volfile for the latter).

But this was causing problems for snapshot restore when brick multiplexing is enabled, because with brick multiplexing the brick looks up its volfile and sends the GETSPEC rpc call to glusterd using the second style of path, i.e. /snaps/<snap name>/<snap volume name>/<brick volfile>. So, when the multiplexed snapshot brick sent a GETSPEC rpc request to glusterd for its brick volume file, glusterd returned the client volume file of the snapshot volume instead of the brick volume file.

Change-Id: I28b2dfa5d9b379fe943db92c2fdfea879a6a594e
fixes: bz#1636218
Signed-off-by: Raghavendra Bhat <raghavendra>
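[Editor's note: to make the path mismatch above concrete, here is a minimal, self-contained C sketch contrasting the two volfile path styles described in the commit message. The function names, the workdir value, and the main() driver are illustrative assumptions, not glusterd's actual API; only the two path layouts come from the commit message.]

/* Sketch: how the old and the fixed snapshot-brick volfile paths differ.
 * All identifiers here are hypothetical, not glusterd internals. */
#include <stdio.h>

/* Old behaviour: the snap volume name was omitted from the path. */
static void brick_volfile_path_old(char *buf, size_t len, const char *workdir,
                                   const char *snapname, const char *volfile)
{
    snprintf(buf, len, "%s/snaps/%s/%s", workdir, snapname, volfile);
}

/* Fixed behaviour: include the snap volume name, matching the path a
 * multiplexed brick uses in its GETSPEC request. */
static void brick_volfile_path_new(char *buf, size_t len, const char *workdir,
                                   const char *snapname, const char *snapvol,
                                   const char *volfile)
{
    snprintf(buf, len, "%s/snaps/%s/%s/%s", workdir, snapname, snapvol,
             volfile);
}

int main(void)
{
    char old_path[4096], new_path[4096];

    brick_volfile_path_old(old_path, sizeof(old_path), "/var/lib/glusterd",
                           "snap1", "brick1.vol");
    brick_volfile_path_new(new_path, sizeof(new_path), "/var/lib/glusterd",
                           "snap1", "31717a7458a543dca4bffc8e6b1017cc",
                           "brick1.vol");

    /* The multiplexed brick asks using the longer form; with the old path
     * generation, glusterd could not match that request to a brick volfile
     * and answered with the snapshot volume's client volfile instead. */
    printf("old: %s\nnew: %s\n", old_path, new_path);
    return 0;
}
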
REVIEW: https://review.gluster.org/21580 (mgmt/glusterd: use proper path to the volfile) posted (#1) for review on release-3.12 by Raghavendra Bhat
This bug is reported against a version of Gluster that is no longer maintained (or has been EOL'd). See https://www.gluster.org/release-schedule/ for the versions currently maintained. As a result, this bug is being closed. This bug has been fixed in the gluster 4.1.6 and gluster 5 release series.