Bug 1647165

Summary: [SNAPSHOT]: with brick multiplexing, snapshot restore will make glusterd send wrong volfile
Product: [Community] GlusterFS Reporter: Raghavendra Bhat <rabhat>
Component: snapshotAssignee: bugs <bugs>
Status: CLOSED EOL QA Contact:
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 3.12CC: bugs, rabhat, rkavunga
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1636218 Environment:
Last Closed: 2018-11-20 09:25:15 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1635050, 1636218    
Bug Blocks: 1636162, 1636291    

Description Raghavendra Bhat 2018-11-06 19:05:45 UTC
+++ This bug was initially created as a clone of Bug #1636218 +++

+++ This bug was initially created as a clone of Bug #1635050 +++

Description of problem:

With brick multiplexing, when a snapshot restore is done, it leads to a snapshot brick getting the client volume file for the corresponding snapshot volume instead of the correct brick volume file.


[2018-09-28 23:34:10.405953] I [glusterfsd-mgmt.c:279:glusterfs_handle_terminate] 0-glusterfs: detaching not-only child /run/gluster/snaps/1534bc2e092341d7bb7d940e9728c3ca/brick1/mirror
[2018-09-28 23:34:10.406009] I [glusterfsd-mgmt.c:279:glusterfs_handle_terminate] 0-glusterfs: detaching not-only child /run/gluster/snaps/1534bc2e092341d7bb7d940e9728c3ca/brick2/mirror

The above 2 lines indicate that it is a brick process. Because of restore, the original snapshot volume's bricks are stopped leading to a detach of the bricks happening which is logged above.


[2018-09-28 23:34:10.407276] E [rpcsvc.c:1542:rpcsvc_program_unregister_portmap] 0-rpc-service: Could not unregister with portmap
[2018-09-28 23:34:10.407276] E [rpcsvc.c:1542:rpcsvc_program_unregister_portmap] 0-rpc-service: Could not unregister with portmap
[2018-09-28 23:34:10.407299] E [rpcsvc.c:1670:rpcsvc_program_unregister] 0-rpc-service: portmap unregistration of program failed
[2018-09-28 23:34:10.407310] E [rpcsvc.c:1670:rpcsvc_program_unregister] 0-rpc-service: portmap unregistration of program failed
[2018-09-28 23:34:10.407319] E [rpcsvc.c:1720:rpcsvc_program_unregister] 0-rpc-service: Program unregistration failed: GlusterFS Changelog, Num: 1885957735, Ver: 1, Port: 0
[2018-09-28 23:34:10.407323] E [rpcsvc.c:1720:rpcsvc_program_unregister] 0-rpc-service: Program unregistration failed: GlusterFS Changelog, Num: 1885957735, Ver: 1, Port: 0
[2018-09-28 23:34:10.409365] I [barrier.c:665:fini] 0-1534bc2e092341d7bb7d940e9728c3ca-barrier: Disabling barriering and dequeuing all the queued fops
[2018-09-28 23:34:10.409382] I [barrier.c:665:fini] 0-1534bc2e092341d7bb7d940e9728c3ca-barrier: Disabling barriering and dequeuing all the queued fops
[2018-09-28 23:34:10.413063] I [io-stats.c:3937:fini] 0-1534bc2e092341d7bb7d940e9728c3ca-io-stats: io-stats translator unloaded
[2018-09-28 23:34:10.413151] I [rpcsvc.c:2054:rpcsvc_spawn_threads] 0-rpc-service: terminating 1 threads for program 'GlusterFS 4.x v1'
[2018-09-28 23:34:10.413178] I [rpcsvc.c:2054:rpcsvc_spawn_threads] 0-rpc-service: terminating 1 threads for program 'GlusterFS 3.3'
[2018-09-28 23:34:10.413182] I [rpcsvc.c:1993:rpcsvc_request_handler] 0-rpc-service: program 'GlusterFS 4.x v1' thread terminated; total count:3
[2018-09-28 23:34:10.413195] I [rpcsvc.c:1993:rpcsvc_request_handler] 0-rpc-service: program 'GlusterFS 3.3' thread terminated; total count:3
[2018-09-28 23:34:10.413700] I [io-stats.c:3937:fini] 0-1534bc2e092341d7bb7d940e9728c3ca-io-stats: io-stats translator unloaded
[2018-09-28 23:34:10.413775] I [rpcsvc.c:2054:rpcsvc_spawn_threads] 0-rpc-service: terminating 1 threads for program 'GlusterFS 4.x v1'
[2018-09-28 23:34:10.413786] I [rpcsvc.c:2054:rpcsvc_spawn_threads] 0-rpc-service: terminating 1 threads for program 'GlusterFS 3.3'
[2018-09-28 23:34:10.413805] I [rpcsvc.c:1993:rpcsvc_request_handler] 0-rpc-service: program 'GlusterFS 4.x v1' thread terminated; total count:2
[2018-09-28 23:34:10.413807] I [rpcsvc.c:1993:rpcsvc_request_handler] 0-rpc-service: program 'GlusterFS 3.3' thread terminated; total count:2
[2018-09-28 23:34:10.923188] I [glusterfsd-mgmt.c:58:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2018-09-28 23:34:10.923350] I [MSGID: 101191] [event-epoll.c:653:event_dispatch_epoll_worker] 0-epoll: Exited thread with index 3
[2018-09-28 23:34:11.031917] I [MSGID: 101191] [event-epoll.c:653:event_dispatch_epoll_worker] 0-epoll: Exited thread with index 4
[2018-09-28 23:34:57.645690] I [MSGID: 101097] [xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on /usr/local/lib/glusterfs/6dev/xlator/protocol/client.so: undefined symbol: xlator_api. Fall back
 to old symbols
[2018-09-28 23:34:57.646288] I [MSGID: 101097] [xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on /usr/local/lib/glusterfs/6dev/xlator/cluster/replicate.so: undefined symbol: xlator_api. Fall ba
ck to old symbols
[2018-09-28 23:34:57.646714] I [MSGID: 101097] [xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on /usr/local/lib/glusterfs/6dev/xlator/features/read-only.so: undefined symbol: xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.646929] I [MSGID: 101097] [xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on /usr/local/lib/glusterfs/6dev/xlator/performance/write-behind.so: undefined symbol: xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.647103] I [MSGID: 101097] [xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on /usr/local/lib/glusterfs/6dev/xlator/performance/read-ahead.so: undefined symbol: xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.647274] I [MSGID: 101097] [xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on /usr/local/lib/glusterfs/6dev/xlator/performance/readdir-ahead.so: undefined symbol: xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.647470] I [MSGID: 101097] [xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on /usr/local/lib/glusterfs/6dev/xlator/performance/io-cache.so: undefined symbol: xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.647817] I [MSGID: 101097] [xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on /usr/local/lib/glusterfs/6dev/xlator/performance/open-behind.so: undefined symbol: xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.648021] I [MSGID: 101097] [xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on /usr/local/lib/glusterfs/6dev/xlator/debug/io-stats.so: undefined symbol: xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.649370] I [MSGID: 101097] [xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on /usr/local/lib/glusterfs/6dev/xlator/cluster/replicate.so: undefined symbol: xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.649654] I [MSGID: 101097] [xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on /usr/local/lib/glusterfs/6dev/xlator/features/read-only.so: undefined symbol: xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.649858] I [MSGID: 101097] [xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on /usr/local/lib/glusterfs/6dev/xlator/performance/write-behind.so: undefined symbol: xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.650027] I [MSGID: 101097] [xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on /usr/local/lib/glusterfs/6dev/xlator/performance/read-ahead.so: undefined symbol: xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.650192] I [MSGID: 101097] [xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on /usr/local/lib/glusterfs/6dev/xlator/performance/readdir-ahead.so: undefined symbol: xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.650381] I [MSGID: 101097] [xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on /usr/local/lib/glusterfs/6dev/xlator/performance/io-cache.so: undefined symbol: xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.650716] I [MSGID: 101097] [xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on /usr/local/lib/glusterfs/6dev/xlator/performance/open-behind.so: undefined symbol: xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.650914] I [MSGID: 101097] [xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on /usr/local/lib/glusterfs/6dev/xlator/debug/io-stats.so: undefined symbol: xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.652878] W [MSGID: 101174] [graph.c:397:_log_if_unknown_option] 7-mirror-client-1: option 'transport.address-family' is not recognized
[2018-09-28 23:34:57.652911] W [MSGID: 101174] [graph.c:397:_log_if_unknown_option] 7-mirror-client-1: option 'transport.tcp-user-timeout' is not recognized
[2018-09-28 23:34:57.652919] W [MSGID: 101174] [graph.c:397:_log_if_unknown_option] 7-mirror-client-1: option 'transport.socket.keepalive-time' is not recognized
[2018-09-28 23:34:57.652927] W [MSGID: 101174] [graph.c:397:_log_if_unknown_option] 7-mirror-client-1: option 'transport.socket.keepalive-interval' is not recognized
[2018-09-28 23:34:57.652934] W [MSGID: 101174] [graph.c:397:_log_if_unknown_option] 7-mirror-client-1: option 'transport.socket.keepalive-count' is not recognized
[2018-09-28 23:34:57.652946] W [MSGID: 101174] [graph.c:397:_log_if_unknown_option] 7-mirror-client-0: option 'transport.address-family' is not recognized
[2018-09-28 23:34:57.652954] W [MSGID: 101174] [graph.c:397:_log_if_unknown_option] 7-mirror-client-0: option 'transport.tcp-user-timeout' is not recognized
[2018-09-28 23:34:57.652962] W [MSGID: 101174] [graph.c:397:_log_if_unknown_option] 7-mirror-client-0: option 'transport.socket.keepalive-time' is not recognized
[2018-09-28 23:34:57.652969] W [MSGID: 101174] [graph.c:397:_log_if_unknown_option] 7-mirror-client-0: option 'transport.socket.keepalive-interval' is not recognized
[2018-09-28 23:34:57.652976] W [MSGID: 101174] [graph.c:397:_log_if_unknown_option] 7-mirror-client-0: option 'transport.socket.keepalive-count' is not recognized
[2018-09-28 23:34:57.683329] I [MSGID: 114020] [client.c:2354:notify] 7-mirror-client-1: parent translators are ready, attempting connect on transport
Final graph:
+------------------------------------------------------------------------------+
  1: volume mirror-client-0
  2:     type protocol/client
  3:     option ping-timeout 42
  4:     option remote-host workspace
  5:     option remote-subvolume /run/gluster/snaps/31717a7458a543dca4bffc8e6b1017cc/brick1/mirror
  6:     option transport-type socket
  7:     option transport.address-family inet
  8:     option username 8b527b11-a2e6-45e3-a12a-99b46dc636ab
  9:     option password 01f72c35-9b3a-47b3-852b-95bdb481e66a
 10:     option transport.tcp-user-timeout 0
 11:     option transport.socket.keepalive-time 20
 12:     option transport.socket.keepalive-interval 2
 13:     option transport.socket.keepalive-count 9
 14:     option send-gids true
 15: end-volume
 16:  
 17: volume mirror-client-1
 18:     type protocol/client
 19:     option ping-timeout 42
 20:     option remote-host workspace
 21:     option remote-subvolume /run/gluster/snaps/31717a7458a543dca4bffc8e6b1017cc/brick2/mirror
 22:     option transport-type socket
 23:     option transport.address-family inet
 24:     option username 8b527b11-a2e6-45e3-a12a-99b46dc636ab
 25:     option password 01f72c35-9b3a-47b3-852b-95bdb481e66a
 26:     option transport.tcp-user-timeout 0
 27:     option transport.socket.keepalive-time 20
 28:     option transport.socket.keepalive-interval 2
 29:     option transport.socket.keepalive-count 9
 30:     option send-gids true
 31: end-volume
 32:  
 33: volume 31717a7458a543dca4bffc8e6b1017cc-replicate-0
 34:     type cluster/replicate
 35:     option afr-pending-xattr mirror-client-0,mirror-client-1
 36:     option use-compound-fops off
 37:     subvolumes mirror-client-0 mirror-client-1
 38: end-volume
 39:
 40: volume 31717a7458a543dca4bffc8e6b1017cc-dht
 41:     type cluster/distribute
 42:     option lock-migration off
 43:     option force-migration off
 44:     subvolumes 31717a7458a543dca4bffc8e6b1017cc-replicate-0
 45: end-volume
 46:  
 47: volume 31717a7458a543dca4bffc8e6b1017cc-read-only
 48:     type features/read-only
 49:     option read-only on
 50:     subvolumes 31717a7458a543dca4bffc8e6b1017cc-dht
 51: end-volume
 52:  
 53: volume 31717a7458a543dca4bffc8e6b1017cc-write-behind
 54:     type performance/write-behind
 55:     subvolumes 31717a7458a543dca4bffc8e6b1017cc-read-only
 56: end-volume
 57:  
 58: volume 31717a7458a543dca4bffc8e6b1017cc-read-ahead
 59:     type performance/read-ahead
 60:     subvolumes 31717a7458a543dca4bffc8e6b1017cc-write-behind
 61: end-volume
 62:  
 63: volume 31717a7458a543dca4bffc8e6b1017cc-readdir-ahead
 64:     type performance/readdir-ahead
 65:     option parallel-readdir off
 66:     option rda-request-size 131072
 67:     option rda-cache-limit 10MB
 68:     subvolumes 31717a7458a543dca4bffc8e6b1017cc-read-ahead
 69: end-volume
 70:  
 71: volume 31717a7458a543dca4bffc8e6b1017cc-io-cache
 72:     type performance/io-cache
 73:     subvolumes 31717a7458a543dca4bffc8e6b1017cc-readdir-ahead
 74: end-volume
 75:  
 76: volume 31717a7458a543dca4bffc8e6b1017cc-quick-read
 77:     type performance/quick-read
 78:     subvolumes 31717a7458a543dca4bffc8e6b1017cc-io-cache
 79: end-volume
 80:  
 81: volume 31717a7458a543dca4bffc8e6b1017cc-open-behind
 82:     type performance/open-behind
 83:     subvolumes 31717a7458a543dca4bffc8e6b1017cc-quick-read
 84: end-volume
 85:  
 86: volume 31717a7458a543dca4bffc8e6b1017cc-md-cache
 87:     type performance/md-cache
 88:     subvolumes 31717a7458a543dca4bffc8e6b1017cc-open-behind
 89: end-volume
 90:
 91: volume 31717a7458a543dca4bffc8e6b1017cc
 92:     type debug/io-stats
 93:     option log-level INFO
 94:     option latency-measurement off
 95:     option count-fop-hits off
 96:     subvolumes 31717a7458a543dca4bffc8e6b1017cc-md-cache
 97: end-volume
 98:  
+------------------------------------------------------------------------------+


Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Create couple of snapshots for a gluster volume
2. Then do a snapshot restore
3.

Actual results:
Client volfile is sent to the snapshot brick.

Expected results:
Actual brick volfile should be sent.

Additional info:

--- Additional comment from Worker Ant on 2018-10-01 18:03:32 EDT ---

REVIEW: https://review.gluster.org/21314 (mgmt/glusterd: use proper path to the volfile) posted (#1) for review on master by Raghavendra Bhat

--- Additional comment from Mohammed Rafi KC on 2018-10-03 07:04:01 EDT ---

I tried to reproduce the issue. I wasn't able to hit with the given steps to reproduce. Am I missing anything here?

--- Additional comment from Raghavendra Bhat on 2018-10-03 08:47:30 EDT ---


Rafi, once you do snapshot restore, please check the log file of the multiplexed snapshot brick. It prints the client volfile. IMO it should not happen. In fact, due to another memory corruption bug, the brick process crashed after it got the client volfile (the corruption bug is in the client stack).

--- Additional comment from Worker Ant on 2018-10-04 01:03:07 EDT ---

REVIEW: https://review.gluster.org/21314 (mgmt/glusterd: use proper path to the volfile) posted (#4) for review on master by Atin Mukherjee

--- Additional comment from Worker Ant on 2018-10-04 14:33:01 EDT ---

REVIEW: https://review.gluster.org/21348 (mgmt/glusterd: use proper path to the volfile) posted (#1) for review on release-4.1 by Raghavendra Bhat

--- Additional comment from Worker Ant on 2018-10-05 10:26:07 EDT ---

COMMIT: https://review.gluster.org/21348 committed in release-4.1 by "Raghavendra Bhat" <raghavendra> with a commit message- mgmt/glusterd: use proper path to the volfile

Till now, glusterd was generating the volfile path for the snapshot
volume's bricks like this.

/snaps/<snap name>/<brick volfile>

But in reality, the path to the brick volfile for a snapshot volume is

/snaps/<snap name>/<snap volume name>/<brick volfile>

The above workaround was used to distinguish between a mount command used
to mount the snapshot volume, and a brick of the snapshot volume, so that
based on what is actually happening, glusterd can return the proper volfile
(client volfile for the former and the brick volfile for the latter). But,
this was causing problems for snapshot restore when brick multiplexing is
enabled. Because, with brick multiplexing, it tries to find the volfile
and sends GETSPEC rpc call to glusterd using the 2nd style of path i.e.

/snaps/<snap name>/<snap volume name>/<brick volfile>

So, when the snapshot brick (which is multiplexed) sends a GETSPEC rpc
request to glusterd for obtaining the brick volume file, glusterd was
returning the client volume file of the snapshot volume instead of the
brick volume file.

Change-Id: I28b2dfa5d9b379fe943db92c2fdfea879a6a594e
fixes: bz#1636218
Signed-off-by: Raghavendra Bhat <raghavendra>

Comment 1 Worker Ant 2018-11-06 19:08:52 UTC
REVIEW: https://review.gluster.org/21580 (mgmt/glusterd: use proper path to the volfile) posted (#1) for review on release-3.12 by Raghavendra Bhat

Comment 2 Mohammed Rafi KC 2018-11-20 09:25:15 UTC
This bug reported is against a version of Gluster that is no longer maintained (or has been EOL'd). See https://www.gluster.org/release-schedule/ for the versions currently maintained.

As a result this bug is being closed.

This bag has been fixed in gluster 4.1.6 and gluster5 series