Bug 1636291
Summary: | [SNAPSHOT]: with brick multiplexing, snapshot restore will make glusterd send wrong volfile | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Atin Mukherjee <amukherj> |
Component: | snapshot | Assignee: | Raghavendra Bhat <rabhat> |
Status: | CLOSED ERRATA | QA Contact: | Upasana <ubansal> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | rhgs-3.4 | CC: | akrishna, bugs, rabhat, rhs-bugs, rkavunga, sanandpa, sankarshan, sheggodu, storage-qa-internal, sunkumar, ubansal |
Target Milestone: | --- | Keywords: | ZStream |
Target Release: | RHGS 3.4.z Batch Update 2 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | glusterfs-3.12.2-27 | Doc Type: | Bug Fix |
Doc Text: |
Previously, when brick multiplexing was enabled and a snapshot was restored, glusterd would send a client volume file to the snapshot brick process instead of the brick volume file. As a consequence, snapshot bricks were created based on the client volume files, making the snapshot data unavailable to applications. This update ensures that the correct volume file is used while creating snapshot bricks when brick multiplexing is enabled.
|
Story Points: | --- |
Clone Of: | 1636162 | Environment: | |
Last Closed: | 2018-12-17 17:07:11 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1635050, 1636162, 1636218, 1647165 | ||
Bug Blocks: |
Description
Atin Mukherjee
2018-10-05 02:13:25 UTC
upstream patch : https://review.gluster.org/21314

The steps mentioned in comment #6 are correct. A volume file is what a gluster process uses to define its behavior (i.e. whether it acts as a brick, a client, etc.). A typical client volume file looks like this.

+------------------------------------------------------------------------------+
volume mirror-client-0
    type protocol/client
    option ping-timeout 42
    option remote-host workspace
    option remote-subvolume /run/gluster/snaps/31717a7458a543dca4bffc8e6b1017cc/brick1/mirror
    option transport-type socket
    option transport.address-family inet
    option username 8b527b11-a2e6-45e3-a12a-99b46dc636ab
    option password 01f72c35-9b3a-47b3-852b-95bdb481e66a
    option transport.tcp-user-timeout 0
    option transport.socket.keepalive-time 20
    option transport.socket.keepalive-interval 2
    option transport.socket.keepalive-count 9
    option send-gids true
end-volume

volume mirror-client-1
    type protocol/client
    option ping-timeout 42
    option remote-host workspace
    option remote-subvolume /run/gluster/snaps/31717a7458a543dca4bffc8e6b1017cc/brick2/mirror
    option transport-type socket
    option transport.address-family inet
    option username 8b527b11-a2e6-45e3-a12a-99b46dc636ab
    option password 01f72c35-9b3a-47b3-852b-95bdb481e66a
    option transport.tcp-user-timeout 0
    option transport.socket.keepalive-time 20
    option transport.socket.keepalive-interval 2
    option transport.socket.keepalive-count 9
    option send-gids true
end-volume

volume 31717a7458a543dca4bffc8e6b1017cc-replicate-0
    type cluster/replicate
    option afr-pending-xattr mirror-client-0,mirror-client-1
    option use-compound-fops off
    subvolumes mirror-client-0 mirror-client-1
end-volume

volume 31717a7458a543dca4bffc8e6b1017cc-dht
    type cluster/distribute
    option lock-migration off
    option force-migration off
    subvolumes 31717a7458a543dca4bffc8e6b1017cc-replicate-0
end-volume

volume 31717a7458a543dca4bffc8e6b1017cc-read-only
    type features/read-only
    option read-only on
    subvolumes 31717a7458a543dca4bffc8e6b1017cc-dht
end-volume

volume 31717a7458a543dca4bffc8e6b1017cc-write-behind
    type performance/write-behind
    subvolumes 31717a7458a543dca4bffc8e6b1017cc-read-only
end-volume

volume 31717a7458a543dca4bffc8e6b1017cc-read-ahead
    type performance/read-ahead
    subvolumes 31717a7458a543dca4bffc8e6b1017cc-write-behind
end-volume

volume 31717a7458a543dca4bffc8e6b1017cc-readdir-ahead
    type performance/readdir-ahead
    option parallel-readdir off
    option rda-request-size 131072
    option rda-cache-limit 10MB
    subvolumes 31717a7458a543dca4bffc8e6b1017cc-read-ahead
end-volume

volume 31717a7458a543dca4bffc8e6b1017cc-io-cache
    type performance/io-cache
    subvolumes 31717a7458a543dca4bffc8e6b1017cc-readdir-ahead
end-volume

volume 31717a7458a543dca4bffc8e6b1017cc-quick-read
    type performance/quick-read
    subvolumes 31717a7458a543dca4bffc8e6b1017cc-io-cache
end-volume

volume 31717a7458a543dca4bffc8e6b1017cc-open-behind
    type performance/open-behind
    subvolumes 31717a7458a543dca4bffc8e6b1017cc-quick-read
end-volume

volume 31717a7458a543dca4bffc8e6b1017cc-md-cache
    type performance/md-cache
    subvolumes 31717a7458a543dca4bffc8e6b1017cc-open-behind
end-volume

volume 31717a7458a543dca4bffc8e6b1017cc
    type debug/io-stats
    option log-level INFO
    option latency-measurement off
    option count-fop-hits off
    subvolumes 31717a7458a543dca4bffc8e6b1017cc-md-cache
end-volume
+------------------------------------------------------------------------------+

Let me give some information which might help you in validating. Below is the description of one of the xlators in the volume file, namely protocol/client. The presence of this xlator means that the process whose log file had this volfile printed (the full volfile is printed above; this is just one of the xlators it describes) acted as a client.

volume mirror-client-0
    type protocol/client    =========> This is a client xlator. Type says it
    option ping-timeout 42
    option remote-host workspace
    option remote-subvolume /run/gluster/snaps/31717a7458a543dca4bffc8e6b1017cc/brick1/mirror
    option transport-type socket
    option transport.address-family inet
    option username 8b527b11-a2e6-45e3-a12a-99b46dc636ab
    option password 01f72c35-9b3a-47b3-852b-95bdb481e66a
    option transport.tcp-user-timeout 0
    option transport.socket.keepalive-time 20
    option transport.socket.keepalive-interval 2
    option transport.socket.keepalive-count 9
    option send-gids true
end-volume

So, in the volfile you get in the log file of the snapshot brick process (there should be one snapshot brick process which multiplexes multiple snapshot bricks into one single process) you should not see a reference to a client xlator like the above. Instead, you should see a description of an xlator whose type is printed as protocol/server.

NOTE: protocol/server is not the only xlator that is unique to a brick process. There are several other xlators that are present in a brick process but not in a client process.
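For comparison, this is roughly the shape of entry to look for in the snapshot brick process volfile. It is only an illustrative sketch: the volume names, the placeholder path, and the exact options shown here are made up for this example and will differ on a real system. The point is that the type lines identify a brick-side (server) graph rather than a client one.

volume mirror-posix
    type storage/posix    =========> brick-only xlator, never present in a client graph
    option directory /run/gluster/snaps/<snap-uuid>/brick1/mirror
end-volume

... (other brick-side xlators) ...

volume mirror-server
    type protocol/server    =========> This is the server xlator. Type says it
    option transport-type tcp
    option auth.addr./run/gluster/snaps/<snap-uuid>/brick1/mirror.allow *
    subvolumes /run/gluster/snaps/<snap-uuid>/brick1/mirror
end-volume

If the multiplexed snapshot brick process stayed a brick after the restore, its log should show volfiles of this server-side shape, not a protocol/client section like the one quoted above.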
Please let me know if this is sufficient. If you need confirmation from my side as to whether the multiplexed snapshot brick process remained a brick after the restore of one of the snapshots and did not become a client (it should not), you can do one of the following things.

1) Share the log file of the multiplexed snapshot brick process after your tests are done (i.e. attach it here, from all nodes).
2) Share the /var/log/glusterfs directory itself here (from all nodes).

If you are going to attach logs here (either the log file of the multiplexed snapshot brick process or /var/log/glusterfs), please clear /var/log/glusterfs on all the nodes before you run your tests. This ensures that the logs you upload are only from this test and avoids confusion.

Updated the doc text in the doc text field. Kindly review it for technical accuracy.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3827

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days