Description of problem: ------------------------ "gluster volume sync <hostname> <volume_name>" doesn't sync the volume information to the hosts in peer. For all the sync command options the output is always "please delete all the volume before full sync" . Deleting all volumes is not ideal at all. [10/12/12 - 13:02:19 root@rhs-client6 ~]# gluster volume sync client-6 please delete all the volumes before full sync [10/12/12 - 13:02:42 root@rhs-client6 ~]# gluster volume sync client-6 all please delete all the volumes before full sync [10/12/12 - 13:02:44 root@rhs-client6 ~]# gluster volume sync client-6 replicate-rhevh please delete the volume: replicate-rhevh before sync Version-Release number of selected component (if applicable): ------------------------------------------------------------- [10/12/12 - 13:17:17 root@rhs-client6 ~]# rpm -qa | grep gluster glusterfs-geo-replication-3.3.0rhsvirt1-7.el6rhs.x86_64 vdsm-gluster-4.9.6-14.el6rhs.noarch gluster-swift-plugin-1.0-5.noarch gluster-swift-container-1.4.8-4.el6.noarch org.apache.hadoop.fs.glusterfs-glusterfs-0.20.2_0.2-1.noarch glusterfs-3.3.0rhsvirt1-7.el6rhs.x86_64 glusterfs-server-3.3.0rhsvirt1-7.el6rhs.x86_64 glusterfs-rdma-3.3.0rhsvirt1-7.el6rhs.x86_64 gluster-swift-proxy-1.4.8-4.el6.noarch gluster-swift-account-1.4.8-4.el6.noarch gluster-swift-doc-1.4.8-4.el6.noarch glusterfs-fuse-3.3.0rhsvirt1-7.el6rhs.x86_64 glusterfs-debuginfo-3.3.0rhsvirt1-7.el6rhs.x86_64 gluster-swift-1.4.8-4.el6.noarch gluster-swift-object-1.4.8-4.el6.noarch [10/12/12 - 13:17:24 root@rhs-client6 ~]# gluster --version glusterfs 3.3.0rhsvirt1 built on Oct 8 2012 15:23:00 How reproducible: ---------------- Often Additional Info:- ---------------- Refer to Bug865693 . The bug 865693 needs volume sync and we are unable to perform sync operation.
this is a behavior of glusterd in general, and not very specific to 2.0+ related testing alone.
Submitted patch at http://review.gluster.org/4188
keeping it in POST for indicating that the patch is in review process.
CHANGE: http://review.gluster.org/4188 (glusterd: volume-sync shouldn't validate volume-id) merged in master by Vijay Bellur (vbellur)
The issue is still seen. This has come about since the following commit - http://review.gluster.com/4570.
CHANGE: http://review.gluster.org/4624 (glusterd: Fixed volume-sync in synctask codepath.) merged in master by Vijay Bellur (vbellur)
Updating summary since this is a general bug.
Workflow for using volume sync, when a volume configuration is gone out of sync in 2 nodes. Let the two nodes in the cluster be called Node1 and Node2. Let us assume Node2 has the 'correct' volume configuration. This is similar to picking the correct copy of data in a split-brain scenario. Administrator's discretion is required. Node2: 1) gluster peer detach Node1 force Node1: 2) Check if this node is detached from the cluster using #gluster peer status It should return, "No peers present" 3) Stop glusterd on Node1 #service glusterd stop 4) rm -rf /var/lib/glusterd/vols/VOLNAME 5)Start glusterd on Node1 #service glusterd start Node2: 6) Now, probe Node1 back into the cluster. #gluster peer probe Node1
The steps in the comment 9 will however sync the volume since we are detaching the peer and re-attaching the peer . 1. when "gluster peer detach <node> force" is executed, the /var/lib/glusterd/vols directory on <node> are cleaned up. 2. when we do peer probe <node>, the volumes are however synced to <node>. With the above steps we need not have to execute "gluster volume sync" command.
Verified the fix on build: ~~~~~~~~~~~~~~~~~~~~~~~~~ root@king [Jul-11-2013-16:37:38] >rpm -qa | grep glusterfs-server glusterfs-server-3.4.0.12rhs.beta3-1.el6rhs.x86_64 root@king [Jul-11-2013-16:37:44] >gluster --version glusterfs 3.4.0.12rhs.beta3 built on Jul 6 2013 14:35:18 Steps used to verify: ====================== 1. Create 2 2x2 distribute-replicate volume ( 4 storage nodes : node1, node2, node3 and node4 ) 2. Stop glusterd's on node1 and node3. 3. set any volume option for both the volumes 4. Stop glusterd's on node2 and node4. 5. Start glusterd's on node1 and node3. 6. Set any volume option for both the volumes. 7. Start glusterd's on node2 and node4. 8. execute : "gluster peer status" Result: ======= node1 and node3 are in "Peer Rejected" state for node2 and node4. node2 and node4 are in "Peer Rejected" state for node1 and node3. 9. On node1 execute : "gluster volume sync <node2> vol1" , "gluster volume sync <node2> vol2" . This is successful. 10. "gluster volume info" on node1 now has the synced volume information. The volume information on node1 is same as volume information on node2 and node4. Actual Result: =============== Even though the volume information has been synced, the node1 is still in "Peer Rejected" state for Node2 and Node4. Hence the "gluster volume status" on node2 and node4 doesn't recognize the brick process on node1. Additional Info: =============== Restarting glusterd on node1 will move node1 to "Peer in Cluster" state for node2 and node4. But volume sync command execution itself doesn't move the node from "Peer Rejected" state to "Peer in Cluster" state even after successful sync.
Created attachment 772180 [details] history of command execution on nodes History of command execution on nodes
The bug still exist and hence moving it to assigned state.
Dev ack to 3.0 RHS BZs
The product version of Red Hat Storage on which this issue was reported has reached End Of Life (EOL) [1], hence this bug report is being closed. If the issue is still observed on a current version of Red Hat Storage, please file a new bug report on the current version. [1] https://rhn.redhat.com/errata/RHSA-2014-0821.html