Bug 865700

Summary: "gluster volume sync" command not working as expected
Product: [Red Hat Storage] Red Hat Gluster Storage
Component: glusterd
Reporter: spandura
Assignee: Bug Updates Notification Mailing List <rhs-bugs>
QA Contact: spandura
Status: CLOSED WONTFIX
Severity: high
Priority: high
Version: 2.0
CC: grajaiya, jdarcy, nsathyan, rhs-bugs, rwheeler, sasundar, vbellur
Hardware: Unspecified
OS: Unspecified
Whiteboard: glusterd
Fixed In Version: glusterfs-3.4.0.1rhs-1
Doc Type: Bug Fix
Type: Bug
Clones: 950048 (view as bug list)
Bug Depends On: 950048
Attachments: history of command execution on nodes

Description spandura 2012-10-12 07:49:37 UTC
Description of problem:
------------------------
"gluster volume sync <hostname> <volume_name>" doesn't sync the volume information to the hosts in peer. 

For all variants of the sync command, the output is always "please delete all the volumes before full sync". Deleting all volumes is not an acceptable workaround.

[10/12/12 - 13:02:19 root@rhs-client6 ~]# gluster volume sync client-6 
please delete all the volumes before full sync

[10/12/12 - 13:02:42 root@rhs-client6 ~]# gluster volume sync client-6 all
please delete all the volumes before full sync

[10/12/12 - 13:02:44 root@rhs-client6 ~]# gluster volume sync client-6 replicate-rhevh
please delete the volume: replicate-rhevh before sync


Version-Release number of selected component (if applicable):
-------------------------------------------------------------
[10/12/12 - 13:17:17 root@rhs-client6 ~]# rpm -qa | grep gluster
glusterfs-geo-replication-3.3.0rhsvirt1-7.el6rhs.x86_64
vdsm-gluster-4.9.6-14.el6rhs.noarch
gluster-swift-plugin-1.0-5.noarch
gluster-swift-container-1.4.8-4.el6.noarch
org.apache.hadoop.fs.glusterfs-glusterfs-0.20.2_0.2-1.noarch
glusterfs-3.3.0rhsvirt1-7.el6rhs.x86_64
glusterfs-server-3.3.0rhsvirt1-7.el6rhs.x86_64
glusterfs-rdma-3.3.0rhsvirt1-7.el6rhs.x86_64
gluster-swift-proxy-1.4.8-4.el6.noarch
gluster-swift-account-1.4.8-4.el6.noarch
gluster-swift-doc-1.4.8-4.el6.noarch
glusterfs-fuse-3.3.0rhsvirt1-7.el6rhs.x86_64
glusterfs-debuginfo-3.3.0rhsvirt1-7.el6rhs.x86_64
gluster-swift-1.4.8-4.el6.noarch
gluster-swift-object-1.4.8-4.el6.noarch


[10/12/12 - 13:17:24 root@rhs-client6 ~]# gluster --version
glusterfs 3.3.0rhsvirt1 built on Oct  8 2012 15:23:00


How reproducible:
----------------
Often

Additional Info:
----------------
Refer to bug 865693. That bug requires a volume sync, and we are unable to perform the sync operation.

Comment 2 Amar Tumballi 2012-10-15 07:43:25 UTC
This is a general glusterd behavior, and not specific to 2.0+ related testing alone.

Comment 3 krishnan parthasarathi 2012-11-15 06:22:45 UTC
Submitted patch at http://review.gluster.org/4188

Comment 4 Amar Tumballi 2012-11-28 03:07:22 UTC
Keeping it in POST to indicate that the patch is in the review process.

Comment 5 Vijay Bellur 2012-11-28 07:28:12 UTC
CHANGE: http://review.gluster.org/4188 (glusterd: volume-sync shouldn't validate volume-id) merged in master by Vijay Bellur (vbellur)

Comment 6 krishnan parthasarathi 2013-02-27 09:35:27 UTC
The issue is still seen. It has reappeared since the following commit: http://review.gluster.com/4570.

Comment 7 Vijay Bellur 2013-03-11 04:02:26 UTC
CHANGE: http://review.gluster.org/4624 (glusterd: Fixed volume-sync in synctask codepath.) merged in master by Vijay Bellur (vbellur)

Comment 8 Gowrishankar Rajaiyan 2013-04-16 11:52:42 UTC
Updating summary since this is a general bug.

Comment 9 krishnan parthasarathi 2013-07-11 05:45:14 UTC
Workflow for using volume sync when a volume configuration has gone out of sync between two nodes (a consolidated shell sketch follows the steps below).

Let the two nodes in the cluster be called Node1 and Node2.
Let us assume Node2 has the 'correct' volume configuration. This is similar to picking the correct copy of data in a split-brain scenario. Administrator's discretion is required.

Node2:
1) gluster peer detach Node1 force

Node1:
2) Check if this node is detached from the cluster using
 #gluster peer status
It should return, "No peers present"

3) Stop glusterd on Node1
 #service glusterd stop

4) rm -rf /var/lib/glusterd/vols/VOLNAME

5) Start glusterd on Node1
 #service glusterd start

Node2:
6) Now, probe Node1 back into the cluster.
 #gluster peer probe Node1
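
The same workflow, collected into a single script for convenience. This is a minimal sketch: the node names, the volume name, and the use of ssh from an admin host are illustrative assumptions, not part of the original procedure.

#!/bin/bash
# Minimal sketch of the comment 9 workflow (assumptions: NODE1 holds the
# stale configuration, NODE2 holds the correct copy, VOLNAME is the
# affected volume, and passwordless ssh to both nodes is available).
NODE1=node1.example.com
NODE2=node2.example.com
VOLNAME=replicate-rhevh

# Step 1 (on Node2): detach the node with the stale configuration.
ssh "$NODE2" "gluster peer detach $NODE1 force"

# Steps 2-5 (on Node1): confirm detachment, stop glusterd, remove the
# stale volume definition, and start glusterd again.
ssh "$NODE1" "gluster peer status"              # expect: No peers present
ssh "$NODE1" "service glusterd stop"
ssh "$NODE1" "rm -rf /var/lib/glusterd/vols/$VOLNAME"
ssh "$NODE1" "service glusterd start"

# Step 6 (on Node2): probe Node1 back in; the volume definition is
# synced to Node1 during the peer handshake.
ssh "$NODE2" "gluster peer probe $NODE1"

As comment 10 below notes, the detach/probe cycle itself re-syncs the volume definitions, so no explicit "gluster volume sync" is needed with this workflow.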

Comment 10 spandura 2013-07-11 07:26:34 UTC
The steps in comment 9 will indeed sync the volume, since we are detaching and then re-attaching the peer.

1. When "gluster peer detach <node> force" is executed, the /var/lib/glusterd/vols directory on <node> is cleaned up.

2. When we then run "gluster peer probe <node>", the volumes are synced to <node>.

With the above steps, we do not need to execute the "gluster volume sync" command at all.

Comment 11 spandura 2013-07-11 11:25:08 UTC
Verified the fix on build: 
~~~~~~~~~~~~~~~~~~~~~~~~~
root@king [Jul-11-2013-16:37:38] >rpm -qa | grep glusterfs-server
glusterfs-server-3.4.0.12rhs.beta3-1.el6rhs.x86_64

root@king [Jul-11-2013-16:37:44] >gluster --version
glusterfs 3.4.0.12rhs.beta3 built on Jul  6 2013 14:35:18

Steps used to verify:
======================
(A consolidated shell sketch of steps 1-8 appears at the end of this comment.)

1. Create two 2x2 distributed-replicate volumes (4 storage nodes: node1, node2, node3 and node4).

2. Stop glusterd on node1 and node3.

3. Set any volume option on both volumes.

4. Stop glusterd on node2 and node4.

5. Start glusterd on node1 and node3.

6. Set any volume option on both volumes.

7. Start glusterd on node2 and node4.

8. Execute "gluster peer status".

Result:
=======
node1 and node3 are in "Peer Rejected" state for node2 and node4. 

node2 and node4 are in "Peer Rejected" state for node1 and node3. 

9. On node1, execute "gluster volume sync <node2> vol1" and "gluster volume sync <node2> vol2". Both commands succeed.

10. "gluster volume info" on node1 now shows the synced volume information. The volume information on node1 is the same as on node2 and node4.

Actual Result:
===============
Even though the volume information has been synced, node1 is still in "Peer Rejected" state for node2 and node4.

Hence, "gluster volume status" on node2 and node4 does not recognize the brick processes on node1.

Additional Info:
=============== 
Restarting glusterd on node1 moves node1 to "Peer in Cluster" state for node2 and node4.

However, executing the volume sync command itself does not move the node from "Peer Rejected" to "Peer in Cluster" state, even after a successful sync.
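
For reference, a consolidated shell sketch of steps 1-8 above. It assumes the two volumes are named vol1 and vol2, that passwordless ssh to the four nodes is available, and that an arbitrary real option (network.ping-timeout) stands in for "set any volume option"; these details are illustrative assumptions, not part of the original verification.

#!/bin/bash
# Drive the volume configuration out of sync across the cluster,
# following steps 2-8 above. Node names, volume names, the chosen
# volume option and the ssh orchestration are illustrative assumptions.
VOLS="vol1 vol2"

set_option() {   # set an arbitrary option on both volumes, issued from node $1
    local node=$1 value=$2
    for vol in $VOLS; do
        ssh "$node" "gluster volume set $vol network.ping-timeout $value"
    done
}

# Step 2: stop glusterd on node1 and node3.
for n in node1 node3; do ssh "$n" "service glusterd stop"; done
# Step 3: set a volume option (issued from node2, which is still up).
set_option node2 20
# Step 4: stop glusterd on node2 and node4.
for n in node2 node4; do ssh "$n" "service glusterd stop"; done
# Step 5: start glusterd on node1 and node3.
for n in node1 node3; do ssh "$n" "service glusterd start"; done
# Step 6: set a different value for the same option (issued from node1).
set_option node1 30
# Step 7: start glusterd on node2 and node4.
for n in node2 node4; do ssh "$n" "service glusterd start"; done
# Step 8: check peer status; "Peer Rejected" is the expected state.
for n in node1 node2 node3 node4; do ssh "$n" "gluster peer status"; done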

Comment 12 spandura 2013-07-11 11:27:26 UTC
Created attachment 772180 [details]
history of command execution on nodes

Comment 13 spandura 2013-07-11 11:28:08 UTC
The bug still exists, hence moving it back to ASSIGNED state.

Comment 14 Nagaprasad Sathyanarayana 2014-05-06 11:43:39 UTC
Dev ack to 3.0 RHS BZs

Comment 16 Vivek Agarwal 2015-03-23 07:39:31 UTC
The product version of Red Hat Storage on which this issue was reported has reached End Of Life (EOL) [1], hence this bug report is being closed. If the issue is still observed on a current version of Red Hat Storage, please file a new bug report on the current version.

[1] https://rhn.redhat.com/errata/RHSA-2014-0821.html
