Bug 865700 - "gluster volume sync" command not working as expected
Status: CLOSED WONTFIX
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterd
Version: 2.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Assigned To: Bug Updates Notification Mailing List
spandura
glusterd
Depends On: 950048
Reported: 2012-10-12 03:49 EDT by spandura
Modified: 2015-05-13 23:26 EDT
Fixed In Version: glusterfs-3.4.0.1rhs-1
Doc Type: Bug Fix
Clones: 950048
Type: Bug


Attachments
history of command execution on nodes (4.44 KB, application/x-gzip)
2013-07-11 07:27 EDT, spandura

Description spandura 2012-10-12 03:49:37 EDT
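Description of problem:
------------------------
"gluster volume sync <hostname> <volume_name>" does not sync the volume information among the peers in the cluster.

Every form of the sync command fails with a "please delete ... before sync" message: "please delete all the volumes before full sync" for a full sync, and "please delete the volume: <volume_name> before sync" for a single volume. Having to delete volumes in order to sync them is not acceptable.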

[10/12/12 - 13:02:19 root@rhs-client6 ~]# gluster volume sync client-6 
please delete all the volumes before full sync

[10/12/12 - 13:02:42 root@rhs-client6 ~]# gluster volume sync client-6 all
please delete all the volumes before full sync

[10/12/12 - 13:02:44 root@rhs-client6 ~]# gluster volume sync client-6 replicate-rhevh
please delete the volume: replicate-rhevh before sync


Version-Release number of selected component (if applicable):
-------------------------------------------------------------
[10/12/12 - 13:17:17 root@rhs-client6 ~]# rpm -qa | grep gluster
glusterfs-geo-replication-3.3.0rhsvirt1-7.el6rhs.x86_64
vdsm-gluster-4.9.6-14.el6rhs.noarch
gluster-swift-plugin-1.0-5.noarch
gluster-swift-container-1.4.8-4.el6.noarch
org.apache.hadoop.fs.glusterfs-glusterfs-0.20.2_0.2-1.noarch
glusterfs-3.3.0rhsvirt1-7.el6rhs.x86_64
glusterfs-server-3.3.0rhsvirt1-7.el6rhs.x86_64
glusterfs-rdma-3.3.0rhsvirt1-7.el6rhs.x86_64
gluster-swift-proxy-1.4.8-4.el6.noarch
gluster-swift-account-1.4.8-4.el6.noarch
gluster-swift-doc-1.4.8-4.el6.noarch
glusterfs-fuse-3.3.0rhsvirt1-7.el6rhs.x86_64
glusterfs-debuginfo-3.3.0rhsvirt1-7.el6rhs.x86_64
gluster-swift-1.4.8-4.el6.noarch
gluster-swift-object-1.4.8-4.el6.noarch


[10/12/12 - 13:17:24 root@rhs-client6 ~]# gluster --version
glusterfs 3.3.0rhsvirt1 built on Oct  8 2012 15:23:00


How reproducible:
----------------
Often

Additional Info:-
----------------
Refer to bug 865693. That bug requires a volume sync, and we are unable to perform the sync operation.
Comment 2 Amar Tumballi 2012-10-15 03:43:25 EDT
This is general glusterd behavior and is not specific to 2.0+ testing.
Comment 3 krishnan parthasarathi 2012-11-15 01:22:45 EST
Submitted patch at http://review.gluster.org/4188
Comment 4 Amar Tumballi 2012-11-27 22:07:22 EST
Keeping it in POST to indicate that the patch is under review.
Comment 5 Vijay Bellur 2012-11-28 02:28:12 EST
CHANGE: http://review.gluster.org/4188 (glusterd: volume-sync shouldn't validate volume-id) merged in master by Vijay Bellur (vbellur@redhat.com)
Comment 6 krishnan parthasarathi 2013-02-27 04:35:27 EST
The issue is still seen. It has reappeared since the following commit: http://review.gluster.com/4570.
Comment 7 Vijay Bellur 2013-03-11 00:02:26 EDT
CHANGE: http://review.gluster.org/4624 (glusterd: Fixed volume-sync in synctask codepath.) merged in master by Vijay Bellur (vbellur@redhat.com)
Comment 8 Gowrishankar Rajaiyan 2013-04-16 07:52:42 EDT
Updating summary since this is a general bug.
Comment 9 krishnan parthasarathi 2013-07-11 01:45:14 EDT
Workflow for using volume sync when a volume configuration has gone out of sync between two nodes.

Let the two nodes in the cluster be called Node1 and Node2.
Let us assume Node2 has the 'correct' volume configuration. This is similar to picking the correct copy of data in a split-brain scenario; the administrator's discretion is required.

Node2:
1) Detach Node1 from the cluster.
 #gluster peer detach Node1 force

Node1:
2) Check that this node is detached from the cluster.
 #gluster peer status
It should return "No peers present".

3) Stop glusterd on Node1.
 #service glusterd stop

4) Remove the stale volume configuration.
 #rm -rf /var/lib/glusterd/vols/VOLNAME

5) Start glusterd on Node1.
 #service glusterd start

Node2:
6) Now, probe Node1 back into the cluster.
 #gluster peer probe Node1
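
The six steps above could be scripted roughly as follows. This is only a sketch under stated assumptions: the function names run_on and resync_stale_node, the use of ssh as root to reach each node, and the dry-run switch DRY_RUN are all illustrative and not part of the original workflow. By default it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch of the resync workflow described above. Dry-run by default:
# commands are printed, not executed. Set DRY_RUN=0 to really run them.
# run_on, resync_stale_node and the ssh-based remote execution are
# assumptions made for illustration.

DRY_RUN="${DRY_RUN:-1}"

# Run a command on a host, or just print it in dry-run mode.
run_on() {
    local host="$1"; shift
    if [ "$DRY_RUN" = "1" ]; then
        echo "[$host] $*"
    else
        ssh "root@$host" "$@"
    fi
}

# resync_stale_node GOOD_NODE STALE_NODE VOLNAME
resync_stale_node() {
    local good="$1" stale="$2" vol="$3"
    # Refuse empty arguments: an empty VOLNAME would make the rm below
    # remove the whole vols directory.
    if [ -z "$good" ] || [ -z "$stale" ] || [ -z "$vol" ]; then
        echo "usage: resync_stale_node GOOD_NODE STALE_NODE VOLNAME" >&2
        return 2
    fi
    run_on "$good"  gluster peer detach "$stale" force       || return 1  # step 1
    run_on "$stale" gluster peer status                      || return 1  # step 2
    run_on "$stale" service glusterd stop                    || return 1  # step 3
    run_on "$stale" rm -rf "/var/lib/glusterd/vols/$vol"     || return 1  # step 4
    run_on "$stale" service glusterd start                   || return 1  # step 5
    run_on "$good"  gluster peer probe "$stale"              || return 1  # step 6
}
```

For example, "resync_stale_node Node2 Node1 VOLNAME" in dry-run mode prints one line per step, prefixed with the node it would run on. The output of step 2 still has to be checked by hand for "No peers present" before continuing.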
Comment 10 spandura 2013-07-11 03:26:34 EDT
The steps in comment 9 will sync the volume anyway, since we are detaching and then re-attaching the peer.

1. When "gluster peer detach <node> force" is executed, the /var/lib/glusterd/vols directory on <node> is cleaned up.

2. When we run "gluster peer probe <node>", the volumes are synced to <node>.

With the above steps there is no need to execute the "gluster volume sync" command.
Comment 11 spandura 2013-07-11 07:25:08 EDT
Verified the fix on build: 
~~~~~~~~~~~~~~~~~~~~~~~~~
root@king [Jul-11-2013-16:37:38] >rpm -qa | grep glusterfs-server
glusterfs-server-3.4.0.12rhs.beta3-1.el6rhs.x86_64

root@king [Jul-11-2013-16:37:44] >gluster --version
glusterfs 3.4.0.12rhs.beta3 built on Jul  6 2013 14:35:18

Steps used to verify:
======================
1. Create two 2x2 distribute-replicate volumes (4 storage nodes: node1, node2, node3 and node4).

2. Stop glusterd on node1 and node3.

3. Set any volume option for both the volumes.

4. Stop glusterd on node2 and node4.

5. Start glusterd on node1 and node3.

6. Set any volume option for both the volumes.

7. Start glusterd on node2 and node4.

8. Execute "gluster peer status".

Result:
=======
node1 and node3 are in "Peer Rejected" state for node2 and node4. 

node2 and node4 are in "Peer Rejected" state for node1 and node3. 

9. On node1, execute "gluster volume sync <node2> vol1" and "gluster volume sync <node2> vol2". Both commands succeed.

10. "gluster volume info" on node1 now shows the synced volume information. The volume information on node1 is the same as that on node2 and node4.

Actual Result:
===============
Even though the volume information has been synced, node1 is still in "Peer Rejected" state for node2 and node4.

Hence "gluster volume status" on node2 and node4 doesn't recognize the brick process on node1.

Additional Info:
===============
Restarting glusterd on node1 moves node1 to "Peer in Cluster" state for node2 and node4.

However, executing the volume sync command does not by itself move the node from "Peer Rejected" to "Peer in Cluster" state, even after a successful sync.
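
To check for the leftover "Peer Rejected" state after a sync, something like the following could be used. It is a sketch only: the function name rejected_peers is hypothetical, and it assumes that "gluster peer status" prints one "Hostname: ..." line followed later by a "State: ..." line for each peer. It reads the status text on stdin, so it can be tried against saved output.

```shell
#!/usr/bin/env bash
# Sketch: list the peers that "gluster peer status" reports as
# "Peer Rejected". Reads the peer-status text on stdin.
# Assumes "Hostname: <name>" / "State: <state>" lines per peer.

rejected_peers() {
    awk '
        /^Hostname:/                  { host = $2 }    # remember current peer
        /^State:/ && /Peer Rejected/  { print host }   # report it if rejected
    '
}
```

Typical use would be "gluster peer status | rejected_peers" on each node. If it prints any hostname after a successful sync, restarting glusterd on the synced node ("service glusterd restart") is the workaround observed above.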
Comment 12 spandura 2013-07-11 07:27:26 EDT
Created attachment 772180
history of command execution on nodes
Comment 13 spandura 2013-07-11 07:28:08 EDT
The bug still exists; hence moving it to ASSIGNED state.
Comment 14 Nagaprasad Sathyanarayana 2014-05-06 07:43:39 EDT
Dev ack to 3.0 RHS BZs
Comment 16 Vivek Agarwal 2015-03-23 03:39:31 EDT
The product version of Red Hat Storage on which this issue was reported has reached End Of Life (EOL) [1], hence this bug report is being closed. If the issue is still observed on a current version of Red Hat Storage, please file a new bug report on the current version.

[1] https://rhn.redhat.com/errata/RHSA-2014-0821.html
Comment 17 Vivek Agarwal 2015-03-23 03:40:19 EDT
The product version of Red Hat Storage on which this issue was reported has reached End Of Life (EOL) [1], hence this bug report is being closed. If the issue is still observed on a current version of Red Hat Storage, please file a new bug report on the current version.

[1] https://rhn.redhat.com/errata/RHSA-2014-0821.html
