865700 – "gluster volume sync" command not working as expected

Bug 865700 - "gluster volume sync" command not working as expected

Summary: "gluster volume sync" command not working as expected

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	glusterd
Sub Component:
Version:	2.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Bug Updates Notification Mailing List
QA Contact:	spandura
Docs Contact:
URL:
Whiteboard:	glusterd
Depends On:	950048
Blocks:
TreeView+	depends on / blocked

Reported:	2012-10-12 07:49 UTC by spandura
Modified:	2015-05-14 03:26 UTC (History)
CC List:	7 users (show)
Fixed In Version:	glusterfs-3.4.0.1rhs-1
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Clones:	950048 (view as bug list)
Environment:
Last Closed:
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
history of command execution on nodes (4.44 KB, application/x-gzip) 2013-07-11 11:27 UTC, spandura	no flags	Details
View All

Description spandura 2012-10-12 07:49:37 UTC

Description of problem:
------------------------
"gluster volume sync <hostname> <volume_name>" doesn't sync the volume information to the hosts in peer. 

For all the sync command options the output is always "please delete all the volume before full sync" . Deleting all volumes is not ideal at all. 

[10/12/12 - 13:02:19 root@rhs-client6 ~]# gluster volume sync client-6 
please delete all the volumes before full sync

[10/12/12 - 13:02:42 root@rhs-client6 ~]# gluster volume sync client-6 all
please delete all the volumes before full sync

[10/12/12 - 13:02:44 root@rhs-client6 ~]# gluster volume sync client-6 replicate-rhevh
please delete the volume: replicate-rhevh before sync


Version-Release number of selected component (if applicable):
-------------------------------------------------------------
[10/12/12 - 13:17:17 root@rhs-client6 ~]# rpm -qa | grep gluster
glusterfs-geo-replication-3.3.0rhsvirt1-7.el6rhs.x86_64
vdsm-gluster-4.9.6-14.el6rhs.noarch
gluster-swift-plugin-1.0-5.noarch
gluster-swift-container-1.4.8-4.el6.noarch
org.apache.hadoop.fs.glusterfs-glusterfs-0.20.2_0.2-1.noarch
glusterfs-3.3.0rhsvirt1-7.el6rhs.x86_64
glusterfs-server-3.3.0rhsvirt1-7.el6rhs.x86_64
glusterfs-rdma-3.3.0rhsvirt1-7.el6rhs.x86_64
gluster-swift-proxy-1.4.8-4.el6.noarch
gluster-swift-account-1.4.8-4.el6.noarch
gluster-swift-doc-1.4.8-4.el6.noarch
glusterfs-fuse-3.3.0rhsvirt1-7.el6rhs.x86_64
glusterfs-debuginfo-3.3.0rhsvirt1-7.el6rhs.x86_64
gluster-swift-1.4.8-4.el6.noarch
gluster-swift-object-1.4.8-4.el6.noarch


[10/12/12 - 13:17:24 root@rhs-client6 ~]# gluster --version
glusterfs 3.3.0rhsvirt1 built on Oct  8 2012 15:23:00


How reproducible:
----------------
Often

Additional Info:-
----------------
Refer to Bug865693 . The bug 865693 needs volume sync and we are unable to perform sync operation.

Comment 2 Amar Tumballi 2012-10-15 07:43:25 UTC

this is a behavior of glusterd in general, and not very specific to 2.0+ related testing alone.

Comment 3 krishnan parthasarathi 2012-11-15 06:22:45 UTC

Submitted patch at http://review.gluster.org/4188

Comment 4 Amar Tumballi 2012-11-28 03:07:22 UTC

keeping it in POST for indicating that the patch is in review process.

Comment 5 Vijay Bellur 2012-11-28 07:28:12 UTC

CHANGE: http://review.gluster.org/4188 (glusterd: volume-sync shouldn't validate volume-id) merged in master by Vijay Bellur (vbellur)

Comment 6 krishnan parthasarathi 2013-02-27 09:35:27 UTC

The issue is still seen. This has come about since the following commit - http://review.gluster.com/4570.

Comment 7 Vijay Bellur 2013-03-11 04:02:26 UTC

CHANGE: http://review.gluster.org/4624 (glusterd: Fixed volume-sync in synctask codepath.) merged in master by Vijay Bellur (vbellur)

Comment 8 Gowrishankar Rajaiyan 2013-04-16 11:52:42 UTC

Updating summary since this is a general bug.

Comment 9 krishnan parthasarathi 2013-07-11 05:45:14 UTC

Workflow for using volume sync, when a volume configuration is gone out of sync in 2 nodes.

Let the two nodes in the cluster be called Node1 and Node2.
Let us assume Node2 has the 'correct' volume configuration. This is similar to picking the correct copy of data in a split-brain scenario. Administrator's discretion is required.

Node2:
1) gluster peer detach Node1 force

Node1:
2) Check if this node is detached from the cluster using
 #gluster peer status
It should return, "No peers present"

3) Stop glusterd on Node1
 #service glusterd stop

4) rm -rf /var/lib/glusterd/vols/VOLNAME

5)Start glusterd on Node1
 #service glusterd start

Node2:
6) Now, probe Node1 back into the cluster.
 #gluster peer probe Node1

Comment 10 spandura 2013-07-11 07:26:34 UTC

The steps in the comment 9 will however sync the volume since we are detaching the peer and re-attaching the peer . 

1. when "gluster peer detach <node> force" is executed, the /var/lib/glusterd/vols directory on <node> are cleaned up. 

2. when we do peer probe <node>, the volumes are  however synced to <node>. 

With the above steps we need not have to execute "gluster volume sync" command.

Comment 11 spandura 2013-07-11 11:25:08 UTC

Verified the fix on build: 
~~~~~~~~~~~~~~~~~~~~~~~~~
root@king [Jul-11-2013-16:37:38] >rpm -qa | grep glusterfs-server
glusterfs-server-3.4.0.12rhs.beta3-1.el6rhs.x86_64

root@king [Jul-11-2013-16:37:44] >gluster --version
glusterfs 3.4.0.12rhs.beta3 built on Jul  6 2013 14:35:18

Steps used to verify:
======================
1. Create 2 2x2 distribute-replicate volume ( 4 storage nodes : node1, node2, node3 and node4 )

2. Stop glusterd's on node1 and node3. 

3. set any volume option for both the volumes

4. Stop glusterd's  on node2 and node4. 

5. Start glusterd's on node1 and node3. 

6. Set any volume option for both the volumes. 

7. Start glusterd's on node2 and node4. 

8. execute : "gluster peer status"

Result:
=======
node1 and node3 are in "Peer Rejected" state for node2 and node4. 

node2 and node4 are in "Peer Rejected" state for node1 and node3. 

9. On node1 execute : "gluster volume sync <node2> vol1" , "gluster volume sync <node2> vol2"  . This is successful.

10. "gluster volume info" on node1 now has the synced volume information. The volume information on node1 is same as volume information on node2 and node4. 

Actual Result:
===============
Even though the volume information has been synced, the node1 is still in "Peer Rejected" state for Node2 and Node4.

Hence the "gluster volume status" on node2 and node4 doesn't recognize the brick process on node1. 

Additional Info:
=============== 
Restarting glusterd on node1 will move node1 to "Peer in Cluster" state for node2 and node4. 

But volume sync command execution itself doesn't move the node from "Peer Rejected" state to "Peer in Cluster" state even after successful sync.

Comment 12 spandura 2013-07-11 11:27:26 UTC

Created attachment 772180 [details]
history of command execution on nodes

History of command execution on nodes

Comment 13 spandura 2013-07-11 11:28:08 UTC

The bug still exist and hence moving it to assigned state.

Comment 14 Nagaprasad Sathyanarayana 2014-05-06 11:43:39 UTC

Dev ack to 3.0 RHS BZs

Comment 16 Vivek Agarwal 2015-03-23 07:39:31 UTC

The product version of Red Hat Storage on which this issue was reported has reached End Of Life (EOL) [1], hence this bug report is being closed. If the issue is still observed on a current version of Red Hat Storage, please file a new bug report on the current version.







[1] https://rhn.redhat.com/errata/RHSA-2014-0821.html

Comment 17 Vivek Agarwal 2015-03-23 07:40:19 UTC

The product version of Red Hat Storage on which this issue was reported has reached End Of Life (EOL) [1], hence this bug report is being closed. If the issue is still observed on a current version of Red Hat Storage, please file a new bug report on the current version.







[1] https://rhn.redhat.com/errata/RHSA-2014-0821.html

Note You need to log in before you can comment on or make changes to this bug.