Description of problem:
Volume information is out of sync in the cluster.

Setup:
Cluster formed from four nodes:
rhs-client6.lab.eng.blr.redhat.com
rhs-client7.lab.eng.blr.redhat.com
rhs-client8.lab.eng.blr.redhat.com
rhs-client9.lab.eng.blr.redhat.com

"gluster volume info <volume-name>" on rhs-client6, rhs-client8 and rhs-client9 shows the information for <volume-name>, but on rhs-client7 the volume does not exist.

Note: The volume was deleted from the cluster, but rhs-client6, rhs-client8 and rhs-client9 were not updated about the deletion of the volume.

Version-Release number of selected component (if applicable):
=============================================================
[10/12/12 - 12:39:51 root@rhs-client6 ~]# gluster --version
glusterfs 3.3.0rhsvirt1 built on Oct 8 2012 15:23:00
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.
[10/12/12 - 12:39:54 root@rhs-client6 ~]#

[10/12/12 - 12:39:28 root@rhs-client6 ~]# rpm -qa | grep gluster
glusterfs-geo-replication-3.3.0rhsvirt1-7.el6rhs.x86_64
vdsm-gluster-4.9.6-14.el6rhs.noarch
gluster-swift-plugin-1.0-5.noarch
gluster-swift-container-1.4.8-4.el6.noarch
org.apache.hadoop.fs.glusterfs-glusterfs-0.20.2_0.2-1.noarch
glusterfs-3.3.0rhsvirt1-7.el6rhs.x86_64
glusterfs-server-3.3.0rhsvirt1-7.el6rhs.x86_64
glusterfs-rdma-3.3.0rhsvirt1-7.el6rhs.x86_64
gluster-swift-proxy-1.4.8-4.el6.noarch
gluster-swift-account-1.4.8-4.el6.noarch
gluster-swift-doc-1.4.8-4.el6.noarch
glusterfs-fuse-3.3.0rhsvirt1-7.el6rhs.x86_64
glusterfs-debuginfo-3.3.0rhsvirt1-7.el6rhs.x86_64
gluster-swift-1.4.8-4.el6.noarch
gluster-swift-object-1.4.8-4.el6.noarch

Actual results:
===============
Client-6:
=========
[10/12/12 - 12:42:01 root@rhs-client6 ~]# gluster volume info replicate-rhevh

Volume Name: replicate-rhevh
Type: Replicate
Volume ID: 89b7b672-63c3-41c9-a4db-6939e3f20f3c
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: rhs-client6.lab.eng.blr.redhat.com:/disk2
Brick2: rhs-client7.lab.eng.blr.redhat.com:/disk2

[10/12/12 - 12:42:14 root@rhs-client6 ~]# gluster volume info | grep replicate-rhevh
Volume Name: replicate-rhevh
Volume Name: replicate-rhevh2

[10/12/12 - 12:43:22 root@rhs-client6 ~]# gluster volume status replicate-rhevh
Volume replicate-rhevh does not exist

[10/12/12 - 12:50:52 root@rhs-client6 tmp]# ps -eaf | grep glusterfsd | grep replicate
root 18307 1 0 12:05 ? 00:00:00 /usr/sbin/glusterfsd -s localhost --volfile-id replicate-rhevh.rhs-client6.lab.eng.blr.redhat.com.disk2 -p /var/lib/glusterd/vols/replicate-rhevh/run/rhs-client6.lab.eng.blr.redhat.com-disk2.pid -S /tmp/e8a7646941763241021808ad1c937947.socket --brick-name /disk2 -l /var/log/glusterfs/bricks/disk2.log --xlator-option *-posix.glusterd-uuid=9a3167f5-6050-4291-bdc5-96be6ee740c4 --brick-port 24014 --xlator-option replicate-rhevh-server.listen-port=24014
root 18313 1 0 12:05 ? 00:00:00 /usr/sbin/glusterfsd -s localhost --volfile-id replicate-rhevh2.rhs-client6.lab.eng.blr.redhat.com.replicate-disk -p /var/lib/glusterd/vols/replicate-rhevh2/run/rhs-client6.lab.eng.blr.redhat.com-replicate-disk.pid -S /tmp/37182cbe060e1cee60a593871c8ad75c.socket --brick-name /replicate-disk -l /var/log/glusterfs/bricks/replicate-disk.log --xlator-option *-posix.glusterd-uuid=9a3167f5-6050-4291-bdc5-96be6ee740c4 --brick-port 24011 --xlator-option replicate-rhevh2-server.listen-port=24011
[10/12/12 - 12:50:56 root@rhs-client6 tmp]#

Client-7:
=========
[10/12/12 - 12:41:09 root@rhs-client7 ~]# gluster volume info replicate-rhevh
Volume replicate-rhevh does not exist

[10/12/12 - 12:41:22 root@rhs-client7 ~]# gluster volume info | grep replicate-rhevh
Volume Name: replicate-rhevh2

[10/12/12 - 12:42:34 root@rhs-client7 ~]# gluster volume status replicate-rhevh
Volume replicate-rhevh does not exist

[10/12/12 - 12:50:00 root@rhs-client7 tar]# ps -eaf | grep glusterfsd | grep replicate
root 24995 1 0 12:05 ? 00:00:02 /usr/sbin/glusterfsd -s localhost --volfile-id replicate-rhevh2.rhs-client7.lab.eng.blr.redhat.com.replicate-disk -p /var/lib/glusterd/vols/replicate-rhevh2/run/rhs-client7.lab.eng.blr.redhat.com-replicate-disk.pid -S /tmp/34ce168cca1ffd0f64c69b974431b3a4.socket --brick-name /replicate-disk -l /var/log/glusterfs/bricks/replicate-disk.log --xlator-option *-posix.glusterd-uuid=b9d6cb21-051f-4791-9476-734856e77fbf --brick-port 24013 --xlator-option replicate-rhevh2-server.listen-port=24013
[10/12/12 - 12:50:03 root@rhs-client7 tar]#

Note: Client-8 and Client-9 have the same information as Client-6.
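One quick way to confirm the out-of-sync state is to compare each node's persisted volume store under /var/lib/glusterd/vols. This is only a sketch, assuming passwordless ssh as root to the four nodes in this setup; it is not part of the original reproduction steps.

# List the persisted volume directories on every peer; a volume directory
# present on some nodes but missing on others indicates stale glusterd state.
for node in rhs-client6 rhs-client7 rhs-client8 rhs-client9; do
    echo "== ${node}.lab.eng.blr.redhat.com =="
    ssh root@${node}.lab.eng.blr.redhat.com 'ls /var/lib/glusterd/vols/'
done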
Created attachment 625839 [details] sosreports and /var/lib/glusterd directory files
This is a general behavior/bug in glusterd; it is not specific to 2.0+.
*** Bug 865406 has been marked as a duplicate of this bug. ***
http://review.gluster.org/4188
Marking ON_QA for the rhs-2.1.0 flag. Let us know which RHS 2.0.z update needs this fix.
Updating summary since this is a general bug.
Verified the fix on the build:
==============================
glusterfs 3.4.0.22rhs built on Aug 23 2013 01:58:42

"volume sync" command scenarios:
========================================
a. gluster volume sync <hostname>
b. gluster volume sync <hostname> all
c. gluster volume sync <hostname> <volume_name>
d. gluster volume sync <localhost>

The following cases test the above 4 scenarios.

==============================================================================
Case 1: (1 x 2 replicate volumes, 2 storage nodes)
==============================================================================
1. Start glusterd on both storage nodes.

2. Peer probe storage_node2 (from storage_node1).

3. On storage_node1 execute:

for i in `seq 1 5`; do
    gluster v create "vol_rep_$i" replica 2 storage_node1:/rhs/bricks/vol_rep_${i}_b0 storage_node2:/rhs/bricks/vol_rep_${i}_b1 --mode=script
    gluster v set "vol_rep_$i" self-heal-daemon off
    gluster v info "vol_rep_$i"
    gluster v start "vol_rep_$i"
done

4. killall glusterfsd glusterd glusterfs on storage_node2.

5. On storage_node1 execute:

rm -rf /rhs/bricks/*

for i in `seq 1 5`; do
    gluster v stop vol_rep_${i} --mode=script
    gluster v delete vol_rep_${i} --mode=script
    gluster v create vol_rep_${i} replica 2 storage_node1:/rhs/bricks/vol_rep_${i}_b0 storage_node1:/rhs/bricks/vol_rep_${i}_b1 --mode=script
    gluster v set vol_rep_${i} self-heal-daemon on
    gluster v info vol_rep_${i}
    gluster v start vol_rep_${i}
done

6. Restart glusterd on storage_node2 (service glusterd start).

7. Check the peer status from both nodes. (Both nodes will be in "Peer Rejected" state for each other.)

8. From storage_node2 execute:
+++++++++++++++++++++++++++++++++++
a. "gluster volume sync <storage_node1> vol_rep_1"
   Expected: volume 'vol_rep_1' information should be synced from storage_node1 to storage_node2.
   Actual: as expected

b. "gluster volume sync <storage_node1> all"
   Expected: all volumes' information should be synced from storage_node1 to storage_node2.
   Actual: as expected

c. "gluster volume sync <storage_node2>"
   Expected: "volume sync: failed: sync from localhost not allowed"
   Actual: as expected

==============================================================================
Case 2: (1 x 2 replicate volumes, 2 storage nodes)
==============================================================================
1. Start glusterd on both storage nodes.

2. Peer probe storage_node2 (from storage_node1).

3. On storage_node1 execute:

for i in `seq 1 5`; do
    gluster v create "vol_rep_$i" replica 2 storage_node1:/rhs/bricks/vol_rep_${i}_b0 storage_node2:/rhs/bricks/vol_rep_${i}_b1 --mode=script
    gluster v set "vol_rep_$i" self-heal-daemon off
    gluster v info "vol_rep_$i"
    gluster v start "vol_rep_$i"
done

4. killall glusterfsd glusterd glusterfs on storage_node2.

5. On storage_node1 execute:

rm -rf /rhs/bricks/*

for i in `seq 1 5`; do
    gluster v stop vol_rep_${i} --mode=script
    gluster v delete vol_rep_${i} --mode=script
    gluster v create vol_rep_${i} replica 2 storage_node1:/rhs/bricks/vol_rep_${i}_b0 storage_node1:/rhs/bricks/vol_rep_${i}_b1 --mode=script
    gluster v set vol_rep_${i} self-heal-daemon on
    gluster v info vol_rep_${i}
    gluster v start vol_rep_${i}
done

6. Restart glusterd on storage_node2 (service glusterd start).

7. Check the peer status from both nodes. (Both nodes will be in "Peer Rejected" state for each other.)

8. From storage_node1 execute:
++++++++++++++++++++++++++++++++++++
a. "gluster volume sync <storage_node2> vol_rep_1"
   Expected: volume 'vol_rep_1' information should be synced from storage_node2 to storage_node1.
   Actual: as expected

b. "gluster volume sync <storage_node2>"
   Expected: all volumes' information should be synced from storage_node2 to storage_node1.
   Actual: as expected

==============================================================================
Note:
==============================================================================
The above 2 cases verify this bug. However, the peers remain in "Peer Rejected" state and the volumes are not restarted. That issue is tracked in bug https://bugzilla.redhat.com/show_bug.cgi?id=865700

Moving this bug from ON_QA to Verified state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html