Bug 865693

Summary: volume information is out of sync in the cluster
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Rahul Hinduja <rhinduja>
Component: glusterd
Assignee: krishnan parthasarathi <kparthas>
Status: CLOSED ERRATA
QA Contact: spandura
Severity: urgent
Priority: medium
Version: 2.0
CC: amarts, grajaiya, nsathyan, rhs-bugs, shaines, spandura, vbellur
Hardware: x86_64
OS: Linux
Fixed In Version: glusterfs-3.4.0qa6
Doc Type: Bug Fix
Last Closed: 2013-09-23 22:39:16 UTC
Type: Bug
Attachments: sosreports and /var/lib/glusterd directory files

Description Rahul Hinduja 2012-10-12 07:21:13 UTC
Description of problem:

volume information is out of sync in the cluster.

Setup: Cluster formed from four nodes

rhs-client6.lab.eng.blr.redhat.com
rhs-client7.lab.eng.blr.redhat.com
rhs-client8.lab.eng.blr.redhat.com
rhs-client9.lab.eng.blr.redhat.com

"gluster volume info <volume-name>" on rhs-client6,rhs-client8,and rhs-client9 shows information  of <volume-name>. But rhs-client7 volume doesn't exist.

Note: The volume was deleted from the cluster, but client-6, client-8, and client-9 were not updated about the deletion of the volume.
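
One quick way to see the inconsistency is to query every peer for the same volume. The loop below is only a sketch (it assumes password-less ssh from the node it is run on; the host names are the ones from this setup):

for h in rhs-client6 rhs-client7 rhs-client8 rhs-client9; do
    echo "== $h =="
    ssh $h.lab.eng.blr.redhat.com "gluster volume info <volume-name>"
done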

Version-Release number of selected component (if applicable):
=============================================================
[10/12/12 - 12:39:51 root@rhs-client6 ~]# gluster --version 
glusterfs 3.3.0rhsvirt1 built on Oct  8 2012 15:23:00
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.
[10/12/12 - 12:39:54 root@rhs-client6 ~]# 


[10/12/12 - 12:39:28 root@rhs-client6 ~]# rpm -qa | grep gluster
glusterfs-geo-replication-3.3.0rhsvirt1-7.el6rhs.x86_64
vdsm-gluster-4.9.6-14.el6rhs.noarch
gluster-swift-plugin-1.0-5.noarch
gluster-swift-container-1.4.8-4.el6.noarch
org.apache.hadoop.fs.glusterfs-glusterfs-0.20.2_0.2-1.noarch
glusterfs-3.3.0rhsvirt1-7.el6rhs.x86_64
glusterfs-server-3.3.0rhsvirt1-7.el6rhs.x86_64
glusterfs-rdma-3.3.0rhsvirt1-7.el6rhs.x86_64
gluster-swift-proxy-1.4.8-4.el6.noarch
gluster-swift-account-1.4.8-4.el6.noarch
gluster-swift-doc-1.4.8-4.el6.noarch
glusterfs-fuse-3.3.0rhsvirt1-7.el6rhs.x86_64
glusterfs-debuginfo-3.3.0rhsvirt1-7.el6rhs.x86_64
gluster-swift-1.4.8-4.el6.noarch
gluster-swift-object-1.4.8-4.el6.noarch


Actual results:
===============

Client-6:
=========

[10/12/12 - 12:42:01 root@rhs-client6 ~]# gluster volume info replicate-rhevh
 
Volume Name: replicate-rhevh
Type: Replicate
Volume ID: 89b7b672-63c3-41c9-a4db-6939e3f20f3c
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: rhs-client6.lab.eng.blr.redhat.com:/disk2
Brick2: rhs-client7.lab.eng.blr.redhat.com:/disk2

[10/12/12 - 12:42:14 root@rhs-client6 ~]# gluster volume info | grep replicate-rhevh
Volume Name: replicate-rhevh
Volume Name: replicate-rhevh2


[10/12/12 - 12:43:22 root@rhs-client6 ~]# gluster volume status replicate-rhevh
Volume replicate-rhevh does not exist


[10/12/12 - 12:50:52 root@rhs-client6 tmp]# ps -eaf | grep glusterfsd | grep replicate
root     18307     1  0 12:05 ?        00:00:00 /usr/sbin/glusterfsd -s localhost --volfile-id replicate-rhevh.rhs-client6.lab.eng.blr.redhat.com.disk2 -p /var/lib/glusterd/vols/replicate-rhevh/run/rhs-client6.lab.eng.blr.redhat.com-disk2.pid -S /tmp/e8a7646941763241021808ad1c937947.socket --brick-name /disk2 -l /var/log/glusterfs/bricks/disk2.log --xlator-option *-posix.glusterd-uuid=9a3167f5-6050-4291-bdc5-96be6ee740c4 --brick-port 24014 --xlator-option replicate-rhevh-server.listen-port=24014
root     18313     1  0 12:05 ?        00:00:00 /usr/sbin/glusterfsd -s localhost --volfile-id replicate-rhevh2.rhs-client6.lab.eng.blr.redhat.com.replicate-disk -p /var/lib/glusterd/vols/replicate-rhevh2/run/rhs-client6.lab.eng.blr.redhat.com-replicate-disk.pid -S /tmp/37182cbe060e1cee60a593871c8ad75c.socket --brick-name /replicate-disk -l /var/log/glusterfs/bricks/replicate-disk.log --xlator-option *-posix.glusterd-uuid=9a3167f5-6050-4291-bdc5-96be6ee740c4 --brick-port 24011 --xlator-option replicate-rhevh2-server.listen-port=24011
[10/12/12 - 12:50:56 root@rhs-client6 tmp]# 




Client-7
========

[10/12/12 - 12:41:09 root@rhs-client7 ~]# gluster volume info replicate-rhevh
Volume replicate-rhevh does not exist

[10/12/12 - 12:41:22 root@rhs-client7 ~]# gluster volume info | grep replicate-rhevh
Volume Name: replicate-rhevh2

[10/12/12 - 12:42:34 root@rhs-client7 ~]# gluster volume status replicate-rhevh
Volume replicate-rhevh does not exist


[10/12/12 - 12:50:00 root@rhs-client7 tar]# ps -eaf | grep glusterfsd | grep replicate
root     24995     1  0 12:05 ?        00:00:02 /usr/sbin/glusterfsd -s localhost --volfile-id replicate-rhevh2.rhs-client7.lab.eng.blr.redhat.com.replicate-disk -p /var/lib/glusterd/vols/replicate-rhevh2/run/rhs-client7.lab.eng.blr.redhat.com-replicate-disk.pid -S /tmp/34ce168cca1ffd0f64c69b974431b3a4.socket --brick-name /replicate-disk -l /var/log/glusterfs/bricks/replicate-disk.log --xlator-option *-posix.glusterd-uuid=b9d6cb21-051f-4791-9476-734856e77fbf --brick-port 24013 --xlator-option replicate-rhevh2-server.listen-port=24013
[10/12/12 - 12:50:03 root@rhs-client7 tar]# 


Note: Client-8 and client-9 show the same information as client-6.

Comment 1 Rahul Hinduja 2012-10-12 07:31:59 UTC
Created attachment 625839 [details]
sosreports and /var/lib/glusterd directory files

Comment 3 Amar Tumballi 2012-10-15 07:44:01 UTC
This is a general glusterd behavior/bug, not specific to 2.0+.

Comment 4 Amar Tumballi 2012-10-19 04:56:38 UTC
*** Bug 865406 has been marked as a duplicate of this bug. ***

Comment 5 Amar Tumballi 2012-11-29 10:13:15 UTC
http://review.gluster.org/4188

Comment 6 Amar Tumballi 2012-12-26 09:13:50 UTC
Marking ON_QA for the rhs-2.1.0 flag. Let us know which update of RHS 2.0.z needs this fix.

Comment 7 Gowrishankar Rajaiyan 2013-04-16 11:46:52 UTC
Updating summary since this is a general bug.

Comment 8 spandura 2013-08-26 08:59:54 UTC
Verified the fix on the build:
==============================
glusterfs 3.4.0.22rhs built on Aug 23 2013 01:58:42

"volume sync command scenarios" 
========================================
a. gluster volume sync <hostname> 
b. gluster volume sync <hostname> all 
c. gluster volume sync <hostname> <volume_name>
d. gluster volume sync <localhost> 
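
For example, with the placeholder peer and volume names used in the cases below, scenarios (b) and (c) would be run from storage_node2 as:

gluster volume sync storage_node1 all
gluster volume sync storage_node1 vol_rep_1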

The following cases test the above 4 scenarios.

==============================================================================
Case 1: (1 x 2 replicate volumes, 2 storage nodes)
==============================================================================
1. start glusterd on both the storage nodes. 

2. peer probe storage_node2 ( from storage_node1 )
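
   This is the standard probe command, run on storage_node1; "storage_node2" is the placeholder host name used throughout these steps:

   gluster peer probe storage_node2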

3. On storage_node1 execute:

for i in `seq 1 5`; do 
    gluster v create "vol_rep_$i" replica 2 storage_node1:/rhs/bricks/vol_rep_${i}_b0 storage_node2:/rhs/bricks/vol_rep_${i}_b1 --mode=script
    gluster v set "vol_rep_$i" self-heal-daemon off
    gluster v info "vol_rep_$i"
    gluster v start "vol_rep_$i"
done

4. killall glusterfsd glusterd glusterfs on storage_node2. 

5. On storage_node1 execute: 

rm -rf /rhs/bricks/*
for i in `seq 1 5`; do 
    gluster v stop vol_rep_${i} --mode=script
    gluster v delete vol_rep_${i} --mode=script
    gluster v create vol_rep_${i} replica 2 storage_node1:/rhs/bricks/vol_rep_${i}_b0 storage_node1:/rhs/bricks/vol_rep_${i}_b1 --mode=script
    gluster v set vol_rep_${i} self-heal-daemon on
    gluster v info vol_rep_${i}
    gluster v start vol_rep_${i}
done

6. restart glusterd on storage_node2 (service glusterd start)

7. check the peer status from both nodes (both nodes will be in "Peer Rejected" state for each other).
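
   The state can be checked with "gluster peer status" on each node; the excerpt below is only an illustration of what a rejected peer typically looks like, not a capture from this run:

   gluster peer status
   Number of Peers: 1

   Hostname: storage_node1
   Uuid: <peer-uuid>
   State: Peer Rejected (Connected)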

8. From storage_node2 execute : 
+++++++++++++++++++++++++++++++++++
a). "gluster volume sync <storage_node1> vol_rep_1"
   
Expected: "volume 'vol_rep_1' information should be synced from storage_node1 to storage_node2."

Actual : as expected
 
b. "gluster volume sync <storage_node1> all"

Expected: "All volumes information should be synced from storage_node1 to storage_node2."

Actual : as expected

c. "gluster volume sync <storage_node2>"

Expected: "volume sync: failed: sync from localhost not allowed "

Actual : as expected

==============================================================================
Case 2: (1 x 2 replicate volumes, 2 storage nodes)
==============================================================================
1. start glusterd on both the storage nodes. 

2. peer probe storage_node2 ( from storage_node1 )

3. On storage_node1 execute:

for i in `seq 1 5`; do 
    gluster v create "vol_rep_$i" replica 2 storage_node1:/rhs/bricks/vol_rep_${i}_b0 storage_node2:/rhs/bricks/vol_rep_${i}_b1 --mode=script
    gluster v set "vol_rep_$i" self-heal-daemon off
    gluster v info "vol_rep_$i"
    gluster v start "vol_rep_$i"
done

4. killall glusterfsd glusterd glusterfs on storage_node2. 

5. On storage_node1 execute: 

rm -rf /rhs/bricks/*
for i in `seq 1 5`; do 
    gluster v stop vol_rep_${i} --mode=script
    gluster v delete vol_rep_${i} --mode=script
    gluster v create vol_rep_${i} replica 2 storage_node1:/rhs/bricks/vol_rep_${i}_b0 storage_node1:/rhs/bricks/vol_rep_${i}_b1 --mode=script
    gluster v set vol_rep_${i} self-heal-daemon on
    gluster v info vol_rep_${i}
    gluster v start vol_rep_${i}
done

6. restart glusterd on storage_node2 (service glusterd start)

7. check the peer status from both nodes (both nodes will be in "Peer Rejected" state for each other).

8. From storage_node1 execute : 
++++++++++++++++++++++++++++++++++++
a. "gluster volume sync <storage_node2> vol_rep_1"
   
Expected:"volume vol_rep_1 information should be synced from storage_node2 to storage_node1.
Actual : as expected
 
b. "gluster volume sync <storage_node2>"

Expected: "All volumes information should be synced from storage_node2 to storage_node1."
Actual : as expected

==============================================================================
Note:
==============================================================================
The above 2 cases verify this bug.

However, the peers remain in the "Peer Rejected" state and the volumes are not restarted. This issue is tracked in bug https://bugzilla.redhat.com/show_bug.cgi?id=865700.

Moving this bug from ON_QA to Verified state.

Comment 9 Scott Haines 2013-09-23 22:39:16 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html

Comment 10 Scott Haines 2013-09-23 22:43:42 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html