Bug 1168080 - All the bricks on one of the nodes go offline and don't come back up when one node is shut down and the other node is rebooted in a 2X2 gluster volume.
Summary: All the bricks on one of the nodes go offline and don't come back up when...
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: 3.5.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On: 1164222
Blocks:
 
Reported: 2014-11-26 04:18 UTC by Poornima G
Modified: 2014-11-26 05:46 UTC
CC: 8 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 1164222
Environment:
Last Closed: 2014-11-26 05:46:13 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Comment 1 Poornima G 2014-11-26 04:19:54 UTC
+++ This bug was initially created as a clone of Bug #1164222 +++

Description of problem:
****************************
On a 2-node cluster with a 2X2 volume, when one node is brought down (shut down) and the other node is rebooted, the bricks on the rebooted node go offline and never come back up.

Version-Release number of selected component (if applicable):
[root@rhsauto026 bricks]# rpm -qa | grep glusterfs
glusterfs-api-3.6.0.29-3.el6rhs.x86_64
glusterfs-geo-replication-3.6.0.29-3.el6rhs.x86_64
glusterfs-libs-3.6.0.29-3.el6rhs.x86_64
glusterfs-cli-3.6.0.29-3.el6rhs.x86_64
glusterfs-rdma-3.6.0.29-3.el6rhs.x86_64
glusterfs-3.6.0.29-3.el6rhs.x86_64
glusterfs-fuse-3.6.0.29-3.el6rhs.x86_64
glusterfs-server-3.6.0.29-3.el6rhs.x86_64
samba-glusterfs-3.6.509-169.1.el6rhs.x86_64

How reproducible:
Tried twice

Steps to Reproduce:
1. Create a 2X2 volume on a 2-node cluster
2. Shut down node 1, reboot node 2
3. Check volume status once node 2 comes up

Actual results:
********************
Once the rebooted node comes up, the bricks on this node are offline.


Expected results:
***********************
Once the rebooted node comes up, the bricks on this node should be online.

[root@rhsauto025 /]# gluster vol status
Status of volume: gluster-vol
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick 10.70.37.0:/rhs/brick1/gluster-vol/b1		49152	Y	3973
Brick 10.70.37.1:/rhs/brick1/gluster-vol/b2		49152	Y	3721
Brick 10.70.37.0:/rhs/brick1/gluster-vol/b3		49153	Y	3984
Brick 10.70.37.1:/rhs/brick1/gluster-vol/b4		49153	Y	3732
NFS Server on localhost					2049	Y	3999
Self-heal Daemon on localhost				N/A	Y	4007
NFS Server on 10.70.37.1				2049	Y	3746
Self-heal Daemon on 10.70.37.1				N/A	Y	3754
 
Task Status of Volume gluster-vol
------------------------------------------------------------------------------


Volume Name: gluster-vol
Type: Distributed-Replicate
Volume ID: 5843bd43-10ad-4b10-a210-69d2b015dd60
Status: Started
Snap Volume: no
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.37.0:/rhs/brick1/gluster-vol/b1
Brick2: 10.70.37.1:/rhs/brick1/gluster-vol/b2
Brick3: 10.70.37.0:/rhs/brick1/gluster-vol/b3
Brick4: 10.70.37.1:/rhs/brick1/gluster-vol/b4
Options Reconfigured:
performance.readdir-ahead: on
auto-delete: disable
snap-max-soft-limit: 90
snap-max-hard-limit: 256
[root@rhsauto026 bricks]# chkconfig glusterd --list
glusterd       	0:off	1:off	2:on	3:on	4:on	5:on	6:off

Once the rebooted node came up:


[root@rhsauto026 ~]# gluster vol status
Status of volume: gluster-vol
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick 10.70.37.1:/rhs/brick1/gluster-vol/b2		N/A	N	N/A
Brick 10.70.37.1:/rhs/brick1/gluster-vol/b4		N/A	N	N/A
NFS Server on localhost					N/A	N	N/A
Self-heal Daemon on localhost				N/A	N	N/A
 
Task Status of Volume gluster-vol
------------------------------------------------------------------------------
There are no active volume tasks

Comment 2 Atin Mukherjee 2014-11-26 05:46:13 UTC
As per the design, when there are other peers in the cluster, brick daemons are not started until a friend update is received; this ensures that a node which is coming up doesn't end up spawning daemons with stale data.

In this case, since it was a 2-node cluster and one node was down, the brick daemons were not started because the friend update was never received.

However, the brick daemons can be started with a volume start force, which bypasses this check.
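For reference, the workaround can be applied as follows on the rebooted node (a sketch, assuming the volume name gluster-vol from the status output above):

```shell
# Force-start the volume: "force" bypasses the friend-update check
# and (re)spawns any brick daemons that are not running locally.
gluster volume start gluster-vol force

# Verify that the bricks now show Online = Y and have a port/PID:
gluster volume status gluster-vol
```

Once the second node comes back and the friend update is exchanged, normal startup behavior resumes; the force start is only needed while the peer is unreachable.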

