Bug 1164222

Summary: All the bricks on one node go offline and do not come back up when one node is shut down and the other node is rebooted in a 2x2 gluster volume.
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: surabhi <sbhaloth>
Component: glusterd
Assignee: Bug Updates Notification Mailing List <rhs-bugs>
Status: CLOSED NOTABUG
QA Contact: storage-qa-internal <storage-qa-internal>
Severity: high
Priority: unspecified
Version: rhgs-3.0
CC: amukherj, nlevinki, vbellur
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Clones: 1168080
Last Closed: 2014-11-26 05:45:44 UTC
Type: Bug
Bug Blocks: 1168080

Description surabhi 2014-11-14 11:36:18 UTC
Description of problem:
****************************
On a 2-node cluster with a 2x2 volume, when one node is brought down (shutdown) and the other node is rebooted, the bricks on the rebooted node go offline and never come back up.

Version-Release number of selected component (if applicable):
[root@rhsauto026 bricks]# rpm -qa | grep glusterfs
glusterfs-api-3.6.0.29-3.el6rhs.x86_64
glusterfs-geo-replication-3.6.0.29-3.el6rhs.x86_64
glusterfs-libs-3.6.0.29-3.el6rhs.x86_64
glusterfs-cli-3.6.0.29-3.el6rhs.x86_64
glusterfs-rdma-3.6.0.29-3.el6rhs.x86_64
glusterfs-3.6.0.29-3.el6rhs.x86_64
glusterfs-fuse-3.6.0.29-3.el6rhs.x86_64
glusterfs-server-3.6.0.29-3.el6rhs.x86_64
samba-glusterfs-3.6.509-169.1.el6rhs.x86_64

How reproducible:
Tried twice; reproduced both times.

Steps to Reproduce:
1. Create a 2x2 volume on a 2-node cluster.
2. Shut down node 1, then reboot node 2.
3. Check the volume status once node 2 comes back up (a command-level sketch follows this list).
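
A minimal sketch of these steps, assuming hypothetical hostnames node1/node2, a hypothetical volume name testvol, and brick paths under /rhs/bricks:

# From node1: peer with node2 and create a 2x2 (replica 2, 4 bricks) volume
gluster peer probe node2
gluster volume create testvol replica 2 \
    node1:/rhs/bricks/b1 node2:/rhs/bricks/b1 \
    node1:/rhs/bricks/b2 node2:/rhs/bricks/b2
gluster volume start testvol

# Shut down node1 (shutdown -h now), then reboot node2 (reboot)

# After node2 comes back up, check whether its bricks are online
gluster volume status testvol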

Actual results:
********************
Once the rebooted node comes up, the bricks on this node are offline.


Expected results:
***********************
Once the rebooted node comes up, the bricks on this node should be online.


Additional info:
************************
Sosreports and volume information are provided below.

Comment 3 Atin Mukherjee 2014-11-26 05:45:44 UTC
As per the design, when there are other peers in the cluster, brick daemons are not started until a friend update is received; this ensures that a node which is coming up does not spawn daemons with stale data.

In this case, since it was a 2-node cluster and the other node was down, the friend update was never received and the brick daemons were therefore not started.

However, the brick daemons can be started with a volume start force, which bypasses this check.
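
A minimal sketch of the workaround, again assuming a hypothetical volume named testvol:

# Force-start the volume to spawn the brick daemons despite the missing friend update
gluster volume start testvol force

# Confirm the bricks are now shown as online
gluster volume status testvol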