Bug 1164222

Summary: All the bricks on one node go offline and do not come back up when one node is shut down and the other node is rebooted in a 2x2 gluster volume.
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: surabhi <sbhaloth>
Component: glusterd
Assignee: Bug Updates Notification Mailing List <rhs-bugs>
Status: CLOSED NOTABUG
QA Contact: storage-qa-internal <storage-qa-internal>
Severity: high
Priority: unspecified
Version: rhgs-3.0
CC: amukherj, nlevinki, vbellur
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Clones: 1168080
Last Closed: 2014-11-26 05:45:44 UTC
Type: Bug
Bug Blocks: 1168080

Description surabhi 2014-11-14 11:36:18 UTC
Description of problem:
****************************
On a 2-node cluster with a 2x2 volume, when one node is brought down (shutdown) and the other node is rebooted, the bricks on the rebooted node go offline and never come back up.

Version-Release number of selected component (if applicable):
[root@rhsauto026 bricks]# rpm -qa | grep glusterfs
glusterfs-api-3.6.0.29-3.el6rhs.x86_64
glusterfs-geo-replication-3.6.0.29-3.el6rhs.x86_64
glusterfs-libs-3.6.0.29-3.el6rhs.x86_64
glusterfs-cli-3.6.0.29-3.el6rhs.x86_64
glusterfs-rdma-3.6.0.29-3.el6rhs.x86_64
glusterfs-3.6.0.29-3.el6rhs.x86_64
glusterfs-fuse-3.6.0.29-3.el6rhs.x86_64
glusterfs-server-3.6.0.29-3.el6rhs.x86_64
samba-glusterfs-3.6.509-169.1.el6rhs.x86_64

How reproducible:
Tried twice; reproduced both times.

Steps to Reproduce:
1. Create a 2x2 volume on a 2-node cluster.
2. Shut down node 1, then reboot node 2.
3. Check the volume status once node 2 comes back up (a command-level sketch follows this list).
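
A minimal sketch of these steps, assuming hypothetical hostnames node1/node2, a hypothetical volume name testvol, and brick paths under /rhs/bricks:

# From node1: peer with node2 and create a 2x2 (replica 2, 4 bricks) volume
gluster peer probe node2
gluster volume create testvol replica 2 \
    node1:/rhs/bricks/b1 node2:/rhs/bricks/b1 \
    node1:/rhs/bricks/b2 node2:/rhs/bricks/b2
gluster volume start testvol

# Shut down node1 (shutdown -h now), then reboot node2 (reboot)

# After node2 comes back up, check whether its bricks are online
gluster volume status testvol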

Actual results:
********************
Once the rebooted node comes up, the bricks on this node are offline.


Expected results:
***********************
Once the rebooted node comes up, the bricks on this node should be online.


Additional info:
************************
Sosreports and volume information are provided below.

Comment 3 Atin Mukherjee 2014-11-26 05:45:44 UTC
As per the design, when there are other peers in the cluster, brick daemons are not started until a friend update is received; this ensures that a node which is coming up does not spawn daemons with stale data.

In this case, since it was a 2-node cluster and the other node was down, the friend update was never received and the brick daemons were therefore not started.

However, the brick daemons can be started with a volume start force, which bypasses this check.
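
A minimal sketch of the workaround, again assuming a hypothetical volume named testvol:

# Force-start the volume to spawn the brick daemons despite the missing friend update
gluster volume start testvol force

# Confirm the bricks are now shown as online
gluster volume status testvol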