1164222 – All the bricks on one node go offline and don't come back up when one node is shut down and the other node is rebooted in a 2X2 gluster volume.

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	glusterd
Sub Component:
Version:	rhgs-3.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Bug Updates Notification Mailing List
QA Contact:	storage-qa-internal@redhat.com
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1168080

Reported:	2014-11-14 11:36 UTC by surabhi
Modified:	2014-11-26 05:45 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Clones:	1168080
Environment:
Last Closed:	2014-11-26 05:45:44 UTC
Embargoed:
Dependent Products:

Description surabhi 2014-11-14 11:36:18 UTC

Description of problem:
****************************
On a 2-node cluster with a 2X2 volume, when one node is brought down (shut down) and the other node is rebooted, the bricks on the rebooted node go offline and never come back up.

Version-Release number of selected component (if applicable):
[root@rhsauto026 bricks]# rpm -qa | grep glusterfs
glusterfs-api-3.6.0.29-3.el6rhs.x86_64
glusterfs-geo-replication-3.6.0.29-3.el6rhs.x86_64
glusterfs-libs-3.6.0.29-3.el6rhs.x86_64
glusterfs-cli-3.6.0.29-3.el6rhs.x86_64
glusterfs-rdma-3.6.0.29-3.el6rhs.x86_64
glusterfs-3.6.0.29-3.el6rhs.x86_64
glusterfs-fuse-3.6.0.29-3.el6rhs.x86_64
glusterfs-server-3.6.0.29-3.el6rhs.x86_64
samba-glusterfs-3.6.509-169.1.el6rhs.x86_64

How reproducible:
Tried twice

Steps to Reproduce:
1. Create a 2X2 volume on a 2-node cluster.
2. Shut down node 1 and reboot node 2.
3. Check the volume status once node 2 comes back up.
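
The steps above can be sketched as the following command plan. This is only an illustration: the volume name, hostnames, and brick paths are placeholders that do not appear in this report, and the peers are assumed to be probed already. The helper `repro_plan` is hypothetical and only prints the commands; it does not run them.

```shell
# Print the commands for the reproduction steps.
# Arguments (all placeholders, not from the report):
#   $1 = volume name, $2 = node 1 hostname, $3 = node 2 hostname
repro_plan() {
    vol="${1:-testvol}"
    n1="${2:-node1}"
    n2="${3:-node2}"
    # With "replica 2", each consecutive pair of bricks forms one replica
    # set, so both replica sets span both nodes (2 x 2 = 4 bricks).
    cat <<EOF
# Step 1 (on either node): create and start a 2X2 volume
gluster volume create $vol replica 2 $n1:/bricks/b1 $n2:/bricks/b1 $n1:/bricks/b2 $n2:/bricks/b2
gluster volume start $vol
# Step 2: shut down $n1 and reboot $n2
[$n1] shutdown -h now
[$n2] reboot
# Step 3 (on $n2, once it is back up): check the brick status
gluster volume status $vol
EOF
}
```

Calling `repro_plan myvol hostA hostB` prints the plan for those names; the lines prefixed with a hostname in brackets are meant to be run on that node.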

Actual results:
********************
Once the rebooted node comes up, the bricks on this node are offline.


Expected results:
***********************
Once the rebooted node comes up, the bricks on this node should be online.


Additional info:
************************
Sosreports and volume information provided below.

Comment 3 Atin Mukherjee 2014-11-26 05:45:44 UTC

As per the design, if there are other peers in the cluster, brick daemons are not started until a friend update is received. This ensures that a node that is coming up does not spawn daemons with stale data.

In this case, it was a 2-node cluster and one node was down, so the friend update was never received and the brick daemons were not started.

However, the brick daemons can be started with a volume start force (gluster volume start <VOLNAME> force), which bypasses this check.
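
A minimal sketch of that workaround, run on the rebooted node, might look like the following. The helper names `run` and `force_start_bricks` and the `DRY_RUN` switch are hypothetical conveniences, not part of gluster; only the `gluster volume status` and `gluster volume start ... force` commands are real CLI calls.

```shell
# Run a command, or only print it when DRY_RUN=1 (hypothetical helper).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Force-start a volume so that glusterd spawns the local brick daemons
# without waiting for a friend update from the peer that is down.
#   $1 = volume name (placeholder, substitute the affected volume)
force_start_bricks() {
    run gluster volume status "$1"        # bricks expected to show as offline
    run gluster volume start "$1" force   # bypass the friend-update check
    run gluster volume status "$1"        # confirm the bricks are now online
}
```

For example, `DRY_RUN=1 force_start_bricks testvol` prints the three commands without executing them. Because the check exists to avoid starting daemons with stale data, forcing the start is best reserved for cases where the peer is known to be down and the local configuration is known to be current.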