Bug 1291974 - [GlusterD]: Bricks are in offline state after node reboot
Summary: [GlusterD]: Bricks are in offline state after node reboot
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterd
Version: rhgs-3.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: low
Target Milestone: ---
Target Release: RHGS 3.3.0
Assignee: Atin Mukherjee
QA Contact: Bala Konda Reddy M
URL:
Whiteboard:
Depends On:
Blocks: 1417147
 
Reported: 2015-12-16 05:41 UTC by Byreddy
Modified: 2017-09-21 04:53 UTC
CC List: 5 users

Fixed In Version: glusterfs-3.8.4-19
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-09-21 04:25:52 UTC
Embargoed:




Links
System: Red Hat Product Errata
ID: RHBA-2017:2774
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: glusterfs bug fix and enhancement update
Last Updated: 2017-09-21 08:16:29 UTC

Description Byreddy 2015-12-16 05:41:24 UTC
Description of problem:
=======================
Created a sample Distributed volume using one node, rebooted the node, and checked the volume status; the bricks were in the offline state.

The same issue was observed with an RHSC setup.


Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.7.5-11


How reproducible:
=================
Able to reproduce it multiple times.


Steps to Reproduce:
===================
Scenario-I:
1. Create a sample volume using a one-node cluster
2. Reboot the node
3. Check the volume status //Bricks are in offline state (see the CLI sketch after Scenario-II)

Scenario-II: [Using RHSC setup]
1. Have a two-node cluster (node-1 & node-2) with a Distributed-Replicate volume
2. Move one of the nodes to maintenance state //say node-1
3. Check the volume status on the other, active node
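
A minimal CLI sketch of Scenario-I; the volume name, hostname, and brick path below are placeholders, not the values from the original setup:

# Scenario-I on a one-node cluster ("testvol", "node1" and the brick path are placeholders)
gluster volume create testvol node1:/bricks/brick1/testvol force
gluster volume start testvol
gluster volume status testvol        # before reboot: bricks show Online = Y

reboot

# after the node is back up
gluster volume status testvol        # observed: bricks show Online = N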

Actual results:
===============
Scenario-I: Bricks are in offline state after node reboot
Scenario-II: Local bricks of the active node are shown as offline


Expected results:
=================
In both the above scenarios, bricks should be running.


Additional info:

Comment 3 Atin Mukherjee 2015-12-17 08:57:21 UTC
We debugged this issue on a setup where we identified that the moment glusterd started the brick(s), it immediately received a disconnect event from the brick. After that, rpc_reconnect() fails every 3 seconds because the underlying socket is already connected; this looks weird, but that is what the state of these layers indicated. During this, we also identified a place in the glusterd code where the rpc connection is not handled correctly (in terms of (un)setting the connected flag). I've posted a patch [1] upstream; that said, the patch doesn't solve the entire problem, so the RCA is still unknown.

[1] http://review.gluster.org/#/c/12908/

Byreddy,

As communicated, can you try to come up with a reproducer?
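
To check for the symptom described above on a reproduction attempt, a rough diagnostic sketch; the glusterd log file name shown is the usual default for this release line and may differ on other installations:

# look for the brick disconnect / rpc messages described above
grep -iE "disconnect|rpc" /var/log/glusterfs/etc-glusterfs-glusterd.vol.log | tail -n 50

# check whether the brick process is actually running
ps -ef | grep '[g]lusterfsd'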

Comment 4 Atin Mukherjee 2017-02-09 04:08:06 UTC
I think this is now fixed through BZ 1385605.

Comment 7 Bala Konda Reddy M 2017-07-31 09:13:53 UTC
BUILD : 3.8.4-35

After node reboot, bricks are online.
In the RHSC setup, bricks are online when one of the nodes is in maintenance state, as expected.
Hence marking the bug as verified.
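
For reference, a sketch of the verification check (volume and host names are illustrative; the exact column layout depends on the glusterfs version):

# after the reboot, every brick should show Online = Y
gluster volume status testvol

# expected shape of the output:
# Status of volume: testvol
# Gluster process                        TCP Port  RDMA Port  Online  Pid
# -------------------------------------------------------------------------
# Brick node1:/bricks/brick1/testvol     49152     0          Y       1234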

Comment 9 errata-xmlrpc 2017-09-21 04:25:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774


