Bug 1257854

Summary: [glusterD]: Brick status shows offline when glusterd is down on the peer node and glusterd is restarted on the other node in a two-node cluster.
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Byreddy <bsrirama>
Component: glusterd
Assignee: Samikshan Bairagya <sbairagy>
Status: CLOSED DEFERRED
QA Contact: Byreddy <bsrirama>
Severity: high
Priority: unspecified
Version: rhgs-3.1
CC: amukherj, bsrirama, nlevinki, sankarshan, sasundar, smohan, vbellur
Target Milestone: ---
Target Release: ---
Keywords: ZStream
Hardware: x86_64
OS: Linux
Whiteboard: glusterd
Doc Type: Bug Fix
Type: Bug
Last Closed: 2017-02-17 05:36:36 UTC

Attachments:
sos report on Node1
sos report on Node2 (peer node)

Description Byreddy 2015-08-28 09:19:32 UTC
Description of problem:
Brick status shows offline (N) in gluster volume status when glusterd is down on the peer node and glusterd is restarted on the other node in a two-node cluster.

Version-Release number of selected component (if applicable):
glusterfs-3.7.1-13

How reproducible:
Always

Steps to Reproduce:
1. Create a distributed volume using a cluster of two nodes (Node-1 and Node-2).
2. Check the volume status: gluster volume status <vol_name>
3. Stop glusterd on Node-2.
4. Check the volume status on Node-1.
5. Restart glusterd on Node-1.
6. Check the volume status on Node-1 again (a full command sequence is sketched below).
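
A minimal command sequence for these steps, assuming a volume named testvol with one brick per node under /bricks (the volume name, brick paths, and use of systemctl are illustrative, not taken from the report):

  # On Node-1:
  gluster peer probe Node-2
  gluster volume create testvol Node-1:/bricks/testvol Node-2:/bricks/testvol
  gluster volume start testvol
  gluster volume status testvol    # both bricks report Online = Y

  # On Node-2:
  systemctl stop glusterd

  # On Node-1:
  gluster volume status testvol    # Node-2 unreachable; local brick still Y
  systemctl restart glusterd
  gluster volume status testvol    # local brick now reported offline (N)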

Actual results:
Brick status shows offline (N) even though the brick process is running.

Expected results:
Brick status should show online (Y) even after a glusterd restart.

Additional info:
Here, after the glusterd restart, the brick processes are running but their status is not reported correctly.
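
This can be confirmed from the shell; for example (volume name illustrative):

  pgrep -af glusterfsd             # the brick process is alive
  gluster volume status testvol    # ...yet its Online column shows N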

Comment 4 Byreddy 2015-08-28 11:57:11 UTC
Created attachment 1067960 [details]
sos report on Node1

Comment 5 Byreddy 2015-08-28 11:58:56 UTC
Created attachment 1067961 [details]
sos report on Node2 (peer node)

Comment 7 Atin Mukherjee 2016-06-23 17:00:11 UTC
Samikshan,

Can you check this behaviour and see what's wrong here?

~Atin

Comment 8 Samikshan Bairagya 2016-06-24 03:10:53 UTC
(In reply to Atin Mukherjee from comment #7)
> Samikshan,
> 
> Can you check this behaviour and see what's wrong here?
> 

Will check this out.

Comment 9 Atin Mukherjee 2016-06-29 09:20:50 UTC
On a two-node setup, if the glusterd instance on the first node is down and glusterd on the second node is restarted, then glusterd_restart_bricks () is not called until glusterd on the first node comes back. This means that even though the brick process on node 2 is alive, glusterd cannot connect to it, since that handling is done by glusterd_brick_start (), which is called from glusterd_restart_bricks (). In a nutshell, this is a known issue. Some upstream users have complained about this; they want an option for setups where they don't care about split brains and still want the brick processes brought up. With that in mind, we need to think about how to solve this. Whether it is worth fixing in GD 1.0 or 2.0 is the call we need to take; my vote would be for the latter.
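
A rough illustration of that dependency, following the reproduction steps above (volume name and systemctl usage are illustrative):

  # On Node-2 (the peer whose glusterd was down):
  systemctl start glusterd

  # On Node-1: once the peer's glusterd is back, glusterd_restart_bricks ()
  # runs and the already-running brick is reported online (Y) again
  gluster volume status testvol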

Byreddy,

Do you mind closing this bug and cloning it upstream with GlusterD2 as the component?

~Atin

Comment 10 Byreddy 2017-02-17 05:36:36 UTC
Closing this BZ as DEFERRED to GD2.