Red Hat Bugzilla – Bug 1257854
[glusterD]: Brick status shows offline when glusterd is down on the peer node and glusterd is restarted on the other node in a two-node cluster.
Last modified: 2017-02-17 00:36:36 EST
Description of problem:
Brick status shows offline (N) in "gluster v status" output when glusterd is down on the peer node and glusterd is restarted on the other node in a two-node cluster.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Create a distributed volume using a cluster of two nodes (Node-1 & Node-2)
2. Check the volume status (gluster volume status <vol_name>)
3. Stop glusterd on Node-2
4. Check the volume status on Node-1
5. Restart glusterd on Node-1
6. Check the volume status on Node-1 again.
Actual results:
Brick status shows offline (N) even though the brick process is running.
Expected results:
Brick status should show online (Y) even after the glusterd restart.
Here, after the glusterd restart, the brick processes are running, but their status is not reported properly.
Created attachment 1067960 [details]
sos report on Node1
Created attachment 1067961 [details]
sos report on Node2 (peer node)
Can you check this behaviour and see what's wrong here?
(In reply to Atin Mukherjee from comment #7)
> Can you check this behaviour and see what's wrong here?
Will check this out.
On a two-node setup, if the glusterd instance on the first node is down and glusterd on the second node is restarted, glusterd_restart_bricks () is not called until glusterd on the first node comes back. This means that even though the brick process on node 2 is alive, glusterd will not be able to connect to it, since glusterd_brick_start (), which is called by glusterd_restart_bricks (), does that handling. So, in a nutshell, this is a known issue. We have had some upstream users complain about this: they want an option for cases where they don't care about split brains and still want to bring up the brick processes. With that in mind, we need to think about how to solve this. Whether it is worth fixing in GD 1.0 or 2.0 is a call we need to take; my vote would be for the latter.
Do you mind closing this bug and cloning it upstream with GlusterD2 as the component?
Closing this BZ as DEFERRED to GD2