+++ This bug was initially created as a clone of Bug #1164222 +++

Description of problem:
****************************
On a 2 node cluster with a 2x2 volume, when one node is brought down (shutdown) and the other node is rebooted, the bricks on the rebooted node go offline and never come back up.

Version-Release number of selected component (if applicable):
[root@rhsauto026 bricks]# rpm -qa | grep glusterfs
glusterfs-api-3.6.0.29-3.el6rhs.x86_64
glusterfs-geo-replication-3.6.0.29-3.el6rhs.x86_64
glusterfs-libs-3.6.0.29-3.el6rhs.x86_64
glusterfs-cli-3.6.0.29-3.el6rhs.x86_64
glusterfs-rdma-3.6.0.29-3.el6rhs.x86_64
glusterfs-3.6.0.29-3.el6rhs.x86_64
glusterfs-fuse-3.6.0.29-3.el6rhs.x86_64
glusterfs-server-3.6.0.29-3.el6rhs.x86_64
samba-glusterfs-3.6.509-169.1.el6rhs.x86_64

How reproducible:
Tried twice

Steps to Reproduce:
1. Create a 2x2 volume on a 2 node cluster (see the command sketch at the end of this description).
2. Shut down node 1, reboot node 2.
3. Check volume status once node 2 comes up.

Actual results:
********************
Once the rebooted node comes up, the bricks on this node are offline.

Expected results:
***********************
Once the rebooted node comes up, the bricks on this node should be online.

[root@rhsauto025 /]# gluster vol status
Status of volume: gluster-vol
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick 10.70.37.0:/rhs/brick1/gluster-vol/b1             49152   Y       3973
Brick 10.70.37.1:/rhs/brick1/gluster-vol/b2             49152   Y       3721
Brick 10.70.37.0:/rhs/brick1/gluster-vol/b3             49153   Y       3984
Brick 10.70.37.1:/rhs/brick1/gluster-vol/b4             49153   Y       3732
NFS Server on localhost                                 2049    Y       3999
Self-heal Daemon on localhost                           N/A     Y       4007
NFS Server on 10.70.37.1                                2049    Y       3746
Self-heal Daemon on 10.70.37.1                          N/A     Y       3754

Task Status of Volume gluster-vol
------------------------------------------------------------------------------

Volume Name: gluster-vol
Type: Distributed-Replicate
Volume ID: 5843bd43-10ad-4b10-a210-69d2b015dd60
Status: Started
Snap Volume: no
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.37.0:/rhs/brick1/gluster-vol/b1
Brick2: 10.70.37.1:/rhs/brick1/gluster-vol/b2
Brick3: 10.70.37.0:/rhs/brick1/gluster-vol/b3
Brick4: 10.70.37.1:/rhs/brick1/gluster-vol/b4
Options Reconfigured:
performance.readdir-ahead: on
auto-delete: disable
snap-max-soft-limit: 90
snap-max-hard-limit: 256

[root@rhsauto026 bricks]# chkconfig glusterd --list
glusterd        0:off   1:off   2:on    3:on    4:on    5:on    6:off

Once the rebooted node came up:
[root@rhsauto026 ~]# gluster vol status
Status of volume: gluster-vol
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick 10.70.37.1:/rhs/brick1/gluster-vol/b2             N/A     N       N/A
Brick 10.70.37.1:/rhs/brick1/gluster-vol/b4             N/A     N       N/A
NFS Server on localhost                                 N/A     N       N/A
Self-heal Daemon on localhost                           N/A     N       N/A

Task Status of Volume gluster-vol
------------------------------------------------------------------------------
There are no active volume tasks
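For reference, a minimal command sketch of steps 1-3, assuming the peer addresses, volume name, and brick paths shown in the output above (the rebooted node here appears to be 10.70.37.1):

    # From 10.70.37.0: form the 2 node cluster, then create and start the 2x2 volume
    gluster peer probe 10.70.37.1
    gluster volume create gluster-vol replica 2 \
        10.70.37.0:/rhs/brick1/gluster-vol/b1 10.70.37.1:/rhs/brick1/gluster-vol/b2 \
        10.70.37.0:/rhs/brick1/gluster-vol/b3 10.70.37.1:/rhs/brick1/gluster-vol/b4
    gluster volume start gluster-vol
    # Then shut down node 1 (10.70.37.0), reboot node 2 (10.70.37.1),
    # and once node 2 is back up, check the status:
    gluster volume status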
As per the design, brick daemons are not started until a friend update is received whenever there are other peers in the cluster; this ensures that a node which is coming up does not end up spawning daemons with stale data. In this case, since it was a 2 node cluster and the other node was down, the friend update was never received and the brick daemons were therefore not started. However, the brick daemons can still be started with a volume start force, which bypasses this check.
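For example, on the rebooted node (assuming the volume name gluster-vol from the report above):

    # Force-start the volume to spawn the brick daemons even though no
    # friend update has been received from the other (still down) peer
    gluster volume start gluster-vol force

After this, gluster volume status should show the local bricks online again.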