Description of problem:
On a 2-node cluster with a 2x2 volume, when one node is brought down (shut down) and the other node is rebooted, the bricks on the rebooted node go offline and never come back up.

Version-Release number of selected component (if applicable):
[root@rhsauto026 bricks]# rpm -qa | grep glusterfs
glusterfs-api-3.6.0.29-3.el6rhs.x86_64
glusterfs-geo-replication-3.6.0.29-3.el6rhs.x86_64
glusterfs-libs-3.6.0.29-3.el6rhs.x86_64
glusterfs-cli-3.6.0.29-3.el6rhs.x86_64
glusterfs-rdma-3.6.0.29-3.el6rhs.x86_64
glusterfs-3.6.0.29-3.el6rhs.x86_64
glusterfs-fuse-3.6.0.29-3.el6rhs.x86_64
glusterfs-server-3.6.0.29-3.el6rhs.x86_64
samba-glusterfs-3.6.509-169.1.el6rhs.x86_64

How reproducible:
Tried twice

Steps to Reproduce:
1. Create a 2x2 volume on a 2-node cluster.
2. Shut down node 1 and reboot node 2.
3. Check the volume status once node 2 comes up.

Actual results:
Once the rebooted node comes up, the bricks on this node are offline.

Expected results:
Once the rebooted node comes up, the bricks on this node should be online.

Additional info:
Sosreports and volume information provided below.
By design, when there are other peers in the cluster, brick daemons are not started until a friend update is received. This ensures that a node that is coming up does not spawn daemons with stale data. In this case, the cluster had only 2 nodes and one of them was down, so the friend update was never received and the brick daemons were not started. However, the brick daemons can be started with a volume start force (gluster volume start <VOLNAME> force), which bypasses this check.
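
The workaround above can be sketched as a small shell helper. This is only an illustration: the function names force_start_volume and run, the DRY_RUN switch, and the volume name passed in are all hypothetical and not part of this report. The only gluster commands used are gluster volume status and gluster volume start <VOLNAME> force.

```shell
#!/bin/sh
# Hypothetical helper illustrating the "volume start force" workaround.
# Nothing here comes from the report except the gluster subcommands.

# run: execute a command, or just print it when DRY_RUN=1 (assumed switch).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# force_start_volume VOLNAME
# Shows brick status, force-starts the volume so that brick processes
# which are not running get spawned, then shows the status again.
force_start_volume() {
    volname="$1"
    if [ -z "$volname" ]; then
        echo "usage: force_start_volume VOLNAME" >&2
        return 1
    fi
    run gluster volume status "$volname"
    run gluster volume start "$volname" force
    run gluster volume status "$volname"
}
```

On the rebooted node, one would call force_start_volume with the affected volume's name. Setting DRY_RUN=1 first prints the three commands without touching the cluster, which is useful for checking the sequence before running it.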