Description of problem:
-----------------------
A Gluster volume was used to host virtual machine images [RHEV-RHS Integration]. It started as a 2X2 distributed-replicate volume. After adding replica pairs, I ran a rebalance. I then removed pairs of bricks until the volume became a replicate volume, and again added a few more bricks to make it a distributed-replicate volume. After some time, add-brick reported a failure:

"volume add-brick: failed: Staging failed on 10.70.37.116. Error: Host 10.70.37.159 is not in 'Peer in Cluster' state
Staging failed on 10.70.37.175. Error: Host 10.70.37.159 is not in 'Peer in Cluster' state"

However, all the peers listed on all the hosts have the status 'Peer in Cluster'.

Version-Release number of selected component (if applicable):
--------------------------------------------------------------
glusterfs-3.6.0.28-1.el6rhs

How reproducible:
-----------------
Haven't tried to reproduce

Steps to Reproduce:
-------------------
1. Create a 2X2 distributed-replicate volume with 4 RHSS nodes in the cluster, each having 4 bricks
2. After optimizing the volume for virt-store, use it to host VM images
3. Add more bricks and perform a rebalance
4. Run 'remove brick start' on the volume to make it a replicate volume
5. Add more bricks to make it a distributed-replicate volume and perform a rebalance
6. Perform the 'remove brick start' operation on the volume
7. Repeat steps 5 and 6

Actual results:
---------------
After some iterations, add-brick fails with the error message:

volume add-brick: failed: Staging failed on 10.70.37.116. Error: Host 10.70.37.159 is not in 'Peer in Cluster' state
Staging failed on 10.70.37.175. Error: Host 10.70.37.159 is not in 'Peer in Cluster' state

Expected results:
-----------------
There shouldn't be any errors/problems
Created attachment 937498
glusterd log from RHSS-Node-1

Attached the glusterd log from RHSS-Node-1.
As per the analysis done, this is not a bug. A peer update was attempted from 10.70.37.159 without an /etc/hosts entry for the IP and hostname, so hostname resolution failed on this node and on 10.70.37.175. Adding the entry to /etc/hosts solves the problem.
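
Since the root cause was a hostname that did not resolve on some nodes, a quick resolution check on each node can help confirm the same condition before retrying add-brick. The following is only a minimal sketch, not part of glusterd or the gluster CLI: the function name `unresolved_peers` is hypothetical, and the peer hostnames are not given in this report, so they have to be supplied by whoever runs it.

```python
import socket


def unresolved_peers(peers, resolve=socket.gethostbyname):
    """Return the peer names that fail name resolution on this node.

    `peers` is a list of hostnames as they appear in the cluster's peer
    list (assumed input, not taken from this report). `resolve` is
    injectable so the check can be exercised without real DNS or
    /etc/hosts lookups.
    """
    failed = []
    for peer in peers:
        try:
            resolve(peer)
        except OSError:
            # socket.gaierror and socket.herror are subclasses of OSError
            failed.append(peer)
    return failed
```

Running this on every node with the same list of peer hostnames should return an empty list everywhere; any name it returns on a given node is a candidate for a missing /etc/hosts entry on that node.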