Description of problem:
========================
In a scenario where a snapshot restore is performed while glusterd is down on one of the nodes in the cluster, the restore succeeds and an entry is added to the missed_snaps_list with entry 2:1. When glusterd is brought back online on that node, the missed entry is processed and updated to 2:2, which means the restore succeeded on this node as well.

However, if you then issue the command "gluster volume heal <vol-name> info", it reports the error "Transport endpoint is not connected" for the restored brick (the one on the node where glusterd was down during the restore), even though "gluster volume status" shows that the brick is online.

As follows:
===========
[root@inception ~]# gluster volume heal vol1 info
Brick inception.lab.eng.blr.redhat.com:/var/run/gluster/snaps
Number of entries: 0

Brick rhs-arch-srv2.lab.eng.blr.redhat.com:/var/run/gluster/snaps/48bef55ddc1a4266ba49a7873d91c457/brick2/b2
Status: Transport endpoint is not connected

Brick rhs-arch-srv3.lab.eng.blr.redhat.com:/var/run/gluster/snaps/48bef55ddc1a4266ba49a7873d91c457/brick3/b2/
Number of entries: 0

Brick rhs-arch-srv4.lab.eng.blr.redhat.com:/var/run/gluster/snaps/48bef55ddc1a4266ba49a7873d91c457/brick4/b2/
Number of entries: 0

[root@inception ~]#

Version-Release number of selected component (if applicable):
==============================================================
glusterfs-3.6.0.27-1.el6rhs.x86_64

How reproducible:
=================
Always

Steps to Reproduce:
===================
1. Create a 4-node cluster.
2. Create a 2*2 volume.
3. Create a snapshot (snap1) of the volume.
4. Run "gluster volume heal <vol-name> info"; it should succeed.
5. Bring down glusterd on one of the nodes (for example, node2).
6. Take the volume offline using "gluster volume stop vol".
7. Restore the volume to snap1. The restore should succeed.
8. Start the volume.
9. Start glusterd on node2.
10. Run "gluster volume status <vol-name>"; it should list all processes as online.
11. Run "gluster volume heal <vol-name> info".

Actual results:
===============
The command reports "Status: Transport endpoint is not connected" for the brick on the node where glusterd was down during the restore (node2).

Expected results:
=================
Once glusterd is brought online at step 9, "gluster volume heal <vol-name> info" should not report "Transport endpoint is not connected".
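For anyone scripting the check in step 11, the following is a minimal sketch of how the heal info output could be scanned for affected bricks. It assumes the output has the shape shown in this report (a "Brick ..." line followed by either a "Status: ..." or a "Number of entries: ..." line); other glusterfs versions may print it differently. The function names parse_heal_info and disconnected_bricks are hypothetical helpers, not part of gluster.

```python
from typing import Dict, List, Optional

# Sample copied from the heal info output in this report.
SAMPLE = """\
Brick inception.lab.eng.blr.redhat.com:/var/run/gluster/snaps
Number of entries: 0

Brick rhs-arch-srv2.lab.eng.blr.redhat.com:/var/run/gluster/snaps/48bef55ddc1a4266ba49a7873d91c457/brick2/b2
Status: Transport endpoint is not connected

Brick rhs-arch-srv3.lab.eng.blr.redhat.com:/var/run/gluster/snaps/48bef55ddc1a4266ba49a7873d91c457/brick3/b2/
Number of entries: 0

Brick rhs-arch-srv4.lab.eng.blr.redhat.com:/var/run/gluster/snaps/48bef55ddc1a4266ba49a7873d91c457/brick4/b2/
Number of entries: 0
"""


def parse_heal_info(output: str) -> List[Dict[str, Optional[str]]]:
    """Group heal info lines per brick (assumed format, see lead-in)."""
    bricks: List[Dict[str, Optional[str]]] = []
    current: Optional[Dict[str, Optional[str]]] = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("Brick "):
            # A new brick section starts here.
            current = {"brick": line[len("Brick "):], "status": None, "entries": None}
            bricks.append(current)
        elif current is not None and line.startswith("Status:"):
            current["status"] = line.split(":", 1)[1].strip()
        elif current is not None and line.startswith("Number of entries:"):
            current["entries"] = line.split(":", 1)[1].strip()
    return bricks


def disconnected_bricks(output: str) -> List[str]:
    """Return bricks whose status line reports a missing connection."""
    return [
        b["brick"]
        for b in parse_heal_info(output)
        if b["status"] is not None and "not connected" in b["status"]
    ]


if __name__ == "__main__":
    for brick in disconnected_bricks(SAMPLE):
        print("not connected:", brick)
```

In a real run, the text passed to disconnected_bricks would be the captured output of "gluster volume heal <vol-name> info" rather than the embedded sample.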
Marking this bug as urgent because the client also fails to connect to the brick on node2. Any writes from the client to this brick remain pending.
Please review and sign off on the edited doc text.
Canceling need_info as Rajesh reviewed and signed off on the doc text during the online review meeting.