Description of problem:
=======================
Currently, when multiple bricks are offline and snapshot creation is issued, the snapshot fails complaining that only one brick is offline. It should report all of the bricks which are offline.

For example:
============
2 bricks of a node are down:

[root@inception ~]# gluster v status vol1 | grep "inception"
Brick inception.lab.eng.blr.redhat.com:/rhs/brick2/b2    N/A    N    28623
Brick inception.lab.eng.blr.redhat.com:/rhs/brick3/b3    N/A    N    28684
[root@inception ~]#

Snapshot create only complains about 1 brick:

[root@inception ~]# gluster snapshot create RS2 vol1
snapshot create: failed: brick inception.lab.eng.blr.redhat.com:/rhs/brick2/b2 is not started. Please start the stopped brick and then issue snapshot create command or use [force] option in snapshot create to override this behavior.
Snapshot command failed
[root@inception ~]#

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.6.0.24-1.el6rhs.x86_64

How reproducible:
=================
1/1

Steps to Reproduce:
===================
1. Bring 2 brick processes of a node down.
2. Create a snapshot from the same node.

Actual results:
===============
[root@inception ~]# gluster snapshot create RS2 vol1
snapshot create: failed: brick inception.lab.eng.blr.redhat.com:/rhs/brick2/b2 is not started. Please start the stopped brick and then issue snapshot create command or use [force] option in snapshot create to override this behavior.
Snapshot command failed
[root@inception ~]#

Expected results:
=================
The above output gives the illusion that only one brick is offline, whereas in actuality there are 2 bricks offline. The message should be very clear about which processes are offline. A tabular format would be a much better approach.
Can you please update the patch link in the BZ?
Version: glusterfs-3.7.1-2.el6rhs.x86_64
========
Created a snapshot when multiple bricks are offline in the volume. It prints one message for every node where bricks are down. In a 4 node cluster, if some bricks from 3 nodes are down, it prints the message three times, as below:

gluster snapshot create S1 vol0
snapshot create: failed: One or more bricks are not running. Please run volume status command to see brick status. Please start the stopped brick and then issue snapshot create command or use [force] option in snapshot create to override this behavior.
One or more bricks are not running. Please run volume status command to see brick status. Please start the stopped brick and then issue snapshot create command or use [force] option in snapshot create to override this behavior.
One or more bricks are not running. Please run volume status command to see brick status. Please start the stopped brick and then issue snapshot create command or use [force] option in snapshot create to override this behavior.
Snapshot command failed

It should show the message only once irrespective of the number of nodes where bricks are down, or print specific details from each node mentioning which brick is down.

Moving it back to 'Assigned'.
Mainline - http://review.gluster.org/#/c/11234/
3.7 - http://review.gluster.org/#/c/11293/
Downstream - https://code.engineering.redhat.com/gerrit/51039
As per the current framework, the best we can do right now is to display the node information along with the error string. That should bring some structure to the error display on screen. Please file an RFE for the future, so that this issue can be tackled more elegantly.
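For illustration only, here is a minimal C sketch of the approach described above: prefixing each node's pre-validation error with its hostname before concatenating everything into the error string shown to the user. This is not the actual glusterd code; the helper name append_node_err, the buffer size, and the aggregation logic are assumptions made for the example.

/* Sketch: aggregate per-node pre-validation errors with a hostname prefix. */
#include <stdio.h>
#include <string.h>

#define ERR_MAX 4096

/* Hypothetical helper: append one node's error, prefixed with its hostname,
 * to the combined error string returned to the CLI. */
static void
append_node_err (char *op_errstr, const char *hostname, const char *node_err)
{
        char buf[ERR_MAX];

        snprintf (buf, sizeof (buf),
                  "Pre Validation failed on %s. %s\n", hostname, node_err);
        strncat (op_errstr, buf, ERR_MAX - strlen (op_errstr) - 1);
}

int
main (void)
{
        char op_errstr[ERR_MAX] = "";
        const char *brick_err = "One or more bricks are not running. "
                                "Please run volume status command to see "
                                "brick status.";

        /* Two nodes failing pre-validation produce two prefixed entries. */
        append_node_err (op_errstr, "rhs-arch-srv2.lab.eng.blr.redhat.com",
                         brick_err);
        append_node_err (op_errstr, "rhs-arch-srv3.lab.eng.blr.redhat.com",
                         brick_err);

        printf ("snapshot create: failed: %s", op_errstr);
        return 0;
}

With this kind of aggregation, each repeated message carries the originating node's name, which matches the style of output seen in the verification below.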
Version: glusterfs-3.7.1-8.el6rhs.x86_64
========
Killed some bricks in the volume from 3 nodes in the cluster and created a snapshot on the volume. It fails with the below message, with details on which nodes have bricks that are not running:

gluster snapshot create S1 vol0
snapshot create: failed: Pre Validation failed on rhs-arch-srv2.lab.eng.blr.redhat.com. One or more bricks are not running. Please run volume status command to see brick status. Please start the stopped brick and then issue snapshot create command or use [force] option in snapshot create to override this behavior.
Pre Validation failed on rhs-arch-srv3.lab.eng.blr.redhat.com. One or more bricks are not running. Please run volume status command to see brick status. Please start the stopped brick and then issue snapshot create command or use [force] option in snapshot create to override this behavior.
Pre Validation failed on rhs-arch-srv4.lab.eng.blr.redhat.com. One or more bricks are not running. Please run volume status command to see brick status. Please start the stopped brick and then issue snapshot create command or use [force] option in snapshot create to override this behavior.
Snapshot command failed

As per comment 7, marking this bug 'Verified'. Will raise an RFE to handle the failure scenarios more elegantly.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-1495.html