Description of problem:
=======================
While a rebalance started with "gluster volume rebalance start" is in progress, snapshot creation fails as expected with the message "snapshot create: failed: rebalance process is running for the volume vol2":

[root@snapshot-12 ~]# gluster snapshot create r5 vol2
snapshot create: failed: rebalance process is running for the volume vol2
Snapshot command failed
[root@snapshot-12 ~]#

But when a rebalance is in progress as part of a remove-brick operation, snapshot creation fails with a pre-validation error instead. In both cases the volume has a rebalance in progress, so the second case should also report that snapshot creation failed because a rebalance is in progress.

[root@snapshot-09 ~]# gluster volume rebalance vol0 status
        Node   Rebalanced-files     size   scanned   failures   skipped        status   run time in secs
   ---------        -----------   ------   -------   --------   -------   -----------   ----------------
   localhost                  0   0Bytes         0          0         0   not started               0.00
 10.70.43.20                  0   0Bytes         0          0         0   not started               0.00
10.70.43.186                343    1.5MB       743          0         0   in progress              30.00
 10.70.43.70                  0   0Bytes      2711          0         0   in progress              30.00
volume rebalance: vol0: success:
[root@snapshot-09 ~]#

[root@snapshot-10 ~]# gluster snapshot create r1 vol0
snapshot create: failed: Pre Validation failed on 10.70.43.186. Please check log file for details.
Snapshot command failed
[root@snapshot-10 ~]#

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.4.1.7.snap.mar27.2014git-1.el6.x86_64

How reproducible:
=================
1/1

Steps to Reproduce:
===================
1. Create and start a volume
2. Mount the volume and create files on it
3. Remove a brick using "gluster volume remove-brick vol-name start"
4. Remove-brick should succeed and should start a rebalance
5. Create a snapshot of the volume

Actual results:
===============
Snapshot creation fails with a pre-validation error

Expected results:
=================
It should fail gracefully with a proper message such as "rebalance is in progress" or "remove-brick is in progress"
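The expected behavior above can be sketched as a small shell check. This is a minimal illustration, not gluster code: check_rebalance_running is a hypothetical helper (not part of the gluster CLI) that reads the output of "gluster volume rebalance <vol> status" on stdin and prints the explicit error message the reporter expects instead of a generic pre-validation failure.

```shell
#!/bin/sh
# Hypothetical pre-check: refuse snapshot creation with a clear message
# whenever any node in the rebalance/remove-brick status reports "in progress".
check_rebalance_running() {
    vol="$1"
    # Status output (one line per node) is read from stdin.
    if grep -q 'in progress'; then
        echo "snapshot create: failed: rebalance process is running for the volume $vol"
        return 1
    fi
    return 0
}
```

Usage, assuming a running volume: `gluster volume rebalance vol0 status | check_rebalance_running vol0 && gluster snapshot create snap0 vol0`.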
Marking snapshot BZs to RHS 3.0.
1) Was not able to reproduce the issue with glusterfs-3.6.0.4
2) Fixed in http://review.gluster.org/#/c/7128/
3) Moving the bug ON_QA
(In reply to Joseph Elwin Fernandes from comment #4)
> 1) Was not able to reproduce the issue with glusterfs-3.6.0.4
> 2) Fixed in http://review.gluster.org/#/c/7128/
> 3) Moving the bug ON_QA

Did you confirm that the issue was reproducible with the earlier bits? If yes, can you post the probable cause and what might have fixed it in the newer build? Per the review link you provided, the last build was generated on 11-April-2014 while this bug was reported on 08-April-2014, so something between those dates would have fixed it. Please provide a proper pointer or analysis.

Note: This issue was fairly reproducible; will try to reproduce it on the latest bits as well.
Able to hit this issue with build glusterfs-3.6.0.5-1.el6rhs.x86_64, with exactly the same steps as mentioned in the Description.

Rebalance is in progress:
=========================
[root@snapshot13 ~]# gluster volume remove-brick vol0 snapshot16.lab.eng.blr.redhat.com:/brick0/b0 snapshot15.lab.eng.blr.redhat.com:/brick0/b0 status
                             Node   Rebalanced-files      size   scanned   failures   skipped        status   run time in secs
                        ---------        -----------   -------   -------   --------   -------   -----------   ----------------
snapshot15.lab.eng.blr.redhat.com                 78   324.3KB       183          0         0   in progress               6.00
snapshot16.lab.eng.blr.redhat.com                  0    0Bytes       584          0         0   in progress               6.00
[root@snapshot13 ~]#

Snapshot creation fails with pre-validation:
============================================
[root@snapshot13 ~]# gluster snapshot create snap0 vol0
snapshot create: failed: Pre Validation failed on snapshot15.lab.eng.blr.redhat.com. Please check log file for details. Pre Validation failed on snapshot16.lab.eng.blr.redhat.com. Please check log file for details.
Snapshot command failed
[root@snapshot13 ~]#
https://bugzilla.redhat.com/show_bug.cgi?id=1101993#c1
Anand Avati 2014-05-29 01:05:14 EDT

REVIEW: http://review.gluster.org/7899 ([glusterd/snapshot] Fix for snap create preval for remote peer err msg) posted (#7) for review on master by Joseph Fernandes (josferna)
Downstream submit https://code.engineering.redhat.com/gerrit/#/c/26717/
Adding to the comments in Comment 11:

On an nx2 volume, when a node is down and a snapshot is created, it fails with the error message "quorum is not met", whereas on an nx3 volume under the same condition it fails with the "One or more bricks may be down" error message. The error message should be the same on nx2 and nx3 volumes.
Version: glusterfs-3.7.1-7.el6rhs.x86_64

gluster v remove-brick vol0 replica 3 rhs-arch-srv4.lab.eng.blr.redhat.com:/rhs/brick5/b5 inception.lab.eng.blr.redhat.com:/rhs/brick6/b6 rhs-arch-srv2.lab.eng.blr.redhat.com:/rhs/brick6/b6 status
                                Node   Rebalanced-files      size   scanned   failures   skipped        status   run time in secs
                           ---------        -----------   -------   -------   --------   -------   -----------   ----------------
                           localhost                 19    50.1KB        28          0         0   in progress               8.00
rhs-arch-srv2.lab.eng.blr.redhat.com                 13    46.8KB       176          0         0   in progress               7.00
rhs-arch-srv4.lab.eng.blr.redhat.com                 70   777.1KB        76          0         0   in progress               7.00

gluster snapshot create A1 vol0
snapshot create: failed: rebalance process is running for the volume vol0
Snapshot command failed

Marking the bug 'verified'.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-1495.html