QATP:
====

TC#1: heal command must throw a more meaningful message, instead of saying the heal is unsuccessful, when the heal is actually happening successfully
1. Have a x3 volume and start it.
2. Now mount the volume and write a file of, say, 1GB.
3. Bring down one brick and keep writing data to the file.
4. Now bring up the brick which was down, and immediately bring down another brick.
   Info: As there are multiple sources for x3 and one source was brought down in step 4, there is still another source available to heal the data of the first brick that was brought down.
5. Now issue a heal command immediately.
Expected behavior: the heal must complete successfully, but the error message thrown by the CLI must more clearly mention the down bricks.
Previous behavior: the heal command used to throw the following error: "Launching heal operation to perform index self heal on volume vol0 has been unsuccessful"
Changed behavior with fix: heal must throw a more meaningful output: "Launching heal operation to perform index self heal on volume vol0 has not been successful on all nodes. Please check if all brick processes are running."

TC#2: heal command should say that triggering heal is unsuccessful as some bricks may be down
1. Have a x2 volume and start it.
2. Now mount the volume and write a file of, say, 1GB.
3. Bring down one brick and keep writing data to the file.
4. Now issue a heal on the volume: "gluster volume heal <vname>"
Expected behavior: the heal command should say that triggering heal is unsuccessful as some bricks may be down.
Previous behavior: the heal command used to throw the following error: "Launching heal operation to perform index self heal on volume vol0 has been unsuccessful"
Changed behavior with fix: heal must throw a more meaningful output: "Launching heal operation to perform index self heal on volume vol0 has not been successful on all nodes. Please check if all brick processes are running."

TC#3: heal command should say that triggering heal is unsuccessful as some bricks may be down
1. Have a x3 volume and start it.
2. Now mount the volume and write a file of, say, 1GB.
3. Bring down one brick and keep writing data to the file.
4. Now issue a heal on the volume: "gluster volume heal <vname>"
Expected behavior: the heal command should say that triggering heal is unsuccessful as some bricks may be down.
Previous behavior: the heal command used to throw the following error: "Launching heal operation to perform index self heal on volume vol0 has been unsuccessful"
Changed behavior with fix: heal must throw a more meaningful output: "Launching heal operation to perform index self heal on volume vol0 has not been successful on all nodes. Please check if all brick processes are running."

TC#4: heal info must throw proper output when one of the multiple source bricks is brought down
1. Have a x3 volume and start it.
2. Now mount the volume and write a file of, say, 1GB.
3. Bring down one brick and keep writing data to the file.
4. Now bring up the brick which was down, and after a few seconds bring down another brick.
   Info: As there are multiple sources for x3 and one source was brought down in step 4, there is still another source available to heal the data of the first brick that was brought down.
5. Now issue a heal command immediately.
6. Also issue a heal info command. --------> FAILs with duplicate entries for the same file
Expected behavior: the heal must complete successfully, but the error message thrown by the CLI must more clearly mention the down bricks.
Previous behavior: the heal command used to throw the following error: "Launching heal operation to perform index self heal on volume vol0 has been unsuccessful"
Changed behavior with fix: heal must throw a more meaningful output: "Launching heal operation to perform index self heal on volume vol0 has not been successful on all nodes. Please check if all brick processes are running."
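For reference, a minimal shell-level sketch of the steps shared by the test cases above, assuming a three-node setup (server1/server2/server3), brick path /bricks/b1, volume name vol0 and mount point /mnt/vol0 -- all names here are illustrative placeholders, not taken from the actual test bed:

    # create and start a x3 (replica 3) volume; use "replica 2" instead for TC#2
    gluster volume create vol0 replica 3 server1:/bricks/b1 server2:/bricks/b1 server3:/bricks/b1
    gluster volume start vol0

    # mount the volume on a client and write a ~1GB file
    mount -t glusterfs server1:/vol0 /mnt/vol0
    dd if=/dev/urandom of=/mnt/vol0/file1 bs=1M count=1024

    # bring down one brick: find its PID from volume status, kill it,
    # then keep appending data so the surviving bricks accumulate pending heals
    gluster volume status vol0
    kill -9 <brick-pid-from-status>
    dd if=/dev/urandom of=/mnt/vol0/file1 bs=1M count=1024 oflag=append conv=notrunc

    # bring the killed brick back up ("start force" restarts down brick processes),
    # then kill another brick (TC#1/TC#4); TC#2/TC#3 skip this step
    gluster volume start vol0 force
    gluster volume status vol0
    kill -9 <another-brick-pid>

    # trigger index self-heal and inspect pending heals; the CLI message and the
    # heal info listing are what the test cases verify
    gluster volume heal vol0
    gluster volume heal vol0 info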
Ran the above published QATP; following is the result:

TC#1 Passed ---> main case to validate the fix
TC#2 Passed
TC#3 Passed
TC#4 Failed. Raised a separate bug, 1332194 - gluster volume heal info throwing duplicate file or gfid entries. But as this failure is not really related to this fix and is not a regression caused by this fix, moving this bug to VERIFIED.

Version tested:
glusterfs-client-xlators-3.7.9-2.el7rhgs.x86_64
glusterfs-server-3.7.9-2.el7rhgs.x86_64
python-gluster-3.7.5-19.el7rhgs.noarch
gluster-nagios-addons-0.2.5-1.el7rhgs.x86_64
vdsm-gluster-4.16.30-1.3.el7rhgs.noarch
glusterfs-3.7.9-2.el7rhgs.x86_64
glusterfs-api-3.7.9-2.el7rhgs.x86_64
glusterfs-cli-3.7.9-2.el7rhgs.x86_64
glusterfs-geo-replication-3.7.9-2.el7rhgs.x86_64
gluster-nagios-common-0.2.3-1.el7rhgs.noarch
glusterfs-libs-3.7.9-2.el7rhgs.x86_64
glusterfs-fuse-3.7.9-2.el7rhgs.x86_64
glusterfs-rdma-3.7.9-2.el7rhgs.x86_64
Raised bug 1333705 - gluster volume heal info "healed" and "heal-failed" showing wrong information, which could be because of this fix. But given that those commands are to be deprecated, this is not severe.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:1240