Bug 1294612 - Self heal command gives error "Launching heal operation to perform index self heal on volume vol0 has been unsuccessful"
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: replicate
3.1
x86_64 Linux
unspecified Severity high
: ---
: RHGS 3.1.3
Assigned To: Ravishankar N
nchilaka
: ZStream
Depends On: 1200252
Blocks: 1299184 1302291 1306922
Reported: 2015-12-29 04:35 EST by Ravishankar N
Modified: 2016-09-17 08:19 EDT (History)
9 users

See Also:
Fixed In Version: glusterfs-3.7.9-1
Doc Type: Bug Fix
Doc Text:
Previously, if any bricks were down when an index heal was started, the following message was displayed: "Launching heal operation to perform index self heal on volume vol0 has been unsuccessful". This was confusing to users because the heal could still succeed if at least one source and one sink brick were available. The message has been modified to avoid confusion, and now reads: "Launching heal operation to perform index self heal on volume vol0 has not been successful on all nodes. Please check if all brick processes are running."
Story Points: ---
Clone Of: 1200252
: 1302291
Environment:
Last Closed: 2016-06-23 01:00:41 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Comment 7 nchilaka 2016-05-02 09:16:11 EDT
QATP:
====

    TC#1: the heal command must print a more meaningful message instead of reporting the heal as unsuccessful when healing is actually proceeding

    1. Create an x3 (replica 3) volume and start it.

    2. Mount the volume and write a file of, say, 1GB.

    3. Bring down one brick and keep writing data to the file.

    4. Bring up the brick that was down, and immediately bring down another brick.

    Info: As there are multiple sources in an x3 volume and only one source was brought down in step 4, another source is still available to heal the data of the first brick that was brought down.

    5. Issue a heal command immediately.

    Expected behavior: the heal must complete successfully, but the message printed by the CLI must clearly mention the down bricks.

    Previous behavior: the heal command used to print the error "Launching heal operation to perform index self heal on volume vol0 has been unsuccessful".

    Changed behavior with the fix: the heal command prints the more meaningful output "Launching heal operation to perform index self heal on volume vol0 has not been successful on all nodes. Please check if all brick processes are running."
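    The TC#1 sequence can be sketched as a script. This is a dry-run sketch: the hostnames (server1-server3), brick paths, and mount point are hypothetical, and the run() wrapper only records and echoes each command instead of executing it, so nothing here touches a real cluster.

```shell
#!/bin/bash
# Dry-run sketch of the TC#1 steps; on a real RHGS cluster, replace the
# run() wrapper with direct execution. All hosts and paths are placeholders.
CMDS=""
run() { CMDS+="$*"$'\n'; echo "+ $*"; }

# 1. Create and start an x3 (replica 3) volume.
run gluster volume create vol0 replica 3 \
    server1:/bricks/b1 server2:/bricks/b2 server3:/bricks/b3
run gluster volume start vol0

# 2. Mount the volume and write ~1GB to a file.
run mount -t glusterfs server1:/vol0 /mnt/vol0
run dd if=/dev/zero of=/mnt/vol0/bigfile bs=1M count=1024

# 3. Bring down one brick (kill its glusterfsd) while writes continue.
run pkill -f 'glusterfsd.*bricks/b1'

# 4. Restart the downed brick, then immediately bring down another one.
run gluster volume start vol0 force
run pkill -f 'glusterfsd.*bricks/b2'

# 5. Trigger index heal immediately; with the fix, the CLI should warn that
#    the heal was not successful on all nodes rather than flatly failing.
run gluster volume heal vol0
```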

    TC#2: the heal command should report that triggering the heal was unsuccessful because some bricks may be down

    1. Create an x2 (replica 2) volume and start it.

    2. Mount the volume and write a file of, say, 1GB.

    3. Bring down one brick and keep writing data to the file.

    4. Issue a heal on the volume: "gluster volume heal <vname>"

    Expected behavior: the CLI must report that the heal could not be launched on all nodes because some bricks are down.

    Previous behavior: the heal command used to print the error "Launching heal operation to perform index self heal on volume vol0 has been unsuccessful".

    Changed behavior with the fix: the heal command prints the more meaningful output "Launching heal operation to perform index self heal on volume vol0 has not been successful on all nodes. Please check if all brick processes are running."
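    A minimal sketch of how the expected CLI output could be checked, assuming the message wording given in the Doc Text; the OUTPUT string is hardcoded here for illustration rather than captured from a live heal run.

```shell
#!/bin/bash
# Illustrative check distinguishing the pre-fix and post-fix CLI messages.
# OUTPUT is a hardcoded sample; on a real setup it would be captured with:
#   OUTPUT=$(gluster volume heal vol0 2>&1)
OUTPUT="Launching heal operation to perform index self heal on volume vol0 has not been successful on all nodes. Please check if all brick processes are running."

if echo "$OUTPUT" | grep -q "has not been successful on all nodes"; then
    RESULT="new-message"      # post-fix wording: points at down bricks
elif echo "$OUTPUT" | grep -q "has been unsuccessful"; then
    RESULT="old-message"      # pre-fix wording: flatly reports failure
else
    RESULT="unexpected"
fi
echo "$RESULT"
```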


    TC#3: the heal command should report that triggering the heal was unsuccessful because some bricks may be down

    1. Create an x3 (replica 3) volume and start it.

    2. Mount the volume and write a file of, say, 1GB.

    3. Bring down one brick and keep writing data to the file.

    4. Issue a heal on the volume: "gluster volume heal <vname>"

    Expected behavior: the CLI must report that the heal could not be launched on all nodes because some bricks are down.

    Previous behavior: the heal command used to print the error "Launching heal operation to perform index self heal on volume vol0 has been unsuccessful".

    Changed behavior with the fix: the heal command prints the more meaningful output "Launching heal operation to perform index self heal on volume vol0 has not been successful on all nodes. Please check if all brick processes are running."

    TC#4: heal info must print proper output when one of multiple source bricks is brought down

    1. Create an x3 (replica 3) volume and start it.

    2. Mount the volume and write a file of, say, 1GB.

    3. Bring down one brick and keep writing data to the file.

    4. Bring up the brick that was down, and after a few seconds bring down another brick.

    Info: As there are multiple sources in an x3 volume and only one source was brought down in step 4, another source is still available to heal the data of the first brick that was brought down.

    5. Issue a heal command immediately.

    6. Also issue a heal info command. --------> FAILS with duplicate entries for the same file

    Expected behavior: the heal must complete successfully, but the message printed by the CLI must clearly mention the down bricks.

    Previous behavior: the heal command used to print the error "Launching heal operation to perform index self heal on volume vol0 has been unsuccessful".

    Changed behavior with the fix: the heal command prints the more meaningful output "Launching heal operation to perform index self heal on volume vol0 has not been successful on all nodes. Please check if all brick processes are running."
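    The TC#4 failure mode (duplicate entries for the same file in heal info output) can be detected with a simple pipeline; the entry list below is an illustrative sample, not real `gluster volume heal <vname> info` output.

```shell
#!/bin/bash
# Detect duplicate file entries in heal info output. INFO is an illustrative
# sample; on a real cluster it could be captured with something like:
#   INFO=$(gluster volume heal vol0 info | grep '^/')
INFO=$'/bigfile\n/dir/file2\n/bigfile'

# Any line printed by uniq -d occurs more than once, i.e. is a duplicate.
DUPES=$(printf '%s\n' "$INFO" | sort | uniq -d)

if [ -n "$DUPES" ]; then
    echo "duplicate heal-info entries: $DUPES"
else
    echo "no duplicates"
fi
```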
Comment 8 nchilaka 2016-05-02 09:18:20 EDT
Ran the QATP published above; the results are as follows:
TC#1 Passed ---> main case validating the fix
TC#2 Passed
TC#3 Passed
TC#4 Failed. Raised a separate bug 1332194 - gluster volume heal info throwing duplicate file or gfid entries.
As this failure is not really related to this fix and is not a regression caused by it, moving this bug to verified.

Versions tested:
glusterfs-client-xlators-3.7.9-2.el7rhgs.x86_64
glusterfs-server-3.7.9-2.el7rhgs.x86_64
python-gluster-3.7.5-19.el7rhgs.noarch
gluster-nagios-addons-0.2.5-1.el7rhgs.x86_64
vdsm-gluster-4.16.30-1.3.el7rhgs.noarch
glusterfs-3.7.9-2.el7rhgs.x86_64
glusterfs-api-3.7.9-2.el7rhgs.x86_64
glusterfs-cli-3.7.9-2.el7rhgs.x86_64
glusterfs-geo-replication-3.7.9-2.el7rhgs.x86_64
gluster-nagios-common-0.2.3-1.el7rhgs.noarch
glusterfs-libs-3.7.9-2.el7rhgs.x86_64
glusterfs-fuse-3.7.9-2.el7rhgs.x86_64
glusterfs-rdma-3.7.9-2.el7rhgs.x86_64
Comment 9 nchilaka 2016-05-06 03:52:31 EDT
Raised bug 1333705 - gluster volume heal info "healed" and "heal-failed" showing wrong information, which could be caused by this fix. However, given that these commands are to be deprecated, this is not severe.
Comment 13 errata-xmlrpc 2016-06-23 01:00:41 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1240
