Bug 1464352

Summary: volume section is not ignoring errors by default while triggering rebalance
Product: Red Hat Gluster Storage Reporter: SATHEESARAN <sasundar>
Component: gdeploy Assignee: Sachidananda Urs <surs>
Status: CLOSED ERRATA QA Contact: SATHEESARAN <sasundar>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 3.3 CC: amukherj, asrivast, rhinduja, rhs-bugs, smohan, storage-qa-internal
Target Milestone: ---   
Target Release: RHGS 3.3.0   
Hardware: x86_64   
OS: Linux   
Whiteboard: 3.3.0-devel-freeze-exception
Fixed In Version: gdeploy-2.0.2-12 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-09-21 04:49:50 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Bug Depends On:    
Bug Blocks: 1417151    

Description SATHEESARAN 2017-06-23 08:08:10 UTC
Description of problem:
-----------------------
The default behaviour of any section is to ignore errors. If the user wishes **not** to proceed with gdeploy execution after a failure, they have to add the line ignore_*_errors=yes

But the [volume] section now stops execution when it encounters a failure, which is contrary to this expectation.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
gdeploy-2.0.2-11.el7rhgs

How reproducible:
-----------------
Always

Steps to Reproduce:
-------------------
1. Trigger rebalance on a volume

Actual results:
---------------
The script tries to start the volume. Because the volume is already started, the script fails to proceed further and doesn't trigger the rebalance.

Expected results:
-----------------
As long as ignore_volume_errors=no is mentioned, volume start failure should be ignored, and rebalance should get triggered


Additional info:
----------------

Here is the exact config file:
[hosts]
host1.example.com

[volume]
action=rebalance
volname=vol1
state=start

This was working well with gdeploy-2.0.1-13.el7rhgs

Comment 2 Sachidananda Urs 2017-06-23 09:58:03 UTC
(In reply to SATHEESARAN from comment #0)
> Description of problem:
> -----------------------
> The default behaviour of the any section is that to ignore errors by
> default. If user wishes **not** to proceed with gdeploy execution post any
> failure, he have to add this line ignore_*_errors=yes


sas, the default is not to ignore errors: gdeploy stops as soon as it
encounters an error. We changed to this behavior after a debate.

Ref: https://github.com/gluster/gdeploy/blob/master/gdeploylib/helpers.py#L467

            # Exit gdeploy in case of errors and user has explicitly set
            # not to ignore errors
            if retcode != 0 and Global.ignore_errors != 'yes':
                self.cleanup_and_quit(1)


However, the scenario in this bug is a special case, and we will handle it.
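
For illustration only, the check quoted above can be reduced to a standalone sketch. The function name should_abort and the default value "no" are assumptions made for this example; they are not taken from gdeploy, which reads the setting from Global.ignore_errors.

```python
def should_abort(retcode: int, ignore_errors: str = "no") -> bool:
    """Return True when a run should stop after a step finishes.

    Mirrors the quoted condition: a non-zero return code aborts the run
    unless ignoring errors has been set to exactly 'yes'.
    The name and the default value are hypothetical, not gdeploy's own.
    """
    return retcode != 0 and ignore_errors != "yes"


# A failed step stops the run by default.
print(should_abort(1))
# The same failure is tolerated once errors are explicitly ignored.
print(should_abort(1, "yes"))
# A successful step never stops the run.
print(should_abort(0))
```

Under this reading, a failed volume start (non-zero return code) ends the run before rebalance is reached, unless the ignore setting is 'yes'.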

Comment 3 Sachidananda Urs 2017-06-23 10:18:17 UTC
sas, for now we have two ways to bypass this:

1. Set force=yes in the config file.

[hosts]
host1.example.com

[volume]
action=rebalance
volname=vol1
state=start
force=yes

2. Set ignore_volume_errors to yes.

[hosts]
host1.example.com

[volume]
action=rebalance
volname=vol1
state=start
ignore_volume_errors=yes
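
As a rough way to check which of the two workarounds a config file carries, the sketch below reads the second config with Python's standard configparser. This is an assumption-laden illustration: the helper volume_options and the variable tolerant are hypothetical names, and the sketch does not claim to reproduce how gdeploy itself parses or interprets these options.

```python
import configparser

# The second workaround config from this comment, inlined for the example.
CONF = """\
[hosts]
host1.example.com

[volume]
action=rebalance
volname=vol1
state=start
ignore_volume_errors=yes
"""


def volume_options(text: str) -> dict:
    """Return the [volume] options as a plain dict (hypothetical helper)."""
    # allow_no_value lets bare host names in [hosts] parse without '='.
    parser = configparser.ConfigParser(allow_no_value=True)
    parser.read_string(text)
    return dict(parser["volume"])


opts = volume_options(CONF)
# True if either workaround from this comment is present in the section.
tolerant = opts.get("force") == "yes" or opts.get("ignore_volume_errors") == "yes"
print(opts["action"], tolerant)
```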

Comment 4 SATHEESARAN 2017-06-23 10:20:39 UTC
(In reply to Sachidananda Urs from comment #3)

Thanks Sac, I have tested both and both worked well.

Comment 5 Sachidananda Urs 2017-06-23 10:37:28 UTC
Commit: https://github.com/gluster/gdeploy/commit/79dd754358 fixes the issue

Comment 8 SATHEESARAN 2017-07-10 13:52:22 UTC
Tested with gdeploy-2.0.2-12.el7rhgs.

Performed a rebalance after an add-brick operation using the following conf file:


[hosts]
host1.example.com
host2.example.com
host3.example.com

[volume1]
action=add-brick
volname=vmstore
bricks=/gluster/brick2/b2

[volume2]
action=rebalance
state=start
volname=vmstore

Rebalance was triggered on the volume successfully

Comment 10 errata-xmlrpc 2017-09-21 04:49:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2777