Bug 1464352

Summary:	volume section is not ignoring errors by default while triggering rebalance
Product:	[Red Hat Storage] Red Hat Gluster Storage	Reporter:	SATHEESARAN <sasundar>
Component:	gdeploy	Assignee:	Sachidananda Urs <surs>
Status:	CLOSED ERRATA	QA Contact:	SATHEESARAN <sasundar>
Severity:	medium	Docs Contact:
Priority:	unspecified
Version:	rhgs-3.3	CC:	amukherj, asrivast, rhinduja, rhs-bugs, smohan, storage-qa-internal
Target Milestone:	---
Target Release:	RHGS 3.3.0
Hardware:	x86_64
OS:	Linux
Whiteboard:	3.3.0-devel-freeze-exception
Fixed In Version:	gdeploy-2.0.2-12	Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2017-09-21 04:49:50 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1417151

Description SATHEESARAN 2017-06-23 08:08:10 UTC

Description of problem:
-----------------------
The default behaviour of any section is to ignore errors. If the user wishes **not** to proceed with gdeploy execution after a failure, he has to add this line: ignore_*_errors=yes

But the [volume] section now stops execution when it encounters a failure, which goes against this expectation.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
gdeploy-2.0.2-11.el7rhgs

How reproducible:
-----------------
Always

Steps to Reproduce:
-------------------
1. Trigger rebalance on a volume

Actual results:
---------------
The script tries to start the volume. Because the volume is already started, the script fails at that step, does not proceed further, and does not trigger the rebalance.

Expected results:
-----------------
As long as ignore_volume_errors=no is mentioned, volume start failure should be ignored, and rebalance should get triggered


Additional info:
----------------

Here is the exact config file:
[hosts]
host1.example.com

[volume]
action=rebalance
volname=vol1
state=start

This was working well with gdeploy-2.0.1-13.el7rhgs

Comment 2 Sachidananda Urs 2017-06-23 09:58:03 UTC

(In reply to SATHEESARAN from comment #0)
> Description of problem:
> -----------------------
> The default behaviour of any section is to ignore errors. If the user
> wishes **not** to proceed with gdeploy execution after a failure, he has
> to add this line: ignore_*_errors=yes


sas, the default is not to ignore errors: gdeploy stops as soon as it
encounters an error. We changed to this behavior after a debate.

Ref: https://github.com/gluster/gdeploy/blob/master/gdeploylib/helpers.py#L467

            # Exit gdeploy in case of errors and user has explicitly set
            # not to ignore errors
            if retcode != 0 and Global.ignore_errors != 'yes':
                self.cleanup_and_quit(1)


However, the scenario in this bug is a special case. And we will handle that.
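
The condition in the snippet above can be read in isolation as a small predicate. The following is only a minimal sketch of that check, not gdeploy's actual code: the function name should_abort is hypothetical, and the value 'no' used for the unset case is an assumption (the snippet only shows that anything other than 'yes' leads to an exit).

```python
def should_abort(retcode: int, ignore_errors: str) -> bool:
    """Return True when execution should stop.

    Mirrors the quoted condition: a non-zero return code aborts the run
    unless ignore_errors is exactly the string 'yes'.
    (should_abort is a hypothetical name used for illustration.)
    """
    return retcode != 0 and ignore_errors != 'yes'


if __name__ == "__main__":
    # Volume start fails (non-zero retcode) and errors are not ignored:
    # the run stops before rebalance is triggered.
    print(should_abort(1, 'no'))
    # Same failure with ignore_volume_errors=yes: the run continues.
    print(should_abort(1, 'yes'))
    # No failure: the run continues regardless of the setting.
    print(should_abort(0, 'no'))
```

Under this reading, the rebalance config in comment #0 stops at the failed volume start because no ignore setting is present.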

Comment 3 Sachidananda Urs 2017-06-23 10:18:17 UTC

sas, for now we have two ways to bypass this:

1. Set force=yes in the config file.

[hosts]
host1.example.com

[volume]
action=rebalance
volname=vol1
state=start
force=yes

2. Set ignore_volume_errors=yes in the config file.

[hosts]
host1.example.com

[volume]
action=rebalance
volname=vol1
state=start
ignore_volume_errors=yes
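
To check whether a given config file already carries one of the two workarounds above, something like the following could be used. This is a hedged sketch that relies on Python's standard configparser rather than gdeploy's own parser (an assumption); the helper name workaround_present is hypothetical.

```python
import configparser

# Config from workaround 2 above, embedded so the sketch is self-contained.
WORKAROUND_CONF = """\
[hosts]
host1.example.com

[volume]
action=rebalance
volname=vol1
state=start
ignore_volume_errors=yes
"""


def workaround_present(conf_text: str, section: str = "volume") -> bool:
    """Return True if the section sets force=yes or ignore_volume_errors=yes.

    workaround_present is a hypothetical helper, not part of gdeploy.
    """
    # allow_no_value=True lets configparser accept the bare hostname
    # lines in [hosts], which have no '=' delimiter.
    parser = configparser.ConfigParser(allow_no_value=True)
    parser.read_string(conf_text)
    opts = parser[section]
    return (opts.get("force") == "yes"
            or opts.get("ignore_volume_errors") == "yes")


if __name__ == "__main__":
    print(workaround_present(WORKAROUND_CONF))
```

The check is deliberately strict about the literal value 'yes', matching the string comparison in the helpers.py snippet quoted in comment 2.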

Comment 4 SATHEESARAN 2017-06-23 10:20:39 UTC

(In reply to Sachidananda Urs from comment #3)
> sas, for now we have two ways to bypass this:
> 
> 1. set force=yes in the config file.
> 
> [hosts]
> host1.example.com
> 
> [volume]
> action=rebalance
> volname=vol1
> state=start
> force=yes
> 
> 2. Set the ignore errors to yes.
> 
> [hosts]
> host1.example.com
> 
> [volume]
> action=rebalance
> volname=vol1
> state=start
> ignore_volume_errors=yes

Thanks Sac, I have tested both and both worked well.

Comment 5 Sachidananda Urs 2017-06-23 10:37:28 UTC

Commit https://github.com/gluster/gdeploy/commit/79dd754358 fixes the issue.

Comment 8 SATHEESARAN 2017-07-10 13:52:22 UTC

Tested with gdeploy-2.0.2-12.el7rhgs.

Performed rebalance after an add-brick operation using the following conf file:


[hosts]
host1.example.com
host2.example.com
host3.example.com

[volume1]
action=add-brick
volname=vmstore
bricks=/gluster/brick2/b2

[volume2]
action=rebalance
state=start
volname=vmstore

Rebalance was triggered on the volume successfully

Comment 10 errata-xmlrpc 2017-09-21 04:49:50 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2777