Bug 1416361

Summary: Gdeploy returns exit code '0' even when it fails to complete all the tasks
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Ramesh N <rnachimu>
Component: gdeploy Assignee: Sachidananda Urs <surs>
Status: CLOSED ERRATA QA Contact: Manisha Saini <msaini>
Severity: high Docs Contact:
Priority: high    
Version: rhgs-3.1 CC: amukherj, dkota, msaini, rcyriac, rhs-bugs, rnachimu, sabose, sac, sbhaloth, smohan, storage-qa-internal, surs
Target Milestone: ---   
Target Release: RHGS 3.2.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: gdeploy-2.0.1-9
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-03-23 05:09:40 UTC Type: Bug
Bug Blocks: 1277939, 1351528    

Description Ramesh N 2017-01-25 11:11:41 UTC
Description of problem:

Gdeploy returns successful exit code '0' even when it fails to complete all the tasks.

Version-Release number of selected component (if applicable):
 gdeploy-2.0.1-8.el7rhgs.noarch.rpm

How reproducible:

Always.

Steps to Reproduce:
1. Create a gdeploy config file containing a mistake, such as setting the RAID type to 'none'.
2. Run gdeploy -c <config-file>.
3. Gdeploy fails with an error.
4. Check the result of 'echo $?'.

Actual results:

0

Expected results:

Some error code but not 0.

Additional info:

The same issue can be reproduced by not configuring passwordless SSH. This issue affects other tools that depend on the exit code of the gdeploy command.
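
To illustrate the impact on calling tools, here is a minimal sketch of a wrapper that decides success purely from a command's exit status. The helper name step_succeeded is hypothetical and not part of gdeploy; the stand-in commands below exist only so the sketch runs anywhere.

```python
import subprocess
import sys


def step_succeeded(cmd):
    """Run cmd and report success based only on its exit status."""
    result = subprocess.run(cmd)
    return result.returncode == 0


# Stand-in commands; a real caller would pass something like
# ["gdeploy", "-c", "<config-file>"] instead.
failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
passing = [sys.executable, "-c", "pass"]

print(step_succeeded(failing))
print(step_succeeded(passing))
```

A caller written this way treats a failed deployment as successful whenever the tool exits with 0, which is the failure mode described in this report.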

Comment 2 Sachidananda Urs 2017-02-01 08:01:57 UTC
Commit: https://github.com/gluster/gdeploy/commit/d928be7 fixes the issue.

Comment 4 Manisha Saini 2017-02-06 11:03:07 UTC
Tested this with gdeploy-2.0.1-9.el7rhgs.noarch.

If gdeploy fails to execute a task, it still returns 0 as the exit code.

Steps:
1. Create a gdeploy config file with the configuration below.
2. Do not set up passwordless SSH for host 10.70.37.110.
3. Run gdeploy with the config file.

Gdeploy.conf

[hosts]
10.70.37.131
10.70.37.110
10.70.37.85


[peer]
action=probe

==============
# gdeploy -c gdeploy.conf 

PLAY [master] ******************************************************************

TASK [Creates a Trusted Storage Pool] ******************************************
fatal: [10.70.37.110]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).\r\n", "unreachable": true}
	to retry, use: --limit @/tmp/tmpTyzfus/gluster-peer-probe.retry

PLAY RECAP *********************************************************************
10.70.37.110               : ok=0    changed=0    unreachable=1    failed=0   

Ignoring errors...

=====================

Check output of echo $?

# echo $?
0


It still returns 0 as exit code.
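
The mismatch between the recap and the exit code can be made explicit with a small sketch that reads the PLAY RECAP line instead of the exit status. The function name hosts_with_problems is hypothetical, and the line format is assumed from the output shown above; this is not something gdeploy provides.

```python
import re

# Matches a recap line such as:
# "10.70.37.110 : ok=0 changed=0 unreachable=1 failed=0"
RECAP_LINE = re.compile(
    r"^(?P<host>\S+)\s*:\s*ok=(?P<ok>\d+)\s+changed=(?P<changed>\d+)"
    r"\s+unreachable=(?P<unreachable>\d+)\s+failed=(?P<failed>\d+)"
)


def hosts_with_problems(output):
    """Return hosts whose recap line reports unreachable or failed tasks."""
    bad = []
    for line in output.splitlines():
        match = RECAP_LINE.match(line.strip())
        if match and (int(match["unreachable"]) or int(match["failed"])):
            bad.append(match["host"])
    return bad


sample = "10.70.37.110               : ok=0    changed=0    unreachable=1    failed=0   "
print(hosts_with_problems(sample))
```

For the sample line, the host is flagged as unreachable even though the process exit code was 0.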

Comment 5 Atin Mukherjee 2017-02-06 12:59:44 UTC
Sac/Devyani,

Looking at the fix in https://github.com/gluster/gdeploy/commit/d928be7, may I ask what the coverage of cleanup_and_quit() is? IIUC (with my limited knowledge of gDeploy), this function is not called in all error-handling cases, so this fix might not be correct from a generic perspective.
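
The coverage concern can be sketched as follows. This is not gdeploy's code: run_all and the two task functions are hypothetical, and the point is only that when every error path funnels through one place that computes the status, no path can report 0 by accident.

```python
import sys


def run_all(tasks):
    """Run every task and return an exit status: 0 only if none failed."""
    status = 0
    for task in tasks:
        try:
            task()
        except Exception as err:  # any failure marks the whole run as failed
            print(f"Error: {err}", file=sys.stderr)
            status = 1
    return status


def ok_task():
    pass


def bad_task():
    raise RuntimeError("task could not be completed")


status = run_all([ok_task, bad_task])
print(status)  # a real tool would pass this value to sys.exit()
```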

Comment 6 Atin Mukherjee 2017-02-07 04:12:02 UTC
Manisha,

I got a chance to talk to Sac, and we believe a key point was missed while testing this bug. By default gDeploy ignores errors, which is why it has an ignore_error parameter. To override that behaviour in this particular case, can you please retest the scenario with ignore_peer_errors=no?

I am moving this BZ back to ON_QA; if it doesn't work, please mark it failed again.
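
The behaviour described here can be modelled with a short sketch. It assumes that errors are ignored when the option is absent (as stated above) and uses Python's configparser to read a config like the one in comment 4; the function names are hypothetical and this is a model for discussion, not gdeploy's own logic.

```python
import configparser

SAMPLE_CONF = """
[hosts]
10.70.37.131
10.70.37.110
10.70.37.85

[peer]
action=probe
ignore_peer_errors=no
"""


def peer_errors_ignored(conf_text):
    """Return True unless [peer] sets ignore_peer_errors to a false value."""
    # allow_no_value lets the bare host entries under [hosts] parse cleanly.
    parser = configparser.ConfigParser(allow_no_value=True)
    parser.read_string(conf_text)
    # Assumed default: errors are ignored when the option is absent.
    return parser.getboolean("peer", "ignore_peer_errors", fallback=True)


def expected_exit_status(task_failed, conf_text):
    """Exit status a caller would expect under the assumed default."""
    if task_failed and not peer_errors_ignored(conf_text):
        return 1
    return 0


print(expected_exit_status(True, SAMPLE_CONF))
```

Under this model, a failed peer probe yields a non-zero status only when the config explicitly sets ignore_peer_errors=no.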

Comment 7 Manisha Saini 2017-02-07 09:35:44 UTC
Tested this bug on the build on which it was filed.

# rpm -qa | grep gdeploy
gdeploy-2.0.1-8.el7rhgs.noarch

1. With the ignore_peer_errors option:


Gdeploy conf file-
---------
[hosts]
10.70.37.131
10.70.37.110
10.70.37.85

[peer]
action=probe
ignore_peer_errors=no

-------
# echo $?
1

=================
# gdeploy -c gdeploy.conf 

PLAY [master] ******************************************************************

TASK [Creates a Trusted Storage Pool] ******************************************
fatal: [10.70.37.110]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).\r\n", "unreachable": true}
	to retry, use: --limit @/tmp/tmpm4XEuQ/gluster-peer-probe.retry

PLAY RECAP *********************************************************************
10.70.37.110               : ok=0    changed=0    unreachable=1    failed=0   

Ignoring errors...

=================

2. Without the ignore_peer_errors option:

Conf file

------------
[hosts]
10.70.37.131
10.70.37.110
10.70.37.85

[peer]
action=probe

-----------
# echo $?
0


=================
# gdeploy -c gdeploy.conf 

PLAY [master] ******************************************************************

TASK [Creates a Trusted Storage Pool] ******************************************
fatal: [10.70.37.110]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).\r\n", "unreachable": true}
	to retry, use: --limit @/tmp/tmpkn7S5u/gluster-peer-probe.retry

PLAY RECAP *********************************************************************
10.70.37.110               : ok=0    changed=0    unreachable=1    failed=0   


=================



With the build which has the fix
# rpm -qa | grep gdeploy
gdeploy-2.0.1-9.el7rhgs.noarch

1. With the ignore_peer_errors option:

# cat gdeploy.conf 
[hosts]
10.70.37.131
10.70.37.110
10.70.37.85

[peer]
action=probe
ignore_peer_errors=no


# echo $?
1

2. Without the ignore_peer_errors option:

# cat gdeploy.conf 
[hosts]
10.70.37.131
10.70.37.110
10.70.37.85

[peer]
action=probe

# echo $?
0



Observation:

The previous build (on which the bug was filed) and the current build behave the same way with respect to the ignore_peer_errors parameter in the gdeploy file.

Comment 8 Manisha Saini 2017-02-10 08:56:27 UTC
Tested this by setting the RAID option to 'none' in the gdeploy file.

Steps:
1. Create a gdeploy config file containing a mistake, such as setting the RAID type to 'none'.
2. Run gdeploy -c <config-file>.
3. Gdeploy fails with an error.
4. Check the result of 'echo $?'.

Result:

# gdeploy -c gdeploy.conf

Error: Unsupported disk type!
[root@dhcp37-110 gdeploy]# echo $?
1
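
The result above is the fail-fast pattern: an invalid setting is rejected before any task runs, and the failure is reflected in the exit status. A minimal sketch follows. The set of accepted values is an illustrative assumption and is not taken from gdeploy, and check_disk_type is a hypothetical name; only the error message is copied from the output above.

```python
import sys

# Assumption: illustrative set of accepted values, not taken from gdeploy.
SUPPORTED_DISK_TYPES = {"raid10", "raid6", "jbod"}


def check_disk_type(disk_type):
    """Return 0 if disk_type is accepted, otherwise print an error and return 1."""
    if disk_type.lower() in SUPPORTED_DISK_TYPES:
        return 0
    print("Error: Unsupported disk type!", file=sys.stderr)
    return 1


print(check_disk_type("none"))
```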


@Ramesh:

This bug report covers two issues:
1. Giving a wrong RAID type
2. Not configuring passwordless SSH (the same issue can be reproduced this way, as mentioned in the additional info)

According to my findings, issue 1 is resolved in the build gdeploy-2.0.1-9.el7rhgs.noarch.

For issue 2, however, I have the same observation on both builds (the one on which this bug was opened, gdeploy-2.0.1-8.el7rhgs.noarch, and the one with the fix, gdeploy-2.0.1-9.el7rhgs.noarch).

The result for issue 2 is in comment #7.


Ramesh, can you update this bug with your findings with the fix, in particular your observation when passwordless SSH is not configured?

Comment 9 Atin Mukherjee 2017-02-13 13:47:55 UTC
(In reply to Manisha Saini from comment #7)
> Observation:
> 
> In previous build (on which the bug is filed) and the current build has the
> same observation with using ignore_peer_errors parameter in gdeploy file.

Sac - can you please take a look at this observation and provide your feedback?

Comment 10 Sachidananda Urs 2017-02-13 15:24:39 UTC
> > Observation:
> > 
> > In previous build (on which the bug is filed) and the current build has the
> > same observation with using ignore_peer_errors parameter in gdeploy file.
> 
> Sac - can you please take a look at this observation and provide your
> feedback?

That is the required behaviour; Ramesh had not set ignore_peer_errors to no.
However, the bug in the broader sense meant that whenever exit was called on an error,
gdeploy should return a non-zero exit status. That is addressed in this bug.

And the observation Manisha is making is right. The exit status should be non-zero.
Ramesh should take a look at the results and confirm whether this addresses their needs.

Comment 11 Ramesh N 2017-02-14 10:45:37 UTC
We hit this issue when SSH was not configured and when a wrong RAID type was given. If both cases are taken care of and gdeploy returns a proper error code when ignore_xx_errors=no is set, then we can mark this bug as verified.

Comment 12 Manisha Saini 2017-02-14 12:52:41 UTC
Based on comment 8, comment 9, and comment 10:

When the ignore_peer_errors parameter is used and passwordless SSH is not configured on a node, gdeploy returns error code 1.
Based on this observation and the comments mentioned above, moving this bug to the verified state.

Comment 14 errata-xmlrpc 2017-03-23 05:09:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2017-0482.html