Bug 1416361
Summary: | Gdeploy returns exit code '0' even when it fails to complete all the tasks | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Ramesh N <rnachimu> |
Component: | gdeploy | Assignee: | Sachidananda Urs <surs> |
Status: | CLOSED ERRATA | QA Contact: | Manisha Saini <msaini> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | rhgs-3.1 | CC: | amukherj, dkota, msaini, rcyriac, rhs-bugs, rnachimu, sabose, sac, sbhaloth, smohan, storage-qa-internal, surs |
Target Milestone: | --- | ||
Target Release: | RHGS 3.2.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | gdeploy-2.0.1-9 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2017-03-23 05:09:40 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1277939, 1351528 |
Description
Ramesh N
2017-01-25 11:11:41 UTC
Commit https://github.com/gluster/gdeploy/commit/d928be7 fixes the issue.

Tested this with gdeploy-2.0.1-9.el7rhgs.noarch. If gdeploy fails to execute some task, it still returns 0 as the exit code.

Steps:
1. Create a gdeploy file with the configuration below.
2. Do not set up passwordless SSH for host 10.70.37.110.
3. Run gdeploy with the conf file.

gdeploy.conf
[hosts]
10.70.37.131
10.70.37.110
10.70.37.85

[peer]
action=probe
==============
# gdeploy -c gdeploy.conf

PLAY [master] ******************************************************************

TASK [Creates a Trusted Storage Pool] ******************************************
fatal: [10.70.37.110]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).\r\n", "unreachable": true}
to retry, use: --limit @/tmp/tmpTyzfus/gluster-peer-probe.retry

PLAY RECAP *********************************************************************
10.70.37.110 : ok=0 changed=0 unreachable=1 failed=0

Ignoring errors...
=====================
Check the output of echo $?
# echo $?
0

It still returns 0 as the exit code.

Sac/Devyani, looking at the fix at https://github.com/gluster/gdeploy/commit/d928be7, may I ask what the coverage of cleanup_and_quit() is? IIUC (with my limited knowledge of gDeploy), this function is not called in all error-handling cases; in that case, might this fix not be correct from a generic perspective?

Manisha, I got a chance to talk to Sac, and we believe a key point was missed while testing this bug. Please note that by default gDeploy ignores errors, which is why it has an ignore_error parameter. To override that behaviour in this particular case, can you please retest the scenario with ignore_peer_errors=no? I am moving this BZ back to ON_QA; if it doesn't work, please mark it failed again.

Tested this bug on the build on which it was filed.

# rpm -qa | grep gdeploy
gdeploy-2.0.1-8.el7rhgs.noarch

1. With the ignore_peer_errors option

gdeploy conf file
---------
[hosts]
10.70.37.131
10.70.37.110
10.70.37.85

[peer]
action=probe
ignore_peer_errors=no

-------
# echo $?
1

=================
# gdeploy -c gdeploy.conf

PLAY [master] ******************************************************************

TASK [Creates a Trusted Storage Pool] ******************************************
fatal: [10.70.37.110]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).\r\n", "unreachable": true}
to retry, use: --limit @/tmp/tmpm4XEuQ/gluster-peer-probe.retry

PLAY RECAP *********************************************************************
10.70.37.110 : ok=0 changed=0 unreachable=1 failed=0

Ignoring errors...

=================

2. Without the ignore_peer_errors option

Conf file
------------
[hosts]
10.70.37.131
10.70.37.110
10.70.37.85

[peer]
action=probe

-----------
# echo $?
0

=================
# gdeploy -c gdeploy.conf

PLAY [master] ******************************************************************

TASK [Creates a Trusted Storage Pool] ******************************************
fatal: [10.70.37.110]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).\r\n", "unreachable": true}
to retry, use: --limit @/tmp/tmpkn7S5u/gluster-peer-probe.retry

PLAY RECAP *********************************************************************
10.70.37.110 : ok=0 changed=0 unreachable=1 failed=0

=================

With the build that has the fix

# rpm -qa | grep gdeploy
gdeploy-2.0.1-9.el7rhgs.noarch

1. With the ignore_peer_errors option

# cat gdeploy.conf
[hosts]
10.70.37.131
10.70.37.110
10.70.37.85

[peer]
action=probe
ignore_peer_errors=no

# echo $?
1

2. Without the ignore_peer_errors option

# cat gdeploy.conf
[hosts]
10.70.37.131
10.70.37.110
10.70.37.85

[peer]
action=probe

# echo $?
0

Observation:

The previous build (on which the bug was filed) and the current build give the same results, both with and without the ignore_peer_errors parameter in the gdeploy file.

Tested this by giving the RAID option as none in the gdeploy file.

Steps:
1. Create a gdeploy config file with a mistake, such as giving the RAID type as 'none'.
2. Execute gdeploy -c <config-file>.
3. gdeploy fails to execute with an error.
4. Check the result of 'echo $?'.

Result:
# gdeploy -c gdeploy.conf
Error: Unsupported disk type!
[root@dhcp37-110 gdeploy]# echo $?
1

@Ramesh: This bug report covers two issues:
1. Giving a wrong RAID type
2. Not configuring passwordless SSH (the same issue can be reproduced as mentioned in the additional info)

According to my findings, issue 1 is resolved in build gdeploy-2.0.1-9.el7rhgs.noarch. For issue 2, however, I see the same behaviour on both builds (gdeploy-2.0.1-8.el7rhgs.noarch, on which this bug was opened, and gdeploy-2.0.1-9.el7rhgs.noarch, which has the fix). The results for issue 2 are in comment #7.

Ramesh, can you update this bug with your findings on the build with the fix, in particular your observation when passwordless SSH is not configured?

(In reply to Manisha Saini from comment #7)
> Tested this bug on the build on which it was filed.
> [...]
> Observation:
>
> In previous build (on which the bug is filed) and the current build has the
> same observation with using ignore_peer_errors parameter in gdeploy file.

Sac, can you please take a look at this observation and provide your feedback?
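The results above hinge on whether the [peer] section sets ignore_peer_errors=no; the thread states that gDeploy ignores errors by default. As a minimal sketch, assuming only that default (inferred from this thread, not from gdeploy's source), the helper below reads a gdeploy-style config with Python's configparser and reports whether peer errors would be ignored. The function and constant names are hypothetical; this is not gdeploy's own parser.

```python
import configparser

# Assumption inferred from this bug thread, not from gdeploy's source code:
# [peer] errors are ignored unless the section sets ignore_peer_errors=no.
DEFAULT_IGNORE_PEER_ERRORS = True

CONF_WITH_OPTION = """\
[hosts]
10.70.37.131
10.70.37.110
10.70.37.85

[peer]
action=probe
ignore_peer_errors=no
"""

CONF_WITHOUT_OPTION = """\
[hosts]
10.70.37.131
10.70.37.110
10.70.37.85

[peer]
action=probe
"""


def peer_errors_ignored(conf_text):
    """Return True if a gdeploy-style config would ignore [peer] errors."""
    # allow_no_value=True lets the bare host lines under [hosts] parse as keys.
    parser = configparser.ConfigParser(allow_no_value=True)
    parser.read_string(conf_text)
    # fallback also covers a missing [peer] section.
    value = parser.get("peer", "ignore_peer_errors", fallback=None)
    if value is None:
        return DEFAULT_IGNORE_PEER_ERRORS
    return value.strip().lower() != "no"


print(peer_errors_ignored(CONF_WITH_OPTION))     # False
print(peer_errors_ignored(CONF_WITHOUT_OPTION))  # True
```

The two outputs line up with the exit codes reported in comment #7 on both builds: 1 when errors are not ignored, 0 when they are.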
> Sac - can you please take a look at this observation and provide your
> feedback?
That is the expected behaviour: Ramesh had not set ignore_peer_errors to no.
More broadly, however, the bug was that whenever gdeploy exits because of an
error, it should return a non-zero exit status. That is what this bug addresses.
Manisha's observation is right: the exit status should be non-zero.
Ramesh should review the results and confirm whether this addresses their needs.
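The rule described above can be modelled as: if any host failed or was unreachable and errors are not being ignored, exit non-zero. The sketch below is a hypothetical illustration under that assumption. It parses a PLAY RECAP line like the ones in this report and chooses an exit status. cleanup_and_quit() here is only a stand-in named after the function mentioned earlier in the thread; its signature and behaviour in gdeploy may differ.

```python
import sys
from dataclasses import dataclass

# Counters this sketch models; newer Ansible recaps may print more, which are skipped.
RECAP_FIELDS = ("ok", "changed", "unreachable", "failed")


@dataclass
class HostRecap:
    """Counters from one PLAY RECAP line."""
    host: str
    ok: int = 0
    changed: int = 0
    unreachable: int = 0
    failed: int = 0


def parse_recap_line(line):
    """Parse e.g. '10.70.37.110 : ok=0 changed=0 unreachable=1 failed=0'."""
    host, _, counters = line.rpartition(":")
    values = {}
    for item in counters.split():
        key, _, number = item.partition("=")
        if key in RECAP_FIELDS:
            values[key] = int(number)
    return HostRecap(host=host.strip(), **values)


def exit_status(recaps, ignore_errors):
    """Hypothetical rule: any failed or unreachable host gives 1 unless errors are ignored."""
    had_error = any(r.failed or r.unreachable for r in recaps)
    return 1 if had_error and not ignore_errors else 0


def cleanup_and_quit(ret=0):
    """Hypothetical stand-in for gdeploy's cleanup_and_quit(); the real one may differ."""
    # A real implementation would remove temporary playbooks and retry files here.
    sys.exit(ret)


recap = parse_recap_line("10.70.37.110 : ok=0 changed=0 unreachable=1 failed=0")
print(exit_status([recap], ignore_errors=False))  # 1, as with ignore_peer_errors=no
print(exit_status([recap], ignore_errors=True))   # 0, as with the default
```

Passing the computed status into the exit function, rather than hard-coding sys.exit(0) at the end of a run, is what makes `echo $?` reflect the failure. It also shows why the default (ignore) case still reports 0.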
We hit this issue both when SSH was not configured and when a wrong RAID type was given. If both cases are handled, and gdeploy returns a proper error code when ignore_xx_errors=no is set, then we can mark this bug as verified.

Based on comments 8, 9, and 10: when ignore_peer_errors=no is set and passwordless SSH is not configured on one of the nodes, gdeploy returns error code 1. Based on this observation and the comments above, moving this bug to the VERIFIED state.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2017-0482.html
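The verification criteria in the comments above amount to three exit-status checks. A minimal sketch of automating them, assuming three hypothetical config files (one per scenario; the file names are illustrative, not from this report), could run each command and compare its exit status, which is the value `echo $?` shows, against the expected outcome.

```python
import subprocess
import sys

# Hypothetical config file names, one per scenario discussed in this bug.
CASES = [
    # (description, command, expect_nonzero)
    ("RAID type 'none'", ["gdeploy", "-c", "raid-none.conf"], True),
    ("no passwordless SSH, ignore_peer_errors=no", ["gdeploy", "-c", "peer-strict.conf"], True),
    ("no passwordless SSH, default (errors ignored)", ["gdeploy", "-c", "peer-default.conf"], False),
]


def exit_code(cmd):
    """Run cmd and return its exit status, i.e. what `echo $?` prints afterwards."""
    return subprocess.run(cmd).returncode


def verify(cases):
    """Return a description of every case whose exit status did not match."""
    mismatches = []
    for description, cmd, expect_nonzero in cases:
        code = exit_code(cmd)
        if (code != 0) != expect_nonzero:
            message = f"{description}: exit status {code}"
            print(message, file=sys.stderr)
            mismatches.append(message)
    return mismatches
```

On a host with gdeploy installed and the three files prepared, `verify(CASES)` returning an empty list would match the outcome recorded above. The check tests for zero versus non-zero rather than exactly 1, since the criterion is only "a proper error code".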