Bug 1079726 - engine-cleanup should return different codes for different types of failures
Summary: engine-cleanup should return different codes for different types of failures
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.4.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ovirt-3.6.0-rc
Target Release: 3.6.0
Assignee: Yedidyah Bar David
QA Contact: Jiri Belka
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-03-23 14:30 UTC by Lev Veyde
Modified: 2016-03-11 07:33 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-03-11 07:33:58 UTC
oVirt Team: Integration
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
oVirt gerrit | 35503 | 0 | master | MERGED | packaging: setup: return different exit statuses | Never
oVirt gerrit | 35632 | 0 | master | MERGED | core: support different exit codes | Never

Description Lev Veyde 2014-03-23 14:30:08 UTC
Description of problem:

If engine-cleanup is run on a machine on which the product was never installed, or which has already been cleaned up, it does not exit gracefully.

It attempts to create an answer file, fails to do so, and then prints an error message about it while trying to read it back.

It also returns a non-zero return value in this case, although there is no real error in cleaning up the machine, since no cleanup is necessary.

[ INFO  ] Stage: Initializing
[ INFO  ] Stage: Environment setup
          Configuration files: ['/etc/ovirt-engine-setup.conf.d/10-packaging.conf', '/home/jenkins/workspace/ovirt-engine-3.3-setup-different-timezones/jenkins/jobs/misc/tz_testing_33/cleanup.file.otopi']
          Log file: /var/log/ovirt-engine/setup/ovirt-engine-remove-20140323161044.log
          Version: otopi-1.2.0_rc4 (otopi-1.2.0-0.11.rc4.el6ev)
[ ERROR ] Could not detect a completed product setup
          Please use the cleanup utility only after a setup or after an upgrade from an older installation.
[ ERROR ] Failed to execute stage 'Environment setup': Could not detect product setup
[ INFO  ] Stage: Clean up
          Log file is located at /var/log/ovirt-engine/setup/ovirt-engine-remove-20140323161044.log
[ INFO  ] Generating answer file '/var/lib/ovirt-engine/setup/answers/20140323161045-cleanup.conf'
[ ERROR ] Failed to execute stage 'Clean up': [Errno 2] No such file or directory: '/var/lib/ovirt-engine/setup/answers/20140323161045-cleanup.conf'
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination
[ ERROR ] Execution of cleanup failed

Version-Release number of selected component (if applicable):
rhevm-3.3.2-0.1000.507.3705160.el6ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Run engine-cleanup on a clean machine,
e.g. engine-cleanup --config-append=/home/jenkins/workspace/ovirt-engine-3.3-setup-different-timezones/jenkins/jobs/misc/tz_testing_33/cleanup.file.otopi

Actual results:
engine-cleanup fails with extra error messages and a non-zero exit code.

Expected results:
engine-cleanup should exit gracefully, with a zero exit code, right after notifying the user that the product can't be detected.

Additional info:

Comment 1 Alon Bar-Lev 2014-03-23 14:43:38 UTC
This is due to bug#1062717 fix.

We should move ownership of /var/lib/ovirt-engine/answers to setup-base within spec.

Comment 2 Lev Veyde 2014-03-24 08:50:50 UTC
(In reply to Alon Bar-Lev from comment #1)
> This is due to bug#1062717 fix.
> 
> We should move ownership of /var/lib/ovirt-engine/answers to setup-base
> within spec.

Thanks.

What about the non-zero return value on a clean machine?
That used not to be the case: previously, on a clean machine, we received 0, and we have scripts that depend on that behaviour.

Comment 3 Alon Bar-Lev 2014-03-24 14:58:50 UTC
The issue is in 3.4

I do not agree this issue is urgent.

Comment 4 Sandro Bonazzola 2014-03-24 15:00:36 UTC
(In reply to Alon Bar-Lev from comment #3)
> The issue is in 3.4

Lev just told me it's there also on 3.3.
Lev, can you confirm this affects also 3.3?

> 
> I do not agree this issue is urgent.

If it's an automation blocker, it's urgent.

Comment 5 Lev Veyde 2014-03-24 15:04:26 UTC
(In reply to Sandro Bonazzola from comment #4)
> (In reply to Alon Bar-Lev from comment #3)
> > The issue is in 3.4
> 
> Lev just told me it's there also on 3.3.
> Lev, can you confirm this affects also 3.3?
> 
> > 
> > I do not agree this issue is urgent.
> 
> If it's an automation blocker, it's urgent.

The change in the return code logic affects RHEV 3.3 as well: previously engine-cleanup returned 0 on a clean machine, now it returns a non-zero value.

Comment 7 Sandro Bonazzola 2014-03-24 15:25:21 UTC
severity is not urgent, priority is urgent. It's not causing catastrophic issues.

About 0 / non-0 return value: I'm fine with keeping non-0 if the product is not installed (like rm on a non-existing file), but we should allow automation to detect that it's not because of some other kind of failure. Can we set a different exit code?

Comment 8 Lev Veyde 2014-03-24 15:36:26 UTC
(In reply to Alon Bar-Lev from comment #3)
> The issue is in 3.4
> 
> I do not agree this issue is urgent.

We see the issue in RHEV 3.3

Comment 10 Yedidyah Bar David 2014-03-25 07:09:23 UTC
(In reply to Alon Bar-Lev from comment #1)
> This is due to bug#1062717 fix.
> 
> We should move ownership of /var/lib/ovirt-engine/answers to setup-base
> within spec.

It's already there, checked 3.3.2 downstream and master upstream.

rhevm-setup-3.3.2-0.50.el6ev.noarch

Not sure why it was missing in this specific case. Lev - can you check?

Anyway, not sure it's related - I now verified that it returns 1 even if it does manage to create the answer file.

(In reply to Sandro Bonazzola from comment #7)
> severity is not urgent, priority is urgent. It's not causing catastrophic
> issues.
> 
> About 0 / non-0 return value: I'm fine with keeping non-0 if product is not
> installed (like rm on a non-existing file), but we should allow automation to
> detect it's not because of other kind of failure. Can we set different exit
> code?

This will require a (simple) change in otopi.

How come we did not notice this in 3.3 a long time ago?

Comment 11 Alon Bar-Lev 2014-03-25 07:54:11 UTC
(In reply to Yedidyah Bar David from comment #10)
> (In reply to Sandro Bonazzola from comment #7)
> > severity is not urgent, priority is urgent. It's not causing catastrophic
> > issues.
> > 
> > About 0 / non-0 return value: I'm fine with keeping non-0 if product is not
> > installed (like rm on a non-existing file), but we should allow automation to
> > detect it's not because of other kind of failure. Can we set different exit
> > code?
> 
> This will require a (simple) change in otopi.
> 
> How come we did not notice this in 3.3 a long time ago?

It is fine to keep non-zero, as the command actually fails.

Comment 12 Alon Bar-Lev 2014-03-25 09:03:45 UTC
 Yedidyah Bar David 2014-03-25 04:56:05 EDT
External Bug ID: oVirt gerrit 26062

Once again, it is perfectly ok to fail if we have nothing to clean. This is since 3.3 and was not an issue.

Comment 13 Yedidyah Bar David 2014-03-25 09:16:25 UTC
There are a few different issues here:

1. The current failure in jenkins. I still do not know what causes it. eedri looked a bit and did not manage to find what _changed_ that makes it fail now when it didn't fail in the past. Lev will check and update.

2. engine-cleanup fails because of "No such file or directory: '/var/lib/ovirt-engine/setup/answers/20140323161045-cleanup.conf'". The cause is not clear either: it could be a missing directory (although it's in the rpm), a full disk, etc. Lev will also check this one and update.

3. engine-cleanup fails because it can't find an existing product setup. This currently _also_ happens in this jenkins job, because it runs engine-cleanup _twice_, but that has been the case for a long time (according to Eyal), so it's still not clear why it started failing now.

4. Both (2.) and (3.) currently return 1. It might make sense (at least Eyal thinks so and I agree) to return different values for different errors, e.g. to let automated testing know why a tool failed and whether the failure is expected or a real error. For this one I pushed a draft change [1]. Not sure we need it, I mainly did this to answer Sandro.

[1] http://gerrit.ovirt.org/26062
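
The idea in point 4 can be sketched roughly as follows. This is only an illustration of mapping failure types to distinct exit statuses; it is not the code from the draft change, and the constant values, class names, and the `run`/`cleanup` helpers are all hypothetical.

```python
import sys

# Hypothetical exit statuses; the values are placeholders for illustration
# and are not taken from otopi or engine-cleanup.
EXIT_OK = 0
EXIT_GENERAL_ERROR = 1
EXIT_NOT_SET_UP = 2


class ToolError(Exception):
    """Base error carrying the exit status the process should return."""
    exit_code = EXIT_GENERAL_ERROR


class ProductNotSetUpError(ToolError):
    """Raised when no completed product setup is found (nothing to clean)."""
    exit_code = EXIT_NOT_SET_UP


def run(action):
    """Call action() and translate any ToolError into its exit status."""
    try:
        action()
    except ToolError as e:
        print("ERROR: %s" % e, file=sys.stderr)
        return e.exit_code
    return EXIT_OK


def cleanup(product_installed):
    """Stand-in for the real cleanup logic (hypothetical)."""
    if not product_installed:
        raise ProductNotSetUpError("Could not detect product setup")
```

With this shape, a caller would end with something like `sys.exit(run(lambda: cleanup(installed)))`, so a script can tell "nothing to clean" apart from any other failure by the status alone.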

Comment 14 Alon Bar-Lev 2014-03-25 09:28:20 UTC
Something that worked the same in 3.3 cannot be an automation blocker.

Also, something that is expected product behavior cannot be an automation blocker.

The current implementation is that engine-cleanup returns non-zero if it cannot clean up.

This is not going to be changed in 3.4.

Comment 15 Lev Veyde 2014-03-25 12:31:09 UTC
(In reply to Yedidyah Bar David from comment #10)
> (In reply to Alon Bar-Lev from comment #1)
> > This is due to bug#1062717 fix.
> > 
> > We should move ownership of /var/lib/ovirt-engine/answers to setup-base
> > within spec.
> 
> It's already there, checked 3.3.2 downstream and master upstream.
> 
> rhevm-setup-3.3.2-0.50.el6ev.noarch
> 
> Not sure why it was missing in this specific case. Lev - can you check?
> 
> Anyway, not sure it's related - I now verified that it returns 1 even if it
> does manage to create the answer file.
> 
> (In reply to Sandro Bonazzola from comment #7)
> > severity is not urgent, priority is urgent. It's not causing catastrophic
> > issues.
> > 
> > About 0 / non-0 return value: I'm fine with keeping non-0 if product is not
> > installed (like rm on a non-existing file), but we should allow automation to
> > detect it's not because of other kind of failure. Can we set different exit
> > code?
> 
> This will require a (simple) change in otopi.
> 
> How come we did not notice this in 3.3 a long time ago?

The issue of return codes and its effect on our CI environment is currently being investigated further.

I'm changing the priority of the bug to medium, since we have a workaround that stops the issue from blocking our automation.

We still need separate return values for a real failure and for an already clean state, but it's not urgent.

Comment 16 Alon Bar-Lev 2014-03-26 14:10:36 UTC
Not sure why this is a downstream bug.

Comment 17 Sandro Bonazzola 2014-10-17 10:00:38 UTC
Re-targeted to 3.6.0: useful for automated testing but not a blocker for 3.5.

Comment 18 Jiri Belka 2015-07-29 08:31:19 UTC
ok, ovirt-engine-setup-base-3.6.0-0.0.master.20150627185750.git6f063c1.el6

engine-cleanup on clean env fails and returns code 11.
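
For automation that needs to act on this, a minimal sketch of a wrapper is shown below. It assumes that 11 means "no completed setup detected", based only on the observation above; the meaning of any other non-zero status is not documented in this bug, so the sketch treats them all as failures. The `classify` and `run_and_classify` helpers are hypothetical.

```python
import subprocess

# Exit status observed above for a clean environment.
# Assumption: it means "no completed setup detected". Verify this against
# the installed version before relying on it.
NOTHING_TO_CLEAN = 11


def classify(returncode):
    """Map a process exit status to a coarse outcome label."""
    if returncode == 0:
        return "cleaned"
    if returncode == NOTHING_TO_CLEAN:
        return "already-clean"
    return "failed"


def run_and_classify(argv):
    """Run the command given as an argument list and classify its status."""
    return classify(subprocess.call(argv))
```

A CI job could then call `run_and_classify(["engine-cleanup"])` and treat "already-clean" as acceptable while still failing the job on "failed".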

