Description of problem:
If engine-cleanup is run on a machine where the product has not been installed, or has already been cleaned up, engine-cleanup fails to stop gracefully. It attempts to create an answer file, fails to do so, and then prints a second error message while trying to read the file back. It also returns a non-zero exit code, although there is no real error in cleaning up the machine, since no cleanup is necessary.

[ INFO ] Stage: Initializing
[ INFO ] Stage: Environment setup
Configuration files: ['/etc/ovirt-engine-setup.conf.d/10-packaging.conf', '/home/jenkins/workspace/ovirt-engine-3.3-setup-different-timezones/jenkins/jobs/misc/tz_testing_33/cleanup.file.otopi']
Log file: /var/log/ovirt-engine/setup/ovirt-engine-remove-20140323161044.log
Version: otopi-1.2.0_rc4 (otopi-1.2.0-0.11.rc4.el6ev)
[ ERROR ] Could not detect a completed product setup
Please use the cleanup utility only after a setup or after an upgrade from an older installation.
[ ERROR ] Failed to execute stage 'Environment setup': Could not detect product setup
[ INFO ] Stage: Clean up
Log file is located at /var/log/ovirt-engine/setup/ovirt-engine-remove-20140323161044.log
[ INFO ] Generating answer file '/var/lib/ovirt-engine/setup/answers/20140323161045-cleanup.conf'
[ ERROR ] Failed to execute stage 'Clean up': [Errno 2] No such file or directory: '/var/lib/ovirt-engine/setup/answers/20140323161045-cleanup.conf'
[ INFO ] Stage: Pre-termination
[ INFO ] Stage: Termination
[ ERROR ] Execution of cleanup failed

Version-Release number of selected component (if applicable):
rhevm-3.3.2-0.1000.507.3705160.el6ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Run engine-cleanup on a clean machine, e.g.:
engine-cleanup --config-append=/home/jenkins/workspace/ovirt-engine-3.3-setup-different-timezones/jenkins/jobs/misc/tz_testing_33/cleanup.file.otopi

Actual results:
engine-cleanup fails with extra error messages and a non-zero exit code.

Expected results:
engine-cleanup must exit gracefully, with a zero exit code, right after notifying the user that the product can't be detected.

Additional info:
This is due to the fix for bug#1062717.

We should move ownership of /var/lib/ovirt-engine/answers to setup-base within the spec.
(In reply to Alon Bar-Lev from comment #1)
> This is due to bug#1062717 fix.
>
> We should move ownership of /var/lib/ovirt-engine/answers to setup-base
> within spec.

Thanks. What about the non-zero return value on a clean machine? That used not to be the case: previously, on a clean machine, we received 0, and we have scripts that depend on that behaviour.
The issue is in 3.4.

I do not agree this issue is urgent.
(In reply to Alon Bar-Lev from comment #3)
> The issue is in 3.4

Lev just told me it's also there on 3.3.
Lev, can you confirm this also affects 3.3?

> I do not agree this issue is urgent.

If it's an automation blocker, it's urgent.
(In reply to Sandro Bonazzola from comment #4)
> (In reply to Alon Bar-Lev from comment #3)
> > The issue is in 3.4
>
> Lev just told me it's there also on 3.3.
> Lev, can you confirm this affects also 3.3?
>
> > I do not agree this issue is urgent.
>
> If it's automation blocker, it's urgent.

The change in the return code logic affects RHEV 3.3 as well: previously, on a clean machine, engine-cleanup returned 0; now it returns a non-zero value.
Severity is not urgent, priority is urgent. It's not causing catastrophic issues.

About the 0 / non-0 return value: I'm fine with keeping non-0 if the product is not installed (like rm on a non-existing file), but we should allow automation to detect that the failure is not caused by some other kind of error. Can we set a different exit code?
(In reply to Alon Bar-Lev from comment #3)
> The issue is in 3.4
>
> I do not agree this issue is urgent.

We see the issue in RHEV 3.3.
(In reply to Alon Bar-Lev from comment #1)
> This is due to bug#1062717 fix.
>
> We should move ownership of /var/lib/ovirt-engine/answers to setup-base
> within spec.

It's already there; I checked 3.3.2 downstream and master upstream.

rhevm-setup-3.3.2-0.50.el6ev.noarch

Not sure why it was missing in this specific case. Lev - can you check?

Anyway, not sure it's related - I have now verified that it returns 1 even if it does manage to create the answer file.

(In reply to Sandro Bonazzola from comment #7)
> severity is not urgent, priority is urgent. It's not causing catastrophic
> issues.
>
> About 0 / non-0 return value: I'm fine with keeping non-0 if product is not
> installed (like rm on a non-existing file), but we should allow automation to
> detect it's not because of other kind of failure. Can we set different exit
> code?

This will require a (simple) change in otopi.

How come we did not notice this in 3.3 a long time ago?
(In reply to Yedidyah Bar David from comment #10)
> (In reply to Sandro Bonazzola from comment #7)
> > severity is not urgent, priority is urgent. It's not causing catastrophic
> > issues.
> >
> > About 0 / non-0 return value: I'm fine with keeping non-0 if product is not
> > installed (like rm on a non-existing file), but we should allow automation to
> > detect it's not because of other kind of failure. Can we set different exit
> > code?
>
> This will require a (simple) change in otopi.
>
> How come we did not notice this in 3.3 a long time ago?

It is fine to keep non-zero, as the command actually fails.
Yedidyah Bar David 2014-03-25 04:56:05 EDT
External Bug ID: oVirt gerrit 26062

Once again, it is perfectly OK to fail if we have nothing to clean. This has been the behaviour since 3.3 and was not an issue.
There are a few different issues here:

1. The current failure in jenkins. I still do not know what causes it. eedri looked a bit and did not manage to find what _changed_ that makes it fail now when it didn't fail in the past. Lev will check and update.

2. engine-cleanup fails because of "No such file or directory: '/var/lib/ovirt-engine/setup/answers/20140323161045-cleanup.conf'". Not clear either. It can be due to a missing directory (although it's in the rpm), a full disk, etc. Lev will also check this one and update.

3. engine-cleanup fails because it can't find an existing product setup. This currently _also_ happens in this jenkins job, because it runs engine-cleanup _twice_, but this has been so for a long time (? Eyal says so), so it's still not clear why it started failing now.

4. Both (2.) and (3.) currently return 1. It might make sense (at least Eyal thinks so and I agree) to return different values for different errors, e.g. to let automated testing know why a tool failed and whether the failure is expected or an error. For this one I pushed a draft change [1]. Not sure we need it; I mainly did this to answer Sandro.

[1] http://gerrit.ovirt.org/26062
Something that worked the same way in 3.3 cannot be an automation blocker. Also, an automation blocker cannot be something that is the product's designed behavior.

In the current implementation of the product, engine-cleanup returns non-zero if it cannot clean up.

This is not going to be changed in 3.4.
(In reply to Yedidyah Bar David from comment #10)
> (In reply to Alon Bar-Lev from comment #1)
> > This is due to bug#1062717 fix.
> >
> > We should move ownership of /var/lib/ovirt-engine/answers to setup-base
> > within spec.
>
> It's already there, checked 3.3.2 downstream and master upstream.
>
> rhevm-setup-3.3.2-0.50.el6ev.noarch
>
> Not sure why it was missing in this specific case. Lev - can you check?
>
> Anyway, not sure it's related - I now verified that it returns 1 even if it
> does manage to create the answer file.
>
> (In reply to Sandro Bonazzola from comment #7)
> > severity is not urgent, priority is urgent. It's not causing catastrophic
> > issues.
> >
> > About 0 / non-0 return value: I'm fine with keeping non-0 if product is not
> > installed (like rm on a non-existing file), but we should allow automation to
> > detect it's not because of other kind of failure. Can we set different exit
> > code?
>
> This will require a (simple) change in otopi.
>
> How come we did not notice this in 3.3 a long time ago?

The issue of return codes and its effect on our CI environment is being investigated further. I'm changing the priority of the bug to medium, since we have a workaround that stops the issue from blocking our automation. We still need separate return values for a real failure and for an already clean state, but it's not urgent.
Not sure why this is a downstream bug.
Re-targeted to 3.6.0: useful for automated testing but not a blocker for 3.5.
ok, ovirt-engine-setup-base-3.6.0-0.0.master.20150627185750.git6f063c1.el6

engine-cleanup on a clean env fails and returns code 11.
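For automation that needs to tell the two cases apart, a wrapper along the following lines might be enough. It is only a sketch: the value 11 is taken from the verification comment above and should be treated as an observation from that build rather than a documented constant, and the function names and result labels are hypothetical.

```python
import subprocess

# Exit code observed above for engine-cleanup on a clean environment.
# Assumption: later builds keep using the same value.
NOTHING_TO_CLEAN = 11


def classify_cleanup_exit(returncode):
    """Map an engine-cleanup exit code to a (hypothetical) result label."""
    if returncode == 0:
        return "cleaned"
    if returncode == NOTHING_TO_CLEAN:
        return "nothing-to-clean"
    return "failed"


def run_engine_cleanup(extra_args=()):
    """Run engine-cleanup (must be installed) and classify its result."""
    proc = subprocess.run(["engine-cleanup", *extra_args])
    return classify_cleanup_exit(proc.returncode)
```

A test job could then treat both "cleaned" and "nothing-to-clean" as acceptable and fail only on "failed".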