See the engine bug below. The dwh code is quite similar, but not identical, because on failure it allows postponing the restore of the database to a later stage (so that the engine can be started quickly).

+++ This bug was initially created as a clone of Bug #1547016 +++

Description of problem:

The fix for bug 1492138 is not enough. Currently, if a failure occurs after SchemaTransaction has been added, the abort clears the database. We need to check what the best solution is. Definitely, if we do not take a backup and it's not a new database, we should not clear it.
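The intended abort rule can be sketched as follows. This is a minimal illustration with hypothetical names, not the actual engine-setup code:

```python
def should_clear_database(new_database, backup_taken):
    """Hypothetical sketch of the abort decision described above: on a
    failed setup, only clear the database when doing so cannot lose data.
    These names do not exist in engine-setup; they only illustrate the rule.
    """
    if new_database:
        # We created the database during this setup run; clearing it
        # just undoes our own work.
        return True
    if backup_taken:
        # A backup exists, so a restore step can repopulate the
        # database after it is cleared.
        return True
    # Pre-existing database and no backup: clearing would destroy the
    # only copy of the data, which is exactly the bug described here.
    return False
```

The failing case in this bug corresponds to the last branch: a pre-existing database with no backup must not be cleared on abort.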
QE: Reproduction/verification flows follow.

For engine+dwh on the same machine, see bug 1547016 comment 1. For separate machines:

On the engine machine:
1. Install and set up a 4.1 engine.

On the dwh machine:
2. Install and set up 4.1 dwh to work with the engine on the engine machine. Optionally also install otopi-debug-plugins - see below.

On the engine machine:
3. Upgrade to 4.2 (update the setup packages, run engine-setup).

On the dwh machine:
4. Update the setup packages to 4.2.
5. Run engine-setup and make it fail. Please test with different fail points. You can automate this using the force_fail plugin of otopi-debug-plugins with one of these (please try all of them, and please automate this for the future):

OTOPI_FORCE_FAIL_STAGE=STAGE_EARLY_MISC OTOPI_FORCE_FAIL_PRIORITY=PRIORITY_LOW engine-setup
OTOPI_FORCE_FAIL_STAGE=STAGE_MISC OTOPI_FORCE_FAIL_PRIORITY=PRIORITY_HIGH engine-setup
OTOPI_FORCE_FAIL_STAGE=STAGE_MISC OTOPI_FORCE_FAIL_PRIORITY=PRIORITY_DEFAULT engine-setup
OTOPI_FORCE_FAIL_STAGE=STAGE_MISC OTOPI_FORCE_FAIL_PRIORITY=PRIORITY_LOW engine-setup

6. engine-setup leaves all services stopped, even after a successful rollback. So:

systemctl start ovirt-engine-dwhd

7. Verify that dwhd eventually manages to come up and work well. You can also check the logs, etc. Note that it will most likely refuse to come up, with the log saying something like:

2018-02-20 11:33:57|ON8ZJr|bPo6MJ|bPo6MJ|OVIRT_ENGINE_DWH|MinimalVersionCheck|Default|5|tDie|tDie_1|2018-02-20 11:33:57|You have upgraded your oVirt Engine and now require an upgrade of the ovirt-engine-dwh package. Please run engine-setup to upgrade the version. Service will now exit.|4
2018-02-20 11:33:57|You have upgraded your oVirt Engine and now require an upgrade of the ovirt-engine-dwh package. Please run engine-setup to upgrade the version. Service will now exit.
Exception in component tRunJob_2
java.lang.RuntimeException: Child job running failed
	at ovirt_engine_dwh.historyetl_4_1.HistoryETL.tRunJob_2Process(HistoryETL.java:8186)
	at ovirt_engine_dwh.historyetl_4_1.HistoryETL$3.run(HistoryETL.java:11674)

I think this should be enough. If you want to cheat, and just to make sure, you can run this on the engine database and then restart dwhd. I am pretty certain this is harmless, although dwhd (and the engine) was definitely not designed to allow it:

update vdc_options set option_value='4.1.0' where option_name = 'MinimalETLVersion';

Then dwhd should come up nicely.
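If you automate step 5 above, the four forced fail points can be driven from a small script. A minimal sketch, assuming otopi-debug-plugins is installed; the command is parameterized so the loop can be dry-run with something other than the real engine-setup:

```python
import os
import subprocess

# The four fail points from step 5 above.
FAIL_POINTS = [
    ("STAGE_EARLY_MISC", "PRIORITY_LOW"),
    ("STAGE_MISC", "PRIORITY_HIGH"),
    ("STAGE_MISC", "PRIORITY_DEFAULT"),
    ("STAGE_MISC", "PRIORITY_LOW"),
]

def run_fail_points(cmd=("engine-setup",)):
    """Run `cmd` once per fail point, with the OTOPI_FORCE_FAIL_*
    variables set in the environment, and collect the exit codes."""
    results = []
    for stage, priority in FAIL_POINTS:
        env = dict(os.environ,
                   OTOPI_FORCE_FAIL_STAGE=stage,
                   OTOPI_FORCE_FAIL_PRIORITY=priority)
        proc = subprocess.run(list(cmd), env=env)
        results.append((stage, priority, proc.returncode))
    return results
```

With the real engine-setup, each run is expected to exit nonzero (that is the forced failure); after each one you would restart ovirt-engine-dwhd and check the database as described in the next comment.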
In principle you can run more than one attempt in step 5 above, and for the current bug it should be enough to verify that the ovirt_engine_history database is not empty, e.g. using:

su - postgres -c 'pg_dump -s ovirt_engine_history' | grep '^CREATE TABLE' | wc -l

But if/when you automate this, it is perhaps better to check a few more things.
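For automation, the same check can be wrapped in a small helper. A sketch, with the parsing split out so it can be tested without a live database:

```python
import subprocess

def count_create_tables(schema_dump):
    """Count CREATE TABLE statements in a pg_dump schema-only dump."""
    return sum(1 for line in schema_dump.splitlines()
               if line.startswith("CREATE TABLE"))

def history_db_nonempty():
    """Dump the ovirt_engine_history schema as the postgres user (the
    same command as above) and check that at least one table survived
    the rollback."""
    dump = subprocess.run(
        ["su", "-", "postgres", "-c", "pg_dump -s ovirt_engine_history"],
        capture_output=True, text=True, check=True,
    ).stdout
    return count_create_tables(dump) > 0
```

An automated flow would assert history_db_nonempty() after each rollback in step 5, before restarting the service.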
I ran the stage fails from step 5 in comment 1, some of them twice. The database is not empty after rollback and the dwh service works as expected.

Verified in ovirt-engine-dwh-setup-4.2.2.1-1.el7ev.noarch.

I am leaving test_plan_complete at '?'; automation will be done later.
This bugzilla is included in the oVirt 4.2.2 release, published on March 28th 2018. Since the problem described in this bug report should be resolved in oVirt 4.2.2, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.