Bug 1547018 - engine-setup clears dwh database on rollback if failure is at/after schema update
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine-dwh
Classification: oVirt
Component: Setup
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ovirt-4.2.2
Target Release: 4.2.2.1
Assignee: Yedidyah Bar David
QA Contact: Lucie Leistnerova
URL:
Whiteboard:
Depends On: 1547016
Blocks: 1631202
Reported: 2018-02-20 11:09 UTC by Yedidyah Bar David
Modified: 2018-12-10 09:24 UTC

Fixed In Version: ovirt-engine-dwh-4.2.2.1
Clone Of: 1547016
Environment:
Last Closed: 2018-03-29 11:10:13 UTC
oVirt Team: Integration
Embargoed:
rule-engine: ovirt-4.2+
lleistne: testing_plan_complete+
lsvaty: testing_ack+


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 87929 0 master MERGED packaging: setup: postgres95: Do not clear db on upgrade rollback 2018-02-20 14:25:56 UTC
oVirt gerrit 87945 0 ovirt-engine-dwh-4.2 MERGED packaging: setup: postgres95: Do not clear db on upgrade rollback 2018-02-20 14:52:11 UTC

Description Yedidyah Bar David 2018-02-20 11:09:02 UTC
See below for the engine bug. The dwh code is quite similar but not identical, because on failure it allows postponing the restore of the database to a later stage (so that we can start the engine quickly).

+++ This bug was initially created as a clone of Bug #1547016 +++

Description of problem:

The fix for bug 1492138 is not enough. Currently, if the failure occurs after SchemaTransaction is added, abort clears the database.

Need to check what the best solution is. Definitely, if we did not back up, and it's not a new database, we should not clear it.

Comment 1 Yedidyah Bar David 2018-02-20 15:12:29 UTC
QE: Reproduction/verification flows follow. For engine+dwh on same machine, see bug 1547016 comment 1.

For different machines:

On engine machine:

1. Install and set up the 4.1 engine

On dwh machine:

2. Install and set up 4.1 dwh to work with the engine on the engine machine. Optionally, also install otopi-debug-plugins (see step 5)

On engine machine:

3. Upgrade to 4.2 (update setup packages, engine-setup)

On dwh machine:

4. Update setup packages to 4.2

5. Run engine-setup and make it fail. Please test with different fail points. You can force the failure with the force_fail plugin of otopi-debug-plugins, using each of the following (please try all of them, and please automate this for the future):

OTOPI_FORCE_FAIL_STAGE=STAGE_EARLY_MISC OTOPI_FORCE_FAIL_PRIORITY=PRIORITY_LOW engine-setup

OTOPI_FORCE_FAIL_STAGE=STAGE_MISC OTOPI_FORCE_FAIL_PRIORITY=PRIORITY_HIGH engine-setup

OTOPI_FORCE_FAIL_STAGE=STAGE_MISC OTOPI_FORCE_FAIL_PRIORITY=PRIORITY_DEFAULT engine-setup

OTOPI_FORCE_FAIL_STAGE=STAGE_MISC OTOPI_FORCE_FAIL_PRIORITY=PRIORITY_LOW engine-setup

6. engine-setup, even when it rolls back successfully, leaves all services stopped. So:

systemctl start ovirt-engine-dwhd

7. Verify that dwhd eventually comes up and works well, e.g. by checking the logs. Note that it will most likely refuse to come up, logging something like:

2018-02-20 11:33:57|ON8ZJr|bPo6MJ|bPo6MJ|OVIRT_ENGINE_DWH|MinimalVersionCheck|Default|5|tDie|tDie_1|2018-02-20 11:33:57|You have upgraded your oVirt Engine and now require an upgrade of the ovirt-engine-dwh package. Please run engine-setup to upgrade the version. Service will now exit.|4
2018-02-20 11:33:57|You have upgraded your oVirt Engine and now require an upgrade of the ovirt-engine-dwh package. Please run engine-setup to upgrade the version. Service will now exit.
Exception in component tRunJob_2
java.lang.RuntimeException: Child job running failed
        at ovirt_engine_dwh.historyetl_4_1.HistoryETL.tRunJob_2Process(HistoryETL.java:8186)
        at ovirt_engine_dwh.historyetl_4_1.HistoryETL$3.run(HistoryETL.java:11674)

I think this should be enough. If you want to cheat, just to make sure, you can run the following on the engine database and then restart dwhd. I am pretty certain this is harmless, although dwhd (and the engine) were definitely not designed to allow that:

update vdc_options set option_value='4.1.0' where option_name = 'MinimalETLVersion';

Then dwhd should come up nicely.
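
The fail points from step 5 lend themselves to a small loop. What follows is only a minimal sketch, assuming bash. The names FAIL_POINTS, RUNNER, run_fail_point and run_all_fail_points are hypothetical and not part of otopi or engine-setup. By default it only prints the commands (dry run); setting RUNNER=env would actually execute them. engine-setup may still prompt for input, which this sketch does not handle.

```shell
#!/bin/bash
# Fail points from step 5, one "STAGE:PRIORITY" pair per line.
FAIL_POINTS="STAGE_EARLY_MISC:PRIORITY_LOW
STAGE_MISC:PRIORITY_HIGH
STAGE_MISC:PRIORITY_DEFAULT
STAGE_MISC:PRIORITY_LOW"

# Hypothetical hook: "echo" prints each command (dry run),
# "env" runs engine-setup with the two variables set.
RUNNER="${RUNNER:-echo}"

run_fail_point() {
    # Split "STAGE:PRIORITY" into its two halves.
    local stage="${1%%:*}"
    local priority="${1##*:}"
    "$RUNNER" OTOPI_FORCE_FAIL_STAGE="$stage" \
              OTOPI_FORCE_FAIL_PRIORITY="$priority" engine-setup
}

run_all_fail_points() {
    local point
    for point in $FAIL_POINTS; do
        # A non-zero exit is the expected outcome of a forced failure,
        # so keep going to the next fail point.
        run_fail_point "$point" || true
    done
}
```

Calling run_all_fail_points prints the four command lines from step 5. In a real run, steps 6 and 7 would still need to follow each attempt.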

Comment 2 Yedidyah Bar David 2018-02-20 15:19:42 UTC
In principle you can run more than one attempt in step 5 above, and for the current bug it should be enough to verify that the ovirt_engine_history database is not empty, e.g. using:

su - postgres -c 'pg_dump -s ovirt_engine_history' | grep '^CREATE TABLE' | wc -l

But if/when you automate this, it is perhaps better to check some more things.
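
For automation, the table-count check could be wrapped so that it yields a pass/fail exit status. This is a rough sketch, assuming bash. The function names count_tables and history_db_is_populated are hypothetical. It uses grep -c in place of grep | wc -l, which counts the same lines.

```shell
#!/bin/bash
count_tables() {
    # Reads a schema dump on stdin and prints the number of
    # CREATE TABLE statements. grep -c exits non-zero when the
    # count is 0, so "|| true" keeps the exit status clean.
    grep -c '^CREATE TABLE' || true
}

history_db_is_populated() {
    # Succeeds only if the schema dump contains at least one table.
    local n
    n="$(su - postgres -c 'pg_dump -s ovirt_engine_history' | count_tables)"
    [ "$n" -gt 0 ]
}
```

A test run could then call history_db_is_populated after each rollback and treat a non-zero exit status as a reproduction of this bug. A non-empty schema alone does not prove the data survived, so further checks may be worth adding.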

Comment 3 Lucie Leistnerova 2018-03-07 15:29:18 UTC
I ran the stage failures from step 5 in Comment 1, some of them twice. The database is not empty after rollback and the dwh service works as expected.

verified in ovirt-engine-dwh-setup-4.2.2.1-1.el7ev.noarch

I am leaving test_plan_complete at ?; automation will be done later.

Comment 4 Sandro Bonazzola 2018-03-29 11:10:13 UTC
This bugzilla is included in the oVirt 4.2.2 release, published on March 28th 2018.

Since the problem described in this bug report should be resolved in the oVirt 4.2.2 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

