Bug 1877279 - dwh status is overwritten by engine-setup
Summary: dwh status is overwritten by engine-setup
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: Setup.Engine
Version: 4.4.1
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ovirt-4.4.3
: 4.4.3.3
Assignee: Yedidyah Bar David
QA Contact: Pavel Novotny
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-09-09 09:41 UTC by Yedidyah Bar David
Modified: 2020-11-11 06:45 UTC
CC List: 2 users

Fixed In Version: ovirt-engine-4.4.3.3
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-02 11:34:10 UTC
oVirt Team: Integration
pm-rhel: ovirt-4.4+
pm-rhel: planning_ack+
sbonazzo: devel_ack+
lleistne: testing_ack+



Links
oVirt gerrit 111192 (master, MERGED): dbscripts: Do not overwrite dwh_history_timekeeping (last updated 2020-11-02 11:38:00 UTC)

Description Yedidyah Bar David 2020-09-09 09:41:29 UTC
Description of problem:

The following flow does not work as expected (see also bug 1118350, bug 1024028, and a few others):

1. Setup engine+dwh on machine A
2. Setup dwh on machine B. It prompts you to disconnect dwh on A.
3. Stop and disable dwh on A
4. Reply Yes to setup on machine B and continue.
5. Run engine-setup on machine A. It prompts you to disconnect dwh on B.
6. Reply "Yes", but do not stop/disable dwh on B.

Setup continues, but does not stop dwhd on B.
The code to do this was added for bug 1024028.

If you try the opposite, i.e. skip step 3, setup does fail as expected. So the problem seems to happen only on the engine machine.

I think the bug has existed since 4.0.0: https://gerrit.ovirt.org/50774 .

BTW, the patch for bug 1024028 "works" only on the engine machine: it does not try to stop dwh on a separate machine if engine-setup is not running on the engine machine. In principle, this should be easy to fix. In practice, I wonder if it's better to completely revert the fix for bug 1024028, because we can't _disable_ the remote dwh, only stop it, so the next reboot will start it again.
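A simplified model of the failure described in the summary may help when verifying the fix. It is a sketch under stated assumptions, not the actual engine-setup or dbscripts code: it uses Python's sqlite3 in place of the engine's PostgreSQL database, and only the table name dwh_history_timekeeping comes from the linked patch title. The columns, the DwhCurrentlyRunning/dwhHostname keys, and all function names are hypothetical. It shows how a setup step that resets the status rows hides a dwh still running on another machine, while seeding only missing rows preserves that status.

```python
import sqlite3

# Hypothetical model of the dwh status table. Only the table name comes from
# the linked patch; the columns and keys below are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS dwh_history_timekeeping (
    var_name  TEXT PRIMARY KEY,
    var_value TEXT
)
"""

DEFAULTS = [("DwhCurrentlyRunning", "0"), ("dwhHostname", "")]


def run_dbscripts(conn, overwrite):
    """Simulate the database step of engine-setup (hypothetical)."""
    conn.execute(SCHEMA)
    if overwrite:
        # Buggy behaviour: unconditionally reset the status rows.
        conn.executemany(
            "INSERT OR REPLACE INTO dwh_history_timekeeping VALUES (?, ?)",
            DEFAULTS)
    else:
        # Fixed behaviour: only seed rows that do not exist yet.
        conn.executemany(
            "INSERT OR IGNORE INTO dwh_history_timekeeping VALUES (?, ?)",
            DEFAULTS)


def mark_dwh_running(conn, hostname):
    """Record that the dwh on `hostname` is running against this engine."""
    conn.executemany(
        "UPDATE dwh_history_timekeeping SET var_value = ? WHERE var_name = ?",
        [("1", "DwhCurrentlyRunning"), (hostname, "dwhHostname")])


def other_dwh_running(conn, local_hostname):
    """Return the remote hostname if another dwh claims the database."""
    rows = dict(conn.execute(
        "SELECT var_name, var_value FROM dwh_history_timekeeping"))
    if (rows.get("DwhCurrentlyRunning") == "1"
            and rows.get("dwhHostname") not in ("", local_hostname)):
        return rows["dwhHostname"]
    return None


def simulate(overwrite):
    conn = sqlite3.connect(":memory:")
    run_dbscripts(conn, overwrite)       # step 1: setup engine+dwh on machine A
    mark_dwh_running(conn, "machine-b")  # steps 2-4: dwh on machine B takes over
    run_dbscripts(conn, overwrite)       # step 5: engine-setup again on machine A
    return other_dwh_running(conn, "machine-a")
```

In this model, simulate(True) returns None, i.e. the second setup run on machine A no longer sees the dwh on machine B, while simulate(False) returns "machine-b", so setup can still detect it and prompt to disconnect it.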

Comment 1 Yedidyah Bar David 2020-09-13 07:38:34 UTC
Setting doc text -, mainly because in my testing I still managed to make two dwhd instances (on two machines) run against a single engine. I am still not certain about the exact flow.

For now, if engine-setup prompts you to disconnect another dwh and you want to continue (by replying 'Yes'), you should manually find that dwh instance, then stop and disable it. Do not rely on engine-setup (or dwh) to prevent such a race.

Comment 2 Lucie Leistnerova 2020-09-30 09:55:31 UTC
What should be verified here please? That engine-setup stops/disables dwh service on separate machine when prompted?

Comment 3 Yedidyah Bar David 2020-10-11 06:14:46 UTC
(In reply to Lucie Leistnerova from comment #2)
> What should be verified here please? That engine-setup stops/disables dwh
> service on separate machine when prompted?

TL;DR: It either stops the other dwh, or at least refuses to continue.

1. The specific flow to test, for verifying the linked fix, is exactly as
in comment 0.

2. It can't disable it (on the other machine). It also can't stop it, but
it _can_ ask the remote dwh to stop itself, and under certain conditions it
does so, but not under others. See also [1][2] if interested. I decided
not to include them in the scope of the current bug; I am not even sure yet
that we want them at all. So stopping/disabling is not strictly in scope.

3. What _is_ in scope is making sure we do not allow two DWH instances to
run in parallel against a single engine database. Either, when setting up
the "other" one, the "existing" one stops, or the "other" one refuses to
continue and fails. The machines for "existing" and "other" can swap roles
as many times as you wish during testing, and you are also welcome to think
about other flows (involving more than two machines, or whatever else).

4. In my own testing I did manage to cause this; I filed bug 1878742, and
we decided not to fix it, as it is a rather unlikely flow.

[1] https://gerrit.ovirt.org/111200
[2] https://gerrit.ovirt.org/111201

Comment 6 Sandro Bonazzola 2020-11-11 06:45:31 UTC
This bugzilla is included in oVirt 4.4.3 release, published on November 10th 2020.

Since the problem described in this bug report should be resolved in oVirt 4.4.3 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

