Bug 1877279

Summary: dwh status is overwritten by engine-setup
Product: [oVirt] ovirt-engine Reporter: Yedidyah Bar David <didi>
Component: Setup.Engine Assignee: Yedidyah Bar David <didi>
Status: CLOSED CURRENTRELEASE QA Contact: Pavel Novotny <pnovotny>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.4.1 CC: bugs, lleistne
Target Milestone: ovirt-4.4.3 Flags: pm-rhel: ovirt-4.4+
pm-rhel: planning_ack+
sbonazzo: devel_ack+
lleistne: testing_ack+
Target Release: 4.4.3.3   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: ovirt-engine-4.4.3.3 Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-11-02 11:34:10 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Integration RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Yedidyah Bar David 2020-09-09 09:41:29 UTC
Description of problem:

The following flow does not work as expected (see also: bug 1118350, bug 1024028, a few others):

1. Setup engine+dwh on machine A
2. Setup dwh on machine B. It prompts you to disconnect dwh on A.
3. Stop and disable dwh on A
4. Reply Yes to setup on machine B and continue.
5. Run engine-setup on machine A. It prompts you to disconnect dwh on B.
6. Reply "Yes", but do not stop/disable dwh on B.

Setup continues, but does not stop dwhd on B.
The code to do this was added for bug 1024028.

If you try the opposite (e.g., skip step 3), it does fail as expected. So the problem seems to happen only when engine-setup runs on the engine machine.

I think the bug has existed since 4.0.0: https://gerrit.ovirt.org/50774 .

BTW - the patch for bug 1024028 "works" only on the engine machine: it does not try to stop dwh on a separate machine if engine-setup is not running on the engine machine. In principle, this should be easy to fix. In practice, I wonder if it's better to completely revert the patch for bug 1024028, because we can't _disable_ the remote dwh, only stop it, so the next reboot will start it again.

Comment 1 Yedidyah Bar David 2020-09-13 07:38:34 UTC
Setting doc text -, mainly because in my testing I still managed to get two dwhd instances (on two machines) running against a single engine. I am still not certain about the exact flow.

For now, if engine-setup prompts you to disconnect another dwh and you want to continue (by replying 'Yes'), you should manually find that dwh instance, then stop and disable it. Do not rely on engine-setup (or dwh) to prevent such a race.
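
As an illustration of the manual workaround above, here is a minimal Python sketch. It assumes the DWH daemon runs as the systemd unit ovirt-engine-dwhd and that the script is run as root on the machine hosting the dwh instance you want to disconnect. The helper names are hypothetical and are not part of engine-setup.

```python
import subprocess

# Assumption: the DWH daemon's systemd unit is named ovirt-engine-dwhd.
DWH_UNIT = "ovirt-engine-dwhd"


def dwh_shutdown_commands(unit=DWH_UNIT):
    """Return the systemctl commands that stop the unit now and
    prevent it from starting again on the next reboot."""
    return [
        ["systemctl", "stop", unit],
        ["systemctl", "disable", unit],
    ]


def stop_and_disable_dwh(dry_run=True):
    """Print the commands (dry_run=True) or execute them locally."""
    for cmd in dwh_shutdown_commands():
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if systemctl fails.
            subprocess.run(cmd, check=True)
```

Calling stop_and_disable_dwh() only prints the two commands; pass dry_run=False to actually run them. Disabling matters as much as stopping, because a unit that is only stopped starts again on the next reboot.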

Comment 2 Lucie Leistnerova 2020-09-30 09:55:31 UTC
What should be verified here please? That engine-setup stops/disables dwh service on separate machine when prompted?

Comment 3 Yedidyah Bar David 2020-10-11 06:14:46 UTC
(In reply to Lucie Leistnerova from comment #2)
> What should be verified here please? That engine-setup stops/disables dwh
> service on separate machine when prompted?

TL;DR: Either stops, or at least refuses to continue.

1. The specific flow to test, for verifying the linked fix, is exactly as
in comment 0.

2. It can't disable it (on the other machine). It also can't stop it, but
it _can_ ask the remote dwh to stop itself, and under certain conditions it
does so, but not under others. See also [1][2] if interested. I decided
not to include them in the scope of the current bug, and I am not even sure
yet that we want them at all. So stopping/disabling is not strictly in scope.

3. What _is_ in scope is to make sure we do not allow two DWH instances to
run in parallel against a single engine database. Either the "existing" one
stops when the "other" is set up, or the "other" refuses to continue and
fails. The machines playing "existing" and "other" can swap roles as many
times as you wish during testing, and you are also welcome to think about
other flows, e.g. ones involving more than two machines.

4. In my own testing, I did manage to cause this, filed bug 1878742, and
we decided to not fix it, being a rather unlikely flow.

[1] https://gerrit.ovirt.org/111200
[2] https://gerrit.ovirt.org/111201
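
To make the rule in point 3 concrete, here is a toy Python model of the invariant, useful mainly for reasoning about test flows. It is a sketch under stated assumptions, not the actual engine-setup or dwh code: EngineDb, setup_dwh and request_stop are hypothetical names, and the real state is kept in the engine database rather than in memory.

```python
class DwhAlreadyRunning(Exception):
    """Raised when setup must refuse to continue."""


class EngineDb:
    """Toy stand-in for the engine database's record of its DWH instance."""

    def __init__(self):
        self.dwh_host = None      # host currently registered as running DWH
        self.dwh_running = False  # whether that instance is believed running


def setup_dwh(db, host, confirm_disconnect, request_stop):
    """Register `host` as the DWH instance for this engine database.

    request_stop(existing_host) is a hypothetical callback that asks the
    existing instance to stop itself and returns True only if it did.
    """
    if db.dwh_host not in (None, host):
        if not confirm_disconnect:
            raise DwhAlreadyRunning("user declined to disconnect " + db.dwh_host)
        if db.dwh_running and not request_stop(db.dwh_host):
            # The existing instance is still up: refuse rather than
            # allow two instances against one engine database.
            raise DwhAlreadyRunning(db.dwh_host + " did not stop")
        db.dwh_running = False
    db.dwh_host = host
    db.dwh_running = True
```

In this model, setting up B while A is running either ends with A stopped and B registered, or raises DwhAlreadyRunning and leaves A registered. Swapping the roles of A and B repeatedly should never produce a state where both are running.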

Comment 6 Sandro Bonazzola 2020-11-11 06:45:31 UTC
This bugzilla is included in oVirt 4.4.3 release, published on November 10th 2020.

Since the problem described in this bug report should be resolved in oVirt 4.4.3 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.