Bug 1973640 - Hosted engine deploy fail in version 1.5.1 - VM is not managed by the engine
Summary: Hosted engine deploy fail in version 1.5.1 - VM is not managed by the engine
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-ansible-collection
Classification: oVirt
Component: hosted-engine-setup
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ovirt-4.4.7
Assignee: Yedidyah Bar David
QA Contact: Nikolai Sednev
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-06-18 11:29 UTC by Jiri Macku
Modified: 2021-07-06 07:28 UTC (History)

Fixed In Version: ovirt-ansible-collection-1.5.2
Clone Of:
Environment:
Last Closed: 2021-07-06 07:28:22 UTC
oVirt Team: Integration
Embargoed:
pm-rhel: ovirt-4.4+
mperina: blocker?




Links
System ID Private Priority Status Summary Last Updated
Github oVirt ovirt-ansible-collection pull 301 0 None closed ovirt_vm: add default to check_placement_policy 2021-06-21 09:09:57 UTC
Github oVirt ovirt-ansible-collection pull 305 0 None open role: hosted_engine_setup: Do not "Sync on engine machine" in full_ex… 2021-06-22 13:28:57 UTC

Comment 5 Asaf Rachmani 2021-06-20 18:53:30 UTC
There are two issues here:
1) Task "Make the engine aware that the external VM is stopped" failed. This is probably caused by the latest change in the 'ovirt_vm' module [1]. The task uses "ignore_errors", so the failure does not cause the deployment to fail.
2) Task "Sync on engine machine" failed, probably as a result of [2]. The task runs at the end of the deployment, when the local VM is already up and running (with a new IP address), but it tries to reach the bootstrap VM.

[1] https://github.com/oVirt/ovirt-ansible-collection/pull/294
[2] https://github.com/oVirt/ovirt-ansible-collection/pull/277/files#diff-9ea40bc76fed1e1af239e7aebb3f7e93b018777b9d9fc40825c2ef05d6ddc282

Comment 6 Martin Necas 2021-06-21 09:09:58 UTC
The first part of this issue should be fixed after https://github.com/oVirt/ovirt-ansible-collection/pull/301

Comment 7 Nikolai Sednev 2021-06-21 10:17:26 UTC
Everything works just fine in case of deployment using 4.4.6 engine from old rhvm-appliance-4.4-20210527.0.el8ev.x86_64 and then during the deployment upgrading the engine to ovirt-engine-setup-base-4.4.7.3-0.3.el8ev.noarch and then continue with the deployment.
To make such deployment you will be required to run it using "hosted-engine --deploy --ansible-extra-vars=he_pause_host=true".

Comment 8 Yedidyah Bar David 2021-06-21 13:09:43 UTC
(In reply to Nikolai Sednev from comment #7)
> Everything works just fine in case of deployment using 4.4.6 engine from old
> rhvm-appliance-4.4-20210527.0.el8ev.x86_64 and then during the deployment
> upgrading the engine to ovirt-engine-setup-base-4.4.7.3-0.3.el8ev.noarch and
> then continue with the deployment.
> To make such deployment you will be required to run it using "hosted-engine
> --deploy --ansible-extra-vars=he_pause_host=true".

I think the bug is on ovirt-ansible-collection, not something inside the appliance.
Which ovirt-ansible-collection did you use?

Comment 10 Nikolai Sednev 2021-06-21 13:39:58 UTC
(In reply to Yedidyah Bar David from comment #8)
> (In reply to Nikolai Sednev from comment #7)
> > Everything works just fine in case of deployment using 4.4.6 engine from old
> > rhvm-appliance-4.4-20210527.0.el8ev.x86_64 and then during the deployment
> > upgrading the engine to ovirt-engine-setup-base-4.4.7.3-0.3.el8ev.noarch and
> > then continue with the deployment.
> > To make such deployment you will be required to run it using "hosted-engine
> > --deploy --ansible-extra-vars=he_pause_host=true".
> 
> I think the bug is on ovirt-ansible-collection, not something inside the
> appliance.
> Which ovirt-ansible-collection did you use?
On hosts:
alma03 ~]# rpm -qa | grep ansible
ansible-2.9.21-1.el8ae.noarch
ovirt-ansible-collection-1.5.0-1.el8ev.noarch

Same is on engine and even works fine with ansible-2.9.22-1.el8ae.noarch on engine.

Comment 11 Nikolai Sednev 2021-06-21 13:40:32 UTC
Here is the engine:
nsednev-he-1 ~]# rpm -qa | grep ansible
ansible-2.9.22-1.el8ae.noarch
python3-ansible-runner-1.4.6-2.el8ar.noarch
ovirt-ansible-collection-1.5.0-1.el8ev.noarch
ansible-runner-service-1.0.7-1.el8ev.noarch

Comment 12 Martin Necas 2021-06-21 15:47:03 UTC
The issue was introduced in ovirt-ansible-collection-1.5.1 so it is good that it works for you with ovirt-ansible-collection-1.5.0.
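
A quick way to tell whether a machine carries the affected build is to compare the installed package version against 1.5.1, where the regression was introduced (the Fixed In Version field of this bug names 1.5.2). The sketch below is illustrative only: the function name `collection_status` is hypothetical, and it assumes a plain `X.Y.Z` version string such as the one printed by `rpm -q --queryformat '%{VERSION}' ovirt-ansible-collection`.

```shell
# Hypothetical helper: classify an ovirt-ansible-collection version string.
# Per this bug report, 1.5.1 introduced the regression and 1.5.2 carries the fix,
# so 1.5.1 is the only version reported as affected.
collection_status() {
    case "$1" in
        1.5.1) echo "affected" ;;
        *)     echo "not-affected" ;;
    esac
}
```

On a host it could be called as `collection_status "$(rpm -q --queryformat '%{VERSION}' ovirt-ansible-collection)"`.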

Comment 13 Yedidyah Bar David 2021-06-22 09:04:45 UTC
I think this bug only affects full_execution, which is not used by the otopi CLI frontend. Not sure about cockpit.

Comment 14 Yedidyah Bar David 2021-06-22 14:16:49 UTC
When discussing this in private with Asaf, he said that the 'sync on engine machine' task at the end of full_execution, which was added in the hope of making the later log-fetching task more effective, is never useful: if we reached that point, the VM was already copied and started on the shared storage, and the logs inside the local VM are not interesting. PR 305 just removes this sync and is an alternative to PR 303.

Comment 15 Nikolai Sednev 2021-06-22 16:25:23 UTC
Martin, please take a look at https://bugzilla.redhat.com/show_bug.cgi?id=1953029#c14 . I restored without any issues over ovirt-ansible-collection-1.5.1-1.el8ev.noarch on the engine, from an engine that was running ovirt-ansible-collection-1.5.0-1.el8ev.noarch.

Comment 16 Martin Necas 2021-06-22 17:36:12 UTC
Strange, you should get at least an error for the first part of this issue. As for the second part, it might not have been full_execution or something like that.
Either way, I'll create a new release of the collection that will contain the fixes.

Comment 17 Asaf Rachmani 2021-06-22 23:17:27 UTC
Nikolai, in case you use "hosted-engine --deploy" you will not see the issues described in comment 5.
The task in the first issue uses "ignore_errors", so you can probably see it only in the hosted-engine-setup log file.
The second one occurs only when using full_execution, which means running the hosted-engine-setup role directly; "hosted-engine --deploy" uses partial_execution.
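
Since the first failure is ignored rather than fatal, one way to look for it is to search the setup log for the task name quoted in comment 5. This is a minimal sketch, not part of hosted-engine-setup: the function name `find_task_in_log` is hypothetical, it assumes the log is plain text that contains the task name verbatim, and the log file path is left to the caller because it depends on the host.

```shell
# Hypothetical helper: print every line that mentions a task name,
# followed by 5 lines of context, from a given log file.
find_task_in_log() {
    # $1 = path to the log file
    # $2 = task name, matched as a fixed string (not a regex)
    grep -F -A 5 -- "$2" "$1"
}
```

For example, `find_task_in_log /path/to/setup.log "Make the engine aware that the external VM is stopped"`. The function returns a non-zero exit status when the task name does not appear in the file.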

Comment 18 Nikolai Sednev 2021-06-23 06:40:18 UTC
What will the customer do? "hosted-engine --deploy". So I think the severity of the bug should not be high; for me, manual deployment works just fine.

Comment 19 Yedidyah Bar David 2021-06-23 06:44:22 UTC
(In reply to Nikolai Sednev from comment #18)
> What will the customer do? "hosted-engine --deploy". So I think the
> severity of the bug should not be high; for me, manual deployment works
> just fine.

Not sure about cockpit, perhaps it uses full_execution.

I considered this bug severe mainly because QE uses full_execution, thus
should probably be marked AutomationBlocker.

Comment 20 Yedidyah Bar David 2021-06-23 06:46:34 UTC
(In reply to Yedidyah Bar David from comment #19)
> I considered this bug severe mainly because QE uses full_execution, thus
> should probably be marked AutomationBlocker.

(And also because it was introduced by the patch for bug 1953029, so a new regression - no need to carry it over to future versions).

Comment 32 Kobi Hakimi 2021-06-27 06:03:11 UTC
verified with ovirt-ansible-collection-1.5.3

Comment 33 Sandro Bonazzola 2021-07-06 07:28:22 UTC
This bugzilla is included in oVirt 4.4.7 release, published on July 6th 2021.

Since the problem described in this bug report should be resolved in oVirt 4.4.7 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

