There are two issues here:

1) Task "Make the engine aware that the external VM is stopped" failed - this is probably caused by the last change in the 'ovirt_vm' module [1]. "ignore_errors" is used in this task, so it will not cause the deployment to fail.

2) Task "Sync on engine machine" failed - probably a result of [2]. The task runs at the end of the deployment, when the local VM is already up and running (with a new IP address), but it tries to reach the bootstrap VM.

[1] https://github.com/oVirt/ovirt-ansible-collection/pull/294
[2] https://github.com/oVirt/ovirt-ansible-collection/pull/277/files#diff-9ea40bc76fed1e1af239e7aebb3f7e93b018777b9d9fc40825c2ef05d6ddc282
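For context, a task with "ignore_errors: true" records the failure in the log but lets the play continue. A minimal sketch of such a task (the task name matches the one above; the module arguments and VM name are illustrative placeholders, not the actual hosted-engine-setup role code):

```yaml
# Illustrative sketch only - not the real role code.
# With ignore_errors, a failure here shows up in the
# hosted-engine-setup log but does not abort the deployment.
- name: Make the engine aware that the external VM is stopped
  ovirt.ovirt.ovirt_vm:
    auth: "{{ ovirt_auth }}"       # assumed auth variable
    name: HostedEngineLocal        # hypothetical external VM name
    state: stopped
  ignore_errors: true
```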
The first part of this issue should be fixed by https://github.com/oVirt/ovirt-ansible-collection/pull/301
Everything works just fine when deploying with the 4.4.6 engine from the old rhvm-appliance-4.4-20210527.0.el8ev.x86_64, upgrading the engine during the deployment to ovirt-engine-setup-base-4.4.7.3-0.3.el8ev.noarch, and then continuing with the deployment. To run such a deployment you need to use "hosted-engine --deploy --ansible-extra-vars=he_pause_host=true".
(In reply to Nikolai Sednev from comment #7)
> Everything works just fine in case of deployment using 4.4.6 engine from old
> rhvm-appliance-4.4-20210527.0.el8ev.x86_64 and then during the deployment
> upgrading the engine to ovirt-engine-setup-base-4.4.7.3-0.3.el8ev.noarch and
> then continue with the deployment.
> To make such deployment you will be required to run it using "hosted-engine
> --deploy --ansible-extra-vars=he_pause_host=true".

I think the bug is in ovirt-ansible-collection, not something inside the appliance.
Which ovirt-ansible-collection did you use?
(In reply to Yedidyah Bar David from comment #8)
> (In reply to Nikolai Sednev from comment #7)
> > Everything works just fine in case of deployment using 4.4.6 engine from old
> > rhvm-appliance-4.4-20210527.0.el8ev.x86_64 and then during the deployment
> > upgrading the engine to ovirt-engine-setup-base-4.4.7.3-0.3.el8ev.noarch and
> > then continue with the deployment.
> > To make such deployment you will be required to run it using "hosted-engine
> > --deploy --ansible-extra-vars=he_pause_host=true".
>
> I think the bug is on ovirt-ansible-collection, not something inside the
> appliance.
> Which ovirt-ansible-collection did you use?

On the hosts:

alma03 ~]# rpm -qa | grep ansible
ansible-2.9.21-1.el8ae.noarch
ovirt-ansible-collection-1.5.0-1.el8ev.noarch

The same is on the engine, and it even works fine with ansible-2.9.22-1.el8ae.noarch on the engine.
Here is the engine:

nsednev-he-1 ~]# rpm -qa | grep ansible
ansible-2.9.22-1.el8ae.noarch
python3-ansible-runner-1.4.6-2.el8ar.noarch
ovirt-ansible-collection-1.5.0-1.el8ev.noarch
ansible-runner-service-1.0.7-1.el8ev.noarch
The issue was introduced in ovirt-ansible-collection-1.5.1 so it is good that it works for you with ovirt-ansible-collection-1.5.0.
I think this bug only affects full_execution, which is not used by the otopi CLI frontend. Not sure about cockpit.
When discussing this in private with Asaf, he said that the 'sync on engine machine' step at the end of full_execution - which was added in the hope of making the later log-fetching task more effective - is never useful: if we reached that point, the VM was already copied to and started on the shared storage, and the logs inside the local VM are not interesting. PR 305 simply removes this sync and is an alternative to PR 303.
Martin, please take a look at https://bugzilla.redhat.com/show_bug.cgi?id=1953029#c14 . I restored without any issues with ovirt-ansible-collection-1.5.1-1.el8ev.noarch on the engine, from an engine that was running ovirt-ansible-collection-1.5.0-1.el8ev.noarch.
Strange - you should get at least an error for the first part of this issue; as for the second part, it might not have been full_execution, or something like that. Either way, I'll create a new release of the collection that will contain the fixes.
Nikolai, if you use "hosted-engine --deploy" you will not see the issues described in comment 5. The task in the first issue uses "ignore_errors", so you can probably see it only in the hosted-engine-setup log file. The second one happens only when using full_execution, which means running the hosted-engine-setup role directly; "hosted-engine --deploy" uses partial_execution.
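To illustrate the full_execution path: it means driving the role yourself from a playbook instead of through otopi. A minimal sketch (the vars file name and its contents are placeholders, not the exact QE setup):

```yaml
# Illustrative sketch only - a minimal playbook for full_execution,
# i.e. running the hosted-engine-setup role directly.
- hosts: localhost
  connection: local
  vars_files:
    - he_answers.yml   # hypothetical file with the he_* answer variables
  roles:
    - ovirt.ovirt.hosted_engine_setup
```

By contrast, "hosted-engine --deploy" drives the same role through otopi in partial_execution mode, which is why it does not hit the second issue.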
What will the customer do? "hosted-engine --deploy". So I think the severity of this bug should not be high; as for me, manual deployment works just fine.
(In reply to Nikolai Sednev from comment #18)
> What will the customer do? "hosted-engine --deploy". So I think the
> severity of this bug should not be high; as for me, manual deployment works
> just fine.

Not sure about cockpit; perhaps it uses full_execution.

I considered this bug severe mainly because QE uses full_execution, so it should probably be marked AutomationBlocker.
(In reply to Yedidyah Bar David from comment #19)
> I considered this bug severe mainly because QE uses full_execution, so it
> should probably be marked AutomationBlocker.

(And also because it was introduced by the patch for bug 1953029, making it a new regression - no need to carry it over to future versions.)
Verified with ovirt-ansible-collection-1.5.3.
This bugzilla is included in the oVirt 4.4.7 release, published on July 6th 2021. Since the problem described in this bug report should be resolved in oVirt 4.4.7, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.