Bug 1827135 - failed to deploy hosted-engine 4.4 from 4.3 backup file due to versionlock
Summary: failed to deploy hosted-engine 4.4 from 4.3 backup file due to versionlock
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-ansible-collection
Classification: oVirt
Component: hosted-engine-setup
Version: 1.0.35
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ovirt-4.4.0
Target Release: 1.1.3
Assignee: Evgeny Slutsky
QA Contact: Nikolai Sednev
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-04-23 10:20 UTC by Evgeny Slutsky
Modified: 2020-05-20 20:03 UTC (History)

Fixed In Version: ovirt-ansible-hosted-engine-setup-1.1.3
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-20 20:03:56 UTC
oVirt Team: Integration
Embargoed:
sbonazzo: ovirt-4.4?
sbonazzo: blocker?
sbonazzo: planning_ack?
sbonazzo: devel_ack+
sbonazzo: testing_ack?


Attachments
sosreport from the engine (5.64 MB, application/x-xz), 2020-05-18 17:11 UTC, Nikolai Sednev
sosreport from alma03 (6.19 MB, application/x-xz), 2020-05-18 17:12 UTC, Nikolai Sednev


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | oVirt ovirt-ansible-hosted-engine-setup pull 318 | 0 | None | closed | remove version lock from the engine | 2020-11-11 02:42:47 UTC

Description Evgeny Slutsky 2020-04-23 10:20:13 UTC
Version-Release number of selected component (if applicable):

ovirt-ansible-engine-setup-1.2.3-1.el8.noarch



How reproducible:


Steps to Reproduce:
1. Install a 4.3 HE setup on EL7.
2. Back up the 4.3 engine to shared storage:
   engine-backup --mode=backup --file=engine_backup.tar.gz --log=engine_backup.log
3. Reinstall the host with EL8.
4. Run the HE 4.4 deployment from the 4.3 backup file:
   hosted-engine --deploy --restore-from-file=<file-he>


Actual results:
Deployment fails with the following error:

 [ ERROR ] fatal: [localhost -> engine2.es.localvms.com]: FAILED! => {"changed": false, "failures": ["No package ovirt-engine available."], "msg": "Failed to install some of the specified packages", "rc": 1, "results": []}


During `engine-backup --mode=restore`, versionlock is restored with the 4.3 packages still listed,
so the engine-setup Ansible role fails when it installs the ovirt-engine package, at
https://github.com/oVirt/ovirt-ansible-engine-setup/blob/baa98047ba902201aac4f595986c29816ad4d8fc/tasks/install_packages.yml#L2

The task file
/usr/share/ansible/roles/ovirt.engine-setup/tasks/install_packages.yml
tries to install the ovirt-engine package.
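
To illustrate the failure mode described above, here is a minimal Python sketch that flags lock entries left over from another version. The function name `stale_locks`, the made-up sample entries, and the assumed entry shape `name-version-release.arch` are illustrative assumptions; they are not taken from the versionlock plugin or from the role.

```python
def stale_locks(lines, target_prefix):
    """Return lock entries whose version does not start with target_prefix."""
    stale = []
    for raw in lines:
        entry = raw.strip()
        if not entry or entry.startswith("#"):
            continue  # skip blank lines and comments
        # Assumed entry shape (hypothetical): name-version-release.arch
        parts = entry.rsplit("-", 2)
        if len(parts) != 3:
            continue  # not in the assumed shape, ignore it
        _name, version, _release_arch = parts
        if not version.startswith(target_prefix):
            stale.append(entry)
    return stale


# Made-up sample entries, for illustration only.
sample = [
    "ovirt-engine-4.3.9.4-1.el7.noarch",
    "ovirt-engine-backend-4.4.0.3-1.el8.noarch",
]
print(stale_locks(sample, "4.4."))
```

Any entry reported by such a check pins a package to a version that the 4.4 repositories do not provide, which matches the "No package ovirt-engine available." failure shown above.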

Expected results:
hosted-engine 4.4 is deployed from the 4.3 engine DB+Files backup.


Additional info:

Comment 1 Martin Necas 2020-04-23 11:03:34 UTC
I might misunderstand something, but shouldn't this issue be for hosted_engine_setup instead of engine_setup?
You should check if there are proper repositories installed.

Comment 2 Sandro Bonazzola 2020-04-23 13:26:51 UTC
(In reply to Martin Necas from comment #1)
> I might misunderstand something, but shouldn't this issue be for
> hosted_engine_setup instead of engine_setup?
> You should check if there are proper repositories installed.

Repositories are right, provided by ovirt-release44 included in the appliance.
The issue is that version.lock was backed up on 4.3.9 and restored on 4.4, so version.lock still lists the 4.3.9 packages.
Please sync with Evgeny on the best flow to keep version.lock from interfering here.

Comment 3 Evgeny Slutsky 2020-04-23 13:31:18 UTC
We can resolve it in ovirt-ansible-hosted-engine by clearing the lock after running `engine-backup --mode=restore` and before triggering the role.
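
A minimal Python sketch of the "clear the lock" step proposed here, assuming the lock list is a plain text file with one entry per line. The helper name `clear_versionlock` and whatever path is passed to it are hypothetical; the real change belongs in the Ansible role and may be implemented differently.

```python
from pathlib import Path


def clear_versionlock(path):
    """Empty the lock list at `path` and return the entries that were dropped."""
    lock_file = Path(path)
    if not lock_file.exists():
        return []  # nothing to clear
    dropped = [line for line in lock_file.read_text().splitlines() if line.strip()]
    lock_file.write_text("")  # keep the file, remove every lock entry
    return dropped
```

Returning the dropped entries lets the caller log what was removed before the role goes on to install the ovirt-engine package.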

Comment 4 Michal Skrivanek 2020-04-23 15:33:01 UTC
why don't you fix restore not to restore versionlock when not on a same version?

Comment 5 Evgeny Slutsky 2020-04-25 07:53:32 UTC
(In reply to Michal Skrivanek from comment #4)
> why don't you fix restore not to restore versionlock when not on a same
> version?

I guess this flow has never been requested from `engine-backup` before.
It does make sense; if we decide that `engine-backup` should support it,
we can add an exception to `engine-backup --mode=restore` to skip restoring versionlock.list on a different OS.
@didi, what do you think?

Comment 6 Yedidyah Bar David 2020-04-27 06:44:47 UTC
(In reply to Evgeny Slutsky from comment #5)
> (In reply to Michal Skrivanek from comment #4)
> > why don't you fix restore not to restore versionlock when not on a same
> > version?

We didn't have to, in 3.6/el6->4.0/el7, and I'd first like to make sure this
is the best solution, right now.

> 
> I guess that this flow never been requested before from `engine-backup`. 
> it does make sense if we decide that  `engine-backup` should support it,
> we can add an exception to the 'engine-backup --restore' to exclude
> restoring versionlock.list  on different OS. 

We can also make the engine-setup role not try to update the engine, or
not fail the role if this fails. Did anyone analyze Pros/Cons of each approach?

Personally, I am not even sure the role should ever install the package.
If it does, perhaps it should somehow (automatically or optionally) do this
only on new setups, not upgrades.

Personally, while both patches seem simple and rather harmless, I tend to not
touch engine-backup this way, as it breaks an assumption we have that users
should never update the engine outside of engine-setup.

Perhaps it's time to reconsider this assumption, though. Eli - how hard/risky
would it be to stop using versionlock? And instead e.g. make the engine check,
on start, if it's compatible with the DB, and refuse to start otherwise? Was
this ever discussed?
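
The start-time check suggested above could look roughly like the following Python sketch. Everything in it is hypothetical: the function names, the idea of comparing dotted version strings, and the rule that matching major.minor means "compatible" are assumptions for illustration, not how the engine works.

```python
def major_minor(version):
    """Return (major, minor) from a dotted version string such as '4.4.0.3'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def engine_may_start(engine_version, db_version):
    """Decide whether the engine should start against this DB."""
    # Assumed rule for illustration only: matching major.minor means compatible.
    return major_minor(engine_version) == major_minor(db_version)
```

With a check like this in place, a mismatched engine would refuse to start instead of relying on versionlock to prevent the mismatch.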

Comment 7 Eli Mesika 2020-05-12 00:05:50 UTC
(In reply to Yedidyah Bar David from comment #6)

> Perhaps it's time to reconsider this assumption, though. Eli - how hard/risky
> would it be to stop using versionlock? And instead e.g. make the engine
> check,
> on start, if it's compatible with the DB, and refuse to start otherwise? Was
> this ever discussed?

I think this is possible; however, it was never discussed, IIRC.

Comment 8 Nikolai Sednev 2020-05-18 17:10:18 UTC
Tested backup and restore from engine:
rhvm-4.3.10.1-0.1.master.el7.noarch
Linux 3.10.0-1127.8.1.el7.x86_64 #1 SMP Fri Apr 24 14:56:59 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.8 (Maipo)

Host:
rhvm-appliance.x86_64 2:4.3-20200507.0.el7 rhv-4.3.10
ovirt-hosted-engine-setup-2.3.13-1.el7ev.noarch
ovirt-hosted-engine-ha-2.3.6-1.el7ev.noarch
Linux 3.10.0-1127.8.2.el7.x86_64 #1 SMP Thu May 7 19:30:37 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.8 (Maipo)

Restored on RHEL8.2 host, the same one, reprovisioned to RHEL8.2 and failed with:

[ INFO  ] TASK [ovirt.hosted_engine_setup : Fail with generic error]
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "The host has been set in non_operational status, please check engine logs, more info can be found in the engine logs, fix accordingly and re-deploy."}
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "The system may not be provisioned according to the playbook results: please check the logs for the issue, fix accordingly or re-deploy from scratch.\n"}
[ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook


In engine log I see:
2020-05-18 20:00:03,367+03 ERROR [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] (EE-ManagedThreadFactory-engine-Thread-35) [7289cff0] Failed to migrate one or more VMs.
2020-05-18 19:50:23,852+03 ERROR [org.ovirt.engine.core.bll.CpuFlagsManagerHandler] (ServerService Thread Pool -- 57) [] Error getting info for CPU ' ', not in expected format.


It looks like a different issue now, but the end result is the same, I was unable to restore the engine over 4.4 from 4.3.
Moving back to assigned and attaching sosreports from host alma03 and the engine.

Comment 9 Nikolai Sednev 2020-05-18 17:11:20 UTC
Created attachment 1689640 [details]
sosreport from the engine

Comment 10 Nikolai Sednev 2020-05-18 17:12:06 UTC
Created attachment 1689641 [details]
sosreport from alma03

Comment 11 Yedidyah Bar David 2020-05-19 06:14:13 UTC
(In reply to Nikolai Sednev from comment #8)
> Tested backup and restore from engine:
> rhvm-4.3.10.1-0.1.master.el7.noarch
> Linux 3.10.0-1127.8.1.el7.x86_64 #1 SMP Fri Apr 24 14:56:59 EDT 2020 x86_64
> x86_64 x86_64 GNU/Linux
> Red Hat Enterprise Linux Server release 7.8 (Maipo)
> 
> Host:
> rhvm-appliance.x86_64 2:4.3-20200507.0.el7 rhv-4.3.10
> ovirt-hosted-engine-setup-2.3.13-1.el7ev.noarch
> ovirt-hosted-engine-ha-2.3.6-1.el7ev.noarch
> Linux 3.10.0-1127.8.2.el7.x86_64 #1 SMP Thu May 7 19:30:37 EDT 2020 x86_64
> x86_64 x86_64 GNU/Linux
> Red Hat Enterprise Linux Server release 7.8 (Maipo)
> 
> Restored on RHEL8.2 host, the same one, reprovisioned to RHEL8.2 and failed
> with:
> 
> [ INFO  ] TASK [ovirt.hosted_engine_setup : Fail with generic error]
> [ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "The host
> has been set in non_operational status, please check engine logs, more info
> can be found in the engine logs, fix accordingly and re-deploy."}
> [ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "The
> system may not be provisioned according to the playbook results: please
> check the logs for the issue, fix accordingly or re-deploy from scratch.\n"}
> [ ERROR ] Failed to execute stage 'Closing up': Failed executing
> ansible-playbook
> 
> 
> In engine log I see:
> 2020-05-18 20:00:03,367+03 ERROR
> [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand]
> (EE-ManagedThreadFactory-engine-Thread-35) [7289cff0] Failed to migrate one or more VMs.
> 2020-05-18 19:50:23,852+03 ERROR
> [org.ovirt.engine.core.bll.CpuFlagsManagerHandler] (ServerService Thread
> Pool -- 57) 
> [] Error getting info for CPU ' ', not in expected format.
> 
> 
> It looks like a different issue now,

Indeed, so please create a new bug.

There, please state versions of the new components (host packages, appliance, etc.) and attach logs (sosreport is fine) also from the host. Thanks.

Current bug is about failure during engine-setup ansible role, and it seems you managed to pass this point, considering that your attachment includes an engine-setup log that finished successfully.

> but the end result is the same,

It's not the same. It failed in a different, later, point.

> I was
> unable to restore the engine over 4.4 from 4.3.

Many things can cause such a failure. Not all of them are the same bug.

Reusing BZ bugs for unrelated real bugs causes confusion later on. In current case, most likely even unrelated components/products. Please don't. Thanks!

Comment 12 Nikolai Sednev 2020-05-19 08:18:55 UTC
I've opened https://bugzilla.redhat.com/show_bug.cgi?id=1837266 .
Closing this bug as verified, per comment #11.

Comment 13 Sandro Bonazzola 2020-05-20 20:03:56 UTC
This bugzilla is included in oVirt 4.4.0 release, published on May 20th 2020.

Since the problem described in this bug report should be
resolved in oVirt 4.4.0 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

