Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1991171

Summary: Backup was created by version '4.4.7.7' and can not be restored using the installed version 4.4.7.6
Product: Red Hat Enterprise Virtualization Manager
Reporter: schandle
Component: ovirt-ansible-collection
Assignee: Yedidyah Bar David <didi>
Status: CLOSED ERRATA
QA Contact: Pavol Brilla <pbrilla>
Severity: medium
Priority: unspecified
Version: 4.4.7
CC: dfediuck, emarcus, gdeolive, mavital, michal.skrivanek, mperina
Target Milestone: ovirt-4.4.8
Flags: emarcus: needinfo-
Target Release: 4.4.8
Hardware: Unspecified
OS: Unspecified
Fixed In Version: ovirt-ansible-collection-1.6.0-1
Doc Type: Enhancement
Doc Text:
Since Red Hat Virtualization 4.4.7, the engine-backup refuses to restore to a version older than the one used for backup. This causes 'hosted-engine --restore-from-file' to fail if the latest appliance is older than the latest Manager. In this release, such a scenario does not fail, but prompts the user to connect via SSH to the Manager virtual machine and fix the restore issue.
Story Points: ---
Last Closed: 2021-09-08 14:12:21 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: Integration
Cloudforms Team: ---
Bug Blocks: 1991914

Description schandle 2021-08-07 21:01:57 UTC
Description of problem:
The available rhv-appliance will not allow the user to restore from backup if they are on the latest RHV 4.4.7.7.


Version-Release number of selected component (if applicable):
rhvm-appliance-4.4-20210715.0.el8ev.ova (contains rhvm-4.4.7.6-0.11.el8ev.noarch)

Backup: rhvm-4.4.7.7-0.1.el8ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Upgrade SHE to latest 4.4 release 
2. Take backup 
3. Restore from backup on latest RHVH host or download ova from customer portal 
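The reproduction steps above can be sketched as commands. This is an illustrative sketch only: the backup file path is an assumption, and the exact options may vary by RHV version.

```shell
# Backup file path used in this sketch (an assumption, not taken from the
# bug report).
BACKUP_FILE=/root/engine_backup

# Step 2, on the engine VM: take a full backup.
#   engine-backup --mode=backup --scope=all --file="$BACKUP_FILE" \
#       --log=/root/engine_backup.log
#
# Step 3, on the freshly installed RHVH host: redeploy the hosted engine,
# restoring from that backup (the flow that fails in this bug).
#   hosted-engine --deploy --restore-from-file="$BACKUP_FILE"
echo "backup file: $BACKUP_FILE"
```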

Actual results:
Fails to restore due to a version mismatch
~~~
2021-08-06 15:24:12,347-0400 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:105 {'msg': 'non-zero return code', 'cmd': 'engine-backup --mode=restore --log=/var/log/ovirt-engine/setup/restore-backup-$(date -u +%Y%m%d%H%M%S).log --file=/root/engine_backup --provision-all-databases --restore-permissions', 'stdout': "Start of engine-backup with mode 'restore'\nscope: all\narchive file: /root/engine_backup\nlog file: /var/log/ovirt-engine/setup/restore-backup-20210806192411.log\nPreparing to restore:\n- Unpacking file '/root/engine_backup'", 'stderr': "FATAL: Backup was created by version '4.4.7.7' and can not be restored using the installed version 4.4.7.6", 'rc': 1, 'start': '2021-08-06 15:24:11.634373', 'end': '2021-08-06 15:24:12.127210', 'delta': '0:00:00.492837', 'changed': True, 'invocation': {'module_args': {'_raw_params': 'engine-backup --mode=restore --log=/var/log/ovirt-engine/setup/restore-backup-$(date -u +%Y%m%d%H%M%S).log --file=/root/engine_backup --provision-all-databases --restore-permissions', '_uses_shell': True, 'warn': True, 'stdin_add_newline': True, 'strip_empty_ends': True, 'argv': None, 'chdir': None, 'executable': None, 'creates': None, 'removes': None, 'stdin': None}}, 'stdout_lines': ["Start of engine-backup with mode 'restore'", 'scope: all', 'archive file: /root/engine_backup', 'log file: /var/log/ovirt-engine/setup/restore-backup-20210806192411.log', 'Preparing to restore:', "- Unpacking file '/root/engine_backup'"], 'stderr_lines': ["FATAL: Backup was created by version '4.4.7.7' and can not be restored using the installed version 4.4.7.6"], '_ansible_no_log': False, '_ansible_delegated_vars': {'ansible_host': '192.168.222.114', 'ansible_port': None, 'ansible_user': 'root'}}
~~~
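The FATAL line above comes from a deliberate version gate (see comment 3 and bug 1932392): the restore is refused when the backup was created by a version newer than the installed engine. As an illustration only, not the actual engine-backup source, the check amounts to something like:

```shell
# Sketch of the version gate. The comparison uses sort -V, which orders
# dotted version strings component by component.
check_restore_allowed() {
    backup_ver="$1"
    installed_ver="$2"
    newest=$(printf '%s\n%s\n' "$backup_ver" "$installed_ver" | sort -V | tail -n1)
    # If the newest of the two is the backup version (and they differ),
    # the installed engine is too old to restore this backup.
    if [ "$newest" = "$backup_ver" ] && [ "$backup_ver" != "$installed_ver" ]; then
        echo "FATAL: Backup was created by version '$backup_ver' and can not be restored using the installed version $installed_ver"
        return 1
    fi
    return 0
}

# The combination from this bug: backup 4.4.7.7, installed 4.4.7.6.
check_restore_allowed "4.4.7.7" "4.4.7.6" || true  # prints the FATAL line
```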


Expected results:
To download the latest RHV 4.4 appliance and be able to restore from backup 

Additional info:
The customer restored all the hosts and is now looking to restore the self-hosted engine, but is stuck

Comment 2 Michal Skrivanek 2021-08-09 08:33:06 UTC
The documentation states the requirement incorrectly, mentioning only major versions:
https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.4/html-single/administration_guide/index#Restoring_a_Backup_with_the_engine-backup_Command
With major changes made within the 4.4.z stream, we can't really keep supporting restores onto any 4.4.z version.

The documentation should also ask users to upgrade the appliance deployment before restoring from backup, though the error message can be improved as well.

Comment 3 Yedidyah Bar David 2021-08-09 09:19:17 UTC
Some comments/thoughts:

1. This behavior is deliberate, started with the fix to bug 1932392.

2. One possible solution is that we'll release an appliance with every engine release. Right now that's not very easy to commit to, as it involves quite some manual work.

3. Another solution/workaround is to patch the code, or add an enginevm_before_engine_setup hook, to update relevant/all packages. This should be rather easy, but won't work if the updated packages are not available - e.g. on a disconnected system.

4. We can also rethink bug 1932392, although I do not think there is any fault in the reasoning there, in theory.

Comment 4 Yedidyah Bar David 2021-08-09 10:24:41 UTC
Another workaround: Use he_pause_before_engine_setup, so that you can login to the engine machine, register/subscribe/add channels, update packages, then continue. See also bug 1960188, bug 1959273.

Comment 5 Yedidyah Bar David 2021-08-09 10:33:46 UTC
Another solution, more complex but might be worth considering:

Add another option to hosted-engine --deploy, e.g. "--restore-from-image", which would allow restoring the engine VM from an image in some format (e.g. qcow/ova) instead of the appliance, and provide tools/docs for taking a backup of the engine machine in that format.

If we go that way, I'd personally never consider it the only option. I think we should always keep the option to "start from scratch" using --restore-from-file, also to clean accumulated dirt on the VM. Such a '--restore-from-image' option would likely take more time/space to use, for both backups and restores, because the image would also include exactly this "dirt" (specifically logs, databases, etc.), but it would be much simpler to use.

In a way, we already do this - that's "he_appliance_ova" - but:
1. We do not document how, or provide tools for, converting an existing machine to such an image
2. We do not test this, so will likely run into issues initially.

Comment 6 Yedidyah Bar David 2021-08-09 12:04:49 UTC
Right now, the engine refuses to export the engine VM - this tries to create a snapshot on it, which fails with:

2021-08-09 14:01:37,354+02 WARN  [org.ovirt.engine.core.bll.snapshots.CreateSnapshotForVmCommand] (EE-ManagedThreadFactory-engine-Thread-283767) [0bba83fc-fcce-4fcd-a531-1d46ba94eafc] Validation of action 'CreateSnapshotForVm' failed for user admin@internal-authz. Reasons: VAR__ACTION__CREATE,VAR__TYPE__SNAPSHOT,ACTION_TYPE_FAILED_CANNOT_RUN_ACTION_ON_NON_MANAGED_VM
2021-08-09 14:01:37,390+02 ERROR [org.ovirt.engine.core.bll.exportimport.ExportVmToOvaCommand] (EE-ManagedThreadFactory-engine-Thread-283767) [0bba83fc-fcce-4fcd-a531-1d46ba94eafc] Failed to create VM snapshot

Comment 7 Michal Skrivanek 2021-08-09 13:08:05 UTC
I think first and foremost we should document the need to upgrade.
We could even consider implementing an enginevm_before_engine_setup hook that attempts to update by default. It would still rely on having channels properly configured, but with proper documentation (and a more suggestive error message) it should be fine. In fact, I would start with that:
- suggest to upgrade in engine-backup on version mismatch
- fix docs to require upgrading prior to restore using one of the methods (hook or pause), and fix the "major version" in comment #2.

Comment 8 Yedidyah Bar David 2021-08-09 13:35:31 UTC
(In reply to Michal Skrivanek from comment #7)
> I think first and foremost we should document the need to upgrade.

Not sure what you mean here. The only thing users can do in this flow
is "upgrade" the appliance before restoring. That does not help if it's
too old.

> We can think of even implementing enginevm_before_engine_setup hook by
> default attempting to update - it would still rely on having channels
> properly configured but with proper documentation (and more suggestive error
> message) it should be fine.

If we decide to patch the code, no need to implement that as a hook -
can be done directly. A hook is useful as a workaround.

Not sure how a code patch (as opposed to a custom hook) can help with
registration, channels, etc.

> In fact, I would start with that
> - suggest to upgrade in engine-backup on version mismatch

Text suggestions are welcome.

> - fix docs to require upgrading prior to restore using one of the methods
> (hook or pause), and fix the "major version" in comment #2.

Yes, that definitely requires a doc update - we now require strictly >=, not
just major.minor.

Comment 9 Yedidyah Bar David 2021-08-09 13:36:09 UTC
I thought about some other option: If restore failed, pause the deployment and let the user fix and continue, like we did in bug 1893385.

Comment 10 Yedidyah Bar David 2021-08-10 11:55:55 UTC
QE: To reproduce/verify this bug:

1. Deploy hosted-engine
2. Update the engine to the latest available one.
3. Take a backup with engine-backup.
4. Deploy with --restore-from-file, with an appliance older than the one in (2.), but >= 4.4.7.3 (see bug 1932392). Often, that's your only choice, because we do not release an appliance per each engine release.

With a broken version, it will fail.

With a fixed version, it will prompt the user with something like:

[ INFO  ] TASK [ovirt.ovirt.hosted_engine_setup : Run engine-backup]
[ INFO  ] engine-backup --mode=restore failed:
         FATAL: Backup was created by version '4.4.8.4_master' and can not be restored using the installed version 4.4.7.6
         You can now connect from this host to the bootstrap engine VM using ssh as root and the temporary IP address - 192.168.222.83 - and fix this issue. Please continue only after the backup is restored.
         To retry the command that failed, you can run, on the bootstrap engine VM:
         engine-backup --mode=restore --file=/root/engine_backup --provision-all-databases --restore-permissions
[ INFO  ] TASK [ovirt.ovirt.hosted_engine_setup : include_tasks]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [ovirt.ovirt.hosted_engine_setup : Create temporary lock file]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [ovirt.ovirt.hosted_engine_setup : Pause execution until /tmp/ansible.oyhxe1k2_he_setup_lock is removed, delete it once ready to proceed]

Then, you should be able to:

1. SSH from the deploy host to the local engine VM using the noted IP address (192.168.222.83, in the example above)
2. Update the engine to at least the version that created the backup (4.4.8.4_master, in the example above), or more likely the latest. For RHV, this most likely means registering/subscribing/configuring channels as needed, then running 'dnf update ovirt-engine'.
3. Run the restore command as noted (in the example)
4. Once it finishes successfully, you can logout from the engine VM, and remove the lock file on the deploy host (/tmp/ansible.oyhxe1k2_he_setup_lock, in the example above).
5. Deploy should then continue successfully.
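Assuming the example values from the prompt above (the IP address, lock file name, and backup path will differ per deployment), the recovery steps can be sketched as:

```shell
# Manual recovery sketch; the values below are the example values from
# the prompt in this comment, not constants of the product.
ENGINE_VM_IP=192.168.222.83
LOCK_FILE=/tmp/ansible.oyhxe1k2_he_setup_lock

# 1. From the deploy host, connect to the bootstrap engine VM:
#      ssh root@"$ENGINE_VM_IP"
# 2. On the engine VM, register/subscribe/configure channels as needed,
#    then update the engine packages:
#      dnf update ovirt-engine
# 3. Retry the restore command shown in the prompt:
#      engine-backup --mode=restore --file=/root/engine_backup \
#          --provision-all-databases --restore-permissions
# 4. Once it succeeds, log out of the engine VM and, back on the deploy
#    host, remove the lock file so the deployment continues:
#      rm -f "$LOCK_FILE"
echo "lock file: $LOCK_FILE"
```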

Comment 13 Pavol Brilla 2021-08-24 10:34:06 UTC
The steps were followed and the procedure passed, but it has to be properly documented by the doc team (they have a separate bug for that)

Comment 14 Yedidyah Bar David 2021-08-29 07:19:43 UTC
Eli, two notes about the doc text:

1. "if the latest appliance is older than the latest Manager" - not necessarily the _latest_ engine (manager), but the one used when taking the backup.

2. "and fix the restore issue." - it might be worth adding that this particular error should be handled by registering the engine machine, configuring channels, updating the engine packages, and then trying again to run the restore command. See comment 10 for details. Instead of the doc-text, you might want to add this to the documentation, in bug 1991914.

Thanks!

Comment 19 errata-xmlrpc 2021-09-08 14:12:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (RHV Engine and Host Common Packages security update [ovirt-4.4.8]), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3461

Comment 20 meital avital 2022-08-08 19:34:34 UTC
Due to QE capacity, we are not going to cover this issue in our automation