Description of problem:
When HE is being deployed, the playbook creates a new hosted_storage using the ansible ovirt_storage_domain module. The engine creates a new Storage Domain with the highest available storage format (i.e. V5 on a 4.3 deployment).

However, when restoring from backup, even if the deployment is with a 4.3 appliance on a 4.3 host, the DC of the HE VM may not be at 4.3 Compatibility Level yet. If that is the case, the V5 storage domain cannot be attached to the DC, and the deployment fails:

[ ERROR ] Error: Fault reason is "Operation Failed". Fault detail is "[Cannot attach Storage. Storage Domain format V5 is illegal.]". HTTP response code is 409.

This happens because the SD is created with V5, which cannot be attached to a 4.2 DC, as a 4.2 DC only supports formats up to V4.

This is not terrible if the user is just moving the hosted_engine to a new Storage Domain and can simply go back and finish the upgrade. But it is a big deal in disaster recovery scenarios, as the restore from backup simply fails.

Version-Release number of selected component (if applicable):
RHV 4.3 with DC in 4.2 CL

How reproducible:
Always

Steps to Reproduce:
Restore a backup of a 4.3 Hosted-Engine on a 4.3 host, while the backup contains the HE DC set to 4.2 compatibility level.

Actual results:
Restore from backup fails.

Expected results:
Restore from backup succeeds.

Additional info:
CL     Storage Format
4.0    V3
4.1    V4
4.2    V4
4.3    V5
4.4    V5
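The compatibility table above can be expressed as a small lookup. This is only an illustrative sketch in Python, not code from the engine or from the hosted-engine playbook; the names MAX_FORMAT_BY_CL, max_storage_format and can_attach are hypothetical, and the sketch assumes that a domain with a format older than the DC maximum can still be attached.

```python
# Mapping copied from the "Additional info" table in the report.
MAX_FORMAT_BY_CL = {
    "4.0": "V3",
    "4.1": "V4",
    "4.2": "V4",
    "4.3": "V5",
    "4.4": "V5",
}


def max_storage_format(dc_compat_level: str) -> str:
    """Return the highest storage format listed for this DC compatibility level."""
    try:
        return MAX_FORMAT_BY_CL[dc_compat_level]
    except KeyError:
        raise ValueError(f"unknown compatibility level: {dc_compat_level!r}")


def can_attach(sd_format: str, dc_compat_level: str) -> bool:
    """True if sd_format does not exceed the maximum format for the DC.

    Assumption: formats are written as "V<number>" and older formats
    remain attachable to newer DCs.
    """
    allowed = int(max_storage_format(dc_compat_level)[1:])
    return int(sd_format[1:]) <= allowed


if __name__ == "__main__":
    # The failing case from the report: a V5 domain and a 4.2 DC.
    print(can_attach("V5", "4.2"))
    print(max_storage_format("4.2"))
```

With such a lookup, a deployment that knows the compatibility level of the target DC could request the matching format instead of letting the engine pick the highest one.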
(In reply to Germano Veit Michel from comment #0)
> The engine creates a new Storage Domain with the highest available storage
> format (i.e. V5 on a 4.3 deployment).

To make it clearer: this is a detached SD, and the HE deployment does not request any specific storage format via the API.
*** Bug 1951548 has been marked as a duplicate of this bug. ***
QE: Reproduction/Verification flow:

1. Deploy 4.2 HE
2. Upgrade to 4.3
3. Take a backup with engine-backup
4. Try to upgrade to 4.4 following the standard documented flow ('hosted-engine --deploy --restore-from-file')

Alternatively:

1. Deploy 4.4 HE
2. Create new DC/Cluster with compatibility version 4.2 and note their names
3. Backup with engine-backup
4. Try restoring on a new host with 'hosted-engine --deploy --restore-from-file', supplying the names of the DC/cluster created above when prompted

With a broken version, deploy/restore will fail, and you'll get the error 'Cannot attach Storage. Storage Domain format V5 is illegal.'. Please note that if it didn't fail, or failed without this message, you probably didn't reproduce the flow of current bug.

With a fixed version, deploy/restore will succeed, creating the new storage domain with format 'v4'.
(In reply to Yedidyah Bar David from comment #7)
> QE: Reproduction/Verification flow:
>
> 1. Deploy 4.2 HE
> 2. Upgrade to 4.3
> 3. Take a backup with engine-backup
> 4. Try to upgrade to 4.4 following the standard documented flow
> ('hosted-engine --deploy --restore-from-file')
>
> Alternatively:
>
> 1. Deploy 4.4 HE
> 2. Create new DC/Cluster with compatibility version 4.2 and note their names

You can't do that on a 4.4 engine: you simply don't have the ability to change the DC/Cluster to compatibility version 4.2, as there is no such option in the drop-down menu. The DC compatibility version is set to 4.6 without any other option, and the Cluster compatibility version is set to 4.6 without any other option.
Take a look at the attachment.
Checked on ovirt-engine-setup-4.4.10.8-548.g6b5767a.2.el8ev.noarch

> 3. Backup with engine-backup
> 4. Try restoring on a new host with 'hosted-engine --deploy
> --restore-from-file', supplying the names of the DC/cluster created above
> when prompted
>
> With a broken version, deploy/restore will fail, and you'll get the error
> 'Cannot attach Storage. Storage Domain format V5 is illegal.'. Please note
> that if it didn't fail, or failed without this message, you probably didn't
> reproduce the flow of current bug.
>
> With a fixed version, deploy/restore will succeed, creating the new storage
> domain with format 'v4'.
Created attachment 1871040 [details] screenshot 1
Created attachment 1871041 [details] Screenshot 2
Following comment #10, the alternative option doesn't work here.
(In reply to Nikolai Sednev from comment #10)
> (In reply to Yedidyah Bar David from comment #7)
> > QE: Reproduction/Verification flow:
> >
> > 1. Deploy 4.2 HE
> > 2. Upgrade to 4.3
> > 3. Take a backup with engine-backup
> > 4. Try to upgrade to 4.4 following the standard documented flow
> > ('hosted-engine --deploy --restore-from-file')
> >
> > Alternatively:
> >
> > 1. Deploy 4.4 HE
> > 2. Create new DC/Cluster with compatibility version 4.2 and note their names
> You can't do that on 4.4 engine, you simply doesn't have the ability to
> change DC/Cluster to compatibility version 4.2, you have no such option from
> the drop down menu. The DC compatibility cersion is set to 4.6 without any
> other option, the Cluster compatibility version is set to 4.6 without any
> other option.
> Take a look at the attachment.

The attached screenshots seem to show that you are trying to change the level for Default. The instructions I wrote above are to create _new_ DC/cluster.
Alternatively:

1. Deploy 4.4 HE
2. Create new DC/Cluster with compatibility version 4.2 and note their names
3. Backup with engine-backup
4. Try restoring on a new host with 'hosted-engine --deploy --restore-from-file', supplying the names of the DC/cluster created above when prompted

With a broken version, deploy/restore will fail, and you'll get the error 'Cannot attach Storage. Storage Domain format V5 is illegal.'. Please note that if it didn't fail, or failed without this message, you probably didn't reproduce the flow of current bug.

With a fixed version, deploy/restore will succeed, creating the new storage domain with format 'v4'.

1. Deployed 4.4 HE on serval16 and serval17 under the DC and cluster named default.
2. I created dc1 and cl1 without any hosts and set them to compatibility version 4.2.
3. I made a backup.
4. I powered off both serval16 and serval17, then ran "hosted-engine --deploy --4 --restore-from-file=/root/nsednev_from_serval16_rhevm_4_4" on a new clean RHEL 8.5 host, serval15, and received this error:

[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Run engine-backup]
[ INFO ] engine-backup --mode=restore failed:
         FATAL: Backup was created by version '4.4.10.8' and can not be restored using the installed version 4.4.9.5
         You can now connect from this host to the bootstrap engine VM using ssh as root and the temporary IP address - 192.168.222.233 - and fix this issue. Please continue only after the backup is restored.
         To retry the command that failed, you can run, on the bootstrap engine VM:
         engine-backup --mode=restore --file=/root/engine_backup --provision-all-databases --restore-permissions
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : include_tasks]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Create temporary lock file]
[ INFO ] changed: [localhost -> localhost]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Pause execution until /tmp/ansible.8lu0hvqw_he_setup_lock is removed, delete it once ready to proceed]

Probably the appliance engine was too old, so I fetched the up-to-date repos to the engine and updated its rpms, then manually continued with the restore from the engine, by copying the backup file to the engine and then running "engine-backup --mode=restore --file=/root/nsednev_from_serval16_rhevm_4_4 --provision-all-databases --restore-permissions" as was suggested during the initial restore.

Once the manual restore finished successfully, I ran

serval15 ~]# rm -rf /tmp/ansible.8lu0hvqw_he_setup_lock

and the deployment continued until I received the original error:

[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Activate storage domain]
[ ERROR ] ovirtsdk4.Error: Fault reason is "Operation Failed". Fault detail is "[Cannot attach Storage. Storage Domain format V5 is illegal.]". HTTP response code is 409.
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "Fault reason is \"Operation Failed\". Fault detail is \"[Cannot attach Storage. Storage Domain format V5 is illegal.]\". HTTP response code is 409."}
          Please specify the storage you would like to use (glusterfs, iscsi, fc, nfs)[nfs]:

The environment is still available for further investigation if required.
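The pause in the log above works by waiting for a lock file to disappear: the deployment creates the file, waits, and continues once someone removes it. A minimal Python sketch of that general pattern follows. It is not the Ansible task used by the role; the function name wait_until_removed and its parameters are hypothetical.

```python
import os
import time
from typing import Optional


def wait_until_removed(path: str,
                       poll_interval: float = 1.0,
                       timeout: Optional[float] = None) -> bool:
    """Block until `path` no longer exists.

    Returns True once the file is gone, or False if `timeout` seconds
    pass first. With timeout=None the call waits indefinitely.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while os.path.exists(path):
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
    return True
```

In the flow above, the operator fixes the bootstrap engine VM while the wait is in progress, and removing the lock file (the "rm -rf" step) is what lets the deployment proceed.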
Engine components:
ovirt-engine-setup-4.4.10.8-548.g6b5767a.2.el8ev.noarch
Linux 4.18.0-348.2.1.el8_5.x86_64 #1 SMP Mon Nov 8 13:30:15 EST 2021 x86_64 x86_64 x86_64 GNU/Linux

          Please specify the storage you would like to use (glusterfs, iscsi, fc, nfs)[nfs]:
          Please specify the nfs version you would like to use (auto, v3, v4, v4_0, v4_1, v4_2)[auto]:
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Activate storage domain]
[ ERROR ] ovirtsdk4.Error: Fault reason is "Operation Failed". Fault detail is "[Cannot attach Storage. Storage Domain format V5 is illegal.]". HTTP response code is 409.
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "Fault reason is \"Operation Failed\". Fault detail is \"[Cannot attach Storage. Storage Domain format V5 is illegal.]\". HTTP response code is 409."}
          Please specify the storage you would like to use (glusterfs, iscsi, fc, nfs)[nfs]:
Hosts were running on these components:
ovirt-hosted-engine-ha-2.4.10-1.el8ev.noarch
ovirt-hosted-engine-setup-2.5.4-2.el8ev.noarch
ovirt-ansible-collection-2.0.0-0.4.BETA.el8ev.noarch
ansible-2.9.27-1.el8ae.noarch
Red Hat Enterprise Linux release 8.5 (Ootpa)
Linux 4.18.0-348.21.1.el8_5.x86_64 #1 SMP Tue Mar 22 10:35:22 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux
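Since the verification flow depends on seeing this exact fault, and not some other failure, a log check can help tell the two apart. The following is a hedged Python sketch that extracts the rejected format from a deploy log line; the names FAULT_RE and rejected_format are hypothetical, and the pattern is based only on the message text quoted in this bug.

```python
import re

# Pattern built from the fault detail quoted in this report.
FAULT_RE = re.compile(
    r"Cannot attach Storage\. Storage Domain format (V\d+) is illegal"
)


def rejected_format(log_line: str):
    """Return the rejected storage format (e.g. "V5"), or None if the
    line does not contain the fault described in this bug."""
    match = FAULT_RE.search(log_line)
    return match.group(1) if match else None


if __name__ == "__main__":
    line = ('[ ERROR ] ovirtsdk4.Error: Fault reason is "Operation Failed". '
            'Fault detail is "[Cannot attach Storage. Storage Domain format '
            'V5 is illegal.]". HTTP response code is 409.')
    print(rejected_format(line))
```

A deploy that fails with a line for which this returns None, such as the engine-backup version mismatch seen earlier, did not reproduce the current bug.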
(In reply to Nikolai Sednev from comment #15)
> 1.Deployed 4.4 HE on serval16 and serval17 under dc and cl named default.
> 2.I created dc1 and cl1 without any hosts and set them to compatibility
> version 4.2.
> 3.I made a backup.
> 4.I powered-off both serval16 and serval17, then I ran "hosted-engine
> --deploy --4 --restore-from-file=/root/nsednev_from_serval16_rhevm_4_4" on
> new clean RHEL8.5 host serval15 and received this error:
>
>
> [ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Run engine-backup]
> [ INFO ] engine-backup --mode=restore failed:
>          FATAL: Backup was created by version '4.4.10.8' and can not be
> restored using the installed version 4.4.9.5
>          You can now connect from this host to the bootstrap engine VM using
> ssh as root and the temporary IP address - 192.168.222.233 - and fix this
> issue. Please continue only after the backup is restored.
>          To retry the command that failed, you can run, on the bootstrap
> engine VM:
>          engine-backup --mode=restore --file=/root/engine_backup
> --provision-all-databases --restore-permissions
> [ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : include_tasks]
> [ INFO ] ok: [localhost]
> [ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Create temporary lock file]
> [ INFO ] changed: [localhost -> localhost]
> [ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Pause execution until
> /tmp/ansible.8lu0hvqw_he_setup_lock is removed, delete it once ready to
> proceed]
>
> Probably the appliance engine was too old,

Correct.

> so I fetched the up to date repos
> to the engine and updated its rpms, then manually continued with the restore
> from the engine, by copying the backup file to the engine and then running
> "engine-backup --mode=restore --file=/root/nsednev_from_serval16_rhevm_4_4
> --provision-all-databases --restore-permissions" as was suggested during the
> initial restore.

Good, in principle, but to which version? Probably latest 4.4.

> Once manual restore successfully finished, I ran serval15 ~]# rm -rf
> /tmp/ansible.8lu0hvqw_he_setup_lock and deployment continued until I
> received just the original error:
>
> [ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Activate storage domain]
> [ ERROR ] ovirtsdk4.Error: Fault reason is "Operation Failed". Fault detail
> is "[Cannot attach Storage. Storage Domain format V5 is illegal.]". HTTP
> response code is 409.

Assuming you used 4.4, this is simply a reproduction of the current bug. It's not new for 4.5. As comment 0 says, it affects also 4.3.

But: If you upgraded to 4.5, engine-backup --mode=restore would have failed. This is because we currently do not allow that. I think we might want to. If you want, create a bug for this (like bug 1812906), and make current bug depend on it. Or:

1. Deploy 4.4 HE
2. Create a new DC and cluster with compatibility version 4.2
3. Upgrade the engine to 4.5
4. Then take a backup and deploy/restore it to a new 4.5 HE
(In reply to Yedidyah Bar David from comment #18)
> (In reply to Nikolai Sednev from comment #15)
> > 1.Deployed 4.4 HE on serval16 and serval17 under dc and cl named default.
> > 2.I created dc1 and cl1 without any hosts and set them to compatibility
> > version 4.2.
> > 3.I made a backup.
> > 4.I powered-off both serval16 and serval17, then I ran "hosted-engine
> > --deploy --4 --restore-from-file=/root/nsednev_from_serval16_rhevm_4_4" on
> > new clean RHEL8.5 host serval15 and received this error:
> >
> >
> > [ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Run engine-backup]
> > [ INFO ] engine-backup --mode=restore failed:
> >          FATAL: Backup was created by version '4.4.10.8' and can not be
> > restored using the installed version 4.4.9.5
> >          You can now connect from this host to the bootstrap engine VM using
> > ssh as root and the temporary IP address - 192.168.222.233 - and fix this
> > issue. Please continue only after the backup is restored.
> >          To retry the command that failed, you can run, on the bootstrap
> > engine VM:
> >          engine-backup --mode=restore --file=/root/engine_backup
> > --provision-all-databases --restore-permissions
> > [ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : include_tasks]
> > [ INFO ] ok: [localhost]
> > [ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Create temporary lock file]
> > [ INFO ] changed: [localhost -> localhost]
> > [ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Pause execution until
> > /tmp/ansible.8lu0hvqw_he_setup_lock is removed, delete it once ready to
> > proceed]
> >
> > Probably the appliance engine was too old,
>
> Correct.
>
> > so I fetched the up to date repos
> > to the engine and updated its rpms, then manually continued with the restore
> > from the engine, by copying the backup file to the engine and then running
> > "engine-backup --mode=restore --file=/root/nsednev_from_serval16_rhevm_4_4
> > --provision-all-databases --restore-permissions" as was suggested during the
> > initial restore.
>
> Good, in principle, but to which version? Probably latest 4.4.

ovirt-engine-setup-4.4.10.8-548.g6b5767a.2.el8ev.noarch

>
> > Once manual restore successfully finished, I ran serval15 ~]# rm -rf
> > /tmp/ansible.8lu0hvqw_he_setup_lock and deployment continued until I
> > received just the original error:
> >
> > [ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Activate storage domain]
> > [ ERROR ] ovirtsdk4.Error: Fault reason is "Operation Failed". Fault detail
> > is "[Cannot attach Storage. Storage Domain format V5 is illegal.]". HTTP
> > response code is 409.
>
> Assuming you used 4.4, this is simply a reproduction of the current bug.
> It's not new for 4.5. As comment 0 says, it affects also 4.3.

You asked to back up and restore on the same 4.4 in the alternative flow, so I followed that scenario; restoring on 4.5 was not part of this flow. I thought the fix had been merged into the latest ovirt-engine-setup-4.4.10.8-548.g6b5767a.2.el8ev.noarch.

>
> But: If you upgraded to 4.5, engine-backup --mode=restore would have failed.
> This is because we currently do not allow that. I think we might want to.
> If you want, create a bug for this (like bug 1812906), and make current bug
> depend on it. Or:
>
> 1. Deploy 4.4 HE
> 2. Create a new DC and cluster with compatibility version 4.2
> 3. Upgrade the engine to 4.5
> 4. Then take a backup and deploy/restore it to a new 4.5 HE

It's a totally new flow. Please fill in the "Fixed in version" field.
1. Deploy 4.4 HE
1.1 I deployed ovirt-engine-setup-4.4.10.8-548.g6b5767a.2.el8ev.noarch
2. Create a new DC and cluster with compatibility version 4.2
2.1 I created dc1 and cl1 with compatibility version 4.2
3. Upgrade the engine to 4.5
3.1 I upgraded the engine to Software Version:4.5.0.1-607.fad80f26da78.25.el8ev
4. Then take a backup and deploy/restore it to a new 4.5 HE
4.1 I tried to back up and restore on a third clean host using these components:
ovirt-hosted-engine-ha-2.5.0-1.el8ev.noarch
ovirt-hosted-engine-setup-2.6.3-1.el8ev.noarch
ansible-core-2.12.2-3.1.el8.x86_64
ovirt-ansible-collection-2.0.2-1.el8ev.noarch
Linux 4.18.0-372.6.1.el8.x86_64 #1 SMP Fri Apr 1 16:31:01 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux release 8.6 (Ootpa)

And it failed:

[ INFO ] engine-backup --mode=restore failed:
         FATAL: Backup was created by version '4.5.0.1' and can not be restored using the installed version 4.5.0
         You can now connect from this host to the bootstrap engine VM using ssh as root and the temporary IP address - 192.168.222.233 - and fix this issue. Please continue only after the backup is restored.
         To retry the command that failed, you can run, on the bootstrap engine VM:
         engine-backup --mode=restore --file=/root/engine_backup --provision-all-databases --restore-permissions
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : include_tasks]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Create temporary lock file]
[ INFO ] changed: [localhost -> localhost]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Pause execution until /tmp/ansible.qs21phs4_he_setup_lock is removed, delete it once ready to proceed]

This happened again because of the outdated appliance being taken from the repos I used. I tried to update the engine from inside the engine VM and finish the restore, but failed.

I decided to retry from scratch, by manually installing rhvm-appliance-4.5-20220407.0.el8ev.ova on the host, which contains ovirt-engine-setup-base-4.5.0.1-0.26.el8ev.noarch. This time I also chose to pause the deployment so that I could fetch all the newest components and repos to the latest appliance that I could reach (today is 7.04.22 and the appliance is also from today):

hosted-engine --deploy --4 --restore-from-file=/root/nsednev_from_serval16_rhevm_4_5 --ansible-extra-vars=he_pause_before_engine_setup=true

And the deployment paused exactly where I needed:

[ INFO ] You can now connect from this host to the bootstrap engine VM using ssh as root and the temporary IP address - 192.168.222.233
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : include_tasks]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Create temporary lock file]
[ INFO ] changed: [localhost -> localhost]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Pause execution until /tmp/ansible.k6bikipw_he_setup_lock is removed, delete it once ready to proceed]

Then I updated all components to the latest versions; engine-setup was updated to ovirt-engine-setup-4.5.0.2-608.40ddbf0c8eb3.3.el8ev.noarch. Then I released the lock to continue with the restore:

serval15 ~]# rm -rf /tmp/ansible.k6bikipw_he_setup_lock

[ INFO ] ok: [localhost -> localhost]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : include_tasks]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Copy the backup file to the engine VM for restore]
[ INFO ] changed: [localhost -> 192.168.222.233]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Run engine-backup]
.
.
.
[ INFO ] Hosted Engine successfully deployed
[ INFO ] Other hosted-engine hosts have to be reinstalled in order to update their storage configuration. From the engine, host by host, please set maintenance mode and then click on reinstall button ensuring you choose DEPLOY in hosted engine tab.
[ INFO ] Please note that the engine VM ssh keys have changed. Please remove the engine VM entry in ssh known_hosts on your clients.

This time the restore succeeded. HE has been restored to dc1/cl1.

"With a fixed version, deploy/restore will succeed, creating the new storage domain with format 'v4'."

I saw that the V4 format was used for hosted_storage on the restored Storage Domain; see the attached screenshot.

Moving to verified.
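Both restore failures in this thread came from a backup taken on a newer engine build than the one installed in the appliance ('4.4.10.8' vs 4.4.9.5, and '4.5.0.1' vs 4.5.0). The Python sketch below shows one way to compare such dotted versions numerically. It is an illustration only and not the check that engine-backup actually performs; parse_version and backup_is_newer are hypothetical names, and the sketch assumes purely numeric dotted versions.

```python
from itertools import zip_longest


def parse_version(version: str) -> tuple:
    """Split a dotted version such as "4.4.10.8" into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def backup_is_newer(backup_version: str, installed_version: str) -> bool:
    """True if the backup was created by a newer version than the installed one.

    Components are compared as numbers, so 4.4.10.8 is newer than 4.4.9.5
    even though "10" sorts before "9" as a string. Missing trailing
    components count as 0, so 4.5.0 and 4.5.0.0 compare as equal.
    """
    backup = parse_version(backup_version)
    installed = parse_version(installed_version)
    for b, i in zip_longest(backup, installed, fillvalue=0):
        if b != i:
            return b > i
    return False


if __name__ == "__main__":
    print(backup_is_newer("4.4.10.8", "4.4.9.5"))
    print(backup_is_newer("4.5.0.1", "4.5.0"))
```

Running a check like this against the appliance version before starting the restore would show up front whether the appliance needs updating, which is what the he_pause_before_engine_setup pause was used for above.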
Nikolai, any chance you can attach relevant logs from the verification? See also recent comments in bug 1932147. Thanks.
(In reply to Yedidyah Bar David from comment #22)
> Nikolai, any chance you can attach relevant logs from the verification? See
> also recent comments in bug 1932147. Thanks.

The environment is long gone; no logs are available for this issue.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: RHV Engine and Host Common Packages security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:4712