Bug 1822535 - Hosted-engine restore from file fails when there are VMs having snapshots with old compatibility levels.
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: rhvm-appliance
Version: 4.3.9
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ovirt-4.4.1
Assignee: Yedidyah Bar David
QA Contact: Petr Matyáš
Reported: 2020-04-09 09:45 UTC by Siddhant Rao
Modified: 2023-09-07 22:45 UTC (History)

Fixed In Version: ovirt-engine-appliance-4.4.1_rc5
Doc Type: No Doc Update
Last Closed: 2020-08-04 16:21:42 UTC
oVirt Team: Integration
lsvaty: testing_plan_complete-


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | oVirt ovirt-ansible-engine-setup pull 85 | 0 | None | closed | Add OVESETUP_IGNORE_SNAPSHOTS_WITH_OLD_COMPAT_LEVEL to 4.3 | 2020-11-18 23:58:52 UTC
Red Hat Product Errata | RHEA-2020:3315 | 0 | None | None | None | 2020-08-04 16:21:51 UTC
oVirt gerrit | 109605 | 0 | master | MERGED | answer file: Ignore incompatible snapshots | 2020-11-18 23:58:52 UTC

Comment 5 Michal Skrivanek 2020-04-10 04:26:05 UTC
Easiest would be to flip the default answer to that question

Comment 6 Juan Orti 2020-04-15 06:52:25 UTC
With this hook we could continue and HE was deployed successfully:

/usr/share/ansible/roles/ovirt.hosted_engine_setup/hooks/enginevm_before_engine_setup/fixOldSnapshot.yml

~~~
- name: Adding env variable to accept old snapshots
  lineinfile:
    path: /root/ovirt-engine-answers
    line: "OVESETUP_IGNORE_SNAPSHOTS_WITH_OLD_COMPAT_LEVEL=str:yes"
    state: present

- name: Change default answer to accept old snapshots
  lineinfile:
    path: '/usr/share/ovirt-engine/setup/plugins/ovirt-engine-setup/ovirt-engine/db/schema.py'
    backrefs: yes
    regexp: '(\s+)default=False(.*)'
    line: '\1default=True\2'
    state: present
    backup: yes
    owner: root
    group: root
    mode: 0644
~~~
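
For readers unfamiliar with `backrefs`, the second task's pattern can be tried out in isolation. The following is a minimal Python sketch, assuming the line being patched looks like the made-up `sample` below (the real content of schema.py is not shown in this bug). `flip_default` is a hypothetical helper name, and it only mimics the substitution, not everything lineinfile does.

```python
import re

# Same pattern and replacement as in the lineinfile task above.
PATTERN = re.compile(r'(\s+)default=False(.*)')
REPLACEMENT = r'\1default=True\2'


def flip_default(line: str) -> str:
    """Return the line with 'default=False' flipped to 'default=True'.

    Lines that do not match the pattern are returned unchanged.
    """
    # count=1: rewrite only the first match within this one line.
    return PATTERN.sub(REPLACEMENT, line, count=1)


# Hypothetical sample line; an assumption, not copied from schema.py.
sample = "            default=False,"
print(repr(flip_default(sample)))
```

Note that the pattern is not tied to a particular option: any indented `default=False` line matches, and lineinfile, as far as I know, rewrites only the last matching line in the file. That makes the workaround sensitive to changes in schema.py.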

Comment 8 Yedidyah Bar David 2020-04-19 06:12:17 UTC
Some notes:

1. One might claim the bug is not in hosted-engine restore, but in the fact that we repeatedly (I think; I didn't try) ask about the same snapshots on each upgrade. We should mark the ones we already prompted about as "confirmed" and not ask about them again.

2. Another way to think about this is "Please remove these snapshots as soon as possible". Do we allow that? Not sure. If so, we can update the prompt like this.

3. We can also (unrelated) allow passing a custom answer file, to ease the workaround.

4. If we do not do any of the above, I'd personally still prefer CLOSE WONTFIX, because I think that changing the default to Yes is risky for users who do an actual upgrade, do not notice it, and then break their snapshots.

Comment 9 Siddhant Rao 2020-04-21 15:23:53 UTC
(In reply to Yedidyah Bar David from comment #8)
> Some notes:
> 
> 1. One might claim it's not a bug in hosted-engine restore, but in the fact
> that we repeatedly (I think, didn't try) ask about the same snapshots on
> each upgrade - that we should mark the ones we prompted about as
> "confirmed", and do not ask again about them.
> 
> 2. Another way to think about this is "Please remove these snapshots as soon
> as possible". Do we allow that? Not sure. If so, we can update the prompt
> like this.

With respect to both of the above, are you referring to the prompt we give in engine-setup when we go to the next version?


> 3. We can also (unrelated) allow passing a custom answer file, to ease the
> workaround.

Right, I tried inserting the answer to this option as OVESETUP_IGNORE_SNAPSHOTS_WITH_OLD_COMPAT_LEVEL via the hook mentioned in comment #3,
but for some reason it did not work. Could you confirm whether OVESETUP_IGNORE_SNAPSHOTS_WITH_OLD_COMPAT_LEVEL is the correct parameter to pass
in the answers file to resolve this?


> 4. If we do not do any of above, I'd personally still prefer CLOSE WONTFIX,
> because I think that changing the default to Yes is risky for users doing an
> actual upgrade and not noticing it, then breaking their snapshots.

Understood and agreed.
However, I was suggesting that we prompt the user with this question when old snapshots are detected, _only_ when we run hosted-engine deploy with "--restore-from-file".
IMO, we should at least give the user a choice here,
because many times, when they restore, users don't have the old manager server to go back to and delete the snapshots.
During restore in such situations, we fail because of the default answer.


Let me know your views.

Comment 10 Yedidyah Bar David 2020-04-22 07:44:31 UTC
(In reply to Siddhant Rao from comment #9)
> (In reply to Yedidyah Bar David from comment #8)
> > Some notes:
> > 
> > 1. One might claim it's not a bug in hosted-engine restore, but in the fact
> > that we repeatedly (I think, didn't try) ask about the same snapshots on
> > each upgrade - that we should mark the ones we prompted about as
> > "confirmed", and do not ask again about them.
> > 
> > 2. Another way to think about this is "Please remove these snapshots as soon
> > as possible". Do we allow that? Not sure. If so, we can update the prompt
> > like this.
> 
> With respect to both of the above, Are you referring to the prompt we give
> in engine-setup when we go to the next version?.

Yes. AFAIU this is the prompt that is breaking the restore, no?
(Also partially replying to your point below:)
I assume that normally, people do not have to restore to a different version
than they backed up. So the flow is:

1. Install and setup an old engine, create VMs and snapshots
2. Upgrade to a newer one, be asked about the snapshots and confirm upgrade
3. Take a backup
4. Try to restore it to same version

If so, and if we make step 2 record somewhere that the user already confirmed
the upgrade with these snapshots, there is no reason to ask about them again at step 4.

> 
> 
> > 3. We can also (unrelated) allow passing a custom answer file, to ease the
> > workaround.
> 
> right, I tried inserting the answer to this option as
> OVESETUP_IGNORE_SNAPSHOTS_WITH_OLD_COMPAT_LEVEL via the hook mentioned in
> comment #3
> But for some reason it did not work, could you confirm if
> OVESETUP_IGNORE_SNAPSHOTS_WITH_OLD_COMPAT_LEVEL is the correct parameter to
> pass
> in the answers file for resolving this?.

Sorry, no.

I do not have an engine right now to test this on. Please try this manually
and check the generated answer file. It should be something like:

QUESTION/1/OVESETUP_IGNORE_SNAPSHOTS_WITH_OLD_COMPAT_LEVEL=str:yes

I am sorry I didn't notice this when reading previously and wasted your time :-(.
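
The corrected workaround can be sketched outside Ansible as well. This is a minimal Python sketch, assuming the `QUESTION/1/...` key above is indeed what engine-setup expects (this comment only says it "should be something like" that, and nobody in the thread confirms it). `ensure_line` is a hypothetical helper that mirrors what `lineinfile` with `state: present` does for a plain line.

```python
from pathlib import Path

# Key format suggested in this comment; not confirmed to work in this thread.
ANSWER_LINE = "QUESTION/1/OVESETUP_IGNORE_SNAPSHOTS_WITH_OLD_COMPAT_LEVEL=str:yes"


def ensure_line(path: str, line: str = ANSWER_LINE) -> bool:
    """Append `line` to the file at `path` unless an identical line exists.

    Returns True if the file was changed, False otherwise.
    """
    answer_file = Path(path)
    existing = answer_file.read_text().splitlines() if answer_file.exists() else []
    if line in existing:
        return False
    # Rebuild the content so it ends with a newline before the new line.
    text = "\n".join(existing)
    if text:
        text += "\n"
    answer_file.write_text(text + line + "\n")
    return True
```

In the hook from comment #6, the equivalent change would be replacing the `line:` value of the first task with the `QUESTION/1/...` form. Calling `ensure_line("/root/ovirt-engine-answers")` on the engine VM would be the manual counterpart, with the path taken from that hook.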

> 
> 
> > 4. If we do not do any of above, I'd personally still prefer CLOSE WONTFIX,
> > because I think that changing the default to Yes is risky for users doing an
> > actual upgrade and not noticing it, then breaking their snapshots.
> 
> Understood and agreed
> However, I was suggesting if we could prompt the user this question if old
> snapshots are detected, _only_ when we run hosted-engine deploy with
> "--restore-from-file"
> IMO, we should at least give the user a choice here,
> because many times, when they restore, users don't have
> the old manager server to go back to and delete the snapshots.

See the start of my comment. If we do this well (mark these old snapshots
as ACKed also for future upgrades), we should be ok.

For now, I do not object to making restore-from-file add this option to
the answer file. I see why it makes sense.

> During restore in such situations, we fail because of the default answer.
> 
> 
> Let me know your views.

Also:

1. In the past we did have specific interaction in deploy to affect engine-setup, see e.g. bug 1686445. So in principle we can do this again, although in practice it adds lots of duplication (in hosted-engine deploy and engine-setup), while engine-setup was really simply not designed to be used like that.

2. You can also try to reply 'Yes' to 'Pause the execution after adding this host to the engine?', see bug 1712667 comment 13.

Looking at the code, I think it would not have helped, because it makes deploy wait after trying to add the host, while in your case it failed before that. Perhaps we should add another such pause after engine-setup, if the user replied Yes and it failed. Then the user can log in to the engine machine, run engine-setup interactively, and continue (by removing the lock file).

Comment 11 Yedidyah Bar David 2020-04-27 07:53:52 UTC
Did you try updating your workaround with the correct line?

Did it work?

If so, is that ok for you?

I'd also like someone from the storage team to comment. Nir - other than the warning/prompt telling people their old snapshots will not work, can/should we do anything else? Can/Should we mark each such snapshot as "confirmed"? I do not think this can be a single value for all snapshots, because future versions/upgrades might introduce new incompatibilities. E.g. if a future 4.5 version supports only snapshots created by 4.4 and later (just an example), we'll want to prompt again then. Currently we always prompt if we find such snapshots. Can a user remove them safely? If not, can they remove them safely before the upgrade? If so, I guess we should simply do nothing, on the assumption that users who want to remove these snapshots must do that before the upgrade. Then we need to decide what to do about hosted-engine restore/upgrade.
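
The per-snapshot idea can be illustrated with a small sketch. This is hypothetical Python, not engine code: the `Snapshot` class, its `confirmed` flag, and `needs_prompt` are all invented for illustration, and versions are modelled as simple (major, minor) tuples.

```python
from dataclasses import dataclass
from typing import Tuple

Version = Tuple[int, int]  # (major, minor), e.g. (4, 3)


@dataclass
class Snapshot:
    name: str
    compat_level: Version
    # Hypothetical flag: the user already accepted that this snapshot
    # cannot be used after the upgrade.
    confirmed: bool = False


def needs_prompt(snapshot: Snapshot, min_supported: Version) -> bool:
    """Prompt only for incompatible snapshots not yet confirmed."""
    if snapshot.compat_level >= min_supported:
        return False  # still supported, nothing to ask
    return not snapshot.confirmed


old = Snapshot("old", (4, 1), confirmed=True)  # confirmed at an earlier upgrade
newer = Snapshot("newer", (4, 3))              # never prompted about so far

for min_supported in [(4, 2), (4, 4)]:
    to_ask = [s.name for s in (old, newer) if needs_prompt(s, min_supported)]
    print(min_supported, to_ask)
```

With a flag per snapshot, a restore to the same version asks nothing about `old`, while a later version that raises the minimum level still prompts about `newer`. A single global "confirmed" value could not distinguish the two cases.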

Comment 12 Michal Skrivanek 2020-04-27 11:42:22 UTC
(In reply to Yedidyah Bar David from comment #11)
> Can/Should we mark each such snapshot as "confirmed"? I
> do not think this can be a single value for all snapshots, because future
> versions/upgrades might introduce new incompatibilities. E.g. if a future
> 4.5 version supports only snapshots created by 4.4 and later (just an
> example), we'll want to prompt again then. Currently we always prompt, if we
> find such snapshots. Can a user remove them, safely? If not, can they remove
> them safely before upgrade? If so, I guess we should simply do nothing, with
> the assumption that users that want to remove these snapshots must do that
> before upgrade. Then we need to decide what to do about hosted-engine
> restore/upgrade.

It doesn't have to be complicated. You can always ask, just default to ignoring them. You can remove them later on too; the only thing the check is telling you is that you won't be able to restore to them.

Comment 13 Michal Skrivanek 2020-04-27 12:29:58 UTC
I'd suggest adding QUESTION/1/OVESETUP_IGNORE_SNAPSHOTS_WITH_OLD_COMPAT_LEVEL=str:yes to either
https://github.com/oVirt/ovirt-ansible-engine-setup/blob/master/templates/basic_answerfile.txt.j2 or the upgrade one. Or both, as it makes sense to skip this check in every non-interactive engine-setup execution.

Comment 14 Siddhant Rao 2020-04-29 09:02:33 UTC
(In reply to Yedidyah Bar David from comment #11)
> Did you try updating your workaround with the correct line?
> 
> Did it work?
> 

Comment #6 did resolve the issue, but there we actually changed the default from False to True in ovirt-engine/db/schema.py, which amounts to changing the source code.
Not sure that would be feasible every time.


(In reply to Michal Skrivanek from comment #12)
> (In reply to Yedidyah Bar David from comment #11)
> > Can/Should we mark each such snapshot as "confirmed"? I
> > do not think this can be a single value for all snapshots, because future
> > versions/upgrades might introduce new incompatibilities. E.g. if a future
> > 4.5 version supports only snapshots created by 4.4 and later (just an
> > example), we'll want to prompt again then. Currently we always prompt, if we
> > find such snapshots. Can a user remove them, safely? If not, can they remove
> > them safely before upgrade? If so, I guess we should simply do nothing, with
> > the assumption that users that want to remove these snapshots must do that
> > before upgrade. Then we need to decide what to do about hosted-engine
> > restore/upgrade.
> 
> It doesn't have to be complicated. You can always ask, just default to
> ignoring them. You can remove them later on too; the only thing the check is
> telling you is that you won't be able to restore to them

Agreed.


(In reply to Michal Skrivanek from comment #13)
> I'd suggest to add that
> QUESTION/1/OVESETUP_IGNORE_SNAPSHOTS_WITH_OLD_COMPAT_LEVEL=str:yes to either 
> https://github.com/oVirt/ovirt-ansible-engine-setup/blob/master/templates/
> basic_answerfile.txt.j2 or the upgrade one. Or both, as it makes sense to
> skip in every non-interactive engine-setup executions

Again, agreed

Comment 15 Martin Necas 2020-05-11 14:07:22 UTC
Will do the build of ovirt-ansible-engine-setup as soon as possible.

Comment 16 Nir Soffer 2020-05-20 23:59:09 UTC
(In reply to Yedidyah Bar David from comment #11)
> I'd like also someone from storage team to comment. Nir - other than the
> warning/prompt telling people their old snapshots will not work, can/should
> we do anything else?

I don't know what are these snapshots. Benny, can you help with this?

Comment 17 Martin Necas 2020-06-05 08:32:22 UTC
The fix was released on 12th May (https://github.com/oVirt/ovirt-ansible-engine-setup/releases/tag/1.2.4)

Comment 18 Petr Matyáš 2020-06-10 12:12:41 UTC
This still fails HE setup. Are there any specific arguments that need to be added when running the deploy from backup?
I think ignoring the old snapshots should be made the default.
Of course I'm running through the regular cmd line tool, as far as I know that is still the default way to install HE.

Using ovirt-hosted-engine-setup-2.4.4-1.el8ev.noarch

Comment 19 Martin Necas 2020-06-10 12:24:39 UTC
So the fix in ovirt-ansible-engine-setup did not help; maybe the issue is somewhere else.

Comment 20 Yedidyah Bar David 2020-06-10 12:51:05 UTC
(In reply to Petr Matyáš from comment #18)
> This still fails HE setup, are there any specific arguments that need to be
> added when running the deploy from backup?
> I think it should be made default to ignore the old snapshots.
> Of course I'm running through the regular cmd line tool, as far as I know
> that is still the default way to install HE.
> 
> Using ovirt-hosted-engine-setup-2.4.4-1.el8ev.noarch

Please note that the fix was on ovirt-ansible-engine-setup, not ovirt-hosted-engine-setup. Did you use the correct version? If so, please upload relevant logs (probably a sosreport is enough).

Comment 21 Petr Matyáš 2020-06-10 13:00:05 UTC
I'm using ovirt-ansible-engine-setup-1.2.4-1.el8ev.noarch

I deployed HE, created a couple of VMs and added two snapshots to each, then changed the compat level in the DB for each of those snapshots and verified by running engine-setup that it does find snapshots with an old compat level.
After reinstalling the HE host I installed the HE packages, ran 'hosted-engine --deploy --restore-from-file=file.backup' and provided all necessary information.

As the bug was reported on hosted-engine-setup, I guess verifying the regular flow is in order.

Comment 25 Yedidyah Bar David 2020-06-10 13:52:40 UTC
OK, sorry for bothering Martin; we actually do not use the answer file(s) contained in ovirt-ansible-engine-setup, but rely on the default one provided in the appliance. Moving the bug there.

(In reply to Petr Matyáš from comment #21)
> I'm using ovirt-ansible-engine-setup-1.2.4-1.el8ev.noarch
> 
> I deployed HE, created couple VMs and added two snapshots to each, then
> changed compat level in DB for each of those snapshots and verified by
> running engine-setup that it does find snapshots with old compat level.
> After reinstalling the HE host I installed HE packages and ran
> 'hosted-engine --deploy --restore-from-file=file.backup' and provided all
> necessary information.
> 
> As the bug was reported on hosted-engine-setup I guess verifying regular
> flow is in place.

I agree, for flows using this ansible role.

Comment 33 Petr Matyáš 2020-07-13 19:29:18 UTC
Verified on rhvm-appliance-2:4.4-20200707.0.el8ev.x86_64

Comment 38 errata-xmlrpc 2020-08-04 16:21:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (RHV Appliance (rhvm-appliance) 4.4), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:3315

