Bug 1883446 - [Upgrade] migration of VMs from a 4.3 host with rhel-7.9 to v4.4.3 host with rhel-8.3 fails
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: Core
Version: 4.40.31
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.4.3
Target Release: ---
Assignee: Milan Zamazal
QA Contact: Petr Matyáš
URL:
Whiteboard:
Depends On:
Blocks: 1883817
Reported: 2020-09-29 08:49 UTC by Roni
Modified: 2020-11-11 06:42 UTC (History)
CC List: 14 users

Fixed In Version: vdsm-4.40.33
Doc Type: Bug Fix
Doc Text:
Migrations of VMs started on hosts < 4.4 with a payload stored on a floppy failed when migrating to 4.4 hosts. This has been fixed and migrations work when migrating to up-to-date 4.4 hosts.
Clone Of:
Environment:
Last Closed: 2020-11-11 06:42:44 UTC
oVirt Team: Virt
Embargoed:
pm-rhel: ovirt-4.4+
aoconnor: blocker+



Links
System | ID | Private | Priority | Status | Summary | Last Updated
oVirt gerrit | 111537 | 0 | master | MERGED | mkimage: Don't fail on old floppy payload paths | 2020-12-07 08:04:31 UTC

Description Roni 2020-09-29 08:49:51 UTC
Created attachment 1717471 [details]
logs

Description of problem:
VM migration fails when upgrading from RHV 4.3 to 4.4


Version-Release number of selected component (if applicable):
v4.4: 4.4.3.3-0.19.el8ev  -> host with rhel-8.3
v4.3: 4.3.11              -> host with rhel-7.9

How reproducible:
100%

Steps to Reproduce:
1. Start with a v4.3 setup that has VMs running on its hosts
2. Back up Engine 4.3
3. Provision a new Engine with OS 'rhel-8.3'
4. Restore Engine from the backup
5. Put the first rhel-7.9 host into maintenance,
   reprovision it with 'rhel-8.3' and add it to Engine
6. Put the second rhel-7.9 host, which has a running VM, into maintenance

Actual results:
Host is stuck in "Preparing for maintenance"

Expected results:
The host should enter maintenance after the VM is migrated to the new rhel-8.3 host

Additional info:
See attached logs...
The VM migration appears to fail with the error below.
The source host is still holding the VM, which is why it is stuck in the
"Preparing for maintenance" state.

ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-19) [] Migration of VM 'Win2016_Cvm_64b' to host 'host_mixed_1' failed: Image /var/run/vdsm/payload/4809e0c7-f4fd-4ea9-8d86-d452b942929c.5560c76d66146eb12ef8b4165f475c5c.img is not inside /run/vdsm/payload directory.

Comment 2 Arik 2020-09-29 21:30:49 UTC
I see that the payload was saved into a file named <vm_id>.<hash>.img:
/var/run/vdsm/payload/4809e0c7-f4fd-4ea9-8d86-d452b942929c.5560c76d66146eb12ef8b4165f475c5c.img

But then the VM is migrated from a 4.3.11-8 host to a 4.4.3-6 host.
In 4.4 the <hash> part was removed from the payload filename (https://gerrit.ovirt.org/#/c/102698/) and so I suspect that on the target host the payload is saved to
/var/run/vdsm/payload/4809e0c7-f4fd-4ea9-8d86-d452b942929c.img
And therefore the domain cannot access the payload source that is specified in the cd-rom device and the migration fails.
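
To make the suspected mismatch concrete, here is a minimal sketch of the two naming schemes described above. The function names are hypothetical and this is not Vdsm code; only the `<vm_id>.<hash>.img` and `<vm_id>.img` patterns and the sample values are taken from this report.

```python
# Illustrative only: hypothetical helpers, not functions from Vdsm.

def payload_filename_43(vm_id, content_hash):
    """Old scheme seen on the 4.3 host: <vm_id>.<hash>.img"""
    return "%s.%s.img" % (vm_id, content_hash)


def payload_filename_44(vm_id):
    """Scheme after the <hash> part was removed: <vm_id>.img"""
    return "%s.img" % vm_id


VM_ID = "4809e0c7-f4fd-4ea9-8d86-d452b942929c"
HASH = "5560c76d66146eb12ef8b4165f475c5c"

old_name = payload_filename_43(VM_ID, HASH)
new_name = payload_filename_44(VM_ID)

# The two hosts would disagree on the file name for the same VM.
names_differ = old_name != new_name
```

If the domain XML sent from the source still refers to the old name, a destination that only creates the new name would not find the file under this hypothesis.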

This combination of migrating VMs with payload from 4.3 to 4.4 is not that common and this shouldn't block automation.
That said, we should preserve backward compatibility for migrating VMs - Milan, what do you think?

Comment 3 Arik 2020-09-30 07:48:03 UTC
(In reply to Arik from comment #2)
> This combination of migrating VMs with payload from 4.3 to 4.4 is not that
> common and this shouldn't block automation.

Roni, can you please repeat the tests on a homogeneous environment?

Comment 4 Milan Zamazal 2020-09-30 18:04:17 UTC
I suspect the problem is caused by the fact that we switched from /var/run to /run recently in Vdsm and the error message apparently comes from injectFilesToFs function that would fail exactly on that. We must find a way how to handle migrations with payloads from older Vdsm versions.
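
A minimal sketch of the kind of check that would produce this error, and of one way to tolerate paths coming from older hosts. It assumes the check is a plain prefix comparison against /run/vdsm/payload; the helper names and the `tolerate_legacy` flag are hypothetical and this is not the actual Vdsm code or the merged fix.

```python
import posixpath

# Directory named in the error message, and the prefix seen in the 4.3 path.
PAYLOAD_DIR = "/run/vdsm/payload"
LEGACY_PAYLOAD_DIR = "/var/run/vdsm/payload"


def normalize_payload_path(path):
    """Map a legacy /var/run payload path to its /run equivalent.

    Hypothetical helper: paths outside the legacy directory are only
    normalized (".." and duplicate slashes collapsed), not rewritten.
    """
    path = posixpath.normpath(path)
    if path.startswith(LEGACY_PAYLOAD_DIR + "/"):
        return PAYLOAD_DIR + path[len(LEGACY_PAYLOAD_DIR):]
    return path


def is_inside_payload_dir(path, tolerate_legacy=True):
    """Return True if path names a file below PAYLOAD_DIR."""
    if tolerate_legacy:
        path = normalize_payload_path(path)
    else:
        path = posixpath.normpath(path)
    return path.startswith(PAYLOAD_DIR + "/")


old_path = ("/var/run/vdsm/payload/"
            "4809e0c7-f4fd-4ea9-8d86-d452b942929c"
            ".5560c76d66146eb12ef8b4165f475c5c.img")

strict_result = is_inside_payload_dir(old_path, tolerate_legacy=False)
tolerant_result = is_inside_payload_dir(old_path)
```

With the strict comparison the path from the 4.3 host is rejected even though it points into the payload directory under its old prefix; rewriting the prefix first lets the same check pass.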

Comment 5 Roni 2020-10-01 10:48:04 UTC
(In reply to Arik from comment #3)
> (In reply to Arik from comment #2)
> > This combination of migrating VMs with payload from 4.3 to 4.4 is not that
> > common and this shouldn't block automation.
> 
> Roni, can you please repeat the tests on a homogeneous environment?

When I shut down the VM that was running on the v4.3 host and then start it on the v4.4 host 
then I can successfully migrate it to v4.4 host and to v4.3 host and vice versa.
It is still an upgrade issue if we want to keep the VMs running during the upgrade

Comment 6 Arik 2020-10-01 10:54:10 UTC
(In reply to Roni from comment #5)
> When I shut down the VM that was running on the v4.3 host and then start it
> on the v4.4 host 
> then I can successfully migrate it to v4.4 host and to v4.3 host and vice
> versa.
> It is still an upgrade issue if we want to keep the VMs running during the
> upgrade

Yes, it seems the VM was initially started with run-once + payload.
After the VM is shut down and started again, it runs without the payload, so migration from a 4.3 host to a 4.4 host works.

Comment 7 Arik 2020-10-04 07:40:24 UTC
(In reply to Milan Zamazal from comment #4)
> I suspect the problem is caused by the fact that we switched from /var/run
> to /run recently in Vdsm and the error message apparently comes from
> injectFilesToFs function that would fail exactly on that. We must find a way
> how to handle migrations with payloads from older Vdsm versions.

Good, that explains why this issue was reported as a recently introduced regression (the change I mentioned in comment 2 was merged a long time ago).

Comment 8 Arik 2020-10-26 21:17:16 UTC
Petr, why does it depend on bz 1883817?

Comment 9 Petr Matyáš 2020-10-27 08:36:32 UTC
Because that bug blocks our upgrade flow where this bug happened.

Comment 10 Arik 2020-10-27 09:00:20 UTC
(In reply to Petr Matyáš from comment #9)
> Because that bug blocks our upgrade flow where this bug happened.

So wouldn't it make more sense to set it the other way around, so that this bug blocks bz 1883817, and to verify this one outside the context of the upgrade?

Comment 11 Petr Matyáš 2020-10-30 08:18:40 UTC
Verified on vdsm-4.40.35-1.el8ev

Comment 12 Sandro Bonazzola 2020-11-11 06:42:44 UTC
This bugzilla is included in oVirt 4.4.3 release, published on November 10th 2020.

Since the problem described in this bug report should be resolved in oVirt 4.4.3 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

