Bug 1877632 - VM stuck in Migrating status after migration completed due to incorrect status reported by VDSM after restart
Summary: VM stuck in Migrating status after migration completed due to incorrect status reported by VDSM after restart
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 4.4.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.4.3
Assignee: Milan Zamazal
QA Contact: Nikolai Sednev
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-09-10 03:25 UTC by Germano Veit Michel
Modified: 2023-12-15 19:15 UTC
CC List: 6 users

Fixed In Version: vdsm-4.40.33
Doc Type: Bug Fix
Doc Text:
Previously, when VDSM was restarted on the migration destination host while a virtual machine migration was in progress, the VM status was not identified correctly. In this release, VDSM identifies the migration destination status correctly.
Clone Of:
Environment:
Last Closed: 2020-11-24 13:10:05 UTC
oVirt Team: Virt
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 5386641 0 None None None 2020-09-10 03:58:54 UTC
Red Hat Product Errata RHSA-2020:5179 0 None None None 2020-11-24 13:10:26 UTC
oVirt gerrit 111553 0 master MERGED virt: Fix migration destination detection in recovery 2020-11-27 15:42:29 UTC

Description Germano Veit Michel 2020-09-10 03:25:57 UTC
Description of problem:

Virtual Machine is stuck in Migrating status in the engine but the migration already completed.

engine=# select vm_name,status,migrating_to_vds from vms;
 vm_name | status | migrating_to_vds 
---------+--------+------------------
 Ubuntu  |      5 | 

From engine side:

1. Migration Starts

2020-09-10 12:06:49,708+10 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-11) [8c0d1349-c17f-4d97-ad7c-ddbbf3f89643] EVENT_ID: VM_MIGRATION_START(62), Migration started (VM: Ubuntu, Source: devel.kvm, Destination: host1.kvm, User: admin@internal-authz).

2. Destination VDSM restarted

2020-09-10 12:07:11,987+10 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-1708) [] EVENT_ID: VDS_HOST_NOT_RESPONDING(9,027), Host host1.kvm is not responding. Host cannot be fenced automatically because power management for the host is disabled.

3. Migration finished, source reports the VM as Down:

2020-09-10 12:07:13,215+10 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-25) [25e0f57e] VM '45c63b1d-64e2-4e93-b84c-be6d343594d2' was reported as Down on VDS 'c306685d-2fd9-4bb1-b9d2-32c51d37b500'(devel.kvm)

4. VM handover

2020-09-10 12:07:13,532+10 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-25) [25e0f57e] Handing over VM '45c63b1d-64e2-4e93-b84c-be6d343594d2'(Ubuntu) to Host '4815fb17-70fa-4c29-82b6-c3c6c9043237'. Setting VM to status 'MigratingTo'

5. Destroy on Source

2020-09-10 12:07:13,215+10 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-25) [25e0f57e] START, DestroyVDSCommand(HostName = devel.kvm, DestroyVmVDSCommandParameters:{hostId='c306685d-2fd9-4bb1-b9d2-32c51d37b500', vmId='45c63b1d-64e2-4e93-b84c-be6d343594d2', secondsToWait='0', gracefully='false', reason='', ignoreNoVm='true'}), log id: 37cb9588

6. Destination VDSM restarted, host is now Up

2020-09-10 12:07:16,492+10 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-8) [16554103] EVENT_ID: VDS_DETECTED(13), Status of host host1.kvm was set to Up.

7. Migration finished

2020-09-10 12:07:26,999+10 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-98) [] START, MigrateStatusVDSCommand(HostName = host1.kvm, MigrateStatusVDSCommandParameters:{hostId='4815fb17-70fa-4c29-82b6-c3c6c9043237', vmId='45c63b1d-64e2-4e93-b84c-be6d343594d2'}), log id: 6b956136
2020-09-10 12:07:27,002+10 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-98) [] FINISH, MigrateStatusVDSCommand, return: , log id: 6b956136
2020-09-10 12:07:27,015+10 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-98) [] EVENT_ID: VM_MIGRATION_DONE(63), Migration completed (VM: Ubuntu, Source: devel.kvm, Destination: host1.kvm, Duration: 26 seconds, Total: 37 seconds, Actual downtime: (N/A))

8. So far so good. But... the destination host is now reporting the VM as "Migration Source" instead of "Migration Destination" or "Up".

2020-09-10 12:07:26,986+10 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-98) [] VM '45c63b1d-64e2-4e93-b84c-be6d343594d2'(Ubuntu) moved from 'MigratingTo' --> 'MigratingFrom'
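
For context, the engine derives its VM state from the status string VDSM returns in getAllVmStats, so a wrong string from the destination translates directly into a wrong engine state. Below is a minimal sketch of that relationship, in Python for illustration only (the engine itself is Java and the real logic lives in VmAnalyzer), covering just the statuses visible in these logs:

# Illustration only: how a VDSM-reported status string drives the engine-side
# VM state. The mapping below is limited to the transitions seen in the logs
# above; it is not the engine's actual code.
VDSM_TO_ENGINE_STATUS = {
    'Up': 'Up',
    'Down': 'Down',
    'Migration Source': 'MigratingFrom',
    'Migration Destination': 'MigratingTo',
}

def engine_status_for(vdsm_status):
    """Return the engine-side VM status implied by a VDSM status string."""
    return VDSM_TO_ENGINE_STATUS.get(vdsm_status, 'Unknown')

# After the migration finishes, the destination should report 'Up', but here
# it reports 'Migration Source', so the engine flips the VM back to
# 'MigratingFrom' and it never leaves the Migrating state.
print(engine_status_for('Migration Source'))  # -> MigratingFrom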

-------

Let's take a look at the VDSM side (destination):

1. Migration Create, status is "Migration Destination"

2020-09-10 12:06:48,966+1000 INFO  (jsonrpc/2) [api.virt] FINISH migrationCreate return={'status': {'code': 0, 'message': 'Done'}, 'migrationPort': 0, 'params': {'vmId': '45c63b1d-64e2-4e93-b84c-be6d343594d2', 'status': 'Migration Destination'

2. VDSM restarted

2020-09-10 12:07:06,683+1000 INFO  (MainThread) [vds] (PID: 5499) I am the actual vdsm 4.40.22.1 host1.kvm (4.18.0-193.13.2.el8_2.x86_64) (vdsmd:155)

3. Recovery of the VM

2020-09-10 12:07:07,043+1000 INFO  (vmrecovery) [vds] recovery [1:1/1]: recovered domain 45c63b1d-64e2-4e93-b84c-be6d343594d2 (recovery:123)

4. VM is running fine on the destination host, but VDSM is reporting "Migration Source" forever:

2020-09-10 12:19:43,042+1000 INFO  (jsonrpc/6) [throttled] Current getAllVmStats: {'45c63b1d-64e2-4e93-b84c-be6d343594d2': 'Migration Source'} (throttledlog:104)
2020-09-10 12:32:14,755+1000 INFO  (jsonrpc/4) [throttled] Current getAllVmStats: {'45c63b1d-64e2-4e93-b84c-be6d343594d2': 'Migration Source'} (throttledlog:104)
2020-09-10 12:44:46,547+1000 INFO  (jsonrpc/6) [throttled] Current getAllVmStats: {'45c63b1d-64e2-4e93-b84c-be6d343594d2': 'Migration Source'} (throttledlog:104)
2020-09-10 12:57:18,347+1000 INFO  (jsonrpc/4) [throttled] Current getAllVmStats: {'45c63b1d-64e2-4e93-b84c-be6d343594d2': 'Migration Source'} (throttledlog:104)
2020-09-10 13:09:50,085+1000 INFO  (jsonrpc/6) [throttled] Current getAllVmStats: {'45c63b1d-64e2-4e93-b84c-be6d343594d2': 'Migration Source'} (throttledlog:104)

So the VDSM restart made the VM status switch from "Migration Destination" to "Migration Source", and it never reports the VM as 'Up'. The VM is actually up and the underlying migration finished successfully.
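
This points at VDSM's VM recovery path: on startup, vdsmd reattaches to the libvirt domains it finds and has to decide whether each one is an ordinary running VM, an outgoing migration, or an incoming migration. A rough sketch of such a classification step follows; this is hypothetical code, not the actual vm.py logic, and the helper name and its inputs are invented for illustration:

# Hypothetical sketch of a recovery-time classification, NOT actual VDSM code.
# 'Up', 'Migration Source' and 'Migration Destination' are real VDSM status
# strings; classify_recovered_vm() and its inputs are invented for illustration.

def classify_recovered_vm(migration_in_flight, incoming):
    """Pick the status to report for a domain found during VDSM recovery.

    migration_in_flight: a migration involving this domain was still running
                         when vdsmd restarted
    incoming:            the migration is towards this host, i.e. this host is
                         the destination
    """
    if migration_in_flight and not incoming:
        return 'Migration Source'
    if migration_in_flight and incoming:
        return 'Migration Destination'
    return 'Up'

# The behaviour in this report looks as if 'incoming' is never detected during
# recovery, so the destination falls into the first branch and keeps reporting
# 'Migration Source' even after the migration has completed.

If that is what happens, restarting VDSM again after the migration has finished should make recovery see an ordinary running domain and report it as Up.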
 

Version-Release number of selected component (if applicable):
vdsm-4.40.22-1.el8ev.x86_64
ovirt-engine-4.4.1.10-0.1.el8ev.noarch

How reproducible:
Always

Steps to Reproduce:
1. Start Migration
2. Restart VDSM on the destination host

Actual results:
- VM stuck in migrating state on the engine
- Destination VDSM reporting wrong status

Expected results:
- VM Up status


Additional info:

See this sequence:
2020-09-10 12:07:07,042+1000 INFO  (vm/45c63b1d) [virt.vm] (vmId='45c63b1d-64e2-4e93-b84c-be6d343594d2') VM wrapper has started (vm:2497)
2020-09-10 12:07:07,043+1000 INFO  (vmrecovery) [vds] VM recovered: 45c63b1d-64e2-4e93-b84c-be6d343594d2 (recovery:97)
2020-09-10 12:07:07,043+1000 INFO  (vmrecovery) [vds] recovery [1:1/1]: recovered domain 45c63b1d-64e2-4e93-b84c-be6d343594d2 (recovery:123)
2020-09-10 12:07:07,276+1000 INFO  (vm/45c63b1d) [virt.vm] (vmId='45c63b1d-64e2-4e93-b84c-be6d343594d2') Starting connection (guestagent:246)
2020-09-10 12:07:07,278+1000 INFO  (vm/45c63b1d) [virt.vm] (vmId='45c63b1d-64e2-4e93-b84c-be6d343594d2') CPU stopped: domain initialization (vm:5551)
2020-09-10 12:07:07,289+1000 WARN  (vm/45c63b1d) [vds] Not ready yet, ignoring event '|virt|VM_status|45c63b1d-64e2-4e93-b84c-be6d343594d2' args={'45c63b1d-64e2-4e93-b84c-be6d343594d2': {'status': 'Migration Source', 'elapsedTime': '1026', 'monitorResponse': '0', 'clientIp': '', 'timeOffset': '0', 'pauseCode': 'NOERR', 'cpuUser': '0.00', 'cpuSys': '0.00', 'cpuUsage': '0.00', 'displayInfo': [{'type': 'spice', 'port': '5900', 'tlsPort': '5901', 'ipAddress': '192.168.100.1'}, {'type': 'vnc', 'port': '5902', 'tlsPort': '', 'ipAddress': '192.168.100.1'}], 'hash': '-6523322131148256054', 'vcpuQuota': '-1', 'vcpuPeriod': 100000}} (clientIF:215)
2020-09-10 12:07:10,593+1000 INFO  (libvirt/events) [virt.vm] (vmId='45c63b1d-64e2-4e93-b84c-be6d343594d2') CPU running: onResume (vm:5551)
2020-09-10 12:07:11,641+1000 INFO  (jsonrpc/1) [throttled] Current getAllVmStats: {'45c63b1d-64e2-4e93-b84c-be6d343594d2': 'Migration Source'} (throttledlog:104)
2020-09-10 12:07:18,058+1000 INFO  (vmrecovery) [vds] recovery [1/1]: preparing paths for domain 45c63b1d-64e2-4e93-b84c-be6d343594d2 (clientIF:738)
2020-09-10 12:07:18,088+1000 INFO  (vmrecovery) [virt.vm] (vmId='45c63b1d-64e2-4e93-b84c-be6d343594d2') Enabling drive monitoring (drivemonitor:56)
2020-09-10 12:07:26,079+1000 INFO  (jsonrpc/4) [api.virt] START getMigrationStatus() from=::ffff:192.168.100.253,48716, vmId=45c63b1d-64e2-4e93-b84c-be6d343594d2 (api:48)
2020-09-10 12:07:26,079+1000 INFO  (jsonrpc/4) [api.virt] FINISH getMigrationStatus return={'status': {'code': 0, 'message': 'Done'}, 'migrationStats': {'status': {'code': 0, 'message': 'Migration in progress'}, 'progress': 0}} from=::ffff:192.168.100.253,48716, vmId=45c63b1d-64e2-4e93-b84c-be6d343594d2 (api:54)

Comment 1 Germano Veit Michel 2020-09-10 03:28:19 UTC
(In reply to Germano Veit Michel from comment #0)
> Version-Release number of selected component (if applicable):
> vdsm-4.40.22-1.el8ev.x86_64
> ovirt-engine-4.4.1.10-0.1.el8ev.noarch

Above are the versions on which I reproduced the bug, but the customer hit it on:
ovirt-engine-4.3.9.4-11.el7.noarch
vdsm-4.30.44-1.el7ev.x86_64

Comment 2 Germano Veit Michel 2020-09-10 03:36:04 UTC
What is the logic in setting the VM as Migration Source here if it's not post-copy?

https://github.com/oVirt/vdsm/blob/master/lib/vdsm/virt/vm.py#L915

Comment 4 Germano Veit Michel 2020-09-10 04:01:49 UTC
A second VDSM restart fixes the status of the VM to Up.

Comment 6 Nikolai Sednev 2020-10-21 13:14:12 UTC
Works fine on vdsm-4.40.33-1.el8ev.x86_64.

Comment 12 errata-xmlrpc 2020-11-24 13:10:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Low: Red Hat Virtualization security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5179

