Bug 2010478 - After a storage error, HA VMs failed to auto-resume.
Summary: After a storage error, HA VMs failed to auto-resume.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 4.4.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.5.0
Target Release: 4.5.0
Assignee: Milan Zamazal
QA Contact: Polina
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-10-04 18:23 UTC by Frank DeLorey
Modified: 2022-08-21 11:17 UTC
CC List: 7 users

Fixed In Version: vdsm-4.50.0.5
Doc Type: Bug Fix
Doc Text:
Previously, if storage problems occurred and disappeared during a VM migration attempt, it sometimes led to the VM being paused and not resuming even if the VM had an auto-resume policy set. In this release, the VM is handled according to its resume behavior policy when the storage state changes during a VM migration attempt.
Clone Of:
Environment:
Last Closed: 2022-05-26 17:22:44 UTC
oVirt Team: Virt
Target Upstream Version:
Embargoed:




Links
Red Hat Issue Tracker RHV-43758 (last updated 2021-10-04 18:29:35 UTC)
Red Hat Product Errata RHSA-2022:4764 (last updated 2022-05-26 17:23:06 UTC)
oVirt gerrit 117958 (master, MERGED): virt: Prevent migrations of VMs paused due to an I/O error (last updated 2022-01-07 21:25:07 UTC)
oVirt gerrit 117960 (master, MERGED): virt: Postpone VM resume if blocked due to the VM status (last updated 2022-01-07 21:25:09 UTC)
oVirt gerrit 117985 (master, MERGED): virt: Permit resuming migrating VMs stopped due to I/O errors (last updated 2022-01-07 21:27:15 UTC)
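Purely as an illustration of the behavior described in the doc text and in the three merged patches above, here is a minimal Python sketch of the intended logic. This is not vdsm's actual code; all names and classes here are hypothetical, and the mapping to the individual patches is only approximate: a VM paused on an I/O error is not migrated, a resume that arrives while a migration attempt is still in flight is postponed rather than dropped, and once the migration attempt ends the VM is handled according to its resume behavior policy.

from enum import Enum, auto


class ResumeBehavior(Enum):
    AUTO_RESUME = auto()
    LEAVE_PAUSED = auto()
    KILL = auto()


class Vm:
    """Hypothetical stand-in for a VM object; not vdsm's real class."""
    def __init__(self, resume_behavior):
        self.resume_behavior = resume_behavior
        self.paused_on_io_error = False
        self.migrating = False
        self.resume_postponed = False
        self.running = True


def may_migrate(vm):
    # Roughly corresponds to gerrit 117958: do not start a migration
    # for a VM that is paused because of an I/O error.
    return not vm.paused_on_io_error


def on_storage_recovered(vm):
    """Called when the I/O error condition clears."""
    if not vm.paused_on_io_error:
        return  # only VMs paused with an I/O error are auto-resumed
    if vm.migrating:
        # Roughly corresponds to gerrit 117960: a migration attempt is
        # still in flight, so remember the resume instead of dropping it.
        vm.resume_postponed = True
        return
    _apply_resume_behavior(vm)


def on_migration_finished(vm):
    # Roughly corresponds to gerrit 117985: once the migration attempt
    # is over, a postponed resume is applied according to the policy.
    vm.migrating = False
    if vm.resume_postponed:
        vm.resume_postponed = False
        _apply_resume_behavior(vm)


def _apply_resume_behavior(vm):
    if vm.resume_behavior is ResumeBehavior.AUTO_RESUME:
        vm.paused_on_io_error = False   # resume the VM
    elif vm.resume_behavior is ResumeBehavior.KILL:
        vm.running = False              # HA logic will restart the VM
    # LEAVE_PAUSED: do nothing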

Comment 7 Polina 2022-04-24 08:41:32 UTC
Verified on ovirt-engine-tools-4.5.0.2-0.7.el8ev.noarch.

The following scenarios were tested:

1. Run a highly available VM with a lease. Block the connection to storage from the host running the VM. The VM is paused with an I/O error storage error. No migration attempts appear in engine.log. When storage is back, the VM recovers to Up (see the verification sketch below).
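As a hedged sketch only (not part of the verification run), scenario 1 could be checked with the oVirt Python SDK (ovirtsdk4) by polling the VM status and confirming it returns to Up once storage is reachable again. The engine URL, credentials, and timeout here are placeholders; the VM name is the one used in this test.

import time

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

# Placeholder connection details -- adjust for the environment under test.
connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='password',
    insecure=True,
)

vms_service = connection.system_service().vms_service()
vm = vms_service.list(search='name=golden_env_mixed_virtio_2')[0]
vm_service = vms_service.vm_service(vm.id)

# Poll until the HA VM auto-resumes (Paused -> Up) after storage recovery.
deadline = time.time() + 600
while time.time() < deadline:
    status = vm_service.get().status
    print('VM status:', status)
    if status == types.VmStatus.UP:
        print('VM recovered from pause back to Up')
        break
    time.sleep(10)

connection.close()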

2. Run a highly available VM configured with 16384 MB of memory, 4 CPUs, and the post-copy migration policy. On the destination host, create CPU load so that the migration is possible but slower because of this configuration. Start the migration; while it is in progress, block the connection to storage from the source host running the VM. The VM is paused:

2022-04-20 17:49:09,251+03 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-9) [2f7d31b1] Migration of VM 'golden_env_mixed_virtio_2' to host 'host_mixed_2' failed: VM destroyed during the startup.
2022-04-20 17:49:09,289+03 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-27) [2f7d31b1] VM '92e17fe6-fdea-4a85-8685-4abadea95dd9'(golden_env_mixed_virtio_2) moved from 'MigratingFrom' --> 'Paused'
2022-04-20 17:49:09,301+03 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-27) [2f7d31b1] EVENT_ID: VM_PAUSED(1,025), VM golden_env_mixed_virtio_2 has been paused.
2022-04-20 17:49:09,311+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-27) [2f7d31b1] EVENT_ID: VM_PAUSED_ERROR(139), VM golden_env_mixed_virtio_2 has been paused
...

2022-04-20 17:51:34,978+03 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-31) [] VM '92e17fe6-fdea-4a85-8685-4abadea95dd9'(golden_env_mixed_virtio_2) moved from 'Paused' --> 'Up'
2022-04-20 17:51:34,991+03 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-31) [] EVENT_ID: VM_RECOVERED_FROM_PAUSE_ERROR(196), VM golden_env_mixed_virtio_2 has recovered from paused back to up.
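As an aside, the VmAnalyzer state transitions above can be pulled out of engine.log with a small, illustrative Python helper (not part of the product or of this verification), which makes the Paused --> Up recovery easier to spot for a given VM:

import re
import sys

# Matches engine.log lines such as:
# ... VM '<uuid>'(<name>) moved from 'MigratingFrom' --> 'Paused'
TRANSITION = re.compile(
    r"VM '(?P<uuid>[0-9a-f-]+)'\((?P<name>[^)]+)\) "
    r"moved from '(?P<src>\w+)' --> '(?P<dst>\w+)'"
)


def state_transitions(log_path, vm_name):
    """Yield (timestamp, source state, destination state) for one VM."""
    with open(log_path) as log:
        for line in log:
            match = TRANSITION.search(line)
            if match and match.group('name') == vm_name:
                timestamp = line.split(' INFO')[0].split(' ERROR')[0]
                yield timestamp, match.group('src'), match.group('dst')


if __name__ == '__main__':
    # Usage: python transitions.py engine.log golden_env_mixed_virtio_2
    for ts, src, dst in state_transitions(sys.argv[1], sys.argv[2]):
        print(ts, src, '-->', dst)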

3. The same as case 2, except that the storage connection was blocked on two hosts, source and destination.
The HA VM is paused and remains paused after the storage domain is fixed, because it was paused with an unknown error rather than an I/O error (only VMs paused with an I/O error are resumed automatically). Upon a user resume attempt, the VM is restarted.


2022-04-24 10:50:13,033+03 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-80879) [74a5150f-1c65-469c-a676-18fff3bcca28] EVENT_ID: USER_RESUME_VM(40), VM golden_env_mixed_virtio_2 was resumed by admin@internal-authz (Host: host_mixed_1).
2022-04-24 10:50:21,247+03 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-63) [] VM '92e17fe6-fdea-4a85-8685-4abadea95dd9'(golden_env_mixed_virtio_2) moved from 'Paused' --> 'NotResponding'
...
2022-04-24 10:51:54,529+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-17) [43f0c3f6] Failed to destroy VM '92e17fe6-fdea-4a85-8685-4abadea95dd9' because VM does not exist, ignoring
2022-04-24 10:51:54,529+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-17) [43f0c3f6] FINISH, DestroyVDSCommand, return: , log id: 226d4497
2022-04-24 10:51:54,529+03 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-17) [43f0c3f6] VM '92e17fe6-fdea-4a85-8685-4abadea95dd9'(golden_env_mixed_virtio_2) moved from 'Paused' --> 'Down'
2022-04-24 10:51:54,615+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-17) [43f0c3f6] EVENT_ID: VM_DOWN_ERROR(119), VM golden_env_mixed_virtio_2 is down with error. Exit message: Down because paused for too long.
2022-04-24 10:51:54,616+03 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-17) [43f0c3f6] add VM '92e17fe6-fdea-4a85-8685-4abadea95dd9'(golden_env_mixed_virtio_2) to HA rerun treatment
2022-04-24 10:51:54,633+03 INFO  [org.ovirt.engine.core.bll.ProcessDownVmCommand] (EE-ManagedThreadFactory-engine-Thread-80919) [704a0b1d] Running command: ProcessDownVmCommand internal: true.
2022-04-24 10:51:54,656+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-17) [43f0c3f6] EVENT_ID: HA_VM_FAILED(9,602), Highly Available VM golden_env_mixed_virtio_2 failed. It will be restarted automatically.
2022-04-24 10:51:55,299+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateBrokerVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-72) [43df18ba] START, CreateBrokerVDSCommand(HostName = host_mixed_1, CreateVDSCommandParameters:{hostId='d9bae774-64a1-411c-a9ed-ddadbc2095f9', vmId='92e17fe6-fdea-4a85-8685-4abadea95dd9', vm='VM [golden_env_mixed_virtio_2]'}), log id: bf2ab5

2022-04-24 10:53:06,573+03 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-51) [] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM golden_env_mixed_virtio_2 on Host host_mixed_1

The VM was restarted.

Comment 14 errata-xmlrpc 2022-05-26 17:22:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Low: RHV RHEL Host (ovirt-host) [ovirt-4.5.0] security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:4764

