Created attachment 1623794 [details]
first_time_VmPowerOff_failure_logs

Description of problem:
After a disk live migration failed due to engine-to-host communication issues, the host and the VM moved to non-responsive, powering off the VM failed with errors, and the VM is stuck in 'Powering down' state.
Also, after the host went up I tried again to power off the VM manually, but I still got the same errors (see below); only a restart of libvirtd changed the VM state to Down.

VDSM error: 'Virtual machine destroy error', 'code': 42

VDSM:
2019-10-08 19:39:17,535+0300 INFO (jsonrpc/5) [api.virt] START destroy(gracefulAttempts=1) from=::ffff:10.46.16.248,52740, flow_id=vms_syncAction_b679707a-a717-483b, vmId=72a8ad35-7a5e-4b8a-9d1e-9d9ea9d1d5a7 (api:48)
2019-10-08 19:39:47,537+0300 INFO (jsonrpc/5) [api.virt] FINISH destroy return={'status': {'message': 'Virtual machine destroy error', 'code': 42}} from=::ffff:10.46.16.248,52740, flow_id=vms_syncAction_b679707a-a717-483b, vmId=72a8ad35-7a5e-4b8a-9d1e-9d9ea9d1d5a7 (api:54)

Libvirt:
2019-10-08 16:39:23.885+0000: 5661: warning : qemuDomainObjBeginJobInternal:6722 : Cannot start job (query, none, none) for domain vm_TestCase4925_0819323921; current job is (modify, none, none) owned by (5662 remoteDispatchDomainBlockJobAbort, 0 <null>, 0 <null> (flags=0x0)) for (222s, 0s, 0s)
2019-10-08 16:39:23.885+0000: 5661: error : qemuDomainObjBeginJobInternal:6744 : Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainBlockJobAbort)
2019-10-08 16:39:23.885+0000: 5661: info : virObjectUnref:344 : OBJECT_UNREF: obj=0x7f4d74120270
2019-10-08 16:39:23.885+0000: 5661: error : virFileReadAll:1460 : Failed to open file '/sys/fs/cgroup/cpu,cpuacct/machine.slice/machine-qemu\x2d68\x2dvmTestCase49250819323921.scope/cpuacct.usage': No such file or directory
2019-10-08 16:39:23.885+0000: 5661: error : virCgroupGetValueStr:836 : Unable to read from '/sys/fs/cgroup/cpu,cpuacct/machine.slice/machine-qemu\x2d68\x2dvmTestCase49250819323921.scope/cpuacct.usage': No such file or directory
2019-10-08 16:39:23.885+0000: 5661: error : virFileReadAll:1460 : Failed to open file '/sys/fs/cgroup/cpu,cpuacct/machine.slice/machine-qemu\x2d68\x2dvmTestCase49250819323921.scope/cpuacct.stat': No such file or directory
2019-10-08 16:39:23.885+0000: 5661: error : virCgroupGetValueStr:836 : Unable to read from '/sys/fs/cgroup/cpu,cpuacct/machine.slice/machine-qemu\x2d68\x2dvmTestCase49250819323921.scope/cpuacct.stat': No such file or directory
2019-10-08 16:39:23.885+0000: 5661: warning : qemuGetProcessInfo:1404 : cannot parse process status data
2019-10-08 16:39:23.885+0000: 5661: error : virNetDevTapInterfaceStats:764 : internal error: /proc/net/dev: Interface not found
2019-10-08 16:39:47.536+0000: 5660: warning : qemuDomainObjBeginJobInternal:6722 : Cannot start job (destroy, none, none) for domain vm_TestCase4925_0819323921; current job is (modify, none, none) owned by (5662 remoteDispatchDomainBlockJobAbort, 0 <null>, 0 <null> (flags=0x0)) for (246s, 0s, 0s)
2019-10-08 16:39:47.536+0000: 5660: error : qemuDomainObjBeginJobInternal:6744 : Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainBlockJobAbort)

Engine:
2019-10-08 19:39:17,535+03 INFO [org.ovirt.engine.core.bll.StopVmCommand] (default task-84) [vms_syncAction_b679707a-a717-483b] Running command: StopVmCommand internal: false. Entities affected : ID: 72a8ad35-7a5e-4b8a-9d1e-9d9ea9d1d5a7 Type: VM
Action group STOP_VM with role type USER
2019-10-08 19:39:17,539+03 INFO [org.ovirt.engine.core.vdsbroker.DestroyVmVDSCommand] (default task-84) [vms_syncAction_b679707a-a717-483b] START, DestroyVmVDSCommand( DestroyVmVDSCommandParameters:{hostId='9e10e475-33fe-4aa6-8df2-5e6bfeb94356', vmId='72a8ad35-7a5e-4b8a-9d1e-9d9ea9d1d5a7', secondsToWait='0', gracefully='false', reason='', ignoreNoVm='false'}), log id: 446f4cc5
2019-10-08 19:39:17,541+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (default task-84) [vms_syncAction_b679707a-a717-483b] START, DestroyVDSCommand(HostName = host_mixed_3, DestroyVmVDSCommandParameters:{hostId='9e10e475-33fe-4aa6-8df2-5e6bfeb94356', vmId='72a8ad35-7a5e-4b8a-9d1e-9d9ea9d1d5a7', secondsToWait='0', gracefully='false', reason='', ignoreNoVm='false'}), log id: 5d75f260
2019-10-08 19:39:29,737+03 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedThreadFactory-engineScheduled-Thread-28) [] VM '72a8ad35-7a5e-4b8a-9d1e-9d9ea9d1d5a7'(vm_TestCase4925_0819323921) moved from 'NotResponding' --> 'PoweringDown'
2019-10-08 19:39:38,462+03 INFO [org.ovirt.engine.core.bll.provider.network.SyncNetworkProviderCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-60) [ead2c4d] Lock Acquired to object 'EngineLock:{exclusiveLocks='[9e1df8ec-2ac3-4496-81af-c9c76b839ac2=PROVIDER]', sharedLocks=''}'
2019-10-08 19:39:38,475+03 INFO [org.ovirt.engine.core.bll.provider.network.SyncNetworkProviderCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-60) [ead2c4d] Running command: SyncNetworkProviderCommand internal: true.
2019-10-08 19:39:38,627+03 INFO [org.ovirt.engine.core.sso.utils.AuthenticationUtils] (default task-83) [] User admin@internal successfully logged in with scopes: ovirt-app-api ovirt-ext=token-info:authz-search ovirt-ext=token-info:public-authz-search ovirt-ext=token-info:validate ovirt-ext=token:password-access
2019-10-08 19:39:38,803+03 INFO [org.ovirt.engine.core.bll.provider.network.SyncNetworkProviderCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-60) [ead2c4d] Lock freed to object 'EngineLock:{exclusiveLocks='[9e1df8ec-2ac3-4496-81af-c9c76b839ac2=PROVIDER]', sharedLocks=''}'
2019-10-08 19:39:47,548+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (default task-84) [vms_syncAction_b679707a-a717-483b] Failed in 'DestroyVDS' method
2019-10-08 19:39:47,553+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-84) [vms_syncAction_b679707a-a717-483b] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM host_mixed_3 command DestroyVDS failed: Virtual machine destroy error
2019-10-08 19:39:47,553+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (default task-84) [vms_syncAction_b679707a-a717-483b] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand' return value 'StatusOnlyReturn [status=Status [code=42, message=Virtual machine destroy error]]'
2019-10-08 19:39:47,553+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (default task-84) [vms_syncAction_b679707a-a717-483b] HostName = host_mixed_3
2019-10-08 19:39:47,554+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (default task-84) [vms_syncAction_b679707a-a717-483b] Command 'DestroyVDSCommand(HostName = host_mixed_3, DestroyVmVDSCommandParameters:{hostId='9e10e475-33fe-4aa6-8df2-5e6bfeb94356', vmId='72a8ad35-7a5e-4b8a-9d1e-9d9ea9d1d5a7', secondsToWait='0', gracefully='false', reason='', ignoreNoVm='false'})' execution failed: VDSGenericException: VDSErrorException: Failed to DestroyVDS, error = Virtual machine destroy error, code = 42
2019-10-08 19:39:47,554+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (default task-84) [vms_syncAction_b679707a-a717-483b] FINISH, DestroyVDSCommand, return: , log id: 5d75f260
2019-10-08 19:39:47,554+03 ERROR [org.ovirt.engine.core.vdsbroker.DestroyVmVDSCommand] (default task-84) [vms_syncAction_b679707a-a717-483b] Failed to destroy VM '72a8ad35-7a5e-4b8a-9d1e-9d9ea9d1d5a7' in VDS = '9e10e475-33fe-4aa6-8df2-5e6bfeb94356' , error = 'org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to DestroyVDS, error = Virtual machine destroy error, code = 42'
2019-10-08 19:39:47,554+03 INFO [org.ovirt.engine.core.vdsbroker.DestroyVmVDSCommand] (default task-84) [vms_syncAction_b679707a-a717-483b] FINISH, DestroyVmVDSCommand, return: , log id: 446f4cc5
2019-10-08 19:39:47,554+03 ERROR [org.ovirt.engine.core.bll.StopVmCommand] (default task-84) [vms_syncAction_b679707a-a717-483b] Command 'org.ovirt.engine.core.bll.StopVmCommand' failed: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to DestroyVDS, error = Virtual machine destroy error, code = 42 (Failed with error destroyErr and code 42)
2019-10-08 19:39:47,564+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-84) [vms_syncAction_b679707a-a717-483b] EVENT_ID: USER_FAILED_STOP_VM(56), Failed to power off VM vm_TestCase4925_0819323921 (Host: host_mixed_3, User: admin@internal-authz).
2019-10-08 19:39:47,569+03 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-84) [] Operation Failed: [Virtual machine destroy error]

Version-Release number of selected component (if applicable):

How reproducible:
Not a planned test, happened once due to a network issue between engine -> host.

Steps to Reproduce:
TestCase4925.test_live_migrate_RO_disk[iscsi] ran, and a network issue occurred during the live migration phase:
1) Started live disk migration from the iscsi_0 SD to iscsi_1
2) Right after that, communication issues between engine -> host occurred and the VM went to NotResponding:
2019-10-08 19:36:44,539+03 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedThreadFactory-engineScheduled-Thread-21) [] VM '72a8ad35-7a5e-4b8a-9d1e-9d9ea9d1d5a7'(vm_TestCase4925_0819323921) moved from 'Up' --> 'NotResponding'
3) Live migration failed:
2019-10-08 19:38:40,226+03 ERROR [org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-72) [disks_syncAction_c0dd7821-d829-4b85] Failed VmReplicateDiskFinish (Disk 'cc33e230-a2a6-48cd-a67c-0dc531ef4fdb', VM '72a8ad35-7a5e-4b8a-9d1e-9d9ea9d1d5a7')
4) The host went up:
2019-10-08 19:38:47,157+03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [551ea79f] EVENT_ID: VDS_DETECTED(13), Status of host host_mixed_3 was set to Up.
5) Trying to power down the VM fails; the VDSM log shows that the destroy started while the host was still not up:
2019-10-08 19:39:17,535+0300 INFO (jsonrpc/5) [api.virt] START destroy(gracefulAttempts=1) from=::ffff:10.46.16.248,52740, flow_id=vms_syncAction_b679707a-a717-483b, vmId=72a8ad35-7a5e-4b8a-9d1e-9d9ea9d1d5a7 (api:48)
2019-10-08 19:39:47,537+0300 INFO (jsonrpc/5) [api.virt] FINISH destroy return={'status': {'message': 'Virtual machine destroy error', 'code': 42}} from=::ffff:10.46.16.248,52740, flow_id=vms_syncAction_b679707a-a717-483b, vmId=72a8ad35-7a5e-4b8a-9d1e-9d9ea9d1d5a7 (api:54)
6) Hours later I powered up the host and tried to power down the VM again. Still, I got the same results and the VM stayed in 'Powering down' state forever.

Engine:
2019-10-09 15:06:17,026+03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-78) [603c2621] EVENT_ID: VDS_DETECTED(13), Status of host host_mixed_3 was set to Up.
2019-10-09 15:06:17,030+03 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-78) [603c2621] EVENT_ID: VDS_ALERT_FENCE_IS_NOT_CONFIGURED(9,000), Failed to verify Power Management configuration for Host host_mixed_3.
2019-10-09 15:06:17,065+03 INFO [org.ovirt.engine.core.bll.HandleVdsVersionCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-78) [31e97303] Running command: HandleVdsVersionCommand internal: true. Entities affected : ID: 9e10e475-33fe-4aa6-8df2-5e6bfeb94356 Type: VDS
2019-10-09 15:06:25,090+03 INFO [org.ovirt.engine.core.bll.StopVmCommand] (EE-ManagedThreadFactory-engine-Thread-50940) [0a60c03e-33ad-42d5-a020-c6693790f8a8] Running command: StopVmCommand internal: false. Entities affected : ID: 72a8ad35-7a5e-4b8a-9d1e-9d9ea9d1d5a7 Type: VM
Action group STOP_VM with role type USER
2019-10-09 15:06:25,094+03 INFO [org.ovirt.engine.core.vdsbroker.DestroyVmVDSCommand] (EE-ManagedThreadFactory-engine-Thread-50940) [0a60c03e-33ad-42d5-a020-c6693790f8a8] START, DestroyVmVDSCommand( DestroyVmVDSCommandParameters:{hostId='9e10e475-33fe-4aa6-8df2-5e6bfeb94356', vmId='72a8ad35-7a5e-4b8a-9d1e-9d9ea9d1d5a7', secondsToWait='0', gracefully='false', reason='', ignoreNoVm='false'}), log id: 2b8455f4
2019-10-09 15:06:25,094+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (EE-ManagedThreadFactory-engine-Thread-50940) [0a60c03e-33ad-42d5-a020-c6693790f8a8] START, DestroyVDSCommand(HostName = host_mixed_3, DestroyVmVDSCommandParameters:{hostId='9e10e475-33fe-4aa6-8df2-5e6bfeb94356', vmId='72a8ad35-7a5e-4b8a-9d1e-9d9ea9d1d5a7', secondsToWait='0', gracefully='false', reason='', ignoreNoVm='false'}), log id: 42dc40d2
2019-10-09 15:06:55,102+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (EE-ManagedThreadFactory-engine-Thread-50940) [0a60c03e-33ad-42d5-a020-c6693790f8a8] Failed in 'DestroyVDS' method
2019-10-09 15:06:55,116+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-50940) [0a60c03e-33ad-42d5-a020-c6693790f8a8] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM host_mixed_3 command DestroyVDS failed: Virtual machine destroy error
2019-10-09 15:06:55,116+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (EE-ManagedThreadFactory-engine-Thread-50940) [0a60c03e-33ad-42d5-a020-c6693790f8a8] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand' return value 'StatusOnlyReturn [status=Status [code=42, message=Virtual machine destroy error]]'
2019-10-09 15:06:55,116+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (EE-ManagedThreadFactory-engine-Thread-50940) [0a60c03e-33ad-42d5-a020-c6693790f8a8] HostName = host_mixed_3
2019-10-09 15:06:55,116+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (EE-ManagedThreadFactory-engine-Thread-50940) [0a60c03e-33ad-42d5-a020-c6693790f8a8] Command 'DestroyVDSCommand(HostName = host_mixed_3, DestroyVmVDSCommandParameters:{hostId='9e10e475-33fe-4aa6-8df2-5e6bfeb94356', vmId='72a8ad35-7a5e-4b8a-9d1e-9d9ea9d1d5a7', secondsToWait='0', gracefully='false', reason='', ignoreNoVm='false'})' execution failed: VDSGenericException: VDSErrorException: Failed to DestroyVDS, error = Virtual machine destroy error, code = 42
2019-10-09 15:06:55,116+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (EE-ManagedThreadFactory-engine-Thread-50940) [0a60c03e-33ad-42d5-a020-c6693790f8a8] FINISH, DestroyVDSCommand, return: , log id: 42dc40d2
2019-10-09 15:06:55,116+03 ERROR [org.ovirt.engine.core.vdsbroker.DestroyVmVDSCommand] (EE-ManagedThreadFactory-engine-Thread-50940) [0a60c03e-33ad-42d5-a020-c6693790f8a8] Failed to destroy VM '72a8ad35-7a5e-4b8a-9d1e-9d9ea9d1d5a7' in VDS = '9e10e475-33fe-4aa6-8df2-5e6bfeb94356' , error = 'org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to DestroyVDS, error = Virtual machine destroy error, code = 42'
2019-10-09 15:06:55,116+03 INFO [org.ovirt.engine.core.vdsbroker.DestroyVmVDSCommand] (EE-ManagedThreadFactory-engine-Thread-50940) [0a60c03e-33ad-42d5-a020-c6693790f8a8] FINISH, DestroyVmVDSCommand, return: , log id: 2b8455f4
2019-10-09 15:06:55,116+03 ERROR [org.ovirt.engine.core.bll.StopVmCommand] (EE-ManagedThreadFactory-engine-Thread-50940) [0a60c03e-33ad-42d5-a020-c6693790f8a8] Command 'org.ovirt.engine.core.bll.StopVmCommand' failed: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to DestroyVDS, error = Virtual machine destroy error, code = 42 (Failed with error destroyErr and code 42)
2019-10-09 15:06:55,127+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-50940) [0a60c03e-33ad-42d5-a020-c6693790f8a8] EVENT_ID: USER_FAILED_STOP_VM(56), Failed to power off VM vm_TestCase4925_0819323921 (Host: host_mixed_3, User: admin@internal-authz).

VDSM:
2019-10-09 15:06:55,096+0300 WARN (jsonrpc/2) [virt.vm] (vmId='72a8ad35-7a5e-4b8a-9d1e-9d9ea9d1d5a7') Failed to destroy VM '72a8ad35-7a5e-4b8a-9d1e-9d9ea9d1d5a7' gracefully (error=68) (vm:5230)
2019-10-09 15:06:55,097+0300 INFO (jsonrpc/2) [api.virt] FINISH destroy return={'status': {'message': 'Virtual machine destroy error', 'code': 42}} from=::ffff:10.46.16.248,52740, flow_id=0a60c03e-33ad-42d5-a020-c6693790f8a8, vmId=72a8ad35-7a5e-4b8a-9d1e-9d9ea9d1d5a7 (api:54)
2019-10-09 15:06:55,097+0300 INFO (jsonrpc/2) [jsonrpc.JsonRpcServer] RPC call VM.destroy failed (error 42) in 30.01 seconds (__init__:312)

Actual results:
Powering off the VM fails and the VM stays stuck in 'Powering down' state, even after the host is Up again.

Expected results:
It is expected that LSM fails (due to the communication issues) and that the VM moves to non-responsive, but after the host goes up the VM should be able to change state and power down successfully.

Additional info:
A restart of libvirtd on the host where the VM resides solved the issue - the VM went to Down state.
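For reference, the stuck state described above can be probed directly from the host with libvirt-python. This is only a hedged diagnostic sketch, not part of the original report: the domain name is taken from the logs and the disk device ('sda') is a placeholder. While the qemu monitor job is still held by remoteDispatchDomainBlockJobAbort, these calls are expected to raise the same "cannot acquire state change lock" timeout that VDSM and the engine hit.

# Hedged diagnostic sketch: probe a domain whose libvirt job lock appears stuck.
# Assumptions: libvirt-python is installed on the host; the domain name and
# target disk ('sda') are placeholders for the real values.
import libvirt

DOMAIN = 'vm_TestCase4925_0819323921'  # example domain name from the logs
DISK = 'sda'                           # placeholder target device

conn = libvirt.open('qemu:///system')
dom = conn.lookupByName(DOMAIN)

try:
    # Same call VDSM makes in diskReplicateFinish; with the monitor job held
    # by remoteDispatchDomainBlockJobAbort this is expected to time out.
    info = dom.blockJobInfo(DISK, 0)
    print('block job info:', info)
except libvirt.libvirtError as e:
    print('blockJobInfo failed:', e)

try:
    # Equivalent of the engine's DestroyVDS; it fails the same way while the
    # state change lock is held, which is why only a libvirtd restart helps.
    dom.destroy()
except libvirt.libvirtError as e:
    print('destroy failed:', e)

conn.close()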
Created attachment 1623795 [details] host_up_VmPowerOff_failure_logs
Also, another impact of this issue is that as long as the host has a VM which is not in 'Down' state, the host cannot be deactivated.
Version-Release number of selected component (if applicable):
ovirt-engine-4.3.6.7-0.1.el7.noarch
vdsm-4.30.33-1.el7ev.x86_64
libvirt-4.5.0-23.el7_7.1.x86_64
qemu-img-rhev-2.12.0-33.el7_7.4.x86_64
I saw this issue again:

Description of problem:
VM became unresponsive after VmReplicateDiskFinishVDSCommand and caused LSM failure and VM destroy error.

Engine log:
2019-11-20 08:20:19,230+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-12) [disks_syncAction_202f4230-8e2e-4b9b] START, VmReplicateDiskFinishVDSCommand(HostName = host_mixed_2, VmReplicateDiskParameters:{hostId='666bdfa6-a483-431e-891a-1e9eb2197278', vmId='cfda2240-2db8-430b-9a09-2d37b97e33d6', storagePoolId='1b798999-f94b-46f3-850a-b002ab228fd6', srcStorageDomainId='288b173a-cb9e-444d-be42-43ec6aa73d5f', targetStorageDomainId='56c56243-35c2-4027-b095-0d013c12e1dd', imageGroupId='dd1b4ba9-3a01-4eba-b8bb-aeec6f1e048a', imageId='ba60a7f5-9893-474c-b7a4-ce67088a82de'}), log id: 27ed9a7a
2019-11-20 08:20:47,343+02 INFO [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-80) [] Setting new tasks map. The map contains now 2 tasks
2019-11-20 08:21:11,485+02 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-14) [] VM 'cfda2240-2db8-430b-9a09-2d37b97e33d6'(vm_TestCase4925_2008172480) moved from 'Up' --> 'NotResponding'
2019-11-20 08:21:11,566+02 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-14) [] EVENT_ID: VM_NOT_RESPONDING(126), VM vm_TestCase4925_2008172480 is not responding.
2019-11-20 08:21:17,344+02 INFO [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-92) [] Setting new tasks map. The map contains now 0 tasks
2019-11-20 08:21:17,344+02 INFO [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-92) [] Cleared all tasks of pool '1b798999-f94b-46f3-850a-b002ab228fd6'.
2019-11-20 08:23:19,442+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-12) [disks_syncAction_202f4230-8e2e-4b9b] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM host_mixed_2 command VmReplicateDiskFinishVDS failed: Message timeout which can be caused by communication issues
2019-11-20 08:23:19,443+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-12) [disks_syncAction_202f4230-8e2e-4b9b] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand' return value 'StatusOnlyReturn [status=Status [code=5022, message=Message timeout which can be caused by communication issues]]'
2019-11-20 08:23:19,443+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-12) [disks_syncAction_202f4230-8e2e-4b9b] HostName = host_mixed_2
2019-11-20 08:23:19,443+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-12) [disks_syncAction_202f4230-8e2e-4b9b] Command 'VmReplicateDiskFinishVDSCommand(HostName = host_mixed_2, VmReplicateDiskParameters:{hostId='666bdfa6-a483-431e-891a-1e9eb2197278', vmId='cfda2240-2db8-430b-9a09-2d37b97e33d6', storagePoolId='1b798999-f94b-46f3-850a-b002ab228fd6', srcStorageDomainId='288b173a-cb9e-444d-be42-43ec6aa73d5f', targetStorageDomainId='56c56243-35c2-4027-b095-0d013c12e1dd', imageGroupId='dd1b4ba9-3a01-4eba-b8bb-aeec6f1e048a', imageId='ba60a7f5-9893-474c-b7a4-ce67088a82de'})' execution failed: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues
2019-11-20 08:23:19,443+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-12) [disks_syncAction_202f4230-8e2e-4b9b] FINISH, VmReplicateDiskFinishVDSCommand, return: , log id: 27ed9a7a
2019-11-20 08:23:19,443+02 INFO [org.ovirt.engine.core.vdsbroker.VdsManager] (EE-ManagedThreadFactory-engine-Thread-27924) [disks_syncAction_202f4230-8e2e-4b9b] Clearing domains data for host host_mixed_2
2019-11-20 08:23:19,443+02 WARN [org.ovirt.engine.core.vdsbroker.VdsManager] (EE-ManagedThreadFactory-engine-Thread-27924) [disks_syncAction_202f4230-8e2e-4b9b] Host 'host_mixed_2' is not responding.
2019-11-20 08:23:19,449+02 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-27924) [disks_syncAction_202f4230-8e2e-4b9b] EVENT_ID: VDS_HOST_NOT_RESPONDING(9,027), Host host_mixed_2 is not responding. Host cannot be fenced automatically because power management for the host is disabled.
2019-11-20 08:23:19,464+02 ERROR [org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-12) [disks_syncAction_202f4230-8e2e-4b9b] Failed VmReplicateDiskFinish (Disk 'dd1b4ba9-3a01-4eba-b8bb-aeec6f1e048a', VM 'cfda2240-2db8-430b-9a09-2d37b97e33d6')
2019-11-20 08:23:19,464+02 ERROR [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-12) [disks_syncAction_202f4230-8e2e-4b9b] Command 'LiveMigrateDisk' id: 'cfb39447-69cb-489a-b1d1-10c634e76f3a' with children [8a6ee64e-1f55-4d05-9742-f1a8a6510603, da329ab8-6758-4e69-a24a-00e36500fc06, 88a452ba-e71a-4d18-bac1-6e0680d80c15] failed when attempting to perform the next operation, marking as 'ACTIVE'
2019-11-20 08:23:19,464+02 ERROR [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-12) [disks_syncAction_202f4230-8e2e-4b9b] EngineException: Message timeout which can be caused by communication issues (Failed with error VDS_NETWORK_ERROR and code 5022): org.ovirt.engine.core.common.errors.EngineException: EngineException: Message timeout which can be caused by communication issues (Failed with error VDS_NETWORK_ERROR and code 5022)
        at org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand.replicateDiskFinish(LiveMigrateDiskCommand.java:429) [bll.jar:]
        at org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand.completeLiveMigration(LiveMigrateDiskCommand.java:393) [bll.jar:]
        at org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand.performNextOperation(LiveMigrateDiskCommand.java:246) [bll.jar:]
        at org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback.childCommandsExecutionEnded(SerialChildCommandsExecutionCallback.java:32) [bll.jar:]
        at org.ovirt.engine.core.bll.ChildCommandsCallbackBase.doPolling(ChildCommandsCallbackBase.java:77) [bll.jar:]
        at org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethodsImpl(CommandCallbacksPoller.java:175) [bll.jar:]
        at org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethods(CommandCallbacksPoller.java:109) [bll.jar:]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_232]
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [rt.jar:1.8.0_232]
        at org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.access$201(ManagedScheduledThreadPoolExecutor.java:383) [javax.enterprise.concurrent.jar:1.0.0.redhat-1]
        at org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.run(ManagedScheduledThreadPoolExecutor.java:534) [javax.enterprise.concurrent.jar:1.0.0.redhat-1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_232]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_232]
        at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_232]
        at org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:250) [javax.enterprise.concurrent.jar:1.0.0.redhat-1]
2019-11-20 08:23:19,465+02 INFO [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-12) [disks_syncAction_202f4230-8e2e-4b9b] Command 'LiveMigrateDisk' id: 'cfb39447-69cb-489a-b1d1-10c634e76f3a' child commands '[8a6ee64e-1f55-4d05-9742-f1a8a6510603, da329ab8-6758-4e69-a24a-00e36500fc06, 88a452ba-e71a-4d18-bac1-6e0680d80c15]' executions were completed, status 'FAILED'
2019-11-20 08:23:20,661+02 ERROR [org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-44) [disks_syncAction_202f4230-8e2e-4b9b] Ending command 'org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand' with failure.
2019-11-20 08:23:20,662+02 ERROR [org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-44) [disks_syncAction_202f4230-8e2e-4b9b] Failed during live storage migration of disk 'dd1b4ba9-3a01-4eba-b8bb-aeec6f1e048a' of vm 'cfda2240-2db8-430b-9a09-2d37b97e33d6', attempting to end replication before deleting the target disk
2019-11-20 08:23:50,670+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-44) [disks_syncAction_202f4230-8e2e-4b9b] Failed in 'VmReplicateDiskFinishVDS' method
2019-11-20 08:23:50,681+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-44) [disks_syncAction_202f4230-8e2e-4b9b] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM host_mixed_2 command VmReplicateDiskFinishVDS failed: General Exception: ('Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainBlockJobAbort)',)
2019-11-20 08:23:50,682+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-44) [disks_syncAction_202f4230-8e2e-4b9b] Command 'VmReplicateDiskFinishVDSCommand(HostName = host_mixed_2, VmReplicateDiskParameters:{hostId='666bdfa6-a483-431e-891a-1e9eb2197278', vmId='cfda2240-2db8-430b-9a09-2d37b97e33d6', storagePoolId='1b798999-f94b-46f3-850a-b002ab228fd6', srcStorageDomainId='288b173a-cb9e-444d-be42-43ec6aa73d5f', targetStorageDomainId='288b173a-cb9e-444d-be42-43ec6aa73d5f', imageGroupId='dd1b4ba9-3a01-4eba-b8bb-aeec6f1e048a', imageId='ba60a7f5-9893-474c-b7a4-ce67088a82de'})' execution failed: VDSGenericException: VDSErrorException: Failed to VmReplicateDiskFinishVDS, error = General Exception: ('Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainBlockJobAbort)',), code = 100
2019-11-20 08:23:50,682+02 ERROR [org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-44) [disks_syncAction_202f4230-8e2e-4b9b] Replication end of disk 'dd1b4ba9-3a01-4eba-b8bb-aeec6f1e048a' in vm 'cfda2240-2db8-430b-9a09-2d37b97e33d6' back to the source failed, skipping deletion of the target disk
2019-11-20 08:23:50,798+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-44) [disks_syncAction_202f4230-8e2e-4b9b] EVENT_ID: USER_MOVED_DISK_FINISHED_FAILURE(2,011), User admin@internal-authz have failed to move disk disk_virtio_scsicow_2008174840 to domain iscsi_1.
2019-11-20 08:23:53,762+02 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-86) [] Operation Failed: [Cannot move Virtual Disk. The Virtual Machine should be either in Up or Paused status.]
2019-11-20 08:23:56,669+02 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-71) [] VM 'cfda2240-2db8-430b-9a09-2d37b97e33d6'(vm_TestCase4925_2008172480) moved from 'NotResponding' --> 'PoweringDown'
2019-11-20 08:24:24,349+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (default task-86) [vms_syncAction_158dc9fe-bc60-4869] Failed in 'DestroyVDS' method
2019-11-20 08:24:24,355+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-86) [vms_syncAction_158dc9fe-bc60-4869] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM host_mixed_2 command DestroyVDS failed: Virtual machine destroy error
2019-11-20 08:24:24,356+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (default task-86) [vms_syncAction_158dc9fe-bc60-4869] Command 'DestroyVDSCommand(HostName = host_mixed_2, DestroyVmVDSCommandParameters:{hostId='666bdfa6-a483-431e-891a-1e9eb2197278', vmId='cfda2240-2db8-430b-9a09-2d37b97e33d6', secondsToWait='0', gracefully='false', reason='', ignoreNoVm='false'})' execution failed: VDSGenericException: VDSErrorException: Failed to DestroyVDS, error = Virtual machine destroy error, code = 42
2019-11-20 08:24:24,356+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (default task-86) [vms_syncAction_158dc9fe-bc60-4869] FINISH, DestroyVDSCommand, return: , log id: 25d9bfc5
2019-11-20 08:24:24,356+02 ERROR [org.ovirt.engine.core.vdsbroker.DestroyVmVDSCommand] (default task-86) [vms_syncAction_158dc9fe-bc60-4869] Failed to destroy VM 'cfda2240-2db8-430b-9a09-2d37b97e33d6' in VDS = '666bdfa6-a483-431e-891a-1e9eb2197278' , error = 'org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to DestroyVDS, error = Virtual machine destroy error, code = 42'
2019-11-20 08:24:24,356+02 INFO [org.ovirt.engine.core.vdsbroker.DestroyVmVDSCommand] (default task-86) [vms_syncAction_158dc9fe-bc60-4869] FINISH, DestroyVmVDSCommand, return: , log id: 2ca3017f
2019-11-20 08:24:24,356+02 ERROR [org.ovirt.engine.core.bll.StopVmCommand] (default task-86) [vms_syncAction_158dc9fe-bc60-4869] Command 'org.ovirt.engine.core.bll.StopVmCommand' failed: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to DestroyVDS, error = Virtual machine destroy error, code = 42 (Failed with error destroyErr and code 42)
2019-11-20 08:24:24,367+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-86) [vms_syncAction_158dc9fe-bc60-4869] EVENT_ID: USER_FAILED_STOP_VM(56), Failed to power off VM vm_TestCase4925_2008172480 (Host: host_mixed_2, User: admin@internal-authz).
2019-11-20 08:24:24,371+02 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-86) [] Operation Failed: [Virtual machine destroy error]

VDSM:
2019-11-20 08:23:50,668+0200 ERROR (jsonrpc/7) [api] FINISH diskReplicateFinish error=Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainBlockJobAbort) (api:134)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 124, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 580, in diskReplicateFinish
    return self.vm.diskReplicateFinish(srcDisk, dstDisk)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 4650, in diskReplicateFinish
    blkJobInfo = self._dom.blockJobInfo(drive.name, 0)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 108, in f
    raise toe
TimeoutError: Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainBlockJobAbort)

Version-Release number of selected component (if applicable):
vdsm-4.30.38-1.el7ev.x86_64
ovirt-engine-4.3.7.2-0.1.el7.noarch

How reproducible:
Once (automation TC)

Steps to Reproduce:
1. Create a VM with an OS
2. Attach read-only disks to the VM (all possible permutations)
3. Activate the disks
4. Try to move them (LSM) to another storage domain

Actual results:
The operation failed.

Expected results:
The operation should succeed.

Additional info:
Relevant logs are attached
Created attachment 1639220 [details] logs
*** Bug 1776003 has been marked as a duplicate of this bug. ***
VM power off failed with errors and the VM is stuck in 'Powering down' state forever (until libvirtd is restarted manually). Also, after the host went up I tried again to power off the VM manually, but I still got the same errors and the VM remained stuck in 'Powering down' state.

This issue causes many TCs to fail, as the host (which this VM resides on) cannot be deactivated or moved to maintenance for the rest of the test suite.

Adding this as an automation blocker and raising severity to high. Please consider fixing this sooner, in 4.3.x.
This bug report has Keywords: Regression or TestBlocker. Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.
Created attachment 1640357 [details]
libvirt_qemu_logs

Manual reproducing steps:
- Create a VM with an OS
- Add a new R/O disk d1
- Start the VM
- Try to move (LSM) d1 to another storage domain
- The operation fails
- The VM enters a non-responding state; try to power off the VM
- The VM stays in 'Powering down' state forever.

* After a libvirtd restart, the VM moves to 'Down' state.

libvirt and qemu logs are attached.
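For completeness, a rough sketch of the manual steps above using the oVirt Python SDK (ovirtsdk4) follows. It is only an illustration under assumptions: the engine URL, credentials, disk name 'd1' and target storage domain 'iscsi_1' are placeholders, and it presumes the read-only disk is already attached and active on a running VM as in the steps above.

# Hedged sketch of the repro using the oVirt Python SDK (ovirtsdk4).
# Assumptions: engine URL/credentials, disk name 'd1' and target storage
# domain 'iscsi_1' are placeholders for the real values in the environment.
import ovirtsdk4 as sdk
import ovirtsdk4.types as types

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',  # placeholder engine
    username='admin@internal',
    password='password',
    insecure=True,  # lab setup; use ca_file in a real environment
)

disks_service = connection.system_service().disks_service()
disk = disks_service.list(search='name=d1')[0]  # the attached read-only disk
disk_service = disks_service.disk_service(disk.id)

# Trigger the live storage migration; with a read-only disk this is the
# operation that fails in this report (and, on 4.3, left the VM stuck).
disk_service.move(storage_domain=types.StorageDomain(name='iscsi_1'))

connection.close()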
Benny, can you please take a look at this? This reproduces 100% of the time with a read-only disk.

Effects:
- LSM fails
- The VM stays in 'Powering down' state forever
- This blocks the host from going to maintenance (and moving to another cluster), as the VM is not in Down state
- Many tests fail (those that depend on the host going to maintenance/moving to another cluster) due to this failure
(In reply to Avihai from comment #11)
> Benny, can you please take a look at this? this reproduces 100% of the time
> with read only disk.
>
> Effects:
> - LSM fails
> - VM stays in 'powering down' state forever
> - This Block host to go to maintenance(and move to another cluster) as VM is
> not in down state.
> - Many tests fails(that depends on the host going to maintenance/moving to
> another cluster) due to this failure

Does it reproduce on an el8 host?

The error is similar to this one:
https://bugzilla.redhat.com/show_bug.cgi?id=1687176

And the bug being worked on is:
https://bugzilla.redhat.com/show_bug.cgi?id=1655367
(In reply to Benny Zlotnik from comment #12)
> (In reply to Avihai from comment #11)
> > Benny, can you please take a look at this? this reproduces 100% of the time
> > with read only disk.
> >
> > Effects:
> > - LSM fails
> > - VM stays in 'powering down' state forever
> > - This Block host to go to maintenance(and move to another cluster) as VM is
> > not in down state.
> > - Many tests fails(that depends on the host going to maintenance/moving to
> > another cluster) due to this failure
> Does it reproduce on an el8 host?

Evelina, can you please check the same manual scenario from comment 10 in a 4.4 ENV and update?

> The error is similar to this one:
> https://bugzilla.redhat.com/show_bug.cgi?id=1687176

This bug is on RHEL 7.6 and it's CLOSED DEFERRED, meaning nothing will be done with it.
We see this issue on RHV 4.3, meaning RHEL 7.7 hosts, and we started seeing it only in rhv-4.3.6/7 (it was not seen before). It's weird, as RHEL 7.7 hosts already existed in RHV 4.3.5 and this was not seen there.

> And the bug being worked on is:
> https://bugzilla.redhat.com/show_bug.cgi?id=1655367

This is a bug for RHEL 8, not for RHEL 7.7, meaning it will only be fixed in RHV 4.4 and not for customers on 4.3.
I think this bug should also be fixed in RHEL 7.7, as we have been seeing this all the time since 4.3.6.
(In reply to Avihai from comment #13)
> (In reply to Benny Zlotnik from comment #12)
> > (In reply to Avihai from comment #11)
> > > Benny, can you please take a look at this? this reproduces 100% of the time
> > > with read only disk.
> > >
> > > Effects:
> > > - LSM fails
> > > - VM stays in 'powering down' state forever
> > > - This Block host to go to maintenance(and move to another cluster) as VM is
> > > not in down state.
> > > - Many tests fails(that depends on the host going to maintenance/moving to
> > > another cluster) due to this failure
> > Does it reproduce on an el8 host?
> Evelina can you please check the same manual scenario from comment 10 in an
> 4.4 ENV and update?
>
> > The error is similar to this one:
> > https://bugzilla.redhat.com/show_bug.cgi?id=1687176
> This bug is on RHEL7.6 and it's closed-deferred meaning nothing will be done
> with it.
> We see this issue on RHV4.3 meaning RHEL7.7 hosts and we started seeing this
> only on rhv-4.3.6/7 (was not seen before), it's weird as RHEL7.7 hosts were
> existing in 4.3.5 RHV and was not
>
> > And the bug being worked on is:
> > https://bugzilla.redhat.com/show_bug.cgi?id=1655367
> This is a bug for RHEL8, not for RHEL7.7 meaning this bug will only be fixed
> in RHV4.4 and not for customers in 4.3.
> I think this bug should also be fixed in RHEL7.7 as we are seeing this all
> the time since 4.3.6.

From what I checked now it seems live merge doesn't work at all with read only disks:

libvirt.libvirtError: internal error: unable to execute QEMU command 'block-commit': Block node is read-only

I have never used this flow previously so I don't know if it used to work in the past. But if live merge doesn't work, LSM can't work either. So perhaps the solution is to actually block these operations?

And I don't get the lock issue you see on an el8 host, just the live merge error.
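To make the "Block node is read-only" failure above concrete, here is a hedged libvirt-python sketch of the disk-only snapshot plus active block commit sequence that live merge relies on (and that LSM reuses when it cleans up the auto-generated snapshot). The domain name, device and image path are placeholders; with a read-only disk, newer libvirt/qemu reject this sequence with the error quoted above.

# Hedged sketch of the snapshot + active commit sequence used by live merge;
# domain name, device ('sdb') and file paths are placeholders.
import libvirt

conn = libvirt.open('qemu:///system')
dom = conn.lookupByName('vm_TestCase4925_2008172480')  # example domain

snapshot_xml = """
<domainsnapshot>
  <disks>
    <disk name='sdb' snapshot='external'>
      <source file='/rhev/data-center/example/snap.qcow2'/>
    </disk>
  </disks>
</domainsnapshot>
"""

try:
    # External, disk-only snapshot: newer libvirt refuses this for a
    # read-only disk, which is one place the flow can break.
    dom.snapshotCreateXML(
        snapshot_xml,
        libvirt.VIR_DOMAIN_SNAPSHOT_CREATE_DISK_ONLY |
        libvirt.VIR_DOMAIN_SNAPSHOT_CREATE_NO_METADATA)

    # Active commit of the snapshot back into the base image; on a read-only
    # node qemu answers "block-commit: Block node is read-only".
    dom.blockCommit('sdb', None, None, 0,
                    libvirt.VIR_DOMAIN_BLOCK_COMMIT_ACTIVE)
except libvirt.libvirtError as e:
    print('live merge sequence failed:', e)

conn.close()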
I tried the flow in a 4.4.0-0.13 ENV; the VM doesn't enter a non-responding state, but the LSM fails.
Can we recheck this?
Evelina, can you please recheck this on the latest 4.4 ds? The engine changed to RHEL 8.2 as well, along with other changes...
(In reply to Avihai from comment #18)
> Evelina, can you please recheck this at latest 4.4 ds ?
> Engine changed to RHEL8.2 as well with other changes...

Yes, the VM doesn't enter a non-responding state, but the LSM fails.
I'm trying to understand when this flow worked.
It looks like libvirt blocks snapshot creation for read-only disks entirely (which makes sense in a way):

libvirt.libvirtError: internal error: unable to execute QEMU command 'block-commit': Block node is read-only  <-- when did live migrating a read-only disk work?

If it didn't use to work, then we should just block this; if it did, then it's a regression in libvirt/qemu.
(In reply to Benny Zlotnik from comment #21)
> I'm trying to understand when did this flow work
> It looks like libvirt blocks snapshot creation for read-only disks entirely
> (which makes sense in a way)
> libvirt.libvirtError: internal error: unable to execute QEMU command
> 'block-commit': Block node is read-only <-- When did live migrating a
> read-only disk work?
> If it didn't used to work, then we should just block this, if it did then
> it's a regression in libvirt/qemu

We have been testing this flow since 2014; it did use to work.
(In reply to Evelina Shames from comment #22)
> (In reply to Benny Zlotnik from comment #21)
> > I'm trying to understand when did this flow work
> > It looks like libvirt blocks snapshot creation for read-only disks entirely
> > (which makes sense in a way)
> > libvirt.libvirtError: internal error: unable to execute QEMU command
> > 'block-commit': Block node is read-only <-- When did live migrating a
> > read-only disk work?
> > If it didn't used to work, then we should just block this, if it did then
> > it's a regression in libvirt/qemu
>
> We test this flow since 2014, it did used to work.

I see, so to fix this we need fixes in both qemu and libvirt (which added a fairly recent validation to block external snapshot creation for r/o disks).
I guess it would be simpler to just fall back to a regular move when dealing with r/o disks, so I'll add this workaround.
Is there a bug for live snapshot creation for r/o disks?
Moving to 4.4.0, due to Regression keyword
It's more complicated than I initially thought.

Hi Peter (not sure who is the right person to ask),

Is there a way to restore the old behaviour? Blocking the snapshot breaks live storage migration of read-only disks for us; I assume this was introduced here [1].

I tried to find a way around this without the snapshot, but I get "Block node is read-only" for blockCopy as well, which is coming from qemu.

[1] https://www.redhat.com/archives/libvir-list/2018-October/msg01322.html
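For context on the blockCopy attempt mentioned above, a hedged libvirt-python sketch follows; the domain name, device and destination path are placeholders, and with a read-only disk this call is also rejected by qemu with "Block node is read-only".

# Hedged sketch of a snapshot-less copy attempt via blockCopy;
# domain name, device ('sdb') and destination path are placeholders.
import libvirt

conn = libvirt.open('qemu:///system')
dom = conn.lookupByName('vm_TestCase4925_2008172480')  # example domain

# Destination volume for the mirror; in oVirt this would be a volume that
# VDSM prepared on the target storage domain.
dest_xml = """
<disk type='file'>
  <source file='/rhev/data-center/example/dst_volume.qcow2'/>
  <driver type='qcow2'/>
</disk>
"""

try:
    # Shallow copy into a pre-created destination; with a read-only disk
    # qemu refuses this as well ("Block node is read-only").
    dom.blockCopy('sdb', dest_xml, None,
                  libvirt.VIR_DOMAIN_BLOCK_COPY_SHALLOW |
                  libvirt.VIR_DOMAIN_BLOCK_COPY_REUSE_EXT)
except libvirt.libvirtError as e:
    print('blockCopy failed:', e)

conn.close()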
So is the read-only snapshot used only for the 'live storage migration' case?

I think it would be straightforward with the new blockdev infrastructure to allow a direct copy without the need to do a snapshot. The same is theoretically also possible for snapshots, but if that's not needed we might not want to go that way.

After we clarify what's really needed we can then open a libvirt bug/RFE for this.
*** Bug 1738520 has been marked as a duplicate of this bug. ***
(In reply to Benny Zlotnik from comment #23)
> (In reply to Evelina Shames from comment #22)
> > (In reply to Benny Zlotnik from comment #21)
> > > I'm trying to understand when did this flow work
> > > It looks like libvirt blocks snapshot creation for read-only disks entirely
> > > (which makes sense in a way)
> > > libvirt.libvirtError: internal error: unable to execute QEMU command
> > > 'block-commit': Block node is read-only <-- When did live migrating a
> > > read-only disk work?
> > > If it didn't used to work, then we should just block this, if it did then
> > > it's a regression in libvirt/qemu
> >
> > We test this flow since 2014, it did used to work.
>
> I see, so to fix this we need fixes in both qemu and libvirt (which added a
> fairly recent validation to block external snapshot creation for r/o disks).
> I guess it would be simpler to just fallback to regular move when dealing
> with r/o disks, so I'll add this workaround.
> Is there a bug for live snapshot creation for r/o disks?

There wasn't, I opened one: Bug 1821627
Verified on rhv-4.4.1-5
This bugzilla is included in oVirt 4.4.1 release, published on July 8th 2020. Since the problem described in this bug report should be resolved in oVirt 4.4.1 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.