Description of problem:
On 4.1.3, when restarting vdsmd service on the SPM host during cold storage migration, the migration fails and raises error -

2017-07-09 11:46:31,395 - MainThread - api_utils - ERROR - Failed to syncAction element NOT as expected: Status: 409 Reason: Conflict Detail: [Cannot move Virtual Disk. The relevant Storage Domain's status is Unknown.]

On 4.2 this error is no longer shown in the same scenario and in the REST log no helpful information is shown about the reason the operation failed.

2017-07-09 11:59:18,886 - MainThread - disks - INFO - Using Correlation-Id: disks_syncAction_3842a0cc-3429-4a82
2017-07-09 11:59:19,039 - MainThread - core_api - DEBUG - Request POST response time: 0.017
2017-07-09 11:59:19,039 - MainThread - disks - DEBUG - Cleaning Correlation-Id: disks_syncAction_3842a0cc-3429-4a82
2017-07-09 11:59:19,040 - MainThread - disks - DEBUG - Response code is valid: [200, 201]
2017-07-09 11:59:19,042 - MainThread - disks - DEBUG - Action status is valid: ['complete']
2017-07-09 11:59:19,093 - MainThread - art.logging - ERROR - Status: failed

Version-Release number of selected component (if applicable):
ovirt-engine-4.2.0-0.0.master.20170707124946.gitf15a6d9.el7.centos.noarch
vdsm-4.20.1-157.git79aca9e.el7.centos.x86_64

How reproducible:
100%

Steps to Reproduce:
1. create vm with few disks
2. start migrating the disks to another storage domain
3. restart vdsmd service on spm host

Actual results:
Migration fails but no exception is raised

Expected results:
Migration should fail with DiskException

Additional info:
4.1.3 engine log

2017-07-09 11:46:31,103+03 INFO [org.ovirt.engine.core.bll.storage.disk.MoveDisksCommand] (default task-10) [disks_syncAction_3a6224da-b3c5-46fd] Running command: MoveDisksCommand internal: false. Entities affected : ID: a9c880ae-5fc3-4d0f-a124-8f477945c890 Type: DiskAction group CONFIGURE_DISK_STORAGE with role type USER
2017-07-09 11:46:31,290+03 INFO [org.ovirt.engine.core.bll.storage.disk.MoveOrCopyDiskCommand] (default task-10) [disks_syncAction_3a6224da-b3c5-46fd] Lock Acquired to object 'EngineLock:{exclusiveLocks='[a9c880ae-5fc3-4d0f-a124-8f477945c890=DISK]', sharedLocks='[80252462-ea54-4aaa-83ab-5d0997186088=VM]'}'
2017-07-09 11:46:31,354+03 WARN [org.ovirt.engine.core.bll.storage.disk.MoveOrCopyDiskCommand] (default task-10) [disks_syncAction_3a6224da-b3c5-46fd] Validation of action 'MoveOrCopyDisk' failed for user admin@internal-authz. Reasons: VAR__ACTION__MOVE,VAR__TYPE__DISK,ACTION_TYPE_FAILED_STORAGE_DOMAIN_STATUS_ILLEGAL2,$status Unknown
2017-07-09 11:46:31,360+03 INFO [org.ovirt.engine.core.bll.storage.disk.MoveOrCopyDiskCommand] (default task-10) [disks_syncAction_3a6224da-b3c5-46fd] Lock freed to object 'EngineLock:{exclusiveLocks='[a9c880ae-5fc3-4d0f-a124-8f477945c890=DISK]', sharedLocks='[80252462-ea54-4aaa-83ab-5d0997186088=VM]'}'
2017-07-09 11:46:31,374+03 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-10) [] Operation Failed: [Cannot move Virtual Disk. The relevant Storage Domain's status is Unknown.]

4.2 engine.log

2017-07-09 12:27:29,572+03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-25) [disks_syncAction_207d937a-6eea-43a7] EVENT_ID: USER_MOVED_DISK(2,008), User admin@internal-authz moving disk disk_virtiocow_0912252 731 to domain nfs_1.
2017-07-09 12:27:36,055+03 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) [] Unable to process messages Broken pipe
2017-07-09 12:27:36,057+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand] (DefaultQuartzScheduler8) [ad72377] Command 'GetAllVmStatsVDSCommand(HostName = host_mixed_3, VdsIdVDSCommandParametersBase:{hostId='f4085187-a3a2-4b3b-b46f-02d0b9e553fe'} )' execution failed: VDSGenericException: VDSNetworkException: Broken pipe
Created attachment 1295612
logs
This bug report has Keywords: Regression or TestBlocker. Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.
The described scenario doesn't sound like a regression, merely a change in behavior. Moving a disk is an asynchronous operation, so when it is started through the REST API its status has to be polled. The mentioned error [1] comes from the validation that runs when the operation is initiated, i.e. when moving a disk we validate the domain status *before* the operation begins, not during it. In the 4.2 run the request passed that validation (the REST log shows a valid response code and action status), so the failure happened later, while the move was already in progress.

[1] "Cannot move Virtual Disk. The relevant Storage Domain's status is Unknown."
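A minimal sketch of what polling could look like on the test side, so that a move which fails after being accepted surfaces as an exception. Everything named here is hypothetical and not taken from the engine or the test framework: `wait_for_disk_move`, `DiskMoveError`, and the `get_disk_state` callable, which is assumed to return the disk's current status string and the IDs of the storage domains it resides on. The sketch also assumes the disk reports "locked" while the move is running and "ok" once it is usable again.

```python
import time


class DiskMoveError(Exception):
    """Hypothetical error: the move ended but the disk is not on the target domain."""


def wait_for_disk_move(get_disk_state, target_domain_id,
                       timeout=600.0, interval=5.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll until the disk is no longer locked, then verify where it ended up.

    get_disk_state is a caller-supplied callable (an assumption of this
    sketch) returning a (status, storage_domain_ids) tuple.
    """
    deadline = clock() + timeout
    while True:
        status, domain_ids = get_disk_state()
        if status != "locked":
            # The operation has ended; success means the disk is usable
            # and now lives on the requested storage domain.
            if status == "ok" and target_domain_id in domain_ids:
                return
            raise DiskMoveError(
                "disk move ended with status %r on domains %r, expected %r"
                % (status, list(domain_ids), target_domain_id)
            )
        if clock() >= deadline:
            raise TimeoutError("disk still locked after %s seconds" % timeout)
        sleep(interval)
```

With a helper of this shape, the scenario from the description (vdsmd restarted mid-move, disk unlocked but still on the source domain) would raise `DiskMoveError` instead of ending with only "Status: failed" in the log. The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting.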