Bug 1468883 - Cold storage migration failure doesn't raise DiskException as expected
Status: CLOSED NOTABUG
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 4.2.0
Hardware: Unspecified Unspecified
Priority: unspecified Severity: medium
Target Milestone: ovirt-4.2.0
Target Release: ---
Assigned To: Daniel Erez
QA Contact: Raz Tamir
Keywords: Automation, Regression
Depends On:
Blocks:
Reported: 2017-07-09 07:37 EDT by Lilach Zitnitski
Modified: 2017-07-10 05:47 EDT (History)

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-07-10 05:47:22 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
rule-engine: ovirt‑4.2+
rule-engine: blocker+


Attachments
logs (188.38 KB, application/zip)
2017-07-09 07:37 EDT, Lilach Zitnitski

Description Lilach Zitnitski 2017-07-09 07:37:11 EDT
Description of problem:
On 4.1.3, when the vdsmd service is restarted on the SPM host during a cold storage migration, the migration fails and raises this error:
2017-07-09 11:46:31,395 - MainThread - api_utils - ERROR - Failed to syncAction element NOT as expected:
        Status: 409
        Reason: Conflict
        Detail: [Cannot move Virtual Disk. The relevant Storage Domain's status is Unknown.]
On 4.2, this error is no longer shown in the same scenario, and the REST log gives no helpful information about why the operation failed.

2017-07-09 11:59:18,886 - MainThread - disks - INFO - Using Correlation-Id: disks_syncAction_3842a0cc-3429-4a82
2017-07-09 11:59:19,039 - MainThread - core_api - DEBUG - Request POST response time: 0.017
2017-07-09 11:59:19,039 - MainThread - disks - DEBUG - Cleaning Correlation-Id: disks_syncAction_3842a0cc-3429-4a82
2017-07-09 11:59:19,040 - MainThread - disks - DEBUG - Response code is valid: [200, 201]
2017-07-09 11:59:19,042 - MainThread - disks - DEBUG - Action status is valid: ['complete']
2017-07-09 11:59:19,093 - MainThread - art.logging - ERROR - Status: failed

Version-Release number of selected component (if applicable):
ovirt-engine-4.2.0-0.0.master.20170707124946.gitf15a6d9.el7.centos.noarch
vdsm-4.20.1-157.git79aca9e.el7.centos.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create a VM with a few disks
2. Start migrating the disks to another storage domain
3. Restart the vdsmd service on the SPM host

Actual results:
Migration fails but no exception is raised 

Expected results:
Migration should fail with DiskException 

Additional info:

4.1.3 engine log

2017-07-09 11:46:31,103+03 INFO  [org.ovirt.engine.core.bll.storage.disk.MoveDisksCommand] (default task-10) [disks_syncAction_3a6224da-b3c5-46fd] Running command: MoveDisksCommand internal: false. Entities affected :  ID: a9c880ae-5fc3-4d0f-a124-8f477945c890 Type: DiskAction group CONFIGURE_DISK_STORAGE with role type USER
2017-07-09 11:46:31,290+03 INFO  [org.ovirt.engine.core.bll.storage.disk.MoveOrCopyDiskCommand] (default task-10) [disks_syncAction_3a6224da-b3c5-46fd] Lock Acquired to object 'EngineLock:{exclusiveLocks='[a9c880ae-5fc3-4d0f-a124-8f477945c890=DISK]', sharedLocks='[80252462-ea54-4aaa-83ab-5d0997186088=VM]'}'
2017-07-09 11:46:31,354+03 WARN  [org.ovirt.engine.core.bll.storage.disk.MoveOrCopyDiskCommand] (default task-10) [disks_syncAction_3a6224da-b3c5-46fd] Validation of action 'MoveOrCopyDisk' failed for user admin@internal-authz. Reasons: VAR__ACTION__MOVE,VAR__TYPE__DISK,ACTION_TYPE_FAILED_STORAGE_DOMAIN_STATUS_ILLEGAL2,$status Unknown
2017-07-09 11:46:31,360+03 INFO  [org.ovirt.engine.core.bll.storage.disk.MoveOrCopyDiskCommand] (default task-10) [disks_syncAction_3a6224da-b3c5-46fd] Lock freed to object 'EngineLock:{exclusiveLocks='[a9c880ae-5fc3-4d0f-a124-8f477945c890=DISK]', sharedLocks='[80252462-ea54-4aaa-83ab-5d0997186088=VM]'}'
2017-07-09 11:46:31,374+03 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-10) [] Operation Failed: [Cannot move Virtual Disk. The relevant Storage Domain's status is Unknown.]

4.2 engine.log

2017-07-09 12:27:29,572+03 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-25) [disks_syncAction_207d937a-6eea-43a7] EVENT_ID: USER_MOVED_DISK(2,008), User admin@internal-authz moving disk disk_virtiocow_0912252731 to domain nfs_1.
2017-07-09 12:27:36,055+03 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) [] Unable to process messages Broken pipe
2017-07-09 12:27:36,057+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand] (DefaultQuartzScheduler8) [ad72377] Command 'GetAllVmStatsVDSCommand(HostName = host_mixed_3, VdsIdVDSCommandParametersBase:{hostId='f4085187-a3a2-4b3b-b46f-02d0b9e553fe'})' execution failed: VDSGenericException: VDSNetworkException: Broken pipe
Comment 1 Lilach Zitnitski 2017-07-09 07:37 EDT
Created attachment 1295612 [details]
logs
Comment 2 Red Hat Bugzilla Rules Engine 2017-07-09 12:43:19 EDT
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.
Comment 3 Daniel Erez 2017-07-10 05:47:22 EDT
The described scenario doesn't sound like a regression but merely a change in behavior. Moving a disk is an asynchronous operation, so when using the REST API its status should be polled. The mentioned exception [1] is part of the validation performed when the operation is initiated; i.e., when moving a disk, we validate the domain status *before* the operation begins, not during it.

[1] "Cannot move Virtual Disk. The relevant Storage Domain's status is Unknown."
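
The polling approach described in comment 3 could look roughly like the following. This is a minimal sketch, not code from the bug report or from the test framework: `fetch_disk`, `wait_for_disk_move`, the dict keys `status` and `storage_domain_id`, and this `DiskException` class are all hypothetical names, and the sketch assumes the disk reports a `locked` status while the move is in progress and `ok` once it has finished.

```python
import time


class DiskException(Exception):
    """Hypothetical exception type raised when a disk move did not succeed."""


def wait_for_disk_move(fetch_disk, target_sd_id, timeout=600.0, interval=5.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll a disk until it leaves the 'locked' state, then verify the move.

    fetch_disk is a caller-supplied callable (hypothetical) that queries the
    engine and returns a dict with 'status' and 'storage_domain_id' keys.
    """
    deadline = clock() + timeout
    while True:
        disk = fetch_disk()
        # Assumption: the disk stays 'locked' for as long as the move runs.
        if disk["status"] != "locked":
            break
        if clock() >= deadline:
            raise DiskException("timed out waiting for the disk move to finish")
        sleep(interval)

    if disk["status"] != "ok":
        raise DiskException("disk ended in status %r" % disk["status"])
    # A failed move can leave the disk usable but still on the source domain,
    # so the final location has to be checked explicitly.
    if disk["storage_domain_id"] != target_sd_id:
        raise DiskException("disk was not moved to the target storage domain")
    return disk
```

With a helper like this, a failure that happens after the move request was accepted surfaces as an exception in the caller, instead of depending on the up-front validation error that 4.1.3 happened to return.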
