Bug 1404727 - Storage domain remain locked after engine restart while attachment is in progress due to NPE in the compensation infrastructure
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.1.0-beta
Target Release: 4.1.0.2
Assignee: Benny Zlotnik
QA Contact: Lilach Zitnitski
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-12-14 13:56 UTC by Lilach Zitnitski
Modified: 2017-02-01 14:50 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-02-01 14:50:34 UTC
oVirt Team: Storage
Embargoed:
rule-engine: ovirt-4.1+
rule-engine: blocker+
rule-engine: planning_ack+
amureini: devel_ack+
ratamir: testing_ack+


Attachments
logs zip (352.99 KB, application/zip)
2016-12-14 14:15 UTC, Lilach Zitnitski


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 68570 0 master MERGED core: fix NPE when compensating after engine shutdown 2016-12-18 07:56:55 UTC

Description Lilach Zitnitski 2016-12-14 13:56:34 UTC
Description of problem:
When attaching a storage domain to a DC and restarting the engine service while the attachment is in progress, the storage domain appears Locked in the UI and no actions can be performed on it except destroy.

Version-Release number of selected component (if applicable):
ovirt-engine-4.1.0-0.2.master.20161203231307.gitd7d920b.el7.centos.noarch
vdsm-4.18.999-1138.git6c51957.el7.centos.x86_64

How reproducible:
Tried with 2 SDs, reproduced on both. 

Steps to Reproduce:
1. attach storage domain to a dc
2. while the process is still running, restart the ovirt-engine service
3. wait for the UI to come back and check the storage domains' status 

Actual results:
Storage domain appears Locked and no actions can be performed on it (except destroy)

Expected results:
Storage domain should be unattached and the user should be able to attach it to the dc
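
The expected behavior can be illustrated with a minimal sketch of status compensation, under a simplified model: before a command locks an entity it persists a snapshot of the previous status, and on startup any leftover snapshot is replayed to revert the entity. The names `SnapshotStore`, `lock_with_compensation`, `commit` and `compensate_on_startup` are hypothetical and are not taken from ovirt-engine (which is written in Java); this only illustrates the general technique.

```python
from dataclasses import dataclass, field


@dataclass
class SnapshotStore:
    """Stands in for a persistent table that survives a restart (hypothetical)."""
    snapshots: dict = field(default_factory=dict)  # entity id -> previous status


def lock_with_compensation(statuses, store, entity_id):
    # Persist the previous status before changing it, so it can be restored later.
    store.snapshots[entity_id] = statuses[entity_id]
    statuses[entity_id] = "Locked"


def commit(store, entity_id):
    # The command finished normally; the snapshot is no longer needed.
    store.snapshots.pop(entity_id, None)


def compensate_on_startup(statuses, store):
    # Any snapshot still present belongs to a command that never finished.
    for entity_id, previous in list(store.snapshots.items()):
        statuses[entity_id] = previous
        del store.snapshots[entity_id]


statuses = {"unattached_sd2": "Unattached"}
store = SnapshotStore()
lock_with_compensation(statuses, store, "unattached_sd2")
# The engine is restarted here, before commit() runs.
compensate_on_startup(statuses, store)
print(statuses["unattached_sd2"])
```

If the startup step fails part-way, the snapshot is never applied and the entity stays Locked, which matches the actual result reported above.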

Additional info:

vdsm.log

2016-12-14 14:23:36,308 INFO  (jsonrpc/1) [dispatcher] Run and protect: connectStoragePool(spUUID=u'cb93d507-6f32-4eda-b916-c99ff6a7afe1', hostID=1, msdUUID=u'bd0c9dd0-ca22-4ce5-bb47-3c903409baec', masterVersion=12, domainsMap={u'bd0c9dd0-ca22-4ce5-bb47-3c903409baec': u'active', u'8c14efe4-c881-47e7-a5b8-0fa8d3179e07': u'active', u'67944510-99f4-4746-88a0-ba5c6aeaf21d': u'active', u'ed6f577a-2d9c-4c31-ac08-720edf376940': u'active'}, options=None) (logUtils:49)

engine.log

2016-12-14 14:21:48,529+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetStorageDomainInfoVDSCommand] (org.ovirt.thread.pool-6-thread-6) [ed406bd5-ba7c-401e-a444-4b6be6b17010] FINISH, HSMGetStorageDomainInfoVDSCommand, return: <StorageDomainStatic:{name='unattached_sd2', id='67944510-99f4-4746-88a0-ba5c6aeaf21d'}, null>, log id: 5dc0c956
2016-12-14 14:21:48,533+02 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.AttachStorageDomainVDSCommand] (org.ovirt.thread.pool-6-thread-6) [ed406bd5-ba7c-401e-a444-4b6be6b17010] START, AttachStorageDomainVDSCommand( AttachStorageDomainVDSCommandParameters:{runAsync='true', storagePoolId='cb93d507-6f32-4eda-b916-c99ff6a7afe1', ignoreFailoverLimit='false', storageDomainId='67944510-99f4-4746-88a0-ba5c6aeaf21d'}), log id: 5a796712

Comment 1 Lilach Zitnitski 2016-12-14 14:15:17 UTC
Created attachment 1231753
logs zip

engine.log
vdsm.log

Comment 2 Allon Mureinik 2016-12-18 09:09:21 UTC
Looking through the patch attached to the BZ is a bit unsettling.

While it should indeed solve the bug described here, the issue is deeper than just this flow. The bug occurs in the compensation infrastructure, and would, in theory, affect all the flows that use it if the engine is restarted in the middle of them.

Raz - at the very least I think we should wait with engine-restart tests until QA has a build with this fix. Do you want to track this here, or open separate BZ(s) for it?
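
To illustrate why a fault in shared compensation code would hit every flow, here is a small sketch of the failure class. This thread does not say which reference was null; the sketch assumes, purely for illustration, that the compensation step relied on in-memory state that no longer exists after a restart. `CommandContext`, `compensate_unguarded` and `compensate_guarded` are hypothetical names, and `AttributeError` on `None` is used as the Python analogue of a Java NPE.

```python
class CommandContext:
    """In-memory state of a running command; lost when the process restarts."""

    def __init__(self, correlation_id):
        self.correlation_id = correlation_id


def compensate_unguarded(snapshot, context):
    # Dereferences the context unconditionally, so it fails when context is None.
    return f"[{context.correlation_id}] restore {snapshot['id']} to {snapshot['status']}"


def compensate_guarded(snapshot, context=None):
    # Relies only on persisted data; the in-memory context is optional.
    tag = context.correlation_id if context is not None else "startup"
    return f"[{tag}] restore {snapshot['id']} to {snapshot['status']}"


snapshot = {"id": "67944510-99f4-4746-88a0-ba5c6aeaf21d", "status": "Unattached"}

try:
    compensate_unguarded(snapshot, None)  # the context was lost with the restart
    failed = False
except AttributeError:
    failed = True

print(failed)
print(compensate_guarded(snapshot))
```

Because the unguarded helper is shared, any command that registers a snapshot and is interrupted by a restart would fail the same way, regardless of which flow created it.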

Comment 3 Raz Tamir 2016-12-19 09:06:58 UTC
Allon,
We can track it here

Comment 4 Lilach Zitnitski 2017-01-01 15:11:57 UTC
--------------------------------------
Tested with the following code:
----------------------------------------
rhevm-4.1.0-0.3.beta2.el7.noarch
vdsm-4.19.1-1.el7ev.x86_64


Tested with the following scenario:

Steps to Reproduce:
1. attach storage domain to a dc
2. while the process is still running, restart the ovirt-engine service
3. wait for the UI to come back and check the storage domains' status 

Actual results:
After ovirt-engine restart, the attached storage domain appears unattached and can be attached again. 

Moving to VERIFIED!

