Bug 875651

Summary: [engine] Storage will remain in locked forever in case remove storage pool fail on cannot find master domain
Product: Red Hat Enterprise Virtualization Manager Reporter: Gadi Ickowicz <gickowic>
Component: ovirt-engine Assignee: Liron Aravot <laravot>
Status: CLOSED CURRENTRELEASE QA Contact: Gadi Ickowicz <gickowic>
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.1.0 CC: abaron, amureini, dyasny, hateya, iheim, laravot, lpeer, nlevinki, Rhev-m-bugs, sgrinber, yeylon, ykaul
Target Milestone: --- Keywords: ZStream
Target Release: 3.2.0   
Hardware: All   
OS: Linux   
Whiteboard: storage
Fixed In Version: sf3 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 890203 (view as bug list) Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 890203, 915537    
Attachments:
Description Flags
engine logs none

Description Gadi Ickowicz 2012-11-12 09:46:22 UTC
Created attachment 643348
engine logs

Description of problem:
Attempting to remove a storage domain from the RHEV-M GUI when the storage domain no longer exists leaves the storage domain in the "locked" state permanently:

79b6c6-c0c0-40b4-9fb7-476ec12bb537 and after that failed to stop spm because of org.ovirt.engine.core.common.errors.VdcBLLException: VdcBLLException: org.ovirt.engine.core.vdsbroker.irsbroker.IRSNoMasterDomainException: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Cannot find master domain: 'spUUID=ed79b6c6-c0c0-40b4-9fb7-476ec12bb537, msdUUID=00000000-0000-0000-0000-000000000000': org.ovirt.engine.core.common.errors.VdcBLLException: VdcBLLException: org.ovirt.engine.core.vdsbroker.irsbroker.IRSNoMasterDomainException: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Cannot find master domain: 'spUUID=ed79b6c6-c0c0-40b4-9fb7-476ec12bb537, msdUUID=00000000-0000-0000-0000-000000000000'
        at org.ovirt.engine.core.bll.VdsHandler.handleVdsResult(VdsHandler.java:212) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.VDSBrokerFrontendImpl.RunVdsCommand(VDSBrokerFrontendImpl.java:33) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.storage.RemoveStoragePoolCommand$9.runInTransaction(RemoveStoragePoolCommand.java:239) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.storage.RemoveStoragePoolCommand$9.runInTransaction(RemoveStoragePoolCommand.java:236) [engine-bll.jar:]
        at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInNewTransaction(TransactionSupport.java:204) [engine-utils.jar:]
        at org.ovirt.engine.core.bll.storage.RemoveStoragePoolCommand.handleDestroyStoragePoolCommand(RemoveStoragePoolCommand.java:236) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.storage.RemoveStoragePoolCommand.access$900(RemoveStoragePoolCommand.java:42) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.storage.RemoveStoragePoolCommand$7.runInTransaction(RemoveStoragePoolCommand.java:184) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.storage.RemoveStoragePoolCommand$7.runInTransaction(RemoveStoragePoolCommand.java:180) [engine-bll.jar:]
        at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInNewTransaction(TransactionSupport.java:204) [engine-utils.jar:]
        at org.ovirt.engine.core.bll.storage.RemoveStoragePoolCommand.regularRemoveStorageDomains(RemoveStoragePoolCommand.java:180) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.storage.RemoveStoragePoolCommand.executeCommand(RemoveStoragePoolCommand.java:70) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:825) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:916) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:1300) [engine-bll.jar:]
        at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInSuppressed(TransactionSupport.java:168) [engine-utils.jar:]
        at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:107) [engine-utils.jar:]
        at org.ovirt.engine.core.bll.CommandBase.execute(CommandBase.java:931) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:285) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.MultipleActionsRunner.executeValidatedCommands(MultipleActionsRunner.java:182) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.MultipleActionsRunner.RunCommands(MultipleActionsRunner.java:162) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.MultipleActionsRunner$1.run(MultipleActionsRunner.java:84) [engine-bll.jar:]
        at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:64) [engine-utils.jar:]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [rt.jar:1.7.0_09-icedtea]
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) [rt.jar:1.7.0_09-icedtea]
        at java.util.concurrent.FutureTask.run(FutureTask.java:166) [rt.jar:1.7.0_09-icedtea]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) [rt.jar:1.7.0_09-icedtea]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) [rt.jar:1.7.0_09-icedtea]
        at java.lang.Thread.run(Thread.java:722) [rt.jar:1.7.0_09-icedtea]

2012-11-12 10:36:31,820 INFO  [org.ovirt.engine.core.utils.transaction.TransactionSupport] (pool-4-thread-47) [4ce52b9d] transaction rolled back


Version-Release number of selected component (if applicable):
rhevm-3.1.0-22.el6ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Create a storage domain and attach it as master to a datacenter
2. Manually delete the storage domain data
3. Try to remove the datacenter from the GUI (*not* force remove)
  
Actual results:
Storage domain enters "locked" state because vdsm fails due to "cannot find master domain"

Expected results:
If the master domain cannot be found (during connectStoragePool, during remove storage pool flow), the datacenter should be removed from the engine
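
The expected behaviour above can be sketched roughly as follows. This is a minimal illustration only, not the ovirt-engine code: `HostBroker`, `NoMasterDomainException`, `PoolRemovalSketch` and `removePool` are hypothetical names, and an in-memory set stands in for the engine database.

```java
import java.util.Set;

// All names below are hypothetical stand-ins, not the ovirt-engine API.
class NoMasterDomainException extends RuntimeException {
    NoMasterDomainException(String message) {
        super(message);
    }
}

interface HostBroker {
    // Stand-in for the host-side destroy call that can fail with
    // "Cannot find master domain".
    void destroyStoragePool(String poolId);
}

class PoolRemovalSketch {
    enum Outcome { DESTROYED_ON_STORAGE, ENGINE_RECORD_ONLY }

    // Removes the pool record from the engine even when the master domain
    // is gone; any other failure still propagates to the caller.
    static Outcome removePool(String poolId, HostBroker broker, Set<String> enginePools) {
        Outcome outcome = Outcome.DESTROYED_ON_STORAGE;
        try {
            broker.destroyStoragePool(poolId);
        } catch (NoMasterDomainException e) {
            // Nothing is left on storage to destroy, so only the engine record remains.
            outcome = Outcome.ENGINE_RECORD_ONLY;
        }
        enginePools.remove(poolId);
        return outcome;
    }
}
```

Only the "cannot find master domain" failure is treated as safe to ignore here; whether other host errors should also allow removal is a separate question.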

Additional info:

Comment 2 Gadi Ickowicz 2012-11-12 10:26:02 UTC
*** Bug 851154 has been marked as a duplicate of this bug. ***

Comment 3 Ayal Baron 2012-11-21 10:25:36 UTC
Are your hosts in maintenance?
Can you move all hosts to maintenance and run 'force remove'?

Comment 4 Ayal Baron 2012-11-25 09:30:52 UTC
Gadi?

Comment 5 Gadi Ickowicz 2012-11-25 15:17:36 UTC
(In reply to comment #3)
> Are your hosts in maintenance?
> Can you move all hosts to maintenance and run 'force remove'?

After entering the locked state, even after moving the hosts to maintenance and using 'force remove', I receive the same error message from the GUI: "Error: cannot remove Data Center which contains active/locked Storage Domains. Please deactivate all domains and wait for tasks to finish before removing the Data Center."

Comment 6 Liron Aravot 2012-12-11 13:22:05 UTC
solved by 
http://gerrit.ovirt.org/#/c/9843/

Comment 7 Allon Mureinik 2012-12-11 15:22:11 UTC
Devel ack for fixing serialization.

Liron - let's also see if the flow can be streamlined and simplified, and discuss.

Comment 8 Allon Mureinik 2012-12-12 10:30:49 UTC
Merged upstream

Comment 10 Gadi Ickowicz 2013-03-14 08:59:47 UTC
Verified on SF9
Storage domain moves back to inactive. Can then force remove datacenter if needed
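
The verified behaviour (the domain leaves "locked" after a failed removal, so force remove becomes possible) could be modelled along these lines. This is a hedged sketch, not the merged patch: `StatusRestoreSketch`, `runWhileLocked` and `canRemoveDataCenter` are hypothetical names, a plain map stands in for the engine's status table, and the domain is assumed to be present in the map with an "inactive" status before the attempt.

```java
import java.util.Map;

// Hypothetical sketch; the names are not taken from the ovirt-engine code base.
class StatusRestoreSketch {
    enum DomainStatus { ACTIVE, INACTIVE, LOCKED }

    // Marks the domain LOCKED while `operation` runs. On failure the previous
    // status is put back, so the domain never stays LOCKED.
    // Assumes the domain is already present in `statuses`.
    static boolean runWhileLocked(Map<String, DomainStatus> statuses, String domainId, Runnable operation) {
        DomainStatus previous = statuses.get(domainId);
        statuses.put(domainId, DomainStatus.LOCKED);
        try {
            operation.run();
            statuses.remove(domainId); // removal succeeded, the record is gone
            return true;
        } catch (RuntimeException e) {
            statuses.put(domainId, previous); // compensate: undo the lock
            return false;
        }
    }

    // Mirrors the GUI check quoted in comment 5: no active or locked domains allowed.
    static boolean canRemoveDataCenter(Map<String, DomainStatus> statuses) {
        for (DomainStatus status : statuses.values()) {
            if (status == DomainStatus.ACTIVE || status == DomainStatus.LOCKED) {
                return false;
            }
        }
        return true;
    }
}
```

With the status restored to "inactive", the check that rejected force remove in comment 5 would pass in this model.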

Comment 11 Itamar Heim 2013-06-11 09:45:46 UTC
3.2 has been released

Comment 12 Itamar Heim 2013-06-11 09:45:53 UTC
3.2 has been released

Comment 13 Itamar Heim 2013-06-11 09:56:08 UTC
3.2 has been released