Bug 875651 - [engine] Storage will remain in locked forever in case remove storage pool fail on cannot find master domain
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.1.0
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.2.0
Assigned To: Liron Aravot
QA Contact: Gadi Ickowicz
Whiteboard: storage
Keywords: ZStream
Duplicates: 851154
Depends On:
Blocks: 890203 915537
Reported: 2012-11-12 04:46 EST by Gadi Ickowicz
Modified: 2016-02-10 12:19 EST

See Also:
Fixed In Version: sf3
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Clones: 890203
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage


Attachments
engine logs (83.85 KB, application/x-gzip)
2012-11-12 04:46 EST, Gadi Ickowicz

Description Gadi Ickowicz 2012-11-12 04:46:22 EST
Created attachment 643348
engine logs

Description of problem:
Attempting to remove a storage domain from the RHEV-M GUI when the storage domain no longer exists causes the storage domain to move to the "locked" state and stay there forever:

79b6c6-c0c0-40b4-9fb7-476ec12bb537 and after that failed to stop spm because of org.ovirt.engine.core.common.errors.VdcBLLException: VdcBLLException: org.ovirt.engine.core.vdsbroker.irsbroker.IRSNoMasterDomainException: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Cannot find master domain: 'spUUID=ed79b6c6-c0c0-40b4-9fb7-476ec12bb537, msdUUID=00000000-0000-0000-0000-000000000000': org.ovirt.engine.core.common.errors.VdcBLLException: VdcBLLException: org.ovirt.engine.core.vdsbroker.irsbroker.IRSNoMasterDomainException: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Cannot find master domain: 'spUUID=ed79b6c6-c0c0-40b4-9fb7-476ec12bb537, msdUUID=00000000-0000-0000-0000-000000000000'
        at org.ovirt.engine.core.bll.VdsHandler.handleVdsResult(VdsHandler.java:212) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.VDSBrokerFrontendImpl.RunVdsCommand(VDSBrokerFrontendImpl.java:33) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.storage.RemoveStoragePoolCommand$9.runInTransaction(RemoveStoragePoolCommand.java:239) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.storage.RemoveStoragePoolCommand$9.runInTransaction(RemoveStoragePoolCommand.java:236) [engine-bll.jar:]
        at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInNewTransaction(TransactionSupport.java:204) [engine-utils.jar:]
        at org.ovirt.engine.core.bll.storage.RemoveStoragePoolCommand.handleDestroyStoragePoolCommand(RemoveStoragePoolCommand.java:236) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.storage.RemoveStoragePoolCommand.access$900(RemoveStoragePoolCommand.java:42) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.storage.RemoveStoragePoolCommand$7.runInTransaction(RemoveStoragePoolCommand.java:184) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.storage.RemoveStoragePoolCommand$7.runInTransaction(RemoveStoragePoolCommand.java:180) [engine-bll.jar:]
        at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInNewTransaction(TransactionSupport.java:204) [engine-utils.jar:]
        at org.ovirt.engine.core.bll.storage.RemoveStoragePoolCommand.regularRemoveStorageDomains(RemoveStoragePoolCommand.java:180) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.storage.RemoveStoragePoolCommand.executeCommand(RemoveStoragePoolCommand.java:70) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:825) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:916) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:1300) [engine-bll.jar:]
        at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInSuppressed(TransactionSupport.java:168) [engine-utils.jar:]
        at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:107) [engine-utils.jar:]
        at org.ovirt.engine.core.bll.CommandBase.execute(CommandBase.java:931) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:285) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.MultipleActionsRunner.executeValidatedCommands(MultipleActionsRunner.java:182) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.MultipleActionsRunner.RunCommands(MultipleActionsRunner.java:162) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.MultipleActionsRunner$1.run(MultipleActionsRunner.java:84) [engine-bll.jar:]
        at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:64) [engine-utils.jar:]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [rt.jar:1.7.0_09-icedtea]
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) [rt.jar:1.7.0_09-icedtea]
        at java.util.concurrent.FutureTask.run(FutureTask.java:166) [rt.jar:1.7.0_09-icedtea]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) [rt.jar:1.7.0_09-icedtea]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) [rt.jar:1.7.0_09-icedtea]
        at java.lang.Thread.run(Thread.java:722) [rt.jar:1.7.0_09-icedtea]

2012-11-12 10:36:31,820 INFO  [org.ovirt.engine.core.utils.transaction.TransactionSupport] (pool-4-thread-47) [4ce52b9d] transaction rolled back
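
When triaging logs like the one above, it can help to pull the two identifiers out of the "Cannot find master domain" message and check whether msdUUID is the all-zero UUID, as it is here. The following is only a minimal sketch: the class and method names are hypothetical and are not part of ovirt-engine, and the regular expression assumes the message format shown in this log.

```java
import java.util.Optional;
import java.util.UUID;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ovirt-engine: extracts spUUID and msdUUID
// from a "Cannot find master domain" log message.
class MasterDomainErrorParser {
    // Assumes the "spUUID=<uuid>, msdUUID=<uuid>" layout seen in the log above.
    private static final Pattern IDS = Pattern.compile(
            "spUUID=([0-9a-fA-F-]{36}), msdUUID=([0-9a-fA-F-]{36})");
    private static final UUID NIL = new UUID(0L, 0L);

    // Returns {spUUID, msdUUID} for the first match, or empty if the line has none.
    static Optional<UUID[]> parse(String logLine) {
        Matcher m = IDS.matcher(logLine);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new UUID[] {
                UUID.fromString(m.group(1)),
                UUID.fromString(m.group(2))
        });
    }

    // True when the message carries an all-zero master domain UUID.
    static boolean hasNilMasterDomain(String logLine) {
        return parse(logLine).map(ids -> NIL.equals(ids[1])).orElse(false);
    }
}
```

For the message in this report, parse yields spUUID ed79b6c6-c0c0-40b4-9fb7-476ec12bb537 and hasNilMasterDomain returns true.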


Version-Release number of selected component (if applicable):
rhevm-3.1.0-22.el6ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Create a storage domain and attach it as master to a datacenter
2. Manually delete the storage domain data
3. Try to remove the datacenter from the GUI (*not* force remove)
  
Actual results:
Storage domain enters "locked" state because vdsm fails due to "cannot find master domain"

Expected results:
If the master domain cannot be found (during connectStoragePool, as part of the remove storage pool flow), the datacenter should be removed from the engine.
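
The general pattern behind this expectation can be sketched as follows: lock the domain, attempt the destroy step, and if it fails because the master domain is missing, explicitly move the domain out of "locked" instead of relying on the transaction rollback. This is an illustrative sketch only, not the actual fix from the engine. All class, enum, and method names are hypothetical, and it assumes the locked status was persisted outside the transaction that gets rolled back.

```java
// Hypothetical status values; the real engine defines its own.
enum DomainStatus { ACTIVE, INACTIVE, LOCKED }

// Stand-in for the "Cannot find master domain" failure seen in the log.
class NoMasterDomainException extends RuntimeException {
    NoMasterDomainException(String message) {
        super(message);
    }
}

// Minimal in-memory stand-in for a storage domain record.
class StorageDomain {
    private DomainStatus status = DomainStatus.ACTIVE;

    DomainStatus getStatus() {
        return status;
    }

    void setStatus(DomainStatus status) {
        this.status = status;
    }
}

class RemovePoolSketch {
    // Locks the domain, runs the destroy step, and compensates on failure.
    // Returns true if the destroy step succeeded.
    static boolean removePool(StorageDomain domain, Runnable destroyPool) {
        domain.setStatus(DomainStatus.LOCKED);
        try {
            destroyPool.run();
            return true;
        } catch (NoMasterDomainException e) {
            // Assumption: the lock is not undone by the rollback,
            // so it has to be released explicitly here.
            domain.setStatus(DomainStatus.INACTIVE);
            return false;
        }
    }

    public static void main(String[] args) {
        StorageDomain domain = new StorageDomain();
        boolean removed = removePool(domain, () -> {
            throw new NoMasterDomainException("Cannot find master domain");
        });
        System.out.println(removed);            // false
        System.out.println(domain.getStatus()); // INACTIVE
    }
}
```

With the domain back in an inactive state, a later force remove is no longer blocked by the "active/locked Storage Domains" check.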

Additional info:
Comment 2 Gadi Ickowicz 2012-11-12 05:26:02 EST
*** Bug 851154 has been marked as a duplicate of this bug. ***
Comment 3 Ayal Baron 2012-11-21 05:25:36 EST
Are your hosts in maintenance?
Can you move all hosts to maintenance and run 'force remove'?
Comment 4 Ayal Baron 2012-11-25 04:30:52 EST
Gadi?
Comment 5 Gadi Ickowicz 2012-11-25 10:17:36 EST
(In reply to comment #3)
> Are your hosts in maintenance?
> Can you move all hosts to maintenance and run 'force remove'?

After the domain enters the locked state, even after moving the hosts to maintenance and using 'force remove', I receive the same error message from the GUI: "Error: cannot remove Data Center which contains active/locked Storage Domains. Please deactivate all domains and wait for tasks to finish before removing the Data Center."
Comment 6 Liron Aravot 2012-12-11 08:22:05 EST
Solved by http://gerrit.ovirt.org/#/c/9843/
Comment 7 Allon Mureinik 2012-12-11 10:22:11 EST
Devel ack for fixing serialization.

Liron - let's also see if the flow can be streamlined and simplified, and discuss.
Comment 8 Allon Mureinik 2012-12-12 05:30:49 EST
Merged upstream.
Comment 10 Gadi Ickowicz 2013-03-14 04:59:47 EDT
Verified on SF9.
The storage domain moves back to inactive. The datacenter can then be force-removed if needed.
Comment 11 Itamar Heim 2013-06-11 05:45:46 EDT
3.2 has been released
Comment 12 Itamar Heim 2013-06-11 05:45:53 EDT
3.2 has been released
Comment 13 Itamar Heim 2013-06-11 05:56:08 EDT
3.2 has been released
