Bug 958095

Summary: RHEVM - Backend: ConcurrentModificationException failing domain/host fail process
Product: Red Hat Enterprise Virtualization Manager Reporter: Daniel Paikov <dpaikov>
Component: ovirt-engine Assignee: mkublin <mkublin>
Status: CLOSED CURRENTRELEASE QA Contact: Elad <ebenahar>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 3.2.0 CC: acathrow, bazulay, dyasny, hateya, iheim, lpeer, mkublin, Rhev-m-bugs, sgrinber, yeylon, ykaul, yzaslavs
Target Milestone: ---   
Target Release: 3.2.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: infra
Fixed In Version: sf16 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-06-11 09:14:57 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Infra RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 948448    
Attachments:
Description Flags
engine.log none
vdsm.log none

Description Daniel Paikov 2013-04-30 11:36:28 UTC
Created attachment 741837 [details]
engine.log

* DC with 3 hosts, 2 domains on different storage servers.
* Block connections between 2 HSMs and the non-master domain.
* 2 HSMs should become Non Operational, but only 1 becomes Non Operational.
* The following exception is seen:


2013-04-30 13:55:28,112 ERROR [org.ovirt.engine.core.bll.eventqueue.EventQueueMonitor] (pool-7-thread-50) Exception during process of events for pool 5849b030-626e-47cb-ad90-3ce782d831b3, error is java.util.concurrent.ExecutionException: java.util.ConcurrentModificationException: java.util.concurrent.ExecutionException: java.util.ConcurrentModificationException
        at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252) [rt.jar:1.7.0_b147-icedtea]
        at java.util.concurrent.FutureTask.get(FutureTask.java:111) [rt.jar:1.7.0_b147-icedtea]
        at org.ovirt.engine.core.bll.eventqueue.EventQueueMonitor$InternalEventQueueThread.run(EventQueueMonitor.java:157) [engine-bll.jar:]
        at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:71) [engine-utils.jar:]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [rt.jar:1.7.0_b147-icedtea]
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) [rt.jar:1.7.0_b147-icedtea]
        at java.util.concurrent.FutureTask.run(FutureTask.java:166) [rt.jar:1.7.0_b147-icedtea]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) [rt.jar:1.7.0_b147-icedtea]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) [rt.jar:1.7.0_b147-icedtea]
        at java.lang.Thread.run(Thread.java:722) [rt.jar:1.7.0_b147-icedtea]
Caused by: java.util.ConcurrentModificationException
        at java.util.HashMap$HashIterator.nextEntry(HashMap.java:806) [rt.jar:1.7.0_b147-icedtea]
        at java.util.HashMap$KeyIterator.next(HashMap.java:841) [rt.jar:1.7.0_b147-icedtea]
        at org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand$IrsProxyData.ProcessDomainRecovery(IrsBrokerCommand.java:1284) [engine-vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand$IrsProxyData.access$600(IrsBrokerCommand.java:121) [engine-vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand$IrsProxyData$6.call(IrsBrokerCommand.java:1222) [engine-vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand$IrsProxyData$6.call(IrsBrokerCommand.java:1216) [engine-vdsbroker.jar:]
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) [rt.jar:1.7.0_b147-icedtea]
        at java.util.concurrent.FutureTask.run(FutureTask.java:166) [rt.jar:1.7.0_b147-icedtea]
        ... 6 more


* Block the same domain from the 3rd host (SPM).
* The domain does not become Inactive, remains Active.

Comment 1 Daniel Paikov 2013-04-30 11:39:51 UTC
Created attachment 741838 [details]
vdsm.log

Comment 2 mkublin 2013-04-30 13:59:13 UTC
The bug is simple. This is not a race, just plainly wrong code: we iterate over a collection and modify it at the same time.
Basic Java. I will provide a patch soon.
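
For illustration, a minimal sketch of the failure mode described above. This is not the actual patch or the engine code: the class name, the map contents, and the method names are hypothetical stand-ins, and iterating over a snapshot is just one common way to avoid the exception. The only thing taken from the stack trace is that the exception comes from a HashMap key iterator.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in; not the real IrsBrokerCommand code.
class DomainRecoverySketch {

    // Assumed sample data: a map keyed by domain id.
    static Map<String, Integer> sampleDomains() {
        Map<String, Integer> domains = new HashMap<>();
        domains.put("domain-a", 1);
        domains.put("domain-b", 2);
        domains.put("domain-c", 3);
        return domains;
    }

    // Broken pattern: remove entries from the map while iterating its key set.
    // Returns true if ConcurrentModificationException was thrown.
    static boolean brokenRecoveryThrows() {
        Map<String, Integer> domains = sampleDomains();
        try {
            for (String id : domains.keySet()) {
                domains.remove(id); // structural modification during iteration
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // One possible fix: iterate over a snapshot of the keys, modify the original.
    // Returns the number of entries left in the map.
    static int fixedRecovery() {
        Map<String, Integer> domains = sampleDomains();
        for (String id : new ArrayList<>(domains.keySet())) {
            domains.remove(id);
        }
        return domains.size();
    }

    public static void main(String[] args) {
        System.out.println("broken version throws: " + brokenRecoveryThrows());
        System.out.println("entries left after fixed version: " + fixedRecovery());
    }
}
```

Note that this happens in a single thread, which matches the point that no race is involved. Using an explicit Iterator and calling its remove() method would be another option when only the current element needs to be dropped.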

Comment 6 Elad 2013-05-13 11:55:32 UTC
After connectivity to the storage was lost, both HSMs became Non Operational. After that, when the storage was blocked from the SPM, the domain became Inactive.


Verified on RHEVM-3.2 - SF16:
rhevm-3.2.0-10.25.beta3.el6ev.noarch
vdsm-4.10.2-18.0.el6ev.x86_64

Comment 7 Itamar Heim 2013-06-11 09:14:57 UTC
3.2 has been released

Comment 8 Itamar Heim 2013-06-11 09:40:38 UTC
3.2 has been released