ovirt-engine-backend [scalability]: Deadlock occurred during mass startup of VMs.

This issue was found during investigation of Bug 891270.

Description:
************
I started 10000 VMs with 256MB memory (100 by 100) on 13 hosts.

Environment:
*************
rhevm3.2 (build sf2.1)
rhevm-3.2.0-2.el6ev.noarch
rhevm-backend-3.2.0-2.el6ev.noarch
vdsm-cli-4.10.2-2.0.el6.noarch (on hosts)
vdsm-4.10.2-2.0.el6.x86_64 (on hosts)

Results:
********
Found one Java-level deadlock: <See Console.log>
=============================
"pool-3-thread-33":
  waiting to lock monitor 0x00007ff678002ee8 (object 0x00000000c398be80, a java.lang.Object),
  which is held by "QuartzScheduler_Worker-99"
"QuartzScheduler_Worker-99":
  waiting for ownable synchronizer 0x00000000c398c800, (a java.util.concurrent.locks.ReentrantLock$NonfairSync),
  which is held by "pool-3-thread-30"
"pool-3-thread-30":
  waiting to lock monitor 0x00007ff678002ee8 (object 0x00000000c398be80, a java.lang.Object),
  which is held by "QuartzScheduler_Worker-99"
Created attachment 671471 [details] engine.log
Created attachment 671472 [details] console_log
Fixed Description:
*****************
I started *1000* VMs with 256MB memory (100 by 100) on 13 hosts.
The deadlock is caused by two threads acquiring the same two locks in opposite order:
- The refresh thread holds the VdsManager lock and waits for the decreasedPending lock.
- The RunVm thread, performing rerun(), holds the decreasedPending lock and waits to perform UpdateVdsDynamicData (a VDS command which acquires the VdsManager lock).

I see 2 main ways to solve this:
1. Get rid of the decreasedPending lock and use an AtomicInteger instead, to ensure atomicity and visibility without blocking.
2. Fix the order of lock acquisition in the decreasedPending() method: first get the VdsManager lock and then perform the pending decrease and call
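To illustrate option 1, here is a minimal sketch of a lock-free pending counter. It is not the engine code: the class name PendingResources and its methods are hypothetical stand-ins for the per-host pending bookkeeping, and the clamping at zero is an assumption. The point is only that a decrease done through AtomicInteger never blocks, so the RunVm thread holds no lock of its own when it goes on to call a VDS command that takes the VdsManager lock.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical stand-in for the per-host pending-resource bookkeeping.
 * Names and fields are assumptions for illustration, not the engine's API.
 */
final class PendingResources {
    private final AtomicInteger pendingVmCount = new AtomicInteger();
    private final AtomicInteger pendingMemMb = new AtomicInteger();

    /** Record one more VM that is scheduled to start on this host. */
    void addPending(int memMb) {
        pendingVmCount.incrementAndGet();
        pendingMemMb.addAndGet(memMb);
    }

    /**
     * Lock-free decrease, clamped at zero (the clamp is an assumption).
     * No monitor or ReentrantLock is held here, so this call cannot
     * become one edge of a lock-ordering cycle.
     */
    void decreasePending(int memMb) {
        pendingVmCount.updateAndGet(v -> Math.max(0, v - 1));
        pendingMemMb.updateAndGet(v -> Math.max(0, v - memMb));
    }

    int getPendingVmCount() {
        return pendingVmCount.get();
    }

    int getPendingMemMb() {
        return pendingMemMb.get();
    }
}
```

Note that the two counters are updated independently, so a reader may briefly observe the VM count already decreased while the memory is not. If both values must always change together, option 2 (a single, consistent lock order) is the safer route.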
This deadlock will also be solved as part of some work on the phantom VDS status. A patch has been added to the bug.
We currently do not have the resources (lab) to test this, so it will have to be pushed forward to 3.4.
QE cannot verify this bug in 3.3; it will be verified in 3.4.
This bug is identical to Bug 1060692 <https://bugzilla.redhat.com/show_bug.cgi?id=1060692> (ovirt-engine-backend [scalability]: Deadlock occurred during mass startup of VMs), which was fixed and verified in 3.3.2. Closing.