Bug 891316

Summary: ovirt-engine-backend [scalability]: Deadlock occurred during mass startup of VMs.
Product: Red Hat Enterprise Virtualization Manager
Reporter: Omri Hochman <ohochman>
Component: ovirt-engine
Assignee: Omer Frenkel <ofrenkel>
Status: CLOSED NOTABUG
QA Contact: Yuri Obshansky <yobshans>
Severity: urgent
Priority: high
Version: 3.2.0
CC: acathrow, bazulay, eedri, iheim, jkt, jturner, lpeer, michal.skrivanek, ofrenkel, ohochman, pstehlik, rgolan, Rhev-m-bugs, srevivo, yeylon
Keywords: Regression, ZStream
Target Release: 3.4.0
Hardware: x86_64
OS: Linux
Whiteboard: virt
Fixed In Version: is1
Doc Type: Bug Fix
Clones: 1060692 (view as bug list)
Last Closed: 2014-05-13 08:58:00 UTC
Type: Bug
Bug Blocks: 1060692, 1078909, 1142926
Attachments:
  engine.log
  console_log

Description Omri Hochman 2013-01-02 15:09:25 UTC
ovirt-engine-backend [scalability]: Deadlock occurred during mass startup of VMs. 

This issue was found during the investigation of Bug 891270.

Description:
************ 
I started 10000 VMs with 256 MB of memory each, in batches of 100, on 13 hosts.

Environment:
*************
rhevm3.2 (build sf2.1)
rhevm-3.2.0-2.el6ev.noarch 
rhevm-backend-3.2.0-2.el6ev.noarch
vdsm-cli-4.10.2-2.0.el6.noarch  (on hosts)
vdsm-4.10.2-2.0.el6.x86_64 (on hosts) 

Results:
********
Found one Java-level deadlock (see the attached console_log):
=============================
"pool-3-thread-33":
  waiting to lock monitor 0x00007ff678002ee8 (object 0x00000000c398be80, a java.lang.Object),
  which is held by "QuartzScheduler_Worker-99"
"QuartzScheduler_Worker-99":
  waiting for ownable synchronizer 0x00000000c398c800, (a java.util.concurrent.locks.ReentrantLock$NonfairSync),
  which is held by "pool-3-thread-30"
"pool-3-thread-30":
  waiting to lock monitor 0x00007ff678002ee8 (object 0x00000000c398be80, a java.lang.Object),
  which is held by "QuartzScheduler_Worker-99"

Comment 1 Omri Hochman 2013-01-02 15:10:54 UTC
Created attachment 671471 [details]
engine.log

Comment 2 Omri Hochman 2013-01-02 15:11:51 UTC
Created attachment 671472 [details]
console_log

Comment 3 Omri Hochman 2013-01-02 19:24:30 UTC
Fixed Description:
***************** 
I started *1000* VMs with 256 MB of memory each, in batches of 100, on 13 hosts.

Comment 4 Roy Golan 2013-01-02 21:01:19 UTC
The deadlock is caused by inconsistent lock ordering: the refresh thread holds the VdsManager lock while waiting on the decreasedPending lock, and a RunVm thread performing rerun() holds the decreasedPending lock while waiting to perform UpdateVdsDynamicData (a VDS command which acquires the VdsManager lock).
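
For illustration, a minimal hypothetical Java sketch of this cycle. The class shape, field names, and method bodies are assumptions for the sake of the example, not the actual engine code; only the lock-ordering pattern is taken from the analysis above:

import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch only -- names follow the comment above, not real engine code.
class VdsManagerSketch {
    private final Object vdsManagerLock = new Object();                     // the monitor in the dump
    private final ReentrantLock decreasedPendingLock = new ReentrantLock(); // the ownable synchronizer

    // Refresh path (QuartzScheduler_Worker-99): takes the monitor first, then the lock.
    void refresh() {
        synchronized (vdsManagerLock) {
            decreasedPendingLock.lock();   // blocks while a rerun holds it
            try {
                // ... recompute pending resources ...
            } finally {
                decreasedPendingLock.unlock();
            }
        }
    }

    // RunVm rerun path (pool-3-thread-NN): takes the lock first, then the monitor.
    void decreasedPending() {
        decreasedPendingLock.lock();
        try {
            // ... adjust pending counters ...
            updateVdsDynamicData();        // needs vdsManagerLock -> cycle -> deadlock
        } finally {
            decreasedPendingLock.unlock();
        }
    }

    void updateVdsDynamicData() {
        synchronized (vdsManagerLock) {
            // ... persist VDS dynamic data ...
        }
    }
}

With both orders in play (monitor-then-lock vs. lock-then-monitor), two threads can each hold the resource the other needs, which is exactly the cycle the thread dump reports.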

I see two main ways to solve this (a sketch of option 1 follows the list):
1. Get rid of the decreasedPending lock and use an AtomicInteger instead, to ensure atomicity and visibility without blocking.
2. Fix the order of lock acquisition in the decreasedPending() method: first take the VdsManager lock, then perform the pending decrease and call UpdateVdsDynamicData.
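
A minimal hypothetical sketch of option 1, assuming the same illustrative shape as the sketch above; an AtomicInteger replaces the ReentrantLock, so the counter update can never take part in a lock cycle:

import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of option 1 -- illustrative only, not the actual fix patch.
class VdsManagerNoDeadlock {
    private final Object vdsManagerLock = new Object();
    // AtomicInteger gives atomicity and visibility without a second lock in the cycle.
    private final AtomicInteger pendingVmCount = new AtomicInteger(0);

    void refresh() {
        synchronized (vdsManagerLock) {
            int pending = pendingVmCount.get();  // plain atomic read, never blocks
            // ... recompute scheduling data using 'pending' ...
        }
    }

    void decreasedPending() {
        pendingVmCount.decrementAndGet();        // atomic, lock-free decrement
        updateVdsDynamicData();                  // takes vdsManagerLock while holding nothing else
    }

    void updateVdsDynamicData() {
        synchronized (vdsManagerLock) {
            // ... persist VDS dynamic data ...
        }
    }
}

With the ReentrantLock gone there is only one lock left, so no cycle is possible. Option 2 would instead keep both locks but make every path acquire the VdsManager lock before the pending lock, removing the cycle by enforcing a consistent order.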

Comment 7 mkublin 2013-04-07 16:21:31 UTC
As part of some work on phantom VDS status, this deadlock will also be solved; a patch has been added to the bug.

Comment 14 Shai Revivo 2013-12-30 09:11:40 UTC
Currently we do not have the lab resources to test it; we will have to push it forward to 3.4.

Comment 15 Shai Revivo 2014-01-15 15:11:22 UTC
QE cannot verify this bug in 3.3; it will be verified in 3.4.

Comment 19 Yuri Obshansky 2014-05-13 08:58:00 UTC
This bug is identical to Bug 1060692 <https://bugzilla.redhat.com/show_bug.cgi?id=1060692> (ovirt-engine-backend [scalability]: Deadlock occurred during mass startup of VMs), which was fixed and verified in 3.3.2, so this bug is being closed.