Bug 1047163

Summary: [engine-backend] deadlock in postgres during multiple AddVmFromTemplate threads
Product: Red Hat Enterprise Virtualization Manager Reporter: Elad <ebenahar>
Component: ovirt-engine Assignee: Arik <ahadas>
Status: CLOSED CURRENTRELEASE QA Contact: meital avital <mavital>
Severity: high Docs Contact:
Priority: urgent    
Version: 3.3.0 CC: acanan, acathrow, amureini, ebenahar, eedri, iheim, lpeer, michal.skrivanek, ofrenkel, pstehlik, Rhev-m-bugs, sherold, yeylon
Target Milestone: ---   
Target Release: 3.3.0   
Hardware: x86_64   
OS: Unspecified   
Whiteboard: virt
Fixed In Version: is31 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1056111    
Attachments:
Description Flags
engine sos report and vdsm.log none

Description Elad 2013-12-29 16:49:20 UTC
Created attachment 843118 [details]
engine sos report and vdsm.log

Description of problem:
I tried to create multiple VMs (15) from a template, as cloned VMs.
Engine failed to complete the creation of some of the images, with a postgres deadlock error for AddVmFromTemplate.
After the failure, the VMs' disks remained 'locked'.

Version-Release number of selected component (if applicable):
is29
rhevm-3.3.0-0.42.el6ev.noarch
postgresql-8.4.13-1.el6_3.x86_64

How reproducible:
Not sure

Steps to Reproduce:
1. create a template from VM with 2 disks
2. create 15 VMs from the template (manually from UI or using a script on REST or SDK) as cloned provisioning
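
A minimal sketch of driving step 2 from a script, so that all requests are in flight at the same time. The `create_vm` callable is hypothetical: it stands in for whatever REST or SDK call actually creates a cloned VM from the template, and the VM naming scheme is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def create_vms_concurrently(create_vm, template_name, count=15, workers=15):
    """Fire `count` clone requests at once and collect per-VM outcomes.

    `create_vm(name, template_name)` is a hypothetical callable supplied by
    the caller (for example, a thin wrapper around a REST or SDK request).
    """
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map each submitted future back to the VM name it belongs to.
        futures = {
            pool.submit(create_vm, f"vm-{i:02d}", template_name): f"vm-{i:02d}"
            for i in range(1, count + 1)
        }
        for future in as_completed(futures):
            name = futures[future]
            try:
                future.result()
                results[name] = "ok"
            except Exception as exc:  # record the failure and keep going
                results[name] = f"failed: {exc}"
    return results
```

The returned dictionary only reflects whether each request was accepted; a deadlock like the one in this report happens later, in EndAction, and shows up in engine.log rather than in the request result.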


Actual results:

Engine failed in AddVmFromTemplate because of a deadlock in the database. Apparently, two processes are trying to perform the same operation on the same table in the DB at the same time.

The error as presented in engine.log:
 
2013-12-29 17:28:36,489 ERROR [org.ovirt.engine.core.bll.CommandAsyncTask] (pool-5-thread-46) [within thread]: EndAction for action type AddVmFromTemplate threw an exception.: javax.ejb.EJBTransactionRolledbackException: CallableStatementCallback; SQL [{call updateimagestatus(?, ?)}]; ERROR: deadlock detected
  Detail: Process 8282 waits for ShareLock on transaction 1767434; blocked by process 7678.
Process 7678 waits for ShareLock on transaction 1767433; blocked by process 8282.
  Hint: See server log for query details.
  Where: SQL statement "UPDATE images SET imageStatus =  $1  WHERE image_guid =  $2 "
PL/pgSQL function "updateimagestatus" line 2 at SQL statement; nested exception is org.postgresql.util.PSQLException: ERROR: deadlock detected
  Detail: Process 8282 waits for ShareLock on transaction 1767434; blocked by process 7678.
Process 7678 waits for ShareLock on transaction 1767433; blocked by process 8282.
  Hint: See server log for query details.
  Where: SQL statement "UPDATE images SET imageStatus =  $1  WHERE image_guid =  $2 "
PL/pgSQL function "updateimagestatus" line 2 at SQL statement


After that, the disks of those VMs remain in 'locked' status.
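
The "Detail" lines in the error above describe a wait-for cycle: process 8282 is blocked by 7678, and 7678 is blocked by 8282. As an illustration only (not part of the engine or of postgres), the following sketch parses such lines and follows the blocked-by links to confirm the cycle. The regular expression is an assumption based on the message format shown in this report.

```python
import re

# Assumed format, taken from the "Detail" lines quoted in this report.
WAIT_RE = re.compile(
    r"Process (\d+) waits for \w+ on transaction (\d+); blocked by process (\d+)"
)


def wait_for_edges(log_text):
    """Return {waiting_pid: blocking_pid} parsed from 'Detail' lines."""
    return {int(m.group(1)): int(m.group(3)) for m in WAIT_RE.finditer(log_text)}


def find_cycle(edges):
    """Follow blocked-by links from each process; return the first cycle found."""
    for start in edges:
        path = [start]
        current = edges[start]
        while current in edges and current not in path:
            path.append(current)
            current = edges[current]
        if current == start:
            return path
    return None


excerpt = (
    "Detail: Process 8282 waits for ShareLock on transaction 1767434; "
    "blocked by process 7678.\n"
    "Process 7678 waits for ShareLock on transaction 1767433; "
    "blocked by process 8282.\n"
)
print(find_cycle(wait_for_edges(excerpt)))  # [8282, 7678]
```

Each process holds a row lock the other one needs, so neither can proceed and postgres aborts one of the two transactions.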


Additional info:
engine sos report and vdsm.log

Comment 1 Elad 2013-12-29 17:31:46 UTC
This bug can block this PRD:
https://bugzilla.redhat.com/show_bug.cgi?id=815642

Therefore, added the ? flag of rhevm-3.3.0

Comment 2 Elad 2013-12-29 17:33:13 UTC
Also the VM remain 'locked'

Comment 3 Elad 2013-12-30 07:31:44 UTC
Also the VMs remain 'locked'

Comment 4 Itamar Heim 2013-12-30 07:59:20 UTC
is this a regression from 3.2 for same use case/test?

Comment 5 Elad 2013-12-30 08:09:30 UTC
(In reply to Itamar Heim from comment #4)
> is this a regression from 3.2 for same use case/test?

Creating multiple VMs from a template at the same time wasn't possible before 3.3 because of the read-lock for the template images. This is a new feature that was introduced in 3.3:
https://bugzilla.redhat.com/show_bug.cgi?id=815642

Comment 6 Itamar Heim 2013-12-30 08:11:02 UTC
ok, so how was 815642 verified? does it work with 2 concurrent VMs? 6? 10?

Comment 7 Elad 2013-12-30 08:40:20 UTC
(In reply to Itamar Heim from comment #6)
> ok, so how was 815642 verified? does it work with 2 concurrent VMs? 6? 10?

My test was with 15 VMs with 2 disks each. It failed after a few VMs; I'm not sure exactly how many, as it's hard to trace that in the log.
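
A rough way to count the failures from engine.log is to match the ERROR line that CommandAsyncTask writes when EndAction for AddVmFromTemplate throws. The sketch below assumes the line format seen in the excerpts in this report; it counts failed end-actions by timestamp and thread and does not map them back to VM names.

```python
import re

# Matches the ERROR line format seen in the engine.log excerpts in this report.
FAIL_RE = re.compile(
    r"^(\S+ \S+) ERROR \[org\.ovirt\.engine\.core\.bll\.CommandAsyncTask\] "
    r"\((\S+)\) .*EndAction for action type AddVmFromTemplate threw an exception"
)


def failed_end_actions(lines):
    """Return (timestamp, thread) for each failed AddVmFromTemplate EndAction."""
    hits = []
    for line in lines:
        m = FAIL_RE.match(line)
        if m:
            hits.append((m.group(1), m.group(2)))
    return hits
```

The function accepts any iterable of lines, so an open file handle for engine.log can be passed in directly; the length of the returned list is the number of failed end-actions.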

Comment 8 Arik 2013-12-31 14:23:08 UTC
I couldn't reproduce this bug on master. It turned out that it was fixed by http://gerrit.ovirt.org/#/c/21100.

I'll backport this simple patch to 3.3.

Comment 12 Aharon Canan 2014-01-09 15:58:18 UTC
reproduced using is30 

from engine.log
--------------------
2014-01-09 17:17:06,260 ERROR [org.ovirt.engine.core.bll.CommandAsyncTask] (pool-5-thread-42) [within thread]: EndAction for action type AddVmFromTemplate threw an exception.: javax.ejb.EJBTransactionRolledbackException: CallableStatementCallback; SQL [{call updateimagestatus(?, ?)}]; ERROR: deadlock detected
PL/pgSQL function "updateimagestatus" line 2 at SQL statement; nested exception is org.postgresql.util.PSQLException: ERROR: deadlock detected
Caused by: org.springframework.dao.DeadlockLoserDataAccessException: CallableStatementCallback; SQL [{call updateimagestatus(?, ?)}]; ERROR: deadlock detected
PL/pgSQL function "updateimagestatus" line 2 at SQL statement; nested exception is org.postgresql.util.PSQLException: ERROR: deadlock detected
Caused by: org.postgresql.util.PSQLException: ERROR: deadlock detected




verified using is31
no deadlock in logs
VMs created much faster than with is30

Comment 13 Itamar Heim 2014-01-21 22:32:43 UTC
Closing - RHEV 3.3 Released

Comment 14 Itamar Heim 2014-01-21 22:32:47 UTC
Closing - RHEV 3.3 Released