Bug 1047163 - [engine-backend] deadlock in postgres during multiple AddVmFromTemplate threads
Summary: [engine-backend] deadlock in postgres during multiple AddVmFromTemplate threads
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.3.0
Hardware: x86_64
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: 3.3.0
Assignee: Arik
QA Contact: meital avital
URL:
Whiteboard: virt
Depends On:
Blocks: rhev3.3ga
 
Reported: 2013-12-29 16:49 UTC by Elad
Modified: 2014-09-28 06:38 UTC (History)

Fixed In Version: is31
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
oVirt Team: ---
Target Upstream Version:
Embargoed:


Attachments
engine sos report and vdsm.log (9.67 MB, application/x-gzip)
2013-12-29 16:49 UTC, Elad


Links
System        ID     Private  Priority  Status  Summary  Last Updated
oVirt gerrit  22844  0        None      None    None     Never

Description Elad 2013-12-29 16:49:20 UTC
Created attachment 843118
engine sos report and vdsm.log

Description of problem:
I tried to create 15 VMs from a template, using clone provisioning.
The engine failed to complete the creation of some of the images: AddVmFromTemplate hit a postgres deadlock error.
After the failure, the VMs' disks remained 'locked'.

Version-Release number of selected component (if applicable):
is29
rhevm-3.3.0-0.42.el6ev.noarch
postgresql-8.4.13-1.el6_3.x86_64

How reproducible:
Not sure

Steps to Reproduce:
1. Create a template from a VM with 2 disks
2. Create 15 VMs from the template (manually from the UI, or using a script over REST or the SDK) with clone provisioning


Actual results:

The engine failed in AddVmFromTemplate because of a deadlock in the database. Apparently, two processes were updating the same table (images) and each was waiting on the other's transaction.

The error as presented in engine.log:
 
2013-12-29 17:28:36,489 ERROR [org.ovirt.engine.core.bll.CommandAsyncTask] (pool-5-thread-46) [within thread]: EndAction for action type AddVmFromTemplate threw an exception.: javax.ejb.EJBTransactionRolledbackException: CallableStatementCallback; SQL [{call updateimagestatus(?, ?)}]; ERROR: deadlock detected
  Detail: Process 8282 waits for ShareLock on transaction 1767434; blocked by process 7678.
Process 7678 waits for ShareLock on transaction 1767433; blocked by process 8282.
  Hint: See server log for query details.
  Where: SQL statement "UPDATE images SET imageStatus =  $1  WHERE image_guid =  $2 "
PL/pgSQL function "updateimagestatus" line 2 at SQL statement; nested exception is org.postgresql.util.PSQLException: ERROR: deadlock detected
  Detail: Process 8282 waits for ShareLock on transaction 1767434; blocked by process 7678.
Process 7678 waits for ShareLock on transaction 1767433; blocked by process 8282.
  Hint: See server log for query details.
  Where: SQL statement "UPDATE images SET imageStatus =  $1  WHERE image_guid =  $2 "
PL/pgSQL function "updateimagestatus" line 2 at SQL statement
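
The two "Detail" lines above describe a wait-for cycle between backend processes 8282 and 7678. As a rough triage aid (not part of the engine or of postgres), a small script along the following lines could pull those edges out of a log excerpt and confirm the cycle. The helper names `parse_wait_edges` and `find_cycle` are made up for this sketch, and the regex assumes the message wording shown in this report.

```python
import re

# Matches the "waits for ... blocked by ..." lines shown in the log above.
WAIT_RE = re.compile(
    r"Process (\d+) waits for (\w+) on transaction (\d+); blocked by process (\d+)\."
)

def parse_wait_edges(log_text):
    """Return {waiting_pid: blocking_pid} for every matching line."""
    return {int(m.group(1)): int(m.group(4)) for m in WAIT_RE.finditer(log_text)}

def find_cycle(edges):
    """Follow blocked-by edges from each process; return the first cycle found, else None."""
    for start in edges:
        path = [start]
        current = edges.get(start)
        while current is not None and current not in path:
            path.append(current)
            current = edges.get(current)
        if current == start:
            return path
    return None

# Excerpt copied from the engine.log error above.
sample = """\
  Detail: Process 8282 waits for ShareLock on transaction 1767434; blocked by process 7678.
Process 7678 waits for ShareLock on transaction 1767433; blocked by process 8282.
"""

edges = parse_wait_edges(sample)
cycle = find_cycle(edges)
print(edges)
print(cycle)
```

This only shows which processes block each other; the statements involved still have to be read from the postgres server log, as the Hint line says.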


After that, the disks of those VMs remain in 'locked' status.
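
For illustration only: one common way a cycle like this arises is two transactions updating the same pair of rows in opposite order. The sketch below simulates that with plain Python locks standing in for row locks on the images table, and shows the usual mitigation of acquiring them in one global (sorted) order. The image IDs and the function name `update_image_statuses` are hypothetical, and this is not claimed to be what the engine does or how the bug was fixed.

```python
import threading

# Hypothetical stand-ins for row locks on two rows of the images table.
row_locks = {"image-a": threading.Lock(), "image-b": threading.Lock()}
updated = []
updated_guard = threading.Lock()

def update_image_statuses(image_ids, status):
    """Lock every row in sorted order, then record the 'update' for each."""
    ordered = sorted(image_ids)  # same global order in every thread
    for image_id in ordered:
        row_locks[image_id].acquire()
    try:
        with updated_guard:
            for image_id in ordered:
                updated.append((image_id, status))
    finally:
        for image_id in reversed(ordered):
            row_locks[image_id].release()

# The two callers pass the rows in opposite order, as in a classic deadlock.
t1 = threading.Thread(target=update_image_statuses, args=(["image-a", "image-b"], "OK"))
t2 = threading.Thread(target=update_image_statuses, args=(["image-b", "image-a"], "OK"))
t1.start()
t2.start()
t1.join(timeout=5)
t2.join(timeout=5)
print(len(updated))
```

Because both threads sort the IDs before locking, neither can hold the second lock while waiting for the first, so both finish. Without the `sorted` call the same two threads could block each other indefinitely.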


Additional info:
engine sos report and vdsm.log

Comment 1 Elad 2013-12-29 17:31:46 UTC
This bug can block this PRD:
https://bugzilla.redhat.com/show_bug.cgi?id=815642

Therefore, I set the rhevm-3.3.0 flag to '?'.

Comment 2 Elad 2013-12-29 17:33:13 UTC
Also, the VM remains 'locked'

Comment 3 Elad 2013-12-30 07:31:44 UTC
Also the VMs remain 'locked'

Comment 4 Itamar Heim 2013-12-30 07:59:20 UTC
Is this a regression from 3.2 for the same use case/test?

Comment 5 Elad 2013-12-30 08:09:30 UTC
(In reply to Itamar Heim from comment #4)
> Is this a regression from 3.2 for the same use case/test?

Creating multiple VMs from a template concurrently wasn't possible before 3.3 because of the read-lock on the template images. This is a new feature that was introduced in 3.3:
https://bugzilla.redhat.com/show_bug.cgi?id=815642

Comment 6 Itamar Heim 2013-12-30 08:11:02 UTC
ok, so how was 815642 verified? does it work with 2 concurrent VMs? 6? 10?

Comment 7 Elad 2013-12-30 08:40:20 UTC
(In reply to Itamar Heim from comment #6)
> ok, so how was 815642 verified? does it work with 2 concurrent VMs? 6? 10?

My test was with 15 VMs with 2 disks each. It failed after a few VMs; I'm not sure exactly how many, because it's hard to trace that in the log.

Comment 8 Arik 2013-12-31 14:23:08 UTC
I couldn't reproduce this bug on master. It turned out that it was fixed by http://gerrit.ovirt.org/#/c/21100.

I'll backport this simple patch to 3.3.

Comment 12 Aharon Canan 2014-01-09 15:58:18 UTC
Reproduced using is30.

from engine.log
--------------------
2014-01-09 17:17:06,260 ERROR [org.ovirt.engine.core.bll.CommandAsyncTask] (pool-5-thread-42) [within thread]: EndAction for action type AddVmFromTemplate threw an exception.: javax.ejb.EJBTransactionRolledbackException: CallableStatementCallback; SQL [{call updateimagestatus(?, ?)}]; ERROR: deadlock detected
PL/pgSQL function "updateimagestatus" line 2 at SQL statement; nested exception is org.postgresql.util.PSQLException: ERROR: deadlock detected
Caused by: org.springframework.dao.DeadlockLoserDataAccessException: CallableStatementCallback; SQL [{call updateimagestatus(?, ?)}]; ERROR: deadlock detected
PL/pgSQL function "updateimagestatus" line 2 at SQL statement; nested exception is org.postgresql.util.PSQLException: ERROR: deadlock detected
Caused by: org.postgresql.util.PSQLException: ERROR: deadlock detected




Verified using is31.
No deadlock in the logs.
VMs were created much faster than with is30.

Comment 13 Itamar Heim 2014-01-21 22:32:43 UTC
Closing - RHEV 3.3 Released


