Bug 1102689 - Deadlock detected when performing plug/unplug VNIC action
Summary: Deadlock detected when performing plug/unplug VNIC action
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.4.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.5.0
Assignee: Barak
QA Contact: GenadiC
URL:
Whiteboard: infra
Depends On:
Blocks: rhev3.5beta 1156165
Reported: 2014-05-29 12:32 UTC by GenadiC
Modified: 2016-02-10 19:40 UTC (History)

Fixed In Version: ovirt-engine-3.5.0_beta
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-02-17 17:09:51 UTC
oVirt Team: Infra
Target Upstream Version:
Embargoed:


Attachments
engine log (2.15 MB, text/x-log)
2014-05-29 12:32 UTC, GenadiC


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 28830 0 master MERGED core: Deadlock detected when performing plug/unplug VNIC action Never

Description GenadiC 2014-05-29 12:32:59 UTC
Created attachment 900336 [details]
engine log

Description of problem:
When a VM network interface is plugged or unplugged, a deadlock
sometimes appears in the log.
It seems that the VM monitoring thread (VdsUpdateRunTimeInfo) updates
the same vm_device table that the hotplug action is updating at the
same time.

It seems that the same deadlock might occur when plugging/unplugging VM disks.

Version-Release number of selected component (if applicable):


How reproducible:
Sometimes

Steps to Reproduce:
1. Plug/unplug NIC on VM

Actual results:
Deadlock detected

Expected results:
No deadlock should appear

Additional info:
Where: SQL statement "UPDATE vm_device SET device =  $1 , type =  $2 , address =  $3 , boot_order =  $4 , spec_params =  $5 , is_managed =  $6 , is_plugged =  $7 , is_readonly =  $8 , alias =  $9 , custom_properties =  $10 , snapshot_id =  $11 , _update_date = current_timestamp WHERE device_id =  $12  and vm_id =  $13 "
PL/pgSQL function "updatevmdevice" line 2 at SQL statement; nested exception is org.postgresql.util.PSQLException: ERROR: deadlock detected
  Detail: Process 13504 waits for ShareLock on transaction 6857612; blocked by process 13506.
Process 13506 waits for ShareLock on transaction 6857622; blocked by process 13504.
  Hint: See server log for query details.
  Where: SQL statement "UPDATE vm_device SET device =  $1 , type =  $2 , address =  $3 , boot_order =  $4 , spec_params =  $5 , is_managed =  $6 , is_plugged =  $7 , is_readonly =  $8 , alias =  $9 , custom_properties =  $10 , snapshot_id =  $11 , _update_date = current_timestamp WHERE device_id =  $12  and vm_id =  $13 "
PL/pgSQL function "updatevmdevice" line 2 at SQL statement
        at org.springframework.jdbc.support.SQLErrorCodeSQLExceptionTranslator.doTranslate(SQLErrorCodeSQLExceptionTranslator.java:265) [spring-jdbc.jar:3.1.1.RELEASE]
        at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:72) [spring-jdbc.jar:3.1.1.RELEASE]
        at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:349) [spring-jdbc.jar:3.1.1.RELEASE]
        at org.ovirt.engine.core.dal.dbbroker.SimpleJdbcCallsHandler.executeStoredProcAsBatch(SimpleJdbcCallsHandler.java:52) [dal.jar:]
        at org.ovirt.engine.core.dal.dbbroker.SimpleJdbcCallsHandler.executeStoredProcAsBatch(SimpleJdbcCallsHandler.java:70) [dal.jar:]
        at org.ovirt.engine.core.dao.MassOperationsGenericDaoDbFacade.updateAllInBatch(MassOperationsGenericDaoDbFacade.java:52) [dal.jar:]
        at org.ovirt.engine.core.dao.MassOperationsGenericDaoDbFacade.updateAllInBatch(MassOperationsGenericDaoDbFacade.java:87) [dal.jar:]
        at org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo.saveVmDevicesToDb(VdsUpdateRunTimeInfo.java:202) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo.saveDataToDb(VdsUpdateRunTimeInfo.java:173) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo.refresh(VdsUpdateRunTimeInfo.java:359) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.VdsManager.onTimer(VdsManager.java:231) [vdsbroker.jar:]
        at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source) [:1.7.0_51]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.7.0_51]
        at java.lang.reflect.Method.invoke(Method.java:606) [rt.jar:1.7.0_51]
        at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:60) [scheduler.jar:]
        at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz.jar:]
        at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557) [quartz.jar:]
Caused by: org.postgresql.util.PSQLException: ERROR: deadlock detected
  Detail: Process 13504 waits for ShareLock on transaction 6857612; blocked by process 13506.
Process 13506 waits for ShareLock on transaction 6857622; blocked by process 13504.

Comment 1 Barak 2014-06-01 12:35:02 UTC
Roy - do you think this a generic issue of hot plug/unplug scenario ?
Or is it a network specific issue ?

Comment 2 Barak 2014-06-01 12:36:28 UTC
Genadi - why was the regression keyword set?
How can you tell this potential "deadlock" did not exist in the previous version?

Comment 3 GenadiC 2014-06-01 12:43:27 UTC
Indeed, not a regression; sorry for the misunderstanding.

Comment 4 Michal Skrivanek 2014-06-02 10:17:05 UTC
or the same issue as in 1097256 ?

Comment 5 Roy Golan 2014-06-10 07:27:24 UTC
(In reply to Michal Skrivanek from comment #4)
> or the same issue as in 1097256 ?

No, it's a DB deadlock. The hotplug command is changing rows a and b in the devices table
while monitoring is batch-updating b and a.
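
To illustrate the inversion described above, here is a minimal Java sketch (not engine code). The names `LockOrderCheck` and `hasOrderInversion` are hypothetical. The sketch assumes each writer locks rows in the order it updates them, and it only checks whether two writers touch shared rows in opposite relative order, which is the precondition for this kind of deadlock. It does not talk to a database.

```java
import java.util.List;

// Hypothetical helper for illustration only; not part of ovirt-engine.
final class LockOrderCheck {
    private LockOrderCheck() {}

    /**
     * Returns true if some pair of rows appears in both update sequences
     * in opposite relative order (e.g. [a, b] versus [b, a]).
     * Rows that appear in only one sequence are ignored.
     */
    static <T> boolean hasOrderInversion(List<T> first, List<T> second) {
        for (int i = 0; i < first.size(); i++) {
            for (int j = i + 1; j < first.size(); j++) {
                // Positions of the same two rows in the other sequence.
                int posEarlier = second.indexOf(first.get(i));
                int posLater = second.indexOf(first.get(j));
                if (posEarlier >= 0 && posLater >= 0 && posLater < posEarlier) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

With the hotplug sequence [a, b] and the monitoring sequence [b, a], the check reports an inversion; with both sequences in the same order it does not.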

looking further to sort this out. 


Genadi, please confirm you tried hotplug on more than one device.

Comment 6 Roy Golan 2014-06-10 07:54:10 UTC
Actually, it's the boot order update, so this can collide even on a single-device update, as it triggers a change across all devices of a VM.

Comment 7 Roy Golan 2014-06-10 07:59:58 UTC
Eli, 

ActivateDeactivateVmNic command is updating all devices under the same TX

VmDeviceUtils.updateBootOrderInVmDeviceAndStoreToDB(getVm().getStaticData())


so sorting the batch update covers only one side.

A quick solution would be to sort the collection before iterating and updating the boot order.

A more thorough solution is to find all places that update a table by iterating, make sure they use collections, and have the DAO sort them ahead of the update.
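
The sorting idea above could look roughly like the following Java sketch. Everything in it is an assumption for illustration: `DeviceKey`, `SortedBatchUpdate`, `UPDATE_ORDER` and `inUpdateOrder` are hypothetical names, and the key is modeled as plain `java.util.UUID` values rather than the engine's own types. The point is only that every writer puts its rows into the same fixed order before issuing updates.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.UUID;

// Hypothetical stand-in for the (vm_id, device_id) key of a vm_device row.
final class DeviceKey {
    final UUID vmId;
    final UUID deviceId;

    DeviceKey(UUID vmId, UUID deviceId) {
        this.vmId = vmId;
        this.deviceId = deviceId;
    }
}

// Hypothetical helper for illustration only; not part of ovirt-engine.
final class SortedBatchUpdate {
    // One fixed order shared by every writer: by VM id, then by device id.
    static final Comparator<DeviceKey> UPDATE_ORDER =
            Comparator.comparing((DeviceKey k) -> k.vmId)
                      .thenComparing(k -> k.deviceId);

    private SortedBatchUpdate() {}

    /** Returns a sorted copy; the caller's collection is left untouched. */
    static List<DeviceKey> inUpdateOrder(Collection<DeviceKey> devices) {
        List<DeviceKey> sorted = new ArrayList<>(devices);
        sorted.sort(UPDATE_ORDER);
        return sorted;
    }
}
```

Both the command and the monitoring batch would pass their devices through the same ordering before updating. The particular order does not matter as long as all writers agree on it.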

Comment 8 GenadiC 2014-06-10 09:04:01 UTC
From what I recall we tried it on several devices.

Comment 9 Eli Mesika 2014-06-15 12:53:53 UTC
(In reply to Roy Golan from comment #7)
> Eli, 
> 
> ActivateDeactivateVmNic command is updating all devices under the same TX
> 
> VmDeviceUtils.updateBootOrderInVmDeviceAndStoreToDB(getVm().getStaticData())
> 
> 
> so sorting the batch update covers only one side.
> 
> A quick solution would be to sort the collection before iterating and
> updating the boot order.
> 
> A more thorough solution is to find all places that update a table by
> iterating, make sure they use collections, and have the DAO sort them
> ahead of the update.

Agreed.
What info is needed here?

Comment 10 Oved Ourfali 2014-06-15 15:13:14 UTC
I think no further input is required. 
Liran - please handle this one.

Comment 11 GenadiC 2014-07-03 07:23:01 UTC
Verified in 3.5.0-0.0.master.20140629172257.git0b16ed7.el6
Couldn't reproduce the problem.

Comment 13 Eyal Edri 2015-02-17 17:09:51 UTC
RHEV 3.5.0 was released. Closing.

