Bug 1102689

Summary: Deadlock detected when performing plug/unplug VNIC action
Product: Red Hat Enterprise Virtualization Manager Reporter: GenadiC <gcheresh>
Component: ovirt-engine Assignee: Barak <bazulay>
Status: CLOSED CURRENTRELEASE QA Contact: GenadiC <gcheresh>
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.4.0 CC: bazulay, emesika, gcheresh, gklein, iheim, lpeer, masayag, michal.skrivanek, oourfali, pstehlik, rbalakri, rgolan, Rhev-m-bugs, sherold, yeylon
Target Milestone: ---   
Target Release: 3.5.0   
Hardware: x86_64   
OS: Linux   
Whiteboard: infra
Fixed In Version: ovirt-engine-3.5.0_beta Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-02-17 17:09:51 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Infra RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1142923, 1156165    
Attachments:
Description Flags
engine log none

Description GenadiC 2014-05-29 12:32:59 UTC
Created attachment 900336 [details]
engine log

Description of problem:
When a VM network interface is plugged or unplugged, a deadlock is
sometimes reported in the engine log.
It seems that the VM monitoring thread (VdsUpdateRunTimeInfo) updates
the same vm_device table that is being updated by the hotplug
action at the same time.

It seems that the same result might occur when plugging/unplugging VM disks.

Version-Release number of selected component (if applicable):


How reproducible:
Sometimes

Steps to Reproduce:
1. Plug/unplug NIC on VM

Actual results:
Deadlock detected

Expected results:
No deadlock should appear

Additional info:
Where: SQL statement "UPDATE vm_device SET device =  $1 , type =  $2 , address =  $3 , boot_order =  $4 , spec_params =  $5 , is_managed =  $6 , is_plugged =  $7 , is_readonly =  $8 , alias =  $9 , custom_properties =  $10 , snapshot_id =  $11 , _update_date = current_timestamp WHERE device_id =  $12  and vm_id =  $13 "
PL/pgSQL function "updatevmdevice" line 2 at SQL statement; nested exception is org.postgresql.util.PSQLException: ERROR: deadlock detected
  Detail: Process 13504 waits for ShareLock on transaction 6857612; blocked by process 13506.
Process 13506 waits for ShareLock on transaction 6857622; blocked by process 13504.
  Hint: See server log for query details.
  Where: SQL statement "UPDATE vm_device SET device =  $1 , type =  $2 , address =  $3 , boot_order =  $4 , spec_params =  $5 , is_managed =  $6 , is_plugged =  $7 , is_readonly =  $8 , alias =  $9 , custom_properties =  $10 , snapshot_id =  $11 , _update_date = current_timestamp WHERE device_id =  $12  and vm_id =  $13 "
PL/pgSQL function "updatevmdevice" line 2 at SQL statement
        at org.springframework.jdbc.support.SQLErrorCodeSQLExceptionTranslator.doTranslate(SQLErrorCodeSQLExceptionTranslator.java:265) [spring-jdbc.jar:3.1.1.RELEASE]
        at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:72) [spring-jdbc.jar:3.1.1.RELEASE]
        at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:349) [spring-jdbc.jar:3.1.1.RELEASE]
        at org.ovirt.engine.core.dal.dbbroker.SimpleJdbcCallsHandler.executeStoredProcAsBatch(SimpleJdbcCallsHandler.java:52) [dal.jar:]
        at org.ovirt.engine.core.dal.dbbroker.SimpleJdbcCallsHandler.executeStoredProcAsBatch(SimpleJdbcCallsHandler.java:70) [dal.jar:]
        at org.ovirt.engine.core.dao.MassOperationsGenericDaoDbFacade.updateAllInBatch(MassOperationsGenericDaoDbFacade.java:52) [dal.jar:]
        at org.ovirt.engine.core.dao.MassOperationsGenericDaoDbFacade.updateAllInBatch(MassOperationsGenericDaoDbFacade.java:87) [dal.jar:]
        at org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo.saveVmDevicesToDb(VdsUpdateRunTimeInfo.java:202) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo.saveDataToDb(VdsUpdateRunTimeInfo.java:173) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo.refresh(VdsUpdateRunTimeInfo.java:359) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.VdsManager.onTimer(VdsManager.java:231) [vdsbroker.jar:]
        at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source) [:1.7.0_51]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.7.0_51]
        at java.lang.reflect.Method.invoke(Method.java:606) [rt.jar:1.7.0_51]
        at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:60) [scheduler.jar:]
        at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz.jar:]
        at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557) [quartz.jar:]
Caused by: org.postgresql.util.PSQLException: ERROR: deadlock detected
  Detail: Process 13504 waits for ShareLock on transaction 6857612; blocked by process 13506.
Process 13506 waits for ShareLock on transaction 6857622; blocked by process 13504.

Comment 1 Barak 2014-06-01 12:35:02 UTC
Roy - do you think this is a generic issue of the hot plug/unplug scenario?
Or is it a network-specific issue?

Comment 2 Barak 2014-06-01 12:36:28 UTC
Genadi - why was the regression keyword set?
How can you tell this potential "deadlock" did not exist in previous versions?

Comment 3 GenadiC 2014-06-01 12:43:27 UTC
Indeed not a regression, sorry for the misunderstanding.

Comment 4 Michal Skrivanek 2014-06-02 10:17:05 UTC
or the same issue as in 1097256 ?

Comment 5 Roy Golan 2014-06-10 07:27:24 UTC
(In reply to Michal Skrivanek from comment #4)
> or the same issue as in 1097256 ?

No, it's a DB deadlock. The hotplug command changes rows a and b in the devices table
while monitoring batch-updates b and a.

Looking further to sort this out.


Genadi, please confirm you tried hotplug on more than one device.
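
The ordering problem described in this comment can be modelled without a database. The following is only a sketch, assuming two transactions that take row locks one at a time and hold them until commit; the class `LockOrderSketch`, its methods, and the row names "a" and "b" are hypothetical and are not part of ovirt-engine. It reports whether two lock orders can block each other, and shows that sorting both sides removes the conflict.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical model, not engine code: each list is the order in which
// one transaction locks (updates) rows.
class LockOrderSketch {

    /**
     * Returns true if some pair of rows shared by both transactions is
     * locked in opposite order, which is the condition under which two
     * transactions holding their locks until commit can wait on each other.
     */
    static boolean canDeadlock(List<String> tx1, List<String> tx2) {
        for (int i = 0; i < tx1.size(); i++) {
            for (int j = i + 1; j < tx1.size(); j++) {
                int first = tx2.indexOf(tx1.get(i));
                int second = tx2.indexOf(tx1.get(j));
                // Both rows are shared, and tx2 takes them in reverse order.
                if (first >= 0 && second >= 0 && second < first) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Returns a sorted copy, leaving the caller's list untouched. */
    static List<String> sorted(List<String> rows) {
        List<String> copy = new ArrayList<String>(rows);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        List<String> hotplug = Arrays.asList("a", "b");     // command: a, then b
        List<String> monitoring = Arrays.asList("b", "a");  // batch: b, then a
        System.out.println(canDeadlock(hotplug, monitoring));                 // true
        System.out.println(canDeadlock(sorted(hotplug), sorted(monitoring))); // false
    }
}
```

The model covers only the two-transaction case seen in the log (process 13504 against process 13506); it does not attempt to reproduce PostgreSQL's lock manager.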

Comment 6 Roy Golan 2014-06-10 07:54:10 UTC
Actually it's the boot order update, so this can collide on a single device update, as it triggers a change across all devices of a VM.

Comment 7 Roy Golan 2014-06-10 07:59:58 UTC
Eli,

ActivateDeactivateVmNic command is updating all devices under the same TX:

VmDeviceUtils.updateBootOrderInVmDeviceAndStoreToDB(getVm().getStaticData())


So sorting the batch update is only one side.

A quick solution would be to sort the collection before iterating and updating the boot order.

A more thorough solution is to find all places that update a table by iteration, make sure they use collections, and have the DAO sort them ahead of time.
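
The "sort before iterating" idea could look roughly like the sketch below. It is not the actual fix and does not use the engine's real classes: `Device`, `BY_KEY`, and `updateOrder` are hypothetical stand-ins, and the sort key (vm_id, then device_id) is an assumption taken from the WHERE clause of the UPDATE in the log. The point is only that every caller ends up issuing its per-row updates in the same order, whatever order its input collection arrived in.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class SortedBatchUpdateSketch {

    /** Hypothetical stand-in for a vm_device row; not the engine's VmDevice. */
    static final class Device {
        final String vmId;
        final String deviceId;

        Device(String vmId, String deviceId) {
            this.vmId = vmId;
            this.deviceId = deviceId;
        }
    }

    /** Assumed ordering: vm_id first, then device_id. */
    static final Comparator<Device> BY_KEY = new Comparator<Device>() {
        @Override
        public int compare(Device x, Device y) {
            int byVm = x.vmId.compareTo(y.vmId);
            return byVm != 0 ? byVm : x.deviceId.compareTo(y.deviceId);
        }
    };

    /**
     * Sorts a copy of the devices and returns the keys in the order the
     * updates would be issued. A real DAO would run the UPDATE inside the
     * loop; this sketch only records the order.
     */
    static List<String> updateOrder(List<Device> devices) {
        List<Device> copy = new ArrayList<Device>(devices);
        Collections.sort(copy, BY_KEY);
        List<String> issued = new ArrayList<String>();
        for (Device d : copy) {
            issued.add(d.vmId + "/" + d.deviceId);
        }
        return issued;
    }

    public static void main(String[] args) {
        List<Device> fromCommand = Arrays.asList(
                new Device("vm1", "nic2"), new Device("vm1", "disk"), new Device("vm1", "nic1"));
        List<Device> fromMonitoring = Arrays.asList(
                new Device("vm1", "nic1"), new Device("vm1", "nic2"), new Device("vm1", "disk"));
        // Both callers now touch the rows in the same order.
        System.out.println(updateOrder(fromCommand));
        System.out.println(updateOrder(fromMonitoring));
    }
}
```

Placing the sort inside the DAO, as the more thorough proposal suggests, would mean individual commands cannot forget it; sorting at each call site is quicker to apply but relies on every caller doing so.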

Comment 8 GenadiC 2014-06-10 09:04:01 UTC
From what I recall we tried it on several devices.

Comment 9 Eli Mesika 2014-06-15 12:53:53 UTC
(In reply to Roy Golan from comment #7)
> Eli,
> 
> ActivateDeactivateVmNic command is updating all devices under the same TX:
> 
> VmDeviceUtils.updateBootOrderInVmDeviceAndStoreToDB(getVm().getStaticData())
> 
> 
> So sorting the batch update is only one side.
> 
> A quick solution would be to sort the collection before iterating and
> updating the boot order.
> 
> A more thorough solution is to find all places that update a table by
> iteration, make sure they use collections, and have the DAO sort them
> ahead of time.

Agreed.
What info is needed here?

Comment 10 Oved Ourfali 2014-06-15 15:13:14 UTC
I think no further input is required. 
Liran - please handle this one.

Comment 11 GenadiC 2014-07-03 07:23:01 UTC
Verified in 3.5.0-0.0.master.20140629172257.git0b16ed7.el6
Couldn't reproduce the problem.

Comment 13 Eyal Edri 2015-02-17 17:09:51 UTC
RHEV 3.5.0 was released. Closing.