Bug 909935

Summary: engine: live snapshot fails with nested exception is java.sql.SQLException: javax.resource.ResourceException: IJ000460: Error checking for a transaction
Product: Red Hat Enterprise Virtualization Manager Reporter: Dafna Ron <dron>
Component: ovirt-engine Assignee: Maor <mlipchuk>
Status: CLOSED DUPLICATE QA Contact: Dafna Ron <dron>
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.1.3 CC: abaron, acathrow, amureini, dyasny, hateya, iheim, lpeer, Rhev-m-bugs, scohen, yeylon, ykaul
Target Milestone: --- Keywords: Regression
Target Release: 3.2.0 Flags: amureini: Triaged+
Hardware: x86_64   
OS: Linux   
Whiteboard: storage
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-04-04 07:15:45 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 902824    
Attachments:
Description Flags
logs none

Description Dafna Ron 2013-02-11 13:34:26 UTC
Created attachment 696085
logs

Description of problem:

I ran multiple live snapshots (created several VMs, then created a live snapshot on each VM).
One of the snapshots failed with: nested exception is java.sql.SQLException: javax.resource.ResourceException: IJ000460: Error checking for a transaction

Version-Release number of selected component (if applicable):

si27

How reproducible:

Steps to Reproduce:
1. Create 10 VMs from a template using a pool
2. Run the VMs
3. Create a live snapshot on each VM
  
Actual results:

One snapshot failed with: nested exception is java.sql.SQLException: javax.resource.ResourceException: IJ000460: Error checking for a transaction

Expected results:

The snapshots should not fail.

Additional info: logs

Comment 2 Haim 2013-03-11 13:23:37 UTC
Adding the Regression keyword; this scenario used to work in 3.1.0.

Comment 3 Ayal Baron 2013-04-03 07:34:08 UTC
IIUC the failed live snapshot took more than 5 minutes.  What I'd like to understand is why it takes so long.

Comment 4 Maor 2013-04-03 16:45:52 UTC
> IIUC the failed live snapshot took more than 5 minutes.  What I'd like to
> understand is why it takes so long.
I didn't see in the logs that it took more than 5 minutes.
From what I have seen, the engine got an exception after we rolled back and tried to call rollbackQuota. rollbackQuota tried to fetch the storage pool, but since there was a rollback, there is no transaction to use in the DB.
IMO the fix for BZ909937 should avoid getting to that rollback issue, but there should still be a proper fix for rollbackQuota after compensation.

Comment 5 Maor 2013-04-03 16:53:17 UTC
I suspect that the issue here is that the storage pool is not initialized at the end-command phase. Therefore, when we roll back, rollbackQuota first tries to get it from memory, and if it is not there, it tries to fetch it from the DB.
This could happen in any command that rolls back at the end-command phase.
Ofri, can you confirm this is the case here?
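
The lookup order suspected above (memory first, then the DB) can be illustrated with a minimal Java sketch. This is not the engine's code: StoragePoolLookupSketch, Command, StoragePool, and the Supplier standing in for the DB fetch are all hypothetical names chosen for illustration.

```java
import java.util.function.Supplier;

// Hypothetical sketch: these names are illustrative, not taken from engine-bll.
class StoragePoolLookupSketch {

    /** Stand-in for the storage pool entity. */
    static final class StoragePool {
        final String name;
        StoragePool(String name) { this.name = name; }
    }

    /** Stand-in for a command that caches its storage pool in a field. */
    static final class Command {
        private final Supplier<StoragePool> dbFetch; // stands in for the DAO call
        private StoragePool cachedPool;              // null until first use
        int dbFetches = 0;                           // counts simulated DB round trips

        Command(Supplier<StoragePool> dbFetch) { this.dbFetch = dbFetch; }

        /** Memory first; fall back to the (simulated) DB only on a miss. */
        StoragePool getStoragePool() {
            if (cachedPool == null) {
                dbFetches++;
                cachedPool = dbFetch.get();
            }
            return cachedPool;
        }
    }

    public static void main(String[] args) {
        Command cmd = new Command(() -> new StoragePool("pool-1"));
        cmd.getStoragePool(); // cache miss: this is the call that needs a live transaction
        cmd.getStoragePool(); // cache hit: no DB access
        System.out.println("DB fetches: " + cmd.dbFetches);
    }
}
```

If the field is still empty when the rollback starts, the first call is the one that touches the DB. Under that assumption, populating it before the end-command phase would keep the rollback path off the DB.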

Comment 6 Maor 2013-04-04 07:15:45 UTC

*** This bug has been marked as a duplicate of bug 885460 ***

Comment 7 Maor 2013-04-04 13:47:45 UTC
Logs of both bugs show the same issue:

rollbackQuota(ImportVmCommand.java:1005) [engine-bll.jar:]
....
javax.resource.ResourceException: IJ000459: Transaction is not active: tx=TransactionImple < ac, BasicAction: 0:ffff7f000001:2a04c229:50be1126:5aeb9 status: ActionStatus.ABORT_ONLY >
According to comment 4 of BZ885460.

The problem in this bug is that, because of a deadlock (described in BZ909937), the transaction was aborted and the compensation flow was executed.
As part of the compensation flow, the engine tried to execute rollbackQuota and fetch the storage pool to revert the quota assignment.
Since the transaction was aborted, the storage pool could not be fetched, and therefore we got the same exception as in BZ885460.
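
The sequence described above (aborted transaction, then compensation, then a DB fetch that fails) can be reproduced in miniature with a toy model. The sketch below is only an illustration under stated assumptions: Tx, Dao, naiveRollbackQuota, and guardedRollbackQuota are hypothetical names, the transaction model is not JTA, and the guard is one possible approach, not necessarily the fix applied for bug 885460.

```java
// Hypothetical sketch: a toy transaction model, not the JTA or engine API.
class CompensationSketch {

    enum TxStatus { ACTIVE, ABORT_ONLY }

    /** Toy transaction that only carries a status. */
    static final class Tx {
        TxStatus status = TxStatus.ACTIVE;
    }

    /** Toy DAO that refuses to run inside a non-active transaction. */
    static final class Dao {
        String fetchStoragePool(Tx tx) {
            if (tx.status != TxStatus.ACTIVE) {
                throw new IllegalStateException("Transaction is not active");
            }
            return "pool-1";
        }
    }

    /** Mirrors the failure: the fetch joins the already aborted transaction. */
    static String naiveRollbackQuota(Dao dao, Tx current) {
        return dao.fetchStoragePool(current);
    }

    /** One possible guard: use a fresh transaction if the current one is unusable. */
    static String guardedRollbackQuota(Dao dao, Tx current) {
        Tx usable = (current.status == TxStatus.ACTIVE) ? current : new Tx();
        return dao.fetchStoragePool(usable);
    }

    public static void main(String[] args) {
        Dao dao = new Dao();
        Tx tx = new Tx();
        tx.status = TxStatus.ABORT_ONLY; // e.g. after the deadlock aborted the transaction

        try {
            naiveRollbackQuota(dao, tx);
        } catch (IllegalStateException e) {
            System.out.println("naive compensation failed: " + e.getMessage());
        }
        System.out.println("guarded compensation fetched: " + guardedRollbackQuota(dao, tx));
    }
}
```

In a real JTA environment the equivalent of the guard would be to check the transaction status before the fetch and run the lookup outside the aborted transaction. How the engine should do that is left open here.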