Bug 1305524

Summary: [cinder] Restarting engine during create template with cinder disks sometimes causes a creation of a cinder disk which is not reflected through the engine
Product: [oVirt] ovirt-engine Reporter: Natalie Gavrielov <ngavrilo>
Component: BLL.StorageAssignee: Maor <mlipchuk>
Status: CLOSED WONTFIX QA Contact: Elad <ebenahar>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 3.6.3CC: bugs, ebenahar, mlipchuk, tnisan, ylavi
Target Milestone: ---Flags: sbonazzo: ovirt-4.3-
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-07-02 08:21:17 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
engine.log, cinder logs none

Description Natalie Gavrielov 2016-02-08 13:52:18 UTC
Created attachment 1122178 [details]
engine.log, cinder logs

Description of problem:

Following the verification of issue 1287771:
When performing an engine restart during template creation of a VM with cinder/ceph disks, operation sometimes fail with NPE and a cinder disk is created (disk does not appear anywhere in engine).

Version-Release number of selected component:
rhevm-3.6.3-0.1.el6.noarch

How reproducible:
Depends on WHEN performing the engine restart.

Steps to Reproduce:
1. Create a VM, with a few cinder-ceph disks, and install an OS.
2. Make a template out of the VM.
3. Meanwhile restart the engine.

Actual results:

A few errors:
2016-02-07 16:47:41,191 ERROR [org.ovirt.engine.core.bll.storage.CloneCinderDisksCommand] (pool-5-thread-6) [] Error cloning Cinder disk 'cinder-vm_Disk2': javax.ejb.EJBException: org.springframework.jdbc.CannotGetJdbcConnectionException: Could not get JDBC Connection; nested exception is java.sql.SQLException: javax.resource.ResourceException: IJ000451: The connection manager is shutdown: java:/ENGINEDataSource
2016-02-07 16:47:41,193 ERROR [org.ovirt.engine.core.bll.CommandsFactory] (pool-5-thread-6) [] Error in invocating CTOR of command 'CloneSingleCinderDisk': null
2016-02-07 16:47:41,193 ERROR [org.ovirt.engine.core.bll.storage.CloneCinderDisksCommand] (pool-5-thread-6) [] Command 'org.ovirt.engine.core.bll.storage.CloneCinderDisksCommand' failed: null

NPE:
2016-02-07 16:47:41,193 ERROR [org.ovirt.engine.core.bll.storage.CloneCinderDisksCommand] (pool-5-thread-6) [] Exception: java.lang.NullPointerException


From Cinder (cat volume.log):

2016-02-07 16:47:41.753 3165 DEBUG cinder.volume.manager [req-08be5df7-9fdc-4369-8c3a-fae7967175e7 cf705d47b500468f9acf56e90137e8f4 b63a4858c1b94672b264228f0de21a40 - - -] Task 'cinder.volume.flows.manager.create_volume.ExtractVolumeSpecTask;volume:create' (ec65a87c-e291-4f67-8d5b-5185701eb959) transitioned into state 'SUCCESS' from state 'RUNNING' with result '{'status': u'creating', 'volume_id': u'9477ed71-72d7-4c19-a8e5-f0e2b3f1622a', 'source_volid': u'536a482f-d936-4c7a-8b3d-fdbcc1c20798', 'volume_size': 2, 'volume_name': u'volume-9477ed71-72d7-4c19-a8e5-f0e2b3f1622a', 'type': 'source_vol', 'source_volstatus': u'available'}' _task_receiver /usr/lib/python2.7/site-packages/taskflow/listeners/logging.py:178
2016-02-07 16:47:41.979 3165 INFO cinder.volume.flows.manager.create_volume [req-08be5df7-9fdc-4369-8c3a-fae7967175e7 cf705d47b500468f9acf56e90137e8f4 b63a4858c1b94672b264228f0de21a40 - - -] Volume 9477ed71-72d7-4c19-a8e5-f0e2b3f1622a: being created as source_vol with specification: {'status': u'creating', 'volume_size': 2, 'source_volstatus': u'available', 'source_volid': u'536a482f-d936-4c7a-8b3d-fdbcc1c20798', 'volume_name': u'volume-9477ed71-72d7-4c19-a8e5-f0e2b3f1622a'}
2016-02-07 16:47:42.504 3165 DEBUG cinder.volume.drivers.rbd [req-08be5df7-9fdc-4369-8c3a-fae7967175e7 cf705d47b500468f9acf56e90137e8f4 b63a4858c1b94672b264228f0de21a40 - - -] creating snapshot='volume-9477ed71-72d7-4c19-a8e5-f0e2b3f1622a.clone_snap' create_cloned_volume /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:496
2016-02-07 16:47:42.680 3165 DEBUG cinder.volume.drivers.rbd [req-08be5df7-9fdc-4369-8c3a-fae7967175e7 cf705d47b500468f9acf56e90137e8f4 b63a4858c1b94672b264228f0de21a40 - - -] cloning 'volume-536a482f-d936-4c7a-8b3d-fdbcc1c20798' to 'volume-9477ed71-72d7-4c19-a8e5-f0e2b3f1622a' create_cloned_volume /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:509
2016-02-07 16:47:43.345 3165 INFO cinder.volume.flows.manager.create_volume [req-08be5df7-9fdc-4369-8c3a-fae7967175e7 cf705d47b500468f9acf56e90137e8f4 b63a4858c1b94672b264228f0de21a40 - - -] Volume volume-9477ed71-72d7-4c19-a8e5-f0e2b3f1622a (9477ed71-72d7-4c19-a8e5-f0e2b3f1622a): created successfully

A cinder disk was created, but nowhere to be found on the engine side.

Expected results:

* To rollback gracefully (either prevent the creation of this disk or delete it somehow).
* Prevent NPE.

Comment 1 Maor 2016-02-08 15:24:45 UTC
The NPE is not a relevant issue since the engine was stopped and we might see different type of exceptions of SQL Exception and NPE.
The real issue is if there is a Cinder disk that is not deleted after the reboot from Cinder.

Comment 2 Maor 2016-05-03 09:36:15 UTC
The engine already supports revert tasks of Cinder operations, though there is a corner case when the engine gets restarted right after there is a call to a Cinder operation and before the CoCo task gets registered in the DB.
To fix that we should add a header into the CoCo request so after a reboot the engine will be able to revert the operation whether it was done in Cinder or not.

Based that on 3.6 Cinder is in tech-preview and the CoCo infrastructure was changed drastically in 4.0, and also that this issue is a corner case, it will be better to focus the efforts only on 4.0, therefore, changing the target milestone to 4.0

Comment 3 Yaniv Lavi 2016-05-23 13:18:18 UTC
oVirt 4.0 beta has been released, moving to RC milestone.

Comment 4 Yaniv Lavi 2016-05-23 13:22:14 UTC
oVirt 4.0 beta has been released, moving to RC milestone.

Comment 5 Allon Mureinik 2017-07-16 09:17:51 UTC
Maor, this patch has been acked for over a month.
Can you please verify and merge it?

Comment 6 Maor 2017-07-16 10:23:03 UTC
(In reply to Allon Mureinik from comment #5)
> Maor, this patch has been acked for over a month.
> Can you please verify and merge it?

I had some problems with the verification, I will need to re-visit the code, I will do it this week

Comment 7 Maor 2017-07-23 09:34:11 UTC
(In reply to Maor from comment #6)
> (In reply to Allon Mureinik from comment #5)
> > Maor, this patch has been acked for over a month.
> > Can you please verify and merge it?
> 
> I had some problems with the verification, I will need to re-visit the code,
> I will do it this week

I've encountered a problem with the solution proposed in the patch.
To achieve a full compensation for Cinder volume I would need to create the Cinder volume with a predefine id as being done with VDSM for image volume, I tried to use woorea API for that by passing imageRef, but it seems that there is no option to initialize a volume with a pre-define id.
Therefore I remove the patch and move the bug back to new.
I will try to open an issue on Cinder to add the ability to create a volume with a predefine id.

Comment 8 Tal Nisan 2018-07-02 08:21:17 UTC
In 4.3 we are changing the way Engine works with Cinder so this bug is irrelevant anymore.