Bug 1305524 - [cinder] Restarting engine during create template with cinder disks sometimes causes a creation of a cinder disk which is not reflected through the engine
Summary: [cinder] Restarting engine during create template with cinder disks sometimes...
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 3.6.3
Hardware: Unspecified
OS: Unspecified
unspecified
medium
Target Milestone: ---
: ---
Assignee: Maor
QA Contact: Elad
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2016-02-08 13:52 UTC by Natalie Gavrielov
Modified: 2022-06-27 08:05 UTC (History)
5 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-07-02 08:21:17 UTC
oVirt Team: Storage
Embargoed:
sbonazzo: ovirt-4.3-


Attachments (Terms of Use)
engine.log, cinder logs (1.07 MB, application/x-gzip)
2016-02-08 13:52 UTC, Natalie Gavrielov
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1705891 0 None None None Never
Red Hat Issue Tracker RHV-46630 0 None None None 2022-06-27 08:05:38 UTC
oVirt gerrit 72636 0 master ABANDONED core: Cinder add compensation for new disk. 2017-07-23 09:34:35 UTC

Description Natalie Gavrielov 2016-02-08 13:52:18 UTC
Created attachment 1122178 [details]
engine.log, cinder logs

Description of problem:

Following the verification of issue 1287771:
When performing an engine restart during template creation of a VM with cinder/ceph disks, operation sometimes fail with NPE and a cinder disk is created (disk does not appear anywhere in engine).

Version-Release number of selected component:
rhevm-3.6.3-0.1.el6.noarch

How reproducible:
Depends on WHEN performing the engine restart.

Steps to Reproduce:
1. Create a VM, with a few cinder-ceph disks, and install an OS.
2. Make a template out of the VM.
3. Meanwhile restart the engine.

Actual results:

A few errors:
2016-02-07 16:47:41,191 ERROR [org.ovirt.engine.core.bll.storage.CloneCinderDisksCommand] (pool-5-thread-6) [] Error cloning Cinder disk 'cinder-vm_Disk2': javax.ejb.EJBException: org.springframework.jdbc.CannotGetJdbcConnectionException: Could not get JDBC Connection; nested exception is java.sql.SQLException: javax.resource.ResourceException: IJ000451: The connection manager is shutdown: java:/ENGINEDataSource
2016-02-07 16:47:41,193 ERROR [org.ovirt.engine.core.bll.CommandsFactory] (pool-5-thread-6) [] Error in invocating CTOR of command 'CloneSingleCinderDisk': null
2016-02-07 16:47:41,193 ERROR [org.ovirt.engine.core.bll.storage.CloneCinderDisksCommand] (pool-5-thread-6) [] Command 'org.ovirt.engine.core.bll.storage.CloneCinderDisksCommand' failed: null

NPE:
2016-02-07 16:47:41,193 ERROR [org.ovirt.engine.core.bll.storage.CloneCinderDisksCommand] (pool-5-thread-6) [] Exception: java.lang.NullPointerException


From Cinder (cat volume.log):

2016-02-07 16:47:41.753 3165 DEBUG cinder.volume.manager [req-08be5df7-9fdc-4369-8c3a-fae7967175e7 cf705d47b500468f9acf56e90137e8f4 b63a4858c1b94672b264228f0de21a40 - - -] Task 'cinder.volume.flows.manager.create_volume.ExtractVolumeSpecTask;volume:create' (ec65a87c-e291-4f67-8d5b-5185701eb959) transitioned into state 'SUCCESS' from state 'RUNNING' with result '{'status': u'creating', 'volume_id': u'9477ed71-72d7-4c19-a8e5-f0e2b3f1622a', 'source_volid': u'536a482f-d936-4c7a-8b3d-fdbcc1c20798', 'volume_size': 2, 'volume_name': u'volume-9477ed71-72d7-4c19-a8e5-f0e2b3f1622a', 'type': 'source_vol', 'source_volstatus': u'available'}' _task_receiver /usr/lib/python2.7/site-packages/taskflow/listeners/logging.py:178
2016-02-07 16:47:41.979 3165 INFO cinder.volume.flows.manager.create_volume [req-08be5df7-9fdc-4369-8c3a-fae7967175e7 cf705d47b500468f9acf56e90137e8f4 b63a4858c1b94672b264228f0de21a40 - - -] Volume 9477ed71-72d7-4c19-a8e5-f0e2b3f1622a: being created as source_vol with specification: {'status': u'creating', 'volume_size': 2, 'source_volstatus': u'available', 'source_volid': u'536a482f-d936-4c7a-8b3d-fdbcc1c20798', 'volume_name': u'volume-9477ed71-72d7-4c19-a8e5-f0e2b3f1622a'}
2016-02-07 16:47:42.504 3165 DEBUG cinder.volume.drivers.rbd [req-08be5df7-9fdc-4369-8c3a-fae7967175e7 cf705d47b500468f9acf56e90137e8f4 b63a4858c1b94672b264228f0de21a40 - - -] creating snapshot='volume-9477ed71-72d7-4c19-a8e5-f0e2b3f1622a.clone_snap' create_cloned_volume /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:496
2016-02-07 16:47:42.680 3165 DEBUG cinder.volume.drivers.rbd [req-08be5df7-9fdc-4369-8c3a-fae7967175e7 cf705d47b500468f9acf56e90137e8f4 b63a4858c1b94672b264228f0de21a40 - - -] cloning 'volume-536a482f-d936-4c7a-8b3d-fdbcc1c20798' to 'volume-9477ed71-72d7-4c19-a8e5-f0e2b3f1622a' create_cloned_volume /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:509
2016-02-07 16:47:43.345 3165 INFO cinder.volume.flows.manager.create_volume [req-08be5df7-9fdc-4369-8c3a-fae7967175e7 cf705d47b500468f9acf56e90137e8f4 b63a4858c1b94672b264228f0de21a40 - - -] Volume volume-9477ed71-72d7-4c19-a8e5-f0e2b3f1622a (9477ed71-72d7-4c19-a8e5-f0e2b3f1622a): created successfully

A cinder disk was created, but nowhere to be found on the engine side.

Expected results:

* To rollback gracefully (either prevent the creation of this disk or delete it somehow).
* Prevent NPE.

Comment 1 Maor 2016-02-08 15:24:45 UTC
The NPE is not a relevant issue since the engine was stopped and we might see different type of exceptions of SQL Exception and NPE.
The real issue is if there is a Cinder disk that is not deleted after the reboot from Cinder.

Comment 2 Maor 2016-05-03 09:36:15 UTC
The engine already supports revert tasks of Cinder operations, though there is a corner case when the engine gets restarted right after there is a call to a Cinder operation and before the CoCo task gets registered in the DB.
To fix that we should add a header into the CoCo request so after a reboot the engine will be able to revert the operation whether it was done in Cinder or not.

Based that on 3.6 Cinder is in tech-preview and the CoCo infrastructure was changed drastically in 4.0, and also that this issue is a corner case, it will be better to focus the efforts only on 4.0, therefore, changing the target milestone to 4.0

Comment 3 Yaniv Lavi 2016-05-23 13:18:18 UTC
oVirt 4.0 beta has been released, moving to RC milestone.

Comment 4 Yaniv Lavi 2016-05-23 13:22:14 UTC
oVirt 4.0 beta has been released, moving to RC milestone.

Comment 5 Allon Mureinik 2017-07-16 09:17:51 UTC
Maor, this patch has been acked for over a month.
Can you please verify and merge it?

Comment 6 Maor 2017-07-16 10:23:03 UTC
(In reply to Allon Mureinik from comment #5)
> Maor, this patch has been acked for over a month.
> Can you please verify and merge it?

I had some problems with the verification, I will need to re-visit the code, I will do it this week

Comment 7 Maor 2017-07-23 09:34:11 UTC
(In reply to Maor from comment #6)
> (In reply to Allon Mureinik from comment #5)
> > Maor, this patch has been acked for over a month.
> > Can you please verify and merge it?
> 
> I had some problems with the verification, I will need to re-visit the code,
> I will do it this week

I've encountered a problem with the solution proposed in the patch.
To achieve a full compensation for Cinder volume I would need to create the Cinder volume with a predefine id as being done with VDSM for image volume, I tried to use woorea API for that by passing imageRef, but it seems that there is no option to initialize a volume with a pre-define id.
Therefore I remove the patch and move the bug back to new.
I will try to open an issue on Cinder to add the ability to create a volume with a predefine id.

Comment 8 Tal Nisan 2018-07-02 08:21:17 UTC
In 4.3 we are changing the way Engine works with Cinder so this bug is irrelevant anymore.


Note You need to log in before you can comment on or make changes to this bug.