Bug 1305524 - [cinder] Restarting engine during create template with cinder disks sometimes causes a creation of a cinder disk which is not reflected through the engine
[cinder] Restarting engine during create template with cinder disks sometimes...
Status: NEW
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage (Show other bugs)
3.6.3
Unspecified Unspecified
unspecified Severity medium (vote)
: ovirt-4.3.0
: ---
Assigned To: Maor
Raz Tamir
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2016-02-08 08:52 EST by Natalie Gavrielov
Modified: 2017-09-28 05:32 EDT (History)
5 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
rule-engine: ovirt‑4.3+
rule-engine: exception+


Attachments (Terms of Use)
engine.log, cinder logs (1.07 MB, application/x-gzip)
2016-02-08 08:52 EST, Natalie Gavrielov
no flags Details


External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 72636 master ABANDONED core: Cinder add compensation for new disk. 2017-07-23 05:34 EDT

  None (edit)
Description Natalie Gavrielov 2016-02-08 08:52:18 EST
Created attachment 1122178 [details]
engine.log, cinder logs

Description of problem:

Following the verification of issue 1287771:
When performing an engine restart during template creation of a VM with cinder/ceph disks, operation sometimes fail with NPE and a cinder disk is created (disk does not appear anywhere in engine).

Version-Release number of selected component:
rhevm-3.6.3-0.1.el6.noarch

How reproducible:
Depends on WHEN performing the engine restart.

Steps to Reproduce:
1. Create a VM, with a few cinder-ceph disks, and install an OS.
2. Make a template out of the VM.
3. Meanwhile restart the engine.

Actual results:

A few errors:
2016-02-07 16:47:41,191 ERROR [org.ovirt.engine.core.bll.storage.CloneCinderDisksCommand] (pool-5-thread-6) [] Error cloning Cinder disk 'cinder-vm_Disk2': javax.ejb.EJBException: org.springframework.jdbc.CannotGetJdbcConnectionException: Could not get JDBC Connection; nested exception is java.sql.SQLException: javax.resource.ResourceException: IJ000451: The connection manager is shutdown: java:/ENGINEDataSource
2016-02-07 16:47:41,193 ERROR [org.ovirt.engine.core.bll.CommandsFactory] (pool-5-thread-6) [] Error in invocating CTOR of command 'CloneSingleCinderDisk': null
2016-02-07 16:47:41,193 ERROR [org.ovirt.engine.core.bll.storage.CloneCinderDisksCommand] (pool-5-thread-6) [] Command 'org.ovirt.engine.core.bll.storage.CloneCinderDisksCommand' failed: null

NPE:
2016-02-07 16:47:41,193 ERROR [org.ovirt.engine.core.bll.storage.CloneCinderDisksCommand] (pool-5-thread-6) [] Exception: java.lang.NullPointerException


From Cinder (cat volume.log):

2016-02-07 16:47:41.753 3165 DEBUG cinder.volume.manager [req-08be5df7-9fdc-4369-8c3a-fae7967175e7 cf705d47b500468f9acf56e90137e8f4 b63a4858c1b94672b264228f0de21a40 - - -] Task 'cinder.volume.flows.manager.create_volume.ExtractVolumeSpecTask;volume:create' (ec65a87c-e291-4f67-8d5b-5185701eb959) transitioned into state 'SUCCESS' from state 'RUNNING' with result '{'status': u'creating', 'volume_id': u'9477ed71-72d7-4c19-a8e5-f0e2b3f1622a', 'source_volid': u'536a482f-d936-4c7a-8b3d-fdbcc1c20798', 'volume_size': 2, 'volume_name': u'volume-9477ed71-72d7-4c19-a8e5-f0e2b3f1622a', 'type': 'source_vol', 'source_volstatus': u'available'}' _task_receiver /usr/lib/python2.7/site-packages/taskflow/listeners/logging.py:178
2016-02-07 16:47:41.979 3165 INFO cinder.volume.flows.manager.create_volume [req-08be5df7-9fdc-4369-8c3a-fae7967175e7 cf705d47b500468f9acf56e90137e8f4 b63a4858c1b94672b264228f0de21a40 - - -] Volume 9477ed71-72d7-4c19-a8e5-f0e2b3f1622a: being created as source_vol with specification: {'status': u'creating', 'volume_size': 2, 'source_volstatus': u'available', 'source_volid': u'536a482f-d936-4c7a-8b3d-fdbcc1c20798', 'volume_name': u'volume-9477ed71-72d7-4c19-a8e5-f0e2b3f1622a'}
2016-02-07 16:47:42.504 3165 DEBUG cinder.volume.drivers.rbd [req-08be5df7-9fdc-4369-8c3a-fae7967175e7 cf705d47b500468f9acf56e90137e8f4 b63a4858c1b94672b264228f0de21a40 - - -] creating snapshot='volume-9477ed71-72d7-4c19-a8e5-f0e2b3f1622a.clone_snap' create_cloned_volume /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:496
2016-02-07 16:47:42.680 3165 DEBUG cinder.volume.drivers.rbd [req-08be5df7-9fdc-4369-8c3a-fae7967175e7 cf705d47b500468f9acf56e90137e8f4 b63a4858c1b94672b264228f0de21a40 - - -] cloning 'volume-536a482f-d936-4c7a-8b3d-fdbcc1c20798@volume-9477ed71-72d7-4c19-a8e5-f0e2b3f1622a.clone_snap' to 'volume-9477ed71-72d7-4c19-a8e5-f0e2b3f1622a' create_cloned_volume /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:509
2016-02-07 16:47:43.345 3165 INFO cinder.volume.flows.manager.create_volume [req-08be5df7-9fdc-4369-8c3a-fae7967175e7 cf705d47b500468f9acf56e90137e8f4 b63a4858c1b94672b264228f0de21a40 - - -] Volume volume-9477ed71-72d7-4c19-a8e5-f0e2b3f1622a (9477ed71-72d7-4c19-a8e5-f0e2b3f1622a): created successfully

A cinder disk was created, but nowhere to be found on the engine side.

Expected results:

* To rollback gracefully (either prevent the creation of this disk or delete it somehow).
* Prevent NPE.
Comment 1 Maor 2016-02-08 10:24:45 EST
The NPE is not a relevant issue since the engine was stopped and we might see different type of exceptions of SQL Exception and NPE.
The real issue is if there is a Cinder disk that is not deleted after the reboot from Cinder.
Comment 2 Maor 2016-05-03 05:36:15 EDT
The engine already supports revert tasks of Cinder operations, though there is a corner case when the engine gets restarted right after there is a call to a Cinder operation and before the CoCo task gets registered in the DB.
To fix that we should add a header into the CoCo request so after a reboot the engine will be able to revert the operation whether it was done in Cinder or not.

Based that on 3.6 Cinder is in tech-preview and the CoCo infrastructure was changed drastically in 4.0, and also that this issue is a corner case, it will be better to focus the efforts only on 4.0, therefore, changing the target milestone to 4.0
Comment 3 Yaniv Lavi 2016-05-23 09:18:18 EDT
oVirt 4.0 beta has been released, moving to RC milestone.
Comment 4 Yaniv Lavi 2016-05-23 09:22:14 EDT
oVirt 4.0 beta has been released, moving to RC milestone.
Comment 5 Allon Mureinik 2017-07-16 05:17:51 EDT
Maor, this patch has been acked for over a month.
Can you please verify and merge it?
Comment 6 Maor 2017-07-16 06:23:03 EDT
(In reply to Allon Mureinik from comment #5)
> Maor, this patch has been acked for over a month.
> Can you please verify and merge it?

I had some problems with the verification, I will need to re-visit the code, I will do it this week
Comment 7 Maor 2017-07-23 05:34:11 EDT
(In reply to Maor from comment #6)
> (In reply to Allon Mureinik from comment #5)
> > Maor, this patch has been acked for over a month.
> > Can you please verify and merge it?
> 
> I had some problems with the verification, I will need to re-visit the code,
> I will do it this week

I've encountered a problem with the solution proposed in the patch.
To achieve a full compensation for Cinder volume I would need to create the Cinder volume with a predefine id as being done with VDSM for image volume, I tried to use woorea API for that by passing imageRef, but it seems that there is no option to initialize a volume with a pre-define id.
Therefore I remove the patch and move the bug back to new.
I will try to open an issue on Cinder to add the ability to create a volume with a predefine id.

Note You need to log in before you can comment on or make changes to this bug.