Created attachment 1122178 [details] engine.log, cinder logs Description of problem: Following the verification of issue 1287771: When performing an engine restart during template creation of a VM with cinder/ceph disks, operation sometimes fail with NPE and a cinder disk is created (disk does not appear anywhere in engine). Version-Release number of selected component: rhevm-3.6.3-0.1.el6.noarch How reproducible: Depends on WHEN performing the engine restart. Steps to Reproduce: 1. Create a VM, with a few cinder-ceph disks, and install an OS. 2. Make a template out of the VM. 3. Meanwhile restart the engine. Actual results: A few errors: 2016-02-07 16:47:41,191 ERROR [org.ovirt.engine.core.bll.storage.CloneCinderDisksCommand] (pool-5-thread-6) [] Error cloning Cinder disk 'cinder-vm_Disk2': javax.ejb.EJBException: org.springframework.jdbc.CannotGetJdbcConnectionException: Could not get JDBC Connection; nested exception is java.sql.SQLException: javax.resource.ResourceException: IJ000451: The connection manager is shutdown: java:/ENGINEDataSource 2016-02-07 16:47:41,193 ERROR [org.ovirt.engine.core.bll.CommandsFactory] (pool-5-thread-6) [] Error in invocating CTOR of command 'CloneSingleCinderDisk': null 2016-02-07 16:47:41,193 ERROR [org.ovirt.engine.core.bll.storage.CloneCinderDisksCommand] (pool-5-thread-6) [] Command 'org.ovirt.engine.core.bll.storage.CloneCinderDisksCommand' failed: null NPE: 2016-02-07 16:47:41,193 ERROR [org.ovirt.engine.core.bll.storage.CloneCinderDisksCommand] (pool-5-thread-6) [] Exception: java.lang.NullPointerException From Cinder (cat volume.log): 2016-02-07 16:47:41.753 3165 DEBUG cinder.volume.manager [req-08be5df7-9fdc-4369-8c3a-fae7967175e7 cf705d47b500468f9acf56e90137e8f4 b63a4858c1b94672b264228f0de21a40 - - -] Task 'cinder.volume.flows.manager.create_volume.ExtractVolumeSpecTask;volume:create' (ec65a87c-e291-4f67-8d5b-5185701eb959) transitioned into state 'SUCCESS' from state 'RUNNING' with result '{'status': u'creating', 'volume_id': u'9477ed71-72d7-4c19-a8e5-f0e2b3f1622a', 'source_volid': u'536a482f-d936-4c7a-8b3d-fdbcc1c20798', 'volume_size': 2, 'volume_name': u'volume-9477ed71-72d7-4c19-a8e5-f0e2b3f1622a', 'type': 'source_vol', 'source_volstatus': u'available'}' _task_receiver /usr/lib/python2.7/site-packages/taskflow/listeners/logging.py:178 2016-02-07 16:47:41.979 3165 INFO cinder.volume.flows.manager.create_volume [req-08be5df7-9fdc-4369-8c3a-fae7967175e7 cf705d47b500468f9acf56e90137e8f4 b63a4858c1b94672b264228f0de21a40 - - -] Volume 9477ed71-72d7-4c19-a8e5-f0e2b3f1622a: being created as source_vol with specification: {'status': u'creating', 'volume_size': 2, 'source_volstatus': u'available', 'source_volid': u'536a482f-d936-4c7a-8b3d-fdbcc1c20798', 'volume_name': u'volume-9477ed71-72d7-4c19-a8e5-f0e2b3f1622a'} 2016-02-07 16:47:42.504 3165 DEBUG cinder.volume.drivers.rbd [req-08be5df7-9fdc-4369-8c3a-fae7967175e7 cf705d47b500468f9acf56e90137e8f4 b63a4858c1b94672b264228f0de21a40 - - -] creating snapshot='volume-9477ed71-72d7-4c19-a8e5-f0e2b3f1622a.clone_snap' create_cloned_volume /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:496 2016-02-07 16:47:42.680 3165 DEBUG cinder.volume.drivers.rbd [req-08be5df7-9fdc-4369-8c3a-fae7967175e7 cf705d47b500468f9acf56e90137e8f4 b63a4858c1b94672b264228f0de21a40 - - -] cloning 'volume-536a482f-d936-4c7a-8b3d-fdbcc1c20798' to 'volume-9477ed71-72d7-4c19-a8e5-f0e2b3f1622a' create_cloned_volume /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:509 2016-02-07 16:47:43.345 3165 INFO cinder.volume.flows.manager.create_volume [req-08be5df7-9fdc-4369-8c3a-fae7967175e7 cf705d47b500468f9acf56e90137e8f4 b63a4858c1b94672b264228f0de21a40 - - -] Volume volume-9477ed71-72d7-4c19-a8e5-f0e2b3f1622a (9477ed71-72d7-4c19-a8e5-f0e2b3f1622a): created successfully A cinder disk was created, but nowhere to be found on the engine side. Expected results: * To rollback gracefully (either prevent the creation of this disk or delete it somehow). * Prevent NPE.
The NPE is not a relevant issue since the engine was stopped and we might see different type of exceptions of SQL Exception and NPE. The real issue is if there is a Cinder disk that is not deleted after the reboot from Cinder.
The engine already supports revert tasks of Cinder operations, though there is a corner case when the engine gets restarted right after there is a call to a Cinder operation and before the CoCo task gets registered in the DB. To fix that we should add a header into the CoCo request so after a reboot the engine will be able to revert the operation whether it was done in Cinder or not. Based that on 3.6 Cinder is in tech-preview and the CoCo infrastructure was changed drastically in 4.0, and also that this issue is a corner case, it will be better to focus the efforts only on 4.0, therefore, changing the target milestone to 4.0
oVirt 4.0 beta has been released, moving to RC milestone.
Maor, this patch has been acked for over a month. Can you please verify and merge it?
(In reply to Allon Mureinik from comment #5) > Maor, this patch has been acked for over a month. > Can you please verify and merge it? I had some problems with the verification, I will need to re-visit the code, I will do it this week
(In reply to Maor from comment #6) > (In reply to Allon Mureinik from comment #5) > > Maor, this patch has been acked for over a month. > > Can you please verify and merge it? > > I had some problems with the verification, I will need to re-visit the code, > I will do it this week I've encountered a problem with the solution proposed in the patch. To achieve a full compensation for Cinder volume I would need to create the Cinder volume with a predefine id as being done with VDSM for image volume, I tried to use woorea API for that by passing imageRef, but it seems that there is no option to initialize a volume with a pre-define id. Therefore I remove the patch and move the bug back to new. I will try to open an issue on Cinder to add the ability to create a volume with a predefine id.
In 4.3 we are changing the way Engine works with Cinder so this bug is irrelevant anymore.