Created attachment 738974 [details] logs Description of problem: After vdsm responses to engine about failed extendStorageDomain, engine throws: 2013-04-23 13:42:52,219 ERROR [org.ovirt.engine.core.bll.storage.ExtendSANStorageDomainCommand] (ajp-/127.0.0.1:8702-6) [2f41e0fd] Command org.ovirt.engine.core.bll.storage.ExtendSANStorageDomainCommand throw Vdc Bll exception. With error message VdcBLLException: org.springframework.jdbc.CannotGetJdbcConnectionException: Could not get JDBC Connection; nested exception is java.sql.SQLException: javax.resource.ResourceException: IJ000460: Error checking for a transaction After that, in the next 3 minutes, engine will fail any other asynchronous tasks (whether if they'll success or fail by vdsm). Version-Release number of selected component (if applicable): RHEVM - 3.2 - SF13.1 rhevm-backend-3.2.0-10.19.beta2.el6ev.noarch libvirt-0.10.2-18.el6_4.4.x86_64 vdsm-4.10.2-15.0.el6ev.x86_64 sanlock-2.6-2.el6.x86_64 How reproducible: 100% (will have to get a failure in extendStorageDomain in order to reproduce) Steps to Reproduce: in a 2 hosts FC env.: 1) Trigger a failure in extend storage domain (I got one by a failure in remove mappings holding device) vdsm will update the engine about the failure 2) Try to do another action that will cause an asynchronous task by vdsm. Actual results: Any other asynchronous tasks that will be performed by vdsm in the next 3 minutes, will get fail by engine due to a timeout. Expected results: There should not be an exception in engine after failure in ExtendStorageDomain. Additional info: See logs attached.
(In reply to comment #0) > Actual results: > Any other asynchronous tasks that will be performed by vdsm in the next 3 > minutes, will get fail by engine due to a timeout. This implies there are exceptions in the engine log for the next three minutes. I don't see any in engine_elad_rhevm32.log. All I see is the ExtedStorageDomainVDS and related errors, and only for the following minute. Where did you see these exceptions? In the UI only? Were you using webadmin? Am I looking at the correct log? If not, where should I be looking? If yes - which other exceptions am I missing? > Expected results: > There should not be an exception in engine after failure in > ExtendStorageDomain. You mean for the other operations I seem to be missing, right? You're ok with the ExtedStorageDomainVDS etc. exceptions, correct?
(In reply to comment #1) > (In reply to comment #0) The first ExtendStorageDomainVDSCommand sent to vdsm at 13:32:48,751. This operation failed. After that (about 2 minuted) (13:34:26,173), I tried again to extend the domain. This operation was send by engine to vdsm only after 10 minutes (13:42:48,797). The operation failed by vdsm but even if it was successful, engine would fail it due to timeout. I've indeed used the webadmin.
Already fixed upstream
engine is not failing other asynchronous tasks after failure in ExtendStorageDomain: vdsm failed to extend the domain: on host: Thread-991::INFO::2013-05-15 22:28:41,094::logUtils::40::dispatcher::(wrapper) Run and protect: extendStorageDomain(sdUUID='5191a994-536b-49fc-bbbf-fa0429fade12', spUUID='b6dba6fd-e23c-4 474-b2ef-6d3fa985987d', devlist=['1elad21367847', '1elad3113678542'], force=True, options=None) Thread-991::ERROR::2013-05-15 22:28:41,136::task::850::TaskManager.Task::(_setError) Task=`35b0e6b0-366d-4fe6-8d1d-280e359202d0`::Unexpected error Traceback (most recent call last): File "/usr/share/vdsm/storage/task.py", line 857, in _run return fn(*args, **kargs) File "/usr/share/vdsm/logUtils.py", line 41, in wrapper res = f(*args, **kwargs) File "/usr/share/vdsm/storage/hsm.py", line 684, in extendStorageDomain pool.extendSD(sdUUID, devlist, force) File "/usr/share/vdsm/storage/securable.py2013-05-15 22:29:12,332 INFO [org.ovirt.engine.core.bll.storage.ExtendSANStorageDomainCommand] (ajp-/127.0.0.1:8702-11) [1d36254b] Running command: ExtendSANStorageDomainCommand internal: false. Entities affected : ID: 5191a994-536b-49fc-bbbf-fa0429fade12 Type: Storage 2013-05-15 22:29:12,337 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.ExtendStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-11) [1d36254b] START, ExtendStorageDomainVDSCommand( storagePoolId = b6dba6fd-e23c-4474-b2ef-6d3fa985987d, ignoreFailoverLimit = false, compatabilityVersion = null, storageDomainId = 5191a994-536b-49fc-bbbf-fa0429fade12, deviceList = [1elad3313678542]), log id: 3b5ab952 2013-05-15 22:29:15,179 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.ExtendStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-11) [1d36254b] FINISH, ExtendStorageDomainVDSCommand, log id: 3b5ab952 ", line 68, in wrapper return f(self, *args, **kwargs) File "/usr/share/vdsm/storage/sp.py", line 2018, in extendSD sdCache.produce(sdUUID).extend(devlist, force) File "/usr/share/vdsm/storage/blockSD.py", line 695, in extend lvm.extendVG(self.sdUUID, devices, force) File "/usr/share/vdsm/storage/lvm.py", line 914, in extendVG _initpvs(pvs, int(vg.vg_mda_size) / 2 ** 20, force) File "/usr/share/vdsm/storage/lvm.py", line 677, in _initpvs _initpvs_removeHolders() File "/usr/share/vdsm/storage/lvm.py", line 667, in _initpvs_removeHolders os.path.basename(device)) File "/usr/share/vdsm/storage/devicemapper.py", line 166, in removeMappingsHoldingDevice removeMapping(getDevName(holder)) File "/usr/share/vdsm/storage/devicemapper.py", line 137, in removeMapping return getProxy().removeDeviceMapping(deviceName) File "/usr/share/vdsm/supervdsm.py", line 76, in __call__ return callMethod() File "/usr/share/vdsm/supervdsm.py", line 67, in <lambda> **kwargs) File "<string>", line 2, in removeDeviceMapping File "/usr/lib64/python2.6/multiprocessing/managers.py", line 740, in _callmethod raise convert_to_error(kind, result) Exception: Could not remove mapping `5191a994--536b--49fc--bbbf--fa0429fade12-ids` on engine: 2013-05-15 22:28:37,871 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (ajp-/127.0.0.1:8702-5) [379d390e] Failed in ExtendStorageDomainVDS method 2013-05-15 22:28:37,901 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.ExtendStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-5) [379d390e] FINISH, ExtendStorageDomainVDSCommand, log id: right after, I tried to extend the domain again and succeed: on host: Thread-1018::INFO::2013-05-15 22:29:15,630::logUtils::40::dispatcher::(wrapper) Run and protect: extendStorageDomain(sdUUID='5191a994-536b-49fc-bbbf-fa0429fade12', spUUID='b6dba6fd-e23c- 4474-b2ef-6d3fa985987d', devlist=['1elad3313678542'], force=True, options=None) 2013-05-15 22:29:12,332 INFO [org.ovirt.engine.core.bll.storage.ExtendSANStorageDomainCommand] (ajp-/127.0.0.1:8702-11) [1d36254b] Running command: ExtendSANStorageDomainCommand internal: false. Entities affected : ID: 5191a994-536b-49fc-bbbf-fa0429fade12 Type: Storage 2013-05-15 22:29:12,337 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.ExtendStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-11) [1d36254b] START, ExtendStorageDomainVDSCommand( storagePoolId = b6dba6fd-e23c-4474-b2ef-6d3fa985987d, ignoreFailoverLimit = false, compatabilityVersion = null, storageDomainId = 5191a994-536b-49fc-bbbf-fa0429fade12, deviceList = [1elad3313678542]), log id: 3b5ab952 2013-05-15 22:29:15,179 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.ExtendStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-11) [1d36254b] FINISH, ExtendStorageDomainVDSCommand, log id: 3b5ab952 verified on RHEVM-3.2 - SF16: rhevm-3.2.0-10.25.beta3.el6ev.noarch vdsm-4.10.2-18.0.el6ev.x86_64
3.2 has been released