Created attachment 775129 [details]
logs

Description of problem:
Failed to LSM, "Replication job not found for disk vda" (appears in HSM)

Version-Release number of selected component (if applicable):
is5

How reproducible:

Steps to Reproduce:
1. Create a VM with one disk (setup with 2 iSCSI storage domains)
2. LSM the VM disk

Actual results:
fails

Expected results:
should work

Additional info:
================
Thread-383::ERROR::2013-07-18 09:07:55,731::BindingXMLRPC::998::vds::(wrapper) unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/BindingXMLRPC.py", line 984, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/BindingXMLRPC.py", line 373, in vmDiskReplicateStart
    return vm.diskReplicateStart(srcDisk, dstDisk)
  File "/usr/share/vdsm/API.py", line 529, in diskReplicateStart
    return v.diskReplicateStart(srcDisk, dstDisk)
  File "/usr/share/vdsm/vm.py", line 3828, in diskReplicateStart
    self._setDiskReplica(srcDrive, dstDisk)
  File "/usr/share/vdsm/vm.py", line 3778, in _setDiskReplica
    "replication" % srcDrive.name)
RuntimeError: Disk 'vda' already has an ongoing replication

Thread-384::DEBUG::2013-07-18 09:07:55,746::BindingXMLRPC::979::vds::(wrapper) client [10.35.161.44]::call vmDiskReplicateFinish with ('8deb82ed-e630-44e2-9679-bdc761fdb9b5', {'device': 'disk', 'domainID': '19fd7e93-d479-45a0-92b9-5ef8fad759f9', 'volumeID': '4617c659-0c02-42b7-9f63-3c823f7b830d', 'poolID': '5849b030-626e-47cb-ad90-3ce782d831b3', 'imageID': 'af1eb886-d759-48f0-ad02-7224b005ee88'}, {'device': 'disk', 'domainID': '19fd7e93-d479-45a0-92b9-5ef8fad759f9', 'volumeID': '4617c659-0c02-42b7-9f63-3c823f7b830d', 'poolID': '5849b030-626e-47cb-ad90-3ce782d831b3', 'imageID': 'af1eb886-d759-48f0-ad02-7224b005ee88'}) {} flowID [3b98d83c]
Thread-384::ERROR::2013-07-18 09:07:55,748::vm::3878::vm.Vm::(diskReplicateFinish) vmId=`8deb82ed-e630-44e2-9679-bdc761fdb9b5`::Replication job not found for disk vda ({'device': 'disk', 'domainID': '19fd7e93-d479-45a0-92b9-5ef8fad759f9', 'volumeID': '4617c659-0c02-42b7-9f63-3c823f7b830d', 'poolID': '5849b030-626e-47cb-ad90-3ce782d831b3', 'imageID': 'af1eb886-d759-48f0-ad02-7224b005ee88'})
Original code was:

1ce73032     def diskReplicateStart(self, srcDisk, dstDisk):
1ce73032         try:
1ce73032             srcDrive = self._findDriveByUUIDs(srcDisk)
1ce73032         except LookupError:
30a4b50a             self.log.error("Unable to find the disk for '%s'", srcDisk)
1ce73032             return errCode['imageErr']
1ce73032
30a4b50a         try:
30a4b50a             self._setDiskReplica(srcDrive, dstDisk)
30a4b50a         except:
30a4b50a             self.log.error("Unable to set the replication for disk '%s' with "
30a4b50a                            "destination '%s'" % srcDrive.name, dstDisk)
1ce73032             return errCode['replicaErr']
1ce73032
1ce73032         dstDiskCopy = dstDisk.copy()

Somehow it looks like that commit:

commit ee082b87f12618e97881ecbfc0fcccdfdec3e887
Author: Vinzenz Feenstra <vfeenstr>
Date:   Wed Apr 24 11:14:01 2013 +0200

    vdsm: [REFACTORING] libvirtvm.py bye bye

doubled the number of times we call self._setDiskReplica(...):

ee082b87     def diskReplicateStart(self, srcDisk, dstDisk):
ee082b87         try:
ee082b87             srcDrive = self._findDriveByUUIDs(srcDisk)
ee082b87         except LookupError:
ee082b87             self.log.error("Unable to find the disk for '%s'", srcDisk)
ee082b87             return errCode['imageErr']
ee082b87
ee082b87         try:
ee082b87             self._setDiskReplica(srcDrive, dstDisk)
ee082b87         except:
ee082b87             self.log.error("Unable to set the replication for disk '%s' with "
ee082b87                            "destination '%s'" % srcDrive.name, dstDisk)
ee082b87             return errCode['replicaErr']
ee082b87
ee082b87         self._setDiskReplica(srcDrive, dstDisk)
ee082b87         dstDiskCopy = dstDisk.copy()

Which results in:

Thread-383::ERROR::2013-07-18 09:07:55,731::BindingXMLRPC::998::vds::(wrapper) unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/BindingXMLRPC.py", line 984, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/BindingXMLRPC.py", line 373, in vmDiskReplicateStart
    return vm.diskReplicateStart(srcDisk, dstDisk)
  File "/usr/share/vdsm/API.py", line 529, in diskReplicateStart
    return v.diskReplicateStart(srcDisk, dstDisk)
  File "/usr/share/vdsm/vm.py", line 3828, in diskReplicateStart
    self._setDiskReplica(srcDrive, dstDisk)
  File "/usr/share/vdsm/vm.py", line 3778, in _setDiskReplica
    "replication" % srcDrive.name)
RuntimeError: Disk 'vda' already has an ongoing replication
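For clarity, a minimal sketch of the fixed flow, with the duplicated call dropped (this is my reading of the intended fix, not the merged patch; note that the quoted log.error call also passes its arguments incorrectly, which the sketch fixes by using lazy logging arguments):

    def diskReplicateStart(self, srcDisk, dstDisk):
        try:
            srcDrive = self._findDriveByUUIDs(srcDisk)
        except LookupError:
            self.log.error("Unable to find the disk for '%s'", srcDisk)
            return errCode['imageErr']

        try:
            # Register the replication exactly once. A second call trips the
            # guard in _setDiskReplica and raises "Disk 'vda' already has an
            # ongoing replication", i.e. the traceback above.
            self._setDiskReplica(srcDrive, dstDisk)
        except Exception:
            self.log.error("Unable to set the replication for disk '%s' "
                           "with destination '%s'", srcDrive.name, dstDisk)
            return errCode['replicaErr']

        # No second _setDiskReplica call here.
        dstDiskCopy = dstDisk.copy()
        ...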
*** Bug 986914 has been marked as a duplicate of this bug. ***
Fede,

Comparing the two patches (just vimdiff between libvirtvm.py and vm.py one patch later), it looks like there are additional regressions:

* "if hasattr(self, "diskReplicate"):" has been reintroduced instead of "self.isDiskReplicationInProgress():"
* Under hotunplugDisk, self.conf['devices'] is being modified without the conf lock, i.e. the "with self._confLock:" is missing.

and a couple more. A sketch of the two fixes named above follows below.
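To make the two named regressions concrete, a rough sketch of what the restored code should look like (my reconstruction based on the pre-refactoring libvirtvm.py, not the actual patch; diskDev below is a hypothetical name for the device entry being unplugged):

    # Regression 1: the refactoring reintroduced the inline attribute check
    #     if hasattr(self, "diskReplicate"):
    # where the call sites should keep going through the helper:
    if drive.isDiskReplicationInProgress():
        return errCode['replicaErr']

    # Regression 2: hotunplugDisk mutates the shared device list without
    # serializing against other users of self.conf; the mutation should be
    # wrapped in the configuration lock:
    with self._confLock:
        self.conf['devices'].remove(diskDev)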
I discussed this with Vinzenz; he is fixing the regressions introduced by the patch. Reassigning.
Merged u/s to master as http://gerrit.ovirt.org/gitweb?p=vdsm.git;a=commit;h=57170907e4b6cb8c0d458f87ea89217fbac8c112
Merged u/s to ovirt-3.3 as http://gerrit.ovirt.org/gitweb?p=vdsm.git;a=commit;h=e5dbe4cbef2d57c34232fd998ee82b3641ee7931
Now it fails with the message "Drive image file %s could not be found". Note that the DC is iSCSI.

Thread-752::ERROR::2013-08-08 12:49:48,424::vm::3840::vm.Vm::(diskReplicateStart) vmId=`7339e1b5-531c-4644-a479-b7c79dcb6650`::Unable to find the disk for '{'device': 'disk', 'domainID': '67251752-d80e-4395-99f7-c587a7b57802', 'volumeID': 'c73bbed7-5db2-49ca-bd58-5937ea75ecc0', 'poolID': '7321a626-04dc-48d2-b261-5bfcc6d88137', 'imageID': 'faddefb7-3513-42f6-9731-499853740ffe'}'
Thread-752::DEBUG::2013-08-08 12:49:48,424::BindingXMLRPC::981::vds::(wrapper) return vmDiskReplicateStart with {'status': {'message': 'Drive image file %s could not be found', 'code': 13}}
Thread-753::DEBUG::2013-08-08 12:49:48,437::BindingXMLRPC::974::vds::(wrapper) client [10.35.161.27]::call vmDiskReplicateFinish with ('7339e1b5-531c-4644-a479-b7c79dcb6650', {'device': 'disk', 'domainID': '67251752-d80e-4395-99f7-c587a7b57802', 'volumeID': 'c73bbed7-5db2-49ca-bd58-5937ea75ecc0', 'poolID': '7321a626-04dc-48d2-b261-5bfcc6d88137', 'imageID': 'faddefb7-3513-42f6-9731-499853740ffe'}, {'device': 'disk', 'domainID': '67251752-d80e-4395-99f7-c587a7b57802', 'volumeID': 'c73bbed7-5db2-49ca-bd58-5937ea75ecc0', 'poolID': '7321a626-04dc-48d2-b261-5bfcc6d88137', 'imageID': 'faddefb7-3513-42f6-9731-499853740ffe'}) {} flowID [52807d10]
Thread-753::DEBUG::2013-08-08 12:49:48,437::BindingXMLRPC::981::vds::(wrapper) return vmDiskReplicateFinish with {'status': {'message': 'Drive image file %s could not be found', 'code': 13}}
Created attachment 784414 [details] Logs from jenkins job
I am not too much into storage, but there seems to be an issue here:

f29d3d3b-5a19-4bf0-8c92-3d126d497412::DEBUG::2013-08-08 12:49:42,936::resourceManager::615::ResourceManager::(releaseResource) Trying to release resource '67251752-d80e-4395-99f7-c587a7b57802_volumeNS.c73bbed7-5db2-49ca-bd58-5937ea75ecc0'
f29d3d3b-5a19-4bf0-8c92-3d126d497412::DEBUG::2013-08-08 12:49:42,938::resourceManager::634::ResourceManager::(releaseResource) Released resource '67251752-d80e-4395-99f7-c587a7b57802_volumeNS.c73bbed7-5db2-49ca-bd58-5937ea75ecc0' (0 active users)
f29d3d3b-5a19-4bf0-8c92-3d126d497412::DEBUG::2013-08-08 12:49:42,939::resourceManager::640::ResourceManager::(releaseResource) Resource '67251752-d80e-4395-99f7-c587a7b57802_volumeNS.c73bbed7-5db2-49ca-bd58-5937ea75ecc0' is free, finding out if anyone is waiting for it.
f29d3d3b-5a19-4bf0-8c92-3d126d497412::DEBUG::2013-08-08 12:49:42,940::resourceManager::648::ResourceManager::(releaseResource) No one is waiting for resource '67251752-d80e-4395-99f7-c587a7b57802_volumeNS.c73bbed7-5db2-49ca-bd58-5937ea75ecc0', Clearing records.

and then the next mention of c73bbed7-5db2-49ca-bd58-5937ea75ecc0 is:

Thread-752::DEBUG::2013-08-08 12:49:48,423::BindingXMLRPC::974::vds::(wrapper) client [10.35.161.27]::call vmDiskReplicateStart with ('7339e1b5-531c-4644-a479-b7c79dcb6650', {'device': 'disk', 'domainID': '67251752-d80e-4395-99f7-c587a7b57802', 'volumeID': 'c73bbed7-5db2-49ca-bd58-5937ea75ecc0', 'poolID': '7321a626-04dc-48d2-b261-5bfcc6d88137', 'imageID': 'faddefb7-3513-42f6-9731-499853740ffe'}, {'device': 'disk', 'domainID': 'bf0f8dc2-9a4a-4095-b2ae-8d6915404697', 'volumeID': 'c73bbed7-5db2-49ca-bd58-5937ea75ecc0', 'poolID': '7321a626-04dc-48d2-b261-5bfcc6d88137', 'imageID': 'faddefb7-3513-42f6-9731-499853740ffe'}) {} flowID [52807d10]
Thread-752::ERROR::2013-08-08 12:49:48,424::vm::3840::vm.Vm::(diskReplicateStart) vmId=`7339e1b5-531c-4644-a479-b7c79dcb6650`::Unable to find the disk for '{'device': 'disk', 'domainID': '67251752-d80e-4395-99f7-c587a7b57802', 'volumeID': 'c73bbed7-5db2-49ca-bd58-5937ea75ecc0', 'poolID': '7321a626-04dc-48d2-b261-5bfcc6d88137', 'imageID': 'faddefb7-3513-42f6-9731-499853740ffe'}'
Thread-752::DEBUG::2013-08-08 12:49:48,424::BindingXMLRPC::981::vds::(wrapper) return vmDiskReplicateStart with {'status': {'message': 'Drive image file %s could not be found', 'code': 13}}
Thread-753::DEBUG::2013-08-08 12:49:48,437::BindingXMLRPC::974::vds::(wrapper) client [10.35.161.27]::call vmDiskReplicateFinish with ('7339e1b5-531c-4644-a479-b7c79dcb6650', {'device': 'disk', 'domainID': '67251752-d80e-4395-99f7-c587a7b57802', 'volumeID': 'c73bbed7-5db2-49ca-bd58-5937ea75ecc0', 'poolID': '7321a626-04dc-48d2-b261-5bfcc6d88137', 'imageID': 'faddefb7-3513-42f6-9731-499853740ffe'}, {'device': 'disk', 'domainID': '67251752-d80e-4395-99f7-c587a7b57802', 'volumeID': 'c73bbed7-5db2-49ca-bd58-5937ea75ecc0', 'poolID': '7321a626-04dc-48d2-b261-5bfcc6d88137', 'imageID': 'faddefb7-3513-42f6-9731-499853740ffe'}) {} flowID [52807d10]
Thread-753::DEBUG::2013-08-08 12:49:48,437::BindingXMLRPC::981::vds::(wrapper) return vmDiskReplicateFinish with {'status': {'message': 'Drive image file %s could not be found', 'code': 13}}

I don't think that this is related to the origin of this bug, nor is it related to the refactoring.
Could you please check that? Thanks.
Jakub, what is the version of the engine?
This looks like an engine side issue.

I see that the vmSnapshot call is missing:

Thread-701::INFO::2013-08-08 12:48:47,433::logUtils::44::dispatcher::(wrapper) Run and protect: createVolume(sdUUID='67251752-d80e-4395-99f7-c587a7b57802', spUUID='7321a626-04dc-48d2-b261-5bfcc6d88137', imgUUID='faddefb7-3513-42f6-9731-499853740ffe', size='5368709120', volFormat=4, preallocate=2, diskType=2, volUUID='c73bbed7-5db2-49ca-bd58-5937ea75ecc0', desc='', srcImgUUID='faddefb7-3513-42f6-9731-499853740ffe', srcVolUUID='aa29d19a-1c73-4d33-98f1-2bd925eb76dc')

(...missing vmSnapshot call on 'c73bbed7-5db2-49ca-bd58-5937ea75ecc0')

Thread-729::INFO::2013-08-08 12:49:21,309::logUtils::44::dispatcher::(wrapper) Run and protect: cloneImageStructure(spUUID='7321a626-04dc-48d2-b261-5bfcc6d88137', sdUUID='67251752-d80e-4395-99f7-c587a7b57802', imgUUID='faddefb7-3513-42f6-9731-499853740ffe', dstSdUUID='bf0f8dc2-9a4a-4095-b2ae-8d6915404697')

Thread-752::DEBUG::2013-08-08 12:49:48,423::BindingXMLRPC::974::vds::(wrapper) client [10.35.161.27]::call vmDiskReplicateStart with ('7339e1b5-531c-4644-a479-b7c79dcb6650', {'device': 'disk', 'domainID': '67251752-d80e-4395-99f7-c587a7b57802', 'volumeID': 'c73bbed7-5db2-49ca-bd58-5937ea75ecc0', 'poolID': '7321a626-04dc-48d2-b261-5bfcc6d88137', 'imageID': 'faddefb7-3513-42f6-9731-499853740ffe'}, {'device': 'disk', 'domainID': 'bf0f8dc2-9a4a-4095-b2ae-8d6915404697', 'volumeID': 'c73bbed7-5db2-49ca-bd58-5937ea75ecc0', 'poolID': '7321a626-04dc-48d2-b261-5bfcc6d88137', 'imageID': 'faddefb7-3513-42f6-9731-499853740ffe'}) {} flowID [52807d10]
Thread-752::ERROR::2013-08-08 12:49:48,424::vm::3840::vm.Vm::(diskReplicateStart) vmId=`7339e1b5-531c-4644-a479-b7c79dcb6650`::Unable to find the disk for '{'device': 'disk', 'domainID': '67251752-d80e-4395-99f7-c587a7b57802', 'volumeID': 'c73bbed7-5db2-49ca-bd58-5937ea75ecc0', 'poolID': '7321a626-04dc-48d2-b261-5bfcc6d88137', 'imageID': 'faddefb7-3513-42f6-9731-499853740ffe'}'
Thread-752::DEBUG::2013-08-08 12:49:48,424::BindingXMLRPC::981::vds::(wrapper) return vmDiskReplicateStart with {'status': {'message': 'Drive image file %s could not be found', 'code': 13}}

Since vmSnapshot never ran, the running VM is still attached to the old volume, which would explain why diskReplicateStart cannot find a drive matching the new volumeID it is handed.

Daniel, I think you fixed something similar recently, right?
Can you assist Jakub in verifying the fix for this bug?
(In reply to Federico Simoncelli from comment #14)
> Jakub, what is the version of the engine?

Currently it reproduces on rhevm-3.3.0-0.15.

(In reply to Federico Simoncelli from comment #15)
> Daniel, I think you fixed something similar recently, right?

Isn't it https://bugzilla.redhat.com/show_bug.cgi?id=987783 ?

> Can you assist Jakub in verifying the fix for this bug?

If it's bug 987783, I see it's verified in is10, but our tests still fail in is10 because of this issue.
(In reply to Federico Simoncelli from comment #15)
> This looks like an engine side issue.
>
> I see that the vmSnapshot call is missing:
>
> Thread-701::INFO::2013-08-08 12:48:47,433::logUtils::44::dispatcher::(wrapper) Run and protect: createVolume(sdUUID='67251752-d80e-4395-99f7-c587a7b57802', spUUID='7321a626-04dc-48d2-b261-5bfcc6d88137', imgUUID='faddefb7-3513-42f6-9731-499853740ffe', size='5368709120', volFormat=4, preallocate=2, diskType=2, volUUID='c73bbed7-5db2-49ca-bd58-5937ea75ecc0', desc='', srcImgUUID='faddefb7-3513-42f6-9731-499853740ffe', srcVolUUID='aa29d19a-1c73-4d33-98f1-2bd925eb76dc')
>
> (...missing vmSnapshot call on 'c73bbed7-5db2-49ca-bd58-5937ea75ecc0')
>
> Thread-729::INFO::2013-08-08 12:49:21,309::logUtils::44::dispatcher::(wrapper) Run and protect: cloneImageStructure(spUUID='7321a626-04dc-48d2-b261-5bfcc6d88137', sdUUID='67251752-d80e-4395-99f7-c587a7b57802', imgUUID='faddefb7-3513-42f6-9731-499853740ffe', dstSdUUID='bf0f8dc2-9a4a-4095-b2ae-8d6915404697')
>
> Thread-752::DEBUG::2013-08-08 12:49:48,423::BindingXMLRPC::974::vds::(wrapper) client [10.35.161.27]::call vmDiskReplicateStart with ('7339e1b5-531c-4644-a479-b7c79dcb6650', {'device': 'disk', 'domainID': '67251752-d80e-4395-99f7-c587a7b57802', 'volumeID': 'c73bbed7-5db2-49ca-bd58-5937ea75ecc0', 'poolID': '7321a626-04dc-48d2-b261-5bfcc6d88137', 'imageID': 'faddefb7-3513-42f6-9731-499853740ffe'}, {'device': 'disk', 'domainID': 'bf0f8dc2-9a4a-4095-b2ae-8d6915404697', 'volumeID': 'c73bbed7-5db2-49ca-bd58-5937ea75ecc0', 'poolID': '7321a626-04dc-48d2-b261-5bfcc6d88137', 'imageID': 'faddefb7-3513-42f6-9731-499853740ffe'}) {} flowID [52807d10]
> Thread-752::ERROR::2013-08-08 12:49:48,424::vm::3840::vm.Vm::(diskReplicateStart) vmId=`7339e1b5-531c-4644-a479-b7c79dcb6650`::Unable to find the disk for '{'device': 'disk', 'domainID': '67251752-d80e-4395-99f7-c587a7b57802', 'volumeID': 'c73bbed7-5db2-49ca-bd58-5937ea75ecc0', 'poolID': '7321a626-04dc-48d2-b261-5bfcc6d88137', 'imageID': 'faddefb7-3513-42f6-9731-499853740ffe'}'
> Thread-752::DEBUG::2013-08-08 12:49:48,424::BindingXMLRPC::981::vds::(wrapper) return vmDiskReplicateStart with {'status': {'message': 'Drive image file %s could not be found', 'code': 13}}
>
> Daniel, I think you fixed something similar recently, right?
> Can you assist Jakub in verifying the fix for this bug?

Yes, it looks similar to bug 987783, resolved by http://gerrit.ovirt.org/#/c/17553/
Ayal,

Following comment #3, can you tell us the scenarios we need to check here?
(In reply to Aharon Canan from comment #19)
> Ayal,
>
> Following comment #3, can you tell us the scenarios we need to check here?

Most of the remaining issues (beyond the one in this bug) would be really difficult to reproduce. One easy one is the extend flow: create a thinly provisioned disk, run the VM, and write a lot of data to that disk (see the sketch below).
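For that extend scenario, a hypothetical in-guest snippet like the following would exercise it (the device path /dev/vdb and the 4 GiB total are assumptions; any way of writing a few GiB directly to the thin disk should do):

    import os

    # Writing well past the initial allocation of a thinly provisioned disk
    # forces vdsm to repeatedly extend the LV backing it.
    CHUNK = 1024 * 1024                    # 1 MiB write chunks
    data = os.urandom(CHUNK)
    with open('/dev/vdb', 'wb', buffering=0) as disk:
        for _ in range(4 * 1024):          # 4 GiB total
            disk.write(data)
        os.fsync(disk.fileno())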
is14.
This bug is currently attached to errata RHBA-2013:15291. If this change is not to be documented in the text for this errata please either remove it from the errata, set the requires_doc_text flag to minus (-), or leave a "Doc Text" value of "--no tech note required" if you do not have permission to alter the flag.

Otherwise, to aid in the development of relevant and accurate release documentation, please fill out the "Doc Text" field above with these four (4) pieces of information:

* Cause: What actions or circumstances cause this bug to present.
* Consequence: What happens when the bug presents.
* Fix: What was done to fix the bug.
* Result: What now happens when the actions or circumstances above occur. (NB: this is not the same as 'the bug doesn't present anymore')

Once filled out, please set the "Doc Type" field to the appropriate value for the type of change made and submit your edits to the bug.

For further details on the Cause, Consequence, Fix, Result format please refer to:

https://bugzilla.redhat.com/page.cgi?id=fields.html#cf_release_notes

Thanks in advance.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0040.html