Description of problem: Exporting vm & with "discard snapshot = true' result in low level image copy failed. http://jenkins.eng.lab.tlv.redhat.com/view/Automation/view/rhevm_3.2/view/Infra/job/rhevm_3.2_automation_coretools_restapi_vms/COMPATIBILITY_VER=3.1,ENGINE=rest,HOST_HOSTGROUP=P-RHEVM-RHEL64-HOSTS,label=RHEVM-ENGINE-RHEL64-ART-RPM/423/testReport/junit/Vms/105-Export%20VM%20discard%20snapshots/Export_VM_discard_snapshots/ Version-Release number of selected component (if applicable): How reproducible: Steps to Reproduce: 1. 2. 3. Actual results: Expected results: Additional info: 2013-02-07 09:28:09,601 - MainThread - vms - DEBUG - Response code is valid: [200, 201] 2013-02-07 09:28:09,601 - MainThread - vms - ERROR - Action status is not valid, expected is: ['complete'], actual is: failed 2013-02-07 09:28:09,755 - MainThread - plmanagement.error_fetcher - ERROR - Errors fetched from VDC(jenkins-automation-rpm-vm30.eng.lab.tlv.redhat.com): 2013-02-07 09:28:00,259 ERROR [org.ovirt.engine.core.bll.adbroker.DirectorySearcher] (ajp-/127.0.0.1:8702-7) Error in finding LDAP servers for domain dc.eng.lab.tlv.redhat.com 2013-02-07 09:28:00,272 INFO [org.ovirt.engine.core.bll.LoginUserCommand] (ajp-/127.0.0.1:8702-7) Running command: LoginUserCommand internal: false. 2013-02-07 09:28:00,331 ERROR [org.ovirt.engine.core.bll.adbroker.DirectorySearcher] (ajp-/127.0.0.1:8702-9) Error in finding LDAP servers for domain dc.eng.lab.tlv.redhat.com 2013-02-07 09:28:00,344 INFO [org.ovirt.engine.core.bll.LoginUserCommand] (ajp-/127.0.0.1:8702-9) Running command: LoginUserCommand internal: false. 2013-02-07 09:28:00,092 ERROR [org.ovirt.engine.core.bll.adbroker.DirectorySearcher] (ajp-/127.0.0.1:8702-4) Error in finding LDAP servers for domain dc.eng.lab.tlv.redhat.com 2013-02-07 09:28:00,106 INFO [org.ovirt.engine.core.bll.LoginUserCommand] (ajp-/127.0.0.1:8702-4) Running command: LoginUserCommand internal: false. 2013-02-07 09:28:00,259 ERROR [org.ovirt.engine.core.bll.adbroker.DirectorySearcher] (ajp-/127.0.0.1:8702-7) Error in finding LDAP servers for domain dc.eng.lab.tlv.redhat.com 2013-02-07 09:28:00,272 INFO [org.ovirt.engine.core.bll.LoginUserCommand] (ajp-/127.0.0.1:8702-7) Running command: LoginUserCommand internal: false. 2013-02-07 09:28:00,331 ERROR [org.ovirt.engine.core.bll.adbroker.DirectorySearcher] (ajp-/127.0.0.1:8702-9) Error in finding LDAP servers for domain dc.eng.lab.tlv.redhat.com 2013-02-07 09:28:00,344 INFO [org.ovirt.engine.core.bll.LoginUserCommand] (ajp-/127.0.0.1:8702-9) Running command: LoginUserCommand internal: false. 2013-02-07 09:28:09,370 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (QuartzScheduler_Worker-70) Failed in HSMGetAllTasksStatusesVDS method 2013-02-07 09:28:09,370 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (QuartzScheduler_Worker-70) Error code CopyImageError and error message VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = low level Image copy failed 2013-02-07 09:28:09,370 INFO [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-70) SPMAsyncTask::PollTask: Polling task 928055c8-30fb-482e-8546-71e009790279 (Parent Command ExportVm, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) returned status finished, result 'cleanSuccess'. 2013-02-07 09:28:09,374 ERROR [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-70) BaseAsyncTask::LogEndTaskFailure: Task 928055c8-30fb-482e-8546-71e009790279 (Parent Command ExportVm, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) ended with failure: -- Result: cleanSuccess -- Message: VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = low level Image copy failed, -- Exception: VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = low level Image copy failed
Created attachment 694345 [details] engine & host logs includes vdsm,libvirt,engine,sanlock
I see this error on our zstream environment as well : 2013-02-08 09:34:12,308 - MainThread - plmanagement.matrix-test-composer - ERROR - Entity rest_templ1 not found for collection 'templates'. 2013-02-08 09:34:12,498 - MainThread - plmanagement.error_fetcher - ERROR - Errors fetched from VDC(jenkins-vm-17.qa.lab.tlv.redhat.com): 2013-02-08 09:34:04,293 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (QuartzScheduler_Worker-81) Failed in HSMGetAllTasksStatusesVDS method 2013-02-08 09:34:04,293 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (QuartzScheduler_Worker-81) Error code CopyImageError and error message VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = low level Image copy failed 2013-02-08 09:34:04,294 INFO [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-81) SPMAsyncTask::PollTask: Polling task 4367540b-2e43-42f5-8edc-e15e3009b607 (Parent Command AddVmTemplate, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) returned status finished, result 'cleanSuccess'. 2013-02-08 09:34:04,301 ERROR [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-81) BaseAsyncTask::LogEndTaskFailure: Task 4367540b-2e43-42f5-8edc-e15e3009b607 (Parent Command AddVmTemplate, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) ended with failure: -- Result: cleanSuccess -- Message: VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = low level Image copy failed, -- Exception: VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = low level Image copy failed 2013-02-08 09:34:04,302 INFO [org.ovirt.engine.core.bll.EntityAsyncTask] (QuartzScheduler_Worker-81) EntityAsyncTask::EndActionIfNecessary: All tasks of entity fe964e8b-12c2-406c-86ff-266b88fa528a has ended -> executing EndAction 2013-02-08 09:34:04,330 ERROR [org.ovirt.engine.core.bll.AddVmTemplateCommand] (pool-3-thread-50) [518] Ending command with failure: org.ovirt.engine.core.bll.AddVmTemplateCommand 2013-02-08 09:34:04,337 ERROR [org.ovirt.engine.core.bll.CreateImageTemplateCommand] (pool-3-thread-50) [64860853] Ending command with failure: org.ovirt.engine.core.bll.CreateImageTemplateCommand ################################################################# vdsm version ( si27 ) : vdsm-4.10.2-1.3.el6.x86_64
An image on an NFS domain is deleted with postZero = True. Post zeroing NFS images is forbidden because it is irrelevant. Thread-1603::INFO::2013-02-07 09:24:58,910::logUtils::37::dispatcher::(wrapper) Run and protect: deleteImage(sdUUID='72385e09-7f27-4db2-951c-3795fae0b391', s pUUID='a7fddec3-aea5-4d4e-a3a0-435d18aabee5', imgUUID='fd9eafa5-64c7-4a18-8449-a8ea21e3a8b4', postZero='true', force='false') Thread-1603::INFO::2013-02-07 09:24:58,910::fileSD::302::Storage.StorageDomain::(validate) sdUUID=72385e09-7f27-4db2-951c-3795fae0b391 Thread-1603::DEBUG::2013-02-07 09:24:58,914::persistentDict::234::Storage.PersistentDict::(refresh) read lines (FileMetadataRW)=['CLASS=Data', 'DESCRIPTION=D ataDomainRest', 'IOOPTIMEOUTSEC=10', 'LEASERETRIES=3', 'LEASETIMESEC=60', 'LOCKPOLICY=', 'LOCKRENEWALINTERVALSEC=5', 'MASTER_VERSION=1', 'POOL_DESCRIPTION=De faultRest', 'POOL_DOMAINS=527e2e5c-5736-4bd5-8ce0-0efcc06aa8ab:Active,72385e09-7f27-4db2-951c-3795fae0b391:Active,ea82189d-3486-4926-880a-bb3c45b18b82:Active ', 'POOL_SPM_ID=1', 'POOL_SPM_LVER=1', 'POOL_UUID=a7fddec3-aea5-4d4e-a3a0-435d18aabee5', 'REMOTE_PATH=10.35.64.106:/export/jenkins-automation-rpm-vm30_nfs_20 13_02_07_9_8_29_765150', 'ROLE=Master', 'SDUUID=72385e09-7f27-4db2-951c-3795fae0b391', 'TYPE=NFS', 'VERSION=3', '_SHA_CKSUM=9194c5945c950223206ef5b24a0777a39 db80658'] 813af784-199a-4c61-a8ef-cd633eacf588::ERROR::2013-02-07 09:24:59,340::task::833::TaskManager.Task::(_setError) Task=`813af784-199a-4c61-a8ef-cd633eacf588`::U nexpected error Traceback (most recent call last): File "/usr/share/vdsm/storage/task.py", line 840, in _run return fn(*args, **kargs) File "/usr/share/vdsm/storage/task.py", line 307, in run return self.cmd(*self.argslist, **self.argsdict) File "/usr/share/vdsm/storage/fileSD.py", line 354, in zeroImage "fileSD %s should not be zeroed." % (imgUUID, sdUUID)) SourceImageActionError: Error during source image manipulation: 'image=fd9eafa5-64c7-4a18-8449-a8ea21e3a8b4, source domain=72385e09-7f27-4db2-951c-3795fae0b391: image fd9eafa5-64c7-4a18-8449-a8ea21e3a8b4 on a fileSD 72385e09-7f27-4db2-951c-3795fae0b391 should not be zeroed.'
The reason for the engine fail, unrelated to the previous one: 928055c8-30fb-482e-8546-71e009790279::ERROR::2013-02-07 09:28:06,277::remoteFileHandler::190::Storage.CrabRPCProxy::(callCrabRPCFunction) Problem with handler, treating as timeout Traceback (most recent call last): File "/usr/share/vdsm/storage/remoteFileHandler.py", line 180, in callCrabRPCFunction rawLength = self._recvAll(LENGTH_STRUCT_LENGTH, timeout) File "/usr/share/vdsm/storage/remoteFileHandler.py", line 149, in _recvAll timeLeft): File "/usr/lib64/python2.6/contextlib.py", line 83, in helper return GeneratorContextManager(func(*args, **kwds)) File "/usr/share/vdsm/storage/remoteFileHandler.py", line 133, in _poll res = misc.NoIntrPoll(self._poller.poll, timeout) File "/usr/share/vdsm/storage/misc.py", line 1348, in NoIntrPoll return pollfun(timeout) error: (4, 'Interrupted system call') 928055c8-30fb-482e-8546-71e009790279::ERROR::2013-02-07 09:28:06,278::image::900::Storage.Image::(copy) Unexpected error Traceback (most recent call last): File "/usr/share/vdsm/storage/image.py", line 881, in copy size=srcVol.getVolumeSize(bs=1), dstvolType=dstVol.getType()) File "/usr/share/vdsm/storage/volume.py", line 1023, in qemuConvert recoveryCallback=baseAsyncTasksRollback) File "/usr/share/vdsm/storage/misc.py", line 254, in watchCmd recoveryCallback(proc) File "/usr/share/vdsm/storage/volume.py", line 971, in baseAsyncTasksRollback str(misc.getProcCtime(proc.pid))])) File "/usr/share/vdsm/storage/task.py", line 1027, in pushRecovery self.persist() File "/usr/share/vdsm/storage/task.py", line 1098, in persist self._save(self.store) File "/usr/share/vdsm/storage/task.py", line 742, in _save getProcPool().os.rename(origTaskDir, origTaskDir + BACKUP_EXT) File "/usr/share/vdsm/storage/remoteFileHandler.py", line 275, in callCrabRPCFunction *args, **kwargs) File "/usr/share/vdsm/storage/remoteFileHandler.py", line 191, in callCrabRPCFunction raise Timeout() Timeout
any ETA on addressing this bug? it's a major test blocker for almost all automaton tests for 3.2. (and possibly 3.1.Z)
According to comment 6 moving to infra.
Isn't it a dup of bug 911139?
*** This bug has been marked as a duplicate of bug 910667 ***