Bug 908648 - Export VM & discard snapshots fails with "low level Image copy failed" when using compatibility mode 3.1 & 3.2
Summary: Export VM & discard snapshots fails with "low level Image copy failed" when u...
Keywords:
Status: CLOSED DUPLICATE of bug 910667
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.2.0
Hardware: x86_64
OS: Linux
urgent
unspecified
Target Milestone: ---
: 3.2.0
Assignee: Yaniv Bronhaim
QA Contact: Barak Dagan
URL:
Whiteboard: infra
Depends On:
Blocks: 890951
TreeView+ depends on / blocked
 
Reported: 2013-02-07 09:05 UTC by Eyal Edri
Modified: 2016-02-10 19:27 UTC (History)
11 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-02-17 14:42:09 UTC
oVirt Team: Infra
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
engine & host logs (11.06 MB, application/x-gzip)
2013-02-07 09:13 UTC, Eyal Edri
no flags Details

Description Eyal Edri 2013-02-07 09:05:02 UTC
Description of problem:
Exporting vm & with "discard snapshot = true' result in low level image copy failed.

http://jenkins.eng.lab.tlv.redhat.com/view/Automation/view/rhevm_3.2/view/Infra/job/rhevm_3.2_automation_coretools_restapi_vms/COMPATIBILITY_VER=3.1,ENGINE=rest,HOST_HOSTGROUP=P-RHEVM-RHEL64-HOSTS,label=RHEVM-ENGINE-RHEL64-ART-RPM/423/testReport/junit/Vms/105-Export%20VM%20discard%20snapshots/Export_VM_discard_snapshots/





Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:



2013-02-07 09:28:09,601 - MainThread - vms - DEBUG - Response code is valid: [200, 201] 
2013-02-07 09:28:09,601 - MainThread - vms - ERROR - Action status is not valid, expected is: ['complete'], actual is: failed 
2013-02-07 09:28:09,755 - MainThread - plmanagement.error_fetcher - ERROR - Errors fetched from VDC(jenkins-automation-rpm-vm30.eng.lab.tlv.redhat.com): 2013-02-07 09:28:00,259 ERROR [org.ovirt.engine.core.bll.adbroker.DirectorySearcher] (ajp-/127.0.0.1:8702-7) Error in finding LDAP servers for domain dc.eng.lab.tlv.redhat.com
2013-02-07 09:28:00,272 INFO  [org.ovirt.engine.core.bll.LoginUserCommand] (ajp-/127.0.0.1:8702-7) Running command: LoginUserCommand internal: false.
2013-02-07 09:28:00,331 ERROR [org.ovirt.engine.core.bll.adbroker.DirectorySearcher] (ajp-/127.0.0.1:8702-9) Error in finding LDAP servers for domain dc.eng.lab.tlv.redhat.com
2013-02-07 09:28:00,344 INFO  [org.ovirt.engine.core.bll.LoginUserCommand] (ajp-/127.0.0.1:8702-9) Running command: LoginUserCommand internal: false.
2013-02-07 09:28:00,092 ERROR [org.ovirt.engine.core.bll.adbroker.DirectorySearcher] (ajp-/127.0.0.1:8702-4) Error in finding LDAP servers for domain dc.eng.lab.tlv.redhat.com
2013-02-07 09:28:00,106 INFO  [org.ovirt.engine.core.bll.LoginUserCommand] (ajp-/127.0.0.1:8702-4) Running command: LoginUserCommand internal: false.
2013-02-07 09:28:00,259 ERROR [org.ovirt.engine.core.bll.adbroker.DirectorySearcher] (ajp-/127.0.0.1:8702-7) Error in finding LDAP servers for domain dc.eng.lab.tlv.redhat.com
2013-02-07 09:28:00,272 INFO  [org.ovirt.engine.core.bll.LoginUserCommand] (ajp-/127.0.0.1:8702-7) Running command: LoginUserCommand internal: false.
2013-02-07 09:28:00,331 ERROR [org.ovirt.engine.core.bll.adbroker.DirectorySearcher] (ajp-/127.0.0.1:8702-9) Error in finding LDAP servers for domain dc.eng.lab.tlv.redhat.com
2013-02-07 09:28:00,344 INFO  [org.ovirt.engine.core.bll.LoginUserCommand] (ajp-/127.0.0.1:8702-9) Running command: LoginUserCommand internal: false.
2013-02-07 09:28:09,370 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (QuartzScheduler_Worker-70) Failed in HSMGetAllTasksStatusesVDS method
2013-02-07 09:28:09,370 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (QuartzScheduler_Worker-70) Error code CopyImageError and error message VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = low level Image copy failed
2013-02-07 09:28:09,370 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-70) SPMAsyncTask::PollTask: Polling task 928055c8-30fb-482e-8546-71e009790279 (Parent Command ExportVm, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) returned status finished, result 'cleanSuccess'.
2013-02-07 09:28:09,374 ERROR [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-70) BaseAsyncTask::LogEndTaskFailure: Task 928055c8-30fb-482e-8546-71e009790279 (Parent Command ExportVm, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) ended with failure:
-- Result: cleanSuccess
-- Message: VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = low level Image copy failed,
-- Exception: VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = low level Image copy failed

Comment 1 Eyal Edri 2013-02-07 09:13:29 UTC
Created attachment 694345 [details]
engine & host logs

includes vdsm,libvirt,engine,sanlock

Comment 4 Oded Ramraz 2013-02-08 07:51:42 UTC
I see this error on our zstream environment as well :

2013-02-08 09:34:12,308 - MainThread - plmanagement.matrix-test-composer - ERROR - Entity rest_templ1 not found  for collection 'templates'.
2013-02-08 09:34:12,498 - MainThread - plmanagement.error_fetcher - ERROR - Errors fetched from VDC(jenkins-vm-17.qa.lab.tlv.redhat.com): 2013-02-08 09:34:04,293 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (QuartzScheduler_Worker-81) Failed in HSMGetAllTasksStatusesVDS method
2013-02-08 09:34:04,293 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (QuartzScheduler_Worker-81) Error code CopyImageError and error message VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = low level Image copy failed
2013-02-08 09:34:04,294 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-81) SPMAsyncTask::PollTask: Polling task 4367540b-2e43-42f5-8edc-e15e3009b607 (Parent Command AddVmTemplate, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) returned status finished, result 'cleanSuccess'.
2013-02-08 09:34:04,301 ERROR [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-81) BaseAsyncTask::LogEndTaskFailure: Task 4367540b-2e43-42f5-8edc-e15e3009b607 (Parent Command AddVmTemplate, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) ended with failure:
-- Result: cleanSuccess
-- Message: VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = low level Image copy failed,
-- Exception: VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = low level Image copy failed
2013-02-08 09:34:04,302 INFO  [org.ovirt.engine.core.bll.EntityAsyncTask] (QuartzScheduler_Worker-81) EntityAsyncTask::EndActionIfNecessary: All tasks of entity fe964e8b-12c2-406c-86ff-266b88fa528a has ended -> executing EndAction
2013-02-08 09:34:04,330 ERROR [org.ovirt.engine.core.bll.AddVmTemplateCommand] (pool-3-thread-50) [518] Ending command with failure: org.ovirt.engine.core.bll.AddVmTemplateCommand
2013-02-08 09:34:04,337 ERROR [org.ovirt.engine.core.bll.CreateImageTemplateCommand] (pool-3-thread-50) [64860853] Ending command with failure: org.ovirt.engine.core.bll.CreateImageTemplateCommand


#################################################################


vdsm version ( si27 ) : vdsm-4.10.2-1.3.el6.x86_64

Comment 5 Eduardo Warszawski 2013-02-10 19:41:25 UTC
An image on an NFS domain is deleted with postZero = True.
Post zeroing NFS images is forbidden because it is irrelevant.


Thread-1603::INFO::2013-02-07 09:24:58,910::logUtils::37::dispatcher::(wrapper) Run and protect: deleteImage(sdUUID='72385e09-7f27-4db2-951c-3795fae0b391', s
pUUID='a7fddec3-aea5-4d4e-a3a0-435d18aabee5', imgUUID='fd9eafa5-64c7-4a18-8449-a8ea21e3a8b4', postZero='true', force='false')
Thread-1603::INFO::2013-02-07 09:24:58,910::fileSD::302::Storage.StorageDomain::(validate) sdUUID=72385e09-7f27-4db2-951c-3795fae0b391

Thread-1603::DEBUG::2013-02-07 09:24:58,914::persistentDict::234::Storage.PersistentDict::(refresh) read lines (FileMetadataRW)=['CLASS=Data', 'DESCRIPTION=D
ataDomainRest', 'IOOPTIMEOUTSEC=10', 'LEASERETRIES=3', 'LEASETIMESEC=60', 'LOCKPOLICY=', 'LOCKRENEWALINTERVALSEC=5', 'MASTER_VERSION=1', 'POOL_DESCRIPTION=De
faultRest', 'POOL_DOMAINS=527e2e5c-5736-4bd5-8ce0-0efcc06aa8ab:Active,72385e09-7f27-4db2-951c-3795fae0b391:Active,ea82189d-3486-4926-880a-bb3c45b18b82:Active
', 'POOL_SPM_ID=1', 'POOL_SPM_LVER=1', 'POOL_UUID=a7fddec3-aea5-4d4e-a3a0-435d18aabee5', 'REMOTE_PATH=10.35.64.106:/export/jenkins-automation-rpm-vm30_nfs_20
13_02_07_9_8_29_765150', 'ROLE=Master', 'SDUUID=72385e09-7f27-4db2-951c-3795fae0b391', 'TYPE=NFS', 'VERSION=3', '_SHA_CKSUM=9194c5945c950223206ef5b24a0777a39
db80658']


813af784-199a-4c61-a8ef-cd633eacf588::ERROR::2013-02-07 09:24:59,340::task::833::TaskManager.Task::(_setError) Task=`813af784-199a-4c61-a8ef-cd633eacf588`::U
nexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 840, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/task.py", line 307, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/share/vdsm/storage/fileSD.py", line 354, in zeroImage
    "fileSD %s should not be zeroed." % (imgUUID, sdUUID))
SourceImageActionError: Error during source image manipulation: 'image=fd9eafa5-64c7-4a18-8449-a8ea21e3a8b4, source domain=72385e09-7f27-4db2-951c-3795fae0b391: image fd9eafa5-64c7-4a18-8449-a8ea21e3a8b4 on a fileSD 72385e09-7f27-4db2-951c-3795fae0b391 should not be zeroed.'

Comment 6 Eduardo Warszawski 2013-02-10 19:52:18 UTC
The reason for the engine fail, unrelated to the previous one:

928055c8-30fb-482e-8546-71e009790279::ERROR::2013-02-07 09:28:06,277::remoteFileHandler::190::Storage.CrabRPCProxy::(callCrabRPCFunction) Problem with handler, treating as timeout
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/remoteFileHandler.py", line 180, in callCrabRPCFunction
    rawLength = self._recvAll(LENGTH_STRUCT_LENGTH, timeout)
  File "/usr/share/vdsm/storage/remoteFileHandler.py", line 149, in _recvAll
    timeLeft):
  File "/usr/lib64/python2.6/contextlib.py", line 83, in helper
    return GeneratorContextManager(func(*args, **kwds))
  File "/usr/share/vdsm/storage/remoteFileHandler.py", line 133, in _poll
    res = misc.NoIntrPoll(self._poller.poll, timeout)
  File "/usr/share/vdsm/storage/misc.py", line 1348, in NoIntrPoll
    return pollfun(timeout)
error: (4, 'Interrupted system call')
928055c8-30fb-482e-8546-71e009790279::ERROR::2013-02-07 09:28:06,278::image::900::Storage.Image::(copy) Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/image.py", line 881, in copy
    size=srcVol.getVolumeSize(bs=1), dstvolType=dstVol.getType())
  File "/usr/share/vdsm/storage/volume.py", line 1023, in qemuConvert
    recoveryCallback=baseAsyncTasksRollback)
  File "/usr/share/vdsm/storage/misc.py", line 254, in watchCmd
    recoveryCallback(proc)
  File "/usr/share/vdsm/storage/volume.py", line 971, in baseAsyncTasksRollback
    str(misc.getProcCtime(proc.pid))]))
  File "/usr/share/vdsm/storage/task.py", line 1027, in pushRecovery
    self.persist()
  File "/usr/share/vdsm/storage/task.py", line 1098, in persist
    self._save(self.store)
  File "/usr/share/vdsm/storage/task.py", line 742, in _save
    getProcPool().os.rename(origTaskDir, origTaskDir + BACKUP_EXT)
  File "/usr/share/vdsm/storage/remoteFileHandler.py", line 275, in callCrabRPCFunction
    *args, **kwargs)
  File "/usr/share/vdsm/storage/remoteFileHandler.py", line 191, in callCrabRPCFunction
    raise Timeout()
Timeout

Comment 7 Eyal Edri 2013-02-11 12:51:51 UTC
any ETA on addressing this bug? it's a major test blocker for almost all automaton tests for 3.2. (and possibly 3.1.Z)

Comment 8 Ayal Baron 2013-02-11 20:11:51 UTC
According to comment 6 moving to infra.

Comment 9 Yaniv Kaul 2013-02-17 11:10:55 UTC
Isn't it a dup of bug 911139?

Comment 11 Eyal Edri 2013-02-17 14:42:09 UTC

*** This bug has been marked as a duplicate of bug 910667 ***


Note You need to log in before you can comment on or make changes to this bug.