Bug 1270003
Summary: | Failed to import vm template | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Israel Pinto <ipinto> | ||||||||||
Component: | ovirt-engine | Assignee: | Martin Mucha <mmucha> | ||||||||||
Status: | CLOSED CURRENTRELEASE | QA Contact: | Michael Burman <mburman> | ||||||||||
Severity: | urgent | Docs Contact: | |||||||||||
Priority: | urgent | ||||||||||||
Version: | 3.6.0 | CC: | acanan, alukiano, amureini, cmestreg, cshao, cwu, danken, gklein, huiwa, huzhao, ipinto, leiwang, lsurette, mavital, mburman, mgoldboi, michal.skrivanek, mlipchuk, mmucha, nsoffer, rbalakri, Rhev-m-bugs, srevivo, tnisan, yaniwang, ycui, ykaul, ylavi | ||||||||||
Target Milestone: | ovirt-3.6.0-rc3 | Keywords: | Regression, TestBlocker | ||||||||||
Target Release: | 3.6.0 | ||||||||||||
Hardware: | Unspecified | ||||||||||||
OS: | Unspecified | ||||||||||||
Whiteboard: | |||||||||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||||||||
Doc Text: | Story Points: | --- | |||||||||||
Clone Of: | Environment: | ||||||||||||
Last Closed: | 2016-04-20 01:37:13 UTC | Type: | Bug | ||||||||||
Regression: | --- | Mount Type: | --- | ||||||||||
Documentation: | --- | CRM: | |||||||||||
Verified Versions: | Category: | --- | |||||||||||
oVirt Team: | Network | RHEL 7.3 requirements from Atomic Host: | |||||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||||
Embargoed: | |||||||||||||
Attachments: |
|
Description
Israel Pinto
2015-10-08 18:25:25 UTC
Created attachment 1081074 [details]
engine_log
Created attachment 1081075 [details]
vdsm_log
any idea why is it failing in copy operation? Seems like the call to qemu-img gets killed. Odd. Nir, can you take a look please? Israel - can you also specify which version of the qemu* RPMs you have there please? [root@ibm-p8-rhevm-04 ~]# rpm -qa | grep qemu* libvirt-daemon-driver-qemu-1.2.17-12.el7.ppc64le qemu-kvm-rhev-2.3.0-28.el7.ppc64le ipxe-roms-qemu-20130517-7.gitc4bce43.el7.noarch qemu-kvm-common-rhev-2.3.0-28.el7.ppc64le qemu-kvm-tools-rhev-2.3.0-29.el7.ppc64le qemu-img-rhev-2.3.0-28.el7.ppc64le restoring needinfo on Nir, which was mistakenly removed. This is not only happening in PPC64el, is happening in 100% of runs in x86 RHEL7.2 (same logs in engine and vdsm.log) importing a template: ovirt - 3.6-0-17 qemu-kvm-rhev-2.3.0-31.el7.x86_64 libvirt-daemon-driver-qemu-1.2.17-13.el7.x86_64 qemu-kvm-common-rhev-2.3.0-31.el7.x86_64 qemu-kvm-tools-rhev-2.3.0-31.el7.x86_64 qemu-img-rhev-2.3.0-31.el7.x86_64 ipxe-roms-qemu-20130517-7.gitc4bce43.el7.noarch vdsm-4.17.10-5.el7ev.noarch libvirt-1.2.17-13.el7.x86_64 @Gil: should this be mark as blocker? Looks more urgent if it's reproduce also on x86_64. Nir, per Allon's request, could you please assist? (In reply to Carlos Mestre González from comment #8) > This is not only happening in PPC64el, is happening in 100% of runs in x86 > RHEL7.2 (same logs in engine and vdsm.log) importing a template: Carlos, can you attach vdsm and engine logs showing this failure on X86_64 showing this failure? Israel, Does it work when running RHEV 3.6 on RHEL 7.1? Does it work when running RHEV 3.5 on RHEL 7.2? Looking in the current logs (from ppcele?) 1. Starting copyImage Thread-160::INFO::2015-10-08 07:36:18,327::logUtils::48::dispatcher::(wrapper) Run and protect: copyImage(sdUUID='871a92d1-0a98-40ff-a3c8-36ede1ee589c', spUUID='fc6a619b-f98a-4594-86ba-ba4dac253a87', vmUUID='', srcImgUUID='f99d83ec-3a52-4e68-ab2c-67e7863c1288', srcVolUUID='be5e78f0-c472-4af5-8302-0ca94b908b86', dstImgUUID='f99d83ec-3a52-4e68-ab2c-67e7863c1288', dstVolUUID='be5e78f0-c472-4af5-8302-0ca94b908b86', description='', dstSdUUID='941b820a-0b82-4b96-8a79-8818d82bbcf0', volType=6, volFormat=5, preallocate=2, postZero='false', force='false') 2. Scheduling task 2b75ff2e-723f-43d6-9d37-039a67b79f35 Thread-160::DEBUG::2015-10-08 07:36:18,355::taskManager::78::Storage.TaskManager::(scheduleJob) scheduled job copyImage_f99d83ec-3a52-4e68-ab2c-67e7863c1288 for task 2b75ff2e-7 23f-43d6-9d37-039a67b79f35 3. Starting qemu-img convert ... 2b75ff2e-723f-43d6-9d37-039a67b79f35::DEBUG::2015-10-08 07:36:18,634::utils::676::root::(execCmd) /usr/bin/nice -n 19 /usr/bin/ionice -c 3 /usr/bin/qemu-img convert -t none -T none -f raw /rhev/data-center/fc6a619b-f98a-4594-86ba-ba4dac253a87/871a92d1-0a98-40ff-a3c8-36ede1ee589c/images/f99d83ec-3a52-4e68-ab2c-67e7863c1288/be5e78f0-c472-4af5-8302-0ca94b908b86 -O raw /rhev/data-center/mnt/10.16.29.93:_ipinto__storage__nfs__1/941b820a-0b82-4b96-8a79-8818d82bbcf0/images/f99d83ec-3a52-4e68-ab2c-67e7863c1288/be5e78f0-c472-4af5-8302-0ca94b908b86 (cwd None) 4. Stopping task 2b75ff2e-723f-43d6-9d37-039a67b79f35 Thread-317::INFO::2015-10-08 07:36:19,150::logUtils::48::dispatcher::(wrapper) Run and protect: stopTask(taskID='2b75ff2e-723f-43d6-9d37-039a67b79f35', spUUID=None, options=None) 5. Task stopped Thread-364::INFO::2015-10-08 07:36:19,561::xmlrpc::92::vds.XMLRPCServer::(_process_requests) Request handler for 127.0.0.1:50176 stopped 2b75ff2e-723f-43d6-9d37-039a67b79f35::ERROR::2015-10-08 07:36:19,638::image::857::Storage.Image::(copyCollapsed) Unexpected error Traceback (most recent call last): File "/usr/share/vdsm/storage/image.py", line 836, in copyCollapsed volume.fmt2str(dstVolFormat)) File "/usr/lib/python2.7/site-packages/vdsm/qemuimg.py", line 206, in convert ioclass=utils.IOCLASS.IDLE) File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 718, in watchCmd raise ActionStopped() ActionStopped: Action was stopped: () 6. Bogus task error after task was stopped 2b75ff2e-723f-43d6-9d37-039a67b79f35::ERROR::2015-10-08 07:36:19,640::task::866::Storage.TaskManager.Task::(_setError) Task=`2b75ff2e-723f-43d6-9d37-039a67b79f35`::Unexpected error Traceback (most recent call last): File "/usr/share/vdsm/storage/task.py", line 873, in _run return fn(*args, **kargs) File "/usr/share/vdsm/storage/task.py", line 332, in run return self.cmd(*self.argslist, **self.argsdict) File "/usr/share/vdsm/storage/securable.py", line 77, in wrapper return method(self, *args, **kwargs) File "/usr/share/vdsm/storage/sp.py", line 1512, in copyImage postZero, force) File "/usr/share/vdsm/storage/image.py", line 859, in copyCollapsed (srcImgUUID, dstImgUUID, str(e))) CopyImageError: low level Image copy failed: ('src image=f99d83ec-3a52-4e68-ab2c-67e7863c1288, dst image=f99d83ec-3a52-4e68-ab2c-67e7863c1288: msg=Action was stopped: ()',) We have several examples of this flow in the same log: $ egrep 'stopTask|ActionStopped|CopyImageError' vdsm.log.1 Thread-317::INFO::2015-10-08 07:36:19,150::logUtils::48::dispatcher::(wrapper) Run and protect: stopTask(taskID='2b75ff2e-723f-43d6-9d37-039a67b79f35', spUUID=None, options=None) Thread-317::DEBUG::2015-10-08 07:36:19,150::taskManager::155::Storage.TaskManager::(stopTask) Entry. taskID: 2b75ff2e-723f-43d6-9d37-039a67b79f35 Thread-317::DEBUG::2015-10-08 07:36:19,151::taskManager::158::Storage.TaskManager::(stopTask) Return. Thread-317::INFO::2015-10-08 07:36:19,151::logUtils::51::dispatcher::(wrapper) Run and protect: stopTask, Return response: True raise ActionStopped() ActionStopped: Action was stopped: () CopyImageError: low level Image copy failed: ('src image=f99d83ec-3a52-4e68-ab2c-67e7863c1288, dst image=f99d83ec-3a52-4e68-ab2c-67e7863c1288: msg=Action was stopped: ()',) Thread-328::INFO::2015-10-08 07:39:43,158::logUtils::48::dispatcher::(wrapper) Run and protect: stopTask(taskID='37a93f9c-9271-4592-9133-8272b2252e75', spUUID=None, options=None) Thread-328::DEBUG::2015-10-08 07:39:43,158::taskManager::155::Storage.TaskManager::(stopTask) Entry. taskID: 37a93f9c-9271-4592-9133-8272b2252e75 Thread-328::DEBUG::2015-10-08 07:39:43,158::taskManager::158::Storage.TaskManager::(stopTask) Return. Thread-328::INFO::2015-10-08 07:39:43,158::logUtils::51::dispatcher::(wrapper) Run and protect: stopTask, Return response: True raise ActionStopped() ActionStopped: Action was stopped: () CopyImageError: low level Image copy failed: ('src image=f99d83ec-3a52-4e68-ab2c-67e7863c1288, dst image=f99d83ec-3a52-4e68-ab2c-67e7863c1288: msg=Action was stopped: ()',) Thread-39::INFO::2015-10-08 08:39:10,647::logUtils::48::dispatcher::(wrapper) Run and protect: stopTask(taskID='fbface18-5c65-42c5-806a-5f1daa3208ee', spUUID=None, options=None) Thread-39::DEBUG::2015-10-08 08:39:10,647::taskManager::155::Storage.TaskManager::(stopTask) Entry. taskID: fbface18-5c65-42c5-806a-5f1daa3208ee Thread-39::DEBUG::2015-10-08 08:39:10,648::taskManager::158::Storage.TaskManager::(stopTask) Return. Thread-39::INFO::2015-10-08 08:39:10,648::logUtils::51::dispatcher::(wrapper) Run and protect: stopTask, Return response: True raise ActionStopped() ActionStopped: Action was stopped: () CopyImageError: low level Image copy failed: ('src image=a9584585-57dc-4d9f-ac01-aaba5d690d1c, dst image=a9584585-57dc-4d9f-ac01-aaba5d690d1c: msg=Action was stopped: ()',) Thread-39::INFO::2015-10-08 08:40:03,137::logUtils::48::dispatcher::(wrapper) Run and protect: stopTask(taskID='696e52f6-e4fd-41f5-9cb7-cc5e87dd961e', spUUID=None, options=None) Thread-39::DEBUG::2015-10-08 08:40:03,137::taskManager::155::Storage.TaskManager::(stopTask) Entry. taskID: 696e52f6-e4fd-41f5-9cb7-cc5e87dd961e Thread-39::DEBUG::2015-10-08 08:40:03,137::taskManager::158::Storage.TaskManager::(stopTask) Return. Thread-39::INFO::2015-10-08 08:40:03,137::logUtils::51::dispatcher::(wrapper) Run and protect: stopTask, Return response: True raise ActionStopped() ActionStopped: Action was stopped: () CopyImageError: low level Image copy failed: ('src image=a9584585-57dc-4d9f-ac01-aaba5d690d1c, dst image=a9584585-57dc-4d9f-ac01-aaba5d690d1c: msg=Action was stopped: ()',) [nsoffer@thin 1270003 (master)]$ Everything on the vdsm side is expected - when you ask to stop a task, vdsm will stop it. Looking in engine log: $ grep 'SPMAsyncTask::StopTask: Attempting to stop task' engine.log 2015-10-08 14:36:19,805 INFO [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (org.ovirt.thread.pool-7-thread-44) [76a3396a] SPMAsyncTask::StopTask: Attempting to stop task '2b75ff2e-723f-43d6-9d37-039a67b79f35' (Parent Command 'ImportVmTemplate', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters'). 2015-10-08 14:39:42,471 INFO [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (org.ovirt.thread.pool-7-thread-28) [7b5de388] SPMAsyncTask::StopTask: Attempting to stop task '37a93f9c-9271-4592-9133-8272b2252e75' (Parent Command 'ImportVmTemplate', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters'). 2015-10-08 15:39:11,192 INFO [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (org.ovirt.thread.pool-7-thread-13) [5d845592] SPMAsyncTask::StopTask: Attempting to stop task 'fbface18-5c65-42c5-806a-5f1daa3208ee' (Parent Command 'ImportVmTemplate', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters'). 2015-10-08 15:40:04,728 INFO [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (org.ovirt.thread.pool-7-thread-37) [66c84f5f] SPMAsyncTask::StopTask: Attempting to stop task '696e52f6-e4fd-41f5-9cb7-cc5e87dd961e' (Parent Command 'ImportVmTemplate', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters'). So it was engine who ask to stop the task. Looking in the first command flow in the engine: 2015-10-08 14:36:19,397 INFO [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (org.ovirt.thread.pool-7-thread-7) [76a3396a] Adding task '2b75ff2e-723f-43d6-9d37-039a67b79f35 ' (Parent Command 'ImportVmTemplate', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters'), polling hasn't started yet.. 2015-10-08 14:36:19,606 ERROR [org.ovirt.engine.core.bll.ImportVmTemplateCommand] (org.ovirt.thread.pool-7-thread-7) [76a3396a] Command 'org.ovirt.engine.core.bll.ImportVmTempl ateCommand' failed: null 2015-10-08 14:36:19,606 ERROR [org.ovirt.engine.core.bll.ImportVmTemplateCommand] (org.ovirt.thread.pool-7-thread-7) [76a3396a] Exception: java.lang.NumberFormatException: null at java.lang.Long.parseLong(Long.java:552) [rt.jar:1.8.0_51] at org.ovirt.engine.core.utils.MacAddressRangeUtils.macToLong(MacAddressRangeUtils.java:122) [utils.jar:] at org.ovirt.engine.core.bll.network.macpoolmanager.MacPoolManagerRanges.isMacInUse(MacPoolManagerRanges.java:142) [bll.jar:] at org.ovirt.engine.core.bll.utils.VmDeviceUtils.canPlugInterfaceInDc(VmDeviceUtils.java:684) [bll.jar:] at org.ovirt.engine.core.bll.utils.VmDeviceUtils.canPlugInterface(VmDeviceUtils.java:672) [bll.jar:] at org.ovirt.engine.core.bll.utils.VmDeviceUtils.addImportedInterfaces(VmDeviceUtils.java:1660) [bll.jar:] at org.ovirt.engine.core.bll.utils.VmDeviceUtils.addImportedDevices(VmDeviceUtils.java:1619) [bll.jar:] at org.ovirt.engine.core.bll.ImportVmTemplateCommand.executeCommand(ImportVmTemplateCommand.java:327) [bll.jar:] at org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:1211) [bll.jar:] at org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:1355) [bll.jar:] at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:1979) [bll.jar:] at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInSuppressed(TransactionSupport.java:174) [utils.jar:] at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:116) [utils.jar:] at org.ovirt.engine.core.bll.CommandBase.execute(CommandBase.java:1392) [bll.jar:] at org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:374) [bll.jar:] at org.ovirt.engine.core.bll.MultipleActionsRunner.executeValidatedCommand(MultipleActionsRunner.java:202) [bll.jar:] at org.ovirt.engine.core.bll.MultipleActionsRunner.runCommands(MultipleActionsRunner.java:170) [bll.jar:] at org.ovirt.engine.core.bll.MultipleActionsRunner$2.run(MultipleActionsRunner.java:179) [bll.jar:] at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:92) [utils.jar:] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_51] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_51] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_51] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_51] at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_51] Somebody was not careful with his nulls. So engine cancel the task. 2015-10-08 14:36:19,805 INFO [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (org.ovirt.thread.pool-7-thread-44) [76a3396a] SPMAsyncTask::StopTask: Attempting to stop task '2b75ff2e-723f-43d6-9d37-039a67b79f35' (Parent Command 'ImportVmTemplate', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters'). Maor, can you take a look at this? Michal, this exception in MacAddressRangeUtils is similar to the one we've seen in an import bug a few weeks ago, I think it might be more virt related, can someone from you team take a look? Created attachment 1086612 [details]
engine.log x86
Created attachment 1086614 [details]
vdsm.log x86
I successfully reproduced the problem on a completely clean and freshly deployed system, just create a VM, export to local storage export domain and import it back AFAICT the issue is with mac assignment when importing (or template creation) - reassigning to network Worth noting in Import VM dialog the MAC field is empty. The original VM does have a MAC. I tried it once again during meeting with danken and we had some weird issues: We created VM with 1GB disk, created template and issued it's export. While Danken confirmed, that vdsm was done sufficiently fast with exporting template to export domain, it took over 8 minutes to engine to report it's done. Before engine presented template as exported in "export domain -> Template import" it said in events: "Oct 29, 2015 4:11:43 PM Failed to export Template test to exportDomain" and few minutes after that successfully exported template miraculously summoned. During waiting for this template in export domain, whole app was resilient to refresh, and refreshed only after export was done. After logout and log back in, it takes again minutes to see content in "export domain -> Template import". Not sure if it's related to this bug, or if it's even another bug. Having successfully someting in export domain, I'll try to verify if there's a problem with mac pool asap. Verified on - 3.6.0.3-0.1.el6 |