Created attachment 1348061 [details] engine and vdsm logs Description of problem: In our automation (tier 1) we see that a clod merge operation is failing to remove all volumes. 2017-11-02 19:40:00,671+02 INFO [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-6690) [] CommandAsyncTask::endCommandAction [within thread] context: Attempting to endAction 'DestroyImage', 2017-11-02 19:40:00,672+02 ERROR [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-6690) [] [within thread]: endAction for action type DestroyImage threw an exception.: java.lang.NullPointerExceptio n at org.ovirt.engine.core.bll.tasks.CoCoAsyncTaskHelper.endAction(CoCoAsyncTaskHelper.java:351) [bll.jar:] at org.ovirt.engine.core.bll.tasks.CommandCoordinatorImpl.endAction(CommandCoordinatorImpl.java:347) [bll.jar:] at org.ovirt.engine.core.bll.tasks.CommandAsyncTask.endCommandAction(CommandAsyncTask.java:160) [bll.jar:] at org.ovirt.engine.core.bll.tasks.CommandAsyncTask.lambda$endActionIfNecessary$0(CommandAsyncTask.java:112) [bll.jar:] at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:96) [utils.jar:] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_144] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_144] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_144] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_144] at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_144] at org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:250) [javax.enterprise.concurrent-1.0.jar:] at org.jboss.as.ee.concurrent.service.ElytronManagedThreadFactory$ElytronManagedThread.run(ElytronManagedThreadFactory.java:78) 2017-11-02 19:40:00,672+02 INFO [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-6690) [] CommandAsyncTask::HandleEndActionResult: endAction for action type 'DestroyImage' threw an unrecoverable R untimeException the task will be cleared. As a result, not all volumes get removed Version-Release number of selected component (if applicable): ovirt-engine-4.2.0-0.0.master.20171101152059.git8b85834.el7.centos.noarch How reproducible: Fails on all storage types Steps to Reproduce: 1. Create VM with OS disk + 4 disks 2. Create files on the VM's disks and create snapshot 3. cold merge the snapshot Actual results: Cold merge failed - before cold merge: 20 volumes, after cold merge: 18 volumes, snapshot contains 5 disks Expected results: Cold merge succeeds - before cold merge: 20 volumes, after cold merge: 15 volumes, snapshot contains 5 disks Additional info: * No exception in vdsm side * Regression because this case passed in previous commits
I was able to reproduce the issue; actually, I see the same exception in the engine log, however, the snapshot is removed. Can you please confirm whether the snapshot removed in your env?
Based on the following message: Snapshot 'snapshot_18894_fcp_1' deletion for VM 'vm_TestCase18894_0219280637' has been completed Looks that we are hitting the same issue - the snapshot is removed but there is the NPE. I will look into this.
Yes, the snapshot removed
(In reply to Raz Tamir from comment #3) > Yes, the snapshot removed In the description you mentioned "we see that a clod merge operation is failing to remove all volumes", what do you mean by that? Do the volumes still exist after cold merge?
Yes, This is the output of our tests: Cold merge failed - before cold merge: 20 volumes, after cold merge: 18 volumes, snapshot contains 5 disks It checks lvs before the remove snap and after
I still see the same issue in engine.log: 2017-11-23 12:02:37,743+02 INFO [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-28260) [] CommandAsyncTask::endCommandAction [within thread] context: Attempting to endAction 'DestroyImage', 2017-11-23 12:02:37,744+02 ERROR [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-28260) [] [within thread]: endAction for action type DestroyImage threw an exception.: java.lang.NullPointerException at org.ovirt.engine.core.bll.tasks.CoCoAsyncTaskHelper.endAction(CoCoAsyncTaskHelper.java:351) [bll.jar:] at org.ovirt.engine.core.bll.tasks.CommandCoordinatorImpl.endAction(CommandCoordinatorImpl.java:347) [bll.jar:] at org.ovirt.engine.core.bll.tasks.CommandAsyncTask.endCommandAction(CommandAsyncTask.java:160) [bll.jar:] at org.ovirt.engine.core.bll.tasks.CommandAsyncTask.lambda$endActionIfNecessary$0(CommandAsyncTask.java:112) [bll.jar:] at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:96) [utils.jar:] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_151] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_151] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_151] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_151] at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_151] at org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:250) [javax.enterprise.concurrent-1.0.jar:] at org.jboss.as.ee.concurrent.service.ElytronManagedThreadFactory$ElytronManagedThread.run(ElytronManagedThreadFactory.java:78) 2017-11-23 12:02:37,746+02 INFO [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-28260) [] CommandAsyncTask::HandleEndActionResult: endAction for action type 'DestroyImage' threw an unrecoverable RuntimeException the task will be cleared. 2017-11-23 12:02:37,748+02 INFO [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-19) [] BaseAsyncTask::onTaskEndSuccess: Task 'f06cf00c-76d8-4329-8a8a-fcdec3008c56' (Parent Command 'DestroyImage', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended successfully. 2017-11-23 12:02:37,749+02 INFO [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-19) [] CommandAsyncTask::endActionIfNecessary: All tasks of command 'fac1cac6-8f5d-49ba-af34-3b3b8d13e151' has ended -> executing 'endAction'
Created attachment 1358115 [details] engine log
What's the engine version?
ovirt-engine-4.2.0-0.0.master.20171121184703.git173fe83.el7.centos
Raz, Can you please set log level to DEBUG, try again and send the logs? If possible to give access to the env after the test fails, that would be great.
(In reply to Ala Hino from comment #10) > Raz, > > Can you please set log level to DEBUG, try again and send the logs? The provided logs are with DEBUG level > If possible to give access to the env after the test fails, that would be > great.
(In reply to Raz Tamir from comment #11) > (In reply to Ala Hino from comment #10) > > Raz, > > > > Can you please set log level to DEBUG, try again and send the logs? > The provided logs are with DEBUG level > > If possible to give access to the env after the test fails, that would be > > great. The hosts are with DEBUG logs but not the engine. This is an engine side bug.
Ala - Engine log in DEBUG is available @ http://jenkins.ovirt.org/job/ovirt-system-tests_master_check-patch-el7-x86_64/2493/artifact/exported-artifacts/basic-suite-master__logs/test_logs/basic-suite-master/post-007_sd_reattach.py/lago-basic-suite-master-engine/_var_log/ovirt-engine/engine.log Please analyze. (CI test failure)
(In reply to Yaniv Kaul from comment #13) > Ala - Engine log in DEBUG is available @ > http://jenkins.ovirt.org/job/ovirt-system-tests_master_check-patch-el7- > x86_64/2493/artifact/exported-artifacts/basic-suite-master__logs/test_logs/ > basic-suite-master/post-007_sd_reattach.py/lago-basic-suite-master-engine/ > _var_log/ovirt-engine/engine.log > > Please analyze. > > (CI test failure) Same error as found by automation.
Passed automation run on rhvm-4.2.0-0.6.el7 Verified
This bug was verified on 4.2 but the issue encountered on 4.1.8 too. Re-targeting this bug to 4.1.9.
Relevant log engine.log: 2017-12-08 07:08:07,117+02 INFO [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (org.ovirt.thread.pool-7-thread-48) [1ec95d85] CommandAsyncTask::endCommandAction [within thread] context: Attempting to endAction 'DestroyImage', 2017-12-08 07:08:07,118+02 ERROR [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (org.ovirt.thread.pool-7-thread-48) [1ec95d85] [within thread]: endAction for action type DestroyImage threw an exception.: java.lang.NullPointerException at org.ovirt.engine.core.bll.tasks.CoCoAsyncTaskHelper.endAction(CoCoAsyncTaskHelper.java:335) [bll.jar:] at org.ovirt.engine.core.bll.tasks.CommandCoordinatorImpl.endAction(CommandCoordinatorImpl.java:340) [bll.jar:] at org.ovirt.engine.core.bll.tasks.CommandAsyncTask.endCommandAction(CommandAsyncTask.java:154) [bll.jar:] at org.ovirt.engine.core.bll.tasks.CommandAsyncTask.lambda$endActionIfNecessary$0(CommandAsyncTask.java:106) [bll.jar:] at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:84) [utils.jar:] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_151] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_151] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_151] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_151] at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_151] 2017-12-08 07:08:07,118+02 INFO [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (org.ovirt.thread.pool-7-thread-48) [1ec95d85] CommandAsyncTask::HandleEndActionResult: endAction for action type 'DestroyImage' threw an unrecoverable RuntimeException the task will be cleared.
One of the customer environment is having the same issue on live merge as well. IIUC, we have fix in both cold merge and live merge as per https://gerrit.ovirt.org/#/c/85227/. However, would like to confirm because the title only mentions cold merge. Could you please confirm if the fix is there for both cold and live merge?
The fix applies both for cold and live merge.
Passed automation run of tier 1 - Verified ovirt-engine-4.1.9-0.2.el7 vdsm-4.19.44-1.el7ev.x86_64
This bugzilla is included in oVirt 4.1.9 release, published on Jan 24th 2018. Since the problem described in this bug report should be resolved in oVirt 4.1.9 release, published on Jan 24th 2018, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.