+++ This bug is a downstream clone. The original bug is: +++ +++ bug 1532567 +++ ====================================================================== +++ This bug was initially created as a clone of Bug #1509629 +++ Description of problem: In our automation (tier 1) we see that a clod merge operation is failing to remove all volumes. 2017-11-02 19:40:00,671+02 INFO [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-6690) [] CommandAsyncTask::endCommandAction [within thread] context: Attempting to endAction 'DestroyImage', 2017-11-02 19:40:00,672+02 ERROR [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-6690) [] [within thread]: endAction for action type DestroyImage threw an exception.: java.lang.NullPointerExceptio n at org.ovirt.engine.core.bll.tasks.CoCoAsyncTaskHelper.endAction(CoCoAsyncTaskHelper.java:351) [bll.jar:] at org.ovirt.engine.core.bll.tasks.CommandCoordinatorImpl.endAction(CommandCoordinatorImpl.java:347) [bll.jar:] at org.ovirt.engine.core.bll.tasks.CommandAsyncTask.endCommandAction(CommandAsyncTask.java:160) [bll.jar:] at org.ovirt.engine.core.bll.tasks.CommandAsyncTask.lambda$endActionIfNecessary$0(CommandAsyncTask.java:112) [bll.jar:] at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:96) [utils.jar:] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_144] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_144] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_144] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_144] at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_144] at org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:250) [javax.enterprise.concurrent-1.0.jar:] at org.jboss.as.ee.concurrent.service.ElytronManagedThreadFactory$ElytronManagedThread.run(ElytronManagedThreadFactory.java:78) 2017-11-02 19:40:00,672+02 INFO [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-6690) [] CommandAsyncTask::HandleEndActionResult: endAction for action type 'DestroyImage' threw an unrecoverable R untimeException the task will be cleared. As a result, not all volumes get removed Version-Release number of selected component (if applicable): ovirt-engine-4.2.0-0.0.master.20171101152059.git8b85834.el7.centos.noarch How reproducible: Fails on all storage types Steps to Reproduce: 1. Create VM with OS disk + 4 disks 2. Create files on the VM's disks and create snapshot 3. cold merge the snapshot Actual results: Cold merge failed - before cold merge: 20 volumes, after cold merge: 18 volumes, snapshot contains 5 disks Expected results: Cold merge succeeds - before cold merge: 20 volumes, after cold merge: 15 volumes, snapshot contains 5 disks Additional info: * No exception in vdsm side * Regression because this case passed in previous commits --- Additional comment from Ala Hino on 2017-11-05 08:09:09 EST --- I was able to reproduce the issue; actually, I see the same exception in the engine log, however, the snapshot is removed. Can you please confirm whether the snapshot removed in your env? --- Additional comment from Ala Hino on 2017-11-05 08:14:23 EST --- Based on the following message: Snapshot 'snapshot_18894_fcp_1' deletion for VM 'vm_TestCase18894_0219280637' has been completed Looks that we are hitting the same issue - the snapshot is removed but there is the NPE. I will look into this. --- Additional comment from Raz Tamir on 2017-11-05 08:15:26 EST --- Yes, the snapshot removed --- Additional comment from Ala Hino on 2017-11-05 08:19:43 EST --- (In reply to Raz Tamir from comment #3) > Yes, the snapshot removed In the description you mentioned "we see that a clod merge operation is failing to remove all volumes", what do you mean by that? Do the volumes still exist after cold merge? --- Additional comment from Raz Tamir on 2017-11-05 08:24:55 EST --- Yes, This is the output of our tests: Cold merge failed - before cold merge: 20 volumes, after cold merge: 18 volumes, snapshot contains 5 disks It checks lvs before the remove snap and after --- Additional comment from Raz Tamir on 2017-11-23 05:13:06 EST --- I still see the same issue in engine.log: 2017-11-23 12:02:37,743+02 INFO [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-28260) [] CommandAsyncTask::endCommandAction [within thread] context: Attempting to endAction 'DestroyImage', 2017-11-23 12:02:37,744+02 ERROR [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-28260) [] [within thread]: endAction for action type DestroyImage threw an exception.: java.lang.NullPointerException at org.ovirt.engine.core.bll.tasks.CoCoAsyncTaskHelper.endAction(CoCoAsyncTaskHelper.java:351) [bll.jar:] at org.ovirt.engine.core.bll.tasks.CommandCoordinatorImpl.endAction(CommandCoordinatorImpl.java:347) [bll.jar:] at org.ovirt.engine.core.bll.tasks.CommandAsyncTask.endCommandAction(CommandAsyncTask.java:160) [bll.jar:] at org.ovirt.engine.core.bll.tasks.CommandAsyncTask.lambda$endActionIfNecessary$0(CommandAsyncTask.java:112) [bll.jar:] at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:96) [utils.jar:] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_151] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_151] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_151] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_151] at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_151] at org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:250) [javax.enterprise.concurrent-1.0.jar:] at org.jboss.as.ee.concurrent.service.ElytronManagedThreadFactory$ElytronManagedThread.run(ElytronManagedThreadFactory.java:78) 2017-11-23 12:02:37,746+02 INFO [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-28260) [] CommandAsyncTask::HandleEndActionResult: endAction for action type 'DestroyImage' threw an unrecoverable RuntimeException the task will be cleared. 2017-11-23 12:02:37,748+02 INFO [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-19) [] BaseAsyncTask::onTaskEndSuccess: Task 'f06cf00c-76d8-4329-8a8a-fcdec3008c56' (Parent Command 'DestroyImage', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended successfully. 2017-11-23 12:02:37,749+02 INFO [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-19) [] CommandAsyncTask::endActionIfNecessary: All tasks of command 'fac1cac6-8f5d-49ba-af34-3b3b8d13e151' has ended -> executing 'endAction' --- Additional comment from Raz Tamir on 2017-11-23 05:13 EST --- --- Additional comment from Ala Hino on 2017-11-23 06:24:34 EST --- What's the engine version? --- Additional comment from Raz Tamir on 2017-11-23 06:27:06 EST --- ovirt-engine-4.2.0-0.0.master.20171121184703.git173fe83.el7.centos --- Additional comment from Ala Hino on 2017-11-27 03:23:00 EST --- Raz, Can you please set log level to DEBUG, try again and send the logs? If possible to give access to the env after the test fails, that would be great. --- Additional comment from Raz Tamir on 2017-11-28 06:55:55 EST --- (In reply to Ala Hino from comment #10) > Raz, > > Can you please set log level to DEBUG, try again and send the logs? The provided logs are with DEBUG level > If possible to give access to the env after the test fails, that would be > great. --- Additional comment from Ala Hino on 2017-11-28 07:01:50 EST --- (In reply to Raz Tamir from comment #11) > (In reply to Ala Hino from comment #10) > > Raz, > > > > Can you please set log level to DEBUG, try again and send the logs? > The provided logs are with DEBUG level > > If possible to give access to the env after the test fails, that would be > > great. The hosts are with DEBUG logs but not the engine. This is an engine side bug. --- Additional comment from Yaniv Kaul on 2017-11-28 10:03:50 EST --- Ala - Engine log in DEBUG is available @ http://jenkins.ovirt.org/job/ovirt-system-tests_master_check-patch-el7-x86_64/2493/artifact/exported-artifacts/basic-suite-master__logs/test_logs/basic-suite-master/post-007_sd_reattach.py/lago-basic-suite-master-engine/_var_log/ovirt-engine/engine.log Please analyze. (CI test failure) --- Additional comment from Ala Hino on 2017-11-29 06:11:15 EST --- (In reply to Yaniv Kaul from comment #13) > Ala - Engine log in DEBUG is available @ > http://jenkins.ovirt.org/job/ovirt-system-tests_master_check-patch-el7- > x86_64/2493/artifact/exported-artifacts/basic-suite-master__logs/test_logs/ > basic-suite-master/post-007_sd_reattach.py/lago-basic-suite-master-engine/ > _var_log/ovirt-engine/engine.log > > Please analyze. > > (CI test failure) Same error as found by automation. --- Additional comment from Raz Tamir on 2017-12-07 15:17:05 EST --- Passed automation run on rhvm-4.2.0-0.6.el7 Verified --- Additional comment from Ala Hino on 2017-12-10 05:15:16 EST --- This bug was verified on 4.2 but the issue encountered on 4.1.8 too. Re-targeting this bug to 4.1.9. --- Additional comment from Raz Tamir on 2017-12-10 05:27:46 EST --- Relevant log engine.log: 2017-12-08 07:08:07,117+02 INFO [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (org.ovirt.thread.pool-7-thread-48) [1ec95d85] CommandAsyncTask::endCommandAction [within thread] context: Attempting to endAction 'DestroyImage', 2017-12-08 07:08:07,118+02 ERROR [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (org.ovirt.thread.pool-7-thread-48) [1ec95d85] [within thread]: endAction for action type DestroyImage threw an exception.: java.lang.NullPointerException at org.ovirt.engine.core.bll.tasks.CoCoAsyncTaskHelper.endAction(CoCoAsyncTaskHelper.java:335) [bll.jar:] at org.ovirt.engine.core.bll.tasks.CommandCoordinatorImpl.endAction(CommandCoordinatorImpl.java:340) [bll.jar:] at org.ovirt.engine.core.bll.tasks.CommandAsyncTask.endCommandAction(CommandAsyncTask.java:154) [bll.jar:] at org.ovirt.engine.core.bll.tasks.CommandAsyncTask.lambda$endActionIfNecessary$0(CommandAsyncTask.java:106) [bll.jar:] at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:84) [utils.jar:] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_151] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_151] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_151] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_151] at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_151] 2017-12-08 07:08:07,118+02 INFO [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (org.ovirt.thread.pool-7-thread-48) [1ec95d85] CommandAsyncTask::HandleEndActionResult: endAction for action type 'DestroyImage' threw an unrecoverable RuntimeException the task will be cleared. --- Additional comment from nijin ashok on 2018-01-05 11:35:20 EST --- One of the customer environment is having the same issue on live merge as well. IIUC, we have fix in both cold merge and live merge as per https://gerrit.ovirt.org/#/c/85227/. However, would like to confirm because the title only mentions cold merge. Could you please confirm if the fix is there for both cold and live merge? --- Additional comment from Ala Hino on 2018-01-08 03:07:43 EST --- The fix applies both for cold and live merge. (Originally by Nijin Ashok)
This clone was probably done for errata purpose. QE - please decide how you'd like to verify this one. Moving to modified. (Originally by Ala Hino)
Ala, is there any dev action here? If this is just a bug for the sake of ET, shouldn't it already be on MODIFIED?
(In reply to Allon Mureinik from comment #6) > Ala, is there any dev action here? > If this is just a bug for the sake of ET, shouldn't it already be on > MODIFIED? MODIFIED now.
Passed automation run (tier 1) over rhv-4.1.9-2
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:0135
BZ<2>Jira re-sync
sync2jira