+++ This bug was initially created as a clone of Bug #1509629 +++
Description of problem:
In our automation (tier 1) we see that a clod merge operation is failing to remove all volumes.
2017-11-02 19:40:00,671+02 INFO [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-6690) [] CommandAsyncTask::endCommandAction [within thread] context: Attempting to endAction 'DestroyImage',
2017-11-02 19:40:00,672+02 ERROR [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-6690) [] [within thread]: endAction for action type DestroyImage threw an exception.: java.lang.NullPointerExceptio
n
at org.ovirt.engine.core.bll.tasks.CoCoAsyncTaskHelper.endAction(CoCoAsyncTaskHelper.java:351) [bll.jar:]
at org.ovirt.engine.core.bll.tasks.CommandCoordinatorImpl.endAction(CommandCoordinatorImpl.java:347) [bll.jar:]
at org.ovirt.engine.core.bll.tasks.CommandAsyncTask.endCommandAction(CommandAsyncTask.java:160) [bll.jar:]
at org.ovirt.engine.core.bll.tasks.CommandAsyncTask.lambda$endActionIfNecessary$0(CommandAsyncTask.java:112) [bll.jar:]
at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:96) [utils.jar:]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_144]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_144]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_144]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_144]
at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_144]
at org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:250) [javax.enterprise.concurrent-1.0.jar:]
at org.jboss.as.ee.concurrent.service.ElytronManagedThreadFactory$ElytronManagedThread.run(ElytronManagedThreadFactory.java:78)
2017-11-02 19:40:00,672+02 INFO [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-6690) [] CommandAsyncTask::HandleEndActionResult: endAction for action type 'DestroyImage' threw an unrecoverable R
untimeException the task will be cleared.
As a result, not all volumes get removed
Version-Release number of selected component (if applicable):
ovirt-engine-4.2.0-0.0.master.20171101152059.git8b85834.el7.centos.noarch
How reproducible:
Fails on all storage types
Steps to Reproduce:
1. Create VM with OS disk + 4 disks
2. Create files on the VM's disks and create snapshot
3. cold merge the snapshot
Actual results:
Cold merge failed - before cold merge: 20 volumes, after cold merge: 18 volumes, snapshot contains 5 disks
Expected results:
Cold merge succeeds - before cold merge: 20 volumes, after cold merge: 15 volumes, snapshot contains 5 disks
Additional info:
* No exception in vdsm side
* Regression because this case passed in previous commits
--- Additional comment from Ala Hino on 2017-11-05 08:09:09 EST ---
I was able to reproduce the issue; actually, I see the same exception in the engine log, however, the snapshot is removed.
Can you please confirm whether the snapshot removed in your env?
--- Additional comment from Ala Hino on 2017-11-05 08:14:23 EST ---
Based on the following message:
Snapshot 'snapshot_18894_fcp_1' deletion for VM 'vm_TestCase18894_0219280637' has been completed
Looks that we are hitting the same issue - the snapshot is removed but there is the NPE. I will look into this.
--- Additional comment from Raz Tamir on 2017-11-05 08:15:26 EST ---
Yes, the snapshot removed
--- Additional comment from Ala Hino on 2017-11-05 08:19:43 EST ---
(In reply to Raz Tamir from comment #3)
> Yes, the snapshot removed
In the description you mentioned "we see that a clod merge operation is failing to remove all volumes", what do you mean by that? Do the volumes still exist after cold merge?
--- Additional comment from Raz Tamir on 2017-11-05 08:24:55 EST ---
Yes,
This is the output of our tests:
Cold merge failed - before cold merge: 20 volumes, after cold merge: 18 volumes, snapshot contains 5 disks
It checks lvs before the remove snap and after
--- Additional comment from Raz Tamir on 2017-11-23 05:13:06 EST ---
I still see the same issue in engine.log:
2017-11-23 12:02:37,743+02 INFO [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-28260) [] CommandAsyncTask::endCommandAction [within thread] context: Attempting to endAction 'DestroyImage',
2017-11-23 12:02:37,744+02 ERROR [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-28260) [] [within thread]: endAction for action type DestroyImage threw an exception.: java.lang.NullPointerException
at org.ovirt.engine.core.bll.tasks.CoCoAsyncTaskHelper.endAction(CoCoAsyncTaskHelper.java:351) [bll.jar:]
at org.ovirt.engine.core.bll.tasks.CommandCoordinatorImpl.endAction(CommandCoordinatorImpl.java:347) [bll.jar:]
at org.ovirt.engine.core.bll.tasks.CommandAsyncTask.endCommandAction(CommandAsyncTask.java:160) [bll.jar:]
at org.ovirt.engine.core.bll.tasks.CommandAsyncTask.lambda$endActionIfNecessary$0(CommandAsyncTask.java:112) [bll.jar:]
at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:96) [utils.jar:]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_151]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_151]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_151]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_151]
at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_151]
at org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:250) [javax.enterprise.concurrent-1.0.jar:]
at org.jboss.as.ee.concurrent.service.ElytronManagedThreadFactory$ElytronManagedThread.run(ElytronManagedThreadFactory.java:78)
2017-11-23 12:02:37,746+02 INFO [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-28260) [] CommandAsyncTask::HandleEndActionResult: endAction for action type 'DestroyImage' threw an unrecoverable RuntimeException the task will be cleared.
2017-11-23 12:02:37,748+02 INFO [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-19) [] BaseAsyncTask::onTaskEndSuccess: Task 'f06cf00c-76d8-4329-8a8a-fcdec3008c56' (Parent Command 'DestroyImage', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended successfully.
2017-11-23 12:02:37,749+02 INFO [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-19) [] CommandAsyncTask::endActionIfNecessary: All tasks of command 'fac1cac6-8f5d-49ba-af34-3b3b8d13e151' has ended -> executing 'endAction'
--- Additional comment from Raz Tamir on 2017-11-23 05:13 EST ---
--- Additional comment from Ala Hino on 2017-11-23 06:24:34 EST ---
What's the engine version?
--- Additional comment from Raz Tamir on 2017-11-23 06:27:06 EST ---
ovirt-engine-4.2.0-0.0.master.20171121184703.git173fe83.el7.centos
--- Additional comment from Ala Hino on 2017-11-27 03:23:00 EST ---
Raz,
Can you please set log level to DEBUG, try again and send the logs?
If possible to give access to the env after the test fails, that would be great.
--- Additional comment from Raz Tamir on 2017-11-28 06:55:55 EST ---
(In reply to Ala Hino from comment #10)
> Raz,
>
> Can you please set log level to DEBUG, try again and send the logs?
The provided logs are with DEBUG level
> If possible to give access to the env after the test fails, that would be
> great.
--- Additional comment from Ala Hino on 2017-11-28 07:01:50 EST ---
(In reply to Raz Tamir from comment #11)
> (In reply to Ala Hino from comment #10)
> > Raz,
> >
> > Can you please set log level to DEBUG, try again and send the logs?
> The provided logs are with DEBUG level
> > If possible to give access to the env after the test fails, that would be
> > great.
The hosts are with DEBUG logs but not the engine.
This is an engine side bug.
--- Additional comment from Yaniv Kaul on 2017-11-28 10:03:50 EST ---
Ala - Engine log in DEBUG is available @ http://jenkins.ovirt.org/job/ovirt-system-tests_master_check-patch-el7-x86_64/2493/artifact/exported-artifacts/basic-suite-master__logs/test_logs/basic-suite-master/post-007_sd_reattach.py/lago-basic-suite-master-engine/_var_log/ovirt-engine/engine.log
Please analyze.
(CI test failure)
--- Additional comment from Ala Hino on 2017-11-29 06:11:15 EST ---
(In reply to Yaniv Kaul from comment #13)
> Ala - Engine log in DEBUG is available @
> http://jenkins.ovirt.org/job/ovirt-system-tests_master_check-patch-el7-
> x86_64/2493/artifact/exported-artifacts/basic-suite-master__logs/test_logs/
> basic-suite-master/post-007_sd_reattach.py/lago-basic-suite-master-engine/
> _var_log/ovirt-engine/engine.log
>
> Please analyze.
>
> (CI test failure)
Same error as found by automation.
--- Additional comment from Raz Tamir on 2017-12-07 15:17:05 EST ---
Passed automation run on rhvm-4.2.0-0.6.el7
Verified
--- Additional comment from Ala Hino on 2017-12-10 05:15:16 EST ---
This bug was verified on 4.2 but the issue encountered on 4.1.8 too.
Re-targeting this bug to 4.1.9.
--- Additional comment from Raz Tamir on 2017-12-10 05:27:46 EST ---
Relevant log
engine.log:
2017-12-08 07:08:07,117+02 INFO [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (org.ovirt.thread.pool-7-thread-48) [1ec95d85] CommandAsyncTask::endCommandAction [within thread] context: Attempting to endAction 'DestroyImage',
2017-12-08 07:08:07,118+02 ERROR [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (org.ovirt.thread.pool-7-thread-48) [1ec95d85] [within thread]: endAction for action type DestroyImage threw an exception.: java.lang.NullPointerException
at org.ovirt.engine.core.bll.tasks.CoCoAsyncTaskHelper.endAction(CoCoAsyncTaskHelper.java:335) [bll.jar:]
at org.ovirt.engine.core.bll.tasks.CommandCoordinatorImpl.endAction(CommandCoordinatorImpl.java:340) [bll.jar:]
at org.ovirt.engine.core.bll.tasks.CommandAsyncTask.endCommandAction(CommandAsyncTask.java:154) [bll.jar:]
at org.ovirt.engine.core.bll.tasks.CommandAsyncTask.lambda$endActionIfNecessary$0(CommandAsyncTask.java:106) [bll.jar:]
at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:84) [utils.jar:]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_151]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_151]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_151]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_151]
at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_151]
2017-12-08 07:08:07,118+02 INFO [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (org.ovirt.thread.pool-7-thread-48) [1ec95d85] CommandAsyncTask::HandleEndActionResult: endAction for action type 'DestroyImage' threw an unrecoverable RuntimeException the task will be cleared.
--- Additional comment from nijin ashok on 2018-01-05 11:35:20 EST ---
One of the customer environment is having the same issue on live merge as well. IIUC, we have fix in both cold merge and live merge as per https://gerrit.ovirt.org/#/c/85227/. However, would like to confirm because the title only mentions cold merge.
Could you please confirm if the fix is there for both cold and live merge?
--- Additional comment from Ala Hino on 2018-01-08 03:07:43 EST ---
The fix applies both for cold and live merge.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHEA-2018:1488