Bug 1532567

Summary: Cold merge failed to remove all volumes
Product: Red Hat Enterprise Virtualization Manager Reporter: nijin ashok <nashok>
Component: ovirt-engineAssignee: Ala Hino <ahino>
Status: CLOSED ERRATA QA Contact: Raz Tamir <ratamir>
Severity: high Docs Contact:
Priority: urgent    
Version: 4.1.8CC: ahino, bugs, lsurette, nashok, ratamir, rbalakri, Rhev-m-bugs, srevivo, ykaul, ylavi
Target Milestone: ovirt-4.2.1Keywords: Automation, Regression, ZStream
Target Release: ---Flags: lsvaty: testing_plan_complete-
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1509629
: 1533382 (view as bug list) Environment:
Last Closed: 2018-05-15 17:46:56 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1509629    
Bug Blocks: 1518562, 1533382    

Description nijin ashok 2018-01-09 11:13:21 UTC
+++ This bug was initially created as a clone of Bug #1509629 +++

Description of problem:

In our automation (tier 1) we see that a clod merge operation is failing to remove all volumes.

2017-11-02 19:40:00,671+02 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-6690) [] CommandAsyncTask::endCommandAction [within thread] context: Attempting to endAction 'DestroyImage',
2017-11-02 19:40:00,672+02 ERROR [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-6690) [] [within thread]: endAction for action type DestroyImage threw an exception.: java.lang.NullPointerExceptio
n
        at org.ovirt.engine.core.bll.tasks.CoCoAsyncTaskHelper.endAction(CoCoAsyncTaskHelper.java:351) [bll.jar:]
        at org.ovirt.engine.core.bll.tasks.CommandCoordinatorImpl.endAction(CommandCoordinatorImpl.java:347) [bll.jar:]
        at org.ovirt.engine.core.bll.tasks.CommandAsyncTask.endCommandAction(CommandAsyncTask.java:160) [bll.jar:]
        at org.ovirt.engine.core.bll.tasks.CommandAsyncTask.lambda$endActionIfNecessary$0(CommandAsyncTask.java:112) [bll.jar:]
        at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:96) [utils.jar:]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_144]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_144]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_144]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_144]
        at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_144]
        at org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:250) [javax.enterprise.concurrent-1.0.jar:]
        at org.jboss.as.ee.concurrent.service.ElytronManagedThreadFactory$ElytronManagedThread.run(ElytronManagedThreadFactory.java:78)

2017-11-02 19:40:00,672+02 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-6690) [] CommandAsyncTask::HandleEndActionResult: endAction for action type 'DestroyImage' threw an unrecoverable R
untimeException the task will be cleared.

As a result, not all volumes get removed


Version-Release number of selected component (if applicable):
ovirt-engine-4.2.0-0.0.master.20171101152059.git8b85834.el7.centos.noarch


How reproducible:
Fails on all storage types

Steps to Reproduce:
1. Create VM with OS disk + 4 disks
2. Create files on the VM's disks and create snapshot
3. cold merge the snapshot

Actual results:
Cold merge failed - before cold merge: 20 volumes, after cold merge: 18 volumes, snapshot contains 5 disks

Expected results:
Cold merge succeeds - before cold merge: 20 volumes, after cold merge: 15 volumes, snapshot contains 5 disks

Additional info:
* No exception in vdsm side
* Regression because this case passed in previous commits

--- Additional comment from Ala Hino on 2017-11-05 08:09:09 EST ---

I was able to reproduce the issue; actually, I see the same exception in the engine log, however, the snapshot is removed.

Can you please confirm whether the snapshot removed in your env?

--- Additional comment from Ala Hino on 2017-11-05 08:14:23 EST ---

Based on the following message:

Snapshot 'snapshot_18894_fcp_1' deletion for VM 'vm_TestCase18894_0219280637' has been completed

Looks that we are hitting the same issue - the snapshot is removed but there is the NPE. I will look into this.

--- Additional comment from Raz Tamir on 2017-11-05 08:15:26 EST ---

Yes, the snapshot removed

--- Additional comment from Ala Hino on 2017-11-05 08:19:43 EST ---

(In reply to Raz Tamir from comment #3)
> Yes, the snapshot removed

In the description you mentioned "we see that a clod merge operation is failing to remove all volumes", what do you mean by that? Do the volumes still exist after cold merge?

--- Additional comment from Raz Tamir on 2017-11-05 08:24:55 EST ---

Yes,

This is the output of our tests:
Cold merge failed - before cold merge: 20 volumes, after cold merge: 18 volumes, snapshot contains 5 disks

It checks lvs before the remove snap and after

--- Additional comment from Raz Tamir on 2017-11-23 05:13:06 EST ---

I still see the same issue in engine.log:
2017-11-23 12:02:37,743+02 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-28260) [] CommandAsyncTask::endCommandAction [within thread] context: Attempting to endAction 'DestroyImage',
2017-11-23 12:02:37,744+02 ERROR [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-28260) [] [within thread]: endAction for action type DestroyImage threw an exception.: java.lang.NullPointerException
        at org.ovirt.engine.core.bll.tasks.CoCoAsyncTaskHelper.endAction(CoCoAsyncTaskHelper.java:351) [bll.jar:]
        at org.ovirt.engine.core.bll.tasks.CommandCoordinatorImpl.endAction(CommandCoordinatorImpl.java:347) [bll.jar:]
        at org.ovirt.engine.core.bll.tasks.CommandAsyncTask.endCommandAction(CommandAsyncTask.java:160) [bll.jar:]
        at org.ovirt.engine.core.bll.tasks.CommandAsyncTask.lambda$endActionIfNecessary$0(CommandAsyncTask.java:112) [bll.jar:]
        at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:96) [utils.jar:]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_151]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_151]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_151]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_151]
        at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_151]
        at org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:250) [javax.enterprise.concurrent-1.0.jar:]
        at org.jboss.as.ee.concurrent.service.ElytronManagedThreadFactory$ElytronManagedThread.run(ElytronManagedThreadFactory.java:78)

2017-11-23 12:02:37,746+02 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-28260) [] CommandAsyncTask::HandleEndActionResult: endAction for action type 'DestroyImage' threw an unrecoverable RuntimeException the task will be cleared.
2017-11-23 12:02:37,748+02 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-19) [] BaseAsyncTask::onTaskEndSuccess: Task 'f06cf00c-76d8-4329-8a8a-fcdec3008c56' (Parent Command 'DestroyImage', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended successfully.
2017-11-23 12:02:37,749+02 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-19) [] CommandAsyncTask::endActionIfNecessary: All tasks of command 'fac1cac6-8f5d-49ba-af34-3b3b8d13e151' has ended -> executing 'endAction'

--- Additional comment from Raz Tamir on 2017-11-23 05:13 EST ---



--- Additional comment from Ala Hino on 2017-11-23 06:24:34 EST ---

What's the engine version?

--- Additional comment from Raz Tamir on 2017-11-23 06:27:06 EST ---

ovirt-engine-4.2.0-0.0.master.20171121184703.git173fe83.el7.centos

--- Additional comment from Ala Hino on 2017-11-27 03:23:00 EST ---

Raz,

Can you please set log level to DEBUG, try again and send the logs?
If possible to give access to the env after the test fails, that would be great.

--- Additional comment from Raz Tamir on 2017-11-28 06:55:55 EST ---

(In reply to Ala Hino from comment #10)
> Raz,
> 
> Can you please set log level to DEBUG, try again and send the logs?
The provided logs are with DEBUG level
> If possible to give access to the env after the test fails, that would be
> great.

--- Additional comment from Ala Hino on 2017-11-28 07:01:50 EST ---

(In reply to Raz Tamir from comment #11)
> (In reply to Ala Hino from comment #10)
> > Raz,
> > 
> > Can you please set log level to DEBUG, try again and send the logs?
> The provided logs are with DEBUG level
> > If possible to give access to the env after the test fails, that would be
> > great.

The hosts are with DEBUG logs but not the engine.
This is an engine side bug.

--- Additional comment from Yaniv Kaul on 2017-11-28 10:03:50 EST ---

Ala - Engine log in DEBUG is available @ http://jenkins.ovirt.org/job/ovirt-system-tests_master_check-patch-el7-x86_64/2493/artifact/exported-artifacts/basic-suite-master__logs/test_logs/basic-suite-master/post-007_sd_reattach.py/lago-basic-suite-master-engine/_var_log/ovirt-engine/engine.log

Please analyze.

(CI test failure)

--- Additional comment from Ala Hino on 2017-11-29 06:11:15 EST ---

(In reply to Yaniv Kaul from comment #13)
> Ala - Engine log in DEBUG is available @
> http://jenkins.ovirt.org/job/ovirt-system-tests_master_check-patch-el7-
> x86_64/2493/artifact/exported-artifacts/basic-suite-master__logs/test_logs/
> basic-suite-master/post-007_sd_reattach.py/lago-basic-suite-master-engine/
> _var_log/ovirt-engine/engine.log
> 
> Please analyze.
> 
> (CI test failure)

Same error as found by automation.

--- Additional comment from Raz Tamir on 2017-12-07 15:17:05 EST ---

Passed automation run on rhvm-4.2.0-0.6.el7

Verified

--- Additional comment from Ala Hino on 2017-12-10 05:15:16 EST ---

This bug was verified on 4.2 but the issue encountered on 4.1.8 too.
Re-targeting this bug to 4.1.9.

--- Additional comment from Raz Tamir on 2017-12-10 05:27:46 EST ---

Relevant log

engine.log:

2017-12-08 07:08:07,117+02 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (org.ovirt.thread.pool-7-thread-48) [1ec95d85] CommandAsyncTask::endCommandAction [within thread] context: Attempting to endAction 'DestroyImage',
2017-12-08 07:08:07,118+02 ERROR [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (org.ovirt.thread.pool-7-thread-48) [1ec95d85] [within thread]: endAction for action type DestroyImage threw an exception.: java.lang.NullPointerException
        at org.ovirt.engine.core.bll.tasks.CoCoAsyncTaskHelper.endAction(CoCoAsyncTaskHelper.java:335) [bll.jar:]
        at org.ovirt.engine.core.bll.tasks.CommandCoordinatorImpl.endAction(CommandCoordinatorImpl.java:340) [bll.jar:]
        at org.ovirt.engine.core.bll.tasks.CommandAsyncTask.endCommandAction(CommandAsyncTask.java:154) [bll.jar:]
        at org.ovirt.engine.core.bll.tasks.CommandAsyncTask.lambda$endActionIfNecessary$0(CommandAsyncTask.java:106) [bll.jar:]
        at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:84) [utils.jar:]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_151]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_151]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_151]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_151]
        at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_151]

2017-12-08 07:08:07,118+02 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (org.ovirt.thread.pool-7-thread-48) [1ec95d85] CommandAsyncTask::HandleEndActionResult: endAction for action type 'DestroyImage' threw an unrecoverable RuntimeException the task will be cleared.

--- Additional comment from nijin ashok on 2018-01-05 11:35:20 EST ---


One of the customer environment is having the same issue on live merge as well. IIUC, we have fix in both cold merge and live merge as per https://gerrit.ovirt.org/#/c/85227/. However, would like to confirm because the title only mentions cold merge.

Could you please confirm if the fix is there for both cold and live merge?

--- Additional comment from Ala Hino on 2018-01-08 03:07:43 EST ---

The fix applies both for cold and live merge.

Comment 1 Ala Hino 2018-01-09 12:04:09 UTC
This clone was probably done for errata purpose.
QE - please decide how you'd like to verify this one.
Moving to modified.

Comment 6 Raz Tamir 2018-01-14 09:39:23 UTC
Passed automation run tier 1 - Verified
ovirt-engine-4.2.1.1-0.1.el7
vdsm-4.20.13-1.el7ev.x86_64

Comment 10 errata-xmlrpc 2018-05-15 17:46:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:1488

Comment 11 Franta Kust 2019-05-16 13:08:58 UTC
BZ<2>Jira Resync

Comment 12 Daniel Gur 2019-08-28 13:14:53 UTC
sync2jira

Comment 13 Daniel Gur 2019-08-28 13:19:56 UTC
sync2jira