Bug 1648345 - Jobs are not properly cleaned after a failed task.
Summary: Jobs are not properly cleaned after a failed task.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.2.6
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ovirt-4.4.0
Target Release: 4.3.0
Assignee: Benny Zlotnik
QA Contact: Yosi Ben Shimon
URL:
Whiteboard:
Depends On:
Blocks: 1728212
 
Reported: 2018-11-09 13:30 UTC by Roman Hodain
Modified: 2021-12-10 18:24 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1728212 (view as bug list)
Environment:
Last Closed: 2020-08-04 13:16:18 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:
lsvaty: testing_plan_complete-


Attachments
Engine logs (1.28 MB, application/x-gzip)
2018-11-09 13:30 UTC, Roman Hodain


Links
System                  | ID             | Private | Priority | Status | Summary | Last Updated
Red Hat Issue Tracker   | RHV-36358      | 0       | None     | None   | None    | 2021-12-10 18:24:56 UTC
Red Hat Product Errata  | RHSA-2020:3247 | 0       | None     | None   | None    | 2020-08-04 13:16:43 UTC

Description Roman Hodain 2018-11-09 13:30:24 UTC
Created attachment 1503647
Engine logs

Description of problem:
The job is not removed if there is an exception in the endAction of the RemoveSnapshot.

Version-Release number of selected component (if applicable):
RHV 4.2.6

How reproducible:
Under some circumstances

Steps to Reproduce:
Hard to say. It requires a storage-related issue during the end action.

Actual results:
The job remains in state STARTED.

Expected results:
The job is removed once the issue is resolved.

Additional information:
jobID = 14d992a6-1c64-40fe-b0d8-21f399c826bb
CorrelationID = d466a458-60a5-4038-9c70-8a82746a29cb

2018-10-12 23:07:58,883+02 INFO  [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskCommand] (EE-ManagedThreadFactory-engine-Thread-287984) [d466a458-60a5-4038-9c70-8a82746a29cb] Ending command 'org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskCommand' successfully.
...
org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-287984) [d466a458-60a5-4038-9c70-8a82746a29cb] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM zahqinfrhevh02 command GetVolumeInfoVDS failed: Resource timeout: ()
2018-10-12 23:09:58,933+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVolumeInfoVDSCommand] (EE-ManagedThreadFactory-engine-Thread-287984) [d466a458-60a5-4038-9c70-8a82746a29cb] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.GetVolumeInfoVDSCommand' return value '
VolumeInfoReturn:{status='Status [code=851, message=Resource timeout: ()]'}
'
...

2018-10-12 23:14:00,869+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-287984) [] EVENT_ID: USER_REMOVE_SNAPSHOT_FINISHED_FAILURE(357), Failed to delete snapshot '_GX_BACKUP_zahqdevlecm01_26560_6988_zahqinfvsa01' for VM 'zahqdevlecm01'.
2018-10-12 23:14:00,869+02 ERROR [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-287984) [] [within thread]: endAction for action type RemoveSnapshot threw an exception.: javax.ejb.EJBTransactionRolledbackException
    at org.jboss.as.ejb3.tx.CMTTxInterceptor.handleInCallerTx(CMTTxInterceptor.java:160) [wildfly-ejb3-7.1.4.GA-redhat-1.jar:7.1.4.GA-redhat-1]
    at org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInCallerTx(CMTTxInterceptor.java:257) [wildfly-ejb3-7.1.4.GA-redhat-1.jar:7.1.4.GA-redhat-1]
    at org.jboss.as.ejb3.tx.CMTTxInterceptor.supports(CMTTxInterceptor.java:381) [wildfly-ejb3-7.1.4.GA-redhat-1.jar:7.1.4.GA-redhat-1]
    at org.jboss.as.ejb3.tx.CMTTxInterceptor.processInvocation(CMTTxInterceptor.java:244) [wildfly-ejb3-7.1.4.GA-redhat-1.jar:7.1.4.GA-redhat-1]
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422)
...
Caused by: java.lang.NullPointerException
    at org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskCommandBase.handleForwardMerge(RemoveSnapshotSingleDiskCommandBase.java:198) [bll.jar:]
    at org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskCommandBase.lambda$syncDbRecords$0(RemoveSnapshotSingleDiskCommandBase.java:161) [bll.jar:]
    at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInNewTransaction(TransactionSupport.java:202) [utils.jar:]
    at org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskCommandBase.syncDbRecords(RemoveSnapshotSingleDiskCommandBase.java:151) [bll.jar:]
    at org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskCommand.endSuccessfully(RemoveSnapshotSingleDiskCommand.java:115) [bll.jar:]
    at org.ovirt.engine.core.bll.CommandBase.internalEndSuccessfully(CommandBase.java:675) [bll.jar:]
...
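
The failure mode in the trace above can be illustrated with a minimal, self-contained Java sketch. This is not ovirt-engine code and not the fix discussed later in this bug: the class JobCleanupSketch, its Job and JobStatus types, and the methods endLeaky and endGuarded are all hypothetical names made up for illustration. The sketch only shows why an exception thrown by an end action can leave a job in STARTED, and how a try/finally guard would move it to a terminal state instead.

```java
// Hypothetical illustration only; none of these types exist in ovirt-engine.
final class JobCleanupSketch {
    enum JobStatus { STARTED, FINISHED, FAILED }

    /** Stand-in for a job record (assumption: a job has a single status field). */
    static final class Job {
        JobStatus status = JobStatus.STARTED;
    }

    /** Leaky variant: if endAction throws, the status update is never reached. */
    static void endLeaky(Job job, Runnable endAction) {
        endAction.run();
        job.status = JobStatus.FINISHED;
    }

    /** Guarded variant: the job always reaches a terminal status, even on failure. */
    static void endGuarded(Job job, Runnable endAction) {
        boolean succeeded = false;
        try {
            endAction.run();
            succeeded = true;
        } finally {
            job.status = succeeded ? JobStatus.FINISHED : JobStatus.FAILED;
        }
    }

    public static void main(String[] args) {
        // Simulates the NullPointerException raised during the end action.
        Runnable failing = () -> {
            throw new NullPointerException("simulated failure in endAction");
        };

        Job leaky = new Job();
        try {
            endLeaky(leaky, failing);
        } catch (RuntimeException e) {
            // The caller only logs the exception, as in the trace above.
        }
        System.out.println("leaky: " + leaky.status);

        Job guarded = new Job();
        try {
            endGuarded(guarded, failing);
        } catch (RuntimeException e) {
            // Still propagated to the caller, but the job is already FAILED.
        }
        System.out.println("guarded: " + guarded.status);
    }
}
```

Running main prints "leaky: STARTED" followed by "guarded: FAILED", which corresponds to the actual and expected results described above.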

Comment 2 Sandro Bonazzola 2019-01-28 09:42:15 UTC
This bug has not been marked as blocker for oVirt 4.3.0.
Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.

Comment 4 Benny Zlotnik 2019-02-26 14:03:21 UTC
IIUC Ravi's patch[1] should handle these cases. Ravi, can you keep me honest?


[1] - https://gerrit.ovirt.org/#/c/95873/

Comment 5 Ravi Nori 2019-02-26 14:19:01 UTC
(In reply to Benny Zlotnik from comment #4)
> IIUC Ravi's patch[1] should handle these cases. Ravi, can you keep me honest?
> 
> 
> [1] - https://gerrit.ovirt.org/#/c/95873/

A lot of these issues should be fixed in 4.3 as part of BZ 1633777. I have not verified the specific flow described in comment #1, but as part of BZ 1633777 I verified that exceptions thrown in endAction of the AddDisk and AddVmTemplate commands were handled properly.

Comment 6 Benny Zlotnik 2019-06-19 09:09:17 UTC
I tested throwing exceptions in the live and cold merge endActions and it seems to be working correctly. Moving to MODIFIED.

Comment 7 RHV bug bot 2019-06-27 11:39:49 UTC
WARN: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Found non-acked flags: '{'rhevm-4.3.z': '?'}', ]

For more info please contact: rhv-devops

Comment 8 RHV bug bot 2019-06-27 11:48:38 UTC
WARN: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Found non-acked flags: '{'rhevm-4.3.z': '?'}', ]

For more info please contact: rhv-devops

Comment 9 Avihai 2019-07-02 06:41:43 UTC
From the initial description, it is hard to understand what the scenario is here.
Please provide a clear scenario so we can verify this issue.

Comment 10 Benny Zlotnik 2019-07-02 08:21:49 UTC
(In reply to Avihai from comment #9)
> From the initial description, it is hard to understand what the scenario is
> here.
> Please provide a clear scenario so we can verify this issue.

You could try to block the connection to vdsm once it starts executing the endAction, indicated by this log message:
"Ending command 'org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskCommand' successfully"
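
To time the connection block, a tester needs to notice the marker message as soon as it appears. A minimal Java sketch of that detection step follows. The class EndActionMarkerSketch and the method firstMarkerLine are hypothetical helpers, not part of ovirt-engine, and the sample lines are copied from the description of this bug. How the connection to vdsm is then blocked (for example with a firewall rule) is left to the tester and is not shown.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helper for spotting the endAction marker in engine log lines.
final class EndActionMarkerSketch {
    // Marker text quoted in this comment.
    static final String MARKER =
        "Ending command 'org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskCommand' successfully";

    /** Returns the first log line that contains the marker, if any. */
    static Optional<String> firstMarkerLine(List<String> logLines) {
        return logLines.stream()
                .filter(line -> line.contains(MARKER))
                .findFirst();
    }

    public static void main(String[] args) {
        // Sample lines taken from the description of this bug.
        List<String> sample = List.of(
            "2018-10-12 23:14:00,869+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-287984) [] EVENT_ID: USER_REMOVE_SNAPSHOT_FINISHED_FAILURE(357), Failed to delete snapshot '_GX_BACKUP_zahqdevlecm01_26560_6988_zahqinfvsa01' for VM 'zahqdevlecm01'.",
            "2018-10-12 23:07:58,883+02 INFO  [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskCommand] (EE-ManagedThreadFactory-engine-Thread-287984) [d466a458-60a5-4038-9c70-8a82746a29cb] Ending command 'org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskCommand' successfully.");

        firstMarkerLine(sample).ifPresentOrElse(
            line -> System.out.println("marker seen, block the vdsm connection now: " + line),
            () -> System.out.println("marker not seen yet"));
    }
}
```

In practice the list would be fed from the tail of the engine log rather than from a fixed sample.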

Comment 11 RHV bug bot 2019-07-04 13:10:57 UTC
WARN: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Found non-acked flags: '{'rhevm-4.3.z': '?'}', ]

For more info please contact: rhv-devops

Comment 13 Daniel Gur 2019-08-28 13:12:53 UTC
sync2jira

Comment 14 Daniel Gur 2019-08-28 13:17:06 UTC
sync2jira

Comment 16 Yosi Ben Shimon 2019-11-13 15:38:01 UTC
Tested on:
ovirt-engine-4.4.0-0.4.master.el7.noarch
vdsm-4.40.0-127.gitc628cce.el8ev.x86_64

using the same steps as in https://bugzilla.redhat.com/show_bug.cgi?id=1728212#c17

Actual result:
The same as the cloned z-stream bug#1728212

- The snapshot failed to be deleted
- No "running" jobs in the database - only FINISHED or FAILED (RemoveSnapshot)
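
The second check above can be expressed as a small filter over job rows. The Java sketch below is an illustration under stated assumptions: StuckJobCheckSketch, JobRow and stuck are hypothetical names, the rows are assumed to have already been read from the database by some other means, and any status other than FINISHED or FAILED is treated as "still running".

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical check; the row type does not mirror the engine's schema.
final class StuckJobCheckSketch {
    /** Minimal stand-in for one job row: action type and status as plain strings. */
    static final class JobRow {
        final String actionType;
        final String status;

        JobRow(String actionType, String status) {
            this.actionType = actionType;
            this.status = status;
        }
    }

    // Assumption: only these two statuses count as terminal.
    static final Set<String> TERMINAL = Set.of("FINISHED", "FAILED");

    /** Returns the rows whose status is not terminal. */
    static List<JobRow> stuck(List<JobRow> rows) {
        return rows.stream()
                .filter(row -> !TERMINAL.contains(row.status))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<JobRow> rows = List.of(
            new JobRow("RemoveSnapshot", "FAILED"),
            new JobRow("RemoveSnapshot", "FINISHED"));
        System.out.println("stuck jobs: " + stuck(rows).size());
    }
}
```

An empty result matches the verified behaviour reported here; a RemoveSnapshot row still in STARTED would match the original bug.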


Moving to VERIFIED

Comment 18 RHV bug bot 2019-12-13 13:16:02 UTC
WARN: Bug status (VERIFIED) wasn't changed but the following should be fixed:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops

Comment 19 RHV bug bot 2019-12-20 17:45:33 UTC
WARN: Bug status (VERIFIED) wasn't changed but the following should be fixed:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops

Comment 20 RHV bug bot 2020-01-08 14:47:49 UTC
WARN: Bug status (VERIFIED) wasn't changed but the following should be fixed:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops

Comment 21 RHV bug bot 2020-01-08 15:17:26 UTC
WARN: Bug status (VERIFIED) wasn't changed but the following should be fixed:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops

Comment 22 RHV bug bot 2020-01-24 19:49:32 UTC
WARN: Bug status (VERIFIED) wasn't changed but the following should be fixed:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops

Comment 26 errata-xmlrpc 2020-08-04 13:16:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: RHV Manager (ovirt-engine) 4.4 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:3247

