Created attachment 1503647
Engine logs

Description of problem:
The job is not removed if there is an exception in the endAction of RemoveSnapshot.

Version-Release number of selected component (if applicable):
RHV 4.2.6

How reproducible:
Under some circumstances

Steps to Reproduce:
Hard to say. It requires a storage-related issue during the end action.

Actual results:
The job stays in state STARTED.

Expected results:
The job is removed once the issue is sorted out.

Additional information:
jobID = 14d992a6-1c64-40fe-b0d8-21f399c826bb
CorrelationID = d466a458-60a5-4038-9c70-8a82746a29cb

2018-10-12 23:07:58,883+02 INFO [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskCommand] (EE-ManagedThreadFactory-engine-Thread-287984) [d466a458-60a5-4038-9c70-8a82746a29cb] Ending command 'org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskCommand' successfully.
... org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-287984) [d466a458-60a5-4038-9c70-8a82746a29cb] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM zahqinfrhevh02 command GetVolumeInfoVDS failed: Resource timeout: ()
2018-10-12 23:09:58,933+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVolumeInfoVDSCommand] (EE-ManagedThreadFactory-engine-Thread-287984) [d466a458-60a5-4038-9c70-8a82746a29cb] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.GetVolumeInfoVDSCommand' return value ' VolumeInfoReturn:{status='Status [code=851, message=Resource timeout: ()]'} '
...
2018-10-12 23:14:00,869+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-287984) [] EVENT_ID: USER_REMOVE_SNAPSHOT_FINISHED_FAILURE(357), Failed to delete snapshot '_GX_BACKUP_zahqdevlecm01_26560_6988_zahqinfvsa01' for VM 'zahqdevlecm01'.
2018-10-12 23:14:00,869+02 ERROR [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-287984) [] [within thread]: endAction for action type RemoveSnapshot threw an exception.: javax.ejb.EJBTransactionRolledbackException
    at org.jboss.as.ejb3.tx.CMTTxInterceptor.handleInCallerTx(CMTTxInterceptor.java:160) [wildfly-ejb3-7.1.4.GA-redhat-1.jar:7.1.4.GA-redhat-1]
    at org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInCallerTx(CMTTxInterceptor.java:257) [wildfly-ejb3-7.1.4.GA-redhat-1.jar:7.1.4.GA-redhat-1]
    at org.jboss.as.ejb3.tx.CMTTxInterceptor.supports(CMTTxInterceptor.java:381) [wildfly-ejb3-7.1.4.GA-redhat-1.jar:7.1.4.GA-redhat-1]
    at org.jboss.as.ejb3.tx.CMTTxInterceptor.processInvocation(CMTTxInterceptor.java:244) [wildfly-ejb3-7.1.4.GA-redhat-1.jar:7.1.4.GA-redhat-1]
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422)
    ...
Caused by: java.lang.NullPointerException
    at org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskCommandBase.handleForwardMerge(RemoveSnapshotSingleDiskCommandBase.java:198) [bll.jar:]
    at org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskCommandBase.lambda$syncDbRecords$0(RemoveSnapshotSingleDiskCommandBase.java:161) [bll.jar:]
    at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInNewTransaction(TransactionSupport.java:202) [utils.jar:]
    at org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskCommandBase.syncDbRecords(RemoveSnapshotSingleDiskCommandBase.java:151) [bll.jar:]
    at org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskCommand.endSuccessfully(RemoveSnapshotSingleDiskCommand.java:115) [bll.jar:]
    at org.ovirt.engine.core.bll.CommandBase.internalEndSuccessfully(CommandBase.java:675) [bll.jar:]
    ...
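
To illustrate the failure mode reported above, here is a minimal, self-contained Java sketch of how an exception escaping an end-action callback can leave a job in STARTED, and how a guard around the callback lets the job reach a terminal state instead. Every name in it (EndActionSketch, Job, JobStatus, endActionUnguarded, endActionGuarded, and this stand-in syncDbRecords) is hypothetical and is not taken from ovirt-engine. It also assumes, without confirmation from the logs, that the NullPointerException came from dereferencing data that the timed-out volume query never returned. It is not the actual engine code or the actual fix.

```java
/** Hypothetical stand-ins only; none of these types are ovirt-engine classes. */
final class EndActionSketch {

    enum JobStatus { STARTED, FINISHED, FAILED }

    static final class Job {
        final String id;
        JobStatus status = JobStatus.STARTED;

        Job(String id) {
            this.id = id;
        }
    }

    /** Unguarded: if the end action throws, the status update is never reached. */
    static void endActionUnguarded(Job job, Runnable endAction) {
        endAction.run();
        job.status = JobStatus.FINISHED;
    }

    /** Guarded: the job always ends in a terminal state, even if the end action throws. */
    static void endActionGuarded(Job job, Runnable endAction) {
        try {
            endAction.run();
            job.status = JobStatus.FINISHED;
        } catch (RuntimeException e) {
            // Assumption: a failed end action should mark the job FAILED rather than leave it STARTED.
            job.status = JobStatus.FAILED;
        }
    }

    /** Simulates a sync step that dereferences volume info a timed-out query never returned. */
    static void syncDbRecords(String volumeInfo) {
        volumeInfo.length(); // throws NullPointerException when volumeInfo is null
    }

    public static void main(String[] args) {
        Job stuck = new Job("job-1");
        try {
            endActionUnguarded(stuck, () -> syncDbRecords(null));
        } catch (NullPointerException e) {
            // The caller only logs the exception; nothing updates the job.
        }
        System.out.println("unguarded: " + stuck.status); // prints "unguarded: STARTED"

        Job cleaned = new Job("job-2");
        endActionGuarded(cleaned, () -> syncDbRecords(null));
        System.out.println("guarded: " + cleaned.status); // prints "guarded: FAILED"
    }
}
```

The guarded variant matches the expected result of this report only in spirit: the job does not stay in STARTED after the end action fails.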
This bug has not been marked as blocker for oVirt 4.3.0. Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.
IIUC Ravi's patch[1] should handle these cases. Ravi, can you keep me honest? [1] - https://gerrit.ovirt.org/#/c/95873/
(In reply to Benny Zlotnik from comment #4)
> IIUC Ravi's patch[1] should handle these cases. Ravi, can you keep me honest?
>
> [1] - https://gerrit.ovirt.org/#/c/95873/

A lot of these issues should be fixed in 4.3 as part of BZ 1633777. I have not verified the specific flow described in comment #1, but I have verified as part of BZ 1633777 that exceptions thrown in the endAction of the AddDisk and AddVmTemplate commands were handled properly.
I tested throwing exceptions in the live/cold merge endActions and it seems to be working correctly. Moving to MODIFIED.
WARN: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason: [Found non-acked flags: '{'rhevm-4.3.z': '?'}', ] For more info please contact: rhv-devops
From the initial description, it is hard to understand what the scenario is here. Please provide a clear scenario so we can verify this issue.
(In reply to Avihai from comment #9)
> From the initial description, it is hard to understand what is the scenario
> here.
> Please provide a clear scenario so we can verify this issue.

You could try to block the connection to vdsm once it starts executing the endAction, which is indicated by this log message:
"Ending command 'org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskCommand' successfully"
Tested on:
ovirt-engine-4.4.0-0.4.master.el7.noarch
vdsm-4.40.0-127.gitc628cce.el8ev.x86_64

using the same steps as in https://bugzilla.redhat.com/show_bug.cgi?id=1728212#c17

Actual result:
The same as the cloned z-stream bug #1728212
- The snapshot failed to be deleted
- No "running" jobs in the database - only FINISHED or FAILED (RemoveSnapshot)

Moving to VERIFIED
WARN: Bug status (VERIFIED) wasn't changed but the following should be fixed: [Found non-acked flags: '{}', ] For more info please contact: rhv-devops
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: RHV Manager (ovirt-engine) 4.4 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:3247