Bug 1481098 - [downstream clone - 4.1.5] engine does not start due to trying to call non-existent callback.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.0.7
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.1.5
Assignee: Ala Hino
QA Contact: Kevin Alon Goldblatt
URL:
Whiteboard:
Depends On: 1477042
Blocks:
 
Reported: 2017-08-14 05:34 UTC by rhev-integ
Modified: 2020-09-10 11:13 UTC
CC: 13 users

Fixed In Version: ovirt-engine-4.1.5.2
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1477042
Environment:
Last Closed: 2017-08-16 09:58:59 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 3133381 0 None None None 2017-08-14 05:35:38 UTC
Red Hat Product Errata RHEA-2017:2509 0 normal SHIPPED_LIVE Red Hat Virtualization Manager (ovirt-engine) 4.1.5 2017-08-22 21:38:47 UTC
oVirt gerrit 80351 0 master MERGED core: Verify that the VM exists when returning callback 2017-08-14 05:35:38 UTC
oVirt gerrit 80421 0 ovirt-engine-4.1 MERGED core: Verify that the VM exists when returning callback 2017-08-14 05:35:38 UTC

Description rhev-integ 2017-08-14 05:34:41 UTC
+++ This bug is a downstream clone. The original bug is: +++
+++   bug 1477042 +++
======================================================================

Description of problem:

Engine won't start, environment is down.

Apparently the snapshot removal was successful, but there were entries in the database that triggered a callback for a command that no longer exists, so the engine fails to start.

The snapshot:

2017-06-27 17:05:11,157 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.MergeSnapshotsVDSCommand] (default task-48) [231f337f] START, MergeSnapshotsVDSCommand( MergeSnapshotsVDSCommandParameters:{runAsync='true', storagePoolId='7fa2b154-e990-4f7d-bf4d-de30026ccedf', ignoreFailoverLimit='false', storageDomainId='cb2f38db-5560-4762-b700-8e13c55a1e30', imageGroupId='ee7838f0-263e-477b-ac30-157472812b7b', imageId='7abcc25c-a459-4452-be5c-5369284a8d95', imageId2='2fac3f76-bf64-4073-a16a-98ce71958c51', vmId='9a254614-ce1a-40d3-9ca2-3955372653b1', postZero='false'}), log id: 3babecb0

2017-06-27 17:05:12,271 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (default task-48) [231f337f] CommandAsyncTask::Adding CommandMultiAsyncTasks object for command 'de74fa43-2210-4cb8-811f-5b76b23bf107'

2017-06-27 17:05:12,271 INFO  [org.ovirt.engine.core.bll.CommandMultiAsyncTasks] (default task-48) [231f337f] CommandMultiAsyncTasks::attachTask: Attaching task 'f0837f50-63b8-4dfd-9200-83a4a20a5a68' to command 'de74fa43-2210-4cb8-811f-5b76b23bf107'.
2017-06-27 17:05:13,156 INFO  [org.ovirt.engine.core.bll.CommandMultiAsyncTasks] (default task-48) [3a647aaa] CommandMultiAsyncTasks::attachTask: Attaching task 'c8498212-6567-4c66-912e-459e77fb9c56' to command 'de74fa43-2210-4cb8-811f-5b76b23bf107'.

2017-06-27 17:19:53,306 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartzScheduler4) [3aea9bbc] SPMAsyncTask::PollTask: Polling task 'f0837f50-63b8-4dfd-9200-83a4a20a5a68' (Parent Command 'RemoveSnapshot', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') returned status 'finished', result 'success'.
2017-06-27 18:14:24,065 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartzScheduler8) [18d5ac84] SPMAsyncTask::PollTask: Polling task 'c8498212-6567-4c66-912e-459e77fb9c56' (Parent Command 'RemoveSnapshot', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') returned status 'finished', result 'success'.

2017-06-27 18:14:24,069 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (DefaultQuartzScheduler8) [18d5ac84] CommandAsyncTask::endActionIfNecessary: All tasks of command 'de74fa43-2210-4cb8-811f-5b76b23bf107' has ended -> executing 'endAction'

2017-06-27 18:14:24,361 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-41) [537c0fab] Correlation ID: 537c0fab, Call Stack: null, Custom Event ID: -1, Message: Snapshot 'XYZ' deletion for VM 'XYZ' has been completed.

2017-06-27 18:14:24,362 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (org.ovirt.thread.pool-6-thread-41) [537c0fab] SPMAsyncTask::ClearAsyncTask: Attempting to clear task 'f0837f50-63b8-4dfd-9200-83a4a20a5a68'
2017-06-27 18:14:25,390 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (org.ovirt.thread.pool-6-thread-41) [537c0fab] BaseAsyncTask::removeTaskFromDB: Removed task 'f0837f50-63b8-4dfd-9200-83a4a20a5a68' from DataBase

2017-06-27 18:14:25,391 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (org.ovirt.thread.pool-6-thread-41) [537c0fab] SPMAsyncTask::ClearAsyncTask: Attempting to clear task 'c8498212-6567-4c66-912e-459e77fb9c56'
2017-06-27 18:14:26,034 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (org.ovirt.thread.pool-6-thread-41) [537c0fab] BaseAsyncTask::removeTaskFromDB: Removed task 'c8498212-6567-4c66-912e-459e77fb9c56' from DataBase

The above looks good to me. But 2 days later:

2017-06-29 19:05:11,479 WARN  [org.ovirt.engine.core.bll.tasks.CommandsRepository] (DefaultQuartzScheduler3) [] Marking expired command as Failed: command 'RemoveSnapshotSingleDisk (dc957381-fd6d-41ed-bf77-ea761a5970a4)' that started at 'Tue Jun 27 17:05:11 JST 2017' has been marked as Failed.
2017-06-29 19:05:12,498 WARN  [org.ovirt.engine.core.bll.tasks.CommandsRepository] (DefaultQuartzScheduler9) [] Marking expired command as Failed: command 'RemoveSnapshotSingleDisk (ae4fe47c-dd40-4a4d-82a9-e7248014fd68)' that started at 'Tue Jun 27 17:05:12 JST 2017' has been marked as Failed.

There is no engine restart between June 26th and June 29th. The first restart is on July 26th, when the customer wanted to change an engine-config parameter. I don't know why the entries below were still in the DB if the snapshot command succeeded.

engine=> select command_id,parent_command_id,command_params_class,status,executed from command_entities ;
-[ RECORD 1 ]--------+-----------------------------------------------------------------------
command_id           | dc957381-fd6d-41ed-bf77-ea761a5970a4
parent_command_id    | de74fa43-2210-4cb8-811f-5b76b23bf107
command_params_class | org.ovirt.engine.core.common.action.RemoveSnapshotSingleDiskParameters
status               | FAILED
executed             | t
-[ RECORD 2 ]--------+-----------------------------------------------------------------------
command_id           | ae4fe47c-dd40-4a4d-82a9-e7248014fd68
parent_command_id    | de74fa43-2210-4cb8-811f-5b76b23bf107
command_params_class | org.ovirt.engine.core.common.action.RemoveSnapshotSingleDiskParameters
status               | FAILED
executed             | t
-[ RECORD 3 ]--------+-----------------------------------------------------------------------
command_id           | de74fa43-2210-4cb8-811f-5b76b23bf107
parent_command_id    | 00000000-0000-0000-0000-000000000000
command_params_class | org.ovirt.engine.core.common.action.RemoveSnapshotParameters
status               | ENDED_SUCCESSFULLY
executed             | t

And these entries make the engine fail to start as there is no callback for RemoveSnapshotCommand:

2017-07-28 13:29:38,026 ERROR [org.ovirt.engine.core.bll.InitBackendServicesOnStartupBean] (ServerService Thread Pool -- 50) [537c0fab] Failed to initialize backend: org.jboss.weld.exceptions.WeldException: WELD-000049: Unable to invoke private void org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.init() on org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller@14522b5f
    [....]
Caused by: java.lang.NullPointerException
        at org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand.getCallback(RemoveSnapshotCommand.java:493) [bll.jar:]
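The linked gerrit patches (80351 on master and 80421 on ovirt-engine-4.1, "core: Verify that the VM exists when returning callback") point to a defensive check before the callback is returned. The sketch below only illustrates that idea under assumed names (getVm(), isLiveMergeCommand(), ConcurrentChildCommandsExecutionCallback); it is not the actual ovirt-engine source.

// Illustrative sketch of the fix idea, not the real ovirt-engine code.
// Method and class names here are assumptions based on the stack trace
// and the patch title.
@Override
public CommandCallback getCallback() {
    // Before the fix, stale command_entities rows made CommandCallbacksPoller.init()
    // request the callback of a RemoveSnapshot command whose VM was already gone;
    // dereferencing the missing VM threw a NullPointerException that aborted
    // backend initialization.
    if (getVm() == null) {
        // VM no longer exists: return no callback so engine startup can proceed.
        return null;
    }
    return isLiveMergeCommand() ? new ConcurrentChildCommandsExecutionCallback() : null;
}

With a guard like this in place, the poller simply skips the orphaned commands instead of failing, which matches the reported behavior that the engine starts once the fix is applied.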

Version-Release number of selected component (if applicable):
ovirt-engine-4.0.7

(Originally by Germano Veit Michel)

Comment 3 rhev-integ 2017-08-14 05:34:54 UTC
Is this on track for 4.1.5?

(Originally by Yaniv Kaul)

Comment 4 rhev-integ 2017-08-14 05:34:59 UTC
(In reply to Yaniv Kaul from comment #2)
> Is this on track for 4.1.5?

I am looking into this now. Will update more once I have additional data.

(Originally by Ala Hino)

Comment 5 rhev-integ 2017-08-14 05:35:04 UTC
(In reply to Ala Hino from comment #3)
> (In reply to Yaniv Kaul from comment #2)
> > Is this on track for 4.1.5?
> 
> I am looking into this now. Will update more once I have additional data.

Thanks, keeping NEEDINFO for the time being.

(Originally by Yaniv Kaul)

Comment 6 rhev-integ 2017-08-14 05:35:09 UTC
Cancelling the needinfo flag as this bug is MODIFIED now.

(Originally by Ala Hino)

Comment 7 rhev-integ 2017-08-14 05:35:14 UTC
Ala,
What are the steps to reproduce this?

(Originally by Raz Tamir)

Comment 8 rhev-integ 2017-08-14 05:35:19 UTC
This one is a bit tricky to reproduce, but maybe you can help.
In order to reproduce, we need a situation where a live merge step is ACTIVE and the VM has been removed. Afterwards, restart the engine. Without the fix we would get an NPE; with the fix the engine should start.

(Originally by Ala Hino)

Comment 9 rhev-integ 2017-08-14 05:35:24 UTC
WARN: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Found clone flags: ['rhevm-4.1.z', 'rhevm-4.2-ga'], ]

For more info please contact: rhv-devops

(Originally by rhev-integ)

Comment 11 Kevin Alon Goldblatt 2017-08-16 09:58:59 UTC
This scenario cannot be reproduced; based on Comment 8, closing as CURRENTRELEASE.

Comment 14 Ala Hino 2017-08-22 09:15:35 UTC
Cancelling the needinfo flag as this bug is CLOSED now.

