Bug 1477042 - engine does not start due to trying to call non-existent callback.
Summary: engine does not start due to trying to call non-existent callback.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.0.7
Hardware: x86_64
OS: Linux
unspecified
high
Target Milestone: ovirt-4.2.0
: ---
Assignee: Ala Hino
QA Contact: Kevin Alon Goldblatt
URL:
Whiteboard:
Depends On:
Blocks: 1481098
TreeView+ depends on / blocked
 
Reported: 2017-08-01 06:26 UTC by Germano Veit Michel
Modified: 2020-09-10 11:05 UTC (History)
11 users (show)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
undefined
Clone Of:
: 1481098 (view as bug list)
Environment:
Last Closed: 2018-05-15 17:43:21 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 3133381 0 None None None 2017-08-01 06:42:40 UTC
Red Hat Product Errata RHEA-2018:1488 0 None None None 2018-05-15 17:45:04 UTC
oVirt gerrit 80351 0 master MERGED core: Verify that the VM exists when returning callback 2017-08-08 19:57:47 UTC
oVirt gerrit 80421 0 ovirt-engine-4.1 MERGED core: Verify that the VM exists when returning callback 2017-08-09 07:20:35 UTC

Description Germano Veit Michel 2017-08-01 06:26:36 UTC
Description of problem:

Engine won't start, environment is down.

Apparently a successful snapshot removal, but there were entries in the Database triggering a callback for a command that doesn't exist, so the engine fails to start.

The snapshot:

2017-06-27 17:05:11,157 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.MergeSnapshotsVDSCommand] (default task-48) [231f337f] START, MergeSnapshotsVDSCommand( MergeSnapshotsVDSCommandParameters:{runAsync='true', storagePoolId='7fa2b154-e990-4f7d-bf4d-de30026ccedf', ignoreFailoverLimit='false', storageDomainId='cb2f38db-5560-4762-b700-8e13c55a1e30', imageGroupId='ee7838f0-263e-477b-ac30-157472812b7b', imageId='7abcc25c-a459-4452-be5c-5369284a8d95', imageId2='2fac3f76-bf64-4073-a16a-98ce71958c51', vmId='9a254614-ce1a-40d3-9ca2-3955372653b1', postZero='false'}), log id: 3babecb0

2017-06-27 17:05:12,271 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (default task-48) [231f337f] CommandAsyncTask::Adding CommandMultiAsyncTasks object for command 'de74fa43-2210-4cb8-811f-5b76b23bf107'

2017-06-27 17:05:12,271 INFO  [org.ovirt.engine.core.bll.CommandMultiAsyncTasks] (default task-48) [231f337f] CommandMultiAsyncTasks::attachTask: Attaching task 'f0837f50-63b8-4dfd-9200-83a4a20a5a68' to command 'de74fa43-2210-4cb8-811f-5b76b23bf107'.
2017-06-27 17:05:13,156 INFO  [org.ovirt.engine.core.bll.CommandMultiAsyncTasks] (default task-48) [3a647aaa] CommandMultiAsyncTasks::attachTask: Attaching task 'c8498212-6567-4c66-912e-459e77fb9c56' to command 'de74fa43-2210-4cb8-811f-5b76b23bf107'.

2017-06-27 17:19:53,306 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartzScheduler4) [3aea9bbc] SPMAsyncTask::PollTask: Polling task 'f0837f50-63b8-4dfd-9200-83a4a20a5a68' (Parent Command 'RemoveSnapshot', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') returned status 'finished', result 'success'.
2017-06-27 18:14:24,065 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartzScheduler8) [18d5ac84] SPMAsyncTask::PollTask: Polling task 'c8498212-6567-4c66-912e-459e77fb9c56' (Parent Command 'RemoveSnapshot', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') returned status 'finished', result 'success'.

2017-06-27 18:14:24,069 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (DefaultQuartzScheduler8) [18d5ac84] CommandAsyncTask::endActionIfNecessary: All tasks of command 'de74fa43-2210-4cb8-811f-5b76b23bf107' has ended -> executing 'endAction'

2017-06-27 18:14:24,361 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-41) [537c0fab] Correlation ID: 537c0fab, Call Stack: null, Custom Event ID: -1, Message: Snapshot 'XYZ' deletion for VM 'XYZ' has been completed.

2017-06-27 18:14:24,362 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (org.ovirt.thread.pool-6-thread-41) [537c0fab] SPMAsyncTask::ClearAsyncTask: Attempting to clear task 'f0837f50-63b8-4dfd-9200-83a4a20a5a68'
2017-06-27 18:14:25,390 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (org.ovirt.thread.pool-6-thread-41) [537c0fab] BaseAsyncTask::removeTaskFromDB: Removed task 'f0837f50-63b8-4dfd-9200-83a4a20a5a68' from DataBase

2017-06-27 18:14:25,391 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (org.ovirt.thread.pool-6-thread-41) [537c0fab] SPMAsyncTask::ClearAsyncTask: Attempting to clear task 'c8498212-6567-4c66-912e-459e77fb9c56'
2017-06-27 18:14:26,034 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (org.ovirt.thread.pool-6-thread-41) [537c0fab] BaseAsyncTask::removeTaskFromDB: Removed task 'c8498212-6567-4c66-912e-459e77fb9c56' from DataBase

The above looks good to me. But 2 days later:

2017-06-29 19:05:11,479 WARN  [org.ovirt.engine.core.bll.tasks.CommandsRepository] (DefaultQuartzScheduler3) [] Marking expired command as Failed: command 'RemoveSnapshotSingleDisk (dc957381-fd6d-41ed-bf77-ea761a5970a4)' that started at 'Tue Jun 27 17:05:11 JST 2017' has been marked as Failed.
2017-06-29 19:05:12,498 WARN  [org.ovirt.engine.core.bll.tasks.CommandsRepository] (DefaultQuartzScheduler9) [] Marking expired command as Failed: command 'RemoveSnapshotSingleDisk (ae4fe47c-dd40-4a4d-82a9-e7248014fd68)' that started at 'Tue Jun 27 17:05:12 JST 2017' has been marked as Failed.

There is no engine-restart between June 26th and June 29th. The first restart is on July 26th, when the customer wanted to change a engine-config parameter. I don't know why these entries below were still in the DB if snapshot command succeded.

engine=> select command_id,parent_command_id,command_params_class,status,executed from command_entities ;
-[ RECORD 1 ]--------+-----------------------------------------------------------------------
command_id           | dc957381-fd6d-41ed-bf77-ea761a5970a4
parent_command_id    | de74fa43-2210-4cb8-811f-5b76b23bf107
command_params_class | org.ovirt.engine.core.common.action.RemoveSnapshotSingleDiskParameters
status               | FAILED
executed             | t
-[ RECORD 2 ]--------+-----------------------------------------------------------------------
command_id           | ae4fe47c-dd40-4a4d-82a9-e7248014fd68
parent_command_id    | de74fa43-2210-4cb8-811f-5b76b23bf107
command_params_class | org.ovirt.engine.core.common.action.RemoveSnapshotSingleDiskParameters
status               | FAILED
executed             | t
-[ RECORD 3 ]--------+-----------------------------------------------------------------------
command_id           | de74fa43-2210-4cb8-811f-5b76b23bf107
parent_command_id    | 00000000-0000-0000-0000-000000000000
command_params_class | org.ovirt.engine.core.common.action.RemoveSnapshotParameters
status               | ENDED_SUCCESSFULLY
executed             | t

And these entries make the engine fail to start as there is no callback for RemoveSnapshotCommand:

2017-07-28 13:29:38,026 ERROR [org.ovirt.engine.core.bll.InitBackendServicesOnStartupBean] (ServerService Thread Pool -- 50) [537c0fab] Failed to initialize backend: org.jboss.weld.exceptions.WeldException: WELD-000049: Unable to invoke private void org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.init() on org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller@14522b5f
    [....]
Caused by: java.lang.NullPointerException
        at org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand.getCallback(RemoveSnapshotCommand.java:493) [bll.jar:]

Version-Release number of selected component (if applicable):
ovirt-engine-4.0.7

Comment 2 Yaniv Kaul 2017-08-06 09:21:36 UTC
Is this on track for 4.1.5?

Comment 3 Ala Hino 2017-08-06 11:00:25 UTC
(In reply to Yaniv Kaul from comment #2)
> Is this on track for 4.1.5?

I am looking into this now. Will update more once I have additional data.

Comment 4 Yaniv Kaul 2017-08-06 11:43:36 UTC
(In reply to Ala Hino from comment #3)
> (In reply to Yaniv Kaul from comment #2)
> > Is this on track for 4.1.5?
> 
> I am looking into this now. Will update more once I have additional data.

Thanks, keeping NEEDINFO for the time being.

Comment 5 Ala Hino 2017-08-09 11:37:27 UTC
Cancelling the needinfo flag as this bug is MODIFIED now.

Comment 6 Raz Tamir 2017-08-10 09:06:05 UTC
Ala,
What are the steps to reproduce this?

Comment 7 Ala Hino 2017-08-10 09:09:40 UTC
This one is a bit tricky to reproduce but maybe you can help.
In order to reproduce, we need a situation where live merge step is ACTIVE and we were able to remove the VM. Afterwards, restart the engine. Without the fix, we would get a NPE, with the fix the engine should start.

Comment 8 rhev-integ 2017-08-10 17:53:05 UTC
WARN: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Found clone flags: ['rhevm-4.1.z', 'rhevm-4.2-ga'], ]

For more info please contact: rhv-devops: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Found clone flags: ['rhevm-4.1.z', 'rhevm-4.2-ga'], ]

For more info please contact: rhv-devops

Comment 13 Kevin Alon Goldblatt 2017-11-28 15:50:00 UTC
Based on comment 7, since we can't manipulate a vm e.g. delete it, during a live merge operation we have no way to reproduce this issue. Moving to VERIFIED

Comment 16 errata-xmlrpc 2018-05-15 17:43:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:1488

Comment 17 Franta Kust 2019-05-16 13:08:20 UTC
BZ<2>Jira Resync


Note You need to log in before you can comment on or make changes to this bug.