Bug 1477042 - engine does not start due to trying to call non-existent callback.
engine does not start due to trying to call non-existent callback.
Status: MODIFIED
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine (Show other bugs)
4.0.7
x86_64 Linux
unspecified Severity high
: ovirt-4.2.0
: ---
Assigned To: Ala Hino
Kevin Alon Goldblatt
: ZStream
Depends On:
Blocks: 1481098
  Show dependency treegraph
 
Reported: 2017-08-01 02:26 EDT by Germano Veit Michel
Modified: 2017-09-17 05:25 EDT (History)
12 users (show)

See Also:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
: 1481098 (view as bug list)
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 3133381 None None None 2017-08-01 02:42 EDT
oVirt gerrit 80351 master MERGED core: Verify that the VM exists when returning callback 2017-08-08 15:57 EDT
oVirt gerrit 80421 ovirt-engine-4.1 MERGED core: Verify that the VM exists when returning callback 2017-08-09 03:20 EDT

  None (edit)
Description Germano Veit Michel 2017-08-01 02:26:36 EDT
Description of problem:

Engine won't start, environment is down.

Apparently a successful snapshot removal, but there were entries in the Database triggering a callback for a command that doesn't exist, so the engine fails to start.

The snapshot:

2017-06-27 17:05:11,157 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.MergeSnapshotsVDSCommand] (default task-48) [231f337f] START, MergeSnapshotsVDSCommand( MergeSnapshotsVDSCommandParameters:{runAsync='true', storagePoolId='7fa2b154-e990-4f7d-bf4d-de30026ccedf', ignoreFailoverLimit='false', storageDomainId='cb2f38db-5560-4762-b700-8e13c55a1e30', imageGroupId='ee7838f0-263e-477b-ac30-157472812b7b', imageId='7abcc25c-a459-4452-be5c-5369284a8d95', imageId2='2fac3f76-bf64-4073-a16a-98ce71958c51', vmId='9a254614-ce1a-40d3-9ca2-3955372653b1', postZero='false'}), log id: 3babecb0

2017-06-27 17:05:12,271 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (default task-48) [231f337f] CommandAsyncTask::Adding CommandMultiAsyncTasks object for command 'de74fa43-2210-4cb8-811f-5b76b23bf107'

2017-06-27 17:05:12,271 INFO  [org.ovirt.engine.core.bll.CommandMultiAsyncTasks] (default task-48) [231f337f] CommandMultiAsyncTasks::attachTask: Attaching task 'f0837f50-63b8-4dfd-9200-83a4a20a5a68' to command 'de74fa43-2210-4cb8-811f-5b76b23bf107'.
2017-06-27 17:05:13,156 INFO  [org.ovirt.engine.core.bll.CommandMultiAsyncTasks] (default task-48) [3a647aaa] CommandMultiAsyncTasks::attachTask: Attaching task 'c8498212-6567-4c66-912e-459e77fb9c56' to command 'de74fa43-2210-4cb8-811f-5b76b23bf107'.

2017-06-27 17:19:53,306 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartzScheduler4) [3aea9bbc] SPMAsyncTask::PollTask: Polling task 'f0837f50-63b8-4dfd-9200-83a4a20a5a68' (Parent Command 'RemoveSnapshot', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') returned status 'finished', result 'success'.
2017-06-27 18:14:24,065 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartzScheduler8) [18d5ac84] SPMAsyncTask::PollTask: Polling task 'c8498212-6567-4c66-912e-459e77fb9c56' (Parent Command 'RemoveSnapshot', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') returned status 'finished', result 'success'.

2017-06-27 18:14:24,069 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (DefaultQuartzScheduler8) [18d5ac84] CommandAsyncTask::endActionIfNecessary: All tasks of command 'de74fa43-2210-4cb8-811f-5b76b23bf107' has ended -> executing 'endAction'

2017-06-27 18:14:24,361 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-41) [537c0fab] Correlation ID: 537c0fab, Call Stack: null, Custom Event ID: -1, Message: Snapshot 'XYZ' deletion for VM 'XYZ' has been completed.

2017-06-27 18:14:24,362 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (org.ovirt.thread.pool-6-thread-41) [537c0fab] SPMAsyncTask::ClearAsyncTask: Attempting to clear task 'f0837f50-63b8-4dfd-9200-83a4a20a5a68'
2017-06-27 18:14:25,390 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (org.ovirt.thread.pool-6-thread-41) [537c0fab] BaseAsyncTask::removeTaskFromDB: Removed task 'f0837f50-63b8-4dfd-9200-83a4a20a5a68' from DataBase

2017-06-27 18:14:25,391 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (org.ovirt.thread.pool-6-thread-41) [537c0fab] SPMAsyncTask::ClearAsyncTask: Attempting to clear task 'c8498212-6567-4c66-912e-459e77fb9c56'
2017-06-27 18:14:26,034 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (org.ovirt.thread.pool-6-thread-41) [537c0fab] BaseAsyncTask::removeTaskFromDB: Removed task 'c8498212-6567-4c66-912e-459e77fb9c56' from DataBase

The above looks good to me. But 2 days later:

2017-06-29 19:05:11,479 WARN  [org.ovirt.engine.core.bll.tasks.CommandsRepository] (DefaultQuartzScheduler3) [] Marking expired command as Failed: command 'RemoveSnapshotSingleDisk (dc957381-fd6d-41ed-bf77-ea761a5970a4)' that started at 'Tue Jun 27 17:05:11 JST 2017' has been marked as Failed.
2017-06-29 19:05:12,498 WARN  [org.ovirt.engine.core.bll.tasks.CommandsRepository] (DefaultQuartzScheduler9) [] Marking expired command as Failed: command 'RemoveSnapshotSingleDisk (ae4fe47c-dd40-4a4d-82a9-e7248014fd68)' that started at 'Tue Jun 27 17:05:12 JST 2017' has been marked as Failed.

There is no engine-restart between June 26th and June 29th. The first restart is on July 26th, when the customer wanted to change a engine-config parameter. I don't know why these entries below were still in the DB if snapshot command succeded.

engine=> select command_id,parent_command_id,command_params_class,status,executed from command_entities ;
-[ RECORD 1 ]--------+-----------------------------------------------------------------------
command_id           | dc957381-fd6d-41ed-bf77-ea761a5970a4
parent_command_id    | de74fa43-2210-4cb8-811f-5b76b23bf107
command_params_class | org.ovirt.engine.core.common.action.RemoveSnapshotSingleDiskParameters
status               | FAILED
executed             | t
-[ RECORD 2 ]--------+-----------------------------------------------------------------------
command_id           | ae4fe47c-dd40-4a4d-82a9-e7248014fd68
parent_command_id    | de74fa43-2210-4cb8-811f-5b76b23bf107
command_params_class | org.ovirt.engine.core.common.action.RemoveSnapshotSingleDiskParameters
status               | FAILED
executed             | t
-[ RECORD 3 ]--------+-----------------------------------------------------------------------
command_id           | de74fa43-2210-4cb8-811f-5b76b23bf107
parent_command_id    | 00000000-0000-0000-0000-000000000000
command_params_class | org.ovirt.engine.core.common.action.RemoveSnapshotParameters
status               | ENDED_SUCCESSFULLY
executed             | t

And these entries make the engine fail to start as there is no callback for RemoveSnapshotCommand:

2017-07-28 13:29:38,026 ERROR [org.ovirt.engine.core.bll.InitBackendServicesOnStartupBean] (ServerService Thread Pool -- 50) [537c0fab] Failed to initialize backend: org.jboss.weld.exceptions.WeldException: WELD-000049: Unable to invoke private void org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.init() on org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller@14522b5f
    [....]
Caused by: java.lang.NullPointerException
        at org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand.getCallback(RemoveSnapshotCommand.java:493) [bll.jar:]

Version-Release number of selected component (if applicable):
ovirt-engine-4.0.7
Comment 2 Yaniv Kaul 2017-08-06 05:21:36 EDT
Is this on track for 4.1.5?
Comment 3 Ala Hino 2017-08-06 07:00:25 EDT
(In reply to Yaniv Kaul from comment #2)
> Is this on track for 4.1.5?

I am looking into this now. Will update more once I have additional data.
Comment 4 Yaniv Kaul 2017-08-06 07:43:36 EDT
(In reply to Ala Hino from comment #3)
> (In reply to Yaniv Kaul from comment #2)
> > Is this on track for 4.1.5?
> 
> I am looking into this now. Will update more once I have additional data.

Thanks, keeping NEEDINFO for the time being.
Comment 5 Ala Hino 2017-08-09 07:37:27 EDT
Cancelling the needinfo flag as this bug is MODIFIED now.
Comment 6 Raz Tamir 2017-08-10 05:06:05 EDT
Ala,
What are the steps to reproduce this?
Comment 7 Ala Hino 2017-08-10 05:09:40 EDT
This one is a bit tricky to reproduce but maybe you can help.
In order to reproduce, we need a situation where live merge step is ACTIVE and we were able to remove the VM. Afterwards, restart the engine. Without the fix, we would get a NPE, with the fix the engine should start.
Comment 8 rhev-integ 2017-08-10 13:53:05 EDT
WARN: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Found clone flags: ['rhevm-4.1.z', 'rhevm-4.2-ga'], ]

For more info please contact: rhv-devops@redhat.comINFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Found clone flags: ['rhevm-4.1.z', 'rhevm-4.2-ga'], ]

For more info please contact: rhv-devops@redhat.com

Note You need to log in before you can comment on or make changes to this bug.