Bug 893689

Summary: [engine-backend] unable to create floating disk - engine fails to clear task
Product: Red Hat Enterprise Virtualization Manager Reporter: Haim <hateya>
Component: ovirt-engineAssignee: Liron Aravot <laravot>
Status: CLOSED CURRENTRELEASE QA Contact: Haim <hateya>
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.2.0CC: amureini, dyasny, iheim, lnatapov, lpeer, mlipchuk, Rhev-m-bugs, sgrinber, yeylon, ykaul
Target Milestone: ---Keywords: Regression, TestBlocker
Target Release: 3.2.0   
Hardware: x86_64   
OS: Linux   
Whiteboard: storage
Fixed In Version: SF4 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
engine log.
none
vdsm.log none

Description Haim 2013-01-09 17:45:54 UTC
Description of problem:

unable to create system disk since engine fails to clear tasks from some unknown reason and hits a null-pointer exception.

repro steps goes as follows:

2013-01-09 19:28:57,337 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (QuartzScheduler_Worker-62) Failed in HSMGetAllTasksStatusesVDS method
2013-01-09 19:28:57,337 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (QuartzScheduler_Worker-62) Error code VolumeCreationError and error message VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = Error creating a new volume
2013-01-09 19:28:57,338 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-62) ERROR, org.ovirt.engine.core.vdsbroker.irsbroker.SPMGetAllTasksStatusesVDSCommand, exception: NullPointerException: , log id: 2f2f092: java.lang.NullPointerException
        at org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand$IrsProxyData.getIrsProxy(IrsBrokerCommand.java:509) [engine-vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand.getIrsProxy(IrsBrokerCommand.java:1498) [engine-vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand.ExecuteVDSCommand(IrsBrokerCommand.java:1506) [engine-vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.VDSCommandBase.ExecuteCommand(VDSCommandBase.java:63) [engine-vdsbroker.jar:]
        at org.ovirt.engine.core.dal.VdcCommandBase.Execute(VdcCommandBase.java:28) [engine-dal.jar:]
        at org.ovirt.engine.core.vdsbroker.ResourceManager.runVdsCommand(ResourceManager.java:362) [engine-vdsbroker.jar:]
        at org.ovirt.engine.core.bll.VDSBrokerFrontendImpl.RunVdsCommand(VDSBrokerFrontendImpl.java:33) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.AsyncTaskManager.getSPMsTasksStatuses(AsyncTaskManager.java:289) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.AsyncTaskManager.PollAndUpdateAsyncTasks(AsyncTaskManager.java:241) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.AsyncTaskManager._timer_Elapsed(AsyncTaskManager.java:93) [engine-bll.jar:]
        at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source) [:1.7.0_09-icedtea]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.7.0_09-icedtea]
        at java.lang.reflect.Method.invoke(Method.java:601) [rt.jar:1.7.0_09-icedtea]
        at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:60) [engine-scheduler.jar:]
        at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz-2.1.2.jar:]
        at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557) [quartz-2.1.2.jar:]

2013-01-09 19:28:57,340 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-62) SPMAsyncTask::PollTask: Polling task 854ed1c0-7dd3-46c1-aee5-1dcfabf9fa97 (Parent Command AddDisk, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) returned status finished, result 'success'.

Comment 1 Haim 2013-01-09 17:51:55 UTC
Created attachment 675747 [details]
engine log.

Comment 3 Haim 2013-01-09 17:54:23 UTC
Created attachment 675748 [details]
vdsm.log

Comment 4 Ayal Baron 2013-01-10 09:06:12 UTC
Haim, there are no reproduction steps here.
In addition, the log contains the getTaskStatus that shows that the task failed, but the task flow itself is not in the log:
Thread-55321::DEBUG::2013-01-09 17:30:45,530::taskManager::93::TaskManager::(getTaskStatus) Entry. taskID: fb7244f5-bb52-4ed3-b1d4-2750437fc2a0
Thread-55321::DEBUG::2013-01-09 17:30:45,530::taskManager::96::TaskManager::(getTaskStatus) Return. Response: {'code': 205, 'message': 'Error creating a new volume', 'taskState': 'finished', 'taskResult': 'cleanSuccess', 'taskID': 'fb7244f5-bb52-4ed3-b1d4-2750437fc2a0'}

The vdsm task to create the disk failed (not sure whether before or after lvcreate because I don't have the relevant log).
The task we're missing info on is: fb7244f5-bb52-4ed3-b1d4-2750437fc2a0

Note that there are many problems with the domain (checksum problems, missing directories in master fs, etc).

Comment 5 Maor 2013-01-10 11:11:08 UTC
From the log I suspect the NPE was caused since the storage pool related to the task is missing or does not exists in the setup.
Haim, is it possible to check that through the DB:
 SELECT action_parameters FROM async_Tasks where task_id = 'fb7244f5-bb52-4ed3-b1d4-2750437fc2a0'
see what is the storage pool id in the task
and then SELECT id FROM storage_pool;
to check if this storage pool exists.

Comment 6 Haim 2013-01-10 13:49:33 UTC
(In reply to comment #5)
> From the log I suspect the NPE was caused since the storage pool related to
> the task is missing or does not exists in the setup.
> Haim, is it possible to check that through the DB:
>  SELECT action_parameters FROM async_Tasks where task_id =
> 'fb7244f5-bb52-4ed3-b1d4-2750437fc2a0'
> see what is the storage pool id in the task
> and then SELECT id FROM storage_pool;
> to check if this storage pool exists.

happens consistently on 3.2 setup, pool does exists, the problem only occurs with floating disk, you are more then welcome to login and take a look.

Comment 7 Allon Mureinik 2013-01-16 10:23:44 UTC
devel-ack - correct sp id should be passed, and we should be able to create the disk.

Comment 8 Liron Aravot 2013-01-20 12:11:24 UTC
relevant log to the issue (add disk)

3-01-09 19:48:14,640 INFO  [org.ovirt.engine.core.utils.transaction.TransactionSupport] (pool-3-thread-47) transaction rolled back
2013-01-09 19:48:14,656 ERROR [org.ovirt.engine.core.bll.EntityAsyncTask] (pool-3-thread-47) EntityAsyncTask::EndCommandAction [within thread]: EndAction for action type Add
Disk threw an exception: java.lang.NullPointerException
        at org.ovirt.engine.core.bll.VmCommand.endVmCommand(VmCommand.java:185) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.VmCommand.endSuccessfully(VmCommand.java:222) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.internalEndSuccessfully(CommandBase.java:523) [engine-bll.jar:]
 

already pushed and merged:
http://gerrit.ovirt.org/#/c/10623/

haim- the log you pasted is related to another issue, there seems to be tasks for storage pool that has been deleted - moving this to modify, adding needinfo for Haim to let you know about this issue if its not a known bug already (you might have opened one for that already).

Comment 9 Leonid Natapov 2013-02-13 10:24:04 UTC
sf6. fixed. can create floating system disk.

Comment 10 Itamar Heim 2013-06-11 08:26:42 UTC
3.2 has been released

Comment 11 Itamar Heim 2013-06-11 08:26:47 UTC
3.2 has been released