Bug 1464348

Summary: SQL exception occur during HSMClearTaskVDSCommand
Product: [oVirt] ovirt-engine Reporter: Eyal Shenitzky <eshenitz>
Component: BLL.StorageAssignee: Fred Rolland <frolland>
Status: CLOSED CURRENTRELEASE QA Contact: Raz Tamir <ratamir>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.1.3.4CC: bugs, eshenitz, frolland, ratamir, stirabos
Target Milestone: ovirt-4.1.4Keywords: Automation, Regression
Target Release: 4.1.4Flags: rule-engine: ovirt-4.1+
rule-engine: blocker+
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-07-28 14:20:38 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
engine log
none
engine logs none

Description Eyal Shenitzky 2017-06-23 07:42:46 UTC
Description of problem:

An exception is thrown during HSMClearTaskVDSCommand:

Error in the engine - 

2017-06-21 09:30:17,609+03 ERROR [org.ovirt.engine.core.bll.job.JobRepositoryImpl] (org.ovirt.thread.pool-6-thread-38) [disks_syncAction_376ef93d-3f7a-4149] Failed to save step '846f016e-6a8f-4ecf-9a06-4f7375129b91', 'FINALIZING': CallableStatementCallback; SQL [{call insertstep(?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)}]; ERROR: insert or update on table "step" violates foreign key constraint "fk_step_job"
  Detail: Key (job_id)=(c6fc638d-34c2-4d62-9844-74dd2238e8a9) is not present in table "job".
  Where: SQL statement "INSERT INTO step (
        step_id,
        parent_step_id,
        job_id,
        step_type,
        description,
        step_number,
        status,
        progress,
        start_time,
        end_time,
        correlation_id,
        external_id,
        external_system_type,
        is_external
        )
    VALUES (
        v_step_id,
        v_parent_step_id,
        v_job_id,
        v_step_type,
        v_description,
        v_step_number,
        v_status,
        v_progress,
        v_start_time,
        v_end_time,
        v_correlation_id,
        v_external_id,
        v_external_system_type,
        v_is_external
        )"
PL/pgSQL function insertstep(uuid,uuid,uuid,character varying,text,integer,character varying,smallint,timestamp with time zone,timestamp with time zone,character varying,uuid,character varying,boolean) line 3 at SQL statement; nested exception is org.postgresql.util.PSQLException: ERROR: insert or update on table "step" violates foreign key constraint "fk_step_job"
  Detail: Key (job_id)=(c6fc638d-34c2-4d62-9844-74dd2238e8a9) is not present in table "job".
  Where: SQL statement "INSERT INTO step (
        step_id,
        parent_step_id,
        job_id,
        step_type,
        description,
        step_number,
        status,
        progress,
        start_time,
        end_time,
        correlation_id,
        external_id,
        external_system_type,
        is_external
        )
    VALUES (
        v_step_id,
        v_parent_step_id,
        v_job_id,
        v_step_type,
        v_description,
        v_step_number,
        v_status,
        v_progress,
        v_start_time,
        v_end_time,
        v_correlation_id,
        v_external_id,
        v_external_system_type,
        v_is_external
        )"
PL/pgSQL function insertstep(uuid,uuid,uuid,character varying,text,integer,character varying,smallint,timestamp with time zone,timestamp with time zone,character varying,uuid,character varying,boolean) line 3 at SQL statement
2017-06-21 09:30:17,610+03 INFO  [org.ovirt.engine.core.bll.storage.disk.image.RemoveImageCommand] (org.ovirt.thread.pool-6-thread-38) [disks_syncAction_376ef93d-3f7a-4149] Ending command 'org.ovirt.engine.core.bll.storage.disk.image.RemoveImageCommand' successfully.
2017-06-21 09:30:17,613+03 ERROR [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (org.ovirt.thread.pool-6-thread-38) [disks_syncAction_376ef93d-3f7a-4149] [within thread]: endAction for action type RemoveImage threw an exception.: java.lang.NullPointerException
	at org.ovirt.engine.core.bll.job.ExecutionHandler.endJob(ExecutionHandler.java:520) [bll.jar:]
	at org.ovirt.engine.core.bll.job.ExecutionHandler.endTaskJobIfNeeded(ExecutionHandler.java:851) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.endStepsAndJobIfNeeded(CommandBase.java:590) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.endAction(CommandBase.java:563) [bll.jar:]
	at org.ovirt.engine.core.bll.tasks.DecoratedCommand.endAction(DecoratedCommand.java:17) [bll.jar:]
	at org.ovirt.engine.core.bll.tasks.CoCoAsyncTaskHelper.endAction(CoCoAsyncTaskHelper.java:337) [bll.jar:]
	at org.ovirt.engine.core.bll.tasks.CommandCoordinatorImpl.endAction(CommandCoordinatorImpl.java:340) [bll.jar:]
	at org.ovirt.engine.core.bll.tasks.CommandAsyncTask.endCommandAction(CommandAsyncTask.java:154) [bll.jar:]
	at org.ovirt.engine.core.bll.tasks.CommandAsyncTask.lambda$endActionIfNecessary$0(CommandAsyncTask.java:106) [bll.jar:]
	at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:84) [utils.jar:]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_131]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_131]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_131]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_131]
	at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_131]

2017-06-21 09:30:17,614+03 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (org.ovirt.thread.pool-6-thread-38) [disks_syncAction_376ef93d-3f7a-4149] CommandAsyncTask::HandleEndActionResult: endAction for action type 'RemoveImage' threw an unrecoverable RuntimeException the task will be cleared.
2017-06-21 09:30:17,617+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (org.ovirt.thread.pool-6-thread-38) [disks_syncAction_376ef93d-3f7a-4149] BaseAsyncTask::removeTaskFromDB: Removed task '50e8f07a-26ff-4733-b836-f20f2fb34061' from DataBase
2017-06-21 09:30:17,617+03 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (org.ovirt.thread.pool-6-thread-38) [disks_syncAction_376ef93d-3f7a-4149] CommandAsyncTask::HandleEndActionRuntimeException: Removing CommandMultiAsyncTasks object for entity '5fe84ab4-d124-4fb9-8ec0-1db553c9ab8e'


Version-Release number of selected component (if applicable):
vdsm-4.19.19-1.el7ev.x86_64
ovirt-engine-4.1.3.4-0.1.el7.noarch

How reproducible:
?

Steps to Reproduce:
Occurred during automation run (failed to reproduced it localy)- 

1. In environment with template which has disk on 2 domains (copies)
2. Create a VM with disk from template as thin on the first domain
3. Enter the second domain to maintenance
4. Run the VM
5. Try to migrate the VM to the second domain. 

Actual results:
An exception was thrown

Expected results:
We should fail to migrate the disk

Additional info:
Engine log attached

Comment 1 Allon Mureinik 2017-06-25 08:47:07 UTC
Fred, didn't you recently handle something like this?

Comment 2 Red Hat Bugzilla Rules Engine 2017-06-25 08:47:12 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 3 Fred Rolland 2017-06-25 09:19:35 UTC
No, first time I see this issue.
Eyal, can you provide logs ?

Comment 4 Eyal Shenitzky 2017-06-25 11:23:49 UTC
The engine logs are attached

Comment 5 Fred Rolland 2017-06-25 12:05:14 UTC
(In reply to Eyal Shenitzky from comment #4)
> The engine logs are attached

It is marked as "deleted" ...

Comment 6 Eyal Shenitzky 2017-06-25 12:44:54 UTC
Created attachment 1291715 [details]
engine logs

Comment 7 Fred Rolland 2017-06-27 12:21:53 UTC
Eyal, can you provide the automation code for that scenario ?
Thanks

Comment 8 Eyal Shenitzky 2017-06-27 13:05:44 UTC
The automation code is not a simple python script,
It is a large scale project that I can't provide on this platform

Comment 10 Raz Tamir 2017-07-12 07:54:14 UTC
Tested on ovirt-engine-4.1.4-0.2.el7.noarch
All automation tests (mentioned in comment #9) passed