Bug 1076246

Summary: RHEV 3.3 - Live Migration fails with ERROR: insert or update on table "step" violates foreign key constraint "fk_step_job"
Product: Red Hat Enterprise Virtualization Manager Reporter: Robert McSwain <rmcswain>
Component: ovirt-engineAssignee: Liran Zelkha <lzelkha>
Status: CLOSED ERRATA QA Contact: Tareq Alayan <talayan>
Severity: urgent Docs Contact:
Priority: high    
Version: 3.3.0CC: aberezin, acathrow, adahms, amureini, bazulay, dossow, iheim, lpeer, lzelkha, michal.skrivanek, mkalinin, ofrenkel, oourfali, pablo.iranzo, pstehlik, Rhev-m-bugs, yeylon
Target Milestone: ---Keywords: ZStream
Target Release: 3.4.0   
Hardware: x86_64   
OS: Linux   
Whiteboard: infra
Fixed In Version: org.ovirt.engine-root-3.3.0-51 Doc Type: Bug Fix
Doc Text:
Previously, jobs that took a long time to complete would fail under certain conditions due to database errors. This was caused by the logic used to handle jobs, which would remove jobs with no steps or asynchronous tasks. This logic has now been revised so that such jobs are no longer removed, allowing jobs that take a long time to complete to be processed correctly.
Story Points: ---
Clone Of:
: 1083423 (view as bug list) Environment:
Last Closed: 2014-06-09 15:05:24 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Infra RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1083423, 1090946    

Description Robert McSwain 2014-03-13 21:51:22 UTC
Description of problem:
After upgrading to RHEV-M 3.3 from 3.2, live migrating Windows 7 x86_64 VMs to update the RHEV-H nodes with the latest RHEV-H images fails. After patching one hypervisor to the 20140217 build not all VMs can migrate. 

From the engine log:
2014-03-11 12:22:16,542 INFO  [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-33) RefreshVmList vm id 7937173d-14fb-42cf-9002-1f9167806fd3 is migrating to vds usorla7hp204x.ww007.siemens.net ignoring it in the refresh until migration is done
2014-03-11 12:22:20,104 INFO  [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-87) VM usorla7vq201w 7937173d-14fb-42cf-9002-1f9167806fd3 moved from MigratingFrom --> Up
2014-03-11 12:22:20,105 INFO  [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-87) Adding VM 7937173d-14fb-42cf-9002-1f9167806fd3 to re-run list
2014-03-11 12:22:20,109 ERROR [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-87) Rerun vm 7937173d-14fb-42cf-9002-1f9167806fd3. Called from vds usorla7hp212x.ww007.siemens.net
2014-03-11 12:22:20,122 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand] (pool-4-thread-48) START, MigrateStatusVDSCommand(HostName = usorla7hp212x.ww007.siemens.net, HostId = 7ddd913e-adcf-11e2-99b9-0017a477ec28, vmId=7937173d-14fb-42cf-9002-1f9167806fd3), log id: 286cea39
2014-03-11 12:22:20,133 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand] (pool-4-thread-48) Failed in MigrateStatusVDS method
2014-03-11 12:22:20,133 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand] (pool-4-thread-48) Error code MIGRATION_CANCEL_ERROR and error message VDSGenericException: VDSErrorException: Failed to MigrateStatusVDS, error = Migration canceled
2014-03-11 12:22:20,133 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand] (pool-4-thread-48) Command org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand return value 
 StatusOnlyReturnForXmlRpc [mStatus=StatusForXmlRpc [mCode=47, mMessage=Migration canceled]]
2014-03-11 12:22:20,133 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand] (pool-4-thread-48) HostName = usorla7hp212x.ww007.siemens.net
2014-03-11 12:22:20,133 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand] (pool-4-thread-48) Command MigrateStatusVDS execution failed. Exception: VDSErrorException: VDSGenericException: VDSErrorException: Failed to MigrateStatusVDS, error = Migration canceled
2014-03-11 12:22:20,133 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand] (pool-4-thread-48) FINISH, MigrateStatusVDSCommand, log id: 286cea39
2014-03-11 12:22:20,136 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (pool-4-thread-48) Correlation ID: 3fd0dcfb, Job ID: a9b2e45d-1bf0-4748-b048-fa5662bfe88b, Call Stack: null, Custom Event ID: -1, Message: Migration failed due to Error: Migration not in progress. Trying to migrate to another Host (VM: usorla7vq201w, Source: usorla7hp212x.ww007.siemens.net, Destination: usorla7hp204x.ww007.siemens.net).
2014-03-11 12:22:20,141 ERROR [org.ovirt.engine.core.bll.job.JobRepositoryImpl] (pool-4-thread-48) Failed to save step f7aac82d-4050-4150-b486-620375d7c435, VALIDATING.: org.springframework.dao.DataIntegrityViolationException: CallableStatementCallback; SQL [{call insertstep(?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)}]; ERROR: insert or update on table "step" violates foreign key constraint "fk_step_job"
  Detail: Key (job_id)=(a9b2e45d-1bf0-4748-b048-fa5662bfe88b) is not present in table "job".
  Where: SQL statement "INSERT INTO step( step_id, parent_step_id, job_id, step_type, description, step_number, status, start_time, end_time, correlation_id, external_id, external_system_type, is_external) VALUES (  $1 ,  $2 ,  $3 ,  $4 ,  $5 ,  $6 ,  $7 ,  $8 ,  $9 ,  $10 ,  $11 ,  $12 ,  $13 )"
PL/pgSQL function "insertstep" line 2 at SQL statement; nested exception is org.postgresql.util.PSQLException: ERROR: insert or update on table "step" violates foreign key constraint "fk_step_job"
  Detail: Key (job_id)=(a9b2e45d-1bf0-4748-b048-fa5662bfe88b) is not present in table "job".
  Where: SQL statement "INSERT INTO step( step_id, parent_step_id, job_id, step_type, description, step_number, status, start_time, end_time, correlation_id, external_id, external_system_type, is_external) VALUES (  $1 ,  $2 ,  $3 ,  $4 ,  $5 ,  $6 ,  $7 ,  $8 ,  $9 ,  $10 ,  $11 ,  $12 ,  $13 )"
PL/pgSQL function "insertstep" line 2 at SQL statement
        at org.springframework.jdbc.support.SQLErrorCodeSQLExceptionTranslator.doTranslate(SQLErrorCodeSQLExceptionTranslator.java:245) [spring-jdbc.jar:3.1.1.RELEASE]
        at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:72) [spring-jdbc.jar:3.1.1.RELEASE]
        at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:1030) [spring-jdbc.jar:3.1.1.RELEASE]
        at org.springframework.jdbc.core.JdbcTemplate.call(JdbcTemplate.java:1064) [spring-jdbc.jar:3.1.1.RELEASE]
        at org.springframework.jdbc.core.simple.AbstractJdbcCall.executeCallInternal(AbstractJdbcCall.java:388) [spring-jdbc.jar:3.1.1.RELEASE]
        at org.springframework.jdbc.core.simple.AbstractJdbcCall.doExecute(AbstractJdbcCall.java:351) [spring-jdbc.jar:3.1.1.RELEASE]
        at org.springframework.jdbc.core.simple.SimpleJdbcCall.execute(SimpleJdbcCall.java:181) [spring-jdbc.jar:3.1.1.RELEASE]
        at org.ovirt.engine.core.dal.dbbroker.SimpleJdbcCallsHandler.executeImpl(SimpleJdbcCallsHandler.java:147) [dal.jar:]
        at org.ovirt.engine.core.dal.dbbroker.SimpleJdbcCallsHandler.executeModification(SimpleJdbcCallsHandler.java:74) [dal.jar:]
        at org.ovirt.engine.core.dao.DefaultGenericDaoDbFacade.save(DefaultGenericDaoDbFacade.java:93) [dal.jar:]
        at org.ovirt.engine.core.bll.job.JobRepositoryImpl$1.runInTransaction(JobRepositoryImpl.java:54) [bll.jar:]
        at org.ovirt.engine.core.bll.job.JobRepositoryImpl$1.runInTransaction(JobRepositoryImpl.java:48) [bll.jar:]
        at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInNewTransaction(TransactionSupport.java:210) [utils.jar:]
        at org.ovirt.engine.core.bll.job.JobRepositoryImpl.saveStep(JobRepositoryImpl.java:48) [bll.jar:]
        at org.ovirt.engine.core.bll.job.ExecutionHandler.addStep(ExecutionHandler.java:289) [bll.jar:]
        at org.ovirt.engine.core.bll.job.ExecutionHandler.addStep(ExecutionHandler.java:253) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:352) [bll.jar:]
        at org.ovirt.engine.core.bll.RunVmCommandBase.rerun(RunVmCommandBase.java:119) [bll.jar:]
        at org.ovirt.engine.core.bll.MigrateVmCommand.rerun(MigrateVmCommand.java:347) [bll.jar:]
        at org.ovirt.engine.core.bll.VdsEventListener$3.run(VdsEventListener.java:343) [bll.jar:]
        at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:71) [utils.jar:]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [rt.jar:1.7.0_25]
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) [rt.jar:1.7.0_25]
        at java.util.concurrent.FutureTask.run(FutureTask.java:166) [rt.jar:1.7.0_25]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [rt.jar:1.7.0_25]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [rt.jar:1.7.0_25]
        at java.lang.Thread.run(Thread.java:724) [rt.jar:1.7.0_25]
Caused by: org.postgresql.util.PSQLException: ERROR: insert or update on table "step" violates foreign key constraint "fk_step_job"
  Detail: Key (job_id)=(a9b2e45d-1bf0-4748-b048-fa5662bfe88b) is not present in table "job".
  Where: SQL statement "INSERT INTO step( step_id, parent_step_id, job_id, step_type, description, step_number, status, start_time, end_time, correlation_id, external_id, external_system_type, is_external) VALUES (  $1 ,  $2 ,  $3 ,  $4 ,  $5 ,  $6 ,  $7 ,  $8 ,  $9 ,  $10 ,  $11 ,  $12 ,  $13 )"
PL/pgSQL function "insertstep" line 2 at SQL statement
        at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2101)
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1834)
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:255)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:510)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:386)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:379)
        at org.jboss.jca.adapters.jdbc.CachedPreparedStatement.execute(CachedPreparedStatement.java:297)
        at org.jboss.jca.adapters.jdbc.WrappedPreparedStatement.execute(WrappedPreparedStatement.java:404)
        at org.springframework.jdbc.core.JdbcTemplate$6.doInCallableStatement(JdbcTemplate.java:1066) [spring-jdbc.jar:3.1.1.RELEASE]
        at org.springframework.jdbc.core.JdbcTemplate$6.doInCallableStatement(JdbcTemplate.java:1) [spring-jdbc.jar:3.1.1.RELEASE]
        at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:1014) [spring-jdbc.jar:3.1.1.RELEASE]
        ... 24 more

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 2 Michal Skrivanek 2014-03-25 08:49:35 UTC
as per initial investigation the error happens after the migration fails. It's not the reason for it. Flagging for 3.4 for now

Comment 3 Arik 2014-03-30 12:31:19 UTC
The bug reproduced on u/s: the migration job was removed before the rerun procedure managed to add a step for the next migration attempt so we got the same exception. It seems to be related to the additional section that was added to the stored procedure 'DeleteCompletedJobsOlderThanDate' as part of the solution for https://bugzilla.redhat.com/show_bug.cgi?id=1055162: it removes the job as MigrateVmCommand doesn't have async tasks.
The bug occurred on rhevm 3.3.1 which includes this fix, so we're suspecting that the same thing happened in the reported scenario.

Comment 6 Tareq Alayan 2014-05-08 12:03:56 UTC
couldn't reproduce.
Did migrate a windows guest succesfuly ..
rhevm-3.4.0-0.16.rc.el6ev.noarch

Comment 7 errata-xmlrpc 2014-06-09 15:05:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2014-0506.html