Bug 1076246 - RHEV 3.3 - Live Migration fails with ERROR: insert or update on table "step" violates foreign key constraint "fk_step_job"
Summary: RHEV 3.3 - Live Migration fails with ERROR: insert or update on table "step" ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.3.0
Hardware: x86_64
OS: Linux
high
urgent
Target Milestone: ---
: 3.4.0
Assignee: Liran Zelkha
QA Contact: Tareq Alayan
URL:
Whiteboard: infra
Depends On:
Blocks: 1083423 rhev3.4snapshot2
TreeView+ depends on / blocked
 
Reported: 2014-03-13 21:51 UTC by Robert McSwain
Modified: 2019-04-28 10:44 UTC (History)
17 users (show)

Fixed In Version: org.ovirt.engine-root-3.3.0-51
Doc Type: Bug Fix
Doc Text:
Previously, jobs that took a long time to complete would fail under certain conditions due to database errors. This was caused by the logic used to handle jobs, which would remove jobs with no steps or asynchronous tasks. This logic has now been revised so that such jobs are no longer removed, allowing jobs that take a long time to complete to be processed correctly.
Clone Of:
: 1083423 (view as bug list)
Environment:
Last Closed: 2014-06-09 15:05:24 UTC
oVirt Team: Infra
Target Upstream Version:


Attachments (Terms of Use)


Links
System ID Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 789403 None None None Never
Red Hat Product Errata RHSA-2014:0506 normal SHIPPED_LIVE Moderate: Red Hat Enterprise Virtualization Manager 3.4.0 update 2014-06-09 18:55:38 UTC
oVirt gerrit 26291 None None None Never
oVirt gerrit 26292 None None None Never
oVirt gerrit 26293 None None None Never

Description Robert McSwain 2014-03-13 21:51:22 UTC
Description of problem:
After upgrading to RHEV-M 3.3 from 3.2, live migrating Windows 7 x86_64 VMs to update the RHEV-H nodes with the latest RHEV-H images fails. After patching one hypervisor to the 20140217 build not all VMs can migrate. 

From the engine log:
2014-03-11 12:22:16,542 INFO  [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-33) RefreshVmList vm id 7937173d-14fb-42cf-9002-1f9167806fd3 is migrating to vds usorla7hp204x.ww007.siemens.net ignoring it in the refresh until migration is done
2014-03-11 12:22:20,104 INFO  [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-87) VM usorla7vq201w 7937173d-14fb-42cf-9002-1f9167806fd3 moved from MigratingFrom --> Up
2014-03-11 12:22:20,105 INFO  [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-87) Adding VM 7937173d-14fb-42cf-9002-1f9167806fd3 to re-run list
2014-03-11 12:22:20,109 ERROR [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-87) Rerun vm 7937173d-14fb-42cf-9002-1f9167806fd3. Called from vds usorla7hp212x.ww007.siemens.net
2014-03-11 12:22:20,122 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand] (pool-4-thread-48) START, MigrateStatusVDSCommand(HostName = usorla7hp212x.ww007.siemens.net, HostId = 7ddd913e-adcf-11e2-99b9-0017a477ec28, vmId=7937173d-14fb-42cf-9002-1f9167806fd3), log id: 286cea39
2014-03-11 12:22:20,133 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand] (pool-4-thread-48) Failed in MigrateStatusVDS method
2014-03-11 12:22:20,133 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand] (pool-4-thread-48) Error code MIGRATION_CANCEL_ERROR and error message VDSGenericException: VDSErrorException: Failed to MigrateStatusVDS, error = Migration canceled
2014-03-11 12:22:20,133 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand] (pool-4-thread-48) Command org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand return value 
 StatusOnlyReturnForXmlRpc [mStatus=StatusForXmlRpc [mCode=47, mMessage=Migration canceled]]
2014-03-11 12:22:20,133 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand] (pool-4-thread-48) HostName = usorla7hp212x.ww007.siemens.net
2014-03-11 12:22:20,133 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand] (pool-4-thread-48) Command MigrateStatusVDS execution failed. Exception: VDSErrorException: VDSGenericException: VDSErrorException: Failed to MigrateStatusVDS, error = Migration canceled
2014-03-11 12:22:20,133 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand] (pool-4-thread-48) FINISH, MigrateStatusVDSCommand, log id: 286cea39
2014-03-11 12:22:20,136 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (pool-4-thread-48) Correlation ID: 3fd0dcfb, Job ID: a9b2e45d-1bf0-4748-b048-fa5662bfe88b, Call Stack: null, Custom Event ID: -1, Message: Migration failed due to Error: Migration not in progress. Trying to migrate to another Host (VM: usorla7vq201w, Source: usorla7hp212x.ww007.siemens.net, Destination: usorla7hp204x.ww007.siemens.net).
2014-03-11 12:22:20,141 ERROR [org.ovirt.engine.core.bll.job.JobRepositoryImpl] (pool-4-thread-48) Failed to save step f7aac82d-4050-4150-b486-620375d7c435, VALIDATING.: org.springframework.dao.DataIntegrityViolationException: CallableStatementCallback; SQL [{call insertstep(?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)}]; ERROR: insert or update on table "step" violates foreign key constraint "fk_step_job"
  Detail: Key (job_id)=(a9b2e45d-1bf0-4748-b048-fa5662bfe88b) is not present in table "job".
  Where: SQL statement "INSERT INTO step( step_id, parent_step_id, job_id, step_type, description, step_number, status, start_time, end_time, correlation_id, external_id, external_system_type, is_external) VALUES (  $1 ,  $2 ,  $3 ,  $4 ,  $5 ,  $6 ,  $7 ,  $8 ,  $9 ,  $10 ,  $11 ,  $12 ,  $13 )"
PL/pgSQL function "insertstep" line 2 at SQL statement; nested exception is org.postgresql.util.PSQLException: ERROR: insert or update on table "step" violates foreign key constraint "fk_step_job"
  Detail: Key (job_id)=(a9b2e45d-1bf0-4748-b048-fa5662bfe88b) is not present in table "job".
  Where: SQL statement "INSERT INTO step( step_id, parent_step_id, job_id, step_type, description, step_number, status, start_time, end_time, correlation_id, external_id, external_system_type, is_external) VALUES (  $1 ,  $2 ,  $3 ,  $4 ,  $5 ,  $6 ,  $7 ,  $8 ,  $9 ,  $10 ,  $11 ,  $12 ,  $13 )"
PL/pgSQL function "insertstep" line 2 at SQL statement
        at org.springframework.jdbc.support.SQLErrorCodeSQLExceptionTranslator.doTranslate(SQLErrorCodeSQLExceptionTranslator.java:245) [spring-jdbc.jar:3.1.1.RELEASE]
        at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:72) [spring-jdbc.jar:3.1.1.RELEASE]
        at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:1030) [spring-jdbc.jar:3.1.1.RELEASE]
        at org.springframework.jdbc.core.JdbcTemplate.call(JdbcTemplate.java:1064) [spring-jdbc.jar:3.1.1.RELEASE]
        at org.springframework.jdbc.core.simple.AbstractJdbcCall.executeCallInternal(AbstractJdbcCall.java:388) [spring-jdbc.jar:3.1.1.RELEASE]
        at org.springframework.jdbc.core.simple.AbstractJdbcCall.doExecute(AbstractJdbcCall.java:351) [spring-jdbc.jar:3.1.1.RELEASE]
        at org.springframework.jdbc.core.simple.SimpleJdbcCall.execute(SimpleJdbcCall.java:181) [spring-jdbc.jar:3.1.1.RELEASE]
        at org.ovirt.engine.core.dal.dbbroker.SimpleJdbcCallsHandler.executeImpl(SimpleJdbcCallsHandler.java:147) [dal.jar:]
        at org.ovirt.engine.core.dal.dbbroker.SimpleJdbcCallsHandler.executeModification(SimpleJdbcCallsHandler.java:74) [dal.jar:]
        at org.ovirt.engine.core.dao.DefaultGenericDaoDbFacade.save(DefaultGenericDaoDbFacade.java:93) [dal.jar:]
        at org.ovirt.engine.core.bll.job.JobRepositoryImpl$1.runInTransaction(JobRepositoryImpl.java:54) [bll.jar:]
        at org.ovirt.engine.core.bll.job.JobRepositoryImpl$1.runInTransaction(JobRepositoryImpl.java:48) [bll.jar:]
        at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInNewTransaction(TransactionSupport.java:210) [utils.jar:]
        at org.ovirt.engine.core.bll.job.JobRepositoryImpl.saveStep(JobRepositoryImpl.java:48) [bll.jar:]
        at org.ovirt.engine.core.bll.job.ExecutionHandler.addStep(ExecutionHandler.java:289) [bll.jar:]
        at org.ovirt.engine.core.bll.job.ExecutionHandler.addStep(ExecutionHandler.java:253) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:352) [bll.jar:]
        at org.ovirt.engine.core.bll.RunVmCommandBase.rerun(RunVmCommandBase.java:119) [bll.jar:]
        at org.ovirt.engine.core.bll.MigrateVmCommand.rerun(MigrateVmCommand.java:347) [bll.jar:]
        at org.ovirt.engine.core.bll.VdsEventListener$3.run(VdsEventListener.java:343) [bll.jar:]
        at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:71) [utils.jar:]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [rt.jar:1.7.0_25]
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) [rt.jar:1.7.0_25]
        at java.util.concurrent.FutureTask.run(FutureTask.java:166) [rt.jar:1.7.0_25]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [rt.jar:1.7.0_25]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [rt.jar:1.7.0_25]
        at java.lang.Thread.run(Thread.java:724) [rt.jar:1.7.0_25]
Caused by: org.postgresql.util.PSQLException: ERROR: insert or update on table "step" violates foreign key constraint "fk_step_job"
  Detail: Key (job_id)=(a9b2e45d-1bf0-4748-b048-fa5662bfe88b) is not present in table "job".
  Where: SQL statement "INSERT INTO step( step_id, parent_step_id, job_id, step_type, description, step_number, status, start_time, end_time, correlation_id, external_id, external_system_type, is_external) VALUES (  $1 ,  $2 ,  $3 ,  $4 ,  $5 ,  $6 ,  $7 ,  $8 ,  $9 ,  $10 ,  $11 ,  $12 ,  $13 )"
PL/pgSQL function "insertstep" line 2 at SQL statement
        at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2101)
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1834)
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:255)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:510)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:386)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:379)
        at org.jboss.jca.adapters.jdbc.CachedPreparedStatement.execute(CachedPreparedStatement.java:297)
        at org.jboss.jca.adapters.jdbc.WrappedPreparedStatement.execute(WrappedPreparedStatement.java:404)
        at org.springframework.jdbc.core.JdbcTemplate$6.doInCallableStatement(JdbcTemplate.java:1066) [spring-jdbc.jar:3.1.1.RELEASE]
        at org.springframework.jdbc.core.JdbcTemplate$6.doInCallableStatement(JdbcTemplate.java:1) [spring-jdbc.jar:3.1.1.RELEASE]
        at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:1014) [spring-jdbc.jar:3.1.1.RELEASE]
        ... 24 more

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 2 Michal Skrivanek 2014-03-25 08:49:35 UTC
as per initial investigation the error happens after the migration fails. It's not the reason for it. Flagging for 3.4 for now

Comment 3 Arik 2014-03-30 12:31:19 UTC
The bug reproduced on u/s: the migration job was removed before the rerun procedure managed to add a step for the next migration attempt so we got the same exception. It seems to be related to the additional section that was added to the stored procedure 'DeleteCompletedJobsOlderThanDate' as part of the solution for https://bugzilla.redhat.com/show_bug.cgi?id=1055162: it removes the job as MigrateVmCommand doesn't have async tasks.
The bug occurred on rhevm 3.3.1 which includes this fix, so we're suspecting that the same thing happened in the reported scenario.

Comment 6 Tareq Alayan 2014-05-08 12:03:56 UTC
couldn't reproduce.
Did migrate a windows guest succesfuly ..
rhevm-3.4.0-0.16.rc.el6ev.noarch

Comment 7 errata-xmlrpc 2014-06-09 15:05:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2014-0506.html


Note You need to log in before you can comment on or make changes to this bug.