Bug 888199 - engine: Failed to save step during vm migration re-run
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.1.0
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: medium
Target Milestone: ---
Target Release: 3.2.0
Assigned To: Arik
QA Contact: Leonid Natapov
Whiteboard: virt
Keywords: Regression
Depends On:
Blocks: 915537
Reported: 2012-12-18 04:24 EST by Dafna Ron
Modified: 2013-06-11 04:24 EDT

Fixed In Version: sf6
Doc Type: Bug Fix
Type: Bug

Attachments
log and db dump (557.18 KB, application/x-gzip)
2012-12-18 04:24 EST, Dafna Ron


External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 11370 None None None Never

Description Dafna Ron 2012-12-18 04:24:37 EST
Created attachment 665382
log and db dump

Description of problem:

In a 3-host cluster with NFS storage, I blocked connectivity to the storage from all the hosts.
During VM migration, we get an SQL error when the engine tries to update the step for the re-run of the VM migration.

Version-Release number of selected component (if applicable):

si25.1

How reproducible:

100%

Steps to Reproduce:
1. In a 3-host cluster with NFS storage, create ~40 VMs and run them on all hosts
2. Block connectivity to the storage domain from all the hosts
3.
  
Actual results:

When we try to re-run a VM, we get an SQL error: "Failed to save step".

Expected results:

We should be able to update the table on re-run.

Additional info: logs and db dump

2012-12-18 10:21:44,518 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-4-thread-50) Command org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand return value 
 Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusOnlyReturnForXmlRpc
mStatus                       Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusForXmlRpc
mCode                         12
mMessage                      Fatal error during migration


2012-12-18 10:21:44,518 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-4-thread-50) HostName = gold-vdsd
2012-12-18 10:21:44,518 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (pool-4-thread-50) Command MigrateStatusVDS execution failed. Exception: VDSErrorException: VDSGenericException: VDSErrorException: Failed to MigrateStatusVDS, error = Fatal error during migration
2012-12-18 10:21:44,518 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand] (pool-4-thread-50) FINISH, MigrateStatusVDSCommand, log id: 25f29f8d
2012-12-18 10:21:44,565 ERROR [org.ovirt.engine.core.bll.job.JobRepositoryImpl] (pool-4-thread-50) Failed to save step ae3965a6-da21-4a16-898f-ff68e685ec5e, VALIDATING.: org.springframework.dao.DataIntegrityViolationException: CallableStatementCallback; SQL [{call insertstep(?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)}]; ERROR: insert or update on table "step" violates foreign key constraint "fk_step_job"
  Detail: Key (job_id)=(699b46e5-8c91-454b-a6de-aba726e02ba6) is not present in table "job".
  Where: SQL statement "INSERT INTO step( step_id, parent_step_id, job_id, step_type, description, step_number, status, start_time, end_time, correlation_id, external_id, external_system_type) VALUES (  $1 ,  $2 ,  $3 ,  $4 ,  $5 ,  $6 ,  $7 ,  $8 ,  $9 ,  $10 ,  $11 ,  $12 )"
PL/pgSQL function "insertstep" line 2 at SQL statement; nested exception is org.postgresql.util.PSQLException: ERROR: insert or update on table "step" violates foreign key constraint "fk_step_job"
  Detail: Key (job_id)=(699b46e5-8c91-454b-a6de-aba726e02ba6) is not present in table "job".
  Where: SQL statement "INSERT INTO step( step_id, parent_step_id, job_id, step_type, description, step_number, status, start_time, end_time, correlation_id, external_id, external_system_type) VALUES (  $1 ,  $2 ,  $3 ,  $4 ,  $5 ,  $6 ,  $7 ,  $8 ,  $9 ,  $10 ,  $11 ,  $12 )"
PL/pgSQL function "insertstep" line 2 at SQL statement
        at org.springframework.jdbc.support.SQLErrorCodeSQLExceptionTranslator.doTranslate(SQLErrorCodeSQLExceptionTranslator.java:245) [spring-jdbc-3.1.1.RELEASE.jar:3.1.1.RELEASE]
        at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:72) [spring-jdbc-3.1.1.RELEASE.jar:3.1.1.RELEASE]
        at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:1030) [spring-jdbc-3.1.1.RELEASE.jar:3.1.1.RELEASE]
        at org.springframework.jdbc.core.JdbcTemplate.call(JdbcTemplate.java:1064) [spring-jdbc-3.1.1.RELEASE.jar:3.1.1.RELEASE]
        at org.springframework.jdbc.core.simple.AbstractJdbcCall.executeCallInternal(AbstractJdbcCall.java:388) [spring-jdbc-3.1.1.RELEASE.jar:3.1.1.RELEASE]
        at org.springframework.jdbc.core.simple.AbstractJdbcCall.doExecute(AbstractJdbcCall.java:351) [spring-jdbc-3.1.1.RELEASE.jar:3.1.1.RELEASE]
        at org.springframework.jdbc.core.simple.SimpleJdbcCall.execute(SimpleJdbcCall.java:181) [spring-jdbc-3.1.1.RELEASE.jar:3.1.1.RELEASE]
        at org.ovirt.engine.core.dal.dbbroker.SimpleJdbcCallsHandler.executeImpl(SimpleJdbcCallsHandler.java:124) [engine-dal.jar:]
        at org.ovirt.engine.core.dal.dbbroker.SimpleJdbcCallsHandler.executeModification(SimpleJdbcCallsHandler.java:37) [engine-dal.jar:]
        at org.ovirt.engine.core.dao.DefaultGenericDaoDbFacade.save(DefaultGenericDaoDbFacade.java:93) [engine-dal.jar:]
        at org.ovirt.engine.core.bll.job.JobRepositoryImpl$1.runInTransaction(JobRepositoryImpl.java:55) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.job.JobRepositoryImpl$1.runInTransaction(JobRepositoryImpl.java:49) [engine-bll.jar:]
        at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInNewTransaction(TransactionSupport.java:204) [engine-utils.jar:]
        at org.ovirt.engine.core.bll.job.JobRepositoryImpl.saveStep(JobRepositoryImpl.java:49) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.job.ExecutionHandler.addSubStep(ExecutionHandler.java:318) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.job.ExecutionHandler.addStep(ExecutionHandler.java:269) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:284) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.RunVmCommandBase.rerunInternal(RunVmCommandBase.java:241) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.MigrateVmCommand.rerunInternal(MigrateVmCommand.java:222) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.RunVmCommandBase$1.run(RunVmCommandBase.java:212) [engine-bll.jar:]
        at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:64) [engine-utils.jar:]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [rt.jar:1.7.0_09-icedtea]
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) [rt.jar:1.7.0_09-icedtea]
        at java.util.concurrent.FutureTask.run(FutureTask.java:166) [rt.jar:1.7.0_09-icedtea]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) [rt.jar:1.7.0_09-icedtea]
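
The constraint named in the trace above, fk_step_job, rejects any step row whose job_id has no matching row in the job table. The sketch below reproduces that class of error in isolation. It is only an illustration under stated assumptions: it uses SQLite from Python rather than the engine's PostgreSQL, the two tables are cut down to hypothetical two-column versions, and the helper names make_db and insert_step are made up here and are not the engine's insertstep procedure. It does not show why the job row was missing in this bug.

```python
import sqlite3


def make_db():
    """Create an in-memory database with simplified, hypothetical job/step tables."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE job (job_id TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE step ("
        " step_id TEXT PRIMARY KEY,"
        " job_id TEXT NOT NULL REFERENCES job(job_id))"
    )
    return conn


def insert_step(conn, step_id, job_id):
    """Insert a step row; return None on success or the error text on failure."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO step (step_id, job_id) VALUES (?, ?)",
                (step_id, job_id),
            )
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


conn = make_db()
conn.execute("INSERT INTO job (job_id) VALUES ('job-1')")

# The parent job exists, so the step is accepted.
ok = insert_step(conn, "step-1", "job-1")

# No such job row, so the insert is rejected, as in the log above.
orphan = insert_step(conn, "step-2", "job-missing")
```

Here ok is None, while orphan holds the database's foreign key error text, the SQLite counterpart of the PSQLException in the trace.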
Comment 4 Arik 2013-01-15 11:43:15 EST
I tried a couple of times to reproduce it with the latest downstream build of the rhev-3.1 branch, in a 3-host environment with a pool of 30 VMs: I ran all the VMs and blocked the connection to the storage from all hosts.

The result was:

1. The SPM became non-operational.

2. There were 8 re-runs because of failed migrations. The cause of the failure was "Error creating the requested virtual machine", though, not "Fatal error during migration" as in the log above.

3. No DataIntegrityViolationException and no "Failed to save step" error.

Dafna - I'll need more help to reproduce it.
Comment 5 Yair Zaslavsky 2013-01-15 14:57:46 EST
In reply to comment #4:
Arik, I would also try setting up si25.1 (i.e., use its engine code) and try to reproduce there.
Maybe the bug was fixed as a side effect of work on other bugs or features.
Comment 6 Arik 2013-01-24 08:41:06 EST
http://gerrit.ovirt.org/#/c/11370/
Comment 9 Leonid Natapov 2013-03-14 07:25:21 EDT
Verified on sf10 in a scale environment: 50 VMs running on 4 hosts, NFS SD, connectivity blocked. Re-running the VMs was successful. No SQL errors.
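
A check for SQL errors like the one described above could be scripted against the engine log. The following is a rough sketch, assuming the log line layout seen in the description of this bug; the function name find_failed_steps and the regular expressions are illustrative and not part of any oVirt tool, and this is not necessarily how the verification was actually done. The sample lines are abbreviated copies of lines from the original report.

```python
import re

# Patterns based on the log excerpt in this report (an assumption about format).
STEP_ERROR = re.compile(r"Failed to save step (?P<step_id>[0-9a-f-]{36})")
MISSING_JOB = re.compile(
    r"Key \(job_id\)=\((?P<job_id>[0-9a-f-]{36})\) is not present"
)


def find_failed_steps(lines):
    """Return one dict per 'Failed to save step' entry, with the missing job_id if logged."""
    failures = []
    for line in lines:
        step = STEP_ERROR.search(line)
        if step:
            failures.append({"step_id": step.group("step_id"), "job_id": None})
        job = MISSING_JOB.search(line)
        # Attach the first job_id detail that follows a failure; later repeats are ignored.
        if job and failures and failures[-1]["job_id"] is None:
            failures[-1]["job_id"] = job.group("job_id")
    return failures


# Abbreviated sample taken from the description of this bug.
sample = [
    "2012-12-18 10:21:44,565 ERROR [org.ovirt.engine.core.bll.job.JobRepositoryImpl] "
    "(pool-4-thread-50) Failed to save step ae3965a6-da21-4a16-898f-ff68e685ec5e, VALIDATING.: ...",
    '  Detail: Key (job_id)=(699b46e5-8c91-454b-a6de-aba726e02ba6) is not present in table "job".',
]

failures = find_failed_steps(sample)
```

An empty list from a log captured during the re-run would correspond to the "No SQL errors" result reported in this comment.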
Comment 10 Itamar Heim 2013-06-11 04:21:53 EDT
3.2 has been released
Comment 11 Itamar Heim 2013-06-11 04:24:29 EDT
3.2 has been released
