Bug 888199

Summary: engine: Failed to save step during vm migration re-run
Product: Red Hat Enterprise Virtualization Manager
Reporter: Dafna Ron <dron>
Component: ovirt-engine
Assignee: Arik <ahadas>
Status: CLOSED CURRENTRELEASE
QA Contact: Leonid Natapov <lnatapov>
Severity: medium
Docs Contact:
Priority: urgent
Version: 3.1.0
CC: dyasny, hateya, iheim, lpeer, michal.skrivanek, ofrenkel, oramraz, Rhev-m-bugs, sgrinber, yeylon, ykaul, yzaslavs
Target Milestone: ---
Keywords: Regression
Target Release: 3.2.0
Hardware: x86_64
OS: Linux
Whiteboard: virt
Fixed In Version: sf6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 915537
Attachments:
Description: log and db dump
Flags: none

Description Dafna Ron 2012-12-18 09:24:37 UTC
Created attachment 665382
log and db dump

Description of problem:

In a 3-host cluster with NFS storage, I blocked connectivity to the storage from all the hosts.
During VM migration, we get an SQL error when trying to update the step for the re-run of the VM migration.

Version-Release number of selected component (if applicable):

si25.1

How reproducible:

100%

Steps to Reproduce:
1. In a 3-host cluster with NFS storage, create ~40 VMs and run them on all hosts
2. Block connectivity to the storage domain from all the hosts
3.
  
Actual results:

When we try to re-run a VM, we get an SQL error: "Failed to save step".

Expected results:

We should be able to update the table on re-run.

Additional info: logs and db dump

2012-12-18 10:21:44,518 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-4-thread-50) Command org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand return value 
 Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusOnlyReturnForXmlRpc
mStatus                       Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusForXmlRpc
mCode                         12
mMessage                      Fatal error during migration


2012-12-18 10:21:44,518 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-4-thread-50) HostName = gold-vdsd
2012-12-18 10:21:44,518 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (pool-4-thread-50) Command MigrateStatusVDS execution failed. Exception: VDSErrorException: VDSGenericException: VDSErrorException: Failed to MigrateStatusVDS, error = Fatal error during migration
2012-12-18 10:21:44,518 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand] (pool-4-thread-50) FINISH, MigrateStatusVDSCommand, log id: 25f29f8d
2012-12-18 10:21:44,565 ERROR [org.ovirt.engine.core.bll.job.JobRepositoryImpl] (pool-4-thread-50) Failed to save step ae3965a6-da21-4a16-898f-ff68e685ec5e, VALIDATING.: org.springframework.dao.DataIntegrityViolationException: CallableStatementCallback; SQL [{call insertstep(?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)}]; ERROR: insert or update on table "step" violates foreign key constraint "fk_step_job"
  Detail: Key (job_id)=(699b46e5-8c91-454b-a6de-aba726e02ba6) is not present in table "job".
  Where: SQL statement "INSERT INTO step( step_id, parent_step_id, job_id, step_type, description, step_number, status, start_time, end_time, correlation_id, external_id, external_system_type) VALUES (  $1 ,  $2 ,  $3 ,  $4 ,  $5 ,  $6 ,  $7 ,  $8 ,  $9 ,  $10 ,  $11 ,  $12 )"
PL/pgSQL function "insertstep" line 2 at SQL statement; nested exception is org.postgresql.util.PSQLException: ERROR: insert or update on table "step" violates foreign key constraint "fk_step_job"
  Detail: Key (job_id)=(699b46e5-8c91-454b-a6de-aba726e02ba6) is not present in table "job".
  Where: SQL statement "INSERT INTO step( step_id, parent_step_id, job_id, step_type, description, step_number, status, start_time, end_time, correlation_id, external_id, external_system_type) VALUES (  $1 ,  $2 ,  $3 ,  $4 ,  $5 ,  $6 ,  $7 ,  $8 ,  $9 ,  $10 ,  $11 ,  $12 )"
PL/pgSQL function "insertstep" line 2 at SQL statement
        at org.springframework.jdbc.support.SQLErrorCodeSQLExceptionTranslator.doTranslate(SQLErrorCodeSQLExceptionTranslator.java:245) [spring-jdbc-3.1.1.RELEASE.jar:3.1.1.RELEASE]
        at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:72) [spring-jdbc-3.1.1.RELEASE.jar:3.1.1.RELEASE]
        at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:1030) [spring-jdbc-3.1.1.RELEASE.jar:3.1.1.RELEASE]
        at org.springframework.jdbc.core.JdbcTemplate.call(JdbcTemplate.java:1064) [spring-jdbc-3.1.1.RELEASE.jar:3.1.1.RELEASE]
        at org.springframework.jdbc.core.simple.AbstractJdbcCall.executeCallInternal(AbstractJdbcCall.java:388) [spring-jdbc-3.1.1.RELEASE.jar:3.1.1.RELEASE]
        at org.springframework.jdbc.core.simple.AbstractJdbcCall.doExecute(AbstractJdbcCall.java:351) [spring-jdbc-3.1.1.RELEASE.jar:3.1.1.RELEASE]
        at org.springframework.jdbc.core.simple.SimpleJdbcCall.execute(SimpleJdbcCall.java:181) [spring-jdbc-3.1.1.RELEASE.jar:3.1.1.RELEASE]
        at org.ovirt.engine.core.dal.dbbroker.SimpleJdbcCallsHandler.executeImpl(SimpleJdbcCallsHandler.java:124) [engine-dal.jar:]
        at org.ovirt.engine.core.dal.dbbroker.SimpleJdbcCallsHandler.executeModification(SimpleJdbcCallsHandler.java:37) [engine-dal.jar:]
        at org.ovirt.engine.core.dao.DefaultGenericDaoDbFacade.save(DefaultGenericDaoDbFacade.java:93) [engine-dal.jar:]
        at org.ovirt.engine.core.bll.job.JobRepositoryImpl$1.runInTransaction(JobRepositoryImpl.java:55) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.job.JobRepositoryImpl$1.runInTransaction(JobRepositoryImpl.java:49) [engine-bll.jar:]
        at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInNewTransaction(TransactionSupport.java:204) [engine-utils.jar:]
        at org.ovirt.engine.core.bll.job.JobRepositoryImpl.saveStep(JobRepositoryImpl.java:49) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.job.ExecutionHandler.addSubStep(ExecutionHandler.java:318) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.job.ExecutionHandler.addStep(ExecutionHandler.java:269) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:284) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.RunVmCommandBase.rerunInternal(RunVmCommandBase.java:241) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.MigrateVmCommand.rerunInternal(MigrateVmCommand.java:222) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.RunVmCommandBase$1.run(RunVmCommandBase.java:212) [engine-bll.jar:]
        at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:64) [engine-utils.jar:]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [rt.jar:1.7.0_09-icedtea]
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) [rt.jar:1.7.0_09-icedtea]
        at java.util.concurrent.FutureTask.run(FutureTask.java:166) [rt.jar:1.7.0_09-icedtea]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) [rt.jar:1.7.0_09-icedtea]
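
For readers unfamiliar with this class of error, here is a minimal sketch of the failure mode the trace shows: an insert into "step" that references a job_id with no matching row in "job". It is not the engine's code. It uses an in-memory SQLite database instead of PostgreSQL, a reduced two-column schema, and hypothetical names (insert_step, demo, "job-1", "job-missing"). It also makes no claim about why the job row was missing in this bug.

```python
import sqlite3


def insert_step(conn, step_id, job_id):
    """Insert a step row; the database checks the foreign key on job_id."""
    conn.execute(
        "INSERT INTO step (step_id, job_id) VALUES (?, ?)", (step_id, job_id)
    )


def demo():
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE job (job_id TEXT PRIMARY KEY)")
    # Assumption: simplified stand-in for the real "step" table.
    conn.execute(
        "CREATE TABLE step ("
        " step_id TEXT PRIMARY KEY,"
        " job_id TEXT NOT NULL,"
        " CONSTRAINT fk_step_job FOREIGN KEY (job_id) REFERENCES job (job_id))"
    )

    results = {}

    # Case 1: the parent job row exists, so the step is accepted.
    conn.execute("INSERT INTO job (job_id) VALUES (?)", ("job-1",))
    insert_step(conn, "step-1", "job-1")
    results["with_job"] = "saved"

    # Case 2: no such job row, so the foreign key check rejects the step.
    try:
        insert_step(conn, "step-2", "job-missing")
        results["without_job"] = "saved"
    except sqlite3.IntegrityError:
        results["without_job"] = "rejected"

    conn.close()
    return results


if __name__ == "__main__":
    print(demo())
```

The second insert fails at the database level, which matches the DataIntegrityViolationException wrapping a PSQLException in the log above.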

Comment 4 Arik 2013-01-15 16:43:15 UTC
I tried a couple of times to reproduce it with the latest d/s build on the rhev-3.1 branch, in a 3-host environment with a pool of 30 VMs: I ran all the VMs and blocked the connection to the storage from all hosts.

The result was:

1. The SPM became non-operational.

2. There were 8 re-runs because of failed migrations. However, the cause of the failure was "Error creating the requested virtual machine", not "Fatal error during migration" as in the log above.

3. There was no DataIntegrityViolationException and no "Failed to save step" error.

Dafna - I'll need more help to reproduce it

Comment 5 Yair Zaslavsky 2013-01-15 19:57:46 UTC
In reply to comment #4:
Arik, I would also try setting up si25.1 (i.e., with its engine code) and reproducing there.
The bug may have been fixed by work on other bugs or features.

Comment 6 Arik 2013-01-24 13:41:06 UTC
http://gerrit.ovirt.org/#/c/11370/

Comment 9 Leonid Natapov 2013-03-14 11:25:21 UTC
Verified on sf10 in a scale environment: 50 VMs running on 4 hosts, NFS SD, connectivity blocked. Re-running the VMs was successful. No SQL errors.
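
A check for "no SQL errors" like the one above could be scripted roughly as follows. This is only a sketch: the two regular expressions are derived from the log excerpt in the description, the function name find_failed_steps is hypothetical, and other engine versions may word these messages differently.

```python
import re

# Patterns taken from the log excerpt in this report (assumed stable wording).
STEP_RE = re.compile(r"Failed to save step (?P<step_id>[0-9a-f-]{36})")
JOB_RE = re.compile(
    r'Key \(job_id\)=\((?P<job_id>[0-9a-f-]{36})\) is not present in table "job"'
)


def find_failed_steps(lines):
    """Return (step_id, job_id) pairs for each "Failed to save step" entry.

    job_id is None when no matching "Detail:" line follows the failure.
    """
    failures = []
    for line in lines:
        step = STEP_RE.search(line)
        if step:
            failures.append([step.group("step_id"), None])
            continue
        job = JOB_RE.search(line)
        # Attach only the first Detail line; the nested exception repeats it.
        if job and failures and failures[-1][1] is None:
            failures[-1][1] = job.group("job_id")
    return [tuple(f) for f in failures]
```

An empty result for the engine log covering the test run would correspond to the "No SQL errors" outcome. A non-empty result lists the step and the job_id that was missing from the "job" table.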

Comment 10 Itamar Heim 2013-06-11 08:21:53 UTC
3.2 has been released

Comment 11 Itamar Heim 2013-06-11 08:24:29 UTC
3.2 has been released