Bug 851840

Summary: ovirt-engine-backend: we are trying to migrate vm's although GetCapabilitiesVDS returns with Recovering from crash or Initializing on NFS storage type
Product: Red Hat Enterprise Virtualization Manager
Reporter: Dafna Ron <dron>
Component: ovirt-engine
Assignee: Omer Frenkel <ofrenkel>
Status: CLOSED CURRENTRELEASE
QA Contact: Dafna Ron <dron>
Severity: high
Docs Contact:
Priority: high
Version: 3.1.0
CC: dyasny, hateya, iheim, lpeer, michal.skrivanek, Rhev-m-bugs, sgrinber, yeylon, ykaul
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard: virt
Fixed In Version: si18
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-12-04 20:07:22 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description Flags
logs none

Description Dafna Ron 2012-08-26 11:59:37 UTC
Created attachment 607055 [details]
logs

Description of problem:

I used iptables to block the host's access to the storage domain (storage type NFS).
The backend attempts to migrate the VMs even though GetCapabilitiesVDS returns Error: VDSRecoveringException: Recovering from crash or Initializing.
Since vdsm is not responding yet, the migration will fail, so there is no point in sending MigrateBrokerVDSCommand until vdsm responds.

Version-Release number of selected component (if applicable):

si15
vdsm-4.9.6-30.0.el6_3.x86_64

How reproducible:

100%

Steps to Reproduce:
1. Create a setup with a two-host cluster on NFS storage, with VMs that are running and writing on the SPM host.
2. Block connectivity from the SPM to the storage domain.
3.
  
Actual results:

The backend sends GetCapabilitiesVDS to the host, which returns an error because vdsm is reinitializing, yet we still send MigrateBrokerVDSCommand for all VMs (which will fail since vdsm is not responding).

Expected results:

As long as GetCapabilitiesVDS returns an error, there is no point in migrating.
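
The expected behaviour can be illustrated with a minimal sketch. It is written in Python for brevity and is not the engine's code or the actual fix; `Status`, `migrate_if_host_ready`, and the callables passed in are hypothetical names. The value 99 is the status code that appears in the logs of this report together with "Recovering from crash or Initializing"; treating 0 as success is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List

# Status code reported in this bug's logs alongside
# "Recovering from crash or Initializing".
RECOVERING_CODE = 99
# Assumption for this sketch: 0 means the call succeeded.
OK_CODE = 0


@dataclass
class Status:
    """Hypothetical stand-in for the status returned by a host call."""
    code: int
    message: str


def migrate_if_host_ready(
    get_capabilities: Callable[[], Status],
    migrate: Callable[[str], None],
    vm_ids: List[str],
) -> List[str]:
    """Send migrate requests only if the capabilities call succeeded.

    Returns the list of VM ids for which a migrate request was sent.
    """
    status = get_capabilities()
    if status.code != OK_CODE:
        # Host is still recovering or otherwise failing: skip migration,
        # because a migrate request would fail the same way.
        return []
    sent = []
    for vm_id in vm_ids:
        migrate(vm_id)
        sent.append(vm_id)
    return sent
```

With a capabilities call that returns `Status(RECOVERING_CODE, ...)`, no migrate request is sent; once the call returns `OK_CODE`, every VM in the list gets one.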

Additional info: logs 

2012-08-26 14:40:04,913 INFO  [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] (QuartzScheduler_Worker-17) [56410677] Running command: SetNonOperationalVdsCommand internal: true. Entities affected :  ID: 9c588ba2-ec35-11e1-a1e6-001a4a169741 Type: VDS
2012-08-26 14:40:04,913 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (QuartzScheduler_Worker-17) [56410677] START, SetVdsStatusVDSCommand(vdsId = 9c588ba2-ec35-11e1-a1e6-001a4a169741, status=NonOperational, nonOperationalReason=TIMEOUT_RECOVERING_FROM_CRASH), log id: 33f42fc3
2012-08-26 14:40:04,916 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (QuartzScheduler_Worker-17) [56410677] FINISH, SetVdsStatusVDSCommand, log id: 33f42fc3
2012-08-26 14:40:05,146 INFO  [org.ovirt.engine.core.bll.InternalMigrateVmCommand] (pool-4-thread-44) [3cb2804a] Running command: InternalMigrateVmCommand internal: true. Entities affected :  ID: 50737895-2cee-42aa-8aaf-734e7891a99b Type: VM
2012-08-26 14:40:05,156 INFO  [org.ovirt.engine.core.vdsbroker.MigrateVDSCommand] (pool-4-thread-44) [3cb2804a] START, MigrateVDSCommand(vdsId = 9c588ba2-ec35-11e1-a1e6-001a4a169741, vmId=50737895-2cee-42aa-8aaf-734e7891a99b, srcHost=gold-vdsd.qa.lab.tlv.redhat.com, dstVdsId=35b5ed18-ed2a-11e1-b9a6-001a4a169741, dstHost=nott-vds2.qa.lab.tlv.redhat.com:54321, migrationMethod=ONLINE), log id: 5c00ed23
2012-08-26 14:40:05,157 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateBrokerVDSCommand] (pool-4-thread-44) [3cb2804a] VdsBroker::migrate::Entered (vm_guid=50737895-2cee-42aa-8aaf-734e7891a99b, srcHost=gold-vdsd.qa.lab.tlv.redhat.com, dstHost=nott-vds2.qa.lab.tlv.redhat.com:54321,  method=online
2012-08-26 14:40:05,157 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateBrokerVDSCommand] (pool-4-thread-44) [3cb2804a] START, MigrateBrokerVDSCommand(vdsId = 9c588ba2-ec35-11e1-a1e6-001a4a169741, vmId=50737895-2cee-42aa-8aaf-734e7891a99b, srcHost=gold-vdsd.qa.lab.tlv.redhat.com, dstVdsId=35b5ed18-ed2a-11e1-b9a6-001a4a169741, dstHost=nott-vds2.qa.lab.tlv.redhat.com:54321, migrationMethod=ONLINE), log id: 6536193f
2012-08-26 14:40:05,178 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-4-thread-44) [3cb2804a] Command org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateBrokerVDSCommand return value 
 Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusOnlyReturnForXmlRpc
mStatus                       Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusForXmlRpc
mCode                         99
mMessage                      Recovering from crash or Initializing

Comment 2 Michal Skrivanek 2012-08-30 10:50:30 UTC
is this really regression?

Comment 3 Haim 2012-08-30 10:54:41 UTC
(In reply to comment #2)
> is this really regression?

not sure. removing this flag till proven otherwise.

Comment 4 Omer Frenkel 2012-08-30 13:57:36 UTC
[removed regression as this behaviour hasn't changed]

Comment 5 Omer Frenkel 2012-08-30 15:07:47 UTC
http://gerrit.ovirt.org/#/c/7638/

Comment 8 Dafna Ron 2012-09-19 12:09:10 UTC
Verified on si18.
Migration started only after getCapabilities returned.
The backend log shows that we wait for reinitialization:
2012-09-19 14:59:59,147 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-53) [5c548fd0] Waiting to Host gold-vdsc to finish initialization for 50 Sec.
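
The "Waiting to Host ... to finish initialization for 50 Sec." message suggests a bounded wait before any further action on the host. A minimal sketch of such a wait follows, in Python. It is an illustration only, not the engine's implementation; `wait_for_host_init` and its parameters are hypothetical, the 50-second default is taken from the log line above, and the one-second polling interval is an assumption.

```python
import time
from typing import Callable


def wait_for_host_init(
    is_initialized: Callable[[], bool],
    timeout_sec: float = 50.0,   # value seen in the log line above
    poll_sec: float = 1.0,       # assumed polling interval
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the host reports it is initialized or the timeout expires.

    Returns True if the host finished initializing in time, else False.
    The sleep and clock callables are injectable so the loop can be
    exercised without real waiting.
    """
    deadline = clock() + timeout_sec
    while True:
        if is_initialized():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_sec)
```

A caller would start migration only when this returns True, and otherwise treat the host as still recovering.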