Bug 851840 - ovirt-engine-backend: we are trying to migrate vm's although GetCapabilitiesVDS returns with Recovering from crash or Initializing on NFS storage type
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.1.0
Hardware: x86_64 Linux
Priority: high
Severity: high
Assigned To: Omer Frenkel
QA Contact: Dafna Ron
Whiteboard: virt
Depends On:
Blocks:
Reported: 2012-08-26 07:59 EDT by Dafna Ron
Modified: 2012-12-04 15:07 EST

See Also:
Fixed In Version: si18
Doc Type: Bug Fix
Last Closed: 2012-12-04 15:07:22 EST
Type: Bug


Attachments
logs (1.23 MB, application/x-gzip)
2012-08-26 07:59 EDT, Dafna Ron

Description Dafna Ron 2012-08-26 07:59:37 EDT
Created attachment 607055
logs

Description of problem:

I blocked the host's access to the storage domain using iptables (storage type: NFS).
The backend attempts to migrate the VMs even though GetCapabilitiesVDS returns Error: VDSRecoveringException: Recovering from crash or Initializing.
Since vdsm is not responding yet, the migration will fail, so there is no point in sending MigrateBrokerVDSCommand until vdsm responds.

Version-Release number of selected component (if applicable):

si15
vdsm-4.9.6-30.0.el6_3.x86_64

How reproducible:

100%

Steps to Reproduce:
1. Create a setup with a two-host cluster on NFS storage, with VMs that are running and writing to disk on the SPM host.
2. Block connectivity to the storage domain from the SPM host.
  
Actual results:

The backend sends GetCapabilitiesVDS to the host, which returns an error because vdsm is reinitializing, yet we still send MigrateBrokerVDSCommand for all VMs (which will fail since vdsm is not responding).

Expected results:

As long as GetCapabilitiesVDS returns an error, there is no point in migrating.
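
For illustration only, the expected behaviour can be modelled as a small guard. This is a sketch in Python, not the engine's Java code and not the fix that was eventually merged; `VdsRecoveringError`, `migrate_vms_if_host_ready`, and the two callback parameters are hypothetical names, not ovirt-engine APIs.

```python
from typing import Callable, Iterable, List


class VdsRecoveringError(Exception):
    """Hypothetical stand-in for the engine's VDSRecoveringException."""


def migrate_vms_if_host_ready(
    get_capabilities: Callable[[], object],
    migrate: Callable[[str], None],
    vm_ids: Iterable[str],
) -> List[str]:
    """Request migration for each VM only if the capabilities call succeeds.

    Returns the ids of the VMs for which a migration was requested.
    """
    try:
        get_capabilities()
    except VdsRecoveringError:
        # The host is still recovering or initializing, so a migrate
        # request would fail as well: send nothing.
        return []

    requested = []
    for vm_id in vm_ids:
        migrate(vm_id)
        requested.append(vm_id)
    return requested
```

With this shape, a caller that passes a `get_capabilities` callback raising `VdsRecoveringError` gets an empty list back and no migrate request is sent.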

Additional info: logs 

2012-08-26 14:40:04,913 INFO  [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] (QuartzScheduler_Worker-17) [56410677] Running command: SetNonOperationalVdsCommand internal: true. Entities affected :  ID: 9c588ba2-ec35-11e1-a1e6-001a4a169741 Type: VDS
2012-08-26 14:40:04,913 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (QuartzScheduler_Worker-17) [56410677] START, SetVdsStatusVDSCommand(vdsId = 9c588ba2-ec35-11e1-a1e6-001a4a169741, status=NonOperational, nonOperationalReason=TIMEOUT_RECOVERING_FROM_CRASH), log id: 33f42fc3
2012-08-26 14:40:04,916 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (QuartzScheduler_Worker-17) [56410677] FINISH, SetVdsStatusVDSCommand, log id: 33f42fc3
2012-08-26 14:40:05,146 INFO  [org.ovirt.engine.core.bll.InternalMigrateVmCommand] (pool-4-thread-44) [3cb2804a] Running command: InternalMigrateVmCommand internal: true. Entities affected :  ID: 50737895-2cee-42aa-8aaf-734e7891a99b Type: VM
2012-08-26 14:40:05,156 INFO  [org.ovirt.engine.core.vdsbroker.MigrateVDSCommand] (pool-4-thread-44) [3cb2804a] START, MigrateVDSCommand(vdsId = 9c588ba2-ec35-11e1-a1e6-001a4a169741, vmId=50737895-2cee-42aa-8aaf-734e7891a99b, srcHost=gold-vdsd.qa.lab.tlv.redhat.com, dstVdsId=35b5ed18-ed2a-11e1-b9a6-001a4a169741, dstHost=nott-vds2.qa.lab.tlv.redhat.com:54321, migrationMethod=ONLINE), log id: 5c00ed23
2012-08-26 14:40:05,157 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateBrokerVDSCommand] (pool-4-thread-44) [3cb2804a] VdsBroker::migrate::Entered (vm_guid=50737895-2cee-42aa-8aaf-734e7891a99b, srcHost=gold-vdsd.qa.lab.tlv.redhat.com, dstHost=nott-vds2.qa.lab.tlv.redhat.com:54321,  method=online
2012-08-26 14:40:05,157 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateBrokerVDSCommand] (pool-4-thread-44) [3cb2804a] START, MigrateBrokerVDSCommand(vdsId = 9c588ba2-ec35-11e1-a1e6-001a4a169741, vmId=50737895-2cee-42aa-8aaf-734e7891a99b, srcHost=gold-vdsd.qa.lab.tlv.redhat.com, dstVdsId=35b5ed18-ed2a-11e1-b9a6-001a4a169741, dstHost=nott-vds2.qa.lab.tlv.redhat.com:54321, migrationMethod=ONLINE), log id: 6536193f
2012-08-26 14:40:05,178 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-4-thread-44) [3cb2804a] Command org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateBrokerVDSCommand return value 
 Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusOnlyReturnForXmlRpc
mStatus                       Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusForXmlRpc
mCode                         99
mMessage                      Recovering from crash or Initializing
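
A rough way to spot this sequence in an engine log is sketched here in Python. It assumes each log entry sits on a single line and relies only on the message formats visible in this report. The function name is hypothetical. It is a heuristic: it never clears a host once it has been flagged, so it is only suitable for short excerpts.

```python
import re
from typing import Iterable, List

# A host being set NonOperational because it timed out while recovering.
RECOVERING = re.compile(
    r"START, SetVdsStatusVDSCommand\(vdsId = ([0-9a-f-]+), "
    r"status=NonOperational, "
    r"nonOperationalReason=TIMEOUT_RECOVERING_FROM_CRASH"
)
# The start of a migration, capturing the source host id.
MIGRATE_START = re.compile(r"START, MigrateVDSCommand\(vdsId = ([0-9a-f-]+)")


def migrations_from_recovering_hosts(lines: Iterable[str]) -> List[str]:
    """Return MigrateVDSCommand lines whose source host was flagged earlier."""
    flagged = set()
    hits = []
    for line in lines:
        match = RECOVERING.search(line)
        if match:
            flagged.add(match.group(1))
            continue
        match = MIGRATE_START.search(line)
        if match and match.group(1) in flagged:
            hits.append(line)
    return hits
```

Run against the excerpt in this report, it would flag the MigrateVDSCommand entry at 14:40:05,156, whose vdsId matches the host set NonOperational at 14:40:04,913.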
Comment 2 Michal Skrivanek 2012-08-30 06:50:30 EDT
Is this really a regression?
Comment 3 Haim 2012-08-30 06:54:41 EDT
(In reply to comment #2)
> Is this really a regression?

Not sure. Removing this flag until proven otherwise.
Comment 4 Omer Frenkel 2012-08-30 09:57:36 EDT
[removed regression as this behaviour hasn't changed]
Comment 5 Omer Frenkel 2012-08-30 11:07:47 EDT
http://gerrit.ovirt.org/#/c/7638/
Comment 8 Dafna Ron 2012-09-19 08:09:10 EDT
Verified on si18.
Migration started only after getCapabilities returned.
The backend log shows that we wait for the host to reinitialize:
2012-09-19 14:59:59,147 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-53) [5c548fd0] Waiting to Host gold-vdsc to finish initialization for 50 Sec.
