
Bug 909932

Summary: ovirt-engine-backend: VMs change their status to UNKNOWN during migration
Product: Red Hat Enterprise Virtualization Manager Reporter: Oded Ramraz <oramraz>
Component: ovirt-engine Assignee: Martin Pavlik <mpavlik>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.2.0 CC: acathrow, dyasny, gklein, iheim, lpeer, michal.skrivanek, mpavlik, ofrenkel, rgolan, Rhev-m-bugs, sgrinber, yeylon, ykaul
Target Milestone: --- Keywords: Regression
Target Release: 3.2.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: virt
Fixed In Version: sf13.1 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
engine and vdsm logs none

Description Oded Ramraz 2013-02-11 13:32:11 UTC
Description of problem:

The usual status flow during migration is Up -> Migrating -> Up.
Recently I noticed a change in this flow: some VMs now change their status to UNKNOWN along the way: Up -> Migrating -> UNKNOWN -> Up
(see attached logs).


Comment 1 Oded Ramraz 2013-02-11 13:37:50 UTC
Created attachment 696086
engine and vdsm logs

Comment 3 Roy Golan 2013-02-12 08:52:01 UTC
We always set the VM status to UNKNOWN once the source VM is deleted from the source host's cache. The next step in the flow is the VM transitioning to UP.

I guess you will sometimes see it now because the time between UNKNOWN and UP is variable and depends on the performance of the engine, the DB, and so on.

I suggest first removing the 'Regression' flag.

Is there any issue with an admin briefly seeing a migrated VM in UNKNOWN status?
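
To illustrate why the UNKNOWN step is only sometimes visible, here is a minimal sketch. It is a toy model, not engine code: `sampled_statuses` is a hypothetical helper, and all timestamps are invented. The status always passes through Unknown, but an observer who refreshes periodically only sees it if a refresh happens to fall inside the handover window.

```python
import bisect

def sampled_statuses(transitions, sample_times):
    """Return the status an observer sees at each sample time.

    transitions: list of (time, status) pairs sorted by time.
    sample_times: times at which the observer refreshes its view;
    each must be >= the first transition time.
    """
    times = [t for t, _ in transitions]
    seen = []
    for s in sample_times:
        # Index of the last transition that happened at or before s.
        i = bisect.bisect_right(times, s) - 1
        seen.append(transitions[i][1])
    return seen

# Hypothetical timelines in seconds (assumed values, not taken from the logs).
# The only difference is how long the VM stays Unknown before going Up.
fast = [(0.0, "Up"), (1.0, "Migrating"), (5.0, "Unknown"), (5.2, "Up")]
slow = [(0.0, "Up"), (1.0, "Migrating"), (5.0, "Unknown"), (7.5, "Up")]
samples = [0, 2, 4, 6, 8]  # observer refreshes every 2 seconds

fast_view = sampled_statuses(fast, samples)
slow_view = sampled_statuses(slow, samples)
```

With these assumed numbers, the observer never sees Unknown in the fast timeline but does see it in the slow one, even though both timelines contain the same sequence of transitions.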

Comment 4 Michal Skrivanek 2013-02-12 14:21:50 UTC
Would it be possible to wait one more poll period before actually setting it to UNKNOWN? That should pretty much handle this case. Any drawbacks?
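
The idea of waiting one more poll period could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the engine's monitoring code: `VmMonitor`, `grace_polls`, and `on_poll` are hypothetical names, and a poll is modeled as either a reported status or `None` when no host reports the VM.

```python
class VmMonitor:
    """Hypothetical monitor that marks a VM Unknown only after it has been
    missing for more than `grace_polls` consecutive polls."""

    def __init__(self, grace_polls=1):
        self.grace_polls = grace_polls
        self.status = "Migrating"
        self.missed = 0

    def on_poll(self, reported_status):
        """Process one poll result.

        reported_status is the status reported by a host, or None if
        no host reported the VM in this poll.
        """
        if reported_status is not None:
            # The VM was seen again: reset the counter and trust the report.
            self.missed = 0
            self.status = reported_status
        else:
            self.missed += 1
            # Only give up after the grace period has been exceeded.
            if self.missed > self.grace_polls:
                self.status = "Unknown"
        return self.status

# One missed poll followed by a report from the destination host.
handover = VmMonitor()
handover_view = [handover.on_poll(r) for r in [None, "Up"]]

# The VM is really gone: it is missing in two consecutive polls.
lost = VmMonitor()
lost_view = [lost.on_poll(r) for r in [None, None]]
```

In this sketch the visible cost is that a VM that has genuinely disappeared is shown as Unknown one poll period later than before.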

Comment 5 Simon Grinberg 2013-02-12 16:38:55 UTC
(In reply to comment #3)
> Is there any issue with an admin briefly seeing a migrated VM in UNKNOWN
> status?

Yes, there is: it raises questions.
You'll get needless support tickets asking about the reason, and we'll have to explain why over and over again.

> We always set the VM status to UNKNOWN once the source VM is deleted from the source host's cache. The next step in the flow is the VM transitioning to UP.

No, at some point it was not like that. I specifically remember complaining a few years back (it was going through 'Down' back then), and the agreement was to wait until the VM is detected on the destination host, or until there is a migration failure indication, before actually changing the status.

Comment 6 Roy Golan 2013-02-13 15:21:23 UTC
Here are some options:

1. Move the status to MigratingTo
PROS: clearer status for this specific scenario
CONS: we risk the host going NonResponsive while the VM stays in MigratingTo. We need to add handling that respects both UNKNOWN and MigratingTo.

2. Add context (a reason, if you like) to the status and use it for display
 VM {
     status : UNKNOWN
     reason : MIGRATION_HAND_OVER
    }

PROS: less chance of regressions
CONS: extending the entity with another field just for this specific scenario
 - could we use it for other things as well?

3. New status field
PROS: ? (I don't feel strongly about this one)
CONS: risk of regressions. We need to refactor every place to handle or ignore it.
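
Option 2 could look roughly like the following. This is a sketch only: the class and function names (`Vm`, `StatusReason`, `needs_attention`, `is_waiting_for_handover`) are hypothetical and do not come from the engine code. It shows why the regression risk is lower: code that only inspects `status` keeps behaving as before, while new code can consult `reason`.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class VmStatus(Enum):
    UP = "Up"
    MIGRATING_FROM = "MigratingFrom"
    UNKNOWN = "Unknown"

class StatusReason(Enum):
    MIGRATION_HAND_OVER = "MigrationHandOver"

@dataclass
class Vm:
    name: str
    status: VmStatus
    # Optional context for the status; None means no extra information.
    reason: Optional[StatusReason] = None

def needs_attention(vm):
    """Stand-in for existing logic that looks at `status` only."""
    return vm.status is VmStatus.UNKNOWN

def is_waiting_for_handover(vm):
    """New logic that uses the added context."""
    return (vm.status is VmStatus.UNKNOWN
            and vm.reason is StatusReason.MIGRATION_HAND_OVER)

migrating_vm = Vm("vm1", VmStatus.UNKNOWN, StatusReason.MIGRATION_HAND_OVER)
lost_vm = Vm("vm2", VmStatus.UNKNOWN)
```

Here both VMs are still treated as Unknown by `needs_attention`, and only `migrating_vm` is recognized by `is_waiting_for_handover`.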

Comment 7 Simon Grinberg 2013-02-17 19:41:42 UTC
(In reply to comment #6)
> Here are some options:
> 
> 1. Move the status to MigratingTo
> PROS: clearer status for this specific scenario
> CONS: we risk the host going NonResponsive while the VM stays in
> MigratingTo. We need to add handling that respects both UNKNOWN and
> MigratingTo.
> 

Let's do this. The probability of that happening is low, and there should be no problem moving the VM to Unknown once the host has moved to NonResponsive.
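
The fallback described here could be sketched as below. This is an assumption-laden illustration, not engine code: `on_host_non_responsive` is a hypothetical handler, and VMs are modeled as plain dictionaries with `name`, `run_on`, and `status` keys.

```python
def on_host_non_responsive(host, vms):
    """Move every VM that is MigratingTo on `host` to Unknown.

    vms is a list of dicts with 'name', 'run_on' and 'status' keys.
    The list is updated in place and also returned for convenience.
    """
    for vm in vms:
        if vm["run_on"] == host and vm["status"] == "MigratingTo":
            vm["status"] = "Unknown"
    return vms

vms = [
    {"name": "vm1", "run_on": "HostB", "status": "MigratingTo"},
    {"name": "vm2", "run_on": "HostB", "status": "Up"},
    {"name": "vm3", "run_on": "HostC", "status": "MigratingTo"},
]
on_host_non_responsive("HostB", vms)
statuses = {vm["name"]: vm["status"] for vm in vms}
```

Only `vm1` changes in this example; VMs on other hosts, and VMs on HostB in other states, are left alone by this particular handler.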

Comment 8 Roy Golan 2013-02-26 13:53:25 UTC
(In reply to comment #7)
> (In reply to comment #6)
> > Here are some options:
> > 
> > 1. Move the status to MigratingTo
> > PROS: clearer status for this specific scenario
> > CONS: we risk the host going NonResponsive while the VM stays in
> > MigratingTo. We need to add handling that respects both UNKNOWN and
> > MigratingTo.
> > 
> 
> Let's do this. The probability of that happening is low, and there should be
> no problem moving the VM to Unknown once the host has moved to NonResponsive.

I see a risk of many regressions with this solution, because we would now need to handle both MigratingTo and UNKNOWN in every flow.
Solution 2 looks less risky (although it feels hackish) and will actually preserve the existing behavior.
I've also started refactoring the hard parts of the migration code to make them friendlier to changes.

Comment 9 Michal Skrivanek 2013-02-26 14:02:40 UTC
How about doing 2 internally and, at the GUI level, handling the specific case of UNKNOWN with a MIGRATION reason differently, just to show something like "about to start" or "migrating to"? Only in the first pass: once we get another update from the host, it should be Up or a failure, and we can show the real status.
Simon?
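
A display-level mapping along these lines might look like the sketch below. The function name `display_status`, the reason string, and the label text are assumptions made for illustration; the stored status is left untouched and only the label shown to the user changes.

```python
def display_status(status, reason=None):
    """Return the label to show for a VM.

    Unknown with a migration hand-over reason is shown as "migrating to";
    every other combination shows the real status.
    """
    if status == "Unknown" and reason == "MIGRATION_HAND_OVER":
        return "migrating to"
    return status

# First pass: the VM is Unknown only because of the hand-over.
first_pass = display_status("Unknown", "MIGRATION_HAND_OVER")
# Next update from the host: the real status is shown again.
after_update = display_status("Up")
# Unknown for any other reason is still shown as Unknown.
really_unknown = display_status("Unknown")
```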

Comment 10 Simon Grinberg 2013-03-06 19:04:19 UTC
(In reply to comment #9)
> How about doing 2 internally and, at the GUI level, handling the specific
> case of UNKNOWN with a MIGRATION reason differently, just to show something
> like "about to start" or "migrating to"? Only in the first pass: once we get
> another update from the host, it should be Up or a failure, and we can show
> the real status.
> Simon?

Do what you feel is best, as long as neither REST nor the GUI presents Unknown while the VM may just be waiting for the handover. That would raise questions from GUI users and, worse, could lead scripts that are not aware of this to attempt recovery actions.

Comment 11 Roy Golan 2013-04-17 08:36:22 UTC
Just to clarify what the solution is:

For a brief moment during the VM handover period, when the engine starts monitoring the VM on its new, destination host, the user might see the VM's host and status change from "Host: HostA, Status: Migrating From" to "Host: HostB, Status: Migrating To".
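
The user-visible sequence described above can be sketched as follows. This is a simplified model with hypothetical names (`VmView`, `hand_over`, `on_destination_report`), not the engine implementation; it only shows which host and status the user would see at each step.

```python
from dataclasses import dataclass

@dataclass
class VmView:
    """What the user sees for a VM: the host it runs on and its status."""
    host: str
    status: str

def hand_over(vm, destination_host):
    """The engine starts monitoring the VM on the destination host.

    Instead of Unknown, the VM is shown as Migrating To on the new host.
    """
    return VmView(host=destination_host, status="Migrating To")

def on_destination_report(vm, reported_status):
    """The destination host reported the VM; show its real status."""
    return VmView(host=vm.host, status=reported_status)

before = VmView(host="HostA", status="Migrating From")
during = hand_over(before, "HostB")
after = on_destination_report(during, "Up")
```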

Comment 12 Martin Pavlik 2013-04-17 09:13:46 UTC
(In reply to comment #11)
> Just to clarify what the solution is:
> 
> For a brief moment during the VM handover period, when the engine starts
> monitoring the VM on its new, destination host, the user might see the VM's
> host and status change from "Host: HostA, Status: Migrating From" to
> "Host: HostB, Status: Migrating To".

Works as described on SF13.1.

Comment 13 Itamar Heim 2013-06-11 08:55:30 UTC
3.2 has been released
