Bug 909932
| Summary: | ovirt-engine-backend: VM are changing theirs status to UNKNOWN during migration | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Oded Ramraz <oramraz> | ||||
| Component: | ovirt-engine | Assignee: | Martin Pavlik <mpavlik> | ||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |||||
| Severity: | high | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 3.2.0 | CC: | acathrow, dyasny, gklein, iheim, lpeer, michal.skrivanek, mpavlik, ofrenkel, rgolan, Rhev-m-bugs, sgrinber, yeylon, ykaul | ||||
| Target Milestone: | --- | Keywords: | Regression | ||||
| Target Release: | 3.2.0 | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | virt | ||||||
| Fixed In Version: | sf13.1 | Doc Type: | Bug Fix | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | Type: | Bug | |||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: |
Description
Oded Ramraz
2013-02-11 13:32:11 UTC
Created attachment 696086
engine and vdsm logs
We always set the VM status to UNKNOWN once the src VM is deleted from the src host cache. Next in the flow, the VM transitions to UP. I guess you will sometimes see it now because the time between UNKNOWN and UP is variable and subject to the performance of the engine, the DB and so on.

Suggesting to first remove the 'regression' flag.

Is there any issue with an admin seeing a migrated VM in UNKNOWN status for a brief moment?

Would it be possible to wait one more poll-period before actually setting it to UNKNOWN? This should pretty much handle this case. Any drawbacks?

(In reply to comment #3)
> Is there any issue with an admin seeing a migrated VM in UNKNOWN status
> for a brief moment?

Yes, there is: it raises questions. You'll get needless support tickets asking about the reason, and we'll have to explain why over and over again.

> We always set the VM status to UNKNOWN once the src VM is deleted from the
> src host cache. Next in the flow, the VM transitions to UP.

No, at some point it was not like this. I specifically remember complaining a few years back (it was going through 'down' back then), and the agreement was to wait until the VM is detected on the destination host, or until a migration failure is indicated, before actually changing the status.
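The "wait one more poll-period" idea raised above can be sketched roughly as follows. This is only an illustration in Python, not ovirt-engine code: the class name `HandoverTracker`, the `grace_polls` parameter and the status strings are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class HandoverTracker:
    """Hypothetical helper: delays reporting Unknown for a migrating VM."""

    grace_polls: int = 1   # assumed: extra monitoring polls to wait
    missed_polls: int = 0  # polls since the VM left the source host cache

    def on_poll(self, seen_on_destination: bool) -> str:
        """Return the status to display after one monitoring poll."""
        if seen_on_destination:
            # The destination host reports the VM: the handover is complete.
            self.missed_polls = 0
            return "Up"
        self.missed_polls += 1
        if self.missed_polls > self.grace_polls:
            # Grace period used up: the VM really is unaccounted for.
            return "Unknown"
        # Still inside the grace period: keep showing the migration status.
        return "MigratingFrom"
```

With `grace_polls=1`, the first poll that misses the VM still shows it as migrating, and only a second consecutive miss reports Unknown. The obvious drawback is that a VM that is genuinely lost is also reported one poll later.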
here are some options:
1. move the status to MigratingTo
PROS: clearer status for this specific scenario
CONS: we take the risk of the host going NonResponsive while the VM stays in MigratingTo. We need to add handling that respects both UNKNOWN and MigratingTo
2. add context (a reason, if you like) to the status and use it for display
VM {
status : UNKNOWN
reason : MIGRATION_HAND_OVER
}
PROS: less chance of regressions
CONS: extending the entity with another field just for this specific scenario
- we may use it for more stuff?
3. new status field
PROS: ? - don't feel strongly about this...
CONS: risk of regressions. We need to refactor all places to treat/ignore it
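To make option 2 more concrete, here is a minimal Python sketch of a status that carries a reason used only for display. It is an assumption-laden illustration, not the engine's data model: `VmStatus`, `StatusReason`, `Vm` and `display_status` are hypothetical names, and only `MIGRATION_HAND_OVER` comes from the pseudocode above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class VmStatus(Enum):
    UP = "Up"
    UNKNOWN = "Unknown"
    MIGRATING_FROM = "Migrating From"


class StatusReason(Enum):
    # Reason named in the pseudocode above; other reasons could be added later.
    MIGRATION_HAND_OVER = "migration hand-over"


@dataclass
class Vm:
    status: VmStatus
    reason: Optional[StatusReason] = None  # extra field proposed by option 2


def display_status(vm: Vm) -> str:
    """Text shown to users; the stored status itself is left unchanged."""
    if vm.status is VmStatus.UNKNOWN and vm.reason is StatusReason.MIGRATION_HAND_OVER:
        # Hand-over in progress: avoid showing a bare "Unknown".
        return "Migrating To"
    return vm.status.value
```

Existing flows that check for UNKNOWN keep working unchanged, which is why this option carries less regression risk; the cost is the extra field on the entity.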
Let's start with
(In reply to comment #6)
> here are some options:
>
> 1. move the status to MigratingTo
> PROS: clearer status for this specific scenario
> CONS: we take a risk of the host going to NonResponsive and the VM will stay
> MigratingTo. we need to add treatment to respect both UNKNOWN and MigratingTo

Let's do this. There is a low probability of it happening, and there should be no problem moving the VM to Unknown once the host has moved to NonResponsive.

(In reply to comment #7)
> Let's start with (In reply to comment #6)
> > here are some options:
> >
> > 1. move the status to MigratingTo
> > PROS: clearer status for this specific scenario
> > CONS: we take a risk of the host going to NonResponsive and the VM will stay
> > MigratingTo. we need to add treatment to respect both UNKNOWN and MigratingTo
>
> Let's do this, low probability of happening, and there should be no problem
> to move the VM to unknown once the host moved to non-responsive

I see a risk of many regressions in my solution, because we now need to handle both MigratingTo and Unknown in every flow. Solution 2 looks less risky (although it feels hackish) and will actually preserve the behavior. I've also started refactoring those hard parts of the migration code to make them friendlier to change.

How about doing 2 internally and, at the GUI level, handling the specific case of UNKNOWN with a MIGRATION reason differently, showing something like "about to start" or "migrating to"? Just in the first pass. Once we get another update from the host, it should be Up or a failure, and we can show the real status.
Simon?

(In reply to comment #9)
> how about internally do 2 and at GUI level handle a specific case of UNKNOWN
> with MIGRATION reason in a different way, just to show something like "about
> to start" or "migrating to". Just in the first pass. Once we get another
> update from the host it should be Up or a failure - and we can show the real
> status
> Simon?

Do what you feel is best, as long as neither REST nor the GUI presents Unknown while the VM may just be waiting for the handover. Showing Unknown may cause questions from GUI users and, worse, scripts that are not aware of this may try recovery actions.

Just to clarify what the solution is:

For a brief moment, during the VM handover period in which the engine starts monitoring the VM on its new destination host, the user might see the VM's host and status change from "Host: HostA, Status: Migrating From" to "Host: HostB, Status: Migrating To".

(In reply to comment #11)
> just to clarify what the solution is:
>
> for a fraction, the VM handover period where the VM on engine is started to
> be monitored on it new, destination Host, the user might see that the VM
> status and host is changed from "Host: HostA, Status: Migrating from" to
> "Host: Host B, Status: Migrating To"

Works as described on SF13.1.

3.2 has been released
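The handover behaviour described in the thread, plus the fallback to Unknown when the destination host becomes non-responsive, can be sketched as below. This is a hedged Python illustration only: `VmView`, `hand_over` and `on_destination_update` are hypothetical names, and the actual engine logic is not shown in this report.

```python
from dataclasses import dataclass


@dataclass
class VmView:
    """What a user sees for a VM: its host and its status (assumed shape)."""
    host: str
    status: str


def hand_over(vm: VmView, destination_host: str) -> VmView:
    """The source host no longer reports the VM: show it on the destination."""
    return VmView(host=destination_host, status="MigratingTo")


def on_destination_update(vm: VmView, host_responsive: bool, vm_running: bool) -> VmView:
    """Resolve MigratingTo once the destination host is polled."""
    if vm.status != "MigratingTo":
        return vm
    if not host_responsive:
        # Risk noted for option 1: do not leave the VM stuck in MigratingTo.
        return VmView(vm.host, "Unknown")
    if vm_running:
        return VmView(vm.host, "Up")
    return vm  # no news yet, keep showing MigratingTo


vm = hand_over(VmView("HostA", "MigratingFrom"), "HostB")
print(vm)  # VmView(host='HostB', status='MigratingTo')
```

In this sketch Unknown is shown only when the destination host itself stops responding, so neither the GUI nor a REST client sees it during a normal handover.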