Description of problem:
I've migrated a VM from host A to host B. Regretfully, migration failed (due to timeout). This has caused the VM on the destination to exit:

Thread-146826::DEBUG::2011-04-17 14:55:44,912::vm::1434::vds.vmlog.514e0257-3f28-4f39-9a44-d2b786146675::Changed state to Down: Migration failed

The event on the RHEVM is:
VM <vmname> is down. Exit message Migration failed.

While technically it is somewhat correct (the destination is down), the reality is that it should be still up (and hopefully running!) on the source! The message, as is, is quite confusing and alarming.

Version-Release number of selected component (if applicable):
2.2.7

How reproducible:

Steps to Reproduce:
1. Cause migration to fail due to timeout (make the VM do a lot of IO, for example, or memory scanning).
2.
3.

Actual results:

Expected results:

Additional info:
When running a VM on a host, RHEVM collects statistics from which it learns the status of the VM.

ATM, VDSM returns one of two exit statuses for Down VMs: Normal / ERROR, and for ERROR it adds a string with the exit message.

To have a special message for a failed migration, RHEVM+VDSM need to either support another exit code, or guarantee that the message cannot be changed so that RHEVM can base logic on its content.
I personally prefer a new exit code, ERROR_ON_MIGRATION.

Anyway, moving to RFE to support either of the two options.
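A minimal sketch of the exit-code option, to make the proposal concrete. Everything here is hypothetical: the `ExitCode` enum, its numeric values, and the `down_event_text` helper are illustration only and are not the actual VDSM or RHEVM API. The only name taken from this comment is ERROR_ON_MIGRATION.

```python
from enum import Enum


class ExitCode(Enum):
    # Hypothetical values; the real VDSM/RHEVM constants may differ.
    NORMAL = 0
    ERROR = 1
    ERROR_ON_MIGRATION = 2  # the new exit code proposed in this comment


def down_event_text(vm_name, exit_code, exit_message=""):
    """Build the event text the manager could show for a Down report."""
    if exit_code is ExitCode.ERROR_ON_MIGRATION:
        # Only the destination instance exited, so do not claim the VM is down.
        return "Migration of VM %s failed: %s" % (vm_name, exit_message)
    if exit_code is ExitCode.ERROR:
        return "VM %s is down. Exit message: %s" % (vm_name, exit_message)
    return "VM %s is down." % vm_name
```

With a dedicated code the manager does not need to parse the free-text exit message, which is the reason given above for preferring this option over a frozen message string.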
(In reply to comment #0)
> while technically it is somewhat correct (the destination is down), the reality
> is that it should be still up (and hopefully running!) on the source!
> The message, as is, is quite confusing and alarming.
>

Kaul, as I understand it this is the case, i.e. the source machine is up and running as it should. The problem is the event that declares the machine went down, while the status in the GUI returns to up, right?

(In reply to comment #1)
> When running a VM on a host RHEVM collects statistics from which it learns the
> status of the VM.
>
> ATM VDSM returns on Down Vms one of two exit status: Normal / ERROR, and for
> error adding a String with exit message.
>
> For having a special message on migration failed, RHEVM+VDSM need to support
> another exit code or have obligation that the message can not be changed and
> RHEVM can base logic on the content of the message.
> I personally prefer new exit code ERROR_ON_MIGRATION
>
> Anyway moving to RFE to support either of the two options.

Livnat, if Kaul's answer to my question is yes, the problem is that RHEV Manager collects the status from both the destination and the source hosts, and when it encounters the Down message from the destination it logs it, even though the VM itself is up and running on the source. In this case it is a bug and not an RFE. The backend should cross-check this message from the destination against the status on the source and conclude that the migration failed. Unless you are not maintaining a thread that follows the migration from start to finish.
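The cross-check suggested above could look roughly like the sketch below. It assumes the backend knows the source and destination hosts of the migration it is tracking and has the last status reported by the source; the function name, its parameters, and the "Up" status string are hypothetical and not taken from the RHEV Manager code.

```python
def classify_down_report(reporting_host, source_host, destination_host,
                         source_vm_status):
    """Decide how to log a Down report received while a migration is tracked.

    Returns "migration_failed" when only the destination instance went down
    and the source still reports the VM as running, otherwise "vm_down".
    """
    if reporting_host == destination_host and source_vm_status == "Up":
        # The VM never left the source, so this is a failed migration.
        return "migration_failed"
    return "vm_down"
```

For the case in this bug, `classify_down_report("B", "A", "B", "Up")` would yield `"migration_failed"`, so the event could report the failed migration instead of a VM that went down.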
While waiting on the engine-side refactoring, comment #1 makes sense to implement independently. Once we have a differentiation in the VM's exit code, the engine-side change would be trivial to do.
Seems related: https://bugzilla.redhat.com/show_bug.cgi?id=557125 , considering the amendments suggested by Federico and implemented in the last patchset: http://gerrit.ovirt.org/#/c/22631/5/
This independent fix: http://gerrit.ovirt.org/gitweb?p=ovirt-engine.git;a=commit;h=68aba2b12b90a997cee0f1e0221eb6f48eb8fd35 for this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1104195 should have solved this problem as well. Moving to MODIFIED and abandoning my patch.
fixed in vt3, moving to on_qa. if you believe this bug isn't released in vt3, please report to rhev-integ
Verified in rhevm-3.5.0-0.14.beta.el6ev.noarch (vt5).

Verification steps follow the reproducer:
1. Have a VM with high CPU & IO load.
2. Start migration of the VM from host A to host B.

Result:
Migration is cancelled due to timeout. The VM event message says: "Migration failed due to Error: Migration not in progress (VM: user-vm01, Source: A, Destination: B)." The VM then remains running on the source host A.
RHEV-M 3.5.0 has been released