Bug 1478518 - CFME reports VM migration passed when it fails on RHV side
Summary: CFME reports VM migration passed when it fails on RHV side
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: Providers
Version: 5.8.0
Hardware: Unspecified
OS: Unspecified
medium
medium
Target Milestone: GA
: 5.8.3
Assignee: Piotr Kliczewski
QA Contact: Ilanit Stein
URL:
Whiteboard: vm:migrate:rhev
Depends On: 1448023
Blocks:
TreeView+ depends on / blocked
 
Reported: 2017-08-04 18:08 UTC by Satoe Imaishi
Modified: 2022-07-09 09:07 UTC (History)
11 users (show)

Fixed In Version: 5.8.3.1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1448023
Environment:
Last Closed: 2018-02-28 13:04:21 UTC
Category: ---
Cloudforms Team: RHEVM
Target Upstream Version:
Embargoed:
istein: needinfo-


Attachments (Terms of Use)
engine.log (212.68 KB, application/x-gzip)
2018-01-23 12:06 UTC, Ilanit Stein
no flags Details
evm.log (334.92 KB, application/x-gzip)
2018-01-23 12:07 UTC, Ilanit Stein
no flags Details
automation.log (30.69 KB, application/x-gzip)
2018-01-23 12:08 UTC, Ilanit Stein
no flags Details
rhevm.log (30.35 KB, application/x-gzip)
2018-01-23 12:09 UTC, Ilanit Stein
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Github ManageIQ manageiq-providers-ovirt pull 182 0 None None None 2017-12-22 12:46:35 UTC
Red Hat Bugzilla 1537953 0 medium CLOSED [RHV] Unclear error further to VM migration failure. 2021-02-22 00:41:40 UTC
Red Hat Product Errata RHSA-2018:0374 0 normal SHIPPED_LIVE Important: Red Hat CloudForms security, bug fix, and enhancement update 2018-02-28 18:04:37 UTC

Internal Links: 1537953

Comment 2 CFME Bot 2017-08-04 18:13:46 UTC
New commit detected on ManageIQ/manageiq-content/fine:
https://github.com/ManageIQ/manageiq-content/commit/c511fc7b1b0380fc5de858045b2e5f8667aebcae

commit c511fc7b1b0380fc5de858045b2e5f8667aebcae
Author:     Madhu Kanoor <mkanoor>
AuthorDate: Fri Jun 23 14:51:02 2017 -0400
Commit:     Satoe Imaishi <simaishi>
CommitDate: Fri Aug 4 14:09:09 2017 -0400

    Merge pull request #135 from billfitzgerald0120/check_migrate
    
    Updated vm migration to report when an error occurs.
    (cherry picked from commit 61793a6a2a1e7aa0b7061d6986a27af84d4052ff)
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1478518

 .../Methods.class/__methods__/checkmigration.rb    | 68 ++++++++++++-----
 .../__methods__/checkmigration_spec.rb             | 88 ++++++++++++++++++++++
 2 files changed, 139 insertions(+), 17 deletions(-)
 create mode 100644 spec/content/automate/ManageIQ/Infrastructure/VM/Migrate/StateMachines/Methods.class/__methods__/checkmigration_spec.rb

Comment 3 Ilanit Stein 2017-09-03 11:16:55 UTC
Tested on CFME-5.8.2/RHV-4.1.3:

Started VM migration from CFME side.
After RHV UI showed VM is migrating, destination host was rebooted, 
and in RHV events, it was reported that migration failed.

On CFME side,
Shortly after VM migration was started, the Request state turned into "Migrated" - Which is not true.
and the request few minutes later failed on email send.
Request details:

====================================== 
Status Ok
Request State Migrated
Requester Administrator
Request Type VM Migrate
Description VM Migrate for: <vm name> - Host: host_mixed_2
Last Message Server [EVM] VM [cfme_5815_bug_1478108] Step [EmailOwner] Status [Error Emailing Owner] Message [Emailing Owner]
Created On Sun, 03 Sep 2017 10:34:07 +0000
Last Update Sun, 03 Sep 2017 10:36:37 +0000
Approval State Approved
Approved/Denied by admin (Administrator)
Approved/Denied on Sun, 03 Sep 2017 10:34:28 +0000
Reason Auto-Approved
====================================== 

As Migration failed on RHV side, on a problem occurred during the VM migration,
the expected behavior is that on CFME side, the Migration VM request will fail, 
and the Migration error will be propagated from RHV side to CFME side.
Thus, moving bug back to assigned.

Comment 4 Oved Ourfali 2017-09-18 07:51:41 UTC
William, can you take a look?

Comment 5 William Fitzgerald 2017-09-18 14:30:54 UTC
Oved,

Can you re-create this and let me have access to your appliance? 

Thanks

Billy

Comment 6 Ilanit Stein 2017-09-26 12:10:30 UTC
Recreated migration failure, and sent details in private.

Comment 7 William Fitzgerald 2017-09-26 13:29:07 UTC
llanit,

You are getting an error trying to send the email.

You need this fix.   https://github.com/ManageIQ/manageiq-content/pull/177

Thanks

Billy

Comment 8 Ilanit Stein 2017-09-26 14:03:03 UTC
The Request state is updated to Migrated, status: Ok,
while VM was still migrating (seen on RHV UI),
before getting the last message on email error.

Therefore, I would suspect that even if we'll have that email send fix,
We'll still have the same behavior: From CFME side seem as if VM Migrated, while migration actually failed.

Comment 9 William Fitzgerald 2017-09-26 16:30:53 UTC
I added a billy domain and I commented out the email.  Can you try it again ?

This should have the same affect as applying the fix: https://github.com/ManageIQ/manageiq-content/pull/177


Thanks

Billy

Comment 10 William Fitzgerald 2017-09-26 21:39:33 UTC
Automate thinks the migration is good

[----] I, [2017-09-26T13:32:36.197916 #18149:1269340]  INFO -- : Q-task_id([vm_migrate_task_46]) <AEMethod checkmigration> CheckMigration returned <ok> for state <migrated> and status <Ok>


And 29 seconds later, I see this Error in the evm.log


log/evm.log:[----] I, [2017-09-26T13:32:55.620861 #28993:126f13c]  INFO -- : MIQ(MiqQueue.put) Message id: [253685],  id: [], Zone: [default], Role: [event], Server: [], Ident: [ems], Target id: [14], Instance id: [], Task id: [], Command: [EmsEvent.add], Timeout: [600], Priority: [100], State: [ready], Deliver On: [], Data: [], Args: [{:event_type=>"VM_MIGRATION_FAILED_FROM_TO", :source=>"RHEVM", :message=>"Migration failed  (VM: cfme_5.8.2_Juan, Source: host_mixed_2, Destination: host_mixed_1).", :timestamp=>2017-09-26 13:32:39 -0400, :username=>"admin@internal-authz", :full_data=>{:id=>"966647", :href=>"/ovirt-engine/api/events/966647", :cluster=>{:id=>"e8271a4c-ad88-4bd7-8c46-f7c9efde5e9c", :href=>"/ovirt-engine/api/clusters/e8271a4c-ad88-4bd7-8c46-f7c9efde5e9c"}, :data_center=>{:id=>"56099573-9402-4da4-8c84-c8decdc94a95", :href=>"/ovirt-engine/api/datacenters/56099573-9402-4da4-8c84-c8decdc94a95"}, :host=>{:id=>"3b2f783a-0168-4629-b8ef-721bd3517b2b", :href=>"/ovirt-engine/api/hosts/3b2f783a-0168-4629-b8ef-721bd3517b2b"}, :template=>{:id=>"0d514194-19a9-426b-a123-36962c1de240", :href=>"/ovirt-engine/api/templates/0d514194-19a9-426b-a123-36962c1de240"}, :user=>{:id=>"5947acbb-0172-0324-016a-000000000217", :href=>"/ovirt-engine/api/users/5947acbb-0172-0324-016a-000000000217"}, :vm=>{:id=>"f1ce76e5-bb73-4aa9-82f4-7473418cfab5", :href=>"/ovirt-engine/api/vms/f1ce76e5-bb73-4aa9-82f4-7473418cfab5"}, :description=>"Migration failed  (VM: cfme_5.8.2_Juan, Source: host_mixed_2, Destination: host_mixed_1).", :severity=>"error", :code=>120, :time=>2017-09-26 13:32:39 -0400, :name=>"VM_MIGRATION_FAILED_FROM_TO"}, :ems_id=>14, :vm_ems_ref=>"/api/vms/f1ce76e5-bb73-4aa9-82f4-7473418cfab5", :host_ems_ref=>"/api/hosts/3b2f783a-0168-4629-b8ef-721bd3517b2b", :ems_cluster_ems_ref=>"/api/clusters/e8271a4c-ad88-4bd7-8c46-f7c9efde5e9c"}]

It appears that we are not waiting for the migration to finish successfully.
  
Still looking into this ...

Thanks

Billy

Comment 11 William Fitzgerald 2017-10-02 19:04:12 UTC
The state machine is now correct but if the migration fails the state machine doesn't know it failed.  This probably needs to go to the providers group.

Comment 13 Ilanit Stein 2017-10-24 06:51:41 UTC
for the record adding answer to the need info in comment #9,
that was answered offline:

I tested RHV VM migration with the email fix, mentioned in comment #9.
Now it is passing the email stage, and end up with Last message: "[EVM] VM Migrated Successfully".
On RHV side, the VM migration failed, since I rebooted the destination host, while VM was migrating.
It should fail also on CFME side.

Comment 14 CFME Bot 2018-01-10 14:31:17 UTC
New commit detected on ManageIQ/manageiq/fine:
https://github.com/ManageIQ/manageiq/commit/c69473642892a0ffb30fc0c797196e99177dee0f

commit c69473642892a0ffb30fc0c797196e99177dee0f
Author:     Greg McCullough <gmccullo>
AuthorDate: Thu Dec 21 09:30:51 2017 -0500
Commit:     Satoe Imaishi <simaishi>
CommitDate: Wed Jan 10 09:25:15 2018 -0500

    Merge pull request #16705 from bzwei/migrate_with_error
    
    Rescue migration error and update status
    (cherry picked from commit 805e02333a8bfa3b697b691057d1d0a506b43037)
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1478518

 app/models/vm_migrate_task.rb       | 14 ++++++++++----
 spec/models/vm_migrate_task_spec.rb | 18 ++++++++++++++++++
 2 files changed, 28 insertions(+), 4 deletions(-)
 create mode 100644 spec/models/vm_migrate_task_spec.rb

Comment 15 CFME Bot 2018-01-15 16:11:43 UTC
New commit detected on ManageIQ/manageiq/fine:
https://github.com/ManageIQ/manageiq/commit/323eabede56ccd804ab5ef35fcb859ba72f0e16d

commit 323eabede56ccd804ab5ef35fcb859ba72f0e16d
Author:     Moti Asayag <masayag>
AuthorDate: Wed Jan 10 16:05:35 2018 +0200
Commit:     Satoe Imaishi <simaishi>
CommitDate: Mon Jan 15 11:03:29 2018 -0500

    Merge pull request #182 from pkliczewski/master
    
    Block migration call
    (cherry picked from commit bc7972b4ba1c85bb8b66340e7c7d656e2f35dded)
    
    Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1478518

 .../manageiq/providers/redhat/infra_manager.rb     | 18 +++++++++++++-
 .../providers/redhat/infra_manager_spec.rb         | 29 ++++++++++++++++++++++
 2 files changed, 46 insertions(+), 1 deletion(-)

Comment 16 Ilanit Stein 2018-01-17 13:41:37 UTC
Tested on CFME-5.8.3.1/rhv-4.1.8

From CFME side:
Run VM migration.
Reboot destination host, while migration is running.
The migration Request failed on: 

"Server [EVM] VM [test-scat-uz4w] Step [CheckMigration] Status [Error Migrating VM] Message [Error: Validation failed: Status is not included in the list]"

Piotr,

The VM migration failure on RHV side, is now reflected on CFME,
however the error contain "Status is not included in the list" -
Can that be changed?

Thanks.

Comment 17 Piotr Kliczewski 2018-01-17 17:33:18 UTC
I am not sure from where this message is coming. Will check whether it is possible.

Comment 18 Ilanit Stein 2018-01-23 11:54:08 UTC
Retested on CFME-5.8.3.1/rhv-4.1.8:

Case 1:
======
Test:
While VM migration was running, destination host was powered OFF.
On RHV UI, it was seen that VM migration actually started, 
and the VM's host was already switched to the destination host.

This is actually interrupting the VM migration in a __later__ stage (where VM was already passed to destination host).

Results:
On RHV logs, there wasn't a VM migration failure, but (destination) host failure (that was powered off).

On CFME, 
after ~10 min, this error appeared in the VM migration request (under Services-> Requests):

"Server [EVM] VM [golden_env_mixed_virtio_0] Step [CheckMigration] Status [Error Migrating VM] Message [Error: timed out after 600.008622942 seconds. Timeout threshold [600]]"

Case 2:
======
Test:
While VM migration was running, destination host was powered OFF, and then powered back ON.
On RHV UI, it was seen that VM migration actually started, 
BUT the VM's host was yet switched to the destination host.

This is actually interrupting the VM migration in an __early__ stage (where VM was not yet passed to destination host).

Results:
On RHV logs, there was a VM migration failure.

On CFME, 
after ~10 min, this error appeared in the VM migration request (under Services-> Requests):

"Server [EVM] VM [cfme_57_vm_0] Step [CheckMigration] Status [Error Migrating VM] Message [Error: Validation failed: Status is not included in the list]"

** Logs attached.

Piotr,

From the testing detailed on comment 16, RHV didn't reflect VM migration even started, so I was under the assumption that we got the unclear error message:
"Error: Validation failed: Status is not included in the list"
since the VM migration didn;t even start, 

However, case 2, above, show that even when on RHV side, VM migration was STARTED and failed, we still get on CFME side, this SAME unclear error.

Would you please check if this error message can be found, and maybe changed to something more clear?

thanks.

Comment 19 Ilanit Stein 2018-01-23 12:06:47 UTC
Created attachment 1384815 [details]
engine.log

See "case 1" VM migration started January 23 @12:08 of VM "golden_env_mixed_virtio_0", and ended @12:18.

See "case 2" VM migration started January 23 @12:22 of VM 
"cfme_57_vm_0", and ended @12:24

Comment 20 Ilanit Stein 2018-01-23 12:07:51 UTC
Created attachment 1384816 [details]
evm.log

See "case 1" VM migration of VM "golden_env_mixed_virtio_0".
See "case 2" VM migration of VM "cfme_57_vm_0".

Comment 21 Ilanit Stein 2018-01-23 12:08:17 UTC
Created attachment 1384818 [details]
automation.log

See "case 1" VM migration of VM "golden_env_mixed_virtio_0".
See "case 2" VM migration of VM "cfme_57_vm_0".

Comment 22 Ilanit Stein 2018-01-23 12:09:47 UTC
Created attachment 1384819 [details]
rhevm.log

Comment 23 Ilanit Stein 2018-01-24 07:33:10 UTC
Moving bug to Verified as on CFME-5.8.3.1/rhv-4.1.8 VM migration that fail on RHV side, generate a failure on CFME side.

Opened a separate bug 1537953, for the unclear error mentioned above:
"[Error: Validation failed: Status is not included in the list]"

Comment 26 errata-xmlrpc 2018-02-28 13:04:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:0374


Note You need to log in before you can comment on or make changes to this bug.