Bug 1377994 - Migration stuck in "Migrating From" or "Migrating To" state
Summary: Migration stuck in "Migrating From" or "Migrating To" state
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: General
Version: 4.0.3
Hardware: x86_64
OS: Linux
Priority: high
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: bugs@ovirt.org
QA Contact: Israel Pinto
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-09-21 09:33 UTC by Vladimir Rulev
Modified: 2016-11-11 09:14 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-11 09:14:20 UTC
oVirt Team: Virt
rule-engine: planning_ack?
rule-engine: devel_ack?
rule-engine: testing_ack?


Attachments
Screenshot of running tasks (25.43 KB, image/png)
2016-09-22 08:43 UTC, Vladimir Rulev
Screenshot of running task detail (45.44 KB, image/png)
2016-09-22 08:44 UTC, Vladimir Rulev
vdsm log from host icgs-hv1 (destination) (196.14 KB, application/x-gzip)
2016-09-22 08:48 UTC, Vladimir Rulev
vdsm log from host icgs-hv2 (source) (136.62 KB, application/x-gzip)
2016-09-22 08:49 UTC, Vladimir Rulev
engine log (7.37 KB, application/x-gzip)
2016-09-22 08:50 UTC, Vladimir Rulev

Description Vladimir Rulev 2016-09-21 09:33:59 UTC
Description of problem:
Eventually, the migration of a VM gets stuck in the "Migrating From" or "Migrating To" state. In the first case, the migrating VM continues to run on the source host. In the second case, the VM actually migrates to the destination host (verified with vdsClient -s 0 list table). The VM stays in the migrating state until I restart ovirt-engine.
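
The two cases above can be told apart by comparing the state shown by the engine with what each host reports. The following is a minimal sketch under stated assumptions, not part of the original report: the function name, the status strings ("MigratingFrom", "MigratingTo", "Up") and the per-host dictionaries are hypothetical. In practice the dictionaries would be filled in by hand or from a check such as vdsClient -s 0 list table on each host; the output of that command is not parsed here.

```python
# Hypothetical helper: all names and status strings are illustrative assumptions.
MIGRATING_STATES = ("MigratingFrom", "MigratingTo")


def locate_vm(vm_id, engine_status, source_vms, destination_vms):
    """Classify a possibly stuck migration.

    source_vms and destination_vms map a VM id to the status string that
    the respective host reports for it (assumed input format).
    """
    if engine_status not in MIGRATING_STATES:
        return "not migrating"

    on_source = source_vms.get(vm_id) == "Up"
    on_destination = destination_vms.get(vm_id) == "Up"

    if on_source and not on_destination:
        # First case from the report: the VM never left the source host.
        return "still running on source"
    if on_destination and not on_source:
        # Second case from the report: the VM moved, the engine did not notice.
        return "already running on destination"
    return "migration in progress or state unclear"


if __name__ == "__main__":
    # Made-up example data for a single VM id.
    print(locate_vm("vm-1", "MigratingTo", {}, {"vm-1": "Up"}))
```

If the result stays the same over several checks while the engine still shows a migrating state, that matches the symptom described in this report.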

Version-Release number of selected component (if applicable):
Hosted engine setup, 2 hosts.

Hosts software:
OS Version: RHEL - 7 - 2.1511.el7.centos.2.10
OS Description: CentOS Linux 7 (Core)
Kernel Version: 3.10.0 - 327.28.3.el7.x86_64
KVM Version: 2.3.0 - 31.el7.16.1
LIBVIRT Version: libvirt-1.2.17-13.el7_2.5
VDSM Version: vdsm-4.18.11-1.el7.centos
SPICE Version: 0.12.4 - 15.el7_2.1
GlusterFS Version: glusterfs-3.7.15-1.el7
CEPH Version: librbd1-0.80.7-3.el7

Hosted engine:
ovirt-engine-4.0.3-1.el7.centos.noarch

The initial setup used oVirt 3.5 and was then upgraded to 3.6, 4.0, and 4.0.3.

How reproducible:
Always

Steps to Reproduce:
1. Migrate VM from one host to another and back several times.
2. Eventually the VM gets stuck in the "Migrating ..." state.
3. Restart ovirt-engine service.
4. Ensure VM state is UP.

Expected results:
VM should migrate to another host and status should be updated accordingly.

Additional info:

Comment 1 Michal Skrivanek 2016-09-22 04:59:01 UTC
Can you please attach the vdsm.log files from the source and destination hosts, along with a description of a specific attempt so it can be identified?
Thanks!

Comment 2 Vladimir Rulev 2016-09-22 08:42:51 UTC
Hi Michal!

It is strange, but I cannot reproduce my issue today. I had observed it for the previous 2 days. Today I tried to migrate my VMs from one host to another several times, both one at a time and simultaneously, and the migrations were successful.

Eventually I tried to switch one of my hosts (icgs-hv2), which was running all the VMs, to maintenance. The VMs migrated to the other host (icgs-hv1) successfully, but the host got stuck in the "Preparing for maintenance" state. The "Moving host to maintenance" task has not completed. Several new "Moving host to maintenance" tasks appeared in the task list (please see the attached screenshot), but I did not start those tasks. In the original task's details there are a few "Moving VM" subtasks still in progress (see the other screenshot). The original task was started at 11:01:17.

I think my issue is related to ovirt-engine or DB, not to vdsm.

Comment 3 Vladimir Rulev 2016-09-22 08:43:48 UTC
Created attachment 1203632 [details]
Screenshot of running tasks

Comment 4 Vladimir Rulev 2016-09-22 08:44:31 UTC
Created attachment 1203633 [details]
Screenshot of running task detail

Comment 5 Vladimir Rulev 2016-09-22 08:48:57 UTC
Created attachment 1203634 [details]
vdsm log from host icgs-hv1 (destination)

Comment 6 Vladimir Rulev 2016-09-22 08:49:46 UTC
Created attachment 1203635 [details]
vdsm log from host icgs-hv2 (source)

Comment 7 Vladimir Rulev 2016-09-22 08:50:12 UTC
Created attachment 1203636 [details]
engine log

Comment 8 Vladimir Rulev 2016-09-23 10:09:58 UTC
Hi!

The "Moving host to maintenance" task completed after I destroyed the VMs on host icgs-hv2 with vdsClient. Before being destroyed, these VMs were in the Down state.

Comment 9 Tomas Jelinek 2016-11-10 13:46:43 UTC
Hi,

I see 2 strange things in the logs:
1: You have an old guest agent. That should not have any influence on the migration, but it is why you see "Invalid or unknown guest architecture type 'i686' received from guest agent" (caused by https://bugzilla.redhat.com/show_bug.cgi?id=1332723).

2: all migrations end with:
2016-09-22 11:01:53,441 WARN  [org.ovirt.engine.core.bll.ChangeVMClusterCommand] (DefaultQuartzScheduler7) [8a727a8] Validation of action 'ChangeVMCluster' failed for user SYSTEM. Reasons: VAR__ACTION__UPDATE,VAR__TYPE__VM__CLUSTER,VM_CLUSTER_IS_NOT_VALID

which seems more suspicious.

So, some questions:
- are you migrating your VMs to a different cluster when this happens?
- by any chance does this happen only when you do a migration using the REST API or normally using the webadmin?
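
To check how often the validation failure from item 2 occurs in an engine.log, a line like the one quoted above can be picked apart with a regular expression. This is a minimal sketch, assuming every record follows the layout of that single quoted line (timestamp, level, logger in brackets, thread in parentheses, correlation id in brackets, message). The function and variable names are made up for illustration, and other engine.log records may be laid out differently.

```python
import re

# Record layout modeled only on the single engine.log line quoted in this bug.
LOG_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<logger>[^\]]+)\]\s+"
    r"\((?P<thread>[^)]+)\)\s+"
    r"\[(?P<correlation_id>[^\]]*)\]\s+"
    r"(?P<message>.*)$"
)

# Message layout of the validation failure quoted in this bug.
VALIDATION_RE = re.compile(
    r"Validation of action '(?P<action>[^']+)' failed for user (?P<user>\S+)\. "
    r"Reasons: (?P<reasons>\S+)"
)


def parse_validation_failure(line):
    """Return a dict for a validation-failure record, or None for any other line."""
    record = LOG_RE.match(line.strip())
    if record is None:
        return None
    failure = VALIDATION_RE.search(record.group("message"))
    if failure is None:
        return None
    return {
        "timestamp": record.group("timestamp"),
        "level": record.group("level"),
        "logger": record.group("logger"),
        "action": failure.group("action"),
        "user": failure.group("user"),
        "reasons": failure.group("reasons").split(","),
    }


# The line quoted in this bug, used as sample input.
SAMPLE = (
    "2016-09-22 11:01:53,441 WARN  "
    "[org.ovirt.engine.core.bll.ChangeVMClusterCommand] "
    "(DefaultQuartzScheduler7) [8a727a8] Validation of action "
    "'ChangeVMCluster' failed for user SYSTEM. Reasons: "
    "VAR__ACTION__UPDATE,VAR__TYPE__VM__CLUSTER,VM_CLUSTER_IS_NOT_VALID"
)

if __name__ == "__main__":
    print(parse_validation_failure(SAMPLE))
```

Applying parse_validation_failure to every line of the log and keeping the non-None results gives the timestamps of each failed ChangeVMCluster validation, which can then be compared with the times of the migration attempts.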

Comment 10 Vladimir Rulev 2016-11-11 07:18:57 UTC
Hi!

No, I have only one cluster in this setup.
I tried migration only using webadmin.

Unfortunately, I rebuilt this setup from scratch and can no longer reproduce this issue.

Comment 11 Michal Skrivanek 2016-11-11 09:14:20 UTC
Well, I guess that's good news.
The only other thing I've seen in the logs is a minor issue with migration stats monitoring, which was fixed in 4.0.4, so it is not relevant any more either.

Please feel free to reopen if you see it again

